Pandas Library for Data Manipulation


Data analysis is one of the most important parts of Data Science, and when it comes to data analysis in Python, the Pandas library simply cannot be skipped.

The Pandas library is an open-source tool for data analysis and manipulation, and it comes in handy for every data analyst. It provides some significant data structures and is in fact built over the NumPy package (learn about NumPy here), which gives it extra reliability and strength.


Pandas | Data Structures

The Pandas library provides us with two important and unique data structures, namely Series and DataFrame.

Series –

It is a one-dimensional data structure which stores data just like a Python list or array; however, we can provide the data as well as a label / index of any data type to the Series object, which makes it unique and powerful. Also, the data of the Series object can come from a Python dictionary, tuple, list, NumPy array, scalar value, etc.

Dataframe –

DataFrames are two-dimensional objects which have rows and columns (just like an Excel sheet). Think of a DataFrame as several pandas Series objects (i.e. columns) stacked together on the basis of the same index / label values. Since a DataFrame is built over the Series object, it is used far more often than the former data structure.

import numpy as np
import pandas as pd

### creation of a pandas series ###
data_values = np.array(['Ana', 'Margaret', 'Sara', 'Hearth'])
index_values = [i for i in range(1,5)]
series = pd.Series(data = data_values, index = index_values)
print(series)
print('Object type = ' + str(type(series)))


### creation of a pandas dataframe ###
dictionary = {'Employee' : ['Rach', 'Tom', 'Barry', 'Alan'],
             'Post' : ['Manager', 'Executive Er.', 'Data Scientist', 'Intern'],
             'Age' : [54, 35, 32, 25]}
dataframe = pd.DataFrame(data=dictionary)
print(dataframe)
print('Object type = ' + str(type(dataframe)))

Similarly, you could try creating Series and DataFrames from the different data types mentioned above.
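For instance, here is a small sketch (with made-up values) of building a Series from a Python dictionary and from a scalar value; in the dictionary case the keys become the index automatically.

import pandas as pd

### series from a dictionary : keys become the index, values become the data ###
marks = pd.Series({'Math': 87, 'Physics': 64, 'Chemistry': 75})
print(marks)

### series from a scalar : the value is broadcast across the given index ###
defaults = pd.Series(0, index=['a', 'b', 'c'])
print(defaults)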


Pandas Input/Output

We can also import or export data in the form of CSV (comma-separated values), JSON, HTML, SQL, etc. with the help of pandas' built-in functions.

Pandas provides us with various functions of the form pd.read_filetype('filename') to read/import data from various file types (click here to know about all supported file types). Similarly, you can write the final results into several supported file types using dataframe.to_filetype('path/filename'). Both families of functions come with more parameters, which can change the index of our data or take in various delimiters (the symbol by which the text is separated).
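As a rough illustration (the file names below are hypothetical), reading and writing a CSV file typically looks like this:

import pandas as pd

### read a csv file into a dataframe (file name is hypothetical) ###
df = pd.read_csv('employees.csv')                # comma is the default delimiter
# df = pd.read_csv('employees.tsv', sep='\t')    # pass sep for other delimiters

### write the results back out, dropping the default integer index ###
df.to_csv('employees_out.csv', index=False)

### the same pattern works for other formats, e.g. json ###
df.to_json('employees_out.json')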


Selecting Rows and/or Columns

When dealing with a dataset, we can select a particular column from it by just passing the label of the column in square brackets, as shown below.

import numpy as np
import pandas as pd

### creating a dictionary ###
dictionary = {'Employee' : ['Rach', 'Tom', 'Barry', 'Alan'],
             'Post' : ['Manager', 'Executive Er.', 'Data Scientist', 'Intern'],
             'Age' : [54, 35, 32, 25]}

### creation of a dataframe ###
dataframe = pd.DataFrame(data=dictionary)
print('Dataframe is :\n')
print(dataframe)

### selecting particular column from dataframe
print('\n\n\nEmployee column of the dataframe :\n')
print(dataframe['Employee'])


However, to select a particular row from our dataset the process is quite different; you need to use dataframe.loc['row_label', 'column_label'] or dataframe.iloc[row_index, column_index], where the column label/index is optional, as per the choice of the user.

Note : To select a combination of rows or columns, you can pass a list of the desired row/column labels or indices (based on which function you're using).

[Figure: dataframe and row/column selection]

The dataframe.ix[row, column] indexer used to serve as a replacement, accepting either an index or a label (user's choice) and returning the desired row or column; however, .ix has been deprecated and removed in recent pandas versions, so .loc and .iloc should be preferred.
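Here is a minimal sketch of both styles of selection, reusing the employee dataframe created above (which has the default integer index 0 to 3):

### label-based selection with .loc ###
print(dataframe.loc[0])                             # first row, all columns
print(dataframe.loc[0, 'Employee'])                 # a single cell, by labels
print(dataframe.loc[[0, 2], ['Employee', 'Age']])   # a list selects multiple rows/columns

### position-based selection with .iloc ###
print(dataframe.iloc[0])                            # first row, by position
print(dataframe.iloc[0, 1])                         # row 0, column 1, by position
print(dataframe.iloc[0:2, 0:2])                     # integer slices also work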


Creation/deletion

Just like selecting a row or column (as illustrated above), we can add a new row to the dataset by using .loc and passing a new row label along with the data (.iloc cannot add rows, since it only refers to positions that already exist). A new column is added simply by assigning data to a new column label:

>>>   dataframe['new_column_label'] = data

And to drop a particular row or column, the dataframe.drop() method is used, which takes in parameters like labels, index, axis, inplace, etc.
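A short sketch of both operations on the employee dataframe from above (the new column values and the new row are made up for illustration):

### adding a new column by assigning to a new label ###
dataframe['Salary'] = [90000, 60000, 75000, 20000]

### adding a new row with .loc and a new row label ###
### the values follow the column order Employee, Post, Age, Salary ###
dataframe.loc[4] = ['Eve', 'Analyst', 28, 55000]

### dropping a column (axis=1) and a row (axis=0) ###
dataframe = dataframe.drop('Salary', axis=1)
dataframe = dataframe.drop(4, axis=0)
print(dataframe)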


Aggregate Functions, GroupBy and Sorting

For numerical data, the need to find statistical details (max, min, standard deviation, etc.) is always there.

Pandas assigns this responsibility to various functions and methods, as described below.

The describe() method prints out all the basic statistical details of the numerical data in one go; however, we can also call individual aggregate functions such as the following (a short sketch follows this list):

  • abs() –  returns absolute value
  • max()
  • min()
  • count()
  • mean()
  • median()
  • mode(), etc.
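As a rough sketch with made-up marks data, describe() and the individual aggregates can be used like this:

import pandas as pd

marks = pd.DataFrame({'Math' : [87, 54, 97, 91],
                      'Physics' : [64, 68, 84, 85],
                      'Chemistry' : [75, 59, 90, 80]})

### describe() summarises count, mean, std, min, quartiles and max in one go ###
print(marks.describe())

### individual aggregate functions can be called on a dataframe or a single column ###
print(marks.max())
print(marks['Math'].mean())
print(marks['Physics'].median())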

Also, there are methods to sort the data of a DataFrame/Series; they take parameters like axis (0 : rows, 1 : columns), inplace, ascending, and many more, to apply the sort accordingly.

  • sort_index() – sort by labels along an axis
  • sort_values(by='Chemistry') – sort by the values along an axis

Also, groupby() is used to group data on the basis of some label values.
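A minimal sketch of sorting and grouping, using a made-up scores table with a hypothetical 'Section' column to group on:

import pandas as pd

scores = pd.DataFrame({'Student' : ['Ana', 'Bruce', 'Kathy', 'Paul'],
                       'Section' : ['A', 'B', 'A', 'B'],
                       'Chemistry' : [75, 59, 90, 80]})

### sort the rows by the values of one column ###
print(scores.sort_values(by='Chemistry', ascending=False))

### group the rows by a label column and aggregate each group ###
print(scores.groupby('Section')['Chemistry'].mean())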


Dealing with Missing Data

Pandas comes loaded with functions to deal with missing values in a dataset. Missing data in a DataFrame adds nothing to the analysis, and to deal with it, we can either drop the axis holding the missing data or fill it in.

  • dataframe.dropna()
  • dataframe.fillna( value = ' ' )

Each method takes in parameters like axis (0 : row, 1 : column), inplace (True/False), and many more.

Note : Prefer filling null values in your dataset with some value (be it the mean, the mode, or any other relevant value) rather than dropping them, as the affected rows or columns can still carry information that matters for your observations.
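A small sketch of both approaches on a made-up dataframe containing a couple of NaN values:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Math' : [87, np.nan, 97],
                   'Physics' : [64, 68, np.nan]})

### drop every row (axis=0) that contains at least one missing value ###
print(df.dropna(axis=0))

### or fill the missing values instead, here with each column's mean ###
print(df.fillna(value=df.mean()))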


Application of user-defined functions

Pandas gives us the privilege to apply user-defined functions over our dataset using the DataFrame.apply() method.

import pandas as pd
dictionary = {'Student' : ['Ana', 'Bruce', 'Kathy', 'Paul', 'Mihir','Zayn'],
             'Math' : [87,54,97,91,54,74],
             'Physics' : [64,68,84,85,64,64],
             'Chemistry' : [75,59,90,80,59,78]}

### dataframe creation ###
dataframe = pd.DataFrame(data=dictionary, index=[i for i in range(15,21)])
print(dataframe)

### user-defined function to count vowels in name ###
def count_vowels(name):
    count = 0
    vowels = ['a', 'e', 'i', 'o', 'u']
    for char in name.lower():
        if char in vowels:
            count += 1
    print('Vowels in %s = %d' %(name, count))
    
### use of apply() method ###
dataframe['Student'].apply(count_vowels)

Consider the above snippet, which illustrates one such instance. Similarly, you can apply built-in functions (e.g. np.mean) as well.


Pandas | Useful Methods

There are several useful methods in pandas; however, some of the basic and most used ones while doing data analysis are as follows –

### take "df" to be the name of your already created dataframe ###
import pandas as pd

### to check top entries of our dataframe ###
df.head()  

### check last entries of our dataframe ###
df.tail()

### indicates whether each value is null by returning True/False ###
df.isnull()

### to get shape of our dataframe ###
df.shape

### to get columns of our dataframe ###
df.columns

### gives information about dataframe like no. of columns, data objects, data types, no. of entries, etc. ###
df.info()

### returns the pairwise correlation between numeric columns of df ###
df.corr()

Merging, Joining and Concatenation

To join various DataFrame objects or to add more columns to the original DataFrame object, we can use any of the following methods provided by the pandas library (a short sketch follows this list):

  • dataframe1.append( dataframe2 )   –   appends/concatenates the rows of the second dataframe at the end of the first, provided both have the same columns. It only concatenates along axis=0 (namely the index) and predates the concat method; note that append() has been deprecated in recent pandas versions in favour of pd.concat().
  • pandas.concat( [ dataframe1 , dataframe2 ] , axis=1 )   –   concatenates two dataframes on a row basis or column basis, as specified by axis.
  • dataframe1.join( dataframe2 , how='inner' , on=column1 )   –   much like a SQL join operation, it joins or merges two or more dataframe objects.
  • pd.merge(left, right, on=None)   –   very similar to the join method; merge acts as the entry point for all standard database-style join operations between DataFrame objects.
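Here is a minimal sketch of concatenation and a SQL-style merge on two small, made-up dataframes sharing an 'Employee' key column:

import pandas as pd

left = pd.DataFrame({'Employee' : ['Rach', 'Tom'], 'Post' : ['Manager', 'Executive Er.']})
right = pd.DataFrame({'Employee' : ['Rach', 'Tom'], 'Age' : [54, 35]})

### stack the frames vertically (axis=0) or side by side (axis=1) ###
print(pd.concat([left, right], axis=0))
print(pd.concat([left, right], axis=1))

### SQL-style merge on a common key column ###
print(pd.merge(left, right, on='Employee', how='inner'))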

That is pretty much everything you need for a basic intuition of the most versatile, state-of-the-art data manipulation library in Python: Pandas.

If you've found it useful, then like and share the post. Also, drop your doubts (if any) in the comments section below, or you can reach us via the contact form on the site. Don't forget to subscribe to my news feed, so that you stay updated on new blog posts.

Happy Learning!

