Column and Row operations in Pandas
Let's learn some advanced column and row functions in Pandas for operating on datasets.
In the previous blog we learned how to create Series, DataFrames and Panels with Pandas. In this blog we will look at some more advanced operations Pandas offers on columns and rows. First, though, we need a DataFrame to work with.
Let's get started...
import pandas as pd
import numpy as np
d = {'one': pd.Series([2, 4, 6, 8], index = ['a', 'b', 'c', 'd']),
'two': pd.Series([1, 3, 5, 7, 9], index = ['a', 'b', 'c', 'd', 'e'])}
df = pd.DataFrame(d)
print(df)
Output:
one two
a 2.0 1
b 4.0 3
c 6.0 5
d 8.0 7
e NaN 9
Performing Operations on Columns
Column Selection
The line of code below selects a column from the DataFrame. The argument 'one' is the column label, so the expression returns that column as a Series: every value stored under 'one', together with its index.
print(df['one'])
Output:
a 2.0
b 4.0
c 6.0
d 8.0
e NaN
Name: one, dtype: float64
We can perform the same selection on 'two', as shown below:
print(df['two'])
Output:
a 1
b 3
c 5
d 7
e 9
Name: two, dtype: int64
In both cases the output lists each index label next to its value. The last line shows the column label as 'Name' and the dtype of the Series. Column 'one' is float64 because it holds a NaN for row 'e', while column 'two' has a value for every row and stays int64.
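As a small sketch of what the selection returns, the snippet below rebuilds the same DataFrame and compares a single-label selection with a list-of-labels selection. The variable names `single` and `subset` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd']),
    'two': pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e']),
})

# A single label returns one column as a Series
single = df['one']

# A list of labels returns a DataFrame, even if the list has one entry
subset = df[['one', 'two']]

print(type(single).__name__)
print(type(subset).__name__)
print(single.name, single.dtype)
```

The extra pair of brackets is what makes the difference: the inner brackets form a Python list of column labels.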
Column Insertion
The code below adds a new column 'three' to the existing DataFrame by assigning a Series to it.
#Adding a new column to the DataFrame by passing a Series
print("Adding a new column to the existing DataFrame")
df['three'] = pd.Series([12, 14, 16, 18, 20], index=['a', 'b', 'c', 'd', 'e'])
print(df)
Output:
Adding a new column to the existing DataFrame
one two three
a 2.0 1 12
b 4.0 3 14
c 6.0 5 16
d 8.0 7 18
e NaN 9 20
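When the new column is a Series, Pandas matches it to the DataFrame by index label rather than by position. The sketch below uses a smaller DataFrame and two hypothetical columns, 'partial' and 'by_position', to show what happens when the labels do not line up.

```python
import pandas as pd

df = pd.DataFrame({'one': [2, 4, 6]}, index=['a', 'b', 'c'])

# Hypothetical column: no value for 'c', plus a label 'z' the DataFrame lacks.
# Row 'c' gets NaN and the value for 'z' is discarded.
df['partial'] = pd.Series([10, 20, 99], index=['a', 'b', 'z'])

# A plain list has no labels, so it is assigned by position
# and must have exactly as many items as the DataFrame has rows.
df['by_position'] = [100, 200, 300]

print(df)
```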
The code below adds the columns 'one' and 'two' and stores the result in 'four' and then displays the column 'four'.
#Performing addition on columns and storing the result in new column
print("Adding columns 'one' and 'two' and storing the result in 'four'")
df['four'] = df['one'] + df['two']
print(df['four'])
Output:
Adding columns 'one' and 'two' and storing the result in 'four'
a 3.0
b 7.0
c 11.0
d 15.0
e NaN
Name: four, dtype: float64
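Row 'e' ends up as NaN because column 'one' has no value there, and NaN plus anything is NaN. If a missing value should count as zero instead, one option is the add() method with fill_value, sketched below. The names `plain` and `filled` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([2, 4, 6, 8], index=['a', 'b', 'c', 'd']),
    'two': pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e']),
})

# The + operator propagates NaN
plain = df['one'] + df['two']

# add() with fill_value substitutes 0 for a value missing on one side
filled = df['one'].add(df['two'], fill_value=0)

print(plain)
print(filled)
```

Whether treating a missing value as zero is appropriate depends on the data, so this is a choice to make deliberately rather than a default.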
Column Deletion
- pop() function
We will use the pop() function to delete a specified column. The line of code below removes the column 'two' from the DataFrame in place.
#Deleting column 'two'
df.pop('two')
print(df)
Output:
one three four
a 2.0 12 3.0
b 4.0 14 7.0
c 6.0 16 11.0
d 8.0 18 15.0
e NaN 20 NaN
We can see that the resulting output no longer has the column 'two', because we popped it.
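pop() does more than delete: it also returns the removed column as a Series, which is handy when the data is still needed elsewhere. A minimal sketch on a smaller DataFrame, with `removed` as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'one': [2, 4], 'two': [1, 3]}, index=['a', 'b'])

# pop() removes the column from df and hands it back as a Series
removed = df.pop('two')

print(df)
print(removed)
```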
- del keyword
Now, we will use the del keyword to delete a column from the DataFrame.
#Deleting column 'four'
del df['four']
print(df)
Output:
one three
a 2.0 12
b 4.0 14
c 6.0 16
d 8.0 18
e NaN 20
We can see that the resulting output no longer has the column 'four'.
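Both pop() and del change the DataFrame in place. When the original should stay intact, drop() with the columns argument is an alternative that returns a new DataFrame. The sketch below assumes a small stand-in DataFrame, and `trimmed` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2, 4], 'three': [12, 14], 'four': [3, 7]},
    index=['a', 'b'],
)

# drop() returns a copy without the listed columns; df keeps all three
trimmed = df.drop(columns=['four'])

print(trimmed)
print(df.columns.tolist())
```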
Performing Operations on Rows
Row Selection by Label
We can select a row by its label by passing that label to loc[ ].
#Printing the row with 'b'
print(df, "\n")
print(df.loc['b'])
Output:
one three
a 2.0 12
b 4.0 14
c 6.0 16
d 8.0 18
e NaN 20
one 4.0
three 14.0
Name: b, dtype: float64
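We can see that only the contents of row 'b' are returned, taken from the columns 'one' and 'three'. The value 14 is shown as 14.0 because the row comes back as a single Series, which has one dtype (float64) for all of its values.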
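loc[ ] also accepts a list of labels, a label slice, or a row label together with a column label. The sketch below shows these forms on a small stand-in DataFrame. The names `rows`, `span` and `cell` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2.0, 4.0, 6.0, 8.0], 'three': [12, 14, 16, 18]},
    index=['a', 'b', 'c', 'd'],
)

# A list of labels returns those rows as a DataFrame
rows = df.loc[['a', 'c']]

# A label slice includes BOTH endpoints, unlike ordinary Python slices
span = df.loc['b':'d']

# A row label plus a column label returns a single value
cell = df.loc['b', 'three']

print(rows)
print(span)
print(cell)
```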
Row Selection by Integer Location
We can also select a row by its integer position by passing that position to iloc[ ]. Positions are counted from 0.
#Printing the row at 2
print(df, "\n")
print(df.iloc[2])
Output:
one three
a 2.0 12
b 4.0 14
c 6.0 16
d 8.0 18
e NaN 20
one 6.0
three 16.0
Name: c, dtype: float64
The code above returns the row at integer position 2, which is row 'c' because counting starts at 0, with its values from the columns 'one' and 'three'.
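Because iloc[ ] works on positions, it follows the usual Python indexing rules: slices leave out the end position and negative numbers count from the end. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame(
    {'one': [2.0, 4.0, 6.0, 8.0, float('nan')],
     'three': [12, 14, 16, 18, 20]},
    index=['a', 'b', 'c', 'd', 'e'],
)

# Position 2 is the third row
third = df.iloc[2]

# Positions 0 and 1; the end position 2 is excluded
first_two = df.iloc[0:2]

# -1 is the last row
last = df.iloc[-1]

print(third.name, list(first_two.index), last.name)
```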
Row Insertion
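We can use the pd.concat() function to add the rows of one DataFrame to another. Older versions of Pandas offered DataFrame.append() for this, but that method was deprecated in Pandas 1.4 and removed in Pandas 2.0. The code below appends the rows of the DataFrame d2 to the DataFrame d1.
import pandas as pd
#Creating two DataFrames
d1 = pd.DataFrame([[2, 4, 6], [3, 5, 7]], columns=['a', 'b', 'c'])
d2 = pd.DataFrame([[10, 20, 30], [40, 50, 60]], columns=['a', 'b', 'c'])
#Appending the rows of d2 to d1
d1 = pd.concat([d1, d2])
print(d1)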
print(d1)
Output:
a b c
0 2 4 6
1 3 5 7
0 10 20 30
1 40 50 60
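Notice that the labels 0 and 1 each appear twice in the output, because every DataFrame keeps its own row labels. If unique labels are preferred, a minimal sketch using pd.concat() with ignore_index=True renumbers the rows. The name `combined` is illustrative.

```python
import pandas as pd

d1 = pd.DataFrame([[2, 4, 6], [3, 5, 7]], columns=['a', 'b', 'c'])
d2 = pd.DataFrame([[10, 20, 30], [40, 50, 60]], columns=['a', 'b', 'c'])

# ignore_index=True discards the old labels and numbers the rows 0..n-1
combined = pd.concat([d1, d2], ignore_index=True)

print(combined)
```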
Row Deletion
We can use the drop() function to delete the rows with a specified label. drop() returns a new DataFrame and leaves d1 itself unchanged, so we print the result directly.
#Deleting the rows with label 0
print(d1.drop(0))
Output:
a b c
1 3 5 7
1 40 50 60
The above code removes every row whose label is 0. Because d1 contains duplicate labels, both matching rows are dropped. Similarly, we can delete the rows with label 1 by passing 1 as the argument to the drop() function.
#Deleting the rows with label 1
print(d1.drop(1))
Output:
a b c
0 2 4 6
0 10 20 30
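Since drop() removes every row that shares a label, deleting just one of the duplicated rows needs unique labels first. One possible approach, sketched below, is to renumber the rows with reset_index(drop=True) and then drop by the new label. The names `stacked`, `renumbered` and `without_third` are illustrative.

```python
import pandas as pd

d1 = pd.DataFrame([[2, 4, 6], [3, 5, 7]], columns=['a', 'b', 'c'])
d2 = pd.DataFrame([[10, 20, 30], [40, 50, 60]], columns=['a', 'b', 'c'])
stacked = pd.concat([d1, d2])   # row labels are 0, 1, 0, 1

# Replace the duplicated labels with 0, 1, 2, 3
renumbered = stacked.reset_index(drop=True)

# Now label 2 identifies exactly one row: [10, 20, 30]
without_third = renumbered.drop(2)

print(without_third)
```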