Let's talk about the methods of creating data structures with Pandas in Python
Hello again. Let's start the second blog in our Pandas series. In the previous blog we covered the features of Pandas, how to install it, and the data structures it supports. In this blog we look more closely at those data structures and how to work with them:
- Series
- Data Frame
- Panel
You can read the previous blog on Introduction to Pandas and installing pandas here.
We will use NumPy alongside Pandas to get a better feel for how the two libraries work together. Let's start with Series.
Series
A Series is a one-dimensional labelled array that can hold data of any type: integers, floats, strings, and arbitrary Python objects. Let's first look at the method for creating a Series with Pandas.
- pandas.Series(data, index, dtype, copy)
We can use this method to create a series in Pandas. The four arguments shown above are:
- data: The data to convert into a series. This can be a Python list, a NumPy array, a Python dictionary, or a constant (scalar) value.
- index: The index labels for the elements passed in data. If it is not specified, the default is equivalent to numpy.arange(length_of_data), that is 0, 1, ..., n-1.
- dtype: The data type of the resulting series. If it is not specified, it is inferred from the data.
- copy: A Boolean value specifying whether or not to copy the input data. If not specified, the default is False.
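To make the four arguments concrete, here is a minimal sketch that sets all of them at once. The values and the labels "a" to "d" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; the index labels are arbitrary choices for this sketch.
data = np.array([10, 20, 30, 40])

s = pd.Series(
    data,
    index=["a", "b", "c", "d"],  # custom labels instead of 0..3
    dtype="float64",             # convert the integers to floats
    copy=True,                   # give the series its own copy of the data
)

# With copy=True the series has its own data, so a later change to the
# original array does not show up in the series.
data[0] = 99

print(s)
```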
Using ndarray to create a series:
We can create a Pandas Series from a NumPy array by passing the array to the Series() method. The resulting series gets default index values 0 to n-1, where n is the size of the NumPy array. When the series is printed, the last line shows its data type, in this case 'int32'. The exact integer type depends on the platform, so you may see 'int64' instead. The NumPy array can be of any type: int, float, object, character, etc.
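A short sketch of this, using made-up values:

```python
import numpy as np
import pandas as pd

# Illustrative integer array.
arr = np.array([10, 20, 30, 40, 50])

# No index is given, so pandas labels the elements 0, 1, 2, 3, 4.
series = pd.Series(arr)

print(series)
print(series.index)
print(series.dtype)  # same dtype as the array (int32 or int64, platform dependent)
```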
Using dictionary to create a series:
We can also create a Pandas Series from a dictionary by passing the dictionary to the Series() method. The keys of the dictionary become the index of the series and the dictionary values become its data.
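As a small sketch with an illustrative dictionary, the example below also shows what happens when an explicit index is passed along with the dictionary: only the listed keys are kept, in that order, and a label with no matching key gets a missing value.

```python
import pandas as pd

# Illustrative dictionary; keys become index labels.
capitals = {"India": "New Delhi", "Japan": "Tokyo", "France": "Paris"}

series = pd.Series(capitals)
print(series)

# With an explicit index, values are looked up by key.
# "Spain" is not in the dictionary, so its value is missing (NaN).
subset = pd.Series(capitals, index=["Japan", "India", "Spain"])
print(subset)
```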
Data Frame
A Data Frame is a two-dimensional data structure in which the data is arranged in tabular form (rows and columns). Let's first look at the method for creating a Data Frame with Pandas.
- pandas.DataFrame(data, index, columns, dtype, copy)
We can use this method to create a DataFrame in Pandas. Now let's discuss the arguments of the DataFrame() method:
- data: The data passed to the DataFrame() method can take many forms, like an ndarray, series, map, dictionary, list, constant, or another DataFrame.
- index: The row labels of the DataFrame. The default is np.arange(n), where n is the number of rows.
- columns: The column labels of the DataFrame. The default is np.arange(n), where n is the number of columns.
- dtype: The data type to use for the data. If it is not specified, it is inferred from the data.
- copy: A Boolean value specifying whether or not to copy the input data. In older pandas versions the default is False; in recent versions the default is None, which copies dictionary input but not DataFrame or 2D ndarray input.
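The following sketch passes every argument from the list above explicitly. The row labels "r1" to "r3" and column labels "x" and "y" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative 3x2 array of integers.
values = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    values,
    index=["r1", "r2", "r3"],  # row labels
    columns=["x", "y"],        # column labels
    dtype="float64",           # store the values as floats
    copy=True,                 # do not share memory with `values`
)

print(df)
print(df.dtypes)
```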
Using lists to create a DataFrame
We can create a Pandas DataFrame from a list. Passing a flat list of six values gives a DataFrame with six rows (0, 1, 2, 3, 4, 5) and one column (0).
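A minimal sketch with six made-up values:

```python
import pandas as pd

# Illustrative flat list of six values.
numbers = [10, 20, 30, 40, 50, 60]

# Each list element becomes one row; the single column is labelled 0.
df = pd.DataFrame(numbers)

print(df)
print(df.shape)
```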
Using Dict of ndarrays and lists to create a DataFrame
We can also use a dictionary of lists to create a DataFrame. Passing a dictionary of lists named 'information' to the DataFrame() method gives a DataFrame with rows (0, 1, 2, 3, 4) and columns (Brand, Founded): each dictionary key becomes a column name and each list becomes that column's values.
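Here is a sketch of such a dictionary. The name 'information' and the column names come from the text above; the five rows are illustrative values chosen for this example, with one column given as a list and the other as an ndarray.

```python
import numpy as np
import pandas as pd

# Illustrative data: one column as a list, one as a NumPy array.
# All columns must have the same length.
information = {
    "Brand": ["Apple", "Microsoft", "Google", "Amazon", "Tesla"],
    "Founded": np.array([1976, 1975, 1998, 1994, 2003]),
}

df = pd.DataFrame(information)

print(df)
```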
Panel
A pandas Panel is a three-dimensional container of data. Let's first take a look at the method for creating a Panel with Pandas.
- pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)
- data: The data can be of any form like ndarray, list, dict, map, DataFrame.
- items: axis 0
- major_axis: axis 1
- minor_axis: axis 2
- dtype: The data type of each column
- copy: A Boolean value specifying whether or not to copy the data. The default value is False.
Using 3D ndarray to create a Panel
We can use a three-dimensional ndarray to create a Pandas Panel: we generate the ndarray with NumPy's random functions and then pass it to the pd.Panel() method.
However, you will get a FutureWarning when you try to create a Panel. It says that Panel will be removed from the pandas library, and it suggests alternatives: a DataFrame with a MultiIndex, or the xarray package. Panel is no longer available in pandas 1.0 and later, so pd.Panel() only works on older versions.
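Because Panel is deprecated, the sketch below does not call pd.Panel(). It shows one possible way to hold the same 3D ndarray in a DataFrame with a MultiIndex, which is one of the alternatives the warning points to. The level names "item" and "major" and the array shape are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Random 3D array with shape (items, major_axis, minor_axis).
rng = np.random.default_rng(0)
data = rng.random((2, 3, 4))

n_items, n_major, n_minor = data.shape

# One row per (item, major) pair; the minor axis becomes the columns.
index = pd.MultiIndex.from_product(
    [range(n_items), range(n_major)], names=["item", "major"]
)
frame = pd.DataFrame(data.reshape(n_items * n_major, n_minor), index=index)

print(frame)

# Selecting one item gives back its 2D slice as a regular DataFrame.
print(frame.loc[0])
```

Selecting with frame.loc[i] plays the role that selecting an item played in a Panel.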
Using dict of DataFrame to create a Panel
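We can also use a dict of DataFrames to create a Panel. Each dictionary key becomes one item, and the rows and columns of the DataFrames become the major and minor axes. In this example the resulting Panel has 2 items, a major_axis of length 1, and a minor_axis of length 3.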
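On pandas versions without Panel, a dict of DataFrames can be combined with pd.concat() instead. This is a sketch of that alternative, not the original Panel code; the keys "Item1" and "Item2", the level names, and the values are made up, with shapes chosen to match the 2 items, 1 row, and 3 columns described above.

```python
import numpy as np
import pandas as pd

# Two illustrative DataFrames, each with 1 row and 3 columns.
frames = {
    "Item1": pd.DataFrame(np.ones((1, 3))),
    "Item2": pd.DataFrame(np.zeros((1, 3))),
}

# The dict keys become the outer level of a MultiIndex on the rows.
combined = pd.concat(frames, names=["item", "major"])

print(combined)

# Selecting one key returns the original 1x3 DataFrame.
print(combined.loc["Item1"])
```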