Introduction to Pandas


Python Library : Pandas

Pandas : An open-source data analysis library written in Python. It builds on the speed of NumPy to make data cleaning, data analysis and EDA (Exploratory Data Analysis) much easier for a data scientist.

Analogy : Just as we clean vegetables bought from the market before cooking them, pandas is used to clean raw data before analysing it.


Feature means : Column

Record means : Row


Importing pandas and numpy libraries :

import pandas as pd
import numpy as np

Importing Dataset from local directory

data = pd.read_csv("/train.csv")
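
To try pd.read_csv without the Titanic file on disk, the sketch below reads a tiny CSV from an in-memory string instead of a path. The rows are a subset of the Titanic records previewed in this article; the names csv_text and sample are illustrative assumptions, not part of the original notebook.

```python
import io

import pandas as pd

# Hypothetical stand-in for train.csv: three records, five features.
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22.0
2,1,1,female,38.0
3,1,3,female,26.0
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(type(sample))   # the result is a DataFrame
print(sample.shape)   # (rows, columns)
```

With a real file, the string buffer would simply be replaced by the path to the CSV.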

Dataset data type : DataFrame, which is a tabular (rows and columns) data structure

type(data) # data structure type, returns pandas.core.frame.DataFrame
data.head() # preview of the first five records

Output of data.head() :

   PassengerId  Survived  Pclass                                               Name     Sex   Age  SibSp  Parch            Ticket     Fare  Cabin  Embarked
0            1         0       3                            Braund, Mr. Owen Harris    male  22.0      1      0         A/5 21171   7.2500    NaN         S
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0          PC 17599  71.2833    C85         C
2            3         1       3                             Heikkinen, Miss. Laina  female  26.0      0      0  STON/O2. 3101282   7.9250    NaN         S
3            4         1       1       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000   C123         S
4            5         0       3                           Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500    NaN         S


Number of null (missing) values in each feature of the dataset

data.isnull().sum() # counts the null values in every feature (column)

Output :

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
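
The same null count can be reproduced on a small hand-made table. This is a minimal sketch, assuming a toy DataFrame (toy is a hypothetical name, and its values are made up for illustration); it also shows the share of missing values per column, which is often easier to read than raw counts.

```python
import numpy as np
import pandas as pd

# Toy data with deliberately missing entries (not the Titanic dataset).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": ["C85", None, None, None],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = toy.isnull().sum()

# mean() of the same boolean mask gives the fraction of missing values.
null_percent = toy.isnull().mean() * 100

print(null_counts)
print(null_percent)
```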


Shape of the dataset (number of rows and columns)

data.shape # returns (rows, columns)

Output :

(891, 12) # 891 records, 12 features
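
Because shape is a plain tuple, it can be unpacked into two variables. A small sketch on a toy DataFrame (toy, n_records and n_features are illustrative names only):

```python
import pandas as pd

# Toy table: 3 records, 2 features.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# shape is a (rows, columns) tuple, so it unpacks directly.
n_records, n_features = toy.shape

print(f"{n_records} records, {n_features} features")
```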

Converting a dictionary to a DataFrame

Creating Dictionary :

dic1 = {"name": ["Priya", "Nikhil", "Sanjay"], "age": [25, 25, 37], "city": ["chandigarh", "Bengaluru", "Delhi"]}

Print dic1 dictionary:

dic1

Output :

{'name': ['Priya', 'Nikhil', 'Sanjay'],
 'age': [25, 25, 37],
 'city': ['chandigarh', 'Bengaluru', 'Delhi']}

Converting dictionary to DataFrame

dataFrame = pd.DataFrame(dic1) #convert dic1 to DataFrame

Printing dataFrame

dataFrame

Output :

     name  age        city
0   Priya   25  chandigarh
1  Nikhil   25   Bengaluru
2  Sanjay   37       Delhi
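
The dictionary above maps each column name to a list of values. Data also often arrives the other way round, as one dictionary per record. The sketch below (records, frame and round_trip are hypothetical names) builds the same table from such a list and converts it back with to_dict.

```python
import pandas as pd

# One dictionary per record instead of one list per column.
records = [
    {"name": "Priya", "age": 25, "city": "chandigarh"},
    {"name": "Nikhil", "age": 25, "city": "Bengaluru"},
    {"name": "Sanjay", "age": 37, "city": "Delhi"},
]

# pd.DataFrame accepts a list of dictionaries; keys become column names.
frame = pd.DataFrame(records)

# orient="list" returns the column -> list-of-values layout used by dic1.
round_trip = frame.to_dict(orient="list")

print(frame)
print(round_trip)
```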

List of all features present in a given dataset

data.columns

Output :

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')

Data type of every column

data.dtypes # data type of each feature

Output :

PassengerId      int64
Survived         int64
Pclass           int64
Name            object
Sex             object
Age            float64
SibSp            int64
Parch            int64
Ticket          object
Fare           float64
Cabin           object
Embarked        object
dtype: object
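
Once the data types are known, columns can be split into numerical and non-numerical groups with select_dtypes. This is a minimal sketch on toy data (toy, numeric_cols and other_cols are illustrative names, not from the original notebook):

```python
import pandas as pd

# Toy table mixing text, float and integer columns.
toy = pd.DataFrame({
    "Name": ["Braund", "Cumings"],
    "Age": [22.0, 38.0],
    "Pclass": [3, 1],
})

# "number" matches every numeric dtype (int64, float64, ...).
numeric_cols = toy.select_dtypes(include="number").columns.tolist()

# Everything that is not numeric, here only the text column.
other_cols = toy.select_dtypes(exclude="number").columns.tolist()

print(numeric_cols)
print(other_cols)
```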


Summary statistics of the numerical features

data.describe() # count, mean, std, min, quartiles and max of every numerical feature

Output :

       PassengerId    Survived      Pclass         Age       SibSp       Parch        Fare
count   891.000000  891.000000  891.000000  714.000000  891.000000  891.000000  891.000000
mean    446.000000    0.383838    2.308642   29.699118    0.523008    0.381594   32.204208
std     257.353842    0.486592    0.836071   14.526497    1.102743    0.806057   49.693429
min       1.000000    0.000000    1.000000    0.420000    0.000000    0.000000    0.000000
25%     223.500000    0.000000    2.000000   20.125000    0.000000    0.000000    7.910400
50%     446.000000    0.000000    3.000000   28.000000    0.000000    0.000000   14.454200
75%     668.500000    1.000000    3.000000   38.000000    1.000000    0.000000   31.000000
max     891.000000    1.000000    3.000000   80.000000    8.000000    6.000000  512.329200

Note : data.info() is a different method; it prints the column names, non-null counts and data types rather than statistics.
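
The statistics table above is the kind of result describe() returns, while info() prints a structural summary instead. The sketch below contrasts the two on a toy DataFrame; toy, buffer and summary are hypothetical names, and the values are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

# Toy table with one numerical and one text column.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Sex": ["male", "female", "female", "female"],
})

# info() writes a text report (columns, non-null counts, dtypes).
# Passing buf captures that report instead of printing it.
buffer = io.StringIO()
toy.info(buf=buffer)

# describe() returns a DataFrame of statistics; by default it covers
# only the numerical columns and ignores missing values.
summary = toy.describe()

print(buffer.getvalue())
print(summary)
```

Note that both methods need parentheses: writing data.info or data.describe without them only refers to the method and does not run it.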


Series and DataFrame Data Structure

Data type of a single column:

type(data['Age']) # a single column is a Series, because it is one-dimensional

Output : pandas.core.series.Series

Preview of data['Age'] :

      Age
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
..    ...
886  27.0
887  19.0
888   NaN
889  26.0
890  32.0

891 rows × 1 columns


Data type of a selection of several columns:

type(data[["Name", "Age", "Ticket"]]) # a list of columns gives a DataFrame, because the result is two-dimensional

Output : pandas.core.frame.DataFrame

Preview of data[["Name", "Age", "Ticket"]] :

                                                  Name   Age            Ticket
  0                            Braund, Mr. Owen Harris  22.0         A/5 21171
  1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0          PC 17599
  2                             Heikkinen, Miss. Laina  26.0  STON/O2. 3101282
  3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0            113803
  4                           Allen, Mr. William Henry  35.0            373450
...                                                ...   ...               ...
886                              Montvila, Rev. Juozas  27.0            211536
887                       Graham, Miss. Margaret Edith  19.0            112053
888           Johnston, Miss. Catherine Helen "Carrie"   NaN        W./C. 6607
889                              Behr, Mr. Karl Howell  26.0            111369
890                                Dooley, Mr. Patrick  32.0            370376

891 rows × 3 columns
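
The number of columns is not what decides the type; the brackets are. A minimal sketch on toy data (toy, as_series and as_frame are illustrative names): single brackets return a Series, while double brackets return a DataFrame even for one column.

```python
import pandas as pd

# Toy table with two columns.
toy = pd.DataFrame({"Name": ["Braund", "Cumings"], "Age": [22.0, 38.0]})

# Single brackets with a column name: one-dimensional Series.
as_series = toy["Age"]

# Double brackets (a list of names): two-dimensional DataFrame,
# even though the list holds only one column.
as_frame = toy[["Age"]]

print(type(as_series).__name__, as_series.ndim, as_series.shape)
print(type(as_frame).__name__, as_frame.ndim, as_frame.shape)
```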


Difference between loc and iloc:

loc selects by index label, while iloc selects by integer position.

Row indexing : Preview only the records labelled 0 and 4, with all features

data.loc[[0,4]]

Output :

   PassengerId  Survived  Pclass                      Name   Sex   Age  SibSp  Parch     Ticket  Fare  Cabin  Embarked
0            1         0       3   Braund, Mr. Owen Harris  male  22.0      1      0  A/5 21171  7.25    NaN         S
4            5         0       3  Allen, Mr. William Henry  male  35.0      0      0     373450  8.05    NaN         S


Specific row and column indexing : select records and features by position range; the end of each range is excluded, so this returns rows 2 to 4 and columns 0 to 3

data.iloc[2:5, 0:4]

Output :

   PassengerId  Survived  Pclass                                          Name
2            3         1       3                        Heikkinen, Miss. Laina
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)
4            5         0       3                      Allen, Mr. William Henry
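
In the Titanic data the index labels happen to equal the row positions, which hides the difference between loc and iloc. The sketch below uses a toy DataFrame with a custom index (toy and the labels 10 to 40 are assumptions made for illustration) to show that loc works on labels with an inclusive end, while iloc works on positions with an exclusive end.

```python
import pandas as pd

# Toy table whose index labels differ from the row positions.
toy = pd.DataFrame(
    {
        "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle"],
        "Age": [22.0, 38.0, 26.0, 35.0],
    },
    index=[10, 20, 30, 40],
)

# loc slices by label and includes the end label: 10, 20 and 30.
by_label = toy.loc[10:30]

# iloc slices by position and excludes the end: positions 0 and 1.
by_position = toy.iloc[0:2]

# Picking the first and last record both ways gives the same rows.
first_last_label = toy.loc[[10, 40]]
first_last_position = toy.iloc[[0, 3]]

print(by_label)
print(by_position)
```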