Learn Data Frames in Pandas With Example
A DataFrame in Pandas is a two-dimensional labelled data structure with columns that can be of different data types. It is similar to a table in a relational database or a spreadsheet in which data is organized in rows and columns.

Advantages of Using Data Frames
Data frames are a tabular data structure used in various programming languages for handling and analyzing structured data. Here are some advantages of using data frames:
- Structured Representation: Data frames provide a structured way to represent and organize data in a tabular format. This makes it easy to understand and work with data, especially when dealing with multiple variables or attributes.
- Columnar Operations: Data frames allow for efficient column-wise operations, making it easy to perform computations or manipulations on specific variables without having to iterate through each element individually. This can significantly improve performance and readability of code.
- Integration with Statistical Packages: Data frames are commonly used in statistical programming languages like R and Python (with libraries like Pandas). They seamlessly integrate with statistical packages and libraries, making it easy to perform data analysis, visualization, and modeling.
- Data Cleaning and Transformation: Data frames provide built-in functions and methods for cleaning and transforming data. This includes handling missing values, filtering rows based on conditions, and transforming data types, making the data preparation process more straightforward.
- Indexing and Subsetting: Data frames support indexing and subsetting operations, allowing users to access and manipulate specific portions of the data easily. This is crucial for selecting relevant subsets for analysis or visualization.
- Data Alignment: In data frames, columns are aligned based on their index, which ensures that operations are performed on corresponding elements. This alignment simplifies code and prevents common errors that might arise when working with mismatched data.
- Ease of Import and Export: Data frames facilitate the import and export of data from various file formats, including CSV, Excel, SQL databases, and more. This makes it easy to work with data from different sources and systems.
- Support for Time Series Data: Data frames often include features for handling time series data, such as date-time indexing and functions for time-based operations. This is crucial for analyzing data with temporal components.
- Wide Range of Functions: Data frames come with a rich set of functions for data manipulation, exploration, and analysis. This includes aggregation functions, statistical summaries, and descriptive statistics, which are essential for understanding the characteristics of the data.
- Interoperability: Data frames provide a common data structure that is widely supported across various libraries and tools in the data science ecosystem. This interoperability makes it easy to transition between different stages of the data analysis pipeline and collaborate with others.
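Two of the points above, column-wise operations and data alignment, can be sketched in a few lines. The data and the names prices, a, b, and aligned_sum are made up for illustration and are not part of the original examples.

```python
import pandas as pd

# Hypothetical sample data, used only for this illustration
prices = pd.DataFrame({
    "item": ["pen", "book", "bag"],
    "price": [2.0, 10.0, 25.0],
    "qty": [10, 3, 1],
})

# Column-wise operation: the multiplication applies to whole columns,
# with no explicit loop over rows
prices["total"] = prices["price"] * prices["qty"]

# Alignment: values are matched by index label, not by position
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])
aligned_sum = a + b  # x: 1 + 30, y: 2 + 20, z: 3 + 10

print(prices)
print(aligned_sum)
```

Because the addition pairs values by label, the reversed order of b does not affect the result.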
In summary, data frames offer a versatile and efficient way to handle and analyze structured data, providing numerous advantages for data scientists, analysts, and programmers working on data-related tasks.
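As a rough sketch of the cleaning, import/export, and time series points, the snippet below reads a small CSV from an in-memory string (standing in for a file on disk), fills a missing value, converts a type, and selects rows by date. The column names and values are assumptions chosen for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = (
    "name,age,joined\n"
    "Alice,25,2021-01-15\n"
    "Bob,,2022-03-01\n"
    "Charlie,35,2020-07-30\n"
)

# Import: parse the 'joined' column as dates while reading
people = pd.read_csv(io.StringIO(csv_text), parse_dates=["joined"])

# Cleaning: fill the missing age with the column median, then convert to int
people["age"] = people["age"].fillna(people["age"].median())
people["age"] = people["age"].astype(int)

# Time series: a sorted date-time index allows selection by date string
by_date = people.set_index("joined").sort_index()
since_2021 = by_date.loc["2021":]

# Export: write the cleaned data back out as CSV text
out = people.to_csv(index=False)

print(since_2021)
print(out)
```

With a real file, a path would replace io.StringIO(csv_text) in read_csv, and to_csv would take a path as its first argument.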
Different Methods To Create Data Frame in Python Pandas
Creating a DataFrame in pandas can be done in various ways, but one of the most common methods is using the pd.DataFrame() constructor. Here are a few examples:
Example 1: Creating a DataFrame from a Dictionary
Here’s an example of creating and working with a DataFrame in Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
# Displaying the DataFrame
print("Original DataFrame:")
print(df)
Output:
Original DataFrame:
Name Age City
0 Alice 25 New York
1 Bob 30 San Francisco
2 Charlie 35 Los Angeles
This will create a DataFrame with three columns: ‘Name’, ‘Age’, and ‘City’.
Creating and Displaying a Data Frame in Pandas
In this example:
- Importing Pandas: The import pandas as pd statement imports the Pandas library and aliases it as pd for convenience.
- Creating a DataFrame: The pd.DataFrame() constructor is used to create a DataFrame from a dictionary (data). Keys of the dictionary become column names, and values become the data in each column.
- Displaying the DataFrame: The print(df) statement prints the DataFrame to the console.
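To check how the dictionary was turned into a table, it can help to inspect the result. The following is a minimal sketch that rebuilds the same DataFrame and prints its shape, column labels, and column data types.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels, taken from the dictionary keys
print(df.dtypes)         # one data type per column
```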
Example 2: Creating a DataFrame from a List of Lists
import pandas as pd
# Creating a DataFrame from a list of lists
data_list = [
['Alice', 25, 'New York'],
['Bob', 30, 'San Francisco'],
['Charlie', 35, 'Los Angeles']
]
df = pd.DataFrame(data_list, columns=['Name', 'Age', 'City'])
print(df)
Here, we explicitly specify the column names using the columns parameter.
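For comparison, the sketch below shows what happens when columns is left out, and how the index parameter labels rows in the same way. The row labels 'a' and 'b' are hypothetical and chosen only for illustration.

```python
import pandas as pd

data_list = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'San Francisco'],
]

# Without columns=, pandas assigns the integer column labels 0, 1, 2
unnamed = pd.DataFrame(data_list)

# index= sets row labels, just as columns= sets column labels
labelled = pd.DataFrame(
    data_list,
    columns=['Name', 'Age', 'City'],
    index=['a', 'b'],  # hypothetical row labels
)

print(list(unnamed.columns))
print(labelled.loc['b', 'City'])
```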
Example 3: Creating an Empty DataFrame and Adding Rows
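import pandas as pd
# Creating an empty DataFrame
df = pd.DataFrame(columns=['Name', 'Age', 'City'])
# Adding rows by assigning to the next integer label
# (DataFrame.append was removed in pandas 2.0)
df.loc[len(df)] = ['Alice', 25, 'New York']
df.loc[len(df)] = ['Bob', 30, 'San Francisco']
df.loc[len(df)] = ['Charlie', 35, 'Los Angeles']
print(df)
Here, we create an empty DataFrame and add rows one at a time by assigning to df.loc[len(df)]. Older tutorials use df.append for this, but that method was deprecated in pandas 1.4 and removed in pandas 2.0. Note that the columns of a DataFrame created empty start with the object dtype, so numeric columns such as Age may need converting (for example with astype) before numerical analysis.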
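When rows arrive one at a time, a common alternative is to collect them in a plain Python list and construct the DataFrame once at the end. This avoids growing a DataFrame row by row and lets pandas infer column types from the data. The sketch below assumes the same three records; the names rows and df_built are illustrative.

```python
import pandas as pd

records = [
    ('Alice', 25, 'New York'),
    ('Bob', 30, 'San Francisco'),
    ('Charlie', 35, 'Los Angeles'),
]

# Collect each row as a dictionary in an ordinary list
rows = []
for name, age, city in records:
    rows.append({'Name': name, 'Age': age, 'City': city})

# Build the DataFrame in one step from the collected rows
df_built = pd.DataFrame(rows)

print(df_built)
print(df_built.dtypes)
```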
Example Python Code for Common Operations on Data Frame
# The operations below use the DataFrame df created in Example 1
# Accessing columns
print("\nAccessing 'Name' column:")
print(df['Name'])
# Accessing rows
print("\nAccessing the second row:")
print(df.iloc[1])
# Descriptive statistics
print("\nDescriptive statistics:")
print(df.describe())
# Adding a new column
df['Salary'] = [50000, 60000, 75000]
print("\nDataFrame with 'Salary' column added:")
print(df)
# Filtering data
print("\nFiltering based on age:")
print(df[df['Age'] > 30])
Output:
Accessing 'Name' column:
0 Alice
1 Bob
2 Charlie
Name: Name, dtype: object
Accessing the second row:
Name Bob
Age 30
City San Francisco
Name: 1, dtype: object
Descriptive statistics:
Age
count 3.0
mean 30.0
std 5.0
min 25.0
25% 27.5
50% 30.0
75% 32.5
max 35.0
DataFrame with 'Salary' column added:
Name Age City Salary
0 Alice 25 New York 50000
1 Bob 30 San Francisco 60000
2 Charlie 35 Los Angeles 75000
Filtering based on age:
Name Age City Salary
2 Charlie 35 Los Angeles 75000
These examples demonstrate basic DataFrame operations, including accessing columns and rows, descriptive statistics, adding a new column, and filtering data. Pandas provides a rich set of functionality for data manipulation, analysis, and visualization, making it a powerful tool for working with structured data.
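A few closely related operations are sketched below: label-based selection with loc, filtering on more than one condition, and sorting. The block rebuilds the Example 1 data with the Salary column so it runs on its own; the names subset, mask, filtered, and by_salary are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
    'Salary': [50000, 60000, 75000],
})

# loc selects by label: rows 0 through 1 (end label included), two columns
subset = df.loc[0:1, ['Name', 'Salary']]

# Combining conditions: parenthesize each one and join with & (and) or | (or)
mask = (df['Age'] >= 30) & (df['Salary'] < 70000)
filtered = df[mask]

# Sorting rows by a column, highest salary first
by_salary = df.sort_values('Salary', ascending=False)

print(subset)
print(filtered)
print(by_salary)
```

Note that iloc slices by position and excludes the end position, while loc slices by label and includes the end label.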
Example 2 for Using Data Frames in Pandas
Let’s create another example with a different DataFrame. In this example, I’ll use data related to employee information:
import pandas as pd
# Creating a DataFrame from a list of dictionaries
employee_data = [
{'Name': 'Alice', 'Age': 28, 'Department': 'HR', 'Salary': 60000},
{'Name': 'Bob', 'Age': 35, 'Department': 'Engineering', 'Salary': 75000},
{'Name': 'Charlie', 'Age': 30, 'Department': 'Marketing', 'Salary': 70000},
{'Name': 'David', 'Age': 25, 'Department': 'HR', 'Salary': 55000},
]
employee_df = pd.DataFrame(employee_data)
# Displaying the DataFrame
print("Employee DataFrame:")
print(employee_df)
Output:
Employee DataFrame:
Name Age Department Salary
0 Alice 28 HR 60000
1 Bob 35 Engineering 75000
2 Charlie 30 Marketing 70000
3 David 25 HR 55000
Now, let’s perform some operations on this employee DataFrame:
# Accessing columns
print("\nAccessing 'Name' and 'Department' columns:")
print(employee_df[['Name', 'Department']])
# Descriptive statistics
print("\nDescriptive statistics:")
print(employee_df.describe())
# Adding a new column
employee_df['Bonus'] = [5000, 7500, 7000, 4500]
print("\nEmployee DataFrame with 'Bonus' column added:")
print(employee_df)
# Filtering data
print("\nFiltering based on department (HR):")
print(employee_df[employee_df['Department'] == 'HR'])
Output:
Accessing 'Name' and 'Department' columns:
Name Department
0 Alice HR
1 Bob Engineering
2 Charlie Marketing
3 David HR
Descriptive statistics:
Age Salary
count 4.000000 4.000000
mean 29.500000 65000.000000
std 4.203173 9128.709292
min 25.000000 55000.000000
25% 27.250000 58750.000000
50% 29.000000 65000.000000
75% 31.250000 71250.000000
max 35.000000 75000.000000
Employee DataFrame with 'Bonus' column added:
Name Age Department Salary Bonus
0 Alice 28 HR 60000 5000
1 Bob 35 Engineering 75000 7500
2 Charlie 30 Marketing 70000 7000
3 David 25 HR 55000 4500
Filtering based on department (HR):
Name Age Department Salary Bonus
0 Alice 28 HR 60000 5000
3 David 25 HR 55000 4500
This example demonstrates basic operations on a different DataFrame, including accessing specific columns, descriptive statistics, adding a new column, and filtering data based on a condition.
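The same employee data also lends itself to aggregation by group. The following is a minimal sketch using groupby to compute per-department figures; the result names avg_salary and summary, and the output column names headcount and avg_salary, are chosen here for illustration.

```python
import pandas as pd

employee_df = pd.DataFrame([
    {'Name': 'Alice', 'Age': 28, 'Department': 'HR', 'Salary': 60000},
    {'Name': 'Bob', 'Age': 35, 'Department': 'Engineering', 'Salary': 75000},
    {'Name': 'Charlie', 'Age': 30, 'Department': 'Marketing', 'Salary': 70000},
    {'Name': 'David', 'Age': 25, 'Department': 'HR', 'Salary': 55000},
])

# Mean salary per department
avg_salary = employee_df.groupby('Department')['Salary'].mean()

# Several aggregations at once, with named output columns
summary = employee_df.groupby('Department').agg(
    headcount=('Name', 'count'),
    avg_salary=('Salary', 'mean'),
)

print(avg_salary)
print(summary)
```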
Thanks!