Basic Data Structures in Pandas
Pandas provides two primary data structures for handling and manipulating data: Series and DataFrame.
1. Series:
- A Series is a one-dimensional labeled array capable of holding any data type.
- Comparable to a single column in an Excel spreadsheet or a database table.
- It has an associated array of data labels, called an index.
- Can be created from a list, array, or dictionary.
Example:
import numpy as np  # needed for np.nan
import pandas as pd
data = [1, 3, 5, np.nan, 6, 8]  # np.nan marks a missing value
s = pd.Series(data, name='MySeries')
print(s)
Output:
0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
Name: MySeries, dtype: float64
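The example above builds a Series from a list. As a minimal sketch of the other two sources mentioned (a dictionary and a NumPy array), the following may help; the variable names and values are illustrative assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

# From a dictionary: the keys become the index labels.
prices = pd.Series({"apple": 1.2, "banana": 0.5, "cherry": 3.0}, name="price")

# From a NumPy array: a default integer index (0, 1, 2, ...) is generated.
squares = pd.Series(np.array([1, 4, 9]), name="squares")

print(prices["banana"])      # label-based lookup, prints 0.5
print(list(squares.index))   # prints [0, 1, 2]
```

With a dictionary, the labels come from the data itself, so no separate index argument is needed.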
2. DataFrame:
- A DataFrame is a two-dimensional labeled data structure with columns that can be of different data types.
- Similar to a spreadsheet or SQL table, where each column is a Series.
- Can be created from dictionaries, lists, NumPy arrays, or other DataFrames.
Example:
import pandas as pd

# Each dictionary key becomes a column; each list holds that column's values.
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'San Francisco', 'Los Angeles']
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
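A DataFrame can also be built from other inputs, and each of its columns can be pulled out as a Series. The sketch below shows one way this might look, using a list of dictionaries and a 2-D NumPy array; the names people, grid, and ages are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# From a list of dictionaries: each dictionary becomes one row.
rows = [
    {"Name": "Alice", "Age": 25},
    {"Name": "Bob", "Age": 30},
]
people = pd.DataFrame(rows)

# From a 2-D NumPy array: column names are supplied explicitly.
grid = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# Selecting a single column returns a Series.
ages = people["Age"]
print(type(ages).__name__)   # prints Series
print(grid.shape)            # prints (3, 2)
```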
Index:
- An immutable array that labels the axes of a DataFrame or Series.
- By default, both Series and DataFrame get an integer index that runs from 0 to n-1, where n is the number of rows.
- Can be explicitly defined or automatically generated.
Example:
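import numpy as np  # needed for np.nan
import pandas as pd

data = [1, 3, 5, np.nan, 6, 8]
# Pass explicit labels instead of the default 0..n-1 integers.
s = pd.Series(data, index=['a', 'b', 'c', 'd', 'e', 'f'])
print(s)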
Output:
a    1.0
b    3.0
c    5.0
d    NaN
e    6.0
f    8.0
dtype: float64
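To illustrate what "immutable" means for an index, here is a small sketch with made-up values. Changing a single label in place is rejected, while assigning a whole new set of labels is allowed. The mutated flag is a helper introduced only for this example.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Labels can be used for lookup.
print(s["b"])   # prints 20

# Individual index entries cannot be changed in place.
try:
    s.index[0] = "z"
    mutated = True
except TypeError:
    mutated = False   # pandas raises TypeError for in-place index edits

# Replacing the whole index with a new one is allowed.
s.index = ["x", "y", "z"]
print(list(s.index))   # prints ['x', 'y', 'z']
```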
These data structures form the foundation of Pandas, enabling efficient and flexible manipulation and analysis of structured data. Series and DataFrame provide a high-level interface for handling various data types and performing operations like indexing, selection, grouping, merging, and more.
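As a rough sketch of the operations named above (selection, grouping, and merging), the following uses a small table similar to the earlier DataFrame example. The data is illustrative: Charlie's city is changed so that one group has two rows, and the states table is a hypothetical second table added for the merge.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "San Francisco", "New York"],
})

# Selection: keep only the rows where a condition holds.
over_28 = people[people["Age"] > 28]

# Grouping: mean age per city.
mean_age = people.groupby("City")["Age"].mean()

# Merging: join with a second (hypothetical) table on a shared column.
states = pd.DataFrame({
    "City": ["New York", "San Francisco"],
    "State": ["NY", "CA"],
})
merged = people.merge(states, on="City", how="left")

print(over_28["Name"].tolist())   # prints ['Bob', 'Charlie']
print(mean_age["New York"])       # prints 30.0
print(merged["State"].tolist())   # prints ['NY', 'CA', 'NY']
```

A left merge is used here so that every row of people is kept, even if a city had no match in states.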