Anand

Posted on Jun 25

Step-by-Step with Pandas: Basic Operations to Intermediate Mastery 🐍🐼

#datascience #python #machinelearning #pandas

Pandas is a powerful and flexible data manipulation library for Python. It provides data structures like Series (one-dimensional) and DataFrame (two-dimensional) for working with structured data efficiently. Here, I'll cover some basic and intermediate concepts in Pandas.

Basic Concepts

Series:
- A one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index.

   import pandas as pd
   s = pd.Series([1, 3, 5, 6, 8])
   print(s)

DataFrame:
- A two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).

   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6],
       'C': [7, 8, 9]
   })
   print(df)

Reading and Writing Data:
- Reading data from CSV:
```
 df = pd.read_csv('data.csv')
```

- Writing data to CSV:
```
 df.to_csv('output.csv', index=False)
```
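To see both calls working together, here is a minimal round-trip sketch. It uses an in-memory buffer instead of the `data.csv` and `output.csv` files above, so it runs without touching disk; the small DataFrame is made up for illustration.

```python
import io

import pandas as pd

# Illustrative data, not from a real file
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Write CSV text to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(round_trip)
```

Passing `index=False` keeps the row index out of the output. Without it, reading the file back usually produces an extra unnamed column.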

Indexing and Selection:
- Selecting a column:
```
 df['A']
```

- Selecting multiple columns:
```
 df[['A', 'B']]
```

Selecting rows by index:

 df.iloc[0]  # First row, selected by integer position
 df.loc[0]  # Row whose index label is 0
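With the default index, `iloc[0]` and `loc[0]` return the same row, which hides the difference. The sketch below uses a made-up, non-default index so the two selections diverge.

```python
import pandas as pd

# Illustrative index labels, deliberately out of order
df = pd.DataFrame({'A': [1, 2, 3]}, index=[10, 20, 0])

first_row = df.iloc[0]   # position 0 -> the row labeled 10
label_zero = df.loc[0]   # label 0 -> the last row in this frame

print(first_row['A'], label_zero['A'])
```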

Data Cleaning:

Handling missing values:

 df.dropna()  # Returns a new DataFrame without rows that contain missing values
 df.fillna(0)  # Returns a new DataFrame with missing values replaced by 0
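Both methods return a new DataFrame and leave the original untouched, which is easy to miss. A small sketch with invented values shows this:

```python
import numpy as np
import pandas as pd

# Illustrative data with one missing value per column
df = pd.DataFrame({'A': [1.0, np.nan, 3.0], 'B': [4.0, 5.0, np.nan]})

dropped = df.dropna()   # keeps only rows with no missing values
filled = df.fillna(0)   # copy with every NaN replaced by 0

# The original still reports its missing values
print(df.isna().sum())
```

To keep the result, assign it back, for example `df = df.dropna()`.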

Intermediate Concepts

GroupBy:
- Grouping data and performing aggregate functions.
```
 grouped = df.groupby('A')
 grouped.mean()
```
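Here is a small self-contained sketch of the same idea. The `sales` frame and its `region` and `amount` columns are hypothetical names chosen for illustration.

```python
import pandas as pd

# Hypothetical sales data
sales = pd.DataFrame({
    'region': ['north', 'north', 'south', 'south'],
    'amount': [10, 20, 30, 50],
})

# One aggregate per group
mean_by_region = sales.groupby('region')['amount'].mean()

# Several aggregates at once
summary = sales.groupby('region')['amount'].agg(['sum', 'count'])

print(mean_by_region)
print(summary)
```

Selecting the column before aggregating (`['amount']`) avoids trying to average non-numeric columns.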

Merging and Joining:

Combining DataFrames using merge and join operations.

 df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
 df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
 # Both frames have a 'value' column, so the result has value_x and value_y
 merged = pd.merge(df1, df2, on='key', how='inner')
 print(merged)
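The `how` argument controls which keys survive the merge. This sketch reuses the two frames above and compares an inner and an outer join; the `suffixes` values are my own choice for readability.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# Inner join: only keys present in both frames (B and C)
inner = pd.merge(df1, df2, on='key', how='inner',
                 suffixes=('_left', '_right'))

# Outer join: every key from either frame, with NaN where a side is missing
outer = pd.merge(df1, df2, on='key', how='outer',
                 suffixes=('_left', '_right'))

print(inner)
print(outer)
```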

Pivot Tables:

Creating pivot tables to summarize data.

 # Assumes df has 'key', 'category', and 'value' columns
 df.pivot_table(values='value', index='key', columns='category', aggfunc='sum')
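The call above needs a DataFrame with `key`, `category`, and `value` columns. A minimal sketch with made-up data shows what it produces:

```python
import pandas as pd

# Illustrative data; key 'B' has two rows in category 'x'
df = pd.DataFrame({
    'key': ['A', 'A', 'B', 'B', 'B'],
    'category': ['x', 'y', 'x', 'x', 'y'],
    'value': [1, 2, 3, 4, 5],
})

# Rows become keys, columns become categories, cells hold summed values
table = df.pivot_table(values='value', index='key',
                       columns='category', aggfunc='sum')

print(table)
```

The two `B`/`x` rows are combined into a single cell by `aggfunc='sum'`.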

Applying Functions:

Applying custom functions to DataFrames.

 df['new_column'] = df['A'].apply(lambda x: x * 2)

Reshaping Data:

Melting and pivoting DataFrames to reshape data.

 melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
 pivoted = melted.pivot(index='A', columns='variable', values='value')
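As a rough check that melting and pivoting are inverses here, the sketch below runs both on the three-column DataFrame from the Basic Concepts section and restores the original layout. The `restored` step is my addition.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Wide -> long: one row per (A, variable) pair
melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])

# Long -> wide: 'variable' values become columns again
pivoted = melted.pivot(index='A', columns='variable', values='value')

# Move 'A' back to a regular column and drop the leftover columns name
restored = pivoted.reset_index().rename_axis(columns=None)

print(melted)
print(restored)
```

Note that `pivot` requires each index/column pair to be unique. It works here because the values in `A` do not repeat.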

Time Series:

Handling and manipulating time series data.

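 df['date'] = pd.to_datetime(df['date'])  # Parse strings into datetimes
 df.set_index('date', inplace=True)  # Resampling needs a datetime index
 # Monthly mean; pandas 2.2+ prefers the alias 'ME' for month end
 df.resample('M').mean()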
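Here is a runnable sketch of the same steps on invented data. It uses the `'MS'` (month start) alias rather than a month-end alias, purely so the example behaves the same across pandas versions.

```python
import pandas as pd

# Illustrative readings, two per month
df = pd.DataFrame({
    'date': ['2024-01-10', '2024-01-20', '2024-02-05', '2024-02-25'],
    'value': [10.0, 20.0, 30.0, 50.0],
})

df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')

# Group rows into calendar months, labeled by the first day of each month
monthly = df['value'].resample('MS').mean()

print(monthly)
```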

Handling Duplicate Data:
- Removing or handling duplicate rows in DataFrames.
```
 df.drop_duplicates()
```
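By default, `drop_duplicates` compares entire rows and keeps the first occurrence. The sketch below, on made-up data, also shows `duplicated` for inspecting matches and the `subset` and `keep` options for narrowing the comparison.

```python
import pandas as pd

# Illustrative data: the first two rows are identical
df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': ['x', 'x', 'y', 'z']})

# True for each row that repeats an earlier row
mask = df.duplicated()

# Drop fully identical rows, keeping the first occurrence
unique_rows = df.drop_duplicates()

# Compare only column 'A' and keep the last occurrence of each value
unique_a = df.drop_duplicates(subset=['A'], keep='last')

print(unique_rows)
print(unique_a)
```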

Advanced Indexing:

Using hierarchical indexing for multi-level data.

 import numpy as np

 arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
           np.array(['one', 'two', 'one', 'two'])]
 df = pd.DataFrame(np.random.randn(4, 2), index=arrays, columns=['A', 'B'])
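Once the hierarchical index exists, the useful part is selecting from it. This sketch builds the same index but swaps the random values for `np.arange` so the results are predictable; that substitution is mine.

```python
import numpy as np
import pandas as pd

arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
          np.array(['one', 'two', 'one', 'two'])]

# Deterministic values instead of random ones, for illustration
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=arrays,
                  columns=['A', 'B'])

# All rows under the outer label 'bar'
outer = df.loc['bar']

# A single cell, addressed by both index levels and a column
cell = df.loc[('baz', 'two'), 'B']

# Every row whose inner level is 'one'
cross = df.xs('one', level=1)

print(outer)
print(cell)
print(cross)
```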

Performance Optimization:
- Using techniques like vectorization, avoiding loops, and using efficient data structures to improve performance.
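A rough sketch of two of these techniques follows: replacing an element-wise `apply` with a vectorized expression, and storing repeated strings as a categorical. The data is invented, and actual speed and memory gains depend on your data and pandas version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': np.arange(100_000)})

# Element-wise: calls a Python function once per value
looped = df['A'].apply(lambda x: x * 2)

# Vectorized: one operation over the whole column
vectorized = df['A'] * 2

# Repeated strings stored once each, with small integer codes per row
cities = pd.Series(['London', 'Paris', 'Tokyo'] * 10_000)
as_category = cities.astype('category')

print(cities.memory_usage(deep=True))
print(as_category.memory_usage(deep=True))
```

Both versions of the doubling produce the same values. The vectorized form is typically much faster because the loop runs in compiled code rather than in Python.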

Conclusion

Pandas is a core tool for data analysis and manipulation in Python. Series and DataFrames, indexing, and data cleaning form the foundation. Intermediate techniques such as GroupBy operations, merging DataFrames, pivot tables, and time series analysis let you handle more complex data tasks efficiently. Together, these skills streamline your workflow and help you turn raw data into actionable insights.

