The syntax df['column'] = expression in pandas creates a column in a DataFrame (df) or overwrites an existing one. Let’s break it down step by step, from the basic to the expert level.
Basic Level
1. Creating a New Column
- When a column does not exist in the DataFrame, assigning values to df['column'] creates a new column.
- Example:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3]})
    print(df)
    # Output:
    #    A
    # 0  1
    # 1  2
    # 2  3

    # Create a new column 'B' with all values set to 0
    df['B'] = 0
    print(df)
    # Output:
    #    A  B
    # 0  1  0
    # 1  2  0
    # 2  3  0
2. Modifying an Existing Column
- If the column already exists, assigning values replaces its contents.
- Example:

    df['B'] = [4, 5, 6]  # Replace values in column 'B'
    print(df)
    # Output:
    #    A  B
    # 0  1  4
    # 1  2  5
    # 2  3  6
Intermediate Level
3. Assigning Values Based on an Expression
- You can assign values to a column based on a computation or transformation.
- Example:

    df['C'] = df['A'] + df['B']  # Create column 'C' as the sum of 'A' and 'B'
    print(df)
    # Output:
    #    A  B  C
    # 0  1  4  5
    # 1  2  5  7
    # 2  3  6  9
4. Assigning Values Using a Condition
- You can assign values to a column conditionally, for example by applying a function to each value with Series.apply.
- Example:

    # Label each value of 'A' as 'Even' or 'Odd'
    df['D'] = df['A'].apply(lambda x: 'Even' if x % 2 == 0 else 'Odd')
    print(df)
    # Output:
    #    A  B  C     D
    # 0  1  4  5   Odd
    # 1  2  5  7  Even
    # 2  3  6  9   Odd
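The same labels can also be produced with a boolean mask and .loc, which avoids calling a Python function once per value. This is a minimal sketch on a stand-alone frame; the names demo, parity, and is_even are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-alone frame mirroring column 'A' from the tutorial
demo = pd.DataFrame({'A': [1, 2, 3]})

# Start with a default label for every row
demo['parity'] = 'Odd'

# Boolean Series that is True where 'A' is even
is_even = demo['A'] % 2 == 0

# Overwrite the label only in rows selected by the mask
demo.loc[is_even, 'parity'] = 'Even'

print(demo)
```

Using .loc with the mask and the column name in a single call writes to the original frame directly, rather than to a temporary copy.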
5. Using Multiple Columns in Expressions
- You can use multiple columns in a single expression for more complex computations.
- Example:

    df['E'] = (df['A'] + df['B']) * df['C']
    print(df)
    # Output:
    #    A  B  C     D   E
    # 0  1  4  5   Odd  25
    # 1  2  5  7  Even  49
    # 2  3  6  9   Odd  81
Advanced Level
6. Vectorized Operations
- Assigning values to a column can use vectorized operations for high performance.
- Example:

    df['F'] = df['A'] ** 2 + df['B'] ** 2  # Fast vectorized calculation
    print(df)
    # Output:
    #    A  B  C     D   E   F
    # 0  1  4  5   Odd  25  17
    # 1  2  5  7  Even  49  29
    # 2  3  6  9   Odd  81  45
7. Assigning Values with Conditional Logic Using np.where
- You can use numpy for conditional assignment.
- Example:

    import numpy as np

    # 'High' where 'A' is greater than 2, otherwise 'Low'
    df['G'] = np.where(df['A'] > 2, 'High', 'Low')
    print(df)
    # Output:
    #    A  B  C     D   E   F     G
    # 0  1  4  5   Odd  25  17   Low
    # 1  2  5  7  Even  49  29   Low
    # 2  3  6  9   Odd  81  45  High
8. Assigning Values Using External Functions
- Assign column values based on a custom function applied to rows or columns.
- Example:

    def custom_function(row):
        return row['A'] * row['B']

    # axis=1 passes each row to the function
    df['H'] = df.apply(custom_function, axis=1)
    print(df)
    # Output:
    #    A  B  C     D   E   F     G   H
    # 0  1  4  5   Odd  25  17   Low   4
    # 1  2  5  7  Even  49  29   Low  10
    # 2  3  6  9   Odd  81  45  High  18
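The example above applies a function row by row. For the column-wise case, here is a small sketch in which a function receives a whole column (a Series) and returns a transformed column. The helper min_max_scale, the frame demo, and the *_scaled column names are hypothetical and only serve as illustration.

```python
import pandas as pd

# Hypothetical stand-alone frame with the same 'A' and 'B' values as the tutorial
demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

def min_max_scale(col):
    """Rescale a numeric column to the range [0, 1]."""
    return (col - col.min()) / (col.max() - col.min())

# With the default axis=0, apply passes each column to the function
scaled = demo[['A', 'B']].apply(min_max_scale)

# Assign the transformed columns back under new names
demo['A_scaled'] = scaled['A']
demo['B_scaled'] = scaled['B']

print(demo)
```

Because the function operates on an entire column at once, it runs once per column instead of once per row.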
9. Chaining Operations
- You can chain multiple operations to keep your code concise.
- Example:

    df['I'] = df['A'].add(df['B']).mul(df['C'])
    print(df)
    # Output:
    #    A  B  C     D   E   F     G   H   I
    # 0  1  4  5   Odd  25  17   Low   4  25
    # 1  2  5  7  Even  49  29   Low  10  49
    # 2  3  6  9   Odd  81  45  High  18  81
10. Assigning Multiple Columns at Once
- Use assign() to create or modify multiple columns in a single call.
- Example:

    df = df.assign(
        J=df['A'] + df['B'],
        K=lambda x: x['J'] * 2  # the lambda sees the frame with 'J' already added
    )
    print(df)
    # Output:
    #    A  B  C     D   E   F     G   H   I  J   K
    # 0  1  4  5   Odd  25  17   Low   4  25  5  10
    # 1  2  5  7  Even  49  29   Low  10  49  7  14
    # 2  3  6  9   Odd  81  45  High  18  81  9  18
Expert Level
11. Dynamic Column Assignment
- Dynamically create column names based on external inputs.
- Example:

    columns_to_add = ['L', 'M']
    for col in columns_to_add:
        df[col] = df['A'] + df['B']
    print(df)
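To make the "external inputs" part more concrete, the sketch below builds column names from a dictionary that stands in for configuration or user input. The dictionary multipliers, the frame demo, and the A_<suffix> naming scheme are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical stand-alone frame
demo = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Hypothetical external input: name suffix -> multiplier
multipliers = {'x2': 2, 'x10': 10}

# Build each column name with an f-string and assign the computed values
for suffix, factor in multipliers.items():
    demo[f'A_{suffix}'] = demo['A'] * factor

print(demo.columns.tolist())
```

Since the key inside the brackets is an ordinary Python string, any expression that yields a string can name the new column.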
12. Using External Data for Assignment
- Assign values to a column based on an external DataFrame or a dictionary.
- Example:

    mapping = {1: 'Low', 2: 'Medium', 3: 'High'}
    df['N'] = df['A'].map(mapping)  # look up each value of 'A' in the dictionary
    print(df)
    # Output (includes columns 'L' and 'M' added in the previous step):
    #    A  B  C     D   E   F     G   H   I  J   K  L  M       N
    # 0  1  4  5   Odd  25  17   Low   4  25  5  10  5  5     Low
    # 1  2  5  7  Even  49  29   Low  10  49  7  14  7  7  Medium
    # 2  3  6  9   Odd  81  45  High  18  81  9  18  9  9    High
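The dictionary case is shown above. For the external DataFrame case, one possible approach is to turn the lookup table into a Series indexed by the key and pass it to map. This is a sketch only; the frame lookup and its columns key and label are hypothetical names.

```python
import pandas as pd

# Hypothetical stand-alone frame
demo = pd.DataFrame({'A': [1, 2, 3]})

# Hypothetical external lookup table; row order does not matter
lookup = pd.DataFrame({
    'key': [3, 1, 2],
    'label': ['High', 'Low', 'Medium'],
})

# Index the labels by key, then map each value of 'A' to its label
label_by_key = lookup.set_index('key')['label']
demo['N'] = demo['A'].map(label_by_key)

print(demo)
```

Values of 'A' that have no matching key in the lookup table come back as NaN, so it can be worth checking for missing values after the assignment.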
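13. Performance Optimization
- Prefer vectorized operations (column arithmetic, np.where, map) over explicit Python loops when assigning values. Note that apply with a Python function still calls that function once per row or element, so it is usually much slower than a vectorized expression, even though it is more convenient than a hand-written loop.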
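A rough way to check this on your own machine is to time the alternatives on the same data. The sketch below compares a Python-level loop, a row-wise apply, and a vectorized expression. The frame big, the row count, and the column names are arbitrary choices, and timings vary by machine and pandas version, so no numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical test frame; the size is arbitrary
n_rows = 20_000
big = pd.DataFrame({'A': np.arange(n_rows), 'B': np.arange(n_rows)})

# 1) Python-level loop over the values
start = time.perf_counter()
big['loop_sum'] = [a + b for a, b in zip(big['A'], big['B'])]
loop_seconds = time.perf_counter() - start

# 2) Row-wise apply: calls the lambda once per row
start = time.perf_counter()
big['apply_sum'] = big.apply(lambda row: row['A'] + row['B'], axis=1)
apply_seconds = time.perf_counter() - start

# 3) Vectorized column arithmetic
start = time.perf_counter()
big['vec_sum'] = big['A'] + big['B']
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s  apply: {apply_seconds:.4f}s  vectorized: {vec_seconds:.4f}s")
```

All three approaches produce the same column, so the only difference to look at is the elapsed time.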
Takeaway
The syntax df['column'] = expression is a core feature of pandas and is highly versatile. It allows you to:
- Add, modify, and manipulate columns in a DataFrame.
- Perform complex computations, including condition-based logic and multi-column transformations.
- Chain operations and dynamically generate new columns.
This makes pandas a powerful library for data manipulation and analysis.