Transform: Clean and process the data (e.g., filter rows, modify columns).
Load: Insert the transformed data into a SQLite database.
Code Example
# Step 1: Import necessary libraries
importpandasaspd# For data manipulation
importsqlite3# For interacting with SQLite databases
# Step 2: Extract - Load data from a CSV file
defextract_data(file_path):"""
Reads data from a CSV file into a Pandas DataFrame.
"""df=pd.read_csv(file_path)# Read the CSV file
returndf# Step 3: Transform - Clean and process the data
deftransform_data(df):"""
Cleans and transforms the data.
"""# Drop rows with missing values
df=df.dropna()# Convert a column to uppercase (example transformation)
df['name']=df['name'].str.upper()# Filter rows where 'age' is greater than 18
df=df[df['age']>18]returndf# Step 4: Load - Insert data into a SQLite database
defload_data(df,db_path,table_name):"""
Loads the transformed data into a SQLite database.
"""# Connect to the SQLite database (creates it if it doesn't exist)
conn=sqlite3.connect(db_path)# Insert the DataFrame into the database as a table
df.to_sql(table_name,conn,if_exists='replace',index=False)# Close the database connection
conn.close()# Step 5: Main function to orchestrate the ETL process
defmain():# Define file paths
input_file='data.csv'# Path to the input CSV file
db_file='example.db'# Path to the SQLite database
table_name='users'# Name of the table to create
# Step 1: Extract
print("Extracting data...")data=extract_data(input_file)# Step 2: Transform
print("Transforming data...")transformed_data=transform_data(data)# Step 3: Load
print("Loading data into the database...")load_data(transformed_data,db_file,table_name)print("ETL process completed successfully!")# Run the main function
if__name__=="__main__":main()
Explanation of Each Line
Step 1: Import Libraries
importpandasaspdimportsqlite3
pandas: A powerful library for data manipulation and analysis. It provides the DataFrame object, which is ideal for handling tabular data.
sqlite3: A built-in Python library for interacting with SQLite databases.
sqlite3.connect(db_path): Connects to the SQLite database. If the database doesn't exist, it creates one.
df.to_sql(table_name, conn, if_exists='replace', index=False): Inserts the DataFrame into the database as a table.
table_name: Name of the table to create.
if_exists='replace': Replaces the table if it already exists.
index=False: Prevents Pandas from writing row indices to the database.
conn.close(): Closes the database connection.
Step 5: Main Function
defmain():input_file='data.csv'db_file='example.db'table_name='users'print("Extracting data...")data=extract_data(input_file)print("Transforming data...")transformed_data=transform_data(data)print("Loading data into the database...")load_data(transformed_data,db_file,table_name)print("ETL process completed successfully!")
Orchestrates the ETL process by calling the extract_data, transform_data, and load_data functions.
Prints progress messages to the console.
Run the Script
if__name__=="__main__":main()
Ensures the main() function runs only when the script is executed directly (not when imported as a module).
Insight IT Training Institute, located in the heart of Hyderabad, is a premier destination for IT education and skill development. we will provide Quality Training with 100% Placement Assitance.
Top comments (1)
Nice blog and informative content,
We are providing Best SAP BODS Training in Hyderabad,
Thanks for sharing with us,
SAP BODS Training in Hyderabad
SAP BODS Online Training in Hyderabad
SAP Training in Hyderabad