DEV Community

Cover image for Import CSV Data Into SQL Databases From The Terminal
Kofi Mupati
Kofi Mupati

Posted on • Edited on • Originally published at devcodes.co

Import CSV Data Into SQL Databases From The Terminal

The Data Scientist and several professionals in the same space work with data from several sources. It does not take long for one to face the challenge of importing data from a CSV file into an SQL database. I am a great fan of the terminal so I want to show you how to do it from the terminal.

Prerequisites

I am going to use a Python Package for this. I suggest that you work in a virtual environment. The Real Python's article on virtual environments is a good resource if need be. I assume you are using Python3.

  • Install the needed packages
pip install PyMySQL psycopg2 csvkit
Enter fullscreen mode Exit fullscreen mode

PyMySQL: This is a pure Python MySQL driver which helps us to connect to a MySQL database using Python.
psycopg2: This the most popular Postgresql database adapter for Python.
csvkit: It is a suite of command-line tools for converting to and working with CSV. It provides the csvsql module we are going to use to import the data from our CSV file.

For the sake of this tutorial, I have uploaded a CSV file which contains some customer information you can use to test this tool I am introducing. Click this link to download.

  • Create a new MySQL or Postgresql database. I named mine test. The data will be imported into a table I have named customer_info.

Importing Your Data

The Quick Approach.

This approach automatically generates the Schema from the CSV data, creates the table and inserts the data into the table.
db_user: This is the database user with access to the test database you created.
password: This is the password of the user.

For MySQL

csvsql --db 'mysql+pymysql://db_user:password@localhost/test'  --tables customer_info --insert customer_info.csv
Enter fullscreen mode Exit fullscreen mode

For Postgresql

csvsql --db 'postgresql:///test' --tables customer_info  --insert customer_info.csv
Enter fullscreen mode Exit fullscreen mode

The Lengthier (and Safer) Approach.

  • Generate the query for creating the customer_info table from your terminal.

For MySQL

csvsql -i mysql customer_info.csv
Enter fullscreen mode Exit fullscreen mode

Expected Result:

CREATE TABLE customer_info (
    `contractId` DECIMAL(38, 0) NOT NULL, 
    `AccountNumber` DECIMAL(38, 0) NOT NULL, 
    `CreatedAt` TIMESTAMP NULL, 
    `Amount` DECIMAL(38, 0) NOT NULL, 
    `Count` DECIMAL(38, 0) NOT NULL, 
    `Duration` DECIMAL(38, 0) NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

The result above is the query we are going to use to create the customer_info table in the test database. You can edit the generated schema to suit your need. I will leave it as it is in this tutorial.

For Postgresql

csvsql -i postgresql
Enter fullscreen mode Exit fullscreen mode

Expected Result:

CREATE TABLE customer_info (
    "contractId" DECIMAL NOT NULL, 
    "AccountNumber" DECIMAL NOT NULL, 
    "CreatedAt" TIMESTAMP WITHOUT TIME ZONE, 
    "Amount" DECIMAL NOT NULL, 
    "Count" DECIMAL NOT NULL, 
    "Duration" DECIMAL NOT NULL
);
Enter fullscreen mode Exit fullscreen mode

NOTE THE DIFFERENCE IN HOW EACH DATABASE EXPECTS THE QUERY TO BE. MySQL is wrapped in single quotes but Postgresql is wrapped in double-quotes.

Import your data into the table.

In MySQL

csvsql --db 'mysql+pymysql://db_user:password@host/test' --no-create  --insert customer_info.csv
Enter fullscreen mode Exit fullscreen mode

In Postgresql

csvsql --db 'postgresql:///test' --no-create  --insert customer_info.csv
Enter fullscreen mode Exit fullscreen mode

SUMMARY

We just learnt how to import data from a CSV file into tables in our MySQL and Postgresql databases with the csvkit package.You can take the direct approach which automatically creates your table and inserts the data.The longer and in my opinion safer approach is to generate the query that creates the table, inspect and edit the generated schema and import your data into the table.

CSVKIT is a powerful tool every Data Scientist should have in his or her toolbox.

Top comments (3)

Collapse
 
euantorano profile image
Euan T

If using PostgreSQL, there is a super easy way to solve these kinds of tasks - the COPY statement: postgresql.org/docs/current/sql-co...

Syntax looks like:

COPY customer_info(contractId,AccountNumber,CreatedAt,Amount,Count,Duration) 
FROM 'C:\tmp\customer_info.csv' DELIMITER ',' CSV HEADER;
Enter fullscreen mode Exit fullscreen mode
Collapse
 
mupati profile image
Kofi Mupati

Thanks for sharing. We are here to learn from one another. I skimmed through the link you shared and my guess is that the table should be created for you to use the COPY command. csvsql doesn't just help to copy from the CSV file into the database table. It helps with schema generation from the CSV file, table creation and inserting data into the created table.

Collapse
 
euantorano profile image
Euan T

Yeah, unfortunately the schema must exist first when using COPY.