When working with large datasets, it's common to extract PostgreSQL tables to CSV files for further analysis and sharing. However, naively exporting a large table can crash your machine by consuming all available RAM.
In this article, I'll explain an efficient technique to export big PostgreSQL tables to CSV using Python, based on my experience as a Research Assistant analyzing criminal record relief data.
The Naive Approach
You can export a table like this:
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
cursor = conn.cursor()

# Execute a SELECT query
cursor.execute("SELECT * FROM employee")
# Fetch all rows into memory at once
rows = cursor.fetchall()

# Write rows to CSV
with open("employees.csv", "w") as f:
    for row in rows:
        f.write(",".join([str(cell) for cell in row]) + "\n")
However, fetchall() loads the entire table into memory, which will crash your machine for a big table!
Streaming Data in Chunks with psycopg2
The key is to stream the data in chunks from the database directly to disk. The psycopg2 module exposes PostgreSQL's COPY command through copy_expert(), which lets us do exactly that.
First, connect to the database and create a cursor:
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()
Next, open the output CSV and use copy_expert() to stream the table into it in chunks:
with open('out.csv', 'w') as f:
    cur.copy_expert('COPY employee TO STDOUT WITH CSV HEADER', f)
The table is streamed to the CSV without loading everything into memory!
Putting It All Together
Here is the full script to efficiently export a large PostgreSQL table to a CSV:
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()

with open('out.csv', 'w') as f:
    cur.copy_expert('COPY employee TO STDOUT WITH CSV HEADER', f)

conn.close()
print("Table exported to CSV!")
By streaming the data in chunks directly to a file, we avoid RAM issues and efficiently export the table.
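Because copy_expert() streams into any writable file-like object, you are not limited to a plain file on disk. As one illustration (not part of the original script, and assuming the same mydb database and employee table), the export can be gzipped on the fly:
import gzip

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()

# gzip.open() in text mode returns a writable file-like object,
# so copy_expert() can stream the rows into it without buffering the table
with gzip.open('out.csv.gz', 'wt') as f:
    cur.copy_expert('COPY employee TO STDOUT WITH CSV HEADER', f)

conn.close()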
Extracting the Results of a SQL Query
The method above copies all the data from the employee table to a CSV file. If you want to export the results of a specific query instead, you need to modify it slightly, as shown below.
import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")
cur = conn.cursor()

# List of departments from which we want all employees
dept_list = ['dept1', 'dept2']

query = """SELECT *
           FROM employee
           WHERE department IN %s"""
query_params = (tuple(dept_list),)

# mogrify() safely interpolates the parameters and returns the final query as bytes
query = cur.mogrify(query, query_params)
query = query.decode('utf-8')

with open('out.csv', 'w') as f:
    cur.copy_expert("COPY ({}) TO STDOUT WITH CSV HEADER".format(query), f)
print("Query results exported to CSV!")
This technique has helped me quickly extract large criminal record relief datasets from PostgreSQL databases for analysis. Let me know if you have any other tips and tricks!
Top comments (3)
You can also do this straight from psql:
\COPY my_table TO 'filename' CSV HEADER
Hey, could you also elaborate on how we can control the chunk size being processed here?
There is no built-in way to control the chunk size with copy_expert(); the COPY protocol itself focuses on efficient bulk data transfer. If you do want that control, you can tune the configuration on the PostgreSQL server, or iterate through your data in batches yourself.
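A minimal sketch of that second option uses a psycopg2 server-side (named) cursor, so rows are pulled from the server in fixed-size batches. The cursor name employee_export, the 10,000-row itersize, and the mydb/employee names are assumptions for illustration:
import csv

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")

# A named cursor is a server-side cursor: rows stay on the PostgreSQL
# server until they are fetched, so memory use is bounded by itersize
cur = conn.cursor(name='employee_export')
cur.itersize = 10000  # rows fetched per network round trip while iterating
cur.execute("SELECT * FROM employee")

with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in cur:  # psycopg2 transparently fetches itersize rows at a time
        writer.writerow(row)

conn.close()
Note that unlike COPY ... WITH CSV HEADER, this sketch does not write a header row.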