Until recently, Azure Database for PostgreSQL was your only Postgres option within the Azure ecosystem.
And it is a good option. Being based on Postgres means you get all those great Postgres features you know and love while having Microsoft handle the infrastructure. But you also get the traditional cloud constraints: fixed compute and storage capacity that you need to provision in advance, leading to overprovisioning and higher costs.
But along came Neon. As Neon is now available within Azure, developers have the option to use the serverless architecture of Neon while maintaining their Azure-based infrastructure. This means you can leverage Neon's auto-scaling capabilities, where compute resources scale down to zero when inactive and storage is billed based on actual data size rather than provisioned capacity.
But if you already use Azure Database for PostgreSQL, how do you switch?
Here’s a step-by-step guide to making the switch:
Introducing pg_dump and pg_restore
You can migrate a Postgres database in a few ways, but the most reliable way is using pg_dump and pg_restore.
pg_dump is a utility that takes a consistent snapshot of your database, generating a backup file containing the schema and data in a format that can be used for restoration. pg_restore is its companion tool that reconstructs a database from a pg_dump backup file, handling dependencies and relationships to ensure data integrity.
This is going to work 99% of the time. Here, we will use each in sequence to dump from Azure and then restore into Neon.
Creating the Dump File
First, pg_dump will be used to back up our Azure Postgres database. pg_dump connects to your database and traverses the database object tree, starting with schemas and proceeding through tables, functions, and other objects. It generates the appropriate SQL commands to recreate each object and its data. This process maintains referential integrity by handling dependencies in the correct order.
The command syntax we’ll use is:
pg_dump \
-Fd \
-j 2 \
"$SOURCE_DB" \
-h "$SOURCE_HOST" \
-p "$SOURCE_PORT" \
-U "$SOURCE_USER" \
-N cron \
-v \
-f postgres_dump
The "$SOURCE_DB
", -h "$SOURCE_HOST
", -p "$SOURCE_PORT
", and -U "$SOURCE_USER
" are all just the connection parameters for the database. The most important flags here are:
- -Fd uses directory-format output, creating a directory with multiple files instead of a single dump file
- -j 2 enables parallel dumping using two jobs, which can speed up the process for large databases
- -N cron excludes the 'cron' schema from the backup
- -v enables verbose mode to show progress
- -f postgres_dump specifies the output directory name
Several other options are available, but these suffice for most applications.
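As an aside, if you only need part of the database, pg_dump can narrow the scope. Here is a minimal sketch (the specific table names are just placeholders, and the connection variables are the same ones used throughout this post):
# Dump the schema only, with no data - handy for a dry run of the structure
pg_dump "$SOURCE_DB" -h "$SOURCE_HOST" -p "$SOURCE_PORT" -U "$SOURCE_USER" \
  --schema-only -v -f schema_only.sql

# Dump only selected tables, still in directory format so it can be restored in parallel later
pg_dump -Fd -j 2 "$SOURCE_DB" -h "$SOURCE_HOST" -p "$SOURCE_PORT" -U "$SOURCE_USER" \
  -t public.orders -t public.order_items -v -f orders_dump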
Restoring to Neon
Once you have your dump file, use pg_restore to import it into your Neon database. pg_restore processes the schema definitions, table structures, data, indexes, constraints, and triggers. This ordered approach ensures that dependencies are satisfied and foreign key relationships are preserved.
pg_restore \
-j 2 \
-h "$TARGET_HOST" \
-U "$TARGET_USER" \
-d "$TARGET_DB" \
-v \
postgres_dump
Again, the connection details are needed, and the syntax is the same or similar to pg_dump. We’re using two jobs for faster parallelization, verbose output, and reading from the postgres_dump directory.
Some other restore options to consider (the sketch after this list combines a few of them):
- --clean drops database objects before recreating them
- --create creates the database before restoring
- --no-owner sets objects to be owned by the target user
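As a rough illustration of how these can combine (not part of the script below, just a sketch): adding --clean with --if-exists makes the restore repeatable, and --no-owner skips the ALTER ... OWNER statements that reference Azure-specific roles Neon doesn’t have.
pg_restore \
-j 2 \
-h "$TARGET_HOST" \
-U "$TARGET_USER" \
-d "$TARGET_DB" \
--clean --if-exists \
--no-owner \
-v \
postgres_dump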
Migrating Your Data
OK, let’s put all this together. We can do this via a single bash script, which we’ll create in a file called migration_script.sh. Before you run it, make it executable like so:
chmod +x migration_script.sh
Additionally, you’ll need the Postgres client tools (pg_dump, pg_restore, and psql) to talk to both databases. On macOS, the Homebrew libpq formula provides them:
brew install libpq
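Homebrew installs libpq keg-only, so if pg_dump, pg_restore, or psql still aren’t found after installing, you may need to put them on your PATH first. For example:
# Make the libpq client tools visible in the current shell session
export PATH="$(brew --prefix libpq)/bin:$PATH"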
Then, here is our full migration script:
#!/bin/bash

# Environment variables - replace these with your values
SOURCE_HOST="azure-migration.postgres.database.azure.com"
SOURCE_PORT="5432"
SOURCE_DB="postgres"
SOURCE_USER="username"
SOURCE_PASSWORD="password"
TARGET_HOST="ep-evening-sunshine-a54gw97b.us-east-2.aws.neon.tech"
TARGET_DB="neondb"
TARGET_USER="neondb_owner"
TARGET_PASSWORD="password"

simple_dump() {
  # Exit on any error
  set -e

  echo "Starting dump from Azure PostgreSQL..."
  PGPASSWORD="$SOURCE_PASSWORD" pg_dump \
    -Fd \
    -j 2 \
    "$SOURCE_DB" \
    -h "$SOURCE_HOST" \
    -p "$SOURCE_PORT" \
    -U "$SOURCE_USER" \
    -N cron \
    -v \
    -f postgres_dump

  echo "Dump completed. Starting restore to Neon..."
  PGPASSWORD="$TARGET_PASSWORD" pg_restore \
    -j 2 \
    -h "$TARGET_HOST" \
    -U "$TARGET_USER" \
    -d "$TARGET_DB" \
    -v \
    postgres_dump

  echo "Migration completed!"

  # Verify row counts
  echo "Verifying migration..."
  echo "Source database counts:"
  PGPASSWORD="$SOURCE_PASSWORD" psql \
    -h "$SOURCE_HOST" \
    -p "$SOURCE_PORT" \
    -U "$SOURCE_USER" \
    -d "$SOURCE_DB" \
    -c "
      SELECT 'customers' as table_name, COUNT(*) FROM customers UNION ALL
      SELECT 'products', COUNT(*) FROM products UNION ALL
      SELECT 'orders', COUNT(*) FROM orders UNION ALL
      SELECT 'order_items', COUNT(*) FROM order_items;
    "

  echo "Target database counts:"
  PGPASSWORD="$TARGET_PASSWORD" psql \
    -h "$TARGET_HOST" \
    -U "$TARGET_USER" \
    -d "$TARGET_DB" \
    -c "
      SELECT 'customers' as table_name, COUNT(*) FROM customers UNION ALL
      SELECT 'products', COUNT(*) FROM products UNION ALL
      SELECT 'orders', COUNT(*) FROM orders UNION ALL
      SELECT 'order_items', COUNT(*) FROM order_items;
    "
}

# Call the function and capture its exit status
simple_dump
exit_status=$?
if [ $exit_status -ne 0 ]; then
  echo "Migration failed with exit status $exit_status"
  exit $exit_status
fi
Let’s walk through the main components of this script:
Env variables: At the start of the script, we set up source and target connection details, including hosts, ports, databases, users, and passwords, for easier management and modification.
pg_dump and pg_restore: Here are our core commands from above, wrapped in a simple function called simple_dump that handles both the backup and restore operations in sequence.
Verification: After the migration, we verify the data integrity by comparing row counts between source and target databases for key tables. This gives us immediate feedback on whether all records were successfully transferred.
Function call: Finally, we execute our simple_dump function and check its exit status, providing clear feedback if anything goes wrong during the migration process.
We can then run it like so:
./migration_script.sh
We asked for verbosity, and that is what we’ll get. You should see a lot of text running down your terminal, kinda like you’re Neo. Hopefully, it should look like this:
pg_dump: last built-in OID is 16383
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
pg_dump: reading procedural languages
pg_dump: reading user-defined aggregate functions
pg_dump: reading user-defined operators
...
pg_restore: processing item 4110 ENCODING ENCODING
pg_restore: processing item 4111 STDSTRINGS STDSTRINGS
pg_restore: processing item 4112 SEARCHPATH SEARCHPATH
pg_restore: processing item 4113 DATABASE postgres
pg_restore: processing item 4114 COMMENT DATABASE postgres
pg_restore: processing item 7 SCHEMA public
pg_restore: creating SCHEMA "public"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 7; 2615 2200 SCHEMA public azure_pg_admin
pg_restore: error: could not execute query: ERROR: role "azure_pg_admin" does not exist
Command was: ALTER SCHEMA public OWNER TO azure_pg_admin;
...
And so on. You will see errors: some roles and objects are specific to Azure and aren’t needed in Neon, but eventually you should see a positive verification:
Verifying migration...
Source database counts:
table_name | count
-------------+-------
customers | 1000
products | 100
orders | 5000
order_items | 15044
(4 rows)
Target database counts:
table_name | count
-------------+-------
customers | 1000
products | 100
orders | 5000
order_items | 15044
(4 rows)
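These counts are tied to the example schema, of course. If you want a verification query that works regardless of which tables you have, one option (just a sketch, and the numbers are estimates rather than exact counts) is to compare Postgres’s per-table statistics on both sides; running ANALYZE first ensures the statistics are populated after a fresh restore:
# Approximate per-table row counts from Postgres statistics (run against source and target)
PGPASSWORD="$TARGET_PASSWORD" psql \
  -h "$TARGET_HOST" \
  -U "$TARGET_USER" \
  -d "$TARGET_DB" \
  -c "ANALYZE;" \
  -c "SELECT schemaname, relname, n_live_tup
      FROM pg_stat_user_tables
      ORDER BY schemaname, relname;"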
What about that other 1%?
The dump/restore process almost always works with some finagling. However, in some scenarios, you need to plan a more robust migration process.
1. Large Data Volumes & Zero Downtime Migration
A straight pg_dump operation can take days to complete when dealing with databases in the terabyte range. During this time, your source database continues to receive writes, leading to data inconsistency. The restore process requires significant downtime, as your application needs to be offline to prevent data drift.
A more robust approach is to use logical replication:
-- On the source (Azure) database
ALTER SYSTEM SET wal_level = logical;  -- needs a restart; on Azure you may have to set this via server parameters instead
CREATE PUBLICATION azure_pub FOR ALL TABLES;
-- On the target (Neon) database
CREATE SUBSCRIPTION neon_sub
CONNECTION 'host=source dbname=mydb'   -- include user and password in the connection string
PUBLICATION azure_pub;
Logical replication works by decoding the Write-Ahead Log (WAL) into a stream of changes that can be applied to the target database. Unlike physical replication, it operates at the logical level of rows and tables rather than data blocks. This means it can replicate between different Postgres versions and platforms, making it ideal for migrations. The publisher/subscriber model allows for real-time synchronization while your application continues to write to the source database.
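Once the subscription is created, you can watch it catch up before cutting over. A hedged sketch, reusing the connection variables from the script above (the subscription name matches the SQL example):
# On the source (publisher): check the replication slot created for the subscription
PGPASSWORD="$SOURCE_PASSWORD" psql -h "$SOURCE_HOST" -p "$SOURCE_PORT" -U "$SOURCE_USER" -d "$SOURCE_DB" \
  -c "SELECT slot_name, active, confirmed_flush_lsn FROM pg_replication_slots;"

# On the target (subscriber): check how far the subscription has received and applied changes
PGPASSWORD="$TARGET_PASSWORD" psql -h "$TARGET_HOST" -U "$TARGET_USER" -d "$TARGET_DB" \
  -c "SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;"
When those positions stop drifting apart, you can pause writes on the Azure side, let the subscriber drain, and switch your application’s connection string over to Neon.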
2. Complex Dependencies
Complex dependencies in Postgres often create circular migration challenges. For example, tables might reference roles that don't exist yet, while Postgres roles need permissions on tables that have yet to be created. Views and functions may depend on other objects that need to be migrated in a specific order, and cross-schema references can create additional complexity. Custom extensions and database-specific features might not have direct equivalents in the target system.
The solution is to use a three-phase migration approach:
- Pre-schema: Create roles and extensions
- Schema: Generate table structures and basic constraints
- Post-schema: Add views, functions, and complex constraints
Then, execute each phase with appropriate error handling and rollback capabilities.
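One way to approximate those phases with the standard tools, assuming the same directory dump from earlier, is to export roles separately with pg_dumpall and then restore the dump section by section. Treat this as a sketch rather than a complete recipe:
# Phase 0: global objects (roles) - review and edit this file before applying it to Neon
PGPASSWORD="$SOURCE_PASSWORD" pg_dumpall -h "$SOURCE_HOST" -p "$SOURCE_PORT" -U "$SOURCE_USER" \
  --roles-only -f roles.sql

# Phase 1: table definitions, extensions, and basic constraints
PGPASSWORD="$TARGET_PASSWORD" pg_restore -h "$TARGET_HOST" -U "$TARGET_USER" -d "$TARGET_DB" \
  --section=pre-data --no-owner postgres_dump

# Phase 2: the data itself, in parallel
PGPASSWORD="$TARGET_PASSWORD" pg_restore -h "$TARGET_HOST" -U "$TARGET_USER" -d "$TARGET_DB" \
  --section=data --no-owner -j 2 postgres_dump

# Phase 3: indexes, foreign keys, triggers, and other post-data objects
PGPASSWORD="$TARGET_PASSWORD" pg_restore -h "$TARGET_HOST" -U "$TARGET_USER" -d "$TARGET_DB" \
  --section=post-data --no-owner postgres_dump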
3. Security and Compliance
Postgres deployments can implement row-level security policies, custom roles with fine-grained permissions, and encrypted columns using pgcrypto. Audit logging might track all data modifications, and compliance requirements could mandate specific data retention policies. Moving these security configurations requires careful planning to prevent unauthorized access during migration.
Thus, it’s a good idea to implement a security-first migration pattern:
- Export and map all roles and permissions
- Create equivalent security policies in the target
- Set up audit logging before migration starts
- Migrate data with encryption maintained
- Verify security controls post-migration
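For the verification step in particular, a couple of quick checks against the target can confirm that policies and grants survived the move. A sketch, using the same connection variables as before:
# Confirm row-level security policies exist on the target
PGPASSWORD="$TARGET_PASSWORD" psql -h "$TARGET_HOST" -U "$TARGET_USER" -d "$TARGET_DB" \
  -c "SELECT schemaname, tablename, policyname, cmd FROM pg_policies ORDER BY tablename;"

# Compare table-level privileges against what you exported from the source
PGPASSWORD="$TARGET_PASSWORD" psql -h "$TARGET_HOST" -U "$TARGET_USER" -d "$TARGET_DB" \
  -c "SELECT grantee, table_name, privilege_type
      FROM information_schema.role_table_grants
      WHERE table_schema = 'public'
      ORDER BY table_name, grantee;"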
Making Your Move
Migrating from Azure Database for PostgreSQL to Neon isn’t complex. For most applications, a straightforward pg_dump and pg_restore will do the job. Robust solutions are available for those edge cases: large data volumes, complex dependencies, or strict security requirements. The key is identifying your specific migration challenges early and planning accordingly.
With Neon, you get the familiar Postgres experience you're used to from Azure, plus the benefits of true serverless architecture: automatic scaling, capacity that matches your actual usage, and consequently, better cost efficiency. The migration process might take some planning, but the long-term advantages make it worthwhile.