MariaDB to PostgreSQL
Recently, we migrated a Rails application from MariaDB to PostgreSQL. This article goes into the why and how of that migration.
K1 acquired MariaDB and appointed a new CEO. This signaled a shift in the database's direction. A private equity firm was now in control of the company overseeing the open-source development of a database that our application heavily relied upon.
Over the past few years, MariaDB's feature development has stagnated, largely due to its commitment to maintaining MySQL compatibility and a stronger focus on sales rather than innovation. Meanwhile, PostgreSQL has been on a relentless path of growth, continuously closing the gap with Oracle by rolling out feature after feature.
PostgreSQL’s journey started in 1996, and its development is spearheaded by the PostgreSQL Global Development Group, with contributions from major players like AWS, Microsoft, NTT Data, EnterpriseDB, Crunchy Data, and PgEdge. The core contributor team is distributed across the globe, with experts from Europe, Japan, the US, China, India, and Latin America. Much like Linux, PostgreSQL is likely to remain open-source for the foreseeable future. For our client, this acquisition was the straw that broke the camel's back.
Why move from MariaDB to PostgreSQL?
The feature gap: The feature gap between PostgreSQL and MariaDB has grown significantly over the last 5-7 years. With each release, PostgreSQL has integrated valuable features inspired by Oracle, making it an increasingly attractive choice for enterprises. As a result, it has evolved into a one-stop database solution for all kinds of application needs.
Postgres's rich extension system: If you need Postgres to manage your cron jobs, there is pg_cron. If you want to store spatial data in the database, there is PostGIS. For time series, TimescaleDB; for vector storage, there is pgvector. Moreover, many startups are modifying Postgres in unexpected ways to bring new kinds of databases to the market. For instance, Neon Tech has separated the storage and compute layers, allowing databases to scale to zero. This enables independent scaling of storage and compute, much like Databricks and Snowflake do for data lakes. The rapid pace of innovation in PostgreSQL presents immense potential benefits for any application.
Corporate interest vs community interest:
MariaDB operates as a corporate entity, prioritizing business interests, while PostgreSQL thrives on its vibrant open-source community. The PostgreSQL Global Development Group’s mission is to advance the database based on the needs of its developers and users, ensuring a steady flow of meaningful enhancements.
High Risk
While this migration looks very attractive, it is not without its own risks.
Long time to finish: These migrations take a long time even in simple cases. A seemingly straightforward upgrade from MySQL 5 to MySQL 8 can take up to nine months. This process includes:
Migrating the schema and data
Updating application code
Conducting regression and load testing
Coordination between different application teams: Teams need to make sure that the new database doesn't break features and that all the tests still pass. Having good test coverage is crucial to such a migration project.
Fundamental change to the application: Since everything sits on top of the database, it impacts everything: the application, background jobs, BI, etc.
Migrating requires adjustments to the schema, data, and application code. Given that application code can run into millions of lines, the risk of something breaking is significant.
High Rewards
More features: With a thriving open-source community behind the project, new features are continuously being added. The ecosystem of extensions is expanding rapidly, bringing innovative capabilities to the database core.
Simpler application logic is easier to maintain: A feature-rich database like PostgreSQL allows developers to offload logic to the database itself, reducing complexity at the application level. This not only improves performance but also enhances maintainability and throughput.
Yandex data migration case study
Yandex Mail initially stored its metadata in an Oracle database. By 2012, the growing feature demands and high licensing costs of Oracle became unsustainable, with expenses running into millions of dollars. Additionally, Oracle's restrictive licensing prohibited publishing benchmarks comparing its database performance, which led to its absence from the ClickBench benchmark. So, they tried migrating to a different database that could support their needs at a reasonable cost. It took them about ten man-years to finish the migration.
The first time, they tried switching to MySQL. Along the way, they also tried to fix everything that was wrong with the codebase, which increased the scope of the project and eventually set it up for failure. The second time, they tried writing their own custom DBMS (perhaps an attempt to reverse engineer Oracle). That failed as well. The third time was the charm.
How?
We started with the schema migration, then moved on to data export and import, and finally made code changes to keep the application compatible with the new PostgreSQL database.
Schema transfer
There are many nuances one has to keep in mind while migrating from one database to another.
Datatype challenges: PostgreSQL does not support unsigned integers, unlike MariaDB. It also does not allow size attributes on text fields, nor does it accept backticks (`) as identifier quotes. Once we jumped through these hoops, we were able to generate a matching schema in the PostgreSQL database.
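To make this concrete, here is a minimal sketch of how such a table definition might map across; the table and column names are made up for illustration:

```sql
-- MariaDB original: unsigned integer key, backtick identifiers, sized TEXT
CREATE TABLE `events` (
  `id`      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  `payload` TEXT(65535),
  PRIMARY KEY (`id`)
);

-- PostgreSQL equivalent: no unsigned types, no size modifier on TEXT,
-- and plain (or double-quoted) identifiers instead of backticks
CREATE TABLE events (
  id      BIGINT GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
  payload TEXT,
  CHECK (id >= 0)  -- optional guard standing in for the unsigned constraint
);
```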
Fulltext indexes: The application relied on full-text search for keyword retrieval in entity titles and bodies. While MariaDB offers simple full-text search indexing, PostgreSQL provides richer configurations for advanced search capabilities.
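As a rough illustration (assuming a hypothetical messages table with title and body columns), the equivalent index and query look quite different on the two systems:

```sql
-- MariaDB: simple full-text index and query
ALTER TABLE messages ADD FULLTEXT INDEX messages_ft (title, body);
SELECT * FROM messages WHERE MATCH (title, body) AGAINST ('invoice');

-- PostgreSQL: expression-based GIN index plus a tsquery search
CREATE INDEX messages_ft ON messages
  USING GIN (to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, '')));

SELECT * FROM messages
WHERE to_tsvector('english', coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('english', 'invoice');
```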
Binary fields: Binary fields are a little tricky because of how the two databases deal with null characters. MariaDB allows null characters in string fields, while PostgreSQL does not. This is not a deliberate feature decision; the difference arises from storage mechanisms: MariaDB uses length-prefixed buffers, while PostgreSQL's string handling is built around C-style null-terminated strings.
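A quick way to see this on the PostgreSQL side (the byte values here are just for illustration): a NUL byte is fine in bytea but is rejected the moment it has to become text.

```sql
-- bytea happily stores a 0x00 byte: three bytes 'a', NUL, 'b'
SELECT '\x610062'::bytea;

-- converting the same bytes to text fails, since text values cannot contain NUL
SELECT convert_from('\x610062'::bytea, 'UTF8');
-- ERROR: invalid byte sequence for encoding "UTF8": 0x00
```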
Data export and import
We will cover three ways in which data can be exported from MariaDB and imported into PostgreSQL.
pgloader: An automated tool for loading data into PostgreSQL from different sources such as MySQL, SQLite, and CSV files. It creates the corresponding schema in PostgreSQL, inserts the data into the relevant tables, and rebuilds the indexes after loading the dataset. It works well for simple datasets with simple data types.
Dumping SQL and rerunning on the database: This process involves dumping raw SQL statements from the source database and editing the SQL to match the target database's dialect. It's a good option when upgrading a database to a newer version of the same engine.
CSV Export and Import: This option is good for data that doesn't contain binary fields, as those cause various issues while dumping and parsing CSV. Since CSV is a widely supported format for importing and exporting, most other data moves over cleanly.
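A minimal sketch of that flow, assuming a hypothetical messages table and file path:

```sql
-- MariaDB side: export a table to CSV on the database server
SELECT id, title, body
INTO OUTFILE '/tmp/messages.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM messages;

-- PostgreSQL side: load the same file with COPY
-- (server-side COPY needs filesystem access; psql's \copy is the client-side alternative)
COPY messages (id, title, body)
FROM '/tmp/messages.csv'
WITH (FORMAT csv);
```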
Code changes
Apart from simple changes in the codebase to replace quotes and remove certain keywords, some other changes were needed in the SQL that ran directly against the database, as opposed to going through the ORM.
Raw SQL in code: The codebase used the interval keyword, which PostgreSQL only accepts with the value in quotes (escaped inside the application strings). An easy change.
Search: The search feature used the MATCH ... AGAINST clause, which had to be replaced with a tsvector(...) @@ tsquery(...) expression after creating the relevant index on the appropriate columns. Both this and the interval change are shown in the sketch after this list.
Binary fields: The literal syntax is a little different in Postgres compared to MariaDB, but that was easily managed.
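As a combined before/after sketch (the messages table and columns are hypothetical; the real queries came from the application's raw SQL):

```sql
-- Before (MariaDB): unquoted interval units and MATCH ... AGAINST
SELECT *
FROM messages
WHERE MATCH (title, body) AGAINST ('invoice')
  AND created_at > NOW() - INTERVAL 30 DAY;

-- After (PostgreSQL): quoted interval value and tsvector/tsquery search
SELECT *
FROM messages
WHERE to_tsvector('english', title || ' ' || body) @@ plainto_tsquery('english', 'invoice')
  AND created_at > NOW() - INTERVAL '30 days';

-- Binary literals also change shape: MariaDB's x'DEADBEEF' becomes '\xdeadbeef'::bytea
```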
Learning from this project
Having good test coverage allows the team to make code changes with confidence and helps them iterate faster. This is probably the single most important thing, so if the codebase lacks test coverage for some modules, the team may want to start by writing tests for them.
Limit the scope of the migration. It may be tempting to fix everything wrong with the codebase that you see along the way, but limiting the scope to just the migration helps the team stay focused and deliver results.
Run small experiments independently on a toy dataset. We did this with the full-text search module; these experiments allowed us to find the right search solution before migrating to the new database.
Run load testing and benchmarks. The old database may be tuned for performance with a lot of custom indexes, so take your time to build indexes in the destination database and tune them for your workload (see the sketch below).
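For example, a minimal sketch of the kind of tuning loop this involves on the PostgreSQL side (the query, table, and index names are illustrative):

```sql
-- Check how a hot query behaves on the new database
EXPLAIN ANALYZE
SELECT * FROM messages WHERE sender_id = 42 ORDER BY created_at DESC LIMIT 50;

-- Add a matching index without blocking writes, then re-check the plan
CREATE INDEX CONCURRENTLY messages_sender_created_idx
  ON messages (sender_id, created_at DESC);
```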
Conclusion
Whether it is Yandex Mail moving off Oracle or our Rails application moving off MariaDB, migrating to PostgreSQL is a long and complex process, but it pays off in greater scalability, flexibility, and cost savings. Both transitions reinforce the importance of thorough planning, controlled scope, and rigorous testing in large-scale database migrations.