What is UUID?
UUID (universally unique identifier) is a 128-bit label, generated with the use of the standardized algorithm, almost impossible to duplicate, which makes it a perfect identifier. It’s represented as a string that contains five groups of 32 hexadecimal digits separated by hyphens. Here is an example:
f6a7f195-8220-408c-9906-7395b870db61
Why it’s so useful?
The question arises as to why UUID
is better than the typical sequential integer Primary Key (PK) generated by the database, which seems to be handled efficiently.
We have been using integer IDs for many years, as this is what databases and Django does by default. We decided to move to UUID for a few reasons.
Firstly, UUIDs
are unpredictable, unlike standard integer IDs. This lack of predictability enhances security.
Secondly, consider a system usable both online and offline, where users can create new content. When the user reconnects to the internet, the new instances are merged into the database. With conventional auto-incrementing primary keys, there's a significant chance of conflicts. However, by employing UUIDs as primary keys, the likelihood of encountering such conflicts is nearly eliminated, providing a robust solution to potential problems.
Aside from that UUIDs are less prone to human error when writing queries and working with code, it's hard to spot that a wrong field has been used in a JOIN statement and using integer IDs is less likely to cause an error in cases like that. With UUIDs it's less likely.
How to migrate my data from ID to UUID while ensuring both the wolves are satisfied and the sheep remain unharmed
So, what should you do if you decide that UUIDs are necessary and want to migrate your database objects to use UUIDs as primary keys? If you’re in the early stages, without a production environment, the transition is relatively straightforward. However, if your system is already in production, hosting millions of objects with complex relationships, you’ll need to carefully consider this decision. The process can be challenging and require significant effort. But don’t worry, I’ll guide you through it to make it as smooth as possible. 🙂
Happy scenario
To illustrate the problem, imagine we have the following Order
model, which represents an order placed by a customer in an online store.
class Order(models.Model):
id = models.PositiveIntegerField(unique=True)
created_at = models.DateTimeField(auto_now_add=True)
total_price = models.DecimalField(
max_digits=10,
decimal_places=2,
)
currency = models.CharField(
max_length=3,
)
The most straightforward solution, that probably came to your mind, is to add a new UUID
field that will be the new primary key (PK). Let’s name it token
. Populate it with unique values and make this field the new PK. Let's illustrate this process using the Order
model as an example.
Note: All examples in this article are for illustration purposes; production systems will need solutions that scale with locking and batch updates.
-
Migration for adding new
token
field:
from django.db import migrations, models import uuid class Migration(migrations.Migration): dependencies = [ ('order', '0135_alter_order_options'), ] operations = [ migrations.AddField( model_name='order', name='token', field=models.UUIDField(null=True, unique=True, blank=True), ) ]
-
Custom migration that populates the
token
field of existing instances with uniqueUUID
values.
from django.db import migrations from django.contrib.postgres.functions import RandomUUID def set_order_token_values(apps, _schema_editor): Order = apps.get_model("order", "Order") Order.objects.filter(token__isnull=True).update( token=RandomUUID() ) class Migration(migrations.Migration): dependencies = [ ('order', '0136_order_token'), ] operations = [ migrations.RunPython( set_order_token_values, migrations.RunPython.noop, ), ]
-
Changing token into the primary key
from django.db import migrations, models import uuid class Migration(migrations.Migration): dependencies = [ ("order", "0137_fulfil_order_token"), ] operations = [ migrations.AlterField( model_name="order", name="token", field=models.UUIDField( default=uuid.uuid4, editable=False, primary_key=True, serialize=False, unique=True, ), ), ]
And... this will work, but EXCLUSIVELY when your model contains ONLY one to many
relations.
Now, consider that the Order
model has a many-to-one relationship with the Invoice
model, where each Invoice
represents a generated invoice.
class Invoice(models.Model):
order = models.ForeignKey(
Order,
related_name="invoices",
null=True,
on_delete=models.SET_NULL,
)
In this case, the migration would fail with the following error:
django.db.utils.InternalError: cannot drop constraint order_ordereditem_pkey on table order_order because other objects depend on it
.
Why is that?
The reason is that your related models will still have primary key columns pointing to the obsolete pk
values, which will no longer exist. If things seem a bit foggy, no worries! Check out the next section that explains in detail how relationships are established in the database.
How the relations are defined in your database?
Before we go on let me explain how the relations that you define in ORM are implemented in the database. If it’s clear to you, you can skip this section and go directly to So what is the problem?
The relational database consists of tables with columns and constraints that specify the rules for the data in tables. All limitations are ensured by the constraints, like uniqueness, value requirement, primary keys, and relations.
- Many-to-one
In the many-to-one relation, we have two models where the entity of model A
can have multiple associations with entities of model B
, but the entity of model B
can have only one association with the entity of model A
.
In the Django framework, it’s defined with the use of ForeignKey
in model B
, which points the model A
. Under the hood, the new column is added to the model B
, and the foreign key constraint is created. The column name is the field name with _id
ending and will keep the id
s of associated entities. The foreign key constraint is just the rule of the database table (model A
) that says the values in a given column must match the values (in our case the id
) of a column in some other table (model A
). The referenced column (in our case is
on mode a
) must contain only unique values. So the foreign key constraint depends on the unique or primary key constraint from the relational table.
- One-to-one
In one-to-one relationship, the entity of model A
can have only one association with entities of model B
, and the same rule applies to model A
in relation to model B
. In the Django framework, it’s defined with the use of the OneToOne
field on any of those models. Under the hood, the same things happened as in a many-to-one relationship, but also the unique constraint for the newly created column is added, to ensure that only one entity of model A
will exist with the given field value of the pointed column in table B
.
- Many-to-many
In many-to-many relation entities of model A
can have multiple associations with entities of model B
, and model B
can have multiple associations with entities of model A
. In Django framework, it’s defined with the use of the ManyToMany
field on any model. Under the hood, this case is a little bit more complex than the previous ones. As we can have multiple connections from both sides of the relationship, we cannot just create a new column on one of the tables. Instead, a new table is created that keeps the ids from both sides of relations. The new table contains columns, one for each site of relations, that keep the corresponding entity field value. To ensure references, for each field the foreign key constraint is created. Additionally, a unique constraint is added to ensure that there is only one row responsible for the relationship between every two instances.
So what is the problem?
As we learned in the previous section, relationships based on foreign key constraints utilize a primary key or unique constraint. Modifying the model's pk
to UUID
involves recreating the primary key, which deletes the old constraint and creates a new one. If you try to change the primary key, you might encounter errors such as:
django.db.utils.InternalError: cannot drop constraint order_ordereditem_pkey on table order_order because other objects depend on it
.
This error implies that the primary key constraint of the id
field, which defines foreign key
relations, cannot be dropped.
That's why we had to update the relations before modifying the pk
.
The solution in a nutshell
To solve the issue, we will introduce a new field called old_id
to keep a copy of the old Order
id
. Then, we will redirect our existing relationships to this new field, and create another field on related models to store the new UUID
values (that will become the pk
values) of related instances. After changing the ID
to UUID
in our target model, on the related models, we'll transform the fields that store related instances' UUID
values into relational fields, and remove the old relational fields pointing to old_id
s. This way, we’ll maintain the same relationships, but with pointing to new UUID
values. Sounds complicated? Let’s explore the whole process in the example to make it clear.
💡 Solution Shortcut
- On the target model (e.g.
Order
) create the new fields, one the UUID field (e.g.token
field) that will become a PK, and one for keeping the old ID (e.g.old_id
field) - Populate those fields
- Redirect each existing relation to the field with the old id (
old_id
field) - On each relation model create a new field that will keep the new UUID values of the related instances (e.g.
order_token
) - Change the
UUID
field into the primary key. - On each relation model, change the field that keeps new UUID values (
order_token
) into the relation field (e.g.ForeginKey
) - On each relation model, remove the old relation field
- On each relation model, rename the new relation field to the initial name
Preparation
Before delving into managing relationships, let's add an old_id
field to store a copy of the previous integer primary key. We also need a token
field, which will later serve as our new primary key. Let’s update the changes introduced in our happy scenario.
-
The first step is adding a new nullable
token
andold_id
fields.Here are the changes in the
models.py
file:
class Order(models.Model): token = models.UUIDField(null=True, unique=True) old_id = models.PositiveIntegerField(unique=True, null=True, blank=True)
And corresponding data migration:
from django.db import migrations, models import uuid class Migration(migrations.Migration): dependencies = [ ('order', '0135_alter_order_options'), ] operations = [ migrations.AddField( model_name='order', name='token', field=models.UUIDField(null=True, unique=True, blank=True), ), migrations.AddField( model_name='order', name='old_id', field=models.PositiveIntegerField(blank=True, null=True), ), ]
-
Next, besides populating the
token
field of existing instances with uniqueUUID
values, we'll also copy the currentid
to theold_id
field using a custom migration.
from django.db import migrations from django.contrib.postgres.functions import RandomUUID def set_order_token_values(apps, _schema_editor): Order = apps.get_model("order", "Order") Order.objects.filter(token__isnull=True).update( token=RandomUUID() ) def set_order_old_id(apps, schema_editor): Order = apps.get_model("order", "Order") Order.objects.all().update(old_id=F("id")) class Migration(migrations.Migration): dependencies = [ ('order', '0136_order_token_and_old_id'), ] operations = [ migrations.RunPython( set_order_token_values, reverse_code=migrations.RunPython.noop, ), migrations.RunPython( set_order_old_id, reverse_code=migrations.RunPython.noop ), ]
Handling database relations
Now, we can move to the question: what is the easiest and safest way to update the model relations?
We need to handle all relations: many to one
, many to many
and one to one
. In all cases, we can split the changes into two parts: before model id
migration to UUID
, and after.
Now, let's examine each step for each relation type in detail.
Few tips at the beginning:
- Make sure that at each stage of the migration, you hold the identifier of the related objects that will allow to recreate the relationships.
- Always create a full backup and test the code before running the migrations. If something goes wrong, it might be challenging or even impossible to undo.
- Be patient. If the model you're migrating has numerous relations, preparing everything might take some time, but don’t give up it’s doable!
💡 Our goal in this data migration is to preserve the references between objects.
Before UUID
migration
This phase comprises two steps:
- Redirect each existing relation to the
old_id
field. - For each relation model, create and populate a new field (
order_token
) that will store the newUUID
values of the relatedorder
instances.
Let’s see how to apply those steps to both many-to-one
, one-to-one
, and many-to-many
relations.
- Many-to-one and one-to-one relationship
The many-to-one
and one-to-one
relations are similar. The only difference is that the one-to-one
relation has an additional unique constraint. Therefore, the method for handling these relations is the same.
In those cases, we need to prepare migrations on the model where the relation is defined. Let’s see the process on an example of Invoice
that has Many-to-one
relation with Order
model:
-
The first step is to define the
UUID
field that will hold the newUUID
values of related objects. In the example below, we have theInvoice
model that is related toOrder
. We're adding a new field,order_token
, which will be filled with the newtoken
value of the correspondingOrder
instances.
class Invoice(models.Model): order = models.ForeignKey( Order, related_name="invoices", null=True, blank=True, on_delete=models.SET_NULL, ) order_token = models.UUIDField(null=True)
The migration as follows:
from django.db import migrations, models class Migration(migrations.Migration): dependencies = [ ("invoice", "0006_invoiceevent_app"), ] operations = [ migrations.AddField( model_name="invoice", name="order_token", field=models.UUIDField(null=True), ), ]
-
The next step is to populate this new field with corresponding
token
values. We could write a Python function for this, but it could be challenging to write an efficient one for a large dataset. The quickest solution is to write a simple SQL operation:
from django.db import migrations class Migration(migrations.Migration): dependencies = [ ("invoice", "0007_invoice_order_token"), ("order", "0137_fulfil_order_token_and_old_id"), ] operations = [ migrations.RunSQL( """ UPDATE invoice_invoice SET order_token = ( SELECT token FROM order_order WHERE invoice_invoice.order_id = order_order.id ) WHERE order_id IS NOT NULL; """, reverse_sql=migrations.RunSQL.noop, ), ]
So, what's going on here? We're updating the
order_token
column in theinvoice_invoice
table, which corresponds to theInvoice
model. For all instances where theOrder
relation exists, we populate the neworder_token
field with thetoken
value from the correspondingOrder
instance. Thistoken
value is sourced from theorder_order
table, where theid
matches theorder_id
frominvoice_invoice
table.Also, note that the dependencies list is extended with the order migration. We need to ensure that this migration is applied after the migration that populates the new
token
field in the order model.💡 To write an SQL operation, it's essential to know the table and column names. In Django, the table name is formed by joining the app name and model name with an underscore. If you are unsure about your table name, you can always check it in your database shell.
-
The final step is to change the current relation field to point to the
old_id
field. This action will drop the existing foreign key constraint that relies on the primary key constraint, and create a new one for theold_id
field. You can see this change in themodels.py
file:
class Invoice(models.Model): order = models.ForeignKey( Order, related_name="invoices", null=True, blank=True, on_delete=models.SET_NULL, to_field="old_id", ) order_token = models.UUIDField(null=True)
And corresponding migration:
from django.db import migrations, models import django.db.models.deletion class Migration(migrations.Migration): dependencies = [ ("invoice", "0008_fulfill_invoice_order_token"), ] operations = [ migrations.AlterField( model_name="invoice", name="order", field=models.ForeignKey( null=True, on_delete=django.db.models.deletion.SET_NULL, related_name="invoices", to="order.order", to_field="old_id", ), ), ]
Be sure to prepare these migrations for all models that depend on a changing model.
- Many-to-many relations
For many_to_many
relations, we need to perform similar steps but operate on the cross-reference table created for this relation. This involves writing an SQL operation to execute the changes.
We’ll analyze the steps in the example of the relation between GiftCard
and Order
.
class Order(models.Model):
gift_cards = models.ManyToManyField(
GiftCard,
blank=True,
related_name="orders"
)
-
Similar to the previous steps, firstly we will add a new column to store the new
token
values.
ALTER TABLE order_order_gift_cards ADD COLUMN order_token uuid;
-
Next, we aim to fill the newly added column with corresponding values. In this case, we don't need to check if the column is empty since all instances of cross-reference table must have a value.
UPDATE order_order_gift_cards SET order_token = ( SELECT token FROM order_order WHERE order_order_gift_cards.order_id = order_order.id );
-
Now, we can set the condition that the new column
order_token
must not be null:
ALTER TABLE order_order_gift_cards ALTER COLUMN order_token SET NOT NULL;
-
The next step is to remove the old foreign constraint for the column that points to the current
id
field. You need to find the name of this constraint. As before, you can check it in thepsql
command line or the administration platform - pgAdmin.
ALTER TABLE order_order_gift_cards DROP CONSTRAINT order_order_gift_cards_order_id_ce5608c4_fk_order_order_id;
-
The final step involves creating a new foreign key constraint for our column that stores token values.
ALTER TABLE order_order_gift_cards ADD CONSTRAINT order_order_gift_cards_order_id_fk_order_order_old_id FOREIGN KEY (order_id) REFERENCES order_order (old_id);
Now, let's consolidate everything into the migration:
from django.db import migrations
class Migration(migrations.Migration):
dependencies = [
("order", "0137_fulfil_order_token_and_old_id"),
]
operations = [
migrations.RunSQL(
"""
ALTER TABLE order_order_gift_cards
ADD COLUMN order_token uuid;
UPDATE order_order_gift_cards
SET order_token = (
SELECT token
FROM order_order
WHERE order_order_gift_cards.order_id = order_order.id
);
ALTER TABLE order_order_gift_cards
ALTER COLUMN order_token SET NOT NULL;
ALTER TABLE order_order_gift_cards
DROP CONSTRAINT
order_order_gift_cards_order_id_ce5608c4_fk_order_order_id;
ALTER TABLE order_order_gift_cards
ADD CONSTRAINT order_order_gift_cards_order_id_fk_order_order_old_id
FOREIGN KEY (order_id) REFERENCES order_order (old_id);
""",
reverse_sql=migrations.RunSQL.noop,
),
]
As before, remember to create corresponding migrations for all many-to-many
relationships linked to the model you're modifying.
Afterward, we are prepared to change the primary key to the UUID
value.
Set UUID
column as the primary key
Modifying the model pk
also involves two steps, but both can be executed within a single migration.
-
First, we aim to convert our
token
value into a primary key. Django will automatically eliminate the oldid
field along with the primary key constraint. The modifications in the models.py file will appear as follows:
class Order(models.Model): token = models.UUIDField( primary_key=True, editable=False, unique=True, default=uuid4 ) old_id = models.PositiveIntegerField( unique=True, null=True, blank=True )
-
Next, we will rename the
token
toid
to achieve the target state:
class Order(models.Model): id = models.UUIDField( primary_key=True, editable=False, unique=True, default=uuid4 ) old_id = models.PositiveIntegerField( unique=True, null=True, blank=True )
Here is the migration that will combine both changes:
from django.db import migrations, models import uuid class Migration(migrations.Migration): dependencies = [ ("order", "0138_alter_order_gift_cards"), ("invoice", "0008_fulfill_invoice_order_token"), ] operations = [ migrations.RemoveField( model_name="order", name="id", ), migrations.AlterField( model_name="order", name="token", field=models.UUIDField( default=uuid.uuid4, editable=False, primary_key=True, serialize=False, unique=True, ), ), migrations.RenameField( model_name="order", old_name="token", new_name="id", ), ]
What’s important, the migration's dependencies list must include all migrations that set token values on new temporary fields and redirect relation to old_id
. In this context, the required migration to include in the dependencies list is 0008_fulfill_invoice_order_token
from the invoice app.
After UUID
migration
Now we are almost done. The last step is to rewrite the relations to point at the new primary key field instead of old_id
.
The steps for this phase applied to the relation models are:
- Change the field that keeps new UUID values (
order_token
) into the relation field -ForeginKey
- Remove the old relation field
- Rename the new relation field to the initial name
Let’s analyze those in our example.
- Many-to-one and one-to-one relationship
The steps for rewriting the many-to-one
and one-to-one
relationships can be combined into one migration. Each step will be explained separately, and then we'll consolidate them into a single migration.
-
Firstly, our goal is to convert all temporary fields storing instances'
token
values into relational fields. Specifically, we will convert theInvoice.order_token
UUID
field into aForeignKey
, as follows:
class Invoice(models.Model): order = models.ForeignKey( Order, related_name="invoices", null=True, blank=True, on_delete=models.SET_NULL, to_field="old_id", ) order_token = models.ForeignKey( Order, null=False, blank=False, on_delete=models.CASCADE, )
Migration operation:
migrations.AlterField( model_name="invoice", name="order_token", field=models.ForeignKey( on_delete=django.db.models.deletion.CASCADE, to="order.order" ), )
-
The next step is to delete the previous relation field which was pointing to
old_id
. It's no longer needed since we now have a field pointing to the newOrder
primary key. As a result, theInvoice
model will only retain theorder_token
field.
class Invoice(models.Model): order_token = models.ForeignKey( Order, null=False, blank=False, on_delete=models.CASCADE, )
Migration operation:
migrations.RemoveField( model_name="invoice", name="order", )
-
Next, we will rename the
order_token
field to match the name of the field we previously removed. So we are changingorder_token
toorder
field.
class Invoice(models.Model): order = models.ForeignKey( Order, null=False, blank=False, on_delete=models.CASCADE, )
Migration operation:
migrations.RenameField( model_name="invoice", old_name="order_token", new_name="order", )
-
The final step involves modifying our new relational field to match the original one. In our case, the only change is to set the proper
related_name
value.
class Invoice(models.Model): order = models.ForeignKey( Order, null=False, blank=False, on_delete=models.CASCADE, related_name="invoices", )
Migration operation:
migrations.AlterField( model_name="invoice", name="order", field=models.ForeignKey( on_delete=django.db.models.deletion.CASCADE, related_name="invoices", to="order.order", ), )
Bringing it all together, the migration will look as follows:
from django.db import migrations, models
import django.db.models.deletion
class Migration(migrations.Migration):
dependencies = [
("order", "0139_update_order_pk"),
("invoice", "0009_alter_invoice_order"),
]
operations = [
migrations.AlterField(
model_name="invoice",
name="order_token",
field=models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="order.order"
),
),
migrations.RemoveField(
model_name="invoice",
name="order",
),
migrations.RenameField(
model_name="invoice",
old_name="order_token",
new_name="order",
),
migrations.AlterField(
model_name="invoice",
name="order",
field=models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE,
related_name="invoices",
to="order.order",
),
),
]
Naturally, our migration must depend on the migration from the order module that changes id
to UUID
. If not, we will encounter an error during the migration process.
💡 Remember to apply the changes to all corresponding models.
- Many-to-many relations
Changes to many-to-many
relationships, as before, are more complex. We need to perform the same changes as for many-to-one
relationships, but in the proxy table using SQL operations.
-
First, we will add the
ForeignKey
constraint to our field that stores thetoken
value of correspondingorder
instances. Since theid
has already been migrated toUUID
, we will point to theid
field of theOrder
model:
ALTER TABLE order_order_gift_cards ADD CONSTRAINT order_order_gift_cards_order_token_fk_order_order_id FOREIGN KEY (order_token) REFERENCES order_order (id);
-
The next step is to remove the constraint that we previously created, which points to the
old_id
field.
ALTER TABLE order_order_gift_cards DROP CONSTRAINT order_order_gift_cards_order_id_fk_order_order_old_id;
-
Now, we can delete the column that holds the old ID values. We wouldn't be able to do this without the previous step.
ALTER TABLE order_order_gift_cards DROP COLUMN order_id;
-
Afterward, rename the column that points to the
UUID
values to the initial name.
ALTER TABLE order_order_gift_cards RENAME COLUMN order_token TO order_id;
-
The final step involves renaming the constraint that was created in
step 1
to match the new column name:
ALTER TABLE order_order_gift_cards RENAME CONSTRAINT order_order_gift_cards_order_token_fk_order_order_id TO order_order_gift_cards_order_id_ce5608c4_fk_order_order_id;
Bringing it all together, the migration will look as follows:
class Migration(migrations.Migration):
dependencies = [
("order", "0139_update_order_pk"),
]
operations = [
# rewrite order - gift cards relation - all operations are performed on
# order_order_gift_cards table which is responsible for order-gift cards
# many to many relation
# - add fk constraint to order id
# - delete constraint to order old_id
# - delete order_id column
# - rename order_token to order_id
# - rename newly added constraint
migrations.RunSQL(
"""
ALTER TABLE order_order_gift_cards
ADD CONSTRAINT order_order_gift_cards_order_token_fk_order_order_id
FOREIGN KEY (order_token) REFERENCES order_order (id);
ALTER TABLE order_order_gift_cards
DROP CONSTRAINT order_order_gift_cards_order_id_fk_order_order_old_id;
ALTER TABLE order_order_gift_cards
DROP COLUMN order_id;
ALTER TABLE order_order_gift_cards
RENAME COLUMN order_token TO order_id;
ALTER TABLE order_order_gift_cards
RENAME CONSTRAINT order_order_gift_cards_order_token_fk_order_order_id
TO order_order_gift_cards_order_id_ce5608c4_fk_order_order_id;
""",
reverse_sql=migrations.RunSQL.noop,
)
]
Again, remember to include the migrations responsible for changing the primary key in your dependencies list.
And that's it. We have returned to the primary stage with all the relations intact, but with the model id
changed to UUID
. ✨
At the end
At the end, you can discard old_id
if it's not needed. Also, reconsider the default ordering. If your default ordering uses pk
, it will no longer work as before, since the instances won't be sorted in the creation order due to the random value of UUID
. If you wish to maintain the order of instance creation, consider adding a creation date to your model, if one doesn't exist, and sort by this value.
The field responsible for the creation date can be added in the same step where token
and old_id
are added. To preserve the old order, fill the instances with the current date and time, replacing the seconds with the old id value.
Conclusion
As you can see, there are lots of steps to do, but I hope that this tutorial helps you go smoothly through this process with an understanding of each step.
You can also check our https://github.com/saleor/saleor repository for real-life examples. We performed such operations on Order
, OrderLine
, and CheckoutLine
models.
Top comments (0)