This post lists the full steps for how to snapshot/restore a Google Cloud Platform (GCP) compute instance, illustrating a few things that might relieve some frustration about performing this "simple operation" in the Googleplex.
Recently I was making a lot of changes to some GitLab virtual machines as I learnt about setting up GitLab Geo, and often I would make mistakes and need to start over. This is very doable with virtual machines managed through infrastructure-as-code with Terraform and Ansible, as we do with the GitLab Environment Toolkit, but it does take about 25 minutes to spin up each new virtual machine and install GitLab.
Twenty-five minutes is amazing compared to doing it all by hand, but I felt it would be faster to use snapshots of the machine instead:
- Build new machines for Geo primary and secondary
- Snapshot them before making the tricky changes
- If one breaks, restore the snapshot
The virtual machines are GCP compute instances, so I needed to learn how to do this with gcloud compute, the command-line interface to GCP.
Snapshot a GCP compute instance
In GCP, one manages the disks, snapshots, and instances separately. To "snapshot and restore an instance", one really snapshots the disk(s) and then swaps in new disks created from the snapshots.
There are 6 steps:
1. Create a snapshot from the instance's disk
This can be done while the instance is running, and as many times as you like.
gcloud compute disks snapshot \
DISK_NAME \
--snapshot-names SNAPSHOT_NAME
2. To restore a snapshot, first create a new disk from the snapshot
Depending on the size of the disk, this might take a while, so it's good to do this only when you discover that you need to restore a snapshot.
gcloud compute disks create \
NEW_DISK_NAME \
--source-snapshot SNAPSHOT_NAME
3. Stop the instance
While the snapshot and new disk can be made with the instance still running, you must stop the instance to swap disks.
gcloud compute instances stop \
INSTANCE_NAME
4. Detach the current disk from the instance
Instances can only have one boot disk, so the current one has to be detached before the new one can be attached.
gcloud compute instances detach-disk \
INSTANCE_NAME \
--disk DISK_NAME
5. Attach the new disk to the instance
gcloud compute instances attach-disk \
INSTANCE_NAME \
--disk NEW_DISK_NAME \
--boot
6. Start the instance
gcloud compute instances start \
INSTANCE_NAME
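Steps 2 to 6 can be collected into one small shell function. This is only a sketch under a few assumptions: a default project and zone are already configured for gcloud, and the names run, DRY_RUN, and restore_from_snapshot are made up for this example rather than being part of gcloud. With DRY_RUN=1 the commands are printed instead of executed, which is handy for checking the order before touching a real instance.

```shell
#!/usr/bin/env bash
# Sketch: restore an instance's boot disk from a snapshot (steps 2-6).
# run, DRY_RUN and restore_from_snapshot are illustrative names, not gcloud features.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: restore_from_snapshot INSTANCE_NAME DISK_NAME NEW_DISK_NAME SNAPSHOT_NAME
restore_from_snapshot() {
  local instance="$1" old_disk="$2" new_disk="$3" snapshot="$4"

  # Step 2: create the replacement disk while the instance is still running.
  run gcloud compute disks create "$new_disk" --source-snapshot "$snapshot" || return 1
  # Step 3: the instance must be stopped before its boot disk can be swapped.
  run gcloud compute instances stop "$instance" || return 1
  # Step 4: detach the current boot disk.
  run gcloud compute instances detach-disk "$instance" --disk "$old_disk" || return 1
  # Step 5: attach the new disk as the boot disk.
  run gcloud compute instances attach-disk "$instance" --disk "$new_disk" --boot || return 1
  # Step 6: start the instance again.
  run gcloud compute instances start "$instance"
}
```

For example, DRY_RUN=1 restore_from_snapshot INSTANCE_NAME DISK_NAME NEW_DISK_NAME SNAPSHOT_NAME prints the five commands in order. Each step stops the function on failure, so a failed detach never leads to an attach attempt.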
Consistency with multiple disks
If the instance has multiple disks, it may be necessary to stop the instance and snapshot all of the disks while it is down, so that the data stay consistent across them. Whether that is needed depends on the application and on how the data are distributed over the disks. Also be mindful of which disk is the --boot disk.
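One way to get a consistent set of snapshots is to stop the instance, snapshot every disk, and start it again. The sketch below assumes a default project and zone are configured; the function names, the DRY_RUN switch, and the DISK-LABEL snapshot naming scheme are inventions for this example, not gcloud conventions.

```shell
#!/usr/bin/env bash
# Sketch: snapshot several disks of one instance while it is stopped.
# run, DRY_RUN and snapshot_disks_offline are illustrative names, not gcloud features.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: snapshot_disks_offline INSTANCE_NAME LABEL DISK_NAME [DISK_NAME ...]
snapshot_disks_offline() {
  local instance="$1" label="$2" disk
  shift 2

  # Stop first so that no disk changes while the others are being captured.
  run gcloud compute instances stop "$instance" || return 1
  for disk in "$@"; do
    # Each snapshot is named DISK-LABEL (a naming choice made for this sketch).
    run gcloud compute disks snapshot "$disk" --snapshot-names "${disk}-${label}" || return 1
  done
  run gcloud compute instances start "$instance"
}
```

If a snapshot fails, the function returns early and leaves the instance stopped, so it has to be started by hand after investigating.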
Useful snapshot and disk commands
Some more commands are helpful in working with disks and snapshots:
gcloud compute disks list
gcloud compute disks delete DISK_NAME
gcloud compute snapshots list
gcloud compute snapshots delete SNAPSHOT_NAME
Why do this? It's too complicated!
Once you have a snapshot, you can create a new instance from it. But creating new disks from snapshots and swapping them in has the advantage that the compute instance itself is kept, with the same IP address, labels, and other attributes.
If it is a spot instance, or if you must not stop the old instance, then creating a new instance may be better.
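For that alternative, one possible approach is to create a disk from the snapshot and then boot a brand new instance from it, leaving the old instance untouched. The sketch assumes a default project and zone are configured and that the default machine type and network are acceptable; the function names and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a new instance from a snapshot without stopping the old one.
# run, DRY_RUN and instance_from_snapshot are illustrative names, not gcloud features.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: instance_from_snapshot NEW_INSTANCE_NAME NEW_DISK_NAME SNAPSHOT_NAME
instance_from_snapshot() {
  local new_instance="$1" new_disk="$2" snapshot="$3"

  # Create a disk from the snapshot, exactly as in step 2.
  run gcloud compute disks create "$new_disk" --source-snapshot "$snapshot" || return 1
  # Create a new instance that uses that existing disk as its boot disk.
  run gcloud compute instances create "$new_instance" --disk "name=${new_disk},boot=yes"
}
```

The new instance gets its own IP address and none of the old instance's labels or other attributes, which is the trade-off described above.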
The gcloud commands allow this separation of concerns, so you can take snapshots and create new disks and instances without stopping the current instances. In a production environment that's the more common scenario: you can't always stop an instance. When you can, restoring in place is still possible, and conceptually it is only 3 steps: snapshot the disk, create a new disk from the snapshot, and swap the disks.