Consequences of restoring backups after the persistent data model has been changed

Goal

The ability to restore environment backups is a powerful tool, but only when backups are taken regularly enough to keep up with the rapid changes that occur while you develop your application. Failing to do so has consequences: restoring an older backup that predates major changes can result in data loss.

The goal of this tutorial is to explore some of these consequences, and to illustrate how taking the time to set up scheduled backups can prevent plenty of headaches down the line.

Preparation

You will need:

Problems

Platform.sh gives you the ability to quickly create backups of the state of your environments, and then easily restore those backups to those environments should you need to. This is not yet an automatic process, and both creation and restoration must be executed manually by the user.

Of course, people forget to take frequent backups while they are busy developing. For this reason, we often recommend that users install the Platform.sh CLI in their application containers in order to automate the process.
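As a rough illustration of what such automation could look like, the sketch below wraps the CLI call in a small Python helper that a scheduled job might invoke. It is not the method from the How-to: the function names are hypothetical, and it assumes the platform CLI is installed and authenticated in the container and that the --yes and --no-wait flags behave as they do in current CLI releases.

```python
import subprocess


def build_backup_command(environment):
    """Return the CLI invocation that backs up one environment."""
    # --yes skips the confirmation prompt; --no-wait returns without
    # waiting for the backup activity to finish.
    return ["platform", "backup:create", "--environment", environment,
            "--yes", "--no-wait"]


def create_backup(environment):
    """Run the backup command; raises CalledProcessError if the CLI fails."""
    subprocess.run(build_backup_command(environment), check=True)
```

Calling create_backup("add-mongo") from a cron job would then take a backup without any manual step. Keeping the command construction separate from the call makes the helper easy to check without touching a real project.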

If you have not yet set up automatic backups as described in the above How-to, you may very well find yourself in the following situation.

  • I’ve created a backup close to the start of an environment’s history.
  • I’ve made a lot of changes to that environment (added services, added data to those services, created new mounts, etc.).
  • Something went wrong.
  • Well, look, I have a backup right here. But it’s old. What will happen?

This tutorial is meant to explore this exact scenario, with the intended takeaways:

  • the importance of setting up automatic backups on your projects using the How-to above as a guide (so you don’t find yourself in this situation in the first place).
  • some greater understanding of how Platform.sh handles your data during backups, restores, and syncs.

Steps

Creating backups

Note

This tutorial uses a modified version of our Platform.sh Language Examples project. That project is intended to show how to interact with all of our managed services with each of our runtimes. It’s a great resource, and it even powers most of the examples in our public documentation, but it’s gigantic, and much more than we need for this tutorial.

We removed all of the runtimes except for Python, leaving us with only the main and python application container directories. We also removed all of the services save postgresql, but we will be adding a few of them back in the steps below. It is not necessary that you use the same repository if going through this tutorial step by step, just try to match the steps in your own project.

1. Create a new branch

Assuming you have already created a project and initialized it with some code, create a new environment by branching from master:

platform branch add-mongo master

2. Create a backup

Then create a backup of that environment in its current state.

platform backup:create
Creating a backup of add-mongo
Waiting for the activity is5ytp2lbm7p4 (Chad Carlson created a backup of add-mongo):

Creating snapshot of add-mongo
Created backup bivzemjpeqvohqey2t7fo7vs5m
[============================] 15 secs (complete)
Activity is5ytp2lbm7p4 succeeded
Backup name: bivzemjpeqvohqey2t7fo7vs5m

So what happened here? Each container, whether a service or an application, has its own persistent storage. During a backup, a copy of each one is made. It’s the collection of these copies that makes up backup bivzemjpeqvohqey2t7fo7vs5m.

Adding a new service

1. Configure a new service

First, let’s add a new service to the project, one that did not exist when the backup was created. Modify your services.yaml file to include MongoDB:

dbmongo:
    type: mongodb:3.6
    disk: 1024
    size: S

and your .platform.app.yaml to include the new relationship:

relationships:
    mongodb: 'dbmongo:mongodb'

Then commit and push the changes to Platform.sh.

git add .
git commit -m "Adds mongodb."
git push platform add-mongo

Platform.sh will provision MongoDB to your virtual cluster, and expose the following credentials in your PLATFORM_RELATIONSHIPS environment variable:

{
  "username": "main",
  "scheme": "mongodb",
  "service": "mongodb",
  "ip": "169.254.117.167",
  "hostname": "ldh423mk2e7o6qto2syljqbg5u.mongodb.service._.eu-3.platformsh.site",
  "cluster": "rjify4yjcwxaa-master-7rqtwti",
  "host": "mongodb.internal",
  "rel": "mongodb",
  "path": "main",
  "query": {
    "is_master": true
  },
  "password": "main",
  "type": "mongodb:3.6",
  "port": 27017
}
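The variable itself is a base64-encoded JSON object that maps each relationship name to a list of endpoint definitions like the one above. Here is a minimal sketch of reading it from Python using only the standard library. The helper name decode_relationships is an assumption, and the sample value is built in place (trimmed to a few fields) so that the snippet also runs outside Platform.sh.

```python
import base64
import json
import os


def decode_relationships(environ=os.environ):
    """Decode PLATFORM_RELATIONSHIPS into a dict of relationship -> endpoints."""
    raw = environ.get("PLATFORM_RELATIONSHIPS")
    if raw is None:
        return {}  # variable not set, e.g. when running locally
    return json.loads(base64.b64decode(raw))


# Stand-in for the real variable, using a few fields from the output above.
sample = {"mongodb": [{"host": "mongodb.internal", "port": 27017,
                       "username": "main", "password": "main", "path": "main"}]}
fake_environ = {"PLATFORM_RELATIONSHIPS":
                base64.b64encode(json.dumps(sample).encode()).decode()}

endpoint = decode_relationships(fake_environ)["mongodb"][0]
print(endpoint["host"], endpoint["port"])
```

Inside the application container you would call decode_relationships() with no argument so that it reads the real environment.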

2. Develop and verify

At this point, MongoDB is available for you to develop with. Over the next few days, you may make the following changes as you are developing on the environment:

  • you create a new collection in main called starwars.
  • your application adds a number of documents to that collection. (In the case of our language examples project, the Python app adds a document with the contents {"name": "Rey", "occupation": "Jedi"} as a test each time the site is visited)

You can verify that those documents have been added by opening an SSH tunnel to the service from your local machine (platform tunnel:single -r mongodb) and then connecting to MongoDB through that tunnel with the credentials above:

$ mongo --port 30000 -u main -p main --authenticationDatabase main
MongoDB shell version v4.0.3
> use main
switched to db main
> show collections
starwars
> db.starwars.find()
{ "_id" : ObjectId("5e4457d05908440effc53a20"), "name" : "Rey", "occupation" : "Jedi" }
{ "_id" : ObjectId("5e4457d35908440effc53a22"), "name" : "Rey", "occupation" : "Jedi" }
{ "_id" : ObjectId("5e4457d75908440effc53a24"), "name" : "Rey", "occupation" : "Jedi" }
{ "_id" : ObjectId("5e4457d95908440effc53a26"), "name" : "Rey", "occupation" : "Jedi" }
{ "_id" : ObjectId("5e44585c5908440effc53a28"), "name" : "Rey", "occupation" : "Jedi" }
>

There’s all the newly added data.

3. Restore the backup

Now restore the backup we created at the beginning (before MongoDB was a part of the cluster) using the command

platform backup:restore bivzemjpeqvohqey2t7fo7vs5m

The restore will do two things:

  • All persistent data currently in that environment is wiped.
  • Backup bivzemjpeqvohqey2t7fo7vs5m, which contains a backup of each service present when it was taken, is then applied to the containers present in the environment one by one.

Because of that first point, and because no backup for MongoDB exists in backup bivzemjpeqvohqey2t7fo7vs5m, all data and code pertaining to MongoDB is erased before anything else happens. If you attempt to open a tunnel and locally connect to MongoDB straight away, the service won’t even be recognized as existing in the project.
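The two steps can be modeled in a few lines of Python. This is only a toy model of the behavior described above, not Platform.sh's implementation: the dictionaries, their contents, and the restore function are illustrative assumptions.

```python
def restore(environment, backup):
    """Toy model of a restore: wipe everything, then apply the backup.

    Both arguments map a container name to its persistent data.
    """
    # Step 1: all persistent data currently in the environment is wiped.
    restored = {container: {} for container in environment}
    # Step 2: each copy in the backup is applied to its container,
    # provided that container is present in the environment.
    for container, data in backup.items():
        if container in restored:
            restored[container] = dict(data)
    return restored


# The backup predates MongoDB, so it holds no copy for dbmongo.
backup = {"python": {"files": "v1"}, "postgresql": {"rows": 10}}
environment = {
    "python": {"files": "v2"},
    "postgresql": {"rows": 25},
    "dbmongo": {"starwars": 5},
}

after = restore(environment, backup)
print(after["dbmongo"])
```

In this model dbmongo ends up empty while the other containers return to their backed-up state. The model leaves out the redeploy that is needed before the service is reachable again.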

Run platform redeploy, re-open the tunnel to MongoDB, and then repeat the prior steps to connect to MongoDB:

$ mongo --port 30000 -u main -p main --authenticationDatabase main
MongoDB shell version v4.0.3
> use main
switched to db main
> show collections
>

As you can see, the added collection no longer exists on the service. This is our first example of why it’s important to create backups regularly. If a backup had been taken some time after MongoDB was added, we could have recovered at least some of its data.

Mounts

1. Configure a new mount

Another case where this would be relevant is the addition of new mounts to the project. If you were to create a mount that was not included in backup bivzemjpeqvohqey2t7fo7vs5m, would the files in that mount be erased?

You can probably guess already based on the previous example, but let’s find out. You can add a mount to an application by adding the following lines to your .platform.app.yaml file:

mounts:
    'add-mount':
        source: local
        source_path: add-mount

Then commit and push to Platform.sh:

git add .
git commit -m "Add a mount."
git push platform add-mongo

2. Add data and verify

Let’s just create a simple file

mkdir mount-data && touch mount-data/test.txt
echo "Here's our mounted data on Platform.sh." >> mount-data/test.txt

and then upload it to the newly defined mount:

platform mount:upload --mount add-mount --source ./mount-data

You can then verify that the file was uploaded to the project by running platform ssh to SSH into the application container. Then run:

web@python.0:~$ echo "$(<add-mount/test.txt)"
Here's our mounted data on Platform.sh.

3. Restore the backup

Looks good - now let’s restore the environment from the backup.

platform backup:restore bivzemjpeqvohqey2t7fo7vs5m

Once again, if we SSH into the container immediately after the backup is restored, the add-mount mount will not be present. After a platform redeploy the mount is back, but its data has been deleted:

web@python.0:~$ ls
Pipfile  Pipfile.lock  README.md  add-mount  examples  setup.py  web
web@python.0:~$ echo "$(<add-mount/test.txt)"
-bash: add-mount/test.txt: No such file or directory

Conclusion

In this tutorial we saw that:

  1. The first step during backup restoration wipes all persistent data from the current environment.
  2. All data added after that backup was created, including service configuration, will be lost. We saw firsthand that this holds both for newly added service containers and for data in newly added mounts.
  3. There’s a super simple way to configure automatic backups on your projects that will help mitigate this whole mess in the first place.

Read the above How-to and start protecting your data!