my ideas in action
easy backup system with rsync – like Time Machine
February 25, 2014
Backup systems are good for recovering accidentally lost data. An even more useful feature is incremental backup, where you have access to snapshots from different points in time, like Apple's Time Machine does. Doing this on Linux (or any Unix or Unix-like system) is actually very easy.
For example, we make a backup every day (or at whatever interval you want). We want the amount of data transferred to stay small. Imagine transferring a few TB every day! If our important data changes only a little, we should back up only the modified files. For this, rsync is the best tool, as everybody knows. But there is a problem: how can we keep daily snapshots of the data without filling the disk? For this we will use soft links, hard links and some rsync options.
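Here is a small demonstration of the difference between the two kinds of links, which is what keeps the snapshots cheap: a hard link is a second name for the same data, while a soft link only stores a path. It is a sketch assuming bash and GNU coreutils (for `stat -c`); the file names are made up and everything happens in a temporary folder.

```shell
#!/bin/bash
# Scratch folder for the demo, removed when the script exits
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

head -c 1048576 /dev/urandom > "$work/original"   # 1 MiB of data
ln "$work/original" "$work/hardlink"              # hard link: second name, same inode
ln -s "$work/original" "$work/softlink"           # soft link: only stores the path

# Both names report the same inode number and a link count of 2
stat -c '%n inode=%i links=%h' "$work/original" "$work/hardlink"

# Removing one name does not free the data while another hard link exists
rm "$work/original"
cat "$work/hardlink" > /dev/null && echo "hard link still readable"

# The soft link pointed at the removed name, so it is now dangling
[ -e "$work/softlink" ] || echo "soft link is now dangling"
```

The 1 MiB is stored on disk only once, no matter how many hard links point to it; this is why a snapshot of unchanged files costs almost no space.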
So we have to create a script file like this:
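```bash
#!/bin/bash
# Timestamp for the snapshot folder name, e.g. 2014-02-25T17-05-02
date=$(date "+%Y-%m-%dT%H-%M-%S")
# Copy the source drive into a new snapshot folder; files unchanged since the
# snapshot that "current" points to are hard-linked instead of copied
rsync -aP --delete --log-file=/tmp/log_backup.log --exclude=lost+found \
    --link-dest=/mnt/sdb2/Backups/current \
    /mnt/sda1/ /mnt/sdb2/Backups/back-$date
# Point the "current" soft link at the new snapshot
rm -f /mnt/sdb2/Backups/current
ln -s /mnt/sdb2/Backups/back-$date /mnt/sdb2/Backups/current
```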
First I create a “date” variable that is used in the name of the backup folder, so it is easy to tell when each backup/snapshot was made.
Then I run rsync with these parameters (see man rsync for more details):
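-a = archive mode (recursive; preserves symbolic links, permissions, timestamps, owner, group and device files; preserving timestamps is also what lets the next run recognize unchanged files)
-P = same as --partial --progress: shows progress info and keeps partially transferred files (optional)
--delete = delete files from the destination that no longer exist in the source. Because every run writes into a new, empty folder, a file removed from the source simply does not appear in the new snapshot, so this option mainly matters if you rerun into an existing folder
--log-file = save the log to a file (optional)
--exclude = exclude some folders/files from the backup. These patterns are matched relative to the source path! Do not use absolute paths here!
--link-dest = the latest backup snapshot; files that are unchanged compared to it are hard-linked from there instead of being copied again
/mnt/sda1/ = source path (here I back up a whole drive; the trailing slash means the contents of the drive are copied, not a folder named sda1)
/mnt/sdb2/Backups/back-$date = destination folder; it will contain all the content from the source.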
Then I use rm to remove the old link to the previous backup (the “current” link) and replace it with a new soft link to the newly created snapshot.
So whenever I open “current”, I am actually in the latest backup.
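With rm followed by ln there is a short moment when “current” does not exist. If something might read the link at exactly that moment, one common alternative (not what the script above does) is to create the new link under a temporary name and rename it over the old one, since a rename replaces the old link in a single step. A sketch assuming GNU coreutils for `mv -T`; `current.tmp` and the back-old/back-new folders are hypothetical.

```shell
#!/bin/bash
# Hypothetical demo folder; in the post the real one is /mnt/sdb2/Backups
backups=$(mktemp -d)
trap 'rm -rf "$backups"' EXIT
mkdir "$backups/back-old" "$backups/back-new"
ln -s "$backups/back-old" "$backups/current"

# Build the new link under a temporary name, then rename it over "current".
# -T makes mv treat "current" as a plain name instead of moving the new
# link into the folder that "current" points to.
ln -s "$backups/back-new" "$backups/current.tmp"
mv -T "$backups/current.tmp" "$backups/current"

readlink "$backups/current"
```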
And because the date is different every time I make a backup, the old snapshots are kept, so I have one snapshot for every day.
To automate this, create a cron job that runs the script above at a convenient time (the script must be executable, e.g. after chmod +x).
Example crontab entry (edit it with crontab -e) to run the script at 4:01 AM every day:
1 4 * * * /path/to/script
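Please note that only the first backup takes a long time, since it copies all the data. On later runs the script copies only new or changed files; unchanged files are hard-linked to the previous snapshot. A changed file is stored again in full in the new snapshot, even if only a few bytes of it changed.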
Now in the destination folder you will see a “back-xxx” folder for every run of the script. You can open and read the files in all these folders as if they were completely independent copies. If you run df and du, you will see something interesting.
For example, if the backup is 600GB and the script runs every day, df will still show roughly 600GB of used disk space (plus whatever changed between runs). But “du -sh ./*” may report that every “back-xxx” folder is 600GB. This is possible because the unchanged files in each snapshot are only hard links to the same data on disk. Do not worry, the disk is not full: trust the df results, not the du results.
```
user@box:/mnt/sdb2/Backups$ du -sh ./*
623.8G  ./back-2014-02-24T17:47:12
623.8G  ./back-2014-02-24T21-46-41
623.8G  ./back-2014-02-25T17-05-02
623.8G  ./back-2014-02-25T18-45-34
0       ./current
user@box:/mnt/sdb2/Backups$ df /mnt/sdb2
Filesystem  Size  Used    Available  Use%  Mounted on
/dev/sdb2   2.7T  623.9G  1.9T       24%   /mnt/sdb2
```
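How du reports these folders depends on how it handles hard links. GNU du, for instance, counts a hard-linked file only once per invocation, so the first folder gets the full size and later folders show only what is new in them; with -l (--count-links) every folder gets the full size. A small sketch assuming GNU coreutils, with two hypothetical folders standing in for snapshots.

```shell
#!/bin/bash
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/back-1" "$work/back-2"
head -c 1048576 /dev/urandom > "$work/back-1/data"   # 1 MiB file
ln "$work/back-1/data" "$work/back-2/data"            # "snapshot" 2 hard-links it

# Default: the shared file is counted once, under back-1 only
du -sk "$work/back-1" "$work/back-2"
# --count-links (-l): the shared file is counted in every folder
du -skl "$work/back-1" "$work/back-2"
```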
So this Time Machine is really just a few lines of script plus a cron job! Easy, and everybody can do it!
Adapt the script to your needs. Run it when you want with cron jobs.
At any time you can delete old backups (for example, backups older than a few weeks). Thanks to the hard links, deleting a snapshot never removes data that a newer snapshot still uses. This can also be automated with cron plus a small script.
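A minimal pruning sketch, assuming bash 4+ (for mapfile) and the back-YYYY-MM-DDTHH-MM-SS naming from the script, which sorts chronologically. The prune_snapshots function and its keep-the-newest-N policy are hypothetical, not part of the original setup. It counts snapshots instead of using an age limit such as find -mtime, because rsync -a copies the source folder's modification time onto each snapshot folder, so that time does not reflect when the backup was made.

```shell
#!/bin/bash
# prune_snapshots DIR KEEP  (hypothetical helper)
# Deletes all but the KEEP newest back-* folders in DIR.
prune_snapshots() {
  local dir=$1 keep=$2
  if (( keep < 1 )); then
    echo "prune_snapshots: keep at least 1 snapshot" >&2
    return 1
  fi
  local -a snaps
  # Real folders named back-* only (the "current" soft link never matches),
  # sorted byte-wise so the oldest snapshot comes first
  mapfile -t snaps < <(find "$dir" -mindepth 1 -maxdepth 1 -type d -name 'back-*' | LC_ALL=C sort)
  local i
  for (( i = 0; i < ${#snaps[@]} - keep; i++ )); do
    echo "removing ${snaps[i]}"
    # Only this snapshot's names go away; data shared with newer ones stays
    rm -rf -- "${snaps[i]}"
  done
}
```

For example, `prune_snapshots /mnt/sdb2/Backups 14` would keep the 14 newest snapshots; it could be called at the end of the backup script, after the new “current” link is in place, so the snapshot that “current” points to is never removed.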