BIPEDU

my ideas in action

easy backup system with rsync – like Time Machine

Backup systems are good for recovering from accidental data loss. But an even more useful feature is incremental backup, where you have access to various snapshots in time, like Apple's Time Machine does. Doing this on Linux (or any Unix-like system) is actually very easy.

For example, say we make a backup every day (or at any interval you want). We want the amount of data transferred to be small. Imagine transferring a few TB every day! If our important data changes only a little, then we should back up only the modified parts. For this, rsync is the best tool; everybody knows that. But there is a problem: how can we keep daily snapshots of the data without filling the disk? For this we will use soft links, hard links, and rsync options.

So we have to create a script file like this:

#!/bin/bash
# timestamp used in the snapshot folder name
date=`date "+%Y-%m-%dT%H-%M-%S"`
# copy the source into a new dated folder, hardlinking unchanged files
# against the previous snapshot that "current" points to
rsync -aP --delete --log-file=/tmp/log_backup.log --exclude=lost+found --link-dest=/mnt/sdb2/Backups/current /mnt/sda1/ /mnt/sdb2/Backups/back-$date
# repoint the "current" symlink at the snapshot we just created
rm -f /mnt/sdb2/Backups/current
ln -s /mnt/sdb2/Backups/back-$date /mnt/sdb2/Backups/current

First I create a “date” variable that is used in the name of the backup folder, so it is easy to tell when that backup/snapshot was made.

Then I run rsync with some parameters (see man rsync for more details); a dry-run example follows the list:

-a = archive mode (recursive; preserves permissions, times, owners, symlinks, etc.)

-P = show progress information (optional)

--delete = remove files from the backup when they have been deleted from the source

--log-file = save the log to a file (optional)

--exclude = exclude some folders/files from the backup. These are relative to the source path! Do not use absolute paths here!

--link-dest = hardlink unchanged files against the latest backup snapshot instead of copying them again

/mnt/sda1/ = source path (here I back up a whole drive)

/mnt/sdb2/Backups/back-$date = destination folder; it will contain all the content from the source.
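
Before pointing the script at important data, you can preview what rsync would do with a dry run. This is just a sketch reusing the paths from the script above; -n (--dry-run) makes rsync list the transfers without actually copying anything:

rsync -aPn --delete --exclude=lost+found --link-dest=/mnt/sdb2/Backups/current /mnt/sda1/ /mnt/sdb2/Backups/back-test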

Then, using rm, I remove the old “current” link and replace it with a new soft link pointing to the newly created snapshot.

So now whenever I open “current” I am in fact looking at the latest backup.
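
As a side note, on systems with GNU coreutils (busybox ln supports the same flags) the rm and ln lines can be collapsed into one command: -f replaces the existing link, and -n stops ln from creating the new link inside the directory the old link points to:

# equivalent to the rm + ln pair above (check that your ln supports -n)
ln -sfn /mnt/sdb2/Backups/back-$date /mnt/sdb2/Backups/current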

And because the date is different every time the backup runs, the old snapshots are kept. So I end up with one snapshot for every day.
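
Before automating it, you can run the script once by hand to create the first snapshot (the script name and location here are just examples):

# one-time setup; adjust the path to wherever you saved the script
chmod +x /path/to/backup.sh
# the first run copies everything, so it can take a while
/path/to/backup.sh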

To automate this, create a cron job that executes the above script at a convenient time.

For example, to run at 4:01 AM every day:

1  4 * * * /path/to/script
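
To install the job, edit your crontab and add the line above; the log redirection is optional and all paths are examples:

# open the current user's crontab for editing
crontab -e
# then add, on one line:
1 4 * * * /path/to/script >> /tmp/backup_cron.log 2>&1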

Please note that only the first backup will take a long time, since it has to copy all the data. From the second run onward, the script transfers only the changed files.

In the destination folder you will now see a “back-xxx” folder for every time you ran the script. You can open/read the files from all of these folders as if they were completely independent copies. In fact, if you run df and du you will see something interesting.

For example, if the backup is 600 GB and the script runs every day, df will keep showing the same 600 GB of used disk space. But if you run “du -sh ./*” in the backup folder, each “back-xxx” folder appears to be 600 GB. This is possible because unchanged files are only hardlinks to the same copied data. Do not worry, the disk is not full; trust the df results, not the du results.

user@box:/mnt/sdb2/Backups$ du  -sh ./*
623.8G    ./back-2014-02-24T17-47-12
623.8G    ./back-2014-02-24T21-46-41
623.8G    ./back-2014-02-25T17-05-02
623.8G    ./back-2014-02-25T18-45-34
0    ./current
user@box:/mnt/sdb2/Backups$ df /mnt/sdb2
Filesystem                Size      Used Available Use% Mounted on
/dev/sdb2                 2.7T    623.9G      1.9T  24% /mnt/sdb2
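
If you want to convince yourself that the snapshots really share the same data, pick a file that has not changed between runs and compare it across snapshots (the file path below is made up for illustration). With GNU or busybox stat, %i prints the inode number and %h the hardlink count; an unchanged file shows the same inode in every snapshot:

stat -c '%h %i %n' /mnt/sdb2/Backups/back-*/docs/report.txt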

So the Time Machine is in fact just a few lines of script plus a cron job! Easy, and everybody can do it!

Adapt the script to your needs. Run it when you want with cron jobs.

At any point in time you can delete old backups (for example, backups older than a few weeks). This can also be done with cron plus a small script, like the sketch below.
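
Here is a minimal sketch of such a cleanup, assuming the snapshots live in /mnt/sdb2/Backups and follow the back-* naming from above. Be careful not to delete the snapshot that “current” points to, otherwise the next run's --link-dest reference will be dangling:

#!/bin/bash
# remove snapshot folders not modified in the last 30 days
find /mnt/sdb2/Backups -maxdepth 1 -type d -name 'back-*' -mtime +30 -exec rm -rf {} +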


16 responses to “easy backup system with rsync – like Time Machine”

  1. Frankie December 21, 2015 at 10:12 PM

    Hi BIPEDU!

    Thank you for posting this script. I have looked at many, and decided yours seemed to fit my needs perfectly. I am just a novice at scripting, and I am not getting the desired results from my script, and wanted to ask if you could point out where I am going wrong.

    Here is my script:
    #!/bin/bash
    date=`date "+%Y%m%d-%H-%M"`
    rsync -aP --delete --log-file=/tmp/log_backup.log --exclude="lost+found" --exclude="Anti-Virus Essentials" --exclude=Nas_Prog --exclude=SmartWare --exclude=plex_conf --exclude=Backup --exclude=TimeMachineBackup --exclude=groupings.db --link-dest=/mnt/USB/USB2_c2/MyCloud/Backups/Current /mnt/HD/HD_a2/ /mnt/USB/USB2_c2/MyCloud/Backups/back-$date
    rm -f /mnt/USB/USB2_c2/MyCloud/Backups/Current
    ln -s /mnt/USB/USB2_c2/MyCloud/Backups/back-$date /mnt/USB/USB2_c2/MyCloud/Backups/Current

    I am getting a bit lost on what's actually happening.

    The script creates a time-stamped folder in both the /Backups/Current/ directory and the /Backups/ directory. So I now have two versions of those time-stamped folders in two different directories. Is that what was intended?

    I'm confused as to where the most complete set of recently backed-up files now resides within the backup structure.

    Can you help me clear this up? THANK YOU!!

    • bipedu December 22, 2015 at 8:06 AM

      Hi Frankie,

      I did not try your script, but it seems to be correct.
      The result should be two folders.
      /mnt/USB/USB2_c2/MyCloud/Backups/back-$date
      /mnt/USB/USB2_c2/MyCloud/Backups/Current

      The “current” folder is just a link to the latest location of your backup.
      When you run the script multiple times (e.g. hourly/daily) you will get multiple “back-$date” folders, but there is only one “current” folder.
      “current” is just a symbolic link to your last backup.

      The purpose of the “Current” link is to give rsync a reference to the old backup data, so that the new copy transfers only the delta between the old and the new backup; for unchanged files rsync just creates another hardlink to the same data (this is what --link-dest does).
      The “ln -s /source /destination” command is just for convenience, since it is not easy to update the rsync command each time a new backup is made. It also points out the latest backup. I keep a shortcut to the “current” folder (since it has a fixed name).

      If I understand correctly, you see two back-$date folders, one in /Backups/Current/ and the other in /Backups/?
      If so, then you probably have a mistake somewhere. Usually it is a “/” at the end of your path.

      Normally you should get something like this:
      back-20151212
      back-20151213
      back-20151214
      current <-- this is just a link to "back-20151214", your last backup

      If you are not sure, you can do the following:
      Make a temporary folder somewhere and play there until you are happy with the results. Then you can update the paths to point at the actual backup data. It is the best way to practice without risk.

      Also please look at "man rsync" and read and understand what each option does (scroll down until you find the detailed descriptions).

      If you are new to scripting the best way to learn is to play on temporary folders/files.
      It is the safest way to learn since you do not endanger your precious backup data.

      Best regards,
      Bipedu

      • Frankie December 22, 2015 at 9:20 PM

        Thank you so much for your great reply Bipedu!!

        I did just as you suggested: I am just learning this scripting, and I thought this would be a great place to start. I simplified and started with a test directory.

        Here is the output from my new test script.

        root@MyCloud bin # bash testbackup.sh
        sending incremental file list
        dry_total_size:63139
        created directory /mnt/USB/USB2_c2/TestBackup/back-2015-12-22T12-13
        ./
        .DS_Store
        10244 100% 0.00kB/s 0:00:00 (xfer#1, to-check=12/14)
        ._.DS_Store
        4096 100% 1.30MB/s 0:00:00 (xfer#2, to-check=11/14)
        Test 1/
        Test 1/._Testdoc2.txt
        4096 100% 666.67kB/s 0:00:00 (xfer#3, to-check=8/14)
        Test 1/Testdoc1.rtf
        4411 100% 615.37kB/s 0:00:00 (xfer#4, to-check=7/14)
        Test 1/Testdoc2.txt
        3929 100% 548.13kB/s 0:00:00 (xfer#5, to-check=6/14)
        Test 2/
        Test 2/._Testdoc3.xt.txt
        4096 100% 500.00kB/s 0:00:00 (xfer#6, to-check=5/14)
        Test 2/._Testdoc4txt.txt
        4096 100% 307.69kB/s 0:00:00 (xfer#7, to-check=4/14)
        Test 2/._Testdoc5.txt
        4096 100% 285.71kB/s 0:00:00 (xfer#8, to-check=3/14)
        Test 2/Testdoc3.xt.txt
        3929 100% 255.79kB/s 0:00:00 (xfer#9, to-check=2/14)
        Test 2/Testdoc4txt.txt
        3929 100% 239.81kB/s 0:00:00 (xfer#10, to-check=1/14)
        Test 2/Testdoc5.txt
        3929 100% 239.81kB/s 0:00:00 (xfer#11, to-check=0/14)

        sent 51690 bytes received 232 bytes 103844.00 bytes/sec
        total size is 50851 speedup is 0.98
        rm: /mnt/USB/USB2_c2/TestBackup/Current: is a directory
        root@MyCloud bin #

        The problem is that the rm command returns the message ‘is a directory’, and I think the script cannot complete because of this.

        I tried rm -rf, and it puts the files in the ‘Current’ folder but seems to hide the back- folder. Once I delete the ‘Current’ folder, the back- folder appears again.

  2. Frankie December 22, 2015 at 9:59 PM

    oh! I forgot to post the test script.

    #!/bin/bash
    date=`date "+%Y-%m-%dT%H-%M"`
    rsync -aP --delete --log-file=/tmp/log_backup.log --link-dest=/mnt/USB/USB2_c2/TestBackup/Current/ /mnt/HD/HD_a2/Test/ /mnt/USB/USB2_c2/Test
    rm -f /mnt/USB/USB2_c2/TestBackup/Current/
    ln -s /mnt/USB/USB2_c2/TestBackup/back-$date /mnt/USB/USB2_c2/TestBackup/Current

    • Frankie December 23, 2015 at 2:40 AM

      Please forgive me. Very new at this. Here is the script that I use above.

      #!/bin/bash
      date=`date "+%Y-%m-%dT%H-%M-%S"`
      rsync -aP --delete --log-file=/tmp/log_backup.log --link-dest=/mnt/USB/USB2_c2/TestBackup/Current/ /mnt/HD/HD_a2/Test/ /mnt/USB/USB2_c2/TestBackup/back-$date
      rm -f /mnt/USB/USB2_c2/TestBackup/Current/
      ln -s /mnt/USB/USB2_c2/TestBackup/back-$date /mnt/USB/USB2_c2/TestBackup/Current

  3. Frankie December 23, 2015 at 7:34 AM

    I think I have narrowed my problem down to the rm -f command. Regardless of what I do, it returns the message “rm: /mnt/USB/USB2_c2/TestBackup/Current: is a directory”, which I believe is causing that duplication of the time-stamped directories. I have tried rm -rf, and that does delete the directory, but the Current folder is gone once the script is complete.

    I think I'm close!! Thanks again. Your script inspired me to dig into scripting. Thanks for that!

    • bipedu December 24, 2015 at 12:08 AM

      Hi Frankie,

      There is a “/” in your script at the end of the rm command line:
      rm -f /mnt/USB/USB2_c2/TestBackup/Current/ <-- the “/” at the end should not be there

      Best regards,
      Bipedu

  4. FiL February 22, 2016 at 5:16 PM

    @ bipedu
    Really interesting!
    I have always tried to replicate Time Machine on Linux; I have to test your script!
    One question: how can I remove old backups if the destination hard drive is full?

    Time Machine can do that…
    Many thanks! 🙂

  5. FiL February 23, 2016 at 6:27 PM

    @ bipedu
    I tried this script and it only works with locally mounted volumes on my Mac (10.11.3), with HFS+ and AFP file systems.

    I also tried it with a remotely mounted volume over AFP (over TCP), but every time the backup takes very long because it needs to transfer everything.

    Now I am trying to adapt the script to run over SSH, but I think it is not necessary:

    this “Time Machine mechanism”, in my opinion, works only with locally mounted volumes (inside the LAN or attached devices).

    😦

    • bipedu February 23, 2016 at 11:49 PM

      Well... you already have OS X, so why not use the original Time Machine?
      My post was intended for Linux users who want to replicate Time Machine functionality. I only tested it on a LAN. I am not using it anymore anyway; I replaced it with FreeNAS and ZFS snapshots.

  6. FiL February 24, 2016 at 8:01 AM

    I am trying this because I would like to back up my Mac with Time Machine from outside the LAN.
    Do you think your script can be adapted to use SSH?
    Can you help me?

    Thank you and regards.
    Filippo 🙂

  7. TechnoPhil - Filippo Righi February 24, 2016 at 10:09 AM

    Ok many thanks for your links,
    I will try myself to adapt the script through SSH.
    Regards.

  8. TechnoPhil - Filippo Righi February 27, 2016 at 9:05 AM

    @ bipedu
    Hello there, please can you help me to adapt your script through SSH under Linux?
    I am not able to do that …
    Thanks!

  9. prazetyo October 18, 2016 at 5:22 AM

    When I run your script for the first time, it always says “--link-dest arg does not exist: /mnt/BackupRaid/current”. Then I press Ctrl+C to stop.
    When I run your script a second time, the message about --link-dest not existing is gone.
    Is that okay?
