Contains a Python script which makes Rsync backups more efficient by detecting file and directory renames / moves

wapsi/smart-rsync-backup

smart-rsync-backup

Contains a Python script which makes Rsync backups more efficient by detecting file and directory renames / moves, plus a sample Bash script which wraps the Python script and the Rsync commands so that automated backups of selected directories are easy to set up.

DISCLAIMER: Please note that I won't take any responsibility if the scripts here delete your files or do other damage to your files/systems/hardware etc.

Background: I ran into a problem when taking backups over the Internet to another Linux server with Rsync: if I renamed a file or directory in the source directory, Rsync transferred the whole file or directory again, even though the data was already at the destination and had only been moved (or renamed). I did some research and found that a couple of patches are available for Rsync to improve the detection of file moves: detect-renamed.diff and detect-renamed-lax.diff (see https://github.com/WayneD/rsync-patches). But even after I applied those patches and compiled Rsync, it did not detect the move of a whole directory: Rsync re-transferred all the files inside the "new" directory, even though it was really an existing directory that had just been renamed.

I then did some more research and found a couple of other possible solutions (like https://sourceforge.net/projects/movesync/ and https://github.com/m-manu/rsync-sidekick), but those didn't meet my needs. Finally I found one good solution that could easily be modified / improved to fit my scenario: http://www.pkrc.net/detect-inode-moves.html I added some new features and made other changes to the script:

  • Possibility to exclude directories from the scan
    • Example: I have a /foo/bar/.btrfs/ directory which contains BTRFS snapshots, and os.walk() went through all of those snapshots, which is unnecessary and slows the script down significantly
  • Handles cross/sub mounts inside the directory being scanned for file/dir moves
    • Or: Sub mounts can be excluded completely by passing the --one-file-system argument to the detect_inode_moves.py script
  • Excludes all files which have hard links (Rsync handles these quite well, so detect_inode_moves.py excludes them automatically)
  • Possibility to change the source dir to something else in the final output script generated by detect_inode_moves.py (detect)
  • Use argparse for better argument handling
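
The scanning features in the list above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual code of detect_inode_moves.py: the function name scan_tree and its parameters are hypothetical, and only regular files are recorded here.

```python
import os
import re
import stat


def scan_tree(root, exclude_regex=None, one_file_system=False):
    """Return {(st_dev, st_ino): relative_path} for regular files under root.

    Hypothetical helper for illustration only.
    """
    exclude = re.compile(exclude_regex) if exclude_regex else None
    root_dev = os.lstat(root).st_dev
    inodes = {}
    for dirpath, dirnames, filenames in os.walk(root):
        kept = []
        for name in dirnames:
            full = os.path.join(dirpath, name)
            if exclude and exclude.match(full):
                continue  # excluded directory, e.g. BTRFS snapshots
            if one_file_system and os.lstat(full).st_dev != root_dev:
                continue  # directory lives on another file system
            kept.append(name)
        # Pruning in place stops os.walk() from descending into skipped dirs.
        dirnames[:] = kept
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices, ...
            if st.st_nlink > 1:
                continue  # hard-linked files are left to Rsync itself
            inodes[(st.st_dev, st.st_ino)] = os.path.relpath(full, root)
    return inodes
```

Keying the result on the (device, inode) pair rather than the inode number alone is a design choice of this sketch: inode numbers are only unique within one file system, which matters once sub mounts are scanned too.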

Example: Create the first inode dump file:

./detect_inode_moves.py -d /home -o /tmp/home-inodes.txt -e '.*/\.btrfs$' -r rsync-excluded-dirs.txt

Do some file or directory moves/renames now and then run the detection:

./detect_inode_moves.py -i /tmp/home-inodes.txt -o home-renames.py -d /data/backups/home
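
The idea behind the detection step can be sketched as a comparison of two inode maps: the old dump and a fresh scan. The sketch below is an assumption about the general approach, not the real implementation; find_moves and dest_root are hypothetical names, and the inode numbers are made up. It also ignores the case where an inode number is reused by a different file after a deletion.

```python
import posixpath


def find_moves(old, new, dest_root=""):
    """Compare two {inode_key: relative_path} maps.

    Returns a sorted list of (old_path, new_path) pairs for inodes whose
    path changed, with both paths rebased onto dest_root (the backup dir).
    """
    moves = []
    for key, old_path in old.items():
        new_path = new.get(key)
        if new_path is not None and new_path != old_path:
            moves.append((posixpath.join(dest_root, old_path),
                          posixpath.join(dest_root, new_path)))
    return sorted(moves)


# Made-up inode numbers and paths, for illustration only.
old_dump = {101: "docs/report.txt", 102: "photos/a.jpg", 103: "tmp/x"}
new_scan = {101: "docs/report-final.txt", 102: "photos/a.jpg", 104: "new.txt"}

for src, dst in find_moves(old_dump, new_scan, dest_root="/data/backups/home"):
    print(f"rename {src} -> {dst}")
```

Deleted files (inode 103) and new files (inode 104) produce no rename here; those are left for Rsync to handle.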

Copy and run the generated rename script on the remote host:

scp home-renames.py user@remotehost:
ssh user@remotehost home-renames.py

Then run the actual Rsync:

rsync --exclude-from rsync-excluded-dirs.txt --fuzzy -HAav --delete --numeric-ids /home/ user@remotehost:/data/backups/home

Finally, generate a new inode dump file, which will be used as the input (-i /tmp/home-inodes.txt) of the next detection run:

./detect_inode_moves.py -d /home -o /tmp/home-inodes.txt -e '.*/\.btrfs$' -r rsync-excluded-dirs.txt

I also created a separate Bash script which automatically generates the correct detect_inode_moves.py and Rsync commands depending on the parameters set at the beginning of the script.
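
To show how such a wrapper can derive all the commands from a few parameters, here is a small Python sketch. It is not a port of smart-backup.sh: the CONFIG keys and build_commands are hypothetical, and the flags are simply the ones used in the example commands above.

```python
import shlex

# Hypothetical settings; the real smart-backup.sh config format may differ.
CONFIG = {
    "source": "/home",
    "inode_dump": "/tmp/home-inodes.txt",
    "exclude_regex": r".*/\.btrfs$",
    "exclude_file": "rsync-excluded-dirs.txt",
    "rename_script": "home-renames.py",
    "remote": "user@remotehost",
    "remote_dir": "/data/backups/home",
}


def build_commands(cfg):
    """Return the backup steps as (name, argv) pairs, in execution order."""
    script = "./detect_inode_moves.py"
    return [
        ("detect", [script, "-i", cfg["inode_dump"],
                    "-o", cfg["rename_script"], "-d", cfg["remote_dir"]]),
        ("copy", ["scp", cfg["rename_script"], cfg["remote"] + ":"]),
        ("apply", ["ssh", cfg["remote"], cfg["rename_script"]]),
        ("rsync", ["rsync", "--exclude-from", cfg["exclude_file"], "--fuzzy",
                   "-HAav", "--delete", "--numeric-ids",
                   cfg["source"].rstrip("/") + "/",
                   cfg["remote"] + ":" + cfg["remote_dir"]]),
        ("dump", [script, "-d", cfg["source"], "-o", cfg["inode_dump"],
                  "-e", cfg["exclude_regex"], "-r", cfg["exclude_file"]]),
    ]


for name, argv in build_commands(CONFIG):
    # shlex.join() quotes arguments such as the exclude regex for the shell.
    print(f"{name}: {shlex.join(argv)}")
```

Building each command as an argument list and quoting it only for display keeps the exclude regex from being mangled by the shell.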

The detect_inode_moves.py and smart-backup.sh scripts have been tested on GNU/Linux only and most probably won't work on Windows at all (for example, because detect_inode_moves.py uses /proc/mounts to detect mount points).
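
For readers unfamiliar with /proc/mounts, the sketch below shows one way mount points under a scanned directory could be found from it. The helper names and the sample text are hypothetical and do not come from detect_inode_moves.py; on a real system the text would be read from /proc/mounts.

```python
import re

# Sample text in /proc/mounts format (device, mount point, type, options, ...).
SAMPLE = (
    "/dev/sda1 / ext4 rw 0 0\n"
    "/dev/sdb1 /home ext4 rw 0 0\n"
    "/dev/sdc1 /home/user/My\\040Disk ext4 rw 0 0\n"
)


def parse_mounts(text):
    """Return the mount points listed in /proc/mounts-formatted text."""
    points = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Special characters are stored as octal escapes, e.g. \040 = space.
            point = re.sub(r"\\([0-7]{3})",
                           lambda m: chr(int(m.group(1), 8)), fields[1])
            points.append(point)
    return points


def mounts_under(root, mount_points):
    """Return the mount points located strictly below root."""
    root = root.rstrip("/") or "/"
    prefix = root if root.endswith("/") else root + "/"
    return [m for m in mount_points if m.startswith(prefix) and m != root]


print(mounts_under("/home", parse_mounts(SAMPLE)))
```

The trailing slash in the prefix keeps a sibling such as /homework from being mistaken for a sub mount of /home.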

Requirements:

  • detect_inode_moves.py: Python3
  • smart-backup.sh: smart-backup.sh config file, detect_inode_moves.py, SSH and Rsync

Big thanks to Pavel Krc (https://gist.github.com/rolicot) for creating the initial version of detect_inode_moves.py and for his idea of using inode numbers to identify existing / moved files and directories!
