provide a way to easily exclude sub-trees from backup #6

Open
lyonel opened this issue Nov 26, 2014 · 5 comments

@lyonel
Contributor

lyonel commented Nov 26, 2014

[This isn't an issue, more a feature request, open for discussion]

Many (most?) backup tools allow end-users (i.e. not system administrators) to exclude parts of the files they control from the system backup. It would be great if Snebu had a way to achieve that. Two commonly found implementations:

  • stop diving into a directory that contains a .nobackup file
  • honour the nodump filesystem flag (cf. chattr's d attribute)

I tend to prefer the nodump flag, as it allows finer-grained control and feels more "natural".
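A minimal sketch of the first approach, assuming a client-side walker written in Python rather than snebu-client's actual find-based pipeline: any directory holding a `.nobackup` file is pruned, together with everything beneath it. The function name `walk_for_backup` is hypothetical.

```python
import os
from typing import Iterator

MARKER = ".nobackup"  # marker file name taken from the issue text


def walk_for_backup(root: str, marker: str = MARKER) -> Iterator[str]:
    """Yield paths under root, skipping any directory that contains `marker`.

    Hypothetical helper: the directory holding the marker is excluded
    along with its whole sub-tree.
    """
    if os.path.exists(os.path.join(root, marker)):
        return  # the entire tree has opted out
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place: os.walk (top-down) will not descend into removed names.
        kept = [d for d in sorted(dirnames)
                if not os.path.exists(os.path.join(dirpath, d, marker))]
        dirnames[:] = kept
        for d in kept:
            yield os.path.join(dirpath, d)
        for f in sorted(filenames):
            yield os.path.join(dirpath, f)
```

Because pruning happens before os.walk descends, a marked sub-tree is never even read, which is the "stop diving" behaviour described in the first bullet.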

@derekp7
Owner

derekp7 commented Nov 27, 2014

The file list collection is handled by the system find command, so I'm kind of limited to what it supplies. For the .nobackup option, I could dump the output of find to a temp file, then use a 2-pass awk script to pull out any directory with a .nobackup file in it.
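The two-pass filter described above can be sketched in Python, standing in for the awk script, whose exact form isn't given here. It assumes one path per entry as find prints them from a starting directory (for example `./home/a`), with no embedded newlines; `filter_nobackup` is a hypothetical name.

```python
import posixpath
from typing import Iterable, List

MARKER = ".nobackup"


def filter_nobackup(paths: Iterable[str], marker: str = MARKER) -> List[str]:
    """Drop every path that is, or lies beneath, a directory containing `marker`.

    Hypothetical helper operating on a find-style file list.
    """
    paths = list(paths)  # materialise the list so it can be scanned twice
    # Pass 1: collect the directories that hold a marker file.
    excluded = {posixpath.dirname(p) for p in paths
                if posixpath.basename(p) == marker}

    # Pass 2: keep only paths outside those directories. Appending "/" to the
    # prefix stops "./home/a" from also matching "./home/ab".
    def is_excluded(p: str) -> bool:
        return any(p == d or p.startswith(d.rstrip("/") + "/") for d in excluded)

    return [p for p in paths if not is_excluded(p)]
```

This is quadratic in the worst case; sorting the list and checking only the most recent excluded prefix would make the second pass linear, but the simple form is easier to follow.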

For nodump, that would most likely require a patch to find. That might be an easy patch, as find already has a section for selecting files based on their attributes; it would just need an additional flag to process file attributes such as nodump. Come to think of it, there are actually two options I'd have to add: the first would be to not descend into directories that are marked nodump, and the second would be to skip individual files labeled nodump. Now, I don't know if the GNU findutils team will accept such a patch (so far GNU hasn't taken in any of the tar patches that Red Hat put out), but it is worth a try. Maybe I can get it accepted into Fedora.
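A sketch of the two proposed behaviours as independent switches, assuming a Python walker instead of a patched find. The `is_nodump` predicate is supplied by the caller, because reading the flag is platform-specific (on Linux, chattr's `d` attribute is read through the `FS_IOC_GETFLAGS` ioctl; on the BSDs it appears in `st_flags`). The names `walk_nodump`, `prune_dirs`, and `skip_files` are hypothetical and are not find options.

```python
import os
from typing import Callable, Iterator


def walk_nodump(root: str,
                is_nodump: Callable[[str], bool],
                prune_dirs: bool = True,
                skip_files: bool = True) -> Iterator[str]:
    """Yield paths under root while honouring a nodump-style predicate.

    prune_dirs: neither list nor descend into directories flagged nodump.
    skip_files: omit individual non-directory entries flagged nodump.
    (Hypothetical helper; the two switches mirror the two proposed options.)
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        if prune_dirs:
            # Removing names in place stops os.walk from descending into them.
            dirnames[:] = [d for d in dirnames
                           if not is_nodump(os.path.join(dirpath, d))]
        for name in dirnames:
            yield os.path.join(dirpath, name)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if skip_files and is_nodump(path):
                continue
            yield path
```

Keeping the switches separate matches the observation that directory pruning and per-file skipping are distinct behaviours: with `prune_dirs=False`, a flagged directory is still traversed, and only flagged files inside it are dropped.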

@lyonel
Contributor Author

lyonel commented Nov 27, 2014

The .nobackup approach is probably easier to implement (filtering out sub-trees in snebu-client is indeed just a few commands away).
AFAIK GNU tar doesn't honour nodump, but BSD tar and Star do (when called with --nodump and -nodump, respectively). Maybe I'll write a patch to add this option.

@derekp7
Owner

derekp7 commented Nov 27, 2014

The only problem with handling this in tar is that the output of find will still list those files as required for the backup. That's not a big deal, but if you run any backup completion reports, those files would always show up as errors. (Pre-made backup reports are one of the features I've got cooking, along with a backup integrity checker.) Also, the code that interprets a tar file was written and tested with GNU tar, so it may need some tweaking to recognize other tar formats. Actually, it was written based on the standard tar specifications, with the GNU extensions added in, but it hasn't been tested extensively against other tar implementations.
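To illustrate the reporting problem: if tar silently skips nodump files while the find-generated list still names them, any check that compares the two will flag every skipped file. The following is a minimal sketch using Python's `tarfile`, assuming member names are normalised identically on both sides. `missing_from_archive` is hypothetical and is not snebu's report code.

```python
import io
import tarfile
from typing import Iterable, Set


def missing_from_archive(expected: Iterable[str], archive: bytes) -> Set[str]:
    """Return paths from the file list that never appeared in the tar stream.

    Hypothetical completion check: assumes names in `expected` use the same
    form (e.g. no leading "./") as the names stored in the archive.
    """
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tf:
        archived = {m.name for m in tf.getmembers()}
    return {p for p in expected if p not in archived}
```

Any file that tar deliberately skipped (for example via `--nodump`) lands in the returned set, which is why the exclusion has to happen before the file list is built if the reports are to stay clean.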

The specific areas I'm concerned about are long file names (longer than 100 characters), sparse file handling (currently GNU and PAX formats are supported, or at least PAX as implemented by GNU tar), extended/SELinux attributes, file sizes greater than 8GB, and probably a few other edge cases I'm not thinking of. I believe that most of these were addressed when I added PAX format support (which was required to pick up extended/SELinux attributes), and I've written test cases for each of them, but I'll have to assemble the test cases into a self-contained script and include it in the code repository. That reminds me: I should document all the special considerations I discovered when writing snebu's tar code.
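Two of these limits come directly from the ustar header layout. The name field holds 100 bytes, and the size field holds 11 octal digits, so the largest encodable size is 8^11 - 1 bytes (8 GiB minus one byte). GNU and PAX formats both work around these limits. The sketch below probes them using Python's `tarfile.TarInfo.tobuf()` as a stand-in for snebu's own tar code. `header_ok` is a hypothetical helper. Note that ustar can also split a path at a slash into a 155-byte prefix plus a 100-byte name, so the long-name case uses a single path component.

```python
import tarfile

USTAR_NAME_MAX = 100          # bytes in the ustar name field
USTAR_SIZE_MAX = 8 ** 11 - 1  # 11 octal digits in the size field: 8 GiB - 1


def header_ok(name: str, size: int, fmt: int) -> bool:
    """True if tarfile can encode this name/size in a header of format fmt.

    Hypothetical probe: tobuf() raises ValueError ("name is too long" or
    "overflow in number field") when the format cannot represent the entry.
    """
    info = tarfile.TarInfo(name)
    info.size = size
    try:
        info.tobuf(format=fmt)
    except ValueError:
        return False
    return True
```

A reader that handles only plain ustar will therefore never see such entries written correctly. Instead, it has to understand GNU long-name headers and base-256 sizes, or PAX `path`/`size` records.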

BTW, I did submit a feature request to the findutils mailing list. The list doesn't appear to be that active, so if I don't hear back from them in the next week I'll write the patch for find, and submit it to the Fedora bug tracker, and see where it goes from there.

@derekp7
Owner

derekp7 commented Dec 2, 2014

Hey, I got an answer back on one of the GNU mailing lists, pointing to an earlier discussion about adding additional attribute flags to find, at: https://lists.gnu.org/archive/html/findutils-patches/2014-11/msg00000.html. Looks like there was a patch posted, so for now I'm going to take a look at that and make sure I can use it to exclude files with the dump flag set to no (or however it is presented). If that works (probably will be a couple days before I get to it), I'll add a patch to the snebu-client script.

Since this isn't a finalized feature in find, I'll probably keep the patch in a Git branch for the time being, instead of putting it in the next release. Alternatively, I'll add another config option to include/exclude files by arbitrary attributes/flags, and put a pointer to this mailing-list patch for those who are interested. Let me know if you think this approach will work out.
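Here is one way such a config option could look, as a hedged sketch. A comma-separated list of flag names is turned into a bitmask and tested against a file's flags. This assumes BSD-style flags are read via `st_flags` and the `stat.UF_*` constants. Python populates `st_flags` only where the platform provides it (BSD, macOS). On Linux the attribute is absent, so nothing matches there. The option format and all function names are hypothetical.

```python
import os
import stat
from typing import Dict

# Hypothetical mapping from config-file words to BSD-style file flags.
FLAG_NAMES: Dict[str, int] = {
    "nodump": stat.UF_NODUMP,
    "immutable": stat.UF_IMMUTABLE,
    "append": stat.UF_APPEND,
}


def parse_exclude_flags(value: str) -> int:
    """Turn a comma-separated list such as "nodump,immutable" into a bitmask."""
    mask = 0
    for word in filter(None, (w.strip().lower() for w in value.split(","))):
        if word not in FLAG_NAMES:
            raise ValueError(f"unknown flag: {word!r}")
        mask |= FLAG_NAMES[word]
    return mask


def excluded_by_flags(path: str, mask: int) -> bool:
    """True if any flag in mask is set on path.

    Uses lstat so symlinks are judged by their own flags; where st_flags is
    unavailable (e.g. Linux), the flags are treated as 0 and nothing matches.
    """
    return bool(getattr(os.lstat(path), "st_flags", 0) & mask)
```

Rejecting unknown words at parse time means a misspelled config entry fails loudly instead of silently backing up everything.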

Oh, and eventually I'll probably end up making a custom client-side binary, which combines what I'm using find and tar for now. At a minimum, it will replace the functionality of find, since older versions (and non-GNU versions) of find don't support all (or any) of the -printf options that are needed. (I'm planning on keeping this potential new client code optional, since one of the current strengths of Snebu is that nothing needs to be installed on the client when doing pull-based backups.)

@lyonel
Contributor Author

lyonel commented Dec 3, 2014

Sounds good; having such a -nodump predicate in find will definitely benefit everybody. In the meantime, .nobackup can be a valid option (and is usable even on filesystems that don't support nodump).
