
znapzend doesn't ship snapshots to remote DST, if a clone is present on the DST dataset #116

Open
budachst opened this issue Dec 16, 2014 · 20 comments

@budachst

It seems that znapzend tries to clear out all other snapshots on the destination which don't "belong" to its backup plan. The issue is that I am using a clone to back up files from the NFS share, and that backup takes considerably longer than the interval is set to. I only found that out through trial and error, though.

If a clone is present on the destination, the removal of the underlying snapshot is prevented, and trying to remove it results in an error from zfs destroy. znapzend then bails out and doesn't even attempt to continue with the actions of its backup plan.
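Roughly what that looks like on the destination (dataset names are just an example, and the exact error wording may differ):

zfs clone sasTank/nfsvmpool01sas@_Backup sasTank/nfsvmpool01sas_Backup
zfs destroy sasTank/nfsvmpool01sas@_Backup
# fails with something like: cannot destroy 'sasTank/nfsvmpool01sas@_Backup': snapshot has dependent clones

znapzend runs into exactly that error during its cleanup pass and then gives up on the whole backup set.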

What is the reasoning behind this? I don't see how an existing clone would interfere with znapzend's backup in any way.

Having the clone on the source doesn't do any good either, since znapzend doesn't seem capable of determining that the dataset it's about to chew on is actually a clone, and thus starts to perform a complete zfs send/recv to the destination.

-Stephan

@hadfl
Collaborator

hadfl commented Dec 16, 2014

to be honest, we haven't paid any attention to clones, as this has not been a requirement yet.

hope to find some time to have a look at it.

you can try znapzend with --features=oracleMode, which avoids a combined zfs destroy and instead destroys the snapshots individually (according to the retention policy). if one destroy fails, znapzend will continue with the next one...
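e.g. something like this for a one-off test (replace <dataset> with your backup set):

znapzend --features=oracleMode --debug --runonce=<dataset>

for the daemon you would add --features=oracleMode to the znapzend call in your startup script.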

@budachst
Author

You meant to pass --features=oracleMode in the startup script? I'll give it a spin…

@budachst
Author

Bummer, this didn't work, unfortunately. Seems that I will have to work around this issue.

@hadfl
Collaborator

hadfl commented Dec 16, 2014

hmm you can always do a znapzend --debug --noaction --runonce=<dataset>

the output will show you all the invoked zfs commands. you can execute them manually to see where it fails.

do you run znapzend with log level debug? what does the log tell you about it?

@budachst
Author

I think the default log level is set to debug, no? Anyway, I have just fired up znapzend from the terminal:

root@nfsvmpool01:/opt/VSM/InstallAgent# znapzend --debug --noaction --runonce=sasTank
[Tue Dec 16 22:04:10 2014] [info] refreshing backup plans...
ERROR: no backup set defined or enabled, yet. run 'znapzendzetup' to setup znapzend
root@nfsvmpool01:/opt/VSM/InstallAgent# znapzend --debug --noaction --runonce=sasTank/nfsvmpool01sas
[Tue Dec 16 22:04:20 2014] [info] refreshing backup plans...
[Tue Dec 16 22:04:22 2014] [info] found a valid backup plan for sasTank/nfsvmpool01sas...
[Tue Dec 16 22:04:22 2014] [debug] snapshot worker for sasTank/nfsvmpool01sas spawned (2851)
[Tue Dec 16 22:04:22 2014] [info] creating snapshot on sasTank/nfsvmpool01sas

zfs snapshot sasTank/nfsvmpool01sas@2014-12-16-220422

[Tue Dec 16 22:04:22 2014] [debug] snapshot worker for sasTank/nfsvmpool01sas done (2851)
[Tue Dec 16 22:04:22 2014] [debug] send/receive worker for sasTank/nfsvmpool01sas spawned (2852)
[Tue Dec 16 22:04:22 2014] [info] starting work on backupSet sasTank/nfsvmpool01sas
[Tue Dec 16 22:04:22 2014] [debug] sending snapshots from sasTank/nfsvmpool01sas to [email protected]:sasTank/nfsvmpool01sas

zfs list -H -o name -t snapshot -s creation -d 1 sasTank/nfsvmpool01sas

ssh -o Compression=yes -o CompressionLevel=1 -o Cipher=arcfour -o batchMode=yes -o ConnectTimeout=30 [email protected] zfs list -H -o name -t snapshot -s creation -d 1 sasTank/nfsvmpool01sas

ssh -o Compression=yes -o CompressionLevel=1 -o Cipher=arcfour -o batchMode=yes -o ConnectTimeout=30 [email protected] zfs list -H -o name -t snapshot -s creation -d 1 sasTank/nfsvmpool01sas

[Tue Dec 16 22:04:23 2014] [debug] cleaning up snapshots on [email protected]:sasTank/nfsvmpool01sas

zfs list -H -o name -t snapshot -s creation -d 1 sasTank/nfsvmpool01sas

[Tue Dec 16 22:04:23 2014] [debug] cleaning up snapshots on sasTank/nfsvmpool01sas

zfs destroy sasTank/nfsvmpool01sas@2014-12-15-200000

[Tue Dec 16 22:04:23 2014] [info] done with backupset sasTank/nfsvmpool01sas in 1 seconds
[Tue Dec 16 22:04:23 2014] [debug] send/receive worker for sasTank/nfsvmpool01sas done (2852)

I can't actually see which zfs command causes this issue, as no error shows up in the terminal output. The zfs list commands do not cause any issues on either side.

@hadfl
Collaborator

hadfl commented Dec 16, 2014

--noaction won't snapshot, send/receive, or destroy anything; it just prints the statements as debug info.

did you run the commands above manually? alternatively you can redo a run w/o noaction and see where it fails. w/o noaction you'll get error messages from zfs if something goes wrong...

@budachst
Author

Well… it seems that the receiver reports that the snapshot the sender wants to ship already exists… which is not the case, however:

root@nfsvmpool01:/opt/VSM/InstallAgent# zfs list -r sasTank
NAME USED AVAIL REFER MOUNTPOINT
sasTank 132G 1,47T 32K /sasTank
sasTank@2014-12-15-160000 0 - 32K -
sasTank@2014-12-15-200000 0 - 32K -
sasTank/nfsvmpool01sas 132G 1,47T 121G /sasTank/nfsvmpool01sas
sasTank/nfsvmpool01sas@2014-12-16-000000 2,65G - 121G -
sasTank/nfsvmpool01sas@2014-12-16-040000 1,21G - 121G -
sasTank/nfsvmpool01sas@2014-12-16-080000 877M - 121G -
sasTank/nfsvmpool01sas@2014-12-16-120000 1,03G - 121G -
sasTank/nfsvmpool01sas@2014-12-16-160000 977M - 121G -
sasTank/nfsvmpool01sas@2014-12-16-200000 641M - 121G -
sasTank/nfsvmpool01sas@2014-12-16-222700 37,0M - 121G -
root@nfsvmpool01:/opt/VSM/InstallAgent# ssh -o Compression=yes -o CompressionLevel=1 -o Cipher=arcfour -o batchMode=yes -o ConnectTimeout=30 '[email protected]' '/opt/csw/bin/mbuffer -q -s 128k -W 60 -m 1G -4 -I 10003|zfs recv -F sasTank/nfsvmpool01sas'
cannot restore to sasTank/nfsvmpool01sas@2014-12-16-222700: destination already exists

On the destination:
root@nfsvmpool02:/usr/local/de.jvm.scripts# zfs list -r sasTank
NAME USED AVAIL REFER MOUNTPOINT
sasTank 66,5G 1,51T 26K /sasTank
sasTank@2014-12-15-160000 1K - 26K -
sasTank@2014-12-15-200000 0 - 26K -
sasTank/nfsvmpool01sas 66,5G 1,51T 54,1G /sasTank/nfsvmpool01sas
sasTank/nfsvmpool01sas@2014-12-13-120000 434M - 54,3G -
sasTank/nfsvmpool01sas@2014-12-13-160000 250M - 54,3G -
sasTank/nfsvmpool01sas@2014-12-13-200000 255M - 54,3G -
sasTank/nfsvmpool01sas@2014-12-14-000000 336M - 54,2G -
sasTank/nfsvmpool01sas@2014-12-14-040000 304M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-14-080000 246M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-14-120000 255M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-14-160000 274M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-14-200000 272M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-000000 360M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-040000 311M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-080000 283M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-15-120000 354M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-15-160000 362M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-200000 364M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-16-000000 433M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-16-040000 407M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-16-080000 308M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-16-120000 389M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-16-160000 362M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-16-200000 0 - 54,1G -
sasTank/nfsvmpool01sas@_Backup 0 - 54,1G -
sasTank/nfsvmpool01sas_Backup 1K 1,51T 54,1G /sasTank/nfsvmpool01sas_Backup

@hadfl
Collaborator

hadfl commented Dec 17, 2014

zfs recv will fail if snapshots exist on the destination which do not exist on the source. i don't know if this only happens when recv is used with the -F option. can you give it a try w/o -F:

ssh -o Compression=yes -o CompressionLevel=1 -o Cipher=arcfour -o batchMode=yes -o ConnectTimeout=30 '[email protected]' '/opt/csw/bin/mbuffer -q -s 128k -W 60 -m 1G -4 -I 10003|zfs recv sasTank/nfsvmpool01sas'

if that does not work either, there is nothing znapzend can do about it, since this is a limitation of zfs.

@budachst
Author

I dug a bit into my own backup setup, where I have interwoven snapshots: some of them are created locally, and the updates are shipped via ssh from a remote source. While I was looking at my old code, I remembered having the same issue there. zfs recv -F would roll back the destination dataset, which I didn't want, and just sending incremental snaps along with locally created ones led to the error that the destination dataset had been modified, so the incremental snapshot could not be applied.
I solved this by setting the destination dataset to read-only, which enabled me to mix those snaps.
That was also how I handled retention on the destination dataset: by simply managing the locally created and deleted snapshots.
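In zfs terms that was just the following, using the dataset name from this thread as an example (zfs recv still works against a read-only dataset; only local modifications through the mount are blocked):

zfs set readonly=on sasTank/nfsvmpool01sas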

I will give that a try in the afternoon, but I am quite confident that this will do it for me. All I don't know now is how to get the -F out of ZFS.pm in a way that lets me test this using znapzend. Maybe you can help me out with that and I'll take it for a ride…

@hadfl
Collaborator

hadfl commented Dec 17, 2014

this should do the trick: hadfl@14cbaeb

@budachst
Author

Thanks, I'll give it a try and report back.

@budachst
Author

I just tried without -F and on a write-protected dataset… this way, znapzend keeps on shipping its snapshots while other snapshots exist at the same time. If the destination dataset is not set to readonly, zfs reports that the destination dataset has been modified since the most recent snapshot.

I'd suggest keeping -F when transferring the initial snapshot to the destination, but omitting it - or making it configurable - on subsequent zfs send/recvs. That way, you are able to pull off regular file-based backups from a snapshot. However, creating a clone does break this again, but maybe I can get away with a snapshot that doesn't change its name. Will keep you posted.
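Sketched with made-up snapshot names (and without the mbuffer part), what I mean is roughly:

# initial transfer: -F so the destination dataset gets created/overwritten to match the source
zfs send sasTank/nfsvmpool01sas@first | ssh [email protected] zfs recv -F sasTank/nfsvmpool01sas

# subsequent runs: incremental and without -F, so snapshots created locally on the destination are left alone
zfs send -I @first sasTank/nfsvmpool01sas@next | ssh [email protected] zfs recv sasTank/nfsvmpool01sas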

@budachst
Author

This works when performed with a little caution: one must make sure that the "off-side" snapshot doesn't get destroyed before znapzend ships the next one. If the "off-side" snapshot gets destroyed, the former snapshot gets altered, and if znapzend relies on that one, zfs recv will yield the "dataset has been modified" error and one would have to roll back the destination to the latest znapzend snapshot.

However, this will work for me, as I am able to make sure that there is always at least one "znapshot" between the refreshes of my "backup snapshots", and without -F znapzend will happily ship all its incremental snaps to the destination.
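And if it does go wrong, recovery is a rollback on the destination to the last snapshot znapzend shipped, e.g. (snapshot name just taken from the listing above):

# -r also destroys any snapshots taken after the one being rolled back to
zfs rollback -r sasTank/nfsvmpool01sas@2014-12-16-200000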

@budachst
Author

I have verified that the setup I worked out works when I run znapzend manually from the terminal, but when I waited for the next scheduled run, I noticed that the two datasets in question were not updated.

I then re-checked with two manual znapzend runs, which both worked as expected. I have now restarted znapzend via svcadm, but I am wondering how that could be. Is a daemonized znapzend not picking up the changes I made to ZFS.pm?

@oetiker
Owner

oetiker commented Dec 17, 2014

ZFS.pm only gets reloaded when you restart znapzend
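e.g. via SMF, as you already did (the exact service name/FMRI depends on how the manifest was installed):

svcadm restart znapzend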

@budachst
Author

Indeed… ;) But now that that has been done, it's working as expected:

admin@nfsvmpool02:/export/home/admin$ zfs list -r sasTank
NAME USED AVAIL REFER MOUNTPOINT
sasTank 70,3G 1,51T 26K /sasTank
sasTank/nfsvmpool01sas 70,3G 1,51T 54,0G /sasTank/nfsvmpool01sas
sasTank/nfsvmpool01sas@2014-12-13-120000 434M - 54,3G -
sasTank/nfsvmpool01sas@2014-12-13-160000 250M - 54,3G -
sasTank/nfsvmpool01sas@2014-12-13-200000 255M - 54,3G -
sasTank/nfsvmpool01sas@2014-12-14-000000 336M - 54,2G -
sasTank/nfsvmpool01sas@2014-12-14-040000 304M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-14-080000 246M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-14-120000 255M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-14-160000 274M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-14-200000 272M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-000000 360M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-040000 311M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-080000 283M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-15-120000 354M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-15-160000 362M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-15-200000 364M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-16-000000 433M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-16-040000 407M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-16-080000 308M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-16-120000 389M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-16-160000 362M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-16-200000 454M - 54,1G -
sasTank/nfsvmpool01sas@2014-12-17-120000 480M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-17-160000 1K - 54,0G -
sasTank/nfsvmpool01sas@VSMBackup 1K - 54,0G -
sasTank/nfsvmpool01sas@2014-12-17-200000 338M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-18-000000 358M - 54,0G -
sasTank/nfsvmpool01sas@2014-12-18-040000 0 - 54,0G -

But as you can see, znapzend is now happily sending its snapshots without being messed up by the locally generated snapshot.

@hadfl
Collaborator

hadfl commented Dec 18, 2014

i think -F was introduced in the very beginning, when it was not clear yet how/if znapzend could handle "foreign" snapshots.

now that we know it can, we could omit -F if there is a common snapshot on source and destination and an incremental send can be done.

the fix should also cover proper handling of foreign snapshots on the source side. as znapzend always uses -I, foreign snapshots in between its own snapshots get sent, too.
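for illustration, with made-up snapshot names: -i would only send the delta between two snapshots, while -I also replicates every snapshot created in between, e.g. a manual one:

zfs send -I sasTank/nfsvmpool01sas@a sasTank/nfsvmpool01sas@b | ssh [email protected] zfs recv sasTank/nfsvmpool01sas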

there will be a fix, i just can't promise when that will happen...

@budachst
Author

I think -F should be used when initially shipping a snapshot, especially if you are trying to recreate exactly the same dataset structure. It's the same with my own scripts, where initial snapshots had to be sent using -F, as otherwise zfs would not create them on an existing zpool.
And as far as the fix goes - I am happy with the manual changes… ;)

znapzend is now humming away nicely, and it gives me the advantage of using mbuffer, which was something I backed off from when I created my own solution.

@oetiker oetiker added the bug label Jan 17, 2015
@ser

ser commented Aug 3, 2017

Any progress with this bug? One of my backup servers is located on a network link which sometimes fails, and unfortunately it seems to prevent znapzend from working properly :(

@emacsomancer

I have a laptop which can't always talk to the backup server. I wish znapzend could somehow automatically do something other than just failing here.
