cya -- a multi-level backup driver for Duplicity
================================================
Mario Juric <[email protected]>
cya (pronounced see-ya) is a driver for the Duplicity backup tool
(http://duplicity.nongnu.org) that enables:
* multi-level incremental backups on varying time-scales (decade, year,
month, week), with the ability to keep only the latest few backups on each
time scale.
* handling of hard links, access control lists, and extended attributes. This
allows whole-system backups to be performed with cya.
* unattended backups while maintaining security and space efficiency.
Quick Start
===========
How it works
------------
Every day each client adds an incremental backup to a client-specific
'incoming' directory on the archive server (typically,
~archive/$CLIENT/incoming). The backups are usually encrypted for added
security.
The archive server then moves the new increment to its correct place in the
multi-level hierarchy (see 'Multi-level Backups' below), and prepares the
'incoming' directory for the next (incremental) backup.
In time, at each level in the directory tree (typically,
~archive/$CLIENT/backups) a backup chain forms, with increments taken at
different scales.
Setting up unattended backups
-----------------------------
Preparing the archive server:
* install cya in /opt/cya
* install duplicity 0.6.20 or higher
* create a user named 'archive'
* copy templates/collect.sample to /etc/cya/collect. Customize if
necessary.
* add /etc/cya/collect to crontab to be run daily
Preparing a client to be backed up:
* install cya to /opt/cya
* install duplicity 0.6.20 or higher
* create the /etc/cya directory and chmod it to 700
* copy templates/backup.sample to /etc/cya/backup and edit it.
* set the DESTHOST, DESTDIR, and SOURCEDIR variables. DESTHOST is the
address of the archive server to be used by SSH to log into it.
DESTDIR is the subdirectory within ~archive, where the backup sets
will be stored. By convention, this is usually the fully qualified
domain name of the client. SOURCEDIR is the directory on the
client you want to back up (usually '/').
* set BACKUP_ENCRYPTION_KEY to a password unique to this client.
This password will be used to encrypt the backup set, and should
not be shared with anyone (including the archive server!). Keep it
in a safe place -- you will need it to restore the system from
backups.
Tip: A good way to generate a hard-to-guess password is by running
`openssl rand -base64 32`
* create ssh keys in /etc/cya/keys (using ssh-keygen). Add the
public key to ~archive/.ssh/authorized_keys on the archive server.
* add the archive server's host key to ~root/.ssh/known_hosts, if it
isn't there already.
Tip: just running 'ssh archive.server.addr' and answering 'yes'
when prompted should do it.
* As root, run 'cya-collect --init ~archive/$DESTDIR' on the archive
server to initialize the backup set directory structure. DESTDIR
must be the same as the one you set in /etc/cya/backup file on the
client.
* add /etc/cya/backup to crontab on the client. For optimal
performance, have it run ~10-15 minutes after cya-collect runs on
the server.
WARNING: Some distros (notably, RHEL) set HOME=/ in /etc/crontab.
If your distro does this, make sure you run /etc/cya/backup with
HOME=/root envvar set. Otherwise, cache directories will be
created in /.
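The password tip in the client steps above relies on openssl. Where it is not at hand, roughly the same result can be sketched with Python's standard library. This is an illustration only and not part of cya; the function name `make_backup_key` is hypothetical.

```python
import base64
import secrets


def make_backup_key(num_bytes=32):
    """Return num_bytes of OS-provided randomness, base64-encoded.

    Hypothetical helper, comparable in spirit to `openssl rand -base64 32`.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into BACKUP_ENCRYPTION_KEY in /etc/cya/backup.
    print(make_backup_key())
```

As with the openssl variant, the generated value should be stored somewhere safe outside the client, since it is needed to restore from backups.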
Multi-level backups
===================
cya enables multi-level incremental Duplicity backups on varying time-scales
(decade, year, month, week), by organizing Duplicity backups in a directory
tree, where each level in the tree corresponds to the timescale of a backup
stored at that level. Increments that are shared between different levels
are hardlinked to the correct places in the hierarchy (thus the space
efficiency).
For example, a backup made on 2012-12-04 would be placed in a subdirectory
of:
2000/2010/2012/2012-12/2012-12-02
where the levels correspond to century, decade, year, month and week,
each holding backups for decades, years, months, weeks and days,
respectively.
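The mapping from a backup date to its leaf directory can be sketched as follows. This is not cya's own code: the function name `backup_path` is hypothetical, weeks are assumed to start on Sunday (consistent with 2012-12-02 and 2012-12-09 in the examples here), and the upper levels are assumed to be derived from the first day of the week.

```python
from datetime import date, timedelta


def backup_path(day):
    """Map a backup date to century/decade/year/month/week directories."""
    # Assumption: weeks start on Sunday. date.weekday() counts Monday as 0
    # and Sunday as 6, so this is the number of days since the last Sunday.
    days_since_sunday = (day.weekday() + 1) % 7
    week_start = day - timedelta(days=days_since_sunday)

    # Assumption: upper levels follow the week's first day, so a week that
    # straddles a month boundary is filed under the month it starts in.
    year = week_start.year
    century = year - year % 100
    decade = year - year % 10
    month = f"{year}-{week_start.month:02d}"
    return f"{century}/{decade}/{year}/{month}/{week_start.isoformat()}"


if __name__ == "__main__":
    print(backup_path(date(2012, 12, 4)))
```

For 2012-12-04 (a Tuesday) the week starts on 2012-12-02, so the sketch yields the path shown above.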
If the 2012-12-04 backup was the first one ever made, the files in the leaf
directory 2000/2010/2012/2012-12/2012-12-02 would also get hardlinked to
2000/2010/2012/2012-12 (because it's the first weekly backup), and to
2000/2010/2012 (because it's the first monthly backup), and so on.
When a backup is made on the next day, 2012-12-05, that day is still in the
week of 2012-12-02, so an incremental backup would be made in
2000/2010/2012/2012-12/2012-12-02. Equally so for all days through
2012-12-08.
On 2012-12-09, the destination directory will change to:
2000/2010/2012/2012-12/2012-12-09
As it is empty, cya will look one directory up the hierarchy for a basis
for this (incremental) backup. It will first hardlink files from
2000/2010/2012/2012-12 to 2000/2010/2012/2012-12/2012-12-09, and make this
the 'incoming' directory for the next incremental backup. The procedure is
repeated when the month/year/decade boundaries are crossed. In time, at each
level in the directory tree a backup chain forms, with increments taken at
different scales.
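Seeding a new, empty week directory from its parent level can be illustrated with plain hardlinks. The sketch below is a simplification under stated assumptions, not cya's implementation: the helper name `seed_from_parent` and the file name `volume-1.bin` are hypothetical, and only regular files directly inside the parent are linked.

```python
import os
import tempfile


def seed_from_parent(parent, child):
    """Hardlink every regular file in parent into a new child directory."""
    os.makedirs(child, exist_ok=False)  # the new level must not exist yet
    linked = []
    for name in sorted(os.listdir(parent)):
        src = os.path.join(parent, name)
        if os.path.isfile(src):  # skip subdirectories (the finer levels)
            os.link(src, os.path.join(child, name))
            linked.append(name)
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        month = os.path.join(root, "2000", "2010", "2012", "2012-12")
        os.makedirs(month)
        # Placeholder for a backup volume; real Duplicity names differ.
        with open(os.path.join(month, "volume-1.bin"), "w") as f:
            f.write("data")
        week = os.path.join(month, "2012-12-09")
        print(seed_from_parent(month, week))
```

Because the files are hardlinked rather than copied, the new directory costs almost no extra disk space until new increments are added to it.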
It's likely desirable to keep this tree pruned, deleting all but the two newest
leaf directories at every level of the hierarchy. Right now, cya won't do it
for you (it has to be done manually).
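Since pruning is manual, a dry-run helper that only lists candidates may be a useful starting point. The sketch below is hypothetical and not shipped with cya; it assumes that directory names sort chronologically (true for names like 2012 or 2012-12-02) and it deletes nothing.

```python
import os
import tempfile


def prune_candidates(level_dir, keep=2):
    """List subdirectories a prune would remove, without deleting anything."""
    subdirs = sorted(
        name for name in os.listdir(level_dir)
        if os.path.isdir(os.path.join(level_dir, name))
    )
    cut = max(len(subdirs) - keep, 0)
    # Everything older than the newest `keep` entries is a candidate.
    doomed = [os.path.join(level_dir, name) for name in subdirs[:cut]]
    # Repeat the same rule inside each directory that is kept.
    for name in subdirs[cut:]:
        doomed.extend(prune_candidates(os.path.join(level_dir, name), keep))
    return doomed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        month = os.path.join(root, "2012-12")
        for week in ("2012-12-02", "2012-12-09", "2012-12-16"):
            os.makedirs(os.path.join(month, week))
        for path in prune_candidates(root):
            print("would delete:", path)
```

Reviewing the printed list before removing anything keeps the operation safe; whether two is the right number to keep depends on how far back restores need to reach.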
Security model
==============
Definitions:
* archive: the host which holds the backups
* client: a host being backed up
* $ROOT: base directory of backup sets on archive
* Committed backup files: files residing in $ROOT/backups
* Uncommitted backup files: files residing in $ROOT/incoming/finished
Design summary:
The client host attempts to back itself up daily to the
archive:$ROOT/incoming/next directory, if that directory exists. When it
succeeds, it renames that directory to 'finished'.
The archive host periodically checks for the existence of
archive:$ROOT/incoming/finished. If it exists, the increment is moved to the
correct level in the backup hierarchy, and a new $ROOT/incoming/next
directory is prepared.
Design consequences:
* If archive@archive is breached, no committed backup files can be accessed.
Uncommitted backup files can be accessed, read, deleted, but not decrypted.
* If root@archive is breached, all backup files can be accessed, read,
deleted, but not decrypted.
* If root@client is breached, no committed backups on the archive can be
accessed. Uncommitted backup files can be accessed, read, deleted, and
decrypted.
* If both root@archive and root@client are breached, the backups can be
read, deleted and decrypted (full compromise).
Implementation details
======================
There are three parts to cya:
* A wrapper for duplicity, `duplicity-ex`, that records in the backup the
ACLs and extended attributes of the files being backed up, and efficiently
backs up multiply hard-linked files.
* A scheme for organizing Duplicity incremental backups that results in
multi-level backup chains while being space efficient.
* A scheme and a utility, `cya-collect`, for performing unattended,
multi-level backups with Duplicity.
Clients periodically (typically, daily) create and upload backups to an
"incoming" directory at the archive server. The archive server collects
these, placing them in appropriate directories in the multi-level backup
hierarchy.
In more detail, on the archive server:
1) For each client that is backed up, cya-collect is called daily
via cron.
2) It checks if $ROOT/incoming/finished directory exists. If yes,
that means a new backup has been uploaded. If no, it exits.
3) If 'finished' exists, cya-collect uses the information from
$ROOT/next_info to hardlink the backup files to the appropriate backup
sets in $ROOT/backups. Once done, it removes the 'finished'
directory and creates $ROOT/incoming/next. It hardlinks the
appropriate backup set files (based on current time) into this
directory, and records in $ROOT/next_info which backup set has been
hardlinked.
On the client:
1) /etc/cya/backup script is run daily via cron. It checks if
$ROOT/incoming/next exists on the archive server. If not, it
exits, as this means that cya-collect hasn't yet run and moved the
previously created backup to its right place.
2) Otherwise, it runs duplicity, uploading the result to
$ROOT/incoming/next. How duplicity is run can be customized by
creating a shell function named 'duplicity' in /etc/cya/backup.
3) Once the backup has finished, $ROOT/incoming/next is moved to
$ROOT/incoming/finished. This signals cya-collect that a new
backup is ready.
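The next/finished handshake between client and archive server can be simulated on a local directory tree. The sketch below is a simplified model and not cya's code: the function names are hypothetical, next_info is assumed to hold a single backup-set name, each backup set is a single flat directory, and a plain file write stands in for the duplicity run.

```python
import os
import shutil
import tempfile


def _link_files(src_dir, dst_dir):
    """Hardlink regular files from src_dir into dst_dir, skipping existing names."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        src, dst = os.path.join(src_dir, name), os.path.join(dst_dir, name)
        if os.path.isfile(src) and not os.path.exists(dst):
            os.link(src, dst)


def client_backup(root, filename):
    """Client: write an increment into incoming/next, then rename it to 'finished'."""
    incoming = os.path.join(root, "incoming")
    nxt = os.path.join(incoming, "next")
    if not os.path.isdir(nxt):
        return False  # the previous upload has not been collected yet
    with open(os.path.join(nxt, filename), "w") as f:
        f.write("increment")  # stand-in for the real duplicity upload
    os.rename(nxt, os.path.join(incoming, "finished"))  # signals the collector
    return True


def collect(root, next_set):
    """Archive: file 'finished' under the set named in next_info, then prepare 'next'."""
    incoming = os.path.join(root, "incoming")
    finished = os.path.join(incoming, "finished")
    info = os.path.join(root, "next_info")
    if not os.path.isdir(finished):
        return False  # nothing new has been uploaded
    with open(info) as f:
        current_set = f.read().strip()  # assumed format: one set name
    _link_files(finished, os.path.join(root, "backups", current_set))
    shutil.rmtree(finished)
    nxt = os.path.join(incoming, "next")
    os.makedirs(nxt)
    seed = os.path.join(root, "backups", next_set)
    if os.path.isdir(seed):
        _link_files(seed, nxt)  # basis for the next incremental backup
    with open(info, "w") as f:
        f.write(next_set + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "incoming", "next"))
        with open(os.path.join(root, "next_info"), "w") as f:
            f.write("2012-12-02\n")
        print(client_backup(root, "inc1"), collect(root, "2012-12-02"))
```

The rename is the only signal passed between the two sides, which is why a second client run before collection simply exits: 'next' no longer exists until the collector recreates it.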
Features to be documented
=========================
The following are all implemented in the code (the duplicity-ex and
duplicity-ex-snap utilities in lib/), but so far undocumented beyond the
code itself:
* extending duplicity to efficiently store hardlinks
* extending duplicity to store ACLs and xattrs
* self-consistent backups using LVM snapshots
* how to restore using the restore scripts