About:
------
PCP: a parallel copy program for lustre.
Latest Version:
---------------
The latest version of pcp can be downloaded from github:
https://github.com/wtsi-ssg/pcp
Requirements:
-------------
pcp requires the following:
mpi4py
http://mpi4py.scipy.org/
and optionally the Lustre C api library,
liblustreapi.so or liblustreapi.a
This is usually part of the lustre client package.
Installation
------------
Install pcp using the usual distutils method:
python setup.py build
python setup.py install # usually requires root access
If running as a non privileged user, you can install in your home directory
by running:
python setup.py install --user
Lustre functionality
--------------------
pcp requires the lustreapi library for its lustre features. If the library
is not detected during installation or execution, pcp will disable the
lustre functions and issue a warning.
If you have the lustreapi libraries installed in a non-standard location, you
can use the following option to setup.py
--with-liblustre=/path/to/library
lustre clients prior to v 2.3 shipped with a static version of liblustreapi
rather than a dynamic one. pcp requires a dynamic library. The build process
will look for liblustreapi.so first. If it does not find one, it will look for
liblustreapi.a and then convert it into a shared library. The shared library
will be installed alongside the python module.
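
The detect-or-disable behaviour described above can be sketched roughly as
follows. This is not pcp's actual loader; the function name load_lustreapi and
the warning text are hypothetical, and the lookup uses only the standard
ctypes module.

```python
import ctypes
import ctypes.util
import warnings


def load_lustreapi():
    """Return a handle to liblustreapi, or None if it cannot be loaded.

    Hypothetical helper: pcp's real detection logic may differ.
    """
    # find_library searches the standard system locations only.
    name = ctypes.util.find_library("lustreapi")
    if name is None:
        warnings.warn("liblustreapi not found; lustre features disabled")
        return None
    try:
        return ctypes.CDLL(name)
    except OSError:
        warnings.warn("liblustreapi could not be loaded; lustre features disabled")
        return None
```

A caller would treat a None result as "run without lustre features" rather
than as a fatal error.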
Manual liblustreapi installation
--------------------------------
If setup.py is unable to convert the library automatically, you can try doing
it by hand. The procedure to convert the library is something along these
lines:
ar -x liblustreapi.a
gcc -shared -o liblustreapi.so *.o
Place the resulting library in the /lustre directory in the source tree, and
then re-run the setup.py build / install step.
Usage
-----
pcp is similar to cp -r; simply give it a source directory and destination
and pcp will recursively copy the source directory to the destination in
parallel.
pcp has a number of useful options; use pcp -h to see a description.
The program runs in three phases:
In phase I, the source directory tree is crawled, the destination directory
tree is created and files are marked for copying depending on the runtime
parameters (see below).
In phase II, the files themselves are copied, and optionally checksummed.
If the "preserve" option has been selected, phase III runs and directory
timestamps are copied.
Chunking
--------
pcp will split files up into 500MB chunks, which can then be copied in
parallel. Files smaller than the chunk size will be copied in one go. You can
alter the chunk size with the -b flag. The size is in megabytes. For optimal
performance the chunk size should be a whole multiple of the lustre stripe size
(typically 1MB) and lustre block size (typically 2MB).
You can disable the chunk copy feature by setting the chunk size to 0.
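
As an illustration of the chunking arithmetic, the sketch below computes the
byte ranges a file would be split into. The function name chunk_ranges is
hypothetical, and treating 1MB as 1024 * 1024 bytes is an assumption; pcp's
own implementation may differ.

```python
MB = 1024 * 1024  # assumption: megabytes are binary (2**20 bytes)


def chunk_ranges(file_size, chunk_mb=500):
    """Return a list of (offset, length) pairs covering file_size bytes.

    A chunk size of 0 disables chunking, so the whole file is one range.
    Files no larger than the chunk size are also returned as one range.
    """
    chunk = chunk_mb * MB
    if chunk == 0 or file_size <= chunk:
        return [(0, file_size)]
    return [(offset, min(chunk, file_size - offset))
            for offset in range(0, file_size, chunk)]
```

With the default chunk size, a 1200MB file yields two 500MB ranges followed by
one 200MB range, each of which could be handed to a different worker.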
Checksum
--------
If run with the -c flag, pcp will md5 checksum files after they are copied.
In order to minimize the risk of checksumming cached data, rather than data on
disk, pcp runs the md5 calculation on a different MPI rank to the one that did
the copy and uses posix_fadvise to tell the kernel not to cache files.
If a file was copied in chunks, the md5 checksum reported is for the individual
chunk, not the whole file.
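
A minimal sketch of checksumming a byte range while discouraging the kernel
from serving cached pages is shown below. The function name md5_uncached is
hypothetical, and the choice of POSIX_FADV_DONTNEED is an assumption; the
text above only says that posix_fadvise is used.

```python
import hashlib
import os


def md5_uncached(path, offset=0, length=None, bufsize=1024 * 1024):
    """Return the md5 hex digest of a byte range of the file at path.

    length=None means "read to end of file". Before reading, the kernel is
    advised to drop cached pages for the file (assumed advice flag).
    """
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        if hasattr(os, "posix_fadvise"):
            try:
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
            except OSError:
                pass  # advice is best effort; carry on without it
        f.seek(offset)
        remaining = length
        while remaining is None or remaining > 0:
            want = bufsize if remaining is None else min(bufsize, remaining)
            data = f.read(want)
            if not data:
                break
            md5.update(data)
            if remaining is not None:
                remaining -= len(data)
    return md5.hexdigest()
```

Calling it with the offset and length of a single chunk gives a per-chunk
digest, matching the per-chunk reporting described above.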
lustre striping
---------------
If run with the -l flag, pcp will be lustre stripe aware. When it encounters
a striped file it will stripe the copy across all OSTs at the destination. Note
that it does not exactly preserve stripe information, but copes with the case where
the number of OSTs is different at the source and destination.
If -lf is set, pcp will stripe all files and directories at the destination.
-l requires that the source and destination filesystems are both lustre.
-lf can be used when only the destination filesystem is lustre.
The following options can be used in conjunction with -l/-lf to further modify
striping behaviour:
If a size is specified with -ls, pcp will not stripe files smaller than this,
even if the original is striped.
If -ld is specified, destination directories will not be striped. (The contents
themselves may still be striped).
Update copy
-----------
If run with the -u flag, pcp will only copy a file if the source file is
newer than the destination file (mtime), or if the destination file does not
exist. Note there are some potential pitfalls to using update copies (see
below).
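
The update rule can be sketched as a simple mtime comparison. The function
name needs_copy is hypothetical and this is not pcp's actual code.

```python
import os


def needs_copy(src, dst):
    """Return True if dst is missing or src has a newer mtime than dst."""
    try:
        dst_mtime = os.stat(dst).st_mtime
    except FileNotFoundError:
        return True  # no destination file yet, so it must be copied
    return os.stat(src).st_mtime > dst_mtime
```

Because only mtimes are compared, a destination file that exists but is
incomplete would not be selected for copying by a check like this.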
Checkpointing
-------------
pcp has checkpoint support. It allows a copy, interrupted for any reason, to be
safely restarted.
Note that checkpoints will only be taken once the program has entered phase II
(copy) stage.
If you specify a dumpfile with -K, a checkpoint will be written every 60
minutes. The checkpoint period can be varied with the -Km option. If the
program runs into an exception, it will also write out a checkpoint before it
exits.
Alternatively, pcp can be made to generate a checkpoint by sending it SIGUSR1,
even if -K or -Km have not been set. If no dumpfile has been specified with -K,
it will default to writing out to "pcp_checkpoint.db" in the program's current
working directory.
To resume from checkpoint, start pcp with the -R <dumpfile> option. All pcp
command line parameters will be taken from the dumpfile; any other command
line arguments you give to pcp will be ignored.
Note that you can safely resume a checkpoint using a different number of
MPI processes than the original.
update copy, failures and checkpoints
-------------------------------------
It is only safe to run update copies after an initial file copy has completed
successfully. You should NOT use an update copy to restart a file transfer if
the original copy has failed. pcp only uses file mtimes, and not file
contents to decide whether a file needs updating. Any partially transferred
files will be left as is and not re-copied.
In this situation, you should restart the failed copy from a checkpoint
instead. The checkpoint/restore process ensures that partially transferred
files are re-copied.
It is also safe to restart a failed update copy from a checkpoint. However, any
modifications made to the source tree after the initial copy was started will
be ignored.
Incremental Backups
-------------------
pcp can be used to run incremental backups by using the -i flag to specify the
location of a previous copy (PREVBKUP). If a source file matches the corresponding
file in PREVBKUP (i.e. is unchanged since that copy was taken), the file in PREVBKUP
is hard linked to the destination. If there is no file in PREVBKUP then the file
is copied from the source to the destination. Doing incremental copies implies
the -p flag, as pcp uses all of the file attributes to determine whether a file
has changed or not.
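
The link-or-copy decision can be sketched as below. The function names and
the exact set of attributes compared (size, mtime, mode, owner, group) are
assumptions made for illustration; pcp's own comparison may use a different
set.

```python
import os
import shutil


def same_attributes(a, b):
    """Compare the attributes assumed here to define an unchanged file."""
    sa, sb = os.stat(a), os.stat(b)
    return ((sa.st_size, sa.st_mtime_ns, sa.st_mode, sa.st_uid, sa.st_gid) ==
            (sb.st_size, sb.st_mtime_ns, sb.st_mode, sb.st_uid, sb.st_gid))


def backup_file(src, prev, dst):
    """Hard link prev to dst if src is unchanged, otherwise copy src to dst."""
    if os.path.exists(prev) and same_attributes(src, prev):
        os.link(prev, dst)  # unchanged: share the data with the previous backup
        return "linked"
    shutil.copy2(src, dst)  # new or changed: copy data and attributes
    return "copied"
```

Hard links only work within a single filesystem, so in this sketch the
previous backup and the destination must live on the same one.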
Other Useful Options
--------------------
A dead worker timeout can be specified with -d; if workers do not respond
within timeout seconds of the job starting, the job will terminate.
If run with -p (preserve), pcp will attempt to preserve the ownership,
permissions and timestamps of the copied files.
Invocation
----------
pcp should be invoked by mpirun, and needs at least 2 tasks to run correctly.
For maximum efficiency, ensure tasks are spread across the cluster to maximise
network bandwidth to the filesystem. On most systems that means spreading tasks
across as many machines as possible, rather than packing them together on as
few hosts as possible.
Consult your queuing system and local MPI documentation for the
appropriate commands to achieve this.
Example LSF bsub distributing jobs across as many hosts as possible:
bsub -R "span[ptile=1]" -oo logfile.txt -n 4 mpirun pcp ...