layout | title | subtitle | minutes |
---|---|---|---|
page | First Steps in LHCb | More Ganga | 10 |
- Set the input data with BKQuery
- Use LHCbDatasets
- Set the location of the output of our jobs
- Set the location of your .gangadir
- Access output stored on the grid
The input data for a job can be specified with the BKQuery tool. Find the bookkeeping path of the data using the online Dirac portal and pass it to BKQuery to get the dataset. For example, to run over the Stripping 21 MagUp Semileptonic stream:
Ganga In [3]: j.inputdata = BKQuery('/LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp/Real Data/Reco14/Stripping21r0p1a/90000000/SEMILEPTONIC.DST').getDataset()
Ganga In [4]: j.inputdata
Ganga Out [4]:
LHCbDataset (
depth = 0,
treat_as_inputfiles = False,
persistency = None,
files = [3717 Entries of type 'DiracFile'] ,
XMLCatalogueSlice = LocalFile (
namePattern = ,
compressed = False,
localDir =
)
)
This is a list of DiracFile objects, the Ganga object for files stored on the grid. We can get a URL for accessing one of them locally via its accessURL function:
Ganga In [5]: j.inputdata[0].accessURL()
Ganga Out [5]: ['root://bw32-4.grid.sara.nl:1094/pnfs/grid.sara.nl/data/lhcb/LHCb/Collision12/SEMILEPTONIC.DST/00051179/0000/00051179_00006978_1.semileptonic.dst']
The returned path can be used by Bender to explore the contents of the DST, as in the Interactively exploring a DST lesson.
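Both strings seen so far, the bookkeeping path given to BKQuery and the URL returned by accessURL, have a regular structure. The following is a minimal sketch in plain Python (it does not use Ganga) that takes them apart. It assumes the usual bookkeeping layout of configuration name, configuration version, conditions, processing pass, event type and file type; the helper names `split_bk_path` and `split_access_url` and the dictionary keys are made up for this illustration.

```python
from urllib.parse import urlparse


def split_bk_path(path):
    """Split a bookkeeping path into named components.

    Assumed layout (an assumption of this sketch):
    /ConfigName/ConfigVersion/Conditions/ProcessingPass.../EventType/FileType
    """
    parts = path.strip("/").split("/")
    if len(parts) < 6:
        raise ValueError("expected at least 6 components, got %d" % len(parts))
    return {
        "config_name": parts[0],
        "config_version": parts[1],
        "conditions": parts[2],
        # The processing pass can span several levels of the path.
        "processing_pass": "/".join(parts[3:-2]),
        "event_type": parts[-2],
        "file_type": parts[-1],
    }


def split_access_url(url):
    """Split an access URL into protocol, host, port and remote path."""
    parsed = urlparse(url)
    return {
        "protocol": parsed.scheme,
        "host": parsed.hostname,
        "port": parsed.port,
        "path": parsed.path,
    }


bk_path = ("/LHCb/Collision12/Beam4000GeV-VeloClosed-MagUp/Real Data/"
           "Reco14/Stripping21r0p1a/90000000/SEMILEPTONIC.DST")
bk_info = split_bk_path(bk_path)

access_url = ("root://bw32-4.grid.sara.nl:1094/pnfs/grid.sara.nl/data/lhcb/LHCb/"
              "Collision12/SEMILEPTONIC.DST/00051179/0000/"
              "00051179_00006978_1.semileptonic.dst")
url_info = split_access_url(access_url)

for key, value in bk_info.items():
    print("%-16s %s" % (key, value))
for key, value in url_info.items():
    print("%-16s %s" % (key, value))
```

Reading the path this way makes it easier to see which part to change when you want, say, the other magnet polarity or a different stream.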
In the previous lesson we looked at the location of the output with jobs(782).outputdir. This location points to the gangadir, where Ganga stores information about the jobs and their output. If we have lots of jobs with large files, the file system where the gangadir is located will quickly fill up.
The location of the gangadir can be changed in the configuration file ~/.gangarc. Search for the gangadir attribute and set it to the location you prefer (on CERN AFS, the work area is a popular choice).
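To check which gangadir a configuration file points to, the file can be read as an INI-style file. The sketch below uses only the Python standard library and parses a small sample string rather than your real ~/.gangarc. It assumes the gangadir option lives in a section called [Configuration]; the path shown and the helper name `read_gangadir` are hypothetical.

```python
import configparser

# Hypothetical excerpt of a ~/.gangarc file; the path is an example only.
SAMPLE_GANGARC = """
[Configuration]
gangadir = /afs/cern.ch/work/a/auser/gangadir
"""


def read_gangadir(text):
    """Return the gangadir setting from INI-style text, or None if absent."""
    # Interpolation is disabled so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Assumption of this sketch: the option sits in the [Configuration] section.
    return parser.get("Configuration", "gangadir", fallback=None)


gangadir = read_gangadir(SAMPLE_GANGARC)
print(gangadir)
```

To inspect a real file, its contents could be passed to the same function, for example by reading it with `open(os.path.expanduser('~/.gangarc')).read()`.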
To avoid filling up your file space, it is wise to put the large files produced by your job somewhere with lots of storage: the grid. You can do so by setting the outputfiles attribute:
j.outputfiles = [DiracFile('*.root'), LocalFile('stdout')]
The DiracFile will be stored in your user area on the grid (with up to 2TB personal capacity), from where you can access it with the accessURL() function as before. The wildcard means that any ROOT file produced by your job will stay on the grid. LocalFile downloads the file to your gangadir, in this case the one called stdout.
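To get a feel for what the patterns in outputfiles select, the sketch below applies shell-style wildcard matching to a list of file names. It is plain Python and only mimics the idea; it is not how Ganga handles output files internally. The function `route_outputs` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase


def route_outputs(produced, grid_patterns, local_patterns):
    """Sort file names into grid, local and unmatched by wildcard pattern."""
    routed = {"grid": [], "local": [], "unmatched": []}
    for name in produced:
        if any(fnmatchcase(name, pattern) for pattern in grid_patterns):
            routed["grid"].append(name)
        elif any(fnmatchcase(name, pattern) for pattern in local_patterns):
            routed["local"].append(name)
        else:
            routed["unmatched"].append(name)
    return routed


# Hypothetical files produced by a job.
produced = ["tuple.root", "histos.root", "stdout", "stderr"]

# Patterns corresponding to DiracFile('*.root') and LocalFile('stdout').
routed = route_outputs(produced, grid_patterns=["*.root"], local_patterns=["stdout"])

for destination, names in routed.items():
    print(destination, names)
```

In this example both ROOT files match `*.root` and would go to the grid, `stdout` matches the local pattern, and `stderr` matches neither.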
Small files, such as .root files and log files, are downloaded by default. Files that are expected to be large, such as those with the .dst extension, are kept on the grid as Dirac files by default. In general, you are encouraged to keep your large files on the grid to avoid moving large amounts of data through your work area.
To find out more, take a look at the Ganga FAQ.