Generating Training/Validation Samples

Warning

This tutorial is no longer maintained, and some parts may no longer work.

To train the RNN energy estimator, you first need a training sample. A training sample can be extracted from any art dataset with the VLNEnergyDataGen module of dunetpc.

This tutorial extracts a training sample from the MCC11 DUNE FD dataset prodgenie_nu_dune10kt_1x2x6_mcc11_lbl_reco.

Grid Job Submission

The VLNets package of dunetpc ships with a sample job configuration, vlnenergydatagenjob.fcl, that can be used as-is to create training samples for NuMu CC energy estimation.

The preferred way to submit jobs on DUNE is project.py. The following project.py stage configuration extracts a training dataset from the DUNE files:

<stage name="training_sample_generation">
  <fcl>vlnenergydatagenjob.fcl</fcl>
  <inputdef>prodgenie_nu_dune10kt_1x2x6_mcc11_lbl_reco</inputdef>
  <datafiletypes>csv</datafiletypes>
  <numjobs>250</numjobs>
  <schema>root</schema>
  <outdir>&OUTDIR;</outdir>
  <workdir>&WORKDIR;</workdir>
</stage>

Here, OUTDIR and WORKDIR are expected to be defined by the user. After vlnenergydatagenjob.fcl has run on the grid, the extracted training data is stored in CSV files matching ${OUTDIR}/*/*.csv. To complete the training sample generation, merge these CSV files into a single file.
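Before merging, it can be useful to confirm that the grid jobs actually produced output. The following is a minimal Python sketch, not part of dunetpc or vlne, that collects the files matching ${OUTDIR}/*/*.csv and groups them by their first line. It assumes that each job writes a CSV file whose first line is a header; the function name summarize_job_outputs is hypothetical.

```python
import glob
import os
from collections import defaultdict


def summarize_job_outputs(outdir):
    """Group per-job CSV files under outdir/*/*.csv by their header line.

    Assumption (not confirmed by this tutorial): the first line of each
    CSV file is a header. Empty files are grouped under the key None.
    """
    groups = defaultdict(list)
    pattern = os.path.join(outdir, "*", "*.csv")
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", newline="") as f:
            first_line = f.readline()
        # An empty string means the file has no content at all.
        header = first_line.rstrip("\r\n") if first_line else None
        groups[header].append(path)
    return dict(groups)
```

If the assumption holds and all jobs succeeded, the returned dictionary should have a single key. A None key or several distinct headers would suggest failed or inconsistent jobs that are worth inspecting before the merge.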

Merging Job Output Files

The vlne package provides a bash script, merge_csv.sh, that merges multiple CSV files into one. The script is located in the scripts/data directory of the vlne package. In addition to merging the files, it compresses the result with xz.
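To illustrate what such a merge amounts to, here is a minimal Python sketch of one way to concatenate CSV files into a single xz-compressed file. It is not the implementation of merge_csv.sh, whose internals are not described here. It assumes that every input file starts with the same header line, which should appear only once in the output; the function name merge_csv_to_xz is hypothetical.

```python
import lzma


def merge_csv_to_xz(output_path, input_paths):
    """Concatenate CSV files into one xz-compressed CSV file.

    Assumption: every non-empty input starts with the same header line.
    The header is written once; a differing header raises ValueError.
    Returns the number of non-header lines written.
    """
    header = None
    n_lines = 0
    with lzma.open(output_path, "wt", newline="") as out:
        for path in input_paths:
            with open(path, "r", newline="") as f:
                raw = f.readline()
                if not raw:
                    continue  # skip files with no content
                first_line = raw.rstrip("\r\n")
                if header is None:
                    header = first_line
                    out.write(header + "\n")
                elif first_line != header:
                    raise ValueError(f"header mismatch in {path}")
                for line in f:
                    # Normalize line endings so that a file without a
                    # trailing newline does not run into the next file.
                    out.write(line.rstrip("\r\n") + "\n")
                    n_lines += 1
    return n_lines
```

Raising on a header mismatch is a deliberate choice in this sketch: silently concatenating files with different columns would produce a merged file that parses but is wrong.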

To merge the job output files with merge_csv.sh, run:

bash merge_csv.sh MERGED_FILE_NAME.csv.xz "${OUTDIR}"/*/*.csv

Once merge_csv.sh has finished, the resulting file MERGED_FILE_NAME.csv.xz can be used to train vlne networks.
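Before starting a training run, a quick look at the merged file can catch an empty or truncated result. The sketch below reads an xz-compressed CSV file with Python's standard library and reports its column names and row count. It assumes that the merged file begins with a single header row; inspect_merged_csv is a hypothetical helper, not part of vlne.

```python
import csv
import lzma


def inspect_merged_csv(path):
    """Return (column_names, n_rows) for an xz-compressed CSV file.

    Assumption: the first record of the file is a header row.
    Blank lines are not counted as rows.
    """
    with lzma.open(path, "rt", newline="") as f:
        reader = csv.reader(f)
        columns = next(reader, None)
        if columns is None:
            raise ValueError(f"{path} contains no data")
        n_rows = 0
        for row in reader:
            if row:  # csv.reader yields [] for blank lines
                n_rows += 1
    return columns, n_rows
```

For example, `inspect_merged_csv("MERGED_FILE_NAME.csv.xz")` decompresses the file on the fly, so no uncompressed copy needs to be written to disk. A damaged xz stream would surface as an lzma.LZMAError while reading.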