How to read the logs
This tutorial explains what you will find in the YAMP log file.
YAMP returns very detailed logs to ensure the so-called retrospective provenance, which captures the actual steps executed during the analysis along with all the information about the execution environment. The so-called prospective provenance, which describes the steps that should be performed during the analysis, is instead captured by the Nextflow workflow. Prospective and retrospective provenance may not overlap, for instance because of conditional execution. For a more detailed discussion on provenance, see, for instance, Susan B. Davidson and Juliana Freire, "Provenance and scientific workflows: challenges and opportunities", Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data.
Logs depend on the analysis flow. For instance, when YAMP is run in QC mode, only information about the quality control steps is logged; the quality of paired-end reads is assessed on both strands; and de-duplication logs are available only if the de-duplication step is carried out.
Logs are generated by MultiQC and provided as an HTML page.
Screenshots on this page have been taken from the log generated by the following command:
nextflow run YAMP.nf -profile test,docker
FastQC is used by YAMP to perform quality assessment and visualisation on both strands of the raw paired-end reads (if using a single-end library layout, only one file will be analysed). The generated report is then included in the YAMP log file:
Since this test asked for de-duplication (parameter dedup is set to true in ./conf/test.config), the log continues with information on the de-duplication step:
If dedup had been set to false, this would also have been recorded in the logs:
After the optional de-duplication step, synthetic contaminants, specified in the artefacts and phix174ill parameters, are removed, and stats are provided in the logs:
Reads are then trimmed, adapters are removed, and reads that become too short are discarded. Stats are then provided in the logs:
When you provide a FASTA file describing the contaminant (pan)genome (parameter: foreign_genome), it is indexed beforehand by an additional process, and this action is recorded in the log:
However, if you provide an already indexed contaminant (pan)genome (parameter: foreign_genome_ref), this step is skipped and the log includes only stats about the decontamination step:
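The conditional behaviour described so far (optional de-duplication, indexing only when a FASTA file is supplied) can be summarised in a small sketch. This is not YAMP code: the function name, the section labels, and the rule that an already indexed genome takes precedence when both parameters are set are assumptions made for illustration only. The parameter names dedup, foreign_genome and foreign_genome_ref come from this page.

```python
from typing import List, Optional


def expected_qc_log_sections(
    dedup: bool,
    foreign_genome: Optional[str] = None,
    foreign_genome_ref: Optional[str] = None,
    single_end: bool = False,
) -> List[str]:
    """Return illustrative labels for the QC sections one would expect in the log.

    The labels are hypothetical; they are not the names YAMP or MultiQC use.
    """
    sections = []
    # Raw reads: two files for paired-end layouts, one for single-end.
    n_files = 1 if single_end else 2
    sections.append(f"raw reads quality assessment ({n_files} file(s))")
    # De-duplication is recorded whether or not it was carried out.
    sections.append("de-duplication" if dedup else "de-duplication (skipped)")
    sections.append("synthetic contaminants removal")
    sections.append("trimming")
    # Indexing is logged only when a FASTA file is given and no index is.
    # Assumption: a pre-built index wins if both parameters are set.
    if foreign_genome is not None and foreign_genome_ref is None:
        sections.append("contaminant (pan)genome indexing")
    sections.append("decontamination")
    sections.append("decontaminated reads quality assessment")
    return sections
```

For example, expected_qc_log_sections(dedup=True, foreign_genome_ref="index_dir") lists no indexing section, while passing foreign_genome="contaminants.fa" instead adds one just before decontamination (both paths are placeholders).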
Decontaminated reads, which are stored in a single file including both strands, are then quality assessed, and information similar to that logged for the raw reads is reported:
The first step of the community characterisation analysis block is the taxonomic binning and profiling, performed using MetaPhlAn. Stats on the identified microbial community are printed:
The next analysis step is the functional annotation. In the YAMP log file, only an excerpt of the HUMAnN log is reported, while more detailed statistics can be found in the test_HUMAnN.log file (see the How to run YAMP tutorial).
The information on microbial community composition is then used to assess several alpha-diversity measures using QIIME2, and statistics on this step are also logged:
If fewer than three species are detected, the alpha-diversity step is not run, and this is recorded accordingly.
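The three-species threshold can be illustrated with a minimal sketch. It is not how YAMP or QIIME2 implement the step: the function name is hypothetical, the Shannon index is used only as a familiar example of an alpha-diversity measure, and the natural logarithm is an arbitrary choice for this sketch.

```python
import math
from typing import Dict, Optional

MIN_SPECIES = 3  # threshold stated in this tutorial


def shannon_if_enough_species(abundances: Dict[str, float]) -> Optional[float]:
    """Return the Shannon index, or None when fewer than three species are detected."""
    # Only species with a positive abundance count as detected.
    detected = {name: value for name, value in abundances.items() if value > 0}
    if len(detected) < MIN_SPECIES:
        return None  # the step is skipped; the caller can record this in the log
    total = sum(detected.values())
    # Shannon index: -sum(p * ln p) over the relative abundances p.
    return -sum((v / total) * math.log(v / total) for v in detected.values())
```

With two detected species the function returns None, mirroring the skipped step. With three or more it returns a single diversity value.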
The first section of the analysis introspection reports the version of all the pieces of software used by YAMP:
The second section prints extensive runtime information, including:
- pipeline's name and version
- details of the Nextflow, Java, and operating system environments
- information on the configuration profiles used
- container engine and containers used
- running parameters, including links to external files/databases queried during the execution
When running YAMP in characterisation mode, YAMP expects the QC'ed reads to be stored in a single file including both strands. If you are providing two files (one for the forward and one for the reverse reads; see How to run YAMP with QC'ed reads for details), YAMP will concatenate them, and this operation will be recorded in the logs:
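To make the concatenation concrete, here is a minimal sketch of joining a forward and a reverse read file into one. It only illustrates the idea and is not YAMP's actual process; the function name and the file names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def concatenate_reads(forward: Path, reverse: Path, merged: Path) -> Path:
    """Write the forward reads followed by the reverse reads into one file."""
    with merged.open("wb") as out:
        for part in (forward, reverse):
            with part.open("rb") as src:
                # Copy raw bytes in chunks, so large files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return merged
```

A call such as concatenate_reads(Path("sample_R1.fq"), Path("sample_R2.fq"), Path("sample.fq")) produces a single file with the forward records first. Because the copy is byte-wise, it also works for gzip-compressed inputs, since concatenated gzip members form a valid gzip stream.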