How to read the logs

Alessia Visconti edited this page Mar 23, 2021 · 2 revisions

This tutorial explains what you will find in the YAMP log file.

Provenance and reproducibility

YAMP returns very detailed logs to ensure so-called retrospective provenance, which captures the steps actually executed during the analysis along with all the information about the execution environment. So-called prospective provenance, which describes the steps that should be performed during the analysis, is instead captured by the Nextflow workflow. Prospective and retrospective provenance may not overlap, for instance because of conditional execution. For a more detailed discussion of provenance, see, for instance, Susan B. Davidson and Juliana Freire, "Provenance and scientific workflows: challenges and opportunities", Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data.

The YAMP log file

Logs depend on the analysis flow. For instance, when YAMP is run in QC mode, only information about the quality control steps is logged; the quality of paired-end reads is assessed on both strands; and de-duplication logs are available only if the de-duplication step is carried out.
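The way the log content follows the analysis flow can be sketched in a few lines of Python. This is only an illustration, not YAMP code: the function name, its arguments, and the section labels are hypothetical, and the "complete" mode value is an assumption (only the QC and characterisation modes are named on this page).

```python
def expected_log_sections(mode, dedup=True, paired_end=True,
                          contaminant_preindexed=False):
    """Return, in order, the log sections one would expect for a run.

    Hypothetical helper: labels mirror the headings of this page, and the
    'complete' mode (QC followed by characterisation) is an assumption.
    """
    if mode not in ("QC", "characterisation", "complete"):
        raise ValueError(f"unknown mode: {mode!r}")

    sections = []
    if mode in ("QC", "complete"):
        files = "both strands" if paired_end else "one file"
        sections.append(f"Quality assessment, raw data ({files})")
        # Skipping de-duplication is itself recorded in the log.
        sections.append("QC step 1: de-duplication" if dedup
                        else "QC step 1: de-duplication skipped")
        sections.append("QC step 2: removing synthetic contaminants")
        sections.append("QC step 3: trimming")
        # A FASTA contaminant (pan)genome is indexed first; a pre-built
        # index skips that extra process.
        if not contaminant_preindexed:
            sections.append("QC step 4: indexing the contaminant (pan)genome")
        sections.append("QC step 4: decontamination")
        sections.append("Quality assessment, QC'd reads")
    if mode in ("characterisation", "complete"):
        sections.append("Characterisation step 1: taxonomic binning and profiling")
        sections.append("Characterisation step 2: functional annotation")
        sections.append("Characterisation step 3: alpha-diversity")
    sections.append("Analysis introspection")
    return sections


for section in expected_log_sections("QC", dedup=False):
    print(section)
```

Running the sketch for a QC-only run without de-duplication lists the raw-data quality assessment, a note that de-duplication was skipped, the remaining QC steps, and the introspection, with no community characterisation entries.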

Logs are generated by MultiQC and provided as an HTML page.

Screenshots on this page have been taken from the log generated by the following command:

nextflow run YAMP.nf -profile test,docker

Quality Assessment, raw data

FastQC is used by YAMP to perform quality assessment and visualisation on both strands of the raw paired-end reads (with a single-end library layout, only one file is analysed). The generated report is then included in the YAMP log file:

Quality Control, Step 1: de-duplication

Since this test asked for de-duplication (parameter dedup is set to true in ./conf/test.config), the log will continue with information on the de-duplication step:

If dedup is set to false, this is also recorded in the logs:

Quality Control, Step 2: removing synthetic contaminants

After the optional de-duplication step, the synthetic contaminants specified in the artefacts and phix174ill parameters are removed, and stats are provided in the logs:

Quality Control, Step 3: trimming

Reads are then trimmed, adapters are removed, and reads that become too short are discarded. Stats are then provided in the logs:

Quality Control, Step 4: decontamination

When you provide a FASTA file describing the contaminant (pan)genome (parameter: foreign_genome), it is indexed beforehand by an additional process, and this action is recorded in the log:

However, if you provide an already indexed contaminant (pan)genome (parameter: foreign_genome_ref), this step is skipped and the log includes only stats about the decontamination step:

Quality Assessment, QC'd reads

Decontaminated reads, which are stored in a single file including both strands, are then quality assessed, and information similar to that logged for the raw reads is reported:

Community characterisation, Step 1: Taxonomic binning and profiling

The first step of the community characterisation analysis block is taxonomic binning and profiling, performed using MetaPhlAn. Stats on the identified microbial community are printed:

Community characterisation, Step 2: Functional annotation

The next analysis step is the functional annotation. In the YAMP log file, only an excerpt of the HUMAnN log is reported; more detailed statistics can be found in the test_HUMAnN.log file (see the How to run YAMP tutorial).

Community characterisation, Step 3: Evaluating alpha-diversity

The information on microbial community composition is then used to assess several alpha-diversity measures using QIIME2, and statistics on this step are also logged:

If fewer than three species are detected, the alpha-diversity step is not run, and this is recorded accordingly.
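The three-species condition can be checked by hand on a taxonomic profile. The snippet below is a minimal sketch, assuming a MetaPhlAn-style tab-separated profile in which the first column holds pipe-separated clade names and species-level entries end in an `s__` level. The helper names and the toy profile are hypothetical, and YAMP's own check may be implemented differently.

```python
MIN_SPECIES = 3  # threshold stated in the text above


def count_species(profile_lines):
    """Count species-level clades in a MetaPhlAn-style profile."""
    count = 0
    for line in profile_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        clade = line.split("\t")[0]
        deepest_level = clade.split("|")[-1]
        if deepest_level.startswith("s__"):
            count += 1
    return count


def alpha_diversity_can_run(profile_lines):
    """True when enough species were detected for the alpha-diversity step."""
    return count_species(profile_lines) >= MIN_SPECIES


# Toy profile with two species only (made-up abundances).
toy_profile = [
    "#clade_name\trelative_abundance",
    "k__Bacteria\t100.0",
    "k__Bacteria|p__Firmicutes\t100.0",
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus_mitis\t60.0",
    "k__Bacteria|p__Firmicutes|c__Bacilli|o__Lactobacillales|f__Streptococcaceae|g__Streptococcus|s__Streptococcus_oralis\t40.0",
]
print(count_species(toy_profile), alpha_diversity_can_run(toy_profile))
```

With this toy profile only two species are counted, so the sketch reports that the alpha-diversity step would not run.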

The analysis introspection

The first section of the analysis introspection reports the version of all the pieces of software used by YAMP:

while the second section prints extensive runtime information, including:

  • pipeline's name and version
  • details of the Nextflow, Java, and operating system environments
  • information on the configuration profiles used
  • container engine and containers used
  • running parameters, including links to external files/databases queried during the execution

If you are using independently QC'd reads

When running in characterisation mode, YAMP expects the QC'd reads to be stored in a single file including both strands. If you are providing two files (one for the forward and one for the reverse reads; see How to run YAMP with QC'ed reads for details), YAMP will concatenate them, and this operation will be recorded in the logs:
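For readers who prefer to prepare the single file themselves, the concatenation can be sketched as follows. This is an illustration under assumptions, not the command YAMP runs: the function names and file names are hypothetical, and the inputs are assumed to be gzip-compressed FASTQ files. It relies on the fact that gzip files concatenated byte-wise form a valid multi-member gzip file.

```python
import gzip
import shutil


def concatenate_reads(forward_path, reverse_path, merged_path):
    """Append the reverse reads to the forward reads in a single file."""
    with open(merged_path, "wb") as merged:
        for path in (forward_path, reverse_path):
            with open(path, "rb") as part:
                # Byte-wise copy: no decompression is needed.
                shutil.copyfileobj(part, merged)


def count_fastq_records(path):
    """Count records in a gzip-compressed FASTQ file (4 lines per record)."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for _ in handle) // 4
```

A call such as `concatenate_reads("sample_R1.fq.gz", "sample_R2.fq.gz", "sample.fq.gz")` (hypothetical file names) would produce the merged file, and `count_fastq_records` can confirm that its record count equals the sum of the two inputs.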