How to run YAMP on an HPC

Alessia Visconti edited this page May 16, 2017 · 4 revisions

This tutorial explains how to run YAMP on High-Performance Computing (HPC) facilities, and includes some information that is also useful when running locally!

Nextflow executor

To run on different systems (HPC or not), YAMP takes advantage of the Nextflow framework and, specifically, of its executors. Briefly, a Nextflow executor is the component that specifies on which system YAMP is run and orchestrates its execution.

In fact, one of the parameters to be set in the nextflow.config file is the executor. In the How to run YAMP tutorial we set it to PBS/TORQUE, using the following instruction:

	executor = 'pbs'

but Nextflow supports multiple executors, among others:

  • SGE: executor = 'sge'
  • LSF: executor = 'lsf'
  • SLURM: executor = 'slurm'

You can find more information on the supported executors in the Nextflow documentation.

To run YAMP locally, you should comment out or remove the executor parameter in the nextflow.config file.
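As a quick illustration, the executor selection might look like the following in nextflow.config. This is a sketch, assuming the setting lives in the process scope (as in the How to run YAMP tutorial); SLURM is used here as an example, but any of the executors listed above would work the same way:

```groovy
// Sketch of executor selection in nextflow.config (process scope).
// Swap 'slurm' for 'pbs', 'sge', or 'lsf' depending on your scheduler,
// or comment out the line entirely to run locally.
process {
    executor = 'slurm'
}
```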

Nextflow queue

Usually, schedulers offer users multiple queues to which jobs can be submitted. For instance, there may be queues dedicated to short or long jobs, or to jobs that require low or high memory. One of the parameters to set in the nextflow.config file is the queue.

In the How to run YAMP tutorial we set it to metagenome, assuming a queue dedicated to metagenomic analyses is available!

	queue = 'metagenome'

To run on your system, you should simply specify the name of your queue(s), for instance:

	queue = 'highmem,long-highmem'

You can find more information on the queue directive in the Nextflow documentation.
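Putting the two settings together, a minimal sketch of the scheduler section of nextflow.config could look like this. The queue names below are placeholders, not queues YAMP requires; use the names your site's scheduler actually provides (ask your HPC administrators if unsure):

```groovy
// Sketch: selecting the executor and the submission queue(s)
// in nextflow.config (process scope).
// 'highmem' and 'long-highmem' are illustrative queue names only.
process {
    executor = 'slurm'
    queue    = 'highmem,long-highmem'
}
```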

Other Nextflow directives

Nextflow makes available a number of other directives that allow allocating the correct resources on the local or remote system; they are explained in the Nextflow documentation.

Please note that the process specifications you will find in the nextflow.config file (that is, time, CPUs, and memory) have been optimised using our in-house metagenomic dataset, which is composed of about 2,000 faecal samples with very different data quality and thus very different resource requirements. These values may require some tuning, but we are confident that they will cover most users' scenarios.
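If you do need to tune resources, a per-process override is one way to do it. The sketch below shows the general shape of such an override in nextflow.config; the process name (dedup) and the values are purely illustrative and should be replaced with the actual process names and requirements observed on your own data:

```groovy
// Sketch: overriding time, CPU, and memory for a single process
// in nextflow.config. The process name and values are examples only.
process {
    withName: dedup {
        time   = '2h'
        cpus   = 4
        memory = '32 GB'
    }
}
```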