diff --git a/site/content/en/docs/Concepts/engines.md b/site/content/en/docs/Concepts/engines.md
index 1726722c3..388c154fd 100644
--- a/site/content/en/docs/Concepts/engines.md
+++ b/site/content/en/docs/Concepts/engines.md
@@ -21,6 +21,7 @@ Currently, Amazon Genomics CLI's officially supported engines can be used to run
 | [Nextflow](https://www.nextflow.io) | [Nextflow DSL](https://www.nextflow.io/docs/latest/script.html) | Standard and DSL 2 | Head Process |
 | [miniwdl](https://miniwdl.readthedocs.io/en/latest/) | [WDL](https://openwdl.org) | [documented here](https://miniwdl.readthedocs.io/en/latest/runner_reference.html?highlight=errata#wdl-interoperability) | Head Process |
 | [Snakemake](https://snakemake.readthedocs.io/en/stable/) | [Snakemake](https://snakemake.readthedocs.io/en/stable/snakefiles/writing_snakefiles.html) | All versions | Head Process |
+| [Toil](http://toil.ucsc-cgl.org/) | [CWL](https://www.commonwl.org/) | All versions up to 1.2 | Server |
 
 Overtime we plan to add additional engine and language support and provide the ability for third party developers to develop engine plugins.
 
diff --git a/site/content/en/docs/Workflow engines/toil.md b/site/content/en/docs/Workflow engines/toil.md
new file mode 100644
index 000000000..5fcbcd32b
--- /dev/null
+++ b/site/content/en/docs/Workflow engines/toil.md
@@ -0,0 +1,63 @@
+---
+title: "Toil"
+date: 2022-04-26T15:34:00-04:00
+draft: false
+weight: 20
+description: >
+  Details on the Toil engine deployed by Amazon Genomics CLI
+---
+
+## Description
+
+[Toil](http://toil.ucsc-cgl.org/) is a workflow engine developed by the
+[Computational Genomics Lab](https://cglgenomics.ucsc.edu/) at the
+[UC Santa Cruz Genomics Institute](https://genomics.ucsc.edu/). In Amazon Genomics
+CLI, Toil is an engine that can be deployed in a
+[context]( {{< relref "../Concepts/contexts" >}} ) as an
+[engine]( {{< relref "../Concepts/engines">}} ) to run workflows based on the
+[CWL](https://www.commonwl.org/) specification.
+
+Toil is an open source project distributed by UC Santa Cruz under the [Apache 2
+license](https://github.com/DataBiosphere/toil/blob/master/LICENSE) and
+available on
+[GitHub](https://github.com/DataBiosphere/toil).
+
+## Architecture
+
+There are two components of a Toil engine as deployed in an Amazon Genomics
+CLI context:
+
+### Engine Service
+
+The Toil engine is run in "server mode" as a container service in ECS. The
+engine can run multiple workflows asynchronously. Workflow tasks are run in an
+elastic [compute environment]( #compute-environment ) and monitored by Toil.
+Amazon Genomics CLI communicates with the Toil engine via a GA4GH
+[WES](https://github.com/ga4gh/workflow-execution-service-schemas) REST service
+which the server offers, available via API Gateway.
+
+### Compute Environment
+
+Workflow tasks are submitted by Toil to an AWS Batch queue and run in
+Toil-provided containers using an AWS Compute Environment. Tasks which use the
+[CWL `DockerRequirement`](https://www.commonwl.org/user_guide/07-containers/index.html)
+will additionally be run under
+[Singularity](https://github.com/sylabs/singularity#readme). AWS Batch
+coordinates the elastic provisioning of EC2 instances (container hosts) based
+on the available work in the queue. Batch will place containers on container
+hosts as space allows.
+
+#### Disk Expansion
+
+Container hosts in the Batch compute environment use EBS volumes as local
+scratch space. As an EBS volume approaches a capacity threshold, new EBS
+volumes will be attached and merged into the file system. These volumes are
+destroyed when AWS Batch terminates the container host. CWL disk space
+requirements are ignored by Toil when running against AWS Batch.
+
+This setup means that workflows that succeed on AGC may fail on other CWL
+runners (because they do not request enough disk space) and workflows that
+succeed on other CWL runners may fail on AGC (because they allocate disk space
+faster than the expansion process can react).
+
+