Running atlas on a single machine #419

botellaflotante · 2020-12-10T14:44:34Z

botellaflotante
Dec 10, 2020
Collaborator

calculate_insert_size_400GB.log
calculate_insert_size_1200GB.log

Hi there!
I found these errors running my samples in 500 GB / 1500 GB machines. I think it is related to the Tadpole2 memory, but I put 400GB and 1200GB respectively in the config file, so I am not sure why it fails... could it be that there are too many/too big samples and snakemake somehow does not make it to use resources available correctly? If run independently each sample they run OK. Also, if I run Tadpole2 step activating its own env, it will also run OK for one sample....
Best

SilasK · 2020-12-11T08:01:04Z

SilasK
Dec 11, 2020
Maintainer

Do you use the cluster mode of atlas? e.g. have you installed a cluster profile for it? I’m not sure if I already explained this before. But, if you run multiple samples on your 1.5TB machine and allow each job to take 1.2 TB you will get an error if more than one sample is executed simutaneosly. You should tell atlas the memory limit of your machine or use a cluster profile both is explained in the documentation. Unless you have abnormal big samples 100GB memory should be sufficient.

0 replies

botellaflotante · 2020-12-11T10:56:57Z

botellaflotante
Dec 11, 2020
Collaborator Author

update: however, when I run fewer samples it works... Is there any possibility to configure one wdir with the whole project but then run partially this initial qc step for smaller groups of samples? (like modifying the samples.tsv or something?)

so, in the config file I am using (for the "small machine"):
threads: 32
mem: 460

threads and memory for jobs needing high amount of memory. e.g GTDB-tk,checkm or assembly
large_mem: 460
large_threads: 32
assembly_threads: 32
assembly_memory: 460

I though that this way snakemake would distribute the resources equally to all samples... but then, it is not doing this, right? so, what would you suggest here as config, if I have 70 samples of 10 M reads each for example..?

PS: this machine has 36 CPU/500GB

0 replies

SilasK · 2020-12-11T11:10:43Z

SilasK
Dec 11, 2020
Maintainer

The memory in the config file is for a single step for a single sample. If you run atlas on a single machine (in contrast to a cluster) you have to tell atlas the max memory of your system so that it
doesn't start two jobs that would use more than the total memory.

Please follow the steps here: https://metagenome-atlas.readthedocs.io/en/latest/usage/getting_started.html#single-machine-execution

0 replies

botellaflotante · 2020-12-13T21:00:44Z

botellaflotante
Dec 13, 2020
Collaborator Author

ok, thanks a lot. I think that is the problem, indeed. Now, as there are many samples, I also put in --resources mem=1200. However, I have some big samples (100 M reads each), so for these I am trying with 500 GB in the config file (because 100 GB did not work). Let's see if it works.

0 replies

botellaflotante · 2020-12-26T19:23:59Z

botellaflotante
Dec 26, 2020
Collaborator Author

This worked fine, although it seems there is a lot to optimize.. for example, during assembly, rule error_correction, Tadpole2 step will use lot of memory (some huge samples I have of 120 M read counts each need 700 GB mem in order to work), but run_spades (which is the most time consuming step) uses really few memory.. Most of the time will be spent in run_spades, without being able to start any other sample at the same time (I can only run 2 samples at a time in the 1500 GB machine)
Is there any way to use memory more efficiently so that Tadpole2 is not a bottleneck for the other samples on the list? Maybe the cluster way would do it? or maybe is it possible to run first rule error_correction in all samples, and then changing the mem in the config file to run_spades???

0 replies

SilasK · 2021-01-04T11:45:05Z

SilasK
Jan 4, 2021
Maintainer

Yes, cluster submission is the recommended way.

You can run the pipeline up to run_spades if you really want e.g. by with the arguments: --omit-from run_spades

0 replies

botellaflotante · 2021-01-05T12:12:21Z

botellaflotante
Jan 5, 2021
Collaborator Author

OK, that worked to run the first rules (merge pairs, error_correction...for each of these samples some of these steps took 900 GB RAM). Now I lowered the mem in the config file and re-run with "atlas run assembly --resources mem=1500 --jobs 36". The idea was that multiple run_spades are lauched but only one started ...
should I do something else?

rule run_spades:
input: SRR5195142/assembly/reads/QC.errorcorr.merged_R1.fastq.gz, SRR5195142/assembly/reads/QC.errorcorr.merged_R2.fastq.gz, SRR5195142/assembly/reads/QC.errorcorr.merged_se.fastq.gz, SRR5195142/assembly/reads/QC.errorcorr.merged_me.fastq.gz
output: SRR5195142/assembly/contigs.fasta, SRR5195142/assembly/scaffolds.fasta
log: SRR5195142/logs/assembly/spades.log
jobid: 1087
benchmark: logs/benchmarks/assembly/spades/SRR5195142.txt
wildcards: sample=SRR5195142
threads: 36
resources: mem=35, time=48

I put this in the config file (just as a first try), although is probably too little memory for the assembly, not sure:

threads and memory (GB) for most jobs especially from BBtools, which are memory demanding
threads: 36
mem: 35

threads and memory for jobs needing high amount of memory. e.g GTDB-tk,checkm or assembly
large_mem: 35
large_threads: 36
assembly_threads: 36
assembly_memory: 35

0 replies

SilasK · 2021-01-05T13:42:38Z

SilasK
Jan 5, 2021
Maintainer

I think @Sofie8 has a similar problem running atlas on a single machine.

I read somewhere that spades don't parallelize well beyond 8 threads. Also, the steps for GTDB and checkM explode with too many threads.

On the other hand, you said you have big samples, then you would need much more memory for spades.

I suggest you to use something closer to the default settings:


threads: 8
mem: 80 # more or less

large_mem: 100
large_threads: 8
assembly_threads: 8 
assembly_memory: 250 #or 500

Now you would run atlas if you have 1.5 TB

atlas run assembly --resources mem=1500

I'm not sure if you should set the --jobs argument.
If you set it overwrites the config, what happens if you don't set it?

0 replies

SilasK · 2021-01-05T13:50:25Z

SilasK
Jan 5, 2021
Maintainer

From the snakemake docs:

--jobs or --cores: Use at most N CPU cores/jobs in parallel. If N is omitted or ‘all’, the limit is set to the number of available CPU cores.
So, in theory, it shouldn't overwrite the resources arguments in the config.

If you specify 8 threads in the config and run atlas with or without the jobs argument the speeds should be run with 8 threads.

0 replies

botellaflotante · 2021-01-05T13:55:39Z

botellaflotante
Jan 5, 2021
Collaborator Author

yep, that works... my mistake was putting threads: 36 in config file .... and then setting --jobs 36.... with assembly_threads: 8
assembly_memory: 250 it lauches 6 jobs. let's see if memory is enough this way. thanks!

0 replies

SilasK · 2021-01-08T13:49:03Z

SilasK
Jan 8, 2021
Maintainer

This may be linked #319

0 replies

botellaflotante · 2021-01-22T02:05:08Z

botellaflotante
Jan 22, 2021
Collaborator Author

however, I think that when having huge samples like I do (from 50 to 200 M reads each) running it in cluster mode would be the same in terms of spades needing a lot of memory (between 500- 1200 GB probably) for each sample, right? I mean what difference would it make to do it in cluster mode? can the assembly step be split in this case?

0 replies

SilasK · 2021-01-22T06:14:18Z

SilasK
Jan 22, 2021
Maintainer

Ok, seems you have very big samples. If you don't manage to assemble your reads with spades you should either subsample, split your samples or use megahit.

Why using cluster mode:

In my institution, we have a cluster system with multiple high-memory nodes. If I use Atlas in cluster-mode each sample will be sent to a different node so that I can assemble ~5-20 samples in parallel, depending on the availability of the cluster nodes. All the other steps before and after are also executed besides the assembly. In single-machine mode you submit atlas to one big cluster and in your case, you will probably only assemble one sample at a time.

I also have time constraints on my clusters. E.g. there are many nodes where I can run a job for <12 h and only one high-memory node where I can run atlas for >1d.

Spades has different checkpoints during the assembly and is configured to restart-from the last checkpoint.

By the way, where do you work?

0 replies

slambrechts · 2021-03-31T16:20:51Z

slambrechts
Mar 31, 2021

I think I'm in a similar situation here, as the average #reads across all my files is 47 M, with a maximum of 113 M for one sample (and the second one in line containing 83 M reads). Do I understand correctly that setting:

threads: 8
mem: 60

large_mem: 100
large_threads: 8
assembly_threads: 8 
assembly_memory: 250

worked in this case for both preprocessing and assembly using spades? Or do you have to increase the memory for preprocessing and then lower it again for spades? And which one should be increased/lowered again then: "mem", "large_mem" or "assembly_memory"?

I have a single machine with 36 threads and 738 GB, so I was planning to run atlas like so:

atlas run all --resources mem=732 -w $HOME $HOME/CLEAN_READS -c $HOME/config.yaml

or should I also add --jobs 36?

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running atlas on a single machine #419

{{title}}

Replies: 14 comments

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Select a reply

Running atlas on a single machine #419

botellaflotante Dec 10, 2020 Collaborator

Replies: 14 comments

SilasK Dec 11, 2020 Maintainer

botellaflotante Dec 11, 2020 Collaborator Author

SilasK Dec 11, 2020 Maintainer

botellaflotante Dec 13, 2020 Collaborator Author

botellaflotante Dec 26, 2020 Collaborator Author

SilasK Jan 4, 2021 Maintainer

botellaflotante Jan 5, 2021 Collaborator Author

SilasK Jan 5, 2021 Maintainer

SilasK Jan 5, 2021 Maintainer

botellaflotante Jan 5, 2021 Collaborator Author

SilasK Jan 8, 2021 Maintainer

botellaflotante Jan 22, 2021 Collaborator Author

SilasK Jan 22, 2021 Maintainer

slambrechts Mar 31, 2021

botellaflotante
Dec 10, 2020
Collaborator

SilasK
Dec 11, 2020
Maintainer

botellaflotante
Dec 11, 2020
Collaborator Author

SilasK
Dec 11, 2020
Maintainer

botellaflotante
Dec 13, 2020
Collaborator Author

botellaflotante
Dec 26, 2020
Collaborator Author

SilasK
Jan 4, 2021
Maintainer

botellaflotante
Jan 5, 2021
Collaborator Author

SilasK
Jan 5, 2021
Maintainer

SilasK
Jan 5, 2021
Maintainer

botellaflotante
Jan 5, 2021
Collaborator Author

SilasK
Jan 8, 2021
Maintainer

botellaflotante
Jan 22, 2021
Collaborator Author

SilasK
Jan 22, 2021
Maintainer

slambrechts
Mar 31, 2021