signal 11 error with Bowtie2 hg38 alignment #300

Open
oliviacwhite opened this issue Nov 19, 2024 · 15 comments

Comments

@oliviacwhite

oliviacwhite commented Nov 19, 2024

I am trying to run PEPATAC using conda. I am currently at the looper run examples/test_project/test_config_refgenie.yaml step, which runs fine until it reaches the bowtie2 hg38 genome alignment, where it fails with a signal 11 error (output below). I run this command as a slurm job, so I don't think it is a memory availability issue. My only guess is that something is wrong with the bowtie2_index, which I pulled from refgenie using refgenie pull hg38/bowtie2_index. I suspect this because the pull consistently downloaded the index (reaching 100%) but then reported that it was killed. The index looks like it downloaded correctly, as the .bt2 files are in the directory, but I have to assume something is going wrong that I don't understand. The only thing I can think to try for the signal 11 error is completely removing refgenie and pepatac and starting over, but I am worried about running into the same issues. Any help is greatly appreciated!

bowtie2 -p 4 --very-sensitive -X 2000 --rg-id test1 -x /path/to/refgenie_genomes/alias/hg38/bowtie2_index/default/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4 -1 /path/to/pepatac/pepatac_test/results_pipeline/test1/prealignments/test1_rCRSd_unmap_R1.fq.gz -2 /path/to/pepatac/pepatac_test/results_pipeline/test1/prealignments/test1_rCRSd_unmap_R2.fq.gz | samtools view -bS - -@ 1 | samtools sort - -@ 1 -T /path/to/pepatac/pepatac_test/results_pipeline/test1/aligned_hg38/tmph3t1gkb8 -o /path/to/pepatac/pepatac_test/results_pipeline/test1/aligned_hg38/test1_temp.bam (2971903,2971904,2971905)

(ERR): bowtie2-align died with signal 11 (SEGV) 
[main_samview] fail to read the header from "-".
samtools sort: failed to read header from "-"

Command completed. Elapsed time: 0:00:00. Running peak memory: 0.103GB.
PID: 2971905; Command: samtools; Return code: 1; Memory used: 0.0GB
PID: 2971903; Command: bowtie2; Return code: 1; Memory used: 0.009GB
PID: 2971904; Command: samtools; Return code: 1; Memory used: 0.002GB

Child process 2971879 (perl) was already terminated.
Starting cleanup: 3 files; 3 conditional files for cleanup

Cleaning up flagged intermediate files. . .

Conditional flag found: []

These conditional files were left in place:

  • /path/to/pepatac/pepatac_test/results_pipeline/test1/fastq/test1*.fastq
  • /path/to/pepatac/pepatac_test/results_pipeline/test1/fastq/*.fastq
  • /path/to/pepatac/pepatac_test/results_pipeline/test1/fastq/*.log
  • /path/to/pepatac/pepatac_test/results_pipeline/test1/prealignments/tmpu5c2lrbp
  • /path/to/pepatac/pepatac_test/results_pipeline/test1/prealignments/rCRSd_bt2
  • /path/to/pepatac/pepatac_test/results_pipeline/test1/aligned_hg38/tmph3t1gkb8

Pipeline failed at: (11-18 12:51:19) elapsed: 0.0 TIME

Total time: 0:00:06
Failure reason: Subprocess returned nonzero result. Check above output for details
Traceback (most recent call last):
  File "/path/to/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/path/to/pepatac/pipelines/pepatac.py", line 1112, in main
    pm.run([cmd, cmd2], rmdup_bam, follow=check_alignment_genome)
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 1036, in run
    list_ret, maxmem = self.callprint(
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 1316, in callprint
    self._triage_error(SubprocessError(msg), nofail)
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2539, in _triage_error
    self.fail_pipeline(e)
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
pypiper.exceptions.SubprocessError: Subprocess returned nonzero result. Check above output for details

@donaldcampbelljr
Member

Based on the error message:

(ERR): bowtie2-align died with signal 11 (SEGV) 
[main_samview] fail to read the header from "-".
samtools sort: failed to read header from "-"

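This leads me to believe that the step preceding | samtools view -bS - -@ 1 | samtools sort - -@ 1 -T, i.e. the bowtie2 call itself, is the issue. The two samtools errors are only a consequence of bowtie2 crashing before it wrote a header.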

You could run the first part of the command manually and see if it offers further insight:

bowtie2 -p 4  --very-sensitive  -X 2000  --rg-id test1 -x /path/to/refgenie_genomes/alias/hg38/bowtie2_index/default/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4 -1 /path/to/pepatac/pepatac_test/results_pipeline/test1/prealignments/test1_rCRSd_unmap_R1.fq.gz -2 /path/to/pepatac/pepatac_test/results_pipeline/test1/prealignments/test1_rCRSd_unmap_R2.fq.gz

I do not believe you need to reinstall PEPATAC. However, given that the above command does rely on genome assets via refgenie, it may be worth clearing them and re-pulling them using the instructions here: https://pepatac.databio.org/en/latest/run-conda/
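
Before re-pulling, it may help to confirm that the index directory is at least complete. The following is a minimal sketch, not part of PEPATAC or refgenie: it assumes a standard Bowtie2 index made of six files sharing one prefix (`.1.bt2` through `.rev.2.bt2`, or `.bt2l` for large indexes) and reports any part that is missing or empty. The function name `check_bt2_index` is hypothetical.

```python
from pathlib import Path

# Suffixes of the six files that make up a Bowtie2 index.
BT2_PARTS = ("1", "2", "3", "4", "rev.1", "rev.2")


def check_bt2_index(prefix):
    """Return the index files that are missing or empty for a given prefix.

    `prefix` is the value passed to `bowtie2 -x` (a path without extension).
    Small indexes end in .bt2 and large ones in .bt2l; a part counts as
    present if either variant exists and has a nonzero size.
    """
    problems = []
    for part in BT2_PARTS:
        candidates = [Path(f"{prefix}.{part}.{ext}") for ext in ("bt2", "bt2l")]
        if not any(p.is_file() and p.stat().st_size > 0 for p in candidates):
            problems.append(f"{prefix}.{part}.bt2")
    return problems
```

Called with the `-x` value from the command above, an empty list means all six parts were found. That does not prove the files are intact: a pull that was killed partway could still leave nonzero but truncated files, so re-pulling remains the safer fix.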

@donaldcampbelljr
Member

Hi @oliviacwhite, did you have any luck with this?

@oliviacwhite
Author

oliviacwhite commented Nov 27, 2024

Hi @donaldcampbelljr, I was able to get my bowtie2 command to work correctly. Thank you so much for your help. I ended up removing the refgenie assets and re-pulling / building them. I pulled bowtie2_index inside a slurm job submission, which allowed the pull to complete instead of being killed for lack of memory. This fixed the alignment, and after renaming some files to fix an error in my looper run, I was able to run the example data successfully all the way through.

Unfortunately, I have now run into another problem I can't seem to fix. Now I am trying to get the summary directory after the looper run by running looper runp examples/test_project/test_config_refgenie.yaml. I do this in a slurm job and get the following output:

/path/to/pepatac/pipelines/pepatac_collator.py  --config None -O pepatac_test/results_pipeline -P 8 -M 16000 -n test_project -r pepatac_test/results_pipeline
Compute node: dcc-adrc-01
Start time: 2024-11-27 10:30:20
### Pipeline run code and environment:

*          Command: `/path/to/pepatac/pipelines/pepatac_collator.py --config None -O pepatac_test/results_pipeline -P 8 -M 16000 -n test_project -r pepatac_test/results_pipeline`
*     Compute host: `dcc-adrc-01`
*      Working dir: `/path/to/pepatac`
*        Outfolder: `/path/to/pepatac/pepatac_test/results_pipeline/summary/`
*         Log file: `/path/to/pepatac/pepatac_test/results_pipeline/summary/PEPATAC_log.md`
*       Start time:  (11-27 10:30:22) elapsed: 0.0 _TIME_

### Version log:

*   Python version: `3.9.7`
*      Pypiper dir: `/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper`
*  Pypiper version: `0.14.3`
*     Pipeline dir: `/path/to/pepatac/pipelines`
* Pipeline version: `0.0.7`
*    Pipeline hash: `461ae32c8ddf06bad5362aea1430b5dd714a3f3f`
*  Pipeline branch: `* master`
*    Pipeline date: `2024-10-05 08:04:48 -0400`
*    Pipeline diff: `2 files changed, 7 insertions(+), 7 deletions(-)`

### Arguments passed to pipeline:

*        `config_file`:  `None`
*              `cores`:  `8`
*             `cutoff`:  `2`
*              `dirty`:  `False`
*       `force_follow`:  `False`
*              `input`:  `None`
*             `logdev`:  `False`
*                `mem`:  `16000`
*           `min_olap`:  `1`
*          `min_score`:  `5`
*               `name`:  `test_project`
*          `new_start`:  `False`
*         `normalized`:  `False`
*      `output_parent`:  `pepatac_test/results_pipeline`
*      `pipeline_name`:  `None`
*           `poverlap`:  `False`
*            `recover`:  `False`
*            `results`:  `pepatac_test/results_pipeline`
*        `sample_name`:  `None`
*             `silent`:  `False`
*     `skip_consensus`:  `False`
*         `skip_table`:  `False`
*           `testmode`:  `False`
*          `verbosity`:  `None`

### Initialized Pipestat Object:

* PipestatManager (PEPATAC)
* Backend: File
*  - results: /path/to/pepatac/pepatac_test/results_pipeline/summary/stats.yaml
*  - status: /path/to/pepatac/pepatac_test/results_pipeline/summary
* Multiple Pipelines Allowed: False
* Pipeline name: PEPATAC
* Pipeline type: project
* Status Schema key: None
* Results formatter: default_formatter
* Results schema source: None
* Status schema source: None
* Records count: 2
* Sample name: DEFAULT_SAMPLE_NAME


----------------------------------------

Using default schema: /path/to/pepatac/pipelines/pipestat_output_schema.yaml
Traceback (most recent call last):
  File "/path/to/pepatac/pipelines/pepatac_collator.py", line 172, in <module>
    sys.exit(main())
  File "/path/to/pepatac/pipelines/pepatac_collator.py", line 82, in main
    project = peppy.Project(args.config_file)
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/peppy/project.py", line 123, in __init__
    is_cfg = is_cfg_or_anno(cfg)
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/peppy/utils.py", line 179, in is_cfg_or_anno
    raise ValueError(
ValueError: File path 'None' does not point to an annotation or config. Accepted extensions: {'config': ('.yaml', '.yml'), 'annotation': ('.csv', '.tsv')}

### Pipeline failed at:  (11-27 10:30:22) elapsed: 0.0 _TIME_

Total time: 0:00:00
Failure reason: Pipeline failure. See details above.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2165, in _exit_handler
    self.fail_pipeline(Exception("Pipeline failure. See details above."))
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
Exception: Pipeline failure. See details above.

I can see the error here clearly, as the command includes --config None when the config should be leading to the appropriate file /path/to/pepatac/examples/test_project/test_config_refgenie.yaml . I would appreciate help determining why the looper runp command does not find the configuration .yaml file, as I would expect it to.
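
For illustration, the `ValueError` in the log can be reproduced with a small pre-flight check. This is a hypothetical helper, not peppy's actual implementation: it only mirrors the accepted extensions quoted in the error message and shows why the literal string `None` is rejected.

```python
from pathlib import Path

# Extensions quoted in the ValueError raised by peppy in the log above.
ACCEPTED = {"config": (".yaml", ".yml"), "annotation": (".csv", ".tsv")}


def classify_pep_path(path):
    """Return 'config' or 'annotation' for a usable path, else None.

    A value of None, or the string 'None' as seen in the failing command,
    has no accepted extension and is therefore rejected. The file's
    existence is not checked here.
    """
    if path is None:
        return None
    suffix = Path(str(path)).suffix
    for kind, extensions in ACCEPTED.items():
        if suffix in extensions:
            return kind
    return None
```

Here `classify_pep_path("None")` returns `None`, while the path to test_config_refgenie.yaml is classified as a config.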

That being said, I tried a workaround using the following command, which specifies the --config path explicitly:
/path/to/pepatac/pipelines/pepatac_collator.py --config /path/to/pepatac/examples/test_project/test_config_refgenie.yaml -O pepatac_test/results_pipeline -P 1 -M 16000 -n test_project -r pepatac_test/results_pipeline
This (temporarily) fixed the --config None error, but it gave me a different output error message:

### Initialized Pipestat Object:

* PipestatManager (PEPATAC)
* Backend: File
*  - results: /path/to/pepatac/pepatac_test/results_pipeline/summary/stats.yaml
*  - status: /path/to/pepatac/pepatac_test/results_pipeline/summary
* Multiple Pipelines Allowed: False
* Pipeline name: PEPATAC
* Pipeline type: project
* Status Schema key: None
* Results formatter: default_formatter
* Results schema source: None
* Status schema source: None
* Records count: 2
* Sample name: DEFAULT_SAMPLE_NAME


----------------------------------------




Traceback (most recent call last):
  File "/path/to/pepatac/pipelines/pepatac_collator.py", line 172, in <module>
    sys.exit(main())
  File "/path/to/pepatac/pipelines/pepatac_collator.py", line 99, in main
    yaml_dict['PEPATAC']['sample'][sample_name] = yaml_tmp['PEPATAC']['sample'][sample_name]
KeyError: 'test1'

### Pipeline failed at:  (11-27 11:04:00) elapsed: 0.0 _TIME_

Total time: 0:00:00
Failure reason: Pipeline failure. See details above.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2165, in _exit_handler
    self.fail_pipeline(Exception("Pipeline failure. See details above."))
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
Exception: Pipeline failure. See details above.

Now, I can't seem to figure out what is going wrong here (something in the pepatac_collator.py script or in the stats.yaml file that was generated after the normal looper run where the sample names are not recognized?) and do not know how to fix this specific error. Any help is greatly appreciated!

@donaldcampbelljr
Member

Hi @oliviacwhite,

I suspect the test folder might actually be out of date and contain config files that were useable only with older looper versions.

Could you check out the tutorial folder, specifically the .looper_tutorial_refgenie.yaml file:
https://github.com/databio/pepatac/blob/master/examples/tutorial/.looper_tutorial_refgenie.yaml

And ensure you have looper 1.6.0 installed:
https://github.com/databio/pepatac/blob/master/requirements.txt

Example commands:
looper run --looper-config .looper_tutorial_refgenie.yaml
looper runp --looper-config .looper_tutorial_refgenie.yaml

I can look at this sometime after the holidays (next week).

Thanks.

@oliviacwhite
Author

So, if I understand correctly, I should try the tutorial (after downloading the correct tutorial data files), as the test_project configuration files may be out of date to work with looper? Thank you for your help! If you have more time after the holidays, I'd appreciate it.

@donaldcampbelljr
Member

Yes, I recommend trying to get the tutorial to work with the requirements listed in the requirements.txt doc. Looper, pipestat, and pypiper are used within the pipeline and have been updated since looper 1.6.0. So, if you use newer versions, the pipeline will probably error.

@oliviacwhite
Author

Hi @donaldcampbelljr, I have worked on running looper runp --looper-config .looper_tutorial_refgenie.yaml on the tutorial data sets and get the same basic error as previously:

Using default schema: /path/to/pepatac/pipelines/pipestat_output_schema.yaml
Traceback (most recent call last):
  File "/path/to/pepatac/pipelines/pepatac_collator.py", line 172, in <module>
    sys.exit(main())
  File "/path/to/pepatac/pipelines/pepatac_collator.py", line 99, in main
    yaml_dict['PEPATAC']['sample'][sample_name] = yaml_tmp['PEPATAC']['sample'][sample_name]
KeyError: 'tutorial2'

### Pipeline failed at:  (12-02 12:45:13) elapsed: 0.0 _TIME_

Total time: 0:00:00
Failure reason: Pipeline failure. See details above.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2165, in _exit_handler
    self.fail_pipeline(Exception("Pipeline failure. See details above."))
  File "/path/to/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
Exception: Pipeline failure. See details above.

I don't understand what KeyError: tutorial2 means, or how I can fix it, so that the pipeline finds the proper samples so that it can make the summary folder from the looper run. Thanks!

@donaldcampbelljr
Member

Just to confirm, did you run the sample-level items first using looper run? Can you check the associated stats.yaml files in your results folders for each sample and ensure that results exist in the yaml file? The above error looks like it is trying to read from the results file for the tutorial2 sample but cannot find any items in the yaml file (thus the key error).

@oliviacwhite
Author

oliviacwhite commented Dec 3, 2024

Yes, first I did looper run, and everything looked good. The associated stats.yaml files exist for both tutorial1 and tutorial2. Here are the first few lines for the tutorial2 stats.yaml file:

PEPATAC:
  project: {}
  sample:
    DEFAULT_SAMPLE_NAME:
      File_mb: 27
      pipestat_created_time: '2024-12-03 12:14:06'
      pipestat_modified_time: '2024-12-03 12:29:32'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0

I am not sure if having DEFAULT_SAMPLE_NAME is causing issues, as that is also the name of the completed.flag in the tutorial2 directory (PEPATAC_DEFAULT_SAMPLE_NAME_completed.flag). When I run looper runp, I noticed that in the ###Arguments passed to Pipeline section, it includes * sample_name: None and under ### Initialized Pipestat Object, it includes * Sample name: DEFAULT_SAMPLE_NAME. I assume that my issue is arising from the fact that my pipeline is not finding the correct sample name(s) but am unsure how to solve this, or why it would be running into this. I thought that if I could manually include the sample names in the command, I could possibly bypass this issue, to at least get my summary information as an output, but I can't figure out how to do that, if it's possible.

@donaldcampbelljr
Member

I'm wondering whether it will work if you edit the yaml file and replace DEFAULT_SAMPLE_NAME with tutorial2. Similarly, do this for tutorial1's stats.yaml file.

Also, which versions of pipestat and piper are you using? This looks like an older bug that was solved a few releases ago.
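
The manual edit to stats.yaml could also be scripted. The sketch below is an illustration under assumptions, not part of PEPATAC: the function names are hypothetical, the file layout is taken from the stats.yaml excerpt posted above, and `rename_in_file` assumes PyYAML is importable in the pepatac environment. Back up each stats.yaml before rewriting it.

```python
import copy


def rename_default_record(stats, sample_name, pipeline="PEPATAC",
                          placeholder="DEFAULT_SAMPLE_NAME"):
    """Return a copy of a stats.yaml mapping with the placeholder key renamed.

    The collator looks up stats[pipeline]['sample'][sample_name]; when the
    record is stored under the placeholder instead, that lookup raises
    KeyError. An existing record under `sample_name` is left untouched.
    """
    fixed = copy.deepcopy(stats)
    samples = fixed[pipeline]["sample"]
    if placeholder in samples and sample_name not in samples:
        samples[sample_name] = samples.pop(placeholder)
    return fixed


def rename_in_file(path, sample_name):
    """Apply the rename to a stats.yaml file in place (requires PyYAML)."""
    import yaml  # third-party; assumed available in the pepatac environment

    with open(path) as fh:
        stats = yaml.safe_load(fh)
    with open(path, "w") as fh:
        yaml.safe_dump(rename_default_record(stats, sample_name), fh,
                       sort_keys=False)
```

The dictionary step is kept separate from the file I/O so that the rename can be checked on a small mapping before any file is touched.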

@oliviacwhite
Author

oliviacwhite commented Dec 5, 2024

Yes, replacing DEFAULT_SAMPLE_NAME with tutorial1 or tutorial2 was successful and allowed me to run looper runp and looper report and get my summary files and report .html files. Thank you!

The summary and report files look mostly good. However, the summary reports say STATUS:Missing instead of STATUS:Completed. Additionally, a DEFAULT_SAMPLE_NAME record appears (see first screenshot) alongside tutorial1 and tutorial2. It contains some files (generally those in the summary directory, such as Library complexity) that are in turn missing from the tutorial1 and tutorial2 reported statistics pages (see second screenshot). I have included some screenshots of what is happening on these pages.

Re: pipestat and piper, these runs were done with pipestat version 0.11.0 and piper version 0.14.0. As the requirements.txt file of pepatac specifies pipestat==0.6.0, I installed that version instead and tried running the tutorial files again, but it made no difference and the DEFAULT_SAMPLE_NAME issues persisted. At this point, I am not positive that any essential information is really missing from my results, but I would prefer to have this optimized and without small errors before I run PEPATAC on my larger data set.

Screenshot 2024-12-05 at 11 07 01
Screenshot 2024-12-05 at 11 08 18

@donaldcampbelljr
Member

I believe it cannot find the status because it was setting the sample_name to DEFAULT_SAMPLE_NAME and thus, the status flags have that name in the path instead of the correct name, i.e. PEPATAC_DEFAULT_SAMPLE_NAME_completed.flag vs PEPATAC_tutorial2_completed.flag .
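
To see which sample folders are affected, the flag files can be listed programmatically. This is a minimal sketch under assumptions: it expects one sub-folder per sample directly under the results folder and flags named `<pipeline>_<record>_<status>.flag`, as in the two file names above. The function name `find_placeholder_flags` is hypothetical.

```python
from pathlib import Path


def find_placeholder_flags(results_dir, pipeline="PEPATAC",
                           placeholder="DEFAULT_SAMPLE_NAME"):
    """Map each sample folder to its status flags named after the placeholder.

    Folders whose flags already carry the real sample name are omitted
    from the result.
    """
    pattern = f"{pipeline}_{placeholder}_*.flag"
    found = {}
    for sample_dir in sorted(Path(results_dir).iterdir()):
        if not sample_dir.is_dir():
            continue
        flags = sorted(p.name for p in sample_dir.glob(pattern))
        if flags:
            found[sample_dir.name] = flags
    return found
```

Running it on pepatac_test/results_pipeline would return an empty dictionary once every sample writes flags under its own name.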

The additional record_identifier is unfortunately a placeholder that is made during project summaries for older versions of Looper.

Could you attach the PEPATAC_log.md file for one of those tutorial runs here? I might be able to see why it's still using the default sample name.

Another suggestion, though it will be a bit more work on your end:

On the dev branch, we have a version of PEPATAC that runs with the latest versions of Looper, Pipestat, and Pypiper and fixes some of these small bugs. If you would like, you could check out the development branch (git fetch origin, then git checkout dev), install the newer versions of those tools (specifically Looper, Pipestat, and Pypiper) listed in that branch's requirements.txt file, and see if this runs better for you overall.

@oliviacwhite
Author

Here is one PEPATAC_log.md file. I will look into your other suggestion as well.

241205OCW_tutorial1_log.txt

@donaldcampbelljr
Member

Ok, looking at the log file, it shows that the pipeline was still actually using Piper (pypiper) v0.14.3. If you downgrade that to 0.14.0, the sample names should report correctly.
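
Since the log and the reported version disagreed, it may be worth printing what is actually installed in the environment the slurm job activates. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The distribution names are an assumption: pypiper is assumed here to be published as `piper`, so adjust the tuple if `pip list` shows otherwise.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("looper", "pipestat", "piper")):
    """Return {distribution name: installed version, or None if absent}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, found in installed_versions().items():
        print(f"{name}: {found or 'not installed'}")
```

Running this inside the job script, rather than in an interactive shell, shows the versions the pipeline really imports and can be compared against requirements.txt.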

@donaldcampbelljr
Member

Hey @oliviacwhite,

Did you have any luck with the above suggestions?
