Error on test data #2

Open
nextgenusfs opened this issue Dec 20, 2016 · 19 comments

@nextgenusfs

On Mac, deML seems to have compiled correctly, but I get an error on the test data. I don't think it is a permissions issue.

src/deML -i testData/index.txt -f testData/todemultiplex.fq1.gz  -r testData/todemultiplex.fq2.gz -if1 testData/todemultiplex.i1.gz  -if2 testData/todemultiplex.i2.gz   -o testData/
Conflicts for index1:
AGTCAGA from RG9 causes a conflict with RG57 
AACTAGA from RG10 causes a conflict with RG58 
CTATGGC from RG11 causes a conflict with RG59 
CGACGGT from RG12 causes a conflict with RG60 
AACCAAG from RG13 causes a conflict with RG61 
CGGCGTA from RG14 causes a conflict with RG62 
GCAGTCC from RG15 causes a conflict with RG63 
CTCGCGC from RG16 causes a conflict with RG64 
CTGCGAC from RG17 causes a conflict with RG65 
ACGTATG from RG18 causes a conflict with RG66 
ATACTGA from RG19 causes a conflict with RG67 
AGTCAGA from RG57 causes a conflict with RG9 
AACTAGA from RG58 causes a conflict with RG10 
CTATGGC from RG59 causes a conflict with RG11 
CGACGGT from RG60 causes a conflict with RG12 
AACCAAG from RG61 causes a conflict with RG13 
CGGCGTA from RG62 causes a conflict with RG14 
GCAGTCC from RG63 causes a conflict with RG15 
CTCGCGC from RG64 causes a conflict with RG16 
CTGCGAC from RG65 causes a conflict with RG17 
ACGTATG from RG66 causes a conflict with RG18 
ATACTGA from RG67 causes a conflict with RG19 
Conflicts for index2:
AATTCAA from RG1 causes a conflict with RG57 
CGCGCAG from RG2 causes a conflict with RG58 
AAGGTCT from RG3 causes a conflict with RG59 
ACTGGAC from RG4 causes a conflict with RG60 
AGCAGGT from RG5 causes a conflict with RG61 
GTACCGG from RG6 causes a conflict with RG62 
GGTCAAG from RG7 causes a conflict with RG63 
AATGATG from RG8 causes a conflict with RG64 
AGTCAGA from RG9 causes a conflict with RG65 
AACTAGA from RG10 causes a conflict with RG66 
CTATGGC from RG11 causes a conflict with RG67 
AGTCAGA from RG57 causes a conflict with RG1 
AACTAGA from RG58 causes a conflict with RG2 
CTATGGC from RG59 causes a conflict with RG3 
CGACGGT from RG60 causes a conflict with RG4 
AACCAAG from RG61 causes a conflict with RG5 
CGGCGTA from RG62 causes a conflict with RG6 
GCAGTCC from RG63 causes a conflict with RG7 
CTCGCGC from RG64 causes a conflict with RG8 
CTGCGAC from RG65 causes a conflict with RG9 
ACGTATG from RG66 causes a conflict with RG10 
ATACTGA from RG67 causes a conflict with RG11 
Conflicts for pairs:
Cannot write to file testData/_RG49_r1.fail.fq.gz either you do not have permissions or you have too many read groups, in that case, convert your input data to a single BAM file and demultiplex it
@grenaud
Owner

grenaud commented Dec 21, 2016 via email

@grenaud
Owner

grenaud commented Jan 10, 2017

Can you try modifying the -o option to be:
-o testData/outdata

@katier239

I am getting the same error with the test data. I tried using -o testData/outdata, but got the same error as before. Running deML with BAM input seemed to work OK.

I don't know how to read code, but the error message made me think that the program can only handle a certain number of samples when the input is in fastq format. I modified the index.txt file so that it contained a smaller number of read groups (including RG49), and that allowed the process to complete with the fastq input files.

Is there any way that deML can be made to work in fastq mode with more read groups? I have a feeling that converting my fastq amplicon sequencing files into BAM format might be a bit tricky/impractical.

Thanks :)

@grenaud
Owner

grenaud commented May 30, 2017

Dear klr123, sorry for the late reply. I have added an error message to the code that reports the number of open file descriptors and the maximum number of file descriptors allowed on the system. When that limit is reached, deML warns the user and prints information on how to raise it.

Could you run "ulimit -n"? It should be at least 1024.
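A minimal bash sketch of that check, assuming bash (whose `ulimit` builtin accepts `-S` for the soft limit and `-H` for the hard limit). `ensure_nofile` is a hypothetical helper, not part of deML. The new limit only applies to the shell it runs in and to programs started from that shell afterwards.

```shell
#!/usr/bin/env bash
# ensure_nofile (hypothetical helper, not part of deML): make sure the soft
# open-files limit of the current shell is at least $1 (default 1024).
# Programs started from this shell afterwards inherit the new limit.
ensure_nofile() {
    local want=${1:-1024}
    local soft hard
    soft=$(ulimit -Sn)
    hard=$(ulimit -Hn)
    if [ "$soft" = "unlimited" ] || [ "$soft" -ge "$want" ]; then
        echo "soft limit $soft is already >= $want"
        return 0
    fi
    # An unprivileged user can raise the soft limit only up to the hard limit.
    if [ "$hard" != "unlimited" ] && [ "$hard" -lt "$want" ]; then
        echo "hard limit $hard is below $want; it must be raised by an administrator" >&2
        return 1
    fi
    ulimit -Sn "$want" && echo "raised soft limit from $soft to $want"
}
```

For example, `ensure_nofile 1024 && src/deML ...` (with the same deML arguments as in the first comment) only starts the demultiplexing once the limit is high enough.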

@katier239

Thank you for helping me troubleshoot that, Grenaud!

I ran ulimit -n and it returned 256. I then ran ulimit -n 1024, and after that the test data analysis ran to completion without any error messages.

Thanks for your help :)

@grenaud grenaud closed this as completed Sep 19, 2017
@dangeles

dangeles commented Mar 6, 2019

I'm getting exactly the same error, but my ulimit -n is 65536. Any idea what might be causing this?

@grenaud
Owner

grenaud commented Mar 6, 2019

Really? How many read groups do you have, and what is your OS?

@grenaud grenaud reopened this Mar 6, 2019
@dangeles

dangeles commented Mar 6, 2019

Hey! Thanks for answering so quickly. I'm on a Debian Red Hat system. I have 287 read groups.

@dangeles

dangeles commented Mar 6, 2019

Ah, a further check reveals that the deML build I'm using was last compiled on 2016-05-25. I will re-clone the repo and see if that fixes anything.

@dangeles

dangeles commented Mar 6, 2019

OK, I've recompiled, and the error is still the same.

@grenaud
Owner

grenaud commented Mar 6, 2019

Strange. Can you run ulimit -a?

@grenaud
Owner

grenaud commented Mar 8, 2019

@dangeles Did you get a chance to test this? I suspect there is a discrepancy on your system between the user limit and the system limit.
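One way to look for that kind of discrepancy, sketched in bash: the soft limit (what `ulimit -n` reports) can be raised by the user only up to the hard limit, and on Linux there is also a kernel-wide ceiling in `/proc/sys/fs/file-max`. A process launched from a different environment (for example a batch job) may also run with different limits than the interactive shell, which `/proc/<pid>/limits` shows. The `/proc` paths are Linux-specific, and `report_nofile_limits` is a hypothetical name.

```shell
#!/usr/bin/env bash
# report_nofile_limits (hypothetical helper): print the open-file limits that
# can stop a program such as deML from opening more output files.
report_nofile_limits() {
    echo "soft limit (ulimit -Sn): $(ulimit -Sn)"
    echo "hard limit (ulimit -Hn): $(ulimit -Hn)"
    # Linux only: the kernel-wide maximum number of open file handles.
    if [ -r /proc/sys/fs/file-max ]; then
        echo "system-wide (fs.file-max): $(cat /proc/sys/fs/file-max)"
    fi
}

report_nofile_limits

# Linux only: the limits a running process actually has. To inspect a deML
# process started elsewhere, replace $$ (this shell's PID) with its PID.
if [ -r "/proc/$$/limits" ]; then
    grep 'Max open files' "/proc/$$/limits"
fi
```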

@dangeles

Hey, sorry, I was traveling with limited access to the internet. Here's my output from ulimit -a:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 128520
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 65536
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

@grenaud
Owner

grenaud commented Mar 13, 2019

Two follow-up questions:

  1. Do you have Debian or Red Hat?
  2. Can you give me the output of:
    ulimit -Sn and ulimit -Hn

In the meantime, could you convert your FASTQs to BAM using https://github.com/grenaud/BCL2BAM2FASTQ and demultiplex those?

@dangeles

  1. Red Hat.
  2. ulimit -Sn: 65536 and ulimit -Hn: 65536

@grenaud
Owner

grenaud commented Mar 15, 2019

Can you privately send me a Dropbox link to your index list and files?

Also, if you want to move ahead in the meantime, I would simply use my fastq2bam (https://github.com/grenaud/BCL2BAM2FASTQ/tree/master/fastq2bam), demultiplex the BAM file, and convert back to FASTQ (https://github.com/grenaud/BCL2BAM2FASTQ/tree/master/bam2fastq).

@dangeles

dangeles commented Mar 16, 2019

I'll figure out how to dropbox you some data by Monday.

I used your fastq2bam tool to convert to BAM, seemingly without issues. Oddly enough, though, when I run the demultiplex command on the BAM file, it still gives me the same conflict messages. This time, however, the output BAM files are getting populated, so I know deML is running, which makes me think that the problem (though similar to the original) is in my index file.

Here's the exact error message:

Conflicts for index1:
TCGCCTTA from N701S501 causes a conflict with N701S502 N701S503 N701S504 N701S505 N701S506 N701S507 N701S508 N701S509 N701S510 N701S511 N701S512 N701S513 N701S514 N701S515 N701S516 N701S517 N701S518 N701S519 N701S520 N701S521 N701S522 N701S523 N701S524 N701S525 N701S526 N701S527 N701S528 N701S529 N701S530 N701S531 N701S532 N701S533 N701S534 N701S535 N701S536

...{many many lines later}...

Conflicts for index2:
AGTTAACA from N724S536 causes a conflict with N701S536 N702S536 N703S536 N704S536 N705S536 N706S536 N707S536 N708S536 N709S536 N710S536 N711S536 N712S536 N713S536 N714S536 N715S536 N716S536 N717S536 N718S536 N719S536 N720S536 N721S536 N722S536 N723S536

Sorry for all the hassle!

@grenaud
Owner

grenaud commented Mar 17, 2019

Those are not error messages, just warnings. They mean that TCGCCTTA is used by several read groups, which is bound to happen if you have more than 200 read groups.

Let me know about the FASTQs; I find it odd that you get this error message with such a high limit.
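The conflict lines flag index sequences that several read groups share, as grenaud describes above; the test run in the first comment lists nothing under "Conflicts for pairs:". The bash/awk sketch below finds such shared sequences in an index file before running deML. It ASSUMES a tab-separated index file whose header lines start with `#`, with index1 in column 1, index2 in column 2 and the read-group name in column 3; check this against your own file. It only detects identical sequences, and deML's own conflict check may differ in detail. `shared_index` is a hypothetical name.

```shell
#!/usr/bin/env bash
# shared_index (hypothetical helper): list index sequences used by more than
# one read group. ASSUMPTION: tab-separated index file, lines starting with '#'
# are headers, column 1 = index1, column 2 = index2, column 3 = read-group name.
# Usage: shared_index COLUMN FILE   (COLUMN is 1 for index1, 2 for index2)
shared_index() {
    awk -F'\t' -v col="$1" '
        /^#/ || NF < 3 { next }                      # skip headers and short lines
        { names[$col] = names[$col] " " $3; count[$col]++ }
        END {
            for (seq in count)
                if (count[seq] > 1) print seq ":" names[seq]
        }
    ' "$2"
}
```

awk's `for (seq in count)` visits keys in no particular order, so piping through `sort` (e.g. `shared_index 1 testData/index.txt | sort`) gives stable output.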

@dangeles

Ah, that's good. deML seems to be running just fine on BAM. For large numbers of read groups, maybe that will just be my default.


4 participants