ERROR: LoadError: GeneticVariation.VCF.Reader #95

Open
moneterg opened this issue Oct 8, 2020 · 7 comments
Comments

@moneterg

moneterg commented Oct 8, 2020

Hi,

I'm writing to ask about an error I got when I ran VIVA for the first time.

I ran the command below:

viva --vcf_file 450exomes-cohort_GENOTYPEGVCF.vcf --output_directory . --save_format html --x_axis_labels --heatmap_title 450exomes-cohort_GENOTYPEGVCF --avg_dp sample,variant

and got this error:

Welcome to VIVA.

Loading dependency packages:

┌ Warning: ORCA.jl has been deprecated and all savefig functionality
│ has been implemented directly in PlotlyBase itself.
│
│ By implementing in PlotlyBase.jl, the savefig routines are automatically
│ available to PlotlyJS.jl also.
└ @ ORCA ~/.julia/packages/ORCA/U5XaN/src/ORCA.jl:8
...

Finished loading packages!

Reading 450exomes-cohort_notrimmed_COMBINED_GENOTYPEGVCF.vcf ...

ERROR: LoadError: GeneticVariation.VCF.Reader file format error on line 200
Stacktrace:
 [1] error(::String, ::Int64) at ./error.jl:42
 [2] _readheader!(::GeneticVariation.VCF.Reader, ::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}) at /scratch/7411317/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:106
 [3] readheader!(::GeneticVariation.VCF.Reader) at /scratch/7411317/.julia/packages/BioCore/YBJvb/src/ReaderHelper.jl:80
 [4] Reader at /scratch/7411317/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:15 [inlined]
 [5] GeneticVariation.VCF.Reader(::IOStream) at /scratch/7411317/.julia/packages/GeneticVariation/r8DAL/src/vcf/reader.jl:28
 [6] top-level scope at /scratch/7411317/VariantVisualization.jl/viva:131
 [7] include(::Function, ::Module, ::String) at ./Base.jl:380
 [8] include(::Module, ::String) at ./Base.jl:368
 [9] exec_options(::Base.JLOptions) at ./client.jl:296
 [10] _start() at ./client.jl:506
in expression starting at /scratch/7411317/VariantVisualization.jl/viva:131

This VCF was produced with the GATK v3.8 best practices pipeline.
After HaplotypeCaller, I ran CombineGVCFs and then GenotypeGVCFs.

I would really appreciate it if someone could help me here.

Thank you very much for your time.

@gtollefson
Collaborator

Hi @monete,

Let's get this sorted! That error is coming from a dependency package which we use to read the VCF file. It doesn't like something that appears on line 200. Can you paste lines 195-205 of the VCF file as a reply below? We'll look for formatting or special symbols which the reader function may not be expecting. I'll ping the GeneticVariation.jl package developer to assist further.

@benjward Do you know whether VCF files produced by GATK v3.8 are supported by the readheader!() function in the GeneticVariation.jl package? Can you help us troubleshoot once @monete has sent us the offending VCF line?
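
One way to pull the requested lines while making hidden characters visible (tabs, carriage returns, trailing spaces) is to print each line's `repr`. This is a minimal Python sketch. `show_lines` is a hypothetical helper, not part of VIVA or GeneticVariation.jl, and the file name in the usage comment is a placeholder.

```python
from itertools import islice


def show_lines(path, first, last):
    """Return (line_number, repr_of_line) pairs for lines first..last.

    Line numbers are 1-indexed and inclusive. newline="" keeps the original
    line endings so that "\r\n" is visible in the repr.
    """
    with open(path, "r", encoding="utf-8", errors="replace", newline="") as handle:
        window = islice(handle, first - 1, last)
        return [(number, repr(line)) for number, line in enumerate(window, start=first)]


# Example usage (placeholder file name):
# for number, text in show_lines("cohort.vcf", 195, 205):
#     print(number, text)
```

Pasting the `repr` output rather than the raw text keeps whitespace that would otherwise be lost when copying into a comment.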

@moneterg
Author

moneterg commented Oct 8, 2020

Hi @gtollefson

I'm pasting lines 195-201 here (the variant records start after line 201).

##contig=<ID=chrUn_gl000246,length=38154,assembly=hg19>
##contig=<ID=chrUn_gl000247,length=36422,assembly=hg19>
##contig=<ID=chrUn_gl000248,length=39786,assembly=hg19>
##contig=<ID=chrUn_gl000249,length=38502,assembly=hg19>
##dbSNP_BUILD_ID=138
##fileDate=20130806
##phasing=partial
##reference=file:///scratch/5644370/hg19/ucsc.hg19.fasta
##source=dbSNP
##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf	
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	AAX754	ABG583	ACA082	...

I've abbreviated the sample names here, since there are 450 of them.
Hope this helps!

@gtollefson
Collaborator

gtollefson commented Oct 8, 2020

@monete Ah ha! OK, I'm guessing the readheader! function that we depend on is not expecting the header line title "variationPropertyDocumentationUrl" or one of the special characters in the URL. I would delete that line in a text editor, save a new VCF file, then rerun and let us know what you get.
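
For a file this size, the same edit can be scripted instead of done in a text editor. The sketch below is a hedged illustration: `clean_header` and `write_cleaned_vcf` are hypothetical helper names, and the choice to strip trailing whitespace as well is an assumption. It is prompted by the pasted excerpt, where the `##variationPropertyDocumentationUrl` line appears to end with a tab, but the thread does not confirm that this is what the reader rejects.

```python
def clean_header(lines, drop_prefixes=()):
    """Yield VCF lines with header lines tidied.

    - '##' meta lines starting with any of drop_prefixes are left out.
    - Trailing spaces/tabs are removed from every '#' header line.
    - Variant records are passed through unchanged.
    """
    prefixes = tuple(drop_prefixes)
    for line in lines:
        if not line.startswith("#"):
            yield line  # variant record: do not touch
            continue
        if line.startswith("##") and line.startswith(prefixes):
            continue  # drop the unwanted meta line
        # Preserve the original line ending; strip only stray whitespace before it.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        yield body.rstrip(" \t") + ending


def write_cleaned_vcf(src, dst, drop_prefixes=()):
    """Stream src to dst through clean_header without loading the file into memory."""
    with open(src, "r", encoding="utf-8", newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.writelines(clean_header(fin, prefixes_or_empty(drop_prefixes)))


def prefixes_or_empty(drop_prefixes):
    """Accept a single string or an iterable of strings for convenience."""
    return (drop_prefixes,) if isinstance(drop_prefixes, str) else tuple(drop_prefixes)
```

Calling `write_cleaned_vcf(src, dst, "##variationPropertyDocumentationUrl")` would write a copy without that meta line, leaving the original file untouched.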

@moneterg
Author

moneterg commented Oct 9, 2020

Hi @gtollefson

I'm trying to run this program on a Slurm HPC system.
On the command line on the login node (not in a job), the first 5 minutes of the test seemed to be OK!

But when I submitted the job (with a command line identical to the test), this error appeared:

┌ Info: waiting for lock on pidfile
└   path = "/scratch/7411317/.jlassetregistry.lock"
┌ Warning: ORCA.jl has been deprecated and all savefig functionality
│ has been implemented directly in PlotlyBase itself.
│
│ By implementing in PlotlyBase.jl, the savefig routines are automatically
│ available to PlotlyJS.jl also.
└ @ ORCA ~/.julia/packages/ORCA/U5XaN/src/ORCA.jl:8
ERROR: LoadError: KeyError: key "3/7" not found
Stacktrace:
 [1] getindex at ./dict.jl:467 [inlined]
 [2] (::VariantVisualization.var"#translate#27"{Dict{Any,Any}})(::String) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:738
 [3] iterate at ./generator.jl:47 [inlined]
 [4] collect_to!(::Array{Int64,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64, ::Int64) at ./array.jl:732
 [5] collect_to_with_first!(::Array{Int64,2}, ::Int64, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64) at ./array.jl:710
 [6] _collect(::Array{Any,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Base.EltypeUnknown, ::Base.HasShape{2}) at ./array.jl:704
 [7] collect_similar at ./array.jl:628 [inlined]
 [8] map at ./abstractarray.jl:2162 [inlined]
 [9] translate_genotype_to_num_array(::Array{Any,2}, ::Dict{Any,Any}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:741
 [10] combined_all_genotype_array_functions(::Array{Any,1}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:622
 [11] top-level scope at /scratch/7411317/VariantVisualization.jl/viva:410
 [12] include(::Function, ::Module, ::String) at ./Base.jl:380
 [13] include(::Module, ::String) at ./Base.jl:368
 [14] exec_options(::Base.JLOptions) at ./client.jl:296
 [15] _start() at ./client.jl:506
in expression starting at /scratch/7411317/VariantVisualization.jl/viva:408

Do you have any tips for me on this?

Thank you for your time.

@gtollefson
Collaborator

gtollefson commented Oct 9, 2020

@monete no problem, I’m happy you’re using our tool! We’ll solve it.

Can you run the command to completion on the command line? It would help to know whether it works there before debugging on the shared computing network. If you ran it on the login node but it didn’t complete, it may simply not have reached the point in the run that triggers the error, since there are fewer resources available on the login node.

@moneterg
Author

Hi @gtollefson
After 1 h 30 min of running on the command line on the login node, the error appeared, as you predicted.

Command line:

viva --vcf_file 450exomes-cohort_COMBINED_GENOTYPEGVCF_edited.vcf --output_directory . --save_format html --x_axis_labels --heatmap_title 450exomes-cohort_COMBINED_GENOTYPEGVCF_edited --avg_dp sample,variant

Output:

Welcome to VIVA.

Loading dependency packages:

┌ Warning: ORCA.jl has been deprecated and all savefig functionality
│ has been implemented directly in PlotlyBase itself.
│ 
│ By implementing in PlotlyBase.jl, the savefig routines are automatically
│ available to PlotlyJS.jl also.
└ @ ORCA ~/.julia/packages/ORCA/U5XaN/src/ORCA.jl:8
...

Finished loading packages!

Reading 450exomes-cohort_COMBINED_GENOTYPEGVCF_edited.vcf ...

No filters applied. Large vcf files will take a long time to process and heatmap visualizations will lose resolution at this scale unless viewed in interactive html for zooming.

Loading VCF file into memory for visualization
Selected 902958 variants with no filters applied
ERROR: LoadError: KeyError: key "3/7" not found
Stacktrace:
 [1] getindex at ./dict.jl:467 [inlined]
 [2] (::VariantVisualization.var"#translate#27"{Dict{Any,Any}})(::String) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:738
 [3] iterate at ./generator.jl:47 [inlined]
 [4] collect_to!(::Array{Int64,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64, ::Int64) at ./array.jl:732
 [5] collect_to_with_first!(::Array{Int64,2}, ::Int64, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Int64) at ./array.jl:710
 [6] _collect(::Array{Any,2}, ::Base.Generator{Array{Any,2},VariantVisualization.var"#translate#27"{Dict{Any,Any}}}, ::Base.EltypeUnknown, ::Base.HasShape{2}) at ./array.jl:704
 [7] collect_similar at ./array.jl:628 [inlined]
 [8] map at ./abstractarray.jl:2162 [inlined]
 [9] translate_genotype_to_num_array(::Array{Any,2}, ::Dict{Any,Any}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:741
 [10] combined_all_genotype_array_functions(::Array{Any,1}) at /scratch/7411317/.julia/packages/VariantVisualization/1yoNl/src/vcf_utils_complete.jl:622
 [11] top-level scope at /scratch/7411317/VariantVisualization.jl/viva:410
 [12] include(::Function, ::Module, ::String) at ./Base.jl:380
 [13] include(::Module, ::String) at ./Base.jl:368
 [14] exec_options(::Base.JLOptions) at ./client.jl:296
 [15] _start() at ./client.jl:506
in expression starting at /scratch/7411317/VariantVisualization.jl/viva:408

Thanks again :)

@gtollefson
Collaborator

gtollefson commented Oct 16, 2020

Hi @monete,

The version of VCF files we developed VIVA for caps alternate allele numbers at 6, so VIVA doesn't expect 3/7. I will modify the code to allow allele values over 6 so that it can interpret 3/7 as a heterozygous variant. We're in the middle of preparing several manuscripts and results for other projects, so it may take me some time to push the changes. In the meantime, if you are able to change the 3/7 value to 3/6 using find and replace in a text editor, it should run. Let me know how it goes. Otherwise, I'll ping you here once I push this fix.
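
To illustrate the kind of change described here, a genotype can be classified by parsing its allele indices instead of looking the whole string up in a fixed dictionary, so that any index (including 7) is handled. This is a sketch under stated assumptions: the thread does not show VIVA's translation code, and the function names and category labels below are hypothetical, not VIVA's internal encoding.

```python
import re


def classify_genotype(gt):
    """Classify a VCF GT string such as "0/1", "3/7", "1|1" or "./.".

    Category names are illustrative only (not VIVA's numeric codes).
    """
    alleles = re.split(r"[/|]", gt)  # handles unphased "/" and phased "|"
    if any(allele == "." for allele in alleles):
        return "no_call"
    indices = [int(allele) for allele in alleles]
    if all(index == 0 for index in indices):
        return "hom_ref"
    if len(set(indices)) == 1:
        return "hom_var"
    return "het"


def max_allele_index(gt):
    """Return the largest allele index in a GT string, or None if all are missing."""
    indices = [int(allele) for allele in re.split(r"[/|]", gt) if allele != "."]
    return max(indices, default=None)
```

`max_allele_index` can also be used to count how many genotypes exceed 6 before applying the find-and-replace workaround, since that workaround changes the recorded alleles.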

2 participants