Allow sample groups to be supplied using column numbers #18

Open
hepcat72 opened this issue Apr 18, 2019 · 4 comments

Comments

@hepcat72
Owner

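The minimum column number would be 10, since columns 1-9 of a VCF file are the fixed fields (CHROM through FORMAT) and the sample columns start at column 10.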
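
Until something like this exists in the script, one workaround is to look up which sample name sits in which column and then pass the names to --sample-group. The sketch below is not part of vcfSampleCompare.pl. It builds a tiny made-up VCF header (sampleA and sampleB are placeholder names) and uses a hypothetical helper, list_sample_columns, to print each sample column number next to its name, starting at column 10.

```shell
# Build a minimal, made-up VCF header for demonstration purposes only.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsampleA\tsampleB\n' >> "$vcf"

# Hypothetical helper: print "<column number><TAB><sample name>" for every
# sample column. Columns 1-9 are the fixed VCF fields, so samples start at 10.
list_sample_columns() {
  awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print i "\t" $i; exit }' "$1"
}

list_sample_columns "$vcf"
# prints:
# 10	sampleA
# 11	sampleB
```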

@IsabelFE

I have a question about the use of --sample-group. Can I provide a list of sample names in a file? If I have 2 groups I would like to provide 2 files, one with each list of names. Is that possible?

@hepcat72
Owner Author

hepcat72 commented Jun 23, 2021

Maybe. You might be able to do it by reading each file on the command line with command substitution (backticks).

The version of the Perl module bundled with the script might be too old to support this, but you can check by running the Perl one-liner below.

I created 2 files:

group1:

sample1 sample2 sample3

group2:

sample4 sample5 sample6

Then I ran this test one-liner:

perl -e 'use CommandLineInterface;my $s=[];add2DArrayOption(GETOPTKEY => "s=s",GETOPTVAL => $s);processCommandLine();print(join("\n",map {my $r=$_;join("\t",@$r)} @$s),"\n");' -- -s "`cat group1`" -s "`cat group2`"

And I did get the correct output:

sample1	sample2	sample3
sample4	sample5	sample6

If you get the same output by running the above command, you should be able to supply your sample files with:

--sample-group "`cat group1`" --sample-group "`cat group2`"

I use the tcsh shell, but I think that backticks work the same way in bash (which is what I assume you're using).
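
If you want to check how bash hands the file contents to a program, here is a small sketch. The count_args function is a made-up helper that just reports how many arguments it received, and group1 is recreated in a temporary directory so nothing real is touched. In bash, the double quotes around the backticks keep the whole file as a single argument, and "$(...)" behaves the same way.

```shell
# Work in a scratch directory with a copy of the example group file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'sample1 sample2 sample3\n' > group1

# Hypothetical helper: report how many arguments were received.
count_args() { echo "$#"; }

count_args "`cat group1`"   # prints 1: quoted backticks give one argument
count_args "$(cat group1)"  # prints 1: $(...) is the bash equivalent
count_args $(cat group1)    # prints 3: unquoted, bash splits on whitespace
```

The quoted forms are the ones that match the --sample-group usage above, where each group has to arrive as one value.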

In your files, the sample names must all be provided on one line; otherwise, you only get the first sample from each group.
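
If your lists have one sample name per line, a sketch like the following could flatten them first. It assumes a space-separated line is what you want (as in the group1 and group2 examples above); the file names and the flatten_names helper are made up for illustration.

```shell
# Scratch directory with a made-up list that has one sample name per line.
tmpdir=$(mktemp -d)
printf 'sample1\nsample2\nsample3\n' > "$tmpdir/group1_per_line"

# Hypothetical helper: paste -s joins all lines of the file into one line,
# and -d ' ' puts a single space between the names.
flatten_names() { paste -s -d ' ' "$1"; }

flatten_names "$tmpdir/group1_per_line" > "$tmpdir/group1"
cat "$tmpdir/group1"
# prints: sample1 sample2 sample3
```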

@IsabelFE

I think it worked. I got an output file like this:

#vcfSampleCompare.pl Version 2.013
# Created: 6/22/2017
# Last modified: Wed Jun 23 13:52:39 2021
# Author: Robert William Leach
# Contact: [email protected]
# Company: Princeton University
# License: Copyright 2019
#User: isabelfernandezescapa
#Time: Wed Jun 23 13:52:38 2021
#Host: WS-00129
#PID: 72373
#Directory: /Users/isabelfernandezescapa/Desktop/AAAA
#Command: /opt/miniconda3/bin/perl /opt/miniconda3/bin/vcfSampleCompare.pl --sample-group "2021-0412-02A21_FB
2021-0415-00F07_FB" --sample-group "2021-0412-00I13_FB
2021-0412-00N21_FB
2021-0412-01M22_FB
2021-0412-01P18_FB
2021-0412-02A21_FB
2021-0412-02G24_FB
2021-0413-00F17_FB
2021-0415-01A05_FB
2021-0412-01B10_FB" FB_merged_renamed.vcf

#CHROM	POS	ID	REF	ALT	BEST_PAIR	BEST_GT_SCORE	BEST_OR_SCORE	BEST_DP_SCORE	PAIR_ID	PAIR_GT_SCORE	PAIR_OR_SCORE	PAIR_DP_SCORE	STATES_USED_GT	STATES_USED_OR	GROUP1_SAMPLES	GROUP1_GTS	GROUP1_ORS	GROUP2_SAMPLES	GROUP2_GTS	GROUP2_ORS

I imagine no variants were identified as different between those 2 groups. I am just playing with 10 random files in order to set up the pipeline; I will try with real data later in the week.

@hepcat72
Owner Author

Parsing that file downstream might be problematic, since the newlines in the sample lists leave the continuation lines of the #Command header uncommented, but it seems like it may work, at least up to this point.
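
One possible way around that, sketched here under the assumption that the column header line always starts with #CHROM followed by a tab (as in the output above): skip everything before that line instead of filtering on a leading #. The sample output below is made up and heavily shortened, and table_only is a hypothetical helper, not something the script provides.

```shell
# Made-up, shortened output with a #Command header broken across lines.
out=$(mktemp)
{
  printf '#Command: vcfSampleCompare.pl --sample-group "s1\n'
  printf 's2" in.vcf\n'
  printf '\n'
  printf '#CHROM\tPOS\tID\n'
  printf 'chr1\t100\t.\n'
} > "$out"

# Hypothetical helper: print the column header line and everything after it,
# so stray uncommented header lines above it are never seen by the parser.
table_only() { awk '/^#CHROM\t/ { found = 1 } found' "$1"; }

table_only "$out"
# prints the #CHROM line followed by the chr1 line
```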
