Allow sample groups to be supplied using column numbers #18

hepcat72 · 2019-04-18T19:41:55Z

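The minimum column number would be 10, since the first nine columns of a VCF file are fixed fields and the sample columns start at column 10.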
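As a possible workaround in the meantime, column numbers can be translated into sample names outside the script. The following is a minimal sketch, assuming a tab-delimited #CHROM header line; cols_to_names and the demo file are hypothetical and not part of vcfSampleCompare.pl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vcfSampleCompare.pl): translate 1-based
# VCF column numbers (comma-separated) into the matching sample names.
cols_to_names() {
  awk -F'\t' -v cols="$2" '
    /^#CHROM/ {
      n = split(cols, c, ",")
      # Sample columns start at 10; reject anything outside the header.
      for (i = 1; i <= n; i++)
        if (c[i] < 10 || c[i] > NF) {
          print "bad column: " c[i] > "/dev/stderr"
          exit 1
        }
      for (i = 1; i <= n; i++)
        printf "%s%s", $(c[i]), (i < n ? " " : "\n")
      exit
    }' "$1"
}

# Demo on a throwaway header-only VCF with three samples.
demo_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$demo_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\ts1\ts2\ts3\n' >> "$demo_vcf"

names=$(cols_to_names "$demo_vcf" 10,12)
echo "$names"
```

The resulting space-separated list could then be passed as a quoted --sample-group value.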

IsabelFE · 2021-06-23T16:36:22Z

I have a question about the use of --sample-group. Can I provide a list of sample names in a file? If I have 2 groups I would like to provide 2 files, one with each list of names. Is that possible?

hepcat72 · 2021-06-23T16:59:09Z

Maybe. You might be able to do it by wrapping a cat of each file in backticks (command substitution), so that the file's contents become the option's value.

The version of the perl module that comes with the script might be too old to support this, but you can check whether it works by running the perl one-liner below.

I created 2 files:

group1:

sample1 sample2 sample3

group2:

sample4 sample5 sample6

Then I ran this test one-liner:

perl -e 'use CommandLineInterface;my $s=[];add2DArrayOption(GETOPTKEY => "s=s",GETOPTVAL => $s);processCommandLine();print(join("\n",map {my $r=$_;join("\t",@$r)} @$s),"\n");' -- -s "`cat group1`" -s "`cat group2`"

And I did get the correct output:

sample1	sample2	sample3
sample4	sample5	sample6

If you get the same output by running the above command, you should be able to supply your sample files with:

--sample-group "`cat group1`" --sample-group "`cat group2`"

I use the tcsh shell, but I think that backticks work the same way in bash (which is what I assume you're using).
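To illustrate why the double quotes around the backticks matter in bash, here is a small sketch. count_args is a hypothetical helper used only for the demonstration, and $(...) is the bash form that behaves like backticks.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical helper that reports how many arguments it got.
count_args() { echo "$#"; }

tmpdir=$(mktemp -d)
printf 'sample1 sample2 sample3\n' > "$tmpdir/group1"

# Quoted: the whole file content arrives as a single argument.
quoted_count=$(count_args "$(cat "$tmpdir/group1")")
# Unquoted: the shell splits on whitespace, giving one argument per name.
unquoted_count=$(count_args $(cat "$tmpdir/group1"))

echo "quoted: $quoted_count, unquoted: $unquoted_count"
rm -rf "$tmpdir"
```

With the quotes, each --sample-group receives the full list as one value; without them, only the first name would follow the option and the rest would be treated as separate arguments.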

In your files, the sample names must all be provided on one line; otherwise, you only get the first sample from each group.
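If the sample names are stored one per line, they can be joined onto a single line before being passed along. This is a minimal sketch, assuming space-separated names are accepted as in the test above; join_names is a hypothetical helper, not part of the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vcfSampleCompare.pl): collapse a file of
# sample names, one per line, into a single space-separated line.
join_names() {
  # paste -s joins all lines of the file; -d ' ' puts a space between them.
  paste -s -d ' ' "$1"
}

# Demo with throwaway files.
tmpdir=$(mktemp -d)
printf 'sample1\nsample2\nsample3\n' > "$tmpdir/group1"
printf 'sample4\nsample5\nsample6\n' > "$tmpdir/group2"

group1_line=$(join_names "$tmpdir/group1")
group2_line=$(join_names "$tmpdir/group2")
printf '%s\n' "$group1_line" "$group2_line"
rm -rf "$tmpdir"

# The joined lists could then be supplied as, for example:
#   --sample-group "$group1_line" --sample-group "$group2_line"
```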

IsabelFE · 2021-06-23T17:55:16Z

I think it worked. I got an output file like this:

#vcfSampleCompare.pl Version 2.013
# Created: 6/22/2017
# Last modified: Wed Jun 23 13:52:39 2021
# Author: Robert William Leach
# Contact: [email protected]
# Company: Princeton University
# License: Copyright 2019
#User: isabelfernandezescapa
#Time: Wed Jun 23 13:52:38 2021
#Host: WS-00129
#PID: 72373
#Directory: /Users/isabelfernandezescapa/Desktop/AAAA
#Command: /opt/miniconda3/bin/perl /opt/miniconda3/bin/vcfSampleCompare.pl --sample-group "2021-0412-02A21_FB
2021-0415-00F07_FB" --sample-group "2021-0412-00I13_FB
2021-0412-00N21_FB
2021-0412-01M22_FB
2021-0412-01P18_FB
2021-0412-02A21_FB
2021-0412-02G24_FB
2021-0413-00F17_FB
2021-0415-01A05_FB
2021-0412-01B10_FB" FB_merged_renamed.vcf

#CHROM	POS	ID	REF	ALT	BEST_PAIR	BEST_GT_SCORE	BEST_OR_SCORE	BEST_DP_SCORE	PAIR_ID	PAIR_GT_SCORE	PAIR_OR_SCORE	PAIR_DP_SCORE	STATES_USED_GT	STATES_USED_OR	GROUP1_SAMPLES	GROUP1_GTS	GROUP1_ORS	GROUP2_SAMPLES	GROUP2_GTS	GROUP2_ORS

I imagine no variants were identified as different between those 2 groups. I am just playing with 10 random files in order to set up the pipeline; I will try with real data later in the week.

hepcat72 · 2021-06-23T18:03:00Z

Parsing that file downstream might be problematic, because the sample names in the #Command header are split across lines that do not start with a comment character. Still, it seems like it may work, at least up to this point.
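One way a downstream step could sidestep the uncommented lines is to discard everything before the column header. This is a rough sketch, assuming the first line starting with #CHROM marks the start of the table; strip_preamble and the demo file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the #CHROM column header and everything after
# it, dropping the run-info header (including any uncommented lines).
strip_preamble() {
  awk 'found || /^#CHROM/ { found = 1; print }' "$1"
}

# Demo on a throwaway file shaped like the output above.
demo_out=$(mktemp)
printf '#Command: vcfSampleCompare.pl --sample-group "nameA\nnameB" in.vcf\n\n' > "$demo_out"
printf '#CHROM\tPOS\tID\n' >> "$demo_out"
printf 'chr1\t100\t.\n' >> "$demo_out"

table=$(strip_preamble "$demo_out")
printf '%s\n' "$table"
rm -f "$demo_out"
```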

hepcat72 added the enhancement and low priority labels on Apr 18, 2019
