Looking at the Reading/writing EIGENSTRAT data section of the reference, I see there's no write_eigenstrat() function.

I think the reason for this is that the only motivation for writing a new EIGENSTRAT data set would be to save a filtered version of SNP data, such as after removing transitions, etc. However, filtered SNPs have, so far, been handled by assigning a modifier component to an EIGENSTRAT R object, with all analyses downstream from the filtering using the standard ADMIXTOOLS mechanism for excluding filtered SNPs via a "badsnpname" parfile parameter. I.e., to use an example from the main vignette:
library(admixr)

# download small testing data
prefix <- download_data(dirname = tempdir())
snps <- eigenstrat(prefix)

# get a path to an example filter BED file
bed <- file.path(dirname(prefix), "regions.bed")

# the BED file contains regions to keep in an analysis -- remove SNPs which fall outside
new_snps <- filter_bed(snps, bed)

# BED file contains regions to remove from an analysis
new_snps <- filter_bed(snps, bed, remove = TRUE)

new_snps
#> EIGENSTRAT object
#> =================
#> components:
#>   ind file: /var/folders/70/b_q2zdh116b9pfg29p03sx600000gn/T//RtmpzHxCwT/snps/snps.ind
#>   snp file: /var/folders/70/b_q2zdh116b9pfg29p03sx600000gn/T//RtmpzHxCwT/snps/snps.snp
#>   geno file: /var/folders/70/b_q2zdh116b9pfg29p03sx600000gn/T//RtmpzHxCwT/snps/snps.geno
#>
#> modifiers:
#>   excluded sites: /var/folders/70/b_q2zdh116b9pfg29p03sx600000gn/T//RtmpzHxCwT/file5cb6227326a3.snp
#>   (SNPs excluded: 100000, SNPs remaining: 400000)
(note the "modifier" item in the EIGENSTRAT object summary above).
Recently it has been pointed out to me that it would be useful to have a function which can save filtered EIGENSTRAT data (such as new_snps above) to a completely new location. Effectively, this write_eigenstrat() function would take the contents of the snp file, remove the positions listed in the excluded sites file, and write out snp and geno files which are filtered down accordingly.
I suppose this would be useful in situations where the original EIGENSTRAT data set is quite huge, and loading it and filtering it before every analysis (as opposed to loading a smaller, filtered down version) would be wasteful. But having this function seems like a good idea in principle.
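The filtering step described above could be sketched roughly as follows. This is only an illustration in base R, not admixr code: the function name write_eigenstrat_sketch() and its arguments are hypothetical, it works on plain file paths rather than on an EIGENSTRAT object, and it assumes the text (unpacked) geno format with one line per SNP, plus an excluded-sites file whose first column holds SNP IDs.

```r
# Hypothetical helper, NOT part of admixr: write a filtered copy of an
# EIGENSTRAT trio, dropping the SNPs listed in an excluded-sites file.
write_eigenstrat_sketch <- function(ind, snp, geno, exclude, prefix) {
  # first whitespace-separated field of each line (the SNP ID)
  first_field <- function(lines) {
    vapply(strsplit(trimws(lines), "[[:space:]]+"), `[`, character(1), 1)
  }

  snp_lines <- readLines(snp)
  geno_lines <- readLines(geno)
  # assumption: text geno format, one line per SNP, same order as the snp file
  stopifnot(length(snp_lines) == length(geno_lines))

  keep <- !(first_field(snp_lines) %in% first_field(readLines(exclude)))

  writeLines(snp_lines[keep], paste0(prefix, ".snp"))
  writeLines(geno_lines[keep], paste0(prefix, ".geno"))
  # individuals are untouched by SNP filtering, so the ind file is copied as-is
  file.copy(ind, paste0(prefix, ".ind"), overwrite = TRUE)

  invisible(prefix)
}

# toy data: three SNPs, two individuals, one excluded site
toy_dir <- tempfile("toy_snps")
dir.create(toy_dir)
src <- file.path(toy_dir, "snps")
writeLines(c("ind1 M popA", "ind2 F popB"), paste0(src, ".ind"))
writeLines(c("rs1 1 0.0 100 A G", "rs2 1 0.0 200 C T", "rs3 2 0.0 300 G A"),
           paste0(src, ".snp"))
writeLines(c("02", "19", "20"), paste0(src, ".geno"))
excluded <- file.path(toy_dir, "excluded.snp")
writeLines("rs2 1 0.0 200 C T", excluded)

out <- file.path(toy_dir, "filtered")
write_eigenstrat_sketch(paste0(src, ".ind"), paste0(src, ".snp"),
                        paste0(src, ".geno"), excluded, out)
```

Reading whole files with readLines() would defeat the purpose for a really huge data set; a real implementation would more likely stream the snp and geno files line by line.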
Is there something I am missing? Ah, right. There's also a relabel() function which can group individuals into renamed groups:
relabel() creates a new ind file that we swap in for the original ind file in a parfile generated by every D/f3/f4/etc. wrapper. The putative write_eigenstrat() should -- if a group modifier is present -- write this modified ind file as well.
Additionally, while I'm at it: we already have filter_bed(), and the same for transversions-only filtering, so it also makes sense to implement filtering of individuals.
Maybe the putative filter_individuals() could add another "modifier" item to the EIGENSTRAT R object? Although filtering individuals automatically implies filtering columns of a geno file... so perhaps this will need something a bit more complex.
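To make the "filtering columns of a geno file" point concrete, here is a rough base R sketch. Again, it is only an illustration under assumptions: filter_individuals_sketch() and its arguments are hypothetical names, it writes a new trio of files instead of adding a modifier, and it assumes the text geno format with one character per individual on each line.

```r
# Hypothetical helper, NOT part of admixr: drop individuals from the ind
# file and the matching columns from every line of the geno file.
filter_individuals_sketch <- function(ind, snp, geno, remove, prefix) {
  ind_lines <- readLines(ind)
  # first whitespace-separated field of each ind line is the individual ID
  ids <- vapply(strsplit(trimws(ind_lines), "[[:space:]]+"), `[`, character(1), 1)
  keep <- !(ids %in% remove)

  geno_lines <- readLines(geno)
  # assumption: text geno format, one character per individual on each line
  stopifnot(all(nchar(geno_lines) == length(ind_lines)))
  filtered <- vapply(strsplit(geno_lines, ""),
                     function(g) paste(g[keep], collapse = ""),
                     character(1))

  writeLines(ind_lines[keep], paste0(prefix, ".ind"))
  writeLines(filtered, paste0(prefix, ".geno"))
  # the set of SNPs does not change, so the snp file is copied as-is
  file.copy(snp, paste0(prefix, ".snp"), overwrite = TRUE)

  invisible(prefix)
}

# toy data: three individuals, three SNPs; remove the middle individual
ind_dir <- tempfile("toy_ind")
dir.create(ind_dir)
src2 <- file.path(ind_dir, "snps")
writeLines(c("ind1 M popA", "ind2 F popB", "ind3 M popC"), paste0(src2, ".ind"))
writeLines(c("rs1 1 0.0 100 A G", "rs2 1 0.0 200 C T", "rs3 2 0.0 300 G A"),
           paste0(src2, ".snp"))
writeLines(c("029", "190", "201"), paste0(src2, ".geno"))

out2 <- file.path(ind_dir, "subset")
filter_individuals_sketch(paste0(src2, ".ind"), paste0(src2, ".snp"),
                          paste0(src2, ".geno"), remove = "ind2", prefix = out2)
```

Unlike SNP exclusion, this cannot be expressed as a list of sites to skip, which is why a simple modifier entry may not be enough here.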