In the examples below, I created a data set that either underwent "real" deduplication (using deduplicate_sequences.py) or "fake" deduplication using this script:
#!/usr/bin/env python
"""
Usage: seqmagick extract-ids seqs.fasta | pretend_to_deduplicate.py > dedup_info.csv
"""
import sys
import csv


def main():
    writer = csv.writer(sys.stdout)
    for name in sys.stdin:
        name = name.strip()
        writer.writerow([name, name, 1])


main()
In the examples below, the output using "real" deduplication is in output-dedup, "fake" in output.
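As a quick sanity check on either dedup_info.csv, the sketch below sums the third column, assuming it holds a per-row read count as in the rows the fake script writes. The helper name `total_reads` and the sample rows are hypothetical, for illustration only. For the "fake" file the total should equal the number of input sequences.

```python
import csv
import io


def total_reads(handle):
    """Sum the third column (assumed to be a count) of a dedup-info CSV."""
    return sum(int(row[2]) for row in csv.reader(handle) if row)


# Hypothetical sample in the same three-column layout the fake script writes.
sample = io.StringIO("seq1,seq1,1\nseq2,seq2,1\nseq3,seq3,1\n")
print(total_reads(sample))  # 3
```

If the total for the "real" file matches the total for the "fake" file, the lost counts would have to be introduced downstream of deduplication rather than by the dedup file itself.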
This manifests in an actual analysis as under-counting for each specimen, for example:
As far as I can tell, the behavior is the same in classif_rect.py (the original script from which classif_table.py was derived).
For the time being, the workaround appears to be to skip deduplication before aligning and running pplacer.