Releases · epruesse/SINA

Improved CSV output

The old --meta-fmt CSV option has been deprecated in favor of having multiple output modules active. To get CSV output as well as the aligned sequences, you can now write -o aligned.fasta.gz -o aligned.csv. The fields that are written to CSV, FASTA or ARB output types can be configured with the --field (-f) parameter. SINA can now also show a list of all fields available in a reference ARB database using --arb-list-fields FILENAME.
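
As a rough illustration of the multiple-output syntax described above, the following Python sketch assembles such a command line. The repeated -o arguments come from the release note. The -i (query) and -r (reference database) flags, the helper name, and the file names query.fasta and reference.arb are assumptions made for this example, so check sina --help before relying on them.

```python
def sina_command(query, reference, outputs):
    """Build an argv list with one -o per requested output file.

    The -i and -r flags are assumed here; only -o is taken from the notes.
    """
    argv = ["sina", "-i", query, "-r", reference]
    for path in outputs:
        argv += ["-o", path]  # each output module gets its own -o
    return argv


cmd = sina_command(
    "query.fasta",                        # hypothetical input file
    "reference.arb",                      # hypothetical reference database
    ["aligned.fasta.gz", "aligned.csv"],  # output names from the release note
)
print(" ".join(cmd))
```

The list form can be passed directly to subprocess.run without shell quoting concerns.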

Changelog:

allow multiple output types at once
add dedicated CSV/TSV output (#10)
fix loading reference database from running ARB (#76)
report errors when sequence can't be read from ARB (#73)
add --arb-list-fields listing fields available in ARB database


09 Mar 03:27

epruesse

v1.6.1

d4a133f

Minor fixes

All progress bars are now silenced when the output is redirected to a file or pipe.
Progress bars no longer overwrite some of the previous output (i.e. the cursor is no longer moved up too often).
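
A common way to get the first behavior is to check whether the output stream is an interactive terminal before drawing anything. The Python sketch below shows that general technique. It is not SINA's implementation, and the function names are hypothetical.

```python
import io
import sys


def progress_enabled(stream):
    """Return True only when the stream is an interactive terminal."""
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def report(done, total, stream=sys.stderr):
    """Draw a one-line progress indicator, but only on a terminal."""
    if progress_enabled(stream):
        stream.write(f"\r{done}/{total} sequences")
        stream.flush()


buffer = io.StringIO()  # stands in for a file or pipe
report(5, 10, stream=buffer)
captured = buffer.getvalue()  # stays empty because buffer is not a terminal
```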


26 Apr 01:22

epruesse

v1.6.0

a73f148

Speedups: Internal Kmer Search Now Default

With 1.6.0, the new, very fast internal search engine has become the default. The --search module has been parallelized and performance has been tweaked in many other places.

Here are some numbers:

Input	Reference	Settings	1.6.0	1.5.0	speedup
V4	SILVA NR	align	282/s	22/s	12.8
V4	SILVA NR	align & classify	185/s	3/s	61.7
V4	SILVA NR	turn & align & classify	120/s	3/s	40
full	SILVA NR	align	42/s	3/s	14
full	SILVA NR	align & classify	35/s	0.65/s	58.3
full	SILVA NR	turn & align & classify	33/s	0.6/s	55
V4	test (38k)	align	312/s	225/s	1.4
V4	test (38k)	align & classify	265/s	25/s	10.6
V4	test (38k)	turn & align & classify	260/s	25/s	10.4
full	test (38k)	align	58/s	45/s	1.3
full	test (38k)	align & classify	51/s	9.6/s	5.3
full	test (38k)	turn & align & classify	51/s	6/s	8.5

(Numbers from a Ryzen 1700 with 32 GB of RAM, using 16 threads.)
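
The speedup column is the 1.6.0 throughput divided by the 1.5.0 throughput. A minimal Python sketch that recomputes it for a few rows copied from the table (the row selection and the helper name are ours):

```python
def speedup(new_rate, old_rate):
    """Ratio of new to old throughput, rounded to one decimal like the table."""
    return round(new_rate / old_rate, 1)


# (input, reference, settings, 1.6.0 rate, 1.5.0 rate), taken from the table
rows = [
    ("V4", "SILVA NR", "align", 282, 22),
    ("V4", "SILVA NR", "align & classify", 185, 3),
    ("full", "test (38k)", "align & classify", 51, 9.6),
]

ratios = [speedup(new, old) for *_, new, old in rows]
for (inp, ref, settings, new, old), ratio in zip(rows, ratios):
    print(f"{inp}\t{ref}\t{settings}\t{ratio}")
```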


25 Mar 17:21

epruesse

v1.6.0-rc.1

7fcd906

Prerelease: speedups!!!

Pre-release

It's finally done. Please give it a spin.

With 1.6.0, the new internal search engine is becoming the default. The --search module has been parallelized and performance has been tweaked in many other places.


07 Feb 02:59

epruesse

v1.5.0

30c171c

Towards an Internal Kmer Search Engine

Internal Kmer Search Update

With this release, the internal kmer search is nearing completion. The kmer index is now persisted to disk, computed in parallel, and uses a presence/absence optimization to reduce its total size and speed up searches. It is many times faster than the original PT server based search. (You still need to use --num-pts to make it use multiple threads, though.) Tweaks to the way SINA interacts with ARB and caches sequences internally have reduced the memory usage of the kmer search's indexing and use stages, which allows working with the current SILVA Ref NR on a 16GB machine.
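
For readers unfamiliar with kmer search, the sketch below shows a generic inverted kmer index in Python: each kmer maps to the set of reference sequences that contain it, and a query is scored by how many of its distinct kmers each reference shares. It only records whether a kmer occurs, not how often. This is an illustration of the general idea under our own assumptions. SINA's actual index layout, its presence/absence optimization, and its on-disk format are not described in these notes and may differ. All names and sequences are made up.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of distinct kmers in seq (occurrence only, no counts)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k):
    """Map each kmer to the set of reference names that contain it."""
    index = defaultdict(set)
    for name, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer].add(name)
    return index


def shared_kmer_counts(index, query, k):
    """Count, per reference, how many distinct query kmers it shares."""
    counts = defaultdict(int)
    for kmer in kmers(query, k):
        for name in index.get(kmer, ()):
            counts[name] += 1
    return dict(counts)


refs = {"ref_a": "ACGUACGUAC", "ref_b": "GGGGCCCCAA"}  # toy sequences
index = build_index(refs, k=4)
hits = shared_kmer_counts(index, "ACGUAC", k=4)
print(hits)
```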

Documentation Update

The documentation is now up to date with the current features. A man page is distributed with SINA and available via man sina from conda environments. Text-file versions are shipped in share/doc/sina, and a pretty HTML version rendered by Sphinx is available at https://sina.readthedocs.io.

Evaluation Options Reinstated

The options --show-dist and --fs-msc-max have been reinstated to allow evaluating the accuracy of SINA. New unit tests are in place to verify that the accuracy doesn't accidentally drop. These will help make the switch to the internal kmer search without risking significant changes to the overall accuracy.

Changelog

update documentation (#20)
reinstate --show-dist
reinstate --fs-msc-max
add choice "exact" to --search-iupac
change default for --search-kmer-len to match --fs-kmer-len
parallelize launch of background PT servers
lower memory usage:
- avoid redundant sequence caching by libARBDB
- use compact aligned base (50% savings on internal sequence cache)
improve internal kmer search performance
- add caching of kmer index on disk
- parallelize kmer index construction
- add presence/absence optimization
fix field align_ident_slv added for 100% matches even when not enabled
fix crash on overhang past alignment edge
fix libARBDB writing to stdout, clobbering sequence output
fix out-of-bounds access on iterator in NAST implementation
remove dependency on boost serialization library
build release binaries with GCC 7 and C++11 ABI
add integration tests watching for accuracy regressions (#25)

Full Changelog on ReadTheDocs


09 Nov 19:06

epruesse

v1.4.0

935347d

Parallel SINA

Parallel SINA is here!

Use --num-pts N to specify the number of PT servers you would like working in parallel. The rest of SINA will adapt dynamically to the available resources (if you must, adjust it with --threads).

Please remember that the PT server is rather memory hungry. If you set --num-pts too high, you will run out of memory and SINA will crash.

Other Improvements:

Add search result to output:

Using --add-relatives N, you can now ask SINA to add the search result sequences to the sequence output file. If you have --search enabled, it will use the N best results from the alignment-based homology search. Otherwise, it will use the N sequences with the highest relative number of kmers shared with each query. Each reference sequence will be added only once.
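
The "added only once" rule can be pictured as a de-duplicating merge over the per-query hit lists. The Python sketch below assumes each query already has a list of reference names ranked best first. The function name and the sample data are hypothetical, and this is not SINA's code.

```python
def relatives_to_add(ranked_hits_per_query, n):
    """Collect the top n hits of every query, keeping each reference once."""
    seen = set()
    ordered = []
    for hits in ranked_hits_per_query:
        for name in hits[:n]:          # only the n best hits per query
            if name not in seen:       # skip references already emitted
                seen.add(name)
                ordered.append(name)
    return ordered


hits_per_query = [
    ["ref1", "ref2", "ref3"],  # ranked hits for the first query
    ["ref2", "ref4", "ref1"],  # ranked hits for the second query
]
added = relatives_to_add(hits_per_query, n=2)
print(added)
```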

Input / Output:

SINA will now read and write gzipped FASTA files transparently. You can also use - as input/output file name to pipe sequences through SINA.
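
To show what transparent gzip handling and the - convention look like from a caller's point of view, here is a small Python sketch. It chooses gzip by the .gz file extension and maps - to the standard streams. Both choices are assumptions made for the example; how SINA itself detects compressed input is not stated in these notes.

```python
import gzip
import os
import sys
import tempfile


def open_sequence_file(path, mode="r"):
    """Open a FASTA path for text I/O; '-' means stdin/stdout, '.gz' means gzip."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    if path == "-":
        # The caller should not close these standard streams.
        return sys.stdin if mode == "r" else sys.stdout
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")  # text-mode gzip stream
    return open(path, mode)


# Round trip through a gzipped file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "aligned.fasta.gz")
    with open_sequence_file(path, "w") as handle:
        handle.write(">seq1\nACGU\n")
    with open(path, "rb") as raw:
        magic = raw.read(2)  # gzip files start with the bytes 1f 8b
    with open_sequence_file(path) as handle:
        roundtrip = handle.read()
```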

Logging

SINA now has an actual logging facility. You can change its verbosity with -q and -v (repeat to decrease or increase it further). The log file specified with --log-file will always be verbose (but will not include debug messages).
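
The repeatable -q and -v flags follow a familiar pattern: count the occurrences and shift the console log level accordingly, while the log file keeps a fixed level. The Python sketch below mimics that pattern with argparse and logging. The default console level and the exact level names are assumptions, since the notes do not specify them.

```python
import argparse
import logging

# Ordered from quietest to most verbose.
LEVELS = [logging.CRITICAL, logging.ERROR, logging.WARNING,
          logging.INFO, logging.DEBUG]
DEFAULT = LEVELS.index(logging.WARNING)  # assumed default console level
FILE_LEVEL = logging.INFO                # verbose, but without debug messages


def console_level(verbose=0, quiet=0):
    """Shift the default level by -v/-q counts, clamped to the valid range."""
    idx = DEFAULT + verbose - quiet
    idx = max(0, min(idx, len(LEVELS) - 1))
    return LEVELS[idx]


parser = argparse.ArgumentParser()
parser.add_argument("-v", action="count", default=0, dest="verbose")
parser.add_argument("-q", action="count", default=0, dest="quiet")
parser.add_argument("--log-file")

args = parser.parse_args(["-v", "-v"])
level = console_level(args.verbose, args.quiet)
print(logging.getLevelName(level))
```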


30 Oct 03:29

epruesse

v1.4.0-rc

6ab1840

Parallel SINA - Preview

Pre-release

Parallelization makes a whole new class of bugs possible. If this breaks, stalls, crashes or otherwise misbehaves, please create an issue!

process sequences in parallel (#17, #31)
add support for gzipped read/write (#29)
add support for "-" to read/write using pipes
remove internal pipeline in favor of TBB
add option --add-relatives; adds ref sequences to output (#19)
add logging with variable verbosity (#14)
be smart about locating arb_pt_server binary (#30)


20 Sep 02:09

epruesse

v1.3.5

ecb4ff0

Maintenance Release

report number of references discarded due to configured constraints
fix crash (regression) if no acceptable references found for a query
fix --search causes a program option error (#28)
fix race condition in terminating PT server

