(compressed) suffix arrays over collections #28
Comments
We already discussed this. The main open issue was what the input format should look like. Any ideas?
Well, the main decision from my POV is:
Option 2 is probably much easier to implement, but it likely has a strong impact on performance. This would need to be evaluated. You would need something like seqan/seqan3#104 if you want to avoid copying the input sequences into one sequence; that further increases the impact on performance (copying instead increases the size overhead). Option 1 will be slightly more work, but has the advantages:
@h-2 Size should not be an issue here. AFAIK, the CSA uses bit-compressed vectors anyway, so no bits are wasted.
Hm, that may be true for the BWT, but does it hold for the sampled SA as well? And what about the full SA during construction?
Another very central feature besides #27 that we require is the ability to create indexes over collections of strings/vectors. We also discussed this in March with @simongog, and it seemed like he already had some ideas for this.
We could theoretically wrap around your index structure, but it might make more sense to work on this as part of the SDSL.
@mpetri @simongog Do you have any preferences here?
@cpockrandt Can you create a proof-of-concept wrapping the data structures to show how this could work?
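For reference, here is a minimal sketch of what such a wrapper could look like, assuming the concatenation approach: all sequences are copied into one text, each followed by a delimiter, and hits are mapped back to (sequence, offset) pairs. The names `collection_index`, `to_collection_pos` and `locate` are hypothetical and not part of SDSL, and a plain `std::sort`-based suffix array stands in for the CSA, so this only illustrates the position mapping, not the performance trade-offs discussed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper (not SDSL API): indexes a collection by concatenating
// all sequences and translating text positions back to (sequence id, offset).
struct collection_index
{
    using position = std::pair<std::size_t, std::size_t>; // (sequence id, offset)

    std::string text;                // s0 + delim + s1 + delim + ...
    std::vector<std::size_t> starts; // start of every sequence within text
    std::vector<std::size_t> sa;     // plain suffix array, stand-in for a CSA

    explicit collection_index(std::vector<std::string> const & seqs, char delim = '\x01')
    {
        for (auto const & s : seqs)
        {
            // Assumption: the delimiter never occurs in the input sequences.
            assert(s.find(delim) == std::string::npos);
            starts.push_back(text.size());
            text += s;
            text += delim;
        }

        // Naive suffix array construction, for illustration only.
        sa.resize(text.size());
        std::iota(sa.begin(), sa.end(), std::size_t{0});
        std::sort(sa.begin(), sa.end(), [this](std::size_t a, std::size_t b)
        {
            return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
        });
    }

    // Translate a position in the concatenated text to (sequence id, offset).
    position to_collection_pos(std::size_t pos) const
    {
        auto it = std::upper_bound(starts.begin(), starts.end(), pos);
        std::size_t id = static_cast<std::size_t>(it - starts.begin()) - 1;
        return {id, pos - starts[id]};
    }

    // All occurrences of a non-empty, delimiter-free pattern, sorted.
    std::vector<position> locate(std::string const & pattern) const
    {
        // Binary search for the suffix array interval whose suffixes start with pattern.
        auto lo = std::lower_bound(sa.begin(), sa.end(), pattern,
            [this](std::size_t suf, std::string const & p)
            { return text.compare(suf, p.size(), p) < 0; });
        auto hi = std::upper_bound(lo, sa.end(), pattern,
            [this](std::string const & p, std::size_t suf)
            { return text.compare(suf, p.size(), p) > 0; });

        std::vector<position> hits;
        for (auto it = lo; it != hi; ++it)
            hits.push_back(to_collection_pos(*it));
        std::sort(hits.begin(), hits.end());
        return hits;
    }
};
```

Because every sequence is followed by the delimiter, a delimiter-free pattern can never match across a sequence boundary. The cost is the extra copy of the input plus the `starts` table, which is the size overhead mentioned in the comments.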