Feature idea: Jelly stream transcoders #225
Comments
Merging streams is relatively easy, especially if we make a few assumptions. I'm going to go with two: (1) the input stream is valid (with respect to the spec), and (2) the input's lookup sizes are smaller than or equal to the output's. Then, the number of lookup entries in the output is less than or equal to the number of lookup entries in the input, which simplifies the algorithm greatly. Splitting streams, however, turns out to be pretty hard. After splitting the stream we'd have to re-emit lookup entries, but only those that will be used in the output. This would have to be tracked somehow, and honestly, I'm not sure if it's worth it. At least in the aforementioned use case, when splitting the input stream we have to fully parse it anyway, so any savings here would not be huge. Instead of the stream splitter, we could investigate resetting |
Issue: #225
Maybe it works?
TODO:
- Lookup tests
- Transcoder tests in core
- Integration tests (extensive)
- Code audit – performance, security
- Scaladoc/javadoc
Not exactly sure about this one; I guess it depends on the lookup eviction policy in use. TODO: check this.
I've run into a use case where we want to expose a single, contiguous Jelly stream. The source data for this stream is not kept in memory, but instead it's serialized on disk (currently with a W3C serialization). So, to expose the Jelly stream, we first have to parse the W3C files to (for example) RDF4J `Statement`s and then reserialize this as Jelly.

What if we could use Jelly to store the data on disk? Then, when serving the stream we could somehow re-encode the Jelly stream data without deserializing it fully to RDF statements. Instead, we could process only the intermediate Protobuf classes.
Such a transcoder would function much like a Jelly encoder, but would take `RdfStreamRow`s (or `RdfStreamFrame`s) as input. Then:

The major part of this is of course the lookup table remapping. To speed this up, we could have fixed-size arrays to hold the mappings. For example: `int[] prefixMapping` would have `n + 1` elements, where `n` is the size of the prefix lookup in the input. Value `prefixMapping[i]` tells us how to map prefix id `i` from the input stream to the output stream. If the mapping is not there (value 0), then use the slow path – get or insert the prefix in the encoder's lookup.

I'm thinking of a few possible use cases for these transcoders, and thus a few different APIs that would need to be exposed:
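Going back to the `prefixMapping` idea above, here is a minimal sketch of the fast path / slow path split. It is not taken from any existing Jelly code: the class `PrefixRemapper`, its methods, and the use of a plain `HashMap` as a stand-in for the encoder's prefix lookup are all assumptions made for illustration. In particular, the stand-in lookup never evicts entries, so the sketch ignores the question of what happens to cached mappings when the output lookup evicts something.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: remaps prefix IDs of an input stream to output stream IDs. */
final class PrefixRemapper {
    // prefixMapping[i] is the output ID for input prefix ID i; 0 means "no mapping yet".
    private final int[] prefixMapping;
    // The prefix string currently bound to each input ID (the input-side lookup).
    private final String[] inputPrefixes;
    // Simplified stand-in for the encoder's prefix lookup (assumption: no eviction).
    private final Map<String, Integer> outputLookup = new HashMap<>();
    private int nextOutputId = 1;

    PrefixRemapper(int inputLookupSize) {
        // n + 1 elements: index 0 is left unused because 0 marks "no mapping".
        prefixMapping = new int[inputLookupSize + 1];
        inputPrefixes = new String[inputLookupSize + 1];
    }

    /** Called when the input stream defines or redefines a prefix entry. */
    void onInputEntry(int inputId, String prefix) {
        inputPrefixes[inputId] = prefix;
        prefixMapping[inputId] = 0; // any cached mapping for this ID is now stale
    }

    /** Called when a new input stream starts; the output lookup is kept. */
    void resetInput() {
        Arrays.fill(prefixMapping, 0);
        Arrays.fill(inputPrefixes, null);
    }

    /** Maps an input prefix ID to an output prefix ID. */
    int remap(int inputId) {
        int mapped = prefixMapping[inputId];
        if (mapped != 0) {
            return mapped; // fast path: a single array read
        }
        // Slow path: get or insert the prefix in the output lookup.
        String prefix = inputPrefixes[inputId];
        if (prefix == null) {
            throw new IllegalStateException("Undefined input prefix ID: " + inputId);
        }
        Integer outputId = outputLookup.get(prefix);
        if (outputId == null) {
            outputId = nextOutputId++;
            outputLookup.put(prefix, outputId);
        }
        prefixMapping[inputId] = outputId;
        return outputId;
    }
}
```

With this shape, repeated references to the same input ID cost one array read, and a second input stream that binds the same prefix to a different ID still resolves to the output ID that was already assigned.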
It should not only be possible to concatenate multiple input streams into one, but also to split a single input stream. I'm not sure if this all should be stuffed into one API, but one possible way to do this would be something like this:
The transcoder would automatically detect when a new input stream is started based on the presence of `RdfStreamOptions`. The `ingestFrame` method is just there for convenience.
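To illustrate the detection behaviour described above, here is a rough sketch. It is not the API proposed in this issue: only the name `ingestFrame` comes from the text, while `Row`, `SketchTranscoder`, `ingestRow`, and `inputStreamCount` are hypothetical, and `Row` is a toy stand-in rather than the real `RdfStreamRow` Protobuf class. The sketch assumes that a row carrying stream options marks the start of a new input stream, and that only the first options row is forwarded to the output.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a stream row; NOT the real Jelly Protobuf class. */
final class Row {
    final boolean isOptions; // true if this row carries the stream options
    final String payload;    // placeholder for any other row content

    Row(boolean isOptions, String payload) {
        this.isOptions = isOptions;
        this.payload = payload;
    }
}

/** Hypothetical sketch of a transcoder that concatenates input streams. */
final class SketchTranscoder {
    private boolean optionsEmitted = false;
    private int inputStreams = 0;

    /** Ingests one input row and returns the rows to write to the output. */
    List<Row> ingestRow(Row row) {
        List<Row> out = new ArrayList<>();
        if (row.isOptions) {
            // An options row signals the start of a new input stream.
            // A real transcoder would reset its per-input ID mappings here.
            inputStreams++;
            if (!optionsEmitted) {
                optionsEmitted = true;
                out.add(row); // the output stream declares its options only once
            }
            return out;
        }
        if (inputStreams == 0) {
            throw new IllegalStateException("Row received before any stream options");
        }
        // A real transcoder would remap lookup IDs here; the sketch passes rows through.
        out.add(row);
        return out;
    }

    /** Convenience wrapper: ingests every row of a frame in order. */
    List<Row> ingestFrame(List<Row> frame) {
        List<Row> out = new ArrayList<>();
        for (Row row : frame) {
            out.addAll(ingestRow(row));
        }
        return out;
    }

    /** Number of input streams detected so far. */
    int inputStreamCount() {
        return inputStreams;
    }
}
```

Feeding two frames that each begin with an options row yields one output stream with a single options row at the front, and the caller never has to announce the boundary between inputs explicitly.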