Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inlining vertex properties into a CompositeIndex structure #4692

Merged
merged 4 commits into from
Oct 27, 2024

Conversation

ntisseyre
Copy link
Contributor

@ntisseyre ntisseyre commented Oct 10, 2024

Overview

Inlining vertex properties into a search index structure (CompositeIndex) can offer significant performance and efficiency benefits.

  1. Performance Improvements
    Faster Querying: Inlining vertex properties directly within the index allows the search engine to retrieve all relevant data from the index itself. This means queries don’t need to make additional calls to data stores to fetch full vertex information, significantly reducing lookup time.

  2. Data Locality
    In distributed storages, having inlined properties ensures that more complete data exists within individual partitions or shards. This reduces cross-node network calls and improves the overall query performance by ensuring data is more local to the request being processed.

  3. Cost of Indexing vs. Storage Trade-off
    While inlining properties increases the size of the index (potentially leading to more extensive index storage requirements), it is often a worthwhile trade-off for performance, mainly when query speed is critical. This is a typical pattern in systems optimized for read-heavy workloads.

Usage

In order to take advantage of the inlined properties feature, JanusGraph Transaction should be set to use .propertyPrefetching(false)

Example:

//Build index
 mgmt.buildIndex("composite", Vertex.class)
            .addKey(idKey)
            .addInlinePropertyKey(nameKey)
            .buildCompositeIndex();

//Query
tx = graph.buildTransaction()
            .propertyPrefetching(false) //this is important
            .start();
tx.traversal().V().has("id", 100).next().value("name");

@ntisseyre ntisseyre force-pushed the inline_index_props branch 5 times, most recently from 4bfbbfb to cd15a3f Compare October 12, 2024 00:49
@ntisseyre ntisseyre force-pushed the inline_index_props branch 2 times, most recently from e02cacf to eab2ba3 Compare October 16, 2024 17:19
@ntisseyre ntisseyre changed the title DRAFT: Inlining vertex properties into a CompositeIndex structure Inlining vertex properties into a CompositeIndex structure Oct 16, 2024
Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @ntisseyre for this great contribution!
I left some general comments below. However, I have some questions / thoughts.

  1. What happens if the user uses a composite index with inlined properties but requests properties which are not covered by the inline composite index? Will we fetch only vertex id from the composite index or will we fetch those inline properties from the index as well?
  2. It would be really good to add documentation about this new functionality. I think somewhere between Label Constraint and Composite versus Mixed Indexes topics should be a good place for the doc, but I'm good with other places as well: https://github.com/JanusGraph/janusgraph/blob/master/docs/schema/index-management/index-performance.md#composite-versus-mixed-indexes
  3. A benchmark test would be really beneficial to actually see some performance differences between inline indices and normal composite indices.

@ntisseyre
Copy link
Contributor Author

Thank you @ntisseyre for this great contribution! I left some general comments below. However, I have some questions / thoughts.

  1. What happens if the user uses a composite index with inlined properties but requests properties which are not covered by the inline composite index? Will we fetch only vertex id from the composite index or will we fetch those inline properties from the index as well?
  2. It would be really good to add documentation about this new functionality. I think somewhere between Label Constraint and Composite versus Mixed Indexes topics should be a good place for the doc, but I'm good with other places as well: https://github.com/JanusGraph/janusgraph/blob/master/docs/schema/index-management/index-performance.md#composite-versus-mixed-indexes
  3. A benchmark test would be really beneficial to actually see some performance differences between inline indices and normal composite indices.

Thank you, @porunov, for the feedback!

  1. If the user is requesting properties that are not covered by composite index inlined keys, it will fall back to standard logic to fetch all properties for vertex from edgeStore table.
  2. Will do;
  3. Will create a benchmark test

@porunov porunov added this to the Release v1.1.0 milestone Oct 17, 2024
Signed-off-by: ntisseyre <[email protected]>
@porunov
Copy link
Member

porunov commented Oct 17, 2024

Benchmark test executed locally:

Benchmark                                            (isInlined)  (verticesAmount)  Mode  Cnt     Score     Error  Units
CQLCompositeIndexInlinePropBenchmark.searchVertices         true              5000  avgt    5    80.373 ±  14.164  ms/op
CQLCompositeIndexInlinePropBenchmark.searchVertices        false              5000  avgt    5  1778.890 ± 296.721  ms/op

Results: Inline indices properties fetching is ~22 times faster for this test executed locally compared to normal properties fetching logic.

P.S. I didn't expect such a huge difference to be fair, but looks amazing 👀

@porunov
Copy link
Member

porunov commented Oct 17, 2024

Benchmark test: LGTM.
Added testIndexInlinePropertiesReindex test seems to fail for some index backends.

@ntisseyre
Copy link
Contributor Author

Benchmark test: LGTM. Added testIndexInlinePropertiesReindex test seems to fail for some index backends.

When I run this test locally, it also returns unstable results: sometimes it succeeds, and sometimes it doesn't see values being inlined in the index structure. I suspect that the reIndexing/data propagation is not finished or that there is some cache, but I couldn't find where.

Signed-off-by: ntisseyre <[email protected]>
Copy link
Member

@porunov porunov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you for this amazing feature @ntisseyre !

@ntisseyre
Copy link
Contributor Author

LGTM. Thank you for this amazing feature @ntisseyre !

@porunov, I have updated the documentation in one of the commits. Please take a look at the Important Notes on Compatibility section and let me know what you think and how such changes should generally be approached during the release cycle.

@porunov
Copy link
Member

porunov commented Oct 18, 2024

LGTM. Thank you for this amazing feature @ntisseyre !

@porunov, I have updated the documentation in one of the commits. Please take a look at the Important Notes on Compatibility section and let me know what you think and how such changes should generally be approached during the release cycle.

We actually describe all upgrade instructions and breaking changes in docs/changelog.md in the release the feature lands into. As your contributed feature will land into 1.1.0 version, could you please add something similar to the following after Assets under release 1.1.0 (i.e. right before release 1.0.1 starts)?:

#### Upgrade Instructions

##### Inlining vertex properties into a Composite Index

Inlining vertex properties into a Composite Index structure can offer significant performance and efficiency benefits. 
See [documentation](./schema/index-management/index-performance.md#inlining-vertex-properties-into-a-composite-index) on how to inline vertex properties into a composite index.

**Important Notes on Compatibility**

1. **Backward Incompatibility**
Once a JanusGraph instance adopts this new schema feature, it cannot be rolled back to a prior version of JanusGraph. 
The changes in the schema structure are not compatible with earlier versions of the system.

2. **Migration Considerations**
It is critical that users carefully plan their migration to this new version, as there is no automated or manual rollback process 
to revert to an older version of JanusGraph once this feature is used.

In that case you can remove Important Notes on Compatibility from the schema/index-management/index-performance.md because usually it's irrelevant to the new users and previous users should always look into upgrade instructions.

Signed-off-by: ntisseyre <[email protected]>
@porunov
Copy link
Member

porunov commented Oct 27, 2024

@ntisseyre do you want to merge it or not ready yet? Just asking because I wanted to rebase #4695 after you merge this PR to add inline index functionality to the JSON schema importer.

@ntisseyre
Copy link
Contributor Author

@ntisseyre do you want to merge it or not ready yet? Just asking because I wanted to rebase #4695 after you merge this PR to add inline index functionality to the JSON schema importer.

I was waiting to see if anyone had any more comments, but I'm good to merge it.

@ntisseyre ntisseyre merged commit 213b754 into JanusGraph:master Oct 27, 2024
174 checks passed
@li-boxuan
Copy link
Member

Sorry I just saw this. This is awesome!!!! I didn't take a look at implementation detail, but we are essentially implementing covering index, right?

@ntisseyre
Copy link
Contributor Author

Sorry I just saw this. This is awesome!!!! I didn't take a look at implementation detail, but we are essentially implementing covering index, right?

That's right, it is like MySQL covering index

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants