Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for Index hints in schema #2181

Open
gxslash opened this issue Dec 26, 2024 · 3 comments
Open

Support for Index hints in schema #2181

gxslash opened this issue Dec 26, 2024 · 3 comments
Assignees

Comments

@gxslash
Copy link

gxslash commented Dec 26, 2024

Feature description

I would like to have the ability to provide index hints during the staging-optimized replace strategy when using the PostgreSQL destination in dlt. This feature would allow the creation of indexes on staging tables, which can then be preserved and moved to the actual schema after the replace operation. This enhancement would streamline workflows and ensure that indexes are maintained as part of the load process without manual intervention.

Are you a dlt user?

Yes, I run dlt in production.

Use case

This feature addresses the problem of losing indexes when using the staging-optimized replace strategy. Currently, any indexes created on tables are dropped and must be manually recreated after the replace operation. This adds extra steps and affects performance, as indexes cannot be utilized during the staging phase.

For example, in my use case, I need to create various types of indexes (e.g., multi-column, hash, and btree). These indexes are essential for optimizing query performance but must currently be managed separately, which is both inefficient and error-prone.

Proposed solution

Introduce an "index hint" feature where users can specify the columns and types of indexes they want in the table schema. During the staging-optimized replace process, dlt would create these indexes on the staging tables and seamlessly transfer them to the actual schema once the replace operation is complete. This solution would ensure that indexes are preserved automatically without manual intervention.

Related issues

No specific issue currently logged.

@sh-rp sh-rp changed the title Support for Index Hints in Staging-Optimized Replace Strategy Support for Index hints in schema Jan 2, 2025
@sh-rp
Copy link
Collaborator

sh-rp commented Jan 2, 2025

@gxslash I slightly modified your title. The request here is, to be able to mark columns as index in the dlt schema which should then translate to ddl commands to create indexes in the destination database in case it supports these hints. We need to discuss if multi-column hints should also be supported.

@rudolfix
Copy link
Collaborator

@gxslash I assume that UNIQUE indexes are not enough? @sh-rp we recently added multi column indexes to Snowflake connector. maybe part of the code can be taken from there.

@gxslash
Copy link
Author

gxslash commented Jan 16, 2025

I actually would like to increase the read performance of my database. As a result, I needed to define indexes such like Btree, Hash, Gin etc. I already define them as a single-run job within dlt or in another environment. But the reason I ask this feature is that staging-optimized replace strategy works almost 2.5 times faster for my case but drops indexes etc. I can either

  1. create indexes in actual tables after tables' schema renamed from staging to the actual one
  2. or use insert-from-staging strategy

I started using insert-from-staging because I do not want to affect tables after they are published.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

3 participants