Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Year-Based Partitioning for Notification Table #95

Open
3 of 11 tasks
MackHalliday opened this issue Dec 10, 2024 · 6 comments
Open
3 of 11 tasks

Implement Year-Based Partitioning for Notification Table #95

MackHalliday opened this issue Dec 10, 2024 · 6 comments
Assignees
Labels
Dev Reviewed Reviewed by Tech Lead Notify Board trigger PM Reviewed Reviewed by Product Manager QA Reviewed Reviewed by Quality Assurance QA Issue requires QA collaboration

Comments

@MackHalliday
Copy link
Contributor

MackHalliday commented Dec 10, 2024

User Story - Business Need

We’re partitioning the Notification table by year to improve performance and scalability as the table grows. By splitting the data into smaller, year-based partitions using the created_at column, we can make queries targeting specific time ranges much faster and more efficient. It also makes managing and archiving data simpler in the long run. This change is a proactive step to ensure the system stays performant as the database continues to scale.

This ticket assumes that the ENP application has access to a test database and supports running migrations.

  • Ticket is understood, and QA has been contacted (if the ticket has a QA label).

User Story(ies)

As a VANotify developer maintaining the ENP PostgresSQL database
I want to partition the Notification table by year
So that we can improve query performance, scalability, and manageability of the data as it grows over time.

Additional Info and Resources

  • Spike documentation on Partitioning
  • Partitioning will use PostgreSQL declarative partitioning with RANGE on the created_at column.
  • The parent table will include PARTITION BY RANGE (created_at).
  • Child tables will automatically be created for each year starting with notifications_2024 and notifications_2025.
  • Newly created partitioned tables can be automatically linked to base Notification table using event listener
  • Queries for notifications should include a created_at condition to leverage partitioning.
  • Example SQL snippet for creating partitions:
    CREATE TABLE IF NOT EXISTS notifications_2024 PARTITION OF notifications
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
    
    CREATE TABLE IF NOT EXISTS notifications_2025 PARTITION OF notifications
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

Acceptance Criteria

  • The Notification table is partitioned by year using the created_at column.
  • An event listener to create new year-based partitioned tables as needed
  • Existing records (if any) are handled and appropriately moved to the correct partition (or this is deemed unnecessary).
  • New records are automatically routed to the correct yearly partition based on the created_at value.
  • Queries on the Notification table using a created_at condition only scan relevant partitions.
  • Queries can handle date ranges across multiple years

Note on Removed AC

This ticket was created assuming we would have migration working. We decided to implement this before the migrations. Therefore - the ACs related to migrations were removed.

  • Migration scripts are created and successfully tested to implement partitioning without downtime.
  • Downgrade migration can be successfully executed to revert the partitioning changes without data loss or corruption.

Note on Removed AC

We do not have a database setup for unit tests - therefore, this cannot be tested via unit tests. I talked to @k-macmillan and we determined this will be tested thru the test endpoints for the database instead.

  • Unit tests are implemented and demonstrate that queries correctly retrieve data from the partitioned table structure.
  • Unit tests demonstrate queries can handle date ranges across multiple years

QA Considerations

  • Verify that inserts for different created_at values go to the correct partition. POST /v2/notifications/push
  • Test queries with and without created_at filters to confirm correct behavior. (If not deployed, confirmed write tests OK or any existing endpoints using filtering OK. At the time of writing this ticket, there appear to be no endpoints that filter Notifications)
  • Confirm that the application functionality using the Notification table remains unchanged post-migration. (Regression tests OK)

Potential Dependencies

@MackHalliday MackHalliday added Notify Board trigger QA Issue requires QA collaboration labels Dec 10, 2024
@MackHalliday MackHalliday changed the title Implement Year-Based Partitioning for Notification Table in ENP Implement Year-Based Partitioning for Notification Table Dec 10, 2024
@MackHalliday MackHalliday changed the title Implement Year-Based Partitioning for Notification Table DRAFT Implement Year-Based Partitioning for Notification Table Dec 10, 2024
@MackHalliday MackHalliday changed the title DRAFT Implement Year-Based Partitioning for Notification Table Implement Year-Based Partitioning for Notification Table Dec 10, 2024
@MackHalliday MackHalliday changed the title Implement Year-Based Partitioning for Notification Table Implement Year-Based Partitioning for Notification Table Dec 10, 2024
@cris-oddball
Copy link
Contributor

question for ya @MackHalliday

CREATE TABLE IF NOT EXISTS notifications_2024 PARTITION OF notifications
FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

CREATE TABLE IF NOT EXISTS notifications_2025 PARTITION OF notifications
FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

Should it go from '2024-01-01' to '2024-01-31' instead of '2025-01-01'?

@MackHalliday
Copy link
Contributor Author

MackHalliday commented Dec 12, 2024

As I understood the documentation - those queries read like this

CREATE TABLE partition_name PARTITION OF parent_table
FOR VALUES FROM (start_value) TO (end_value);

Where TO is an exclusive value - meaning the range includes values up until that specific value.

Docs

Range Partitioning 
The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.

@cris-oddball
Copy link
Contributor

Great! thank you @MackHalliday

@cris-oddball cris-oddball added the QA Reviewed Reviewed by Quality Assurance label Dec 12, 2024
@kbelikova-oddball kbelikova-oddball added the PM Reviewed Reviewed by Product Manager label Dec 17, 2024
@k-macmillan k-macmillan added the Dev Reviewed Reviewed by Tech Lead label Dec 19, 2024
@MackHalliday
Copy link
Contributor Author

Today

  • Attempt to implement automatic year-base partitioning for Notification table. No tables are being created, but the logs show no specific errors as to why
  • Adding logging to hopefully catch error.

Tomorrow

  • Continue to debug error and move on to updating queries if error resolved.

@MackHalliday
Copy link
Contributor Author

MackHalliday commented Dec 30, 2024

  • Got Partitioning working and able to create tables by year.

TODO

  • Update queries
  • Create method to handle past years as well as current
  • Finish method to handle 'future' years

@MackHalliday
Copy link
Contributor Author

MackHalliday commented Jan 2, 2025

Tuesday

  • Talked to @k-macmillan about testing via unit tests. I tried to setup a quick database for unit tests - issues with garbage collecting. Decided to test only using the database test endpoints instead (GET / POST db/test/ endpoints).
  • Added queries and update return of db/test/ endpoints.
  • Currently only working with raw SQL. Working on getting queries to automatically work with SQLAlchemy.

Today

  • Working on getting queries to automatically work with SQLAlchemy.
  • Confirm logic for created future Notification tables. (This logic is already implemented. Need to confirm OK).
  • Submit PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dev Reviewed Reviewed by Tech Lead Notify Board trigger PM Reviewed Reviewed by Product Manager QA Reviewed Reviewed by Quality Assurance QA Issue requires QA collaboration
Projects
None yet
Development

No branches or pull requests

4 participants