Add creationTime in mysql column #248

Merged (1 commit) on Nov 6, 2024

@jiang95-dev (Collaborator) commented Nov 4, 2024

Summary

Add a creationTime column to the MySQL table. The column will support several use cases, such as Orion inventory, dual-writer HI data collection, and stats-collection job monitoring.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

For all the boxes checked, please include additional details of the changes made in this pull request.

Testing Done

  • Manually tested on a local Docker setup. Please include the commands run and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

Manually inserted a row without creationTime (creation_time is stored as 0):

mysql> select * from user_table_row;
+-------------+----------+---------+-------------------+--------------+---------------+---------------------+----------------------------+
| database_id | table_id | version | metadata_location | storage_type | creation_time | last_modified_time  | ETL_TS                     |
+-------------+----------+---------+-------------------+--------------+---------------+---------------------+----------------------------+
| d1          | t1       |       0 | /data/v1          | hdfs         |             0 | 2024-11-05 23:42:09 | 2024-11-05 23:42:09.745050 |
+-------------+----------+---------+-------------------+--------------+---------------+---------------------+----------------------------+

Create a table:

scala> spark.sql("create table d1.t2 (col string)")
res0: org.apache.spark.sql.DataFrame = []

scala> spark.sql("insert into d1.t2 values ('a')")
res1: org.apache.spark.sql.DataFrame = []                                       

scala> spark.sql("select * from d1.t2").show
+---+                                                                           
|col|
+---+
|  a|
+---+

scala> spark.sql("show tables in d1").show
+---------+---------+
|namespace|tableName|
+---------+---------+
|       d1|       t1|
+---------+---------+

scala> spark.sql("show databases").show
+---------+
|namespace|
+---------+
|       d1|
+---------+

After creating a table:

mysql> select * from user_table_row;
+-------------+----------+---------+---------------------------------------------------------------------------------------------------------------------+--------------+---------------+---------------------+----------------------------+
| database_id | table_id | version | metadata_location                                                                                                   | storage_type | creation_time | last_modified_time  | ETL_TS                     |
+-------------+----------+---------+---------------------------------------------------------------------------------------------------------------------+--------------+---------------+---------------------+----------------------------+
| d1          | t1       |       0 | /data/v1                                                                                                            | hdfs         |             0 | 2024-11-05 23:42:09 | 2024-11-05 23:42:09.745050 |
| d1          | t2       |       0 | /data/openhouse/d1/t2-43cb50ee-f1f7-4383-a262-4b0bf507d02e/00000-5724f70d-a526-48d1-8620-9d3c513ae3e2.metadata.json | hdfs         | 1730850353924 | 2024-11-05 23:45:56 | 2024-11-05 23:45:56.030361 |
+-------------+----------+---------+---------------------------------------------------------------------------------------------------------------------+--------------+---------------+---------------------+----------------------------+
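The creation_time values above appear to be epoch milliseconds, with 0 marking rows written without a creationTime. A minimal sketch of how a reader of this column might interpret the value, assuming both of those points hold; the class and method names are hypothetical and not part of this PR:

```java
import java.time.Instant;
import java.util.Optional;

public class CreationTimeDecoder {

    /**
     * Interprets a creation_time value as epoch milliseconds (UTC).
     * Assumption: 0 (or any non-positive value) means "not recorded",
     * as in the manually inserted row for d1.t1.
     */
    public static Optional<Instant> decode(long creationTimeMillis) {
        if (creationTimeMillis <= 0L) {
            return Optional.empty();
        }
        return Optional.of(Instant.ofEpochMilli(creationTimeMillis));
    }

    public static void main(String[] args) {
        System.out.println(decode(0L));             // Optional.empty
        System.out.println(decode(1730850353924L)); // Optional[2024-11-05T23:45:53.924Z]
    }
}
```

The decoded value for d1.t2 falls a few seconds before its last_modified_time of 2024-11-05 23:45:56, which is consistent with the epoch-millisecond reading.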

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@autumnust (Collaborator) left a comment

No major concerns. Can we keep track of the deployment and backfilling steps? This needs to be handled very carefully.

@jiang95-dev (Collaborator, Author) replied:

> No major concerns. Can we keep track of the deployment and backfilling steps? This needs to be handled very carefully.

It is captured in the ticket.

@jiang95-dev (Collaborator, Author) commented:

Discussed offline. BigInt has an intrinsic advantage over Timestamp, which may not be supported everywhere (for example, in H2). We chose to generate this value in Java rather than in MySQL and let users interpret it. The inconsistency is not ideal, but for the sake of time we will leave the lastModifiedTime column alone and look for a better schema evolution strategy.
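As an illustration of generating the value in application code rather than with a database default, here is a minimal sketch. It assumes the value is stamped once when a row is first written and then carried forward unchanged; CreationTimeResolver and resolve are hypothetical names, not the code in this PR:

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.OptionalLong;

public class CreationTimeResolver {

    /**
     * Returns the creation time to persist as epoch milliseconds.
     * Assumption: an already stored value is kept as-is, and a missing
     * value is stamped from the supplied clock by the application.
     */
    public static long resolve(OptionalLong storedCreationTime, Clock clock) {
        return storedCreationTime.orElseGet(clock::millis);
    }

    public static void main(String[] args) {
        // A fixed clock keeps the example deterministic.
        Clock fixed = Clock.fixed(Instant.ofEpochMilli(1730850353924L), ZoneOffset.UTC);
        System.out.println(resolve(OptionalLong.empty(), fixed)); // 1730850353924
        System.out.println(resolve(OptionalLong.of(42L), fixed)); // 42
    }
}
```

Passing a Clock instead of calling System.currentTimeMillis() directly keeps the logic testable, and a plain long maps onto a BIGINT column in both MySQL and H2.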

@jiang95-dev jiang95-dev merged commit 8e2cc7d into linkedin:main Nov 6, 2024
1 check passed
cbb330 pushed a commit to cbb330/openhouse that referenced this pull request Nov 13, 2024
cbb330 pushed a commit to cbb330/openhouse that referenced this pull request Nov 17, 2024
5 participants