Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration for openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime #221

Merged
merged 9 commits into from
Nov 19, 2024

Conversation

jiang95-dev
Copy link
Collaborator

@jiang95-dev jiang95-dev commented Oct 3, 2024

Summary

Adding 3 integration artifacts: openhouse-java-iceberg-1.5-runtime, openhouse-spark-3.5-runtime, and tables-test-fixtures-iceberg-1.5. They are leveraging the iceberg-core:1.5.2 and iceberg-spark-runtime-3.5_2.12, and can only be compiled under Java 9+. This client side upgrade is the step after the server side upgrade, and apps upgrade will be the next step.

Changes

  • Client-facing API Changes
  • Internal API Changes
  • Bug Fixes
  • New Features
  • Performance Improvements
  • Code Style
  • Refactoring
  • Documentation
  • Tests

There are no duplicated files after the clean up. Files changed are:

  • build.gradle: Spark 3.5 needs jackson library to be at least 2.15.
  • java-runtime/build.gradle
  • java-itest/build.gradle
  • spark-runtime/build.gradle
  • spark-runtime: Changed the interface and implemented new functions for logical plans and execution plans to pass the build.
  • spark-itest/build.gradle: Added jvmArgs for Java 17.
  • spark-itest: Half of the tests need to be refactored since Spark 3.5 brought in some changes.
  • tables-test-fixtures: Client can depend on this module after spark 3.5 is launched.
  • spark-3.5-base-hadoop3.2.dockerfile: Build on spark 3.5 and the new spark runtime module.

Testing Done

  • Manually Tested on local docker setup. Please include commands ran, and their output.
  • Added new tests for the changes made.
  • Updated existing tests to reflect the changes made.
  • No tests added or updated. Please explain why. If unsure, please feel free to ask for help.
  • Some other form of testing like staging or soak time in production. Please explain.

All unit tests and integration tests passed.

Local docker tests:

  1. Added spark-3.5-base-hadoop3.2.dockerfile and made local change of spark-service.yml for spark tests.
  2. Launched spark shell with the command:
bin/spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2  \
  --jars openhouse-spark-runtime_2.12-*-all.jar  \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,com.linkedin.openhouse.spark.extensions.OpenhouseSparkSessionExtensions   \
  --conf spark.sql.catalog.openhouse=org.apache.iceberg.spark.SparkCatalog   \
  --conf spark.sql.catalog.openhouse.catalog-impl=com.linkedin.openhouse.spark.OpenHouseCatalog   \
  --conf spark.sql.catalog.openhouse.uri=http://openhouse-tables:8080   \
  --conf spark.sql.catalog.openhouse.cluster=LocalHadoopCluster \
  --conf spark.sql.catalogImplementation=in-memory \
  --conf spark.sql.catalog.openhouse.auth-token=$(cat /var/config/openhouse.token)
  1. Tested CRUD and compaction on MOR tables which is only allowed operations in spark 3.5 and they worked.

Additional Information

  • Breaking Changes
  • Deprecations
  • Large PR broken into smaller PRs, and PR plan linked in the description.

For all the boxes checked, include additional details of the changes made in this pull request.

@jiang95-dev jiang95-dev changed the title [Draft] Add integration for openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime Add integration for openhouse-java-iceberg-1.5-runtime and openhouse-spark-3.5-runtime Oct 11, 2024
@jiang95-dev jiang95-dev marked this pull request as ready for review October 11, 2024 07:16
Copy link
Collaborator

@HotSushi HotSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Levi for the PR. Most of this code seem to be duplicated, i've mentioned some tricks to reuse existing code, can we give that a try.
If there were any code changes, could you mark them with a github comment so that its easier to review?

build.gradle Outdated Show resolved Hide resolved
@cbb330
Copy link
Collaborator

cbb330 commented Nov 12, 2024

I'm working in parallel on implementing this PR's spark3.5 integration via the apps distribution.

wondering from the working group here, should I buidl the directory structure and hooks to "old" library in a similar way as this PR? since apps is also structured with spark3.1, and requires an additional spark 3.5 directory.

@jiang95-dev
Copy link
Collaborator Author

I'm working in parallel on implementing this PR's spark3.5 integration via the apps distribution.

wondering from the working group here, should I buidl the directory structure and hooks to "old" library in a similar way as this PR? since apps is also structured with spark3.1, and requires an additional spark 3.5 directory.

Spark 3.5 gtp image is not ready yet. That's why I didn't upgrade apps. And planning-wise, apps should depend on the spark-3.5-runtime which I'm currently adding, so please hold on a bit.

cbb330
cbb330 previously approved these changes Nov 15, 2024
@cbb330 cbb330 self-requested a review November 15, 2024 22:27
sumedhsakdeo
sumedhsakdeo previously approved these changes Nov 18, 2024
Copy link
Collaborator

@sumedhsakdeo sumedhsakdeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, looks great @jiang95-dev

.github/workflows/build-tag-publish.yml Outdated Show resolved Hide resolved
autumnust
autumnust previously approved these changes Nov 18, 2024
Copy link
Collaborator

@autumnust autumnust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm for the current state.
Just wondering if there are future cleanup that's obviously required ?

.github/workflows/build-tag-publish.yml Outdated Show resolved Hide resolved
tables-test-fixtures-iceberg-1.5/build.gradle Outdated Show resolved Hide resolved
Copy link
Member

@abhisheknath2011 abhisheknath2011 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @jiang95-dev for the changes!

Copy link
Collaborator

@autumnust autumnust left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you help setup the capability to dismiss a review request? that's by default enabled internally. @sumedhsakdeo

settings.gradle Show resolved Hide resolved
@sumedhsakdeo sumedhsakdeo merged commit b7a38eb into linkedin:main Nov 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants