[PR1] refactoring :apps build and adding spark3.5/iceberg1.5 artifact #255
base: main
Conversation
Force-pushed from 7dcde77 to 3e7e046.
Is this ready for review?

@autumnust Waiting on #221 to merge so I can rebase; this PR is forked from that. If you want, you can compare the commit delta here: https://github.com/linkedin/openhouse/pull/255/files/770a6d0497b9af7b1bedb876cbf639b71b3a9822..94c5100d5aa780a373ee5d0e14394ecdcec2f561
Force-pushed from 81e11a0 to ab39faa.
Thanks for adding the new module! One major request is to use iceberg-convention-1.2.gradle and iceberg-convention-1.5.2.gradle to avoid duplicate code.
@@ -25,6 +25,7 @@ pluginManagement {
rootProject.name = 'openhouse'

include ':apps:spark'
+include ':apps:spark-3.5'
Can we also rename this module?
When I attempt to rename :apps:spark-3.5 to :apps:openhouse-spark-apps-1.5_2.12, I get:

A problem occurred evaluating project ':apps:openhouse-spark-apps-1.5_2.12'.
> Could not get unknown property 'sourceSets' for project ':apps:openhouse-spark-apps_2.12' of type org.gradle.api.Project.

which is unexpected. I attempted a few different alternatives without success.
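For what it's worth, a minimal sketch of one plausible cause, assuming some cross-project reference exists in the build (the referencing script below is my invention, not from the repo): `sourceSets` only exists on a project after the java plugin is applied, so a sibling script that dereferences it before that project has been evaluated fails with exactly this message, and a rename can change evaluation order.

```gradle
// Hypothetical sibling build.gradle (illustration only): this throws
// "Could not get unknown property 'sourceSets'" when the referenced project
// has not been evaluated yet. evaluationDependsOn is one conventional fix,
// forcing the referenced project to be configured first.
evaluationDependsOn(':apps:openhouse-spark-apps_2.12')

dependencies {
    testImplementation project(':apps:openhouse-spark-apps_2.12').sourceSets.test.output
}
```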
@@ -1,7 +1,6 @@
plugins {
id 'openhouse.java-conventions'
id 'openhouse.hadoop-conventions'
-id 'openhouse.iceberg-conventions-1.2'
Why remove this?
This convention pins Iceberg strictly to 1.2. That strictly constraint causes an issue when this module is used as a dependency in :apps:spark-3.5, which needs Iceberg 1.5.

We could also:
- replace this line with the 1.5 conventions, since the version can be pinned down by the consumer :apps:spark-3.1; this is the same approach you used with the services modules, defaulting to 1.5 and overriding with strictly 1.2
- exclude the transitive dependency from :apps:spark-3.5, but I would prefer not to do that and to use the version resolver instead
Compiling this module with Spark 3.5/Iceberg 1.5 breaks :apps:spark-3.1, but compiling it with Spark 3.1/Iceberg 1.2 works in :apps:spark-3.5 (unsafe, but acceptable, since it won't be productionized until next year).

So leaving :libs:datalayout on a non-strict 1.2 is preferred, which requires removing iceberg-conventions-1.2. A sketch of the resulting resolution follows below.
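Roughly what the non-strict setup looks like, as a sketch on my part (iceberg-core stands in for the full set of Iceberg artifacts):

```gradle
// :libs:datalayout build.gradle (sketch): a plain version is a preference,
// not a pin, so Gradle's conflict resolution can select a higher version.
dependencies {
    implementation 'org.apache.iceberg:iceberg-core:1.2.0'
}

// :apps:spark-3.5 build.gradle (sketch): requesting 1.5.2 here wins over the
// transitive 1.2.0 from :libs:datalayout, because neither side says 'strictly'.
dependencies {
    implementation project(':libs:datalayout')
    implementation 'org.apache.iceberg:iceberg-core:1.5.2'
}
```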
@@ -0,0 +1,123 @@
+plugins {
Nit: the name apps-spark-conventions doesn't sound right. Would apps-spark-commons be better?
@@ -1,38 +1,49 @@
plugins {
id 'openhouse.java-conventions'
id 'openhouse.hadoop-conventions'
-id 'openhouse.iceberg-conventions-1.2'
This line should not be removed.
apps/spark-3.5/build.gradle (outdated)
implementation("org.apache.iceberg:iceberg-bundled-guava:${icebergVersion}") | ||
implementation("org.apache.iceberg:iceberg-data:${icebergVersion}") | ||
implementation("org.apache.iceberg:iceberg-core:${icebergVersion}") | ||
implementation("org.apache.iceberg:iceberg-common:${icebergVersion}") |
Add iceberg-conventions-1.5.2.gradle to remove them.
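A sketch of what applying that convention would look like, assuming the convention script is exposed as a plugin with an id matching its file name (an assumption on my part):

```gradle
// apps/spark-3.5/build.gradle (sketch): the 1.5.2 convention plugin would
// supply iceberg-bundled-guava, iceberg-data, iceberg-core, iceberg-common,
// etc., replacing the explicit implementation(...) lines above.
plugins {
    id 'openhouse.iceberg-conventions-1.5.2'
}
```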
sparkVersionSuffix = "3.1"
openhouseSparkRuntimeModule = ":integrations:spark:spark-${sparkVersionSuffix}:openhouse-spark-runtime_2.12"
tablesTestFixturesModule = ":tables-test-fixtures:tables-test-fixtures_2.12"
Are the changes in this file relevant to this PR?
Yes. In this PR I must unpin 1.2 from the convention plugin removed above. I do this by removing the convention plugin and declaring the dependencies explicitly here, without any strictly version constraints.

This is so that consumers of this library (:apps) can override with their required version, such as Iceberg 1.5.2. See the sketch below.
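Concretely, the pinned and unpinned declarations differ like this (a sketch; iceberg-core stands in for each Iceberg artifact, and both lines live inside a dependencies block):

```gradle
// Before (in the removed convention plugin): 'strictly' rejects every other
// version, so a consumer that needs 1.5.2 hits a resolution conflict.
implementation('org.apache.iceberg:iceberg-core') {
    version { strictly('1.2.0') }
}

// After (this PR, sketch): a plain required version participates in normal
// conflict resolution, so :apps consumers can upgrade it to 1.5.2.
implementation 'org.apache.iceberg:iceberg-core:1.2.0'
```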
force 'com.fasterxml.jackson:jackson-bom:2.13.4'
force 'com.fasterxml.jackson.core:jackson-databind:2.13.4'
Are these changes relevant?
This may actually be dangerous to remove, and I will fix it, since I don't know what currently relies on this.

But this strict version constraint breaks :apps:spark-3.5, because that module also needs Jackson 2.15. One possible workaround is sketched below.
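If the shared force has to stay for existing modules, one option (a sketch on my part, with 2.15.2 standing in for whichever 2.15.x Spark 3.5 actually needs) is to re-force in the new module, since resolutionStrategy is scoped per project:

```gradle
// apps/spark-3.5/build.gradle (sketch): forcing Jackson 2.15 here does not
// disturb modules that keep forcing 2.13.4, because each project resolves
// its own configurations independently.
configurations.all {
    resolutionStrategy {
        force 'com.fasterxml.jackson:jackson-bom:2.15.2'
        force 'com.fasterxml.jackson.core:jackson-databind:2.15.2'
    }
}
```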
apps/spark/build.gradle (outdated)
version {
    strictly("${icebergVersion}")
}
Are these really helpful? In practice I never find force or strictly covering everything I want, since there can always be a fatJar in the dependency tree, and these keywords don't work for those.
strictly here overrides any dependencies of the compaction app that define a higher version such as 1.5.2, e.g.:
- services
- cluster:storage
- internalcatalog
- htscatalog

Without strictly, these dependencies would bump Iceberg in compaction unintentionally.

Alternatives to forcing the version to be lower than what the transitive dependencies define are:
- excluding all Iceberg artifacts from each of the dependencies above, but I think that is less readable, and from what I understand, resolution rules are preferred over exclusions
- using the 1.2 conventions plugin, which ALSO defines strictly, but I prefer to be explicit

A sketch of the effect follows below.
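The sketch, mirroring the strictly block in the diff above (iceberg-core stands in for each Iceberg artifact; icebergVersion is 1.2.x in this module):

```gradle
// apps/spark/build.gradle (sketch): a first-party 'strictly' downgrades the
// 1.5.2 that transitive dependencies (services, cluster:storage,
// internalcatalog, htscatalog) would otherwise pull in, keeping the
// compaction app on Iceberg 1.2.
dependencies {
    implementation('org.apache.iceberg:iceberg-core') {
        version { strictly("${icebergVersion}") }
    }
}
```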
@@ -55,7 +55,9 @@ dependencies {
fatJarPackagedDependencies(project(path: ':integrations:spark:spark-3.1:openhouse-spark-runtime_2.12', configuration: 'shadow')) {
    transitive = false
}
-implementation(project(path: ':integrations:spark:spark-3.1:openhouse-spark-runtime_2.12', configuration: 'shadow'))
+implementation(project(path: ':integrations:spark:spark-3.1:openhouse-spark-runtime_2.12', configuration: 'shadow')) {
+    exclude group: "org.apache.iceberg", module: "iceberg-spark-runtime-3.1_2.12"
+}
I thought this one was shaded already?
Well, 3.5 was being used regardless, but I noticed both 3.1 and 3.5 were available on the classpath before this exclusion. I don't think that was intentional, so I'm cleaning it up here.
Summary
This is Part 1 of 2 for enabling compaction-based MoR.

Problem: enabling Merge-on-Read for customer tables requires the ability to compact delete files. Compacting delete files is supported in Spark 3.1/Iceberg 1.2, but we need to test it, and we cannot: creating delete files is not supported in Spark 3.1 (only compacting existing ones is).

Solution: we add a Spark 3.5/Iceberg 1.5 compaction application that extends the existing compaction test suite. In a follow-up PR, we will add tests that create delete files and compact them under the latest versions.
Changes
Testing Done
All tests running on Spark 3.1 and Spark 3.5:
Correct Spark version resolutions for the test runtime:
Correct Spark version resolutions for the runtime: