-
Notifications
You must be signed in to change notification settings - Fork 242
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic Shim Detection for build
Process [databricks]
#11308
Changes from 39 commits
f0cc81b
c23d29c
0ffdc8b
d558425
a3ab365
80c8470
6f4f567
347c50c
c9a15d2
368cafc
6933a7a
95dd876
5245c86
6fc5a40
6d8e776
3a542e8
bfcadcd
9f4eeb3
2e4175a
aa1d862
b9542b9
caaf058
507c70a
d9b4d76
d0796ea
3b716da
b529a49
e7634cb
8cf07b4
a41cf63
e9d2290
edc7d76
ae657ba
a6b8b89
a4112ab
a04150c
375aad3
c058193
8347892
420e88e
33e3315
3af97ca
3e487ed
b90302c
deb96d5
61ff81f
06ee292
99f759a
74bbf02
8131aa6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Copyright (c) 2024, NVIDIA CORPORATION. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import logging | ||
import os | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Unused? |
||
import xml.etree.ElementTree as ET | ||
import sys | ||
|
||
_log = logging.getLogger("dyn-shim-detection") | ||
# This script is called by maven's antrun plugin. The `project` variable is set by antrun which contains all the | ||
# properties that are set in the pom.xml. For more details checkout the documentation of the `script` task | ||
# https://ant.apache.org/manual/Tasks/script.html | ||
show_version_info = project.getProperty("dyn.shim.trace") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It'd be nice to have a comment explaining where There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overall looks OK, but I'd like to see this addressed. If you look at this Python script on its own, it's confusing how it runs since there are magic variables that just appear vs. how normal Python programs work. A comment explaining the context in which this is invoked and describing the magic variables it is relying on would be very helpful. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for reviewing. I have added a comment explaining, I hope it helps. |
||
_log.setLevel(logging.DEBUG if show_version_info else logging.INFO) | ||
# Same as shimplify.py | ||
ch = logging.StreamHandler() | ||
ch.setFormatter(logging.Formatter('%(name)s - %(levelname)s - %(message)s')) | ||
_log.addHandler(ch) | ||
spark_rapids_source_basedir = project.getProperty("spark.rapids.source.basedir") | ||
multi_module_project_dir = project.getProperty("maven.multiModuleProjectDirectory") | ||
gerashegalov marked this conversation as resolved.
Show resolved
Hide resolved
|
||
expression = project.getProperty("expression") | ||
|
||
sys.path.append("{}/build/".format(spark_rapids_source_basedir)) | ||
from get_buildvers import _get_expression | ||
value = _get_expression("{}/pom.xml".format(multi_module_project_dir), expression, _log) | ||
project.setProperty(expression, value) |
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we need to upload the bytecode file? should be in .gitignore? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, this was a mistake and yes, I will add it to .gitignore. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
# Copyright (c) 2024, NVIDIA CORPORATION. | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
|
||
import sys | ||
import xml.etree.ElementTree as ET | ||
|
||
def _get_expression(pomFile, expression, logger = None): | ||
pom = ET.parse(pomFile) | ||
ns = {"pom": "http://maven.apache.org/POM/4.0.0"} | ||
releases = [] | ||
release_prefix = "release" | ||
for profile in pom.findall(".//pom:profile/pom:id", ns): | ||
if profile.text.startswith(release_prefix): | ||
releases.append(profile.text[len(release_prefix):]) | ||
snapshots = [] | ||
no_snapshots = [] | ||
|
||
for release in releases: | ||
spark_version = pom.find(".//pom:spark{}.version".format(release), ns) | ||
if (spark_version.text.endswith("SNAPSHOT")): | ||
snapshots.append(release) | ||
else: | ||
no_snapshots.append(release) | ||
excluded_shims = pom.find(".//pom:dyn.shim.excluded.releases", ns).text | ||
if (excluded_shims): | ||
for removed_shim in [x.strip() for x in excluded_shims.split(",")]: | ||
if (removed_shim not in snapshots and removed_shim not in no_snapshots): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: this looks insufficient as snapshots and no_snapshots should not have common items, can we try
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was trying to guard against a fictitious shim that doesn't exist in either snap or no_snaps but is listed in the excluded_shims. Yours look cleaner. I will update |
||
raise Exception("Shim {} listed in dyn.shim.excluded.releases in pom.xml not present in releases".format(removed_shim)) | ||
try: | ||
snapshots.remove(removed_shim) | ||
except ValueError: | ||
pass | ||
try: | ||
no_snapshots.remove(removed_shim) | ||
except ValueError: | ||
pass | ||
|
||
|
||
if "scala2.13" in pomFile: | ||
no_snapshots = list(filter(lambda x: not x.endswith("cdh"), no_snapshots)) | ||
db_release = list(filter(lambda x: x.endswith("db"), no_snapshots)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: new line. try reformat the code for better readability 👍 |
||
no_snapshots = list(filter(lambda x: not x.endswith("db"), no_snapshots)) | ||
snap_and_no_snap = no_snapshots + snapshots | ||
all = snap_and_no_snap + db_release | ||
release_dict={} | ||
release_dict["databricks.buildvers"]=" ".join(db_release) | ||
release_dict["snapshots.buildvers"]=" ".join(snapshots) | ||
release_dict["no_snapshots.buildvers"]=" ".join(no_snapshots) | ||
release_dict["snap_and_no_snap.buildvers"]=" ".join(snap_and_no_snap) | ||
release_dict["all.buildvers"]=" ".join(all) | ||
if (logger): | ||
logger.debug("release_dict: {}".format(release_dict)) | ||
if (expression): | ||
return release_dict[expression] | ||
|
||
|
||
if __name__ == "__main__": | ||
print(_get_expression(sys.argv[2], sys.argv[1])) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: it would be nice to include a clear usage message with a |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -49,28 +49,6 @@ | |
<rapids.source.jar.phase>none</rapids.source.jar.phase> | ||
</properties> | ||
<profiles> | ||
<profile> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @NvTimLiu can you please double-check if there are any other references to these profiles? There might be some outside of this repository, thanks |
||
<id>noSnapshotsWithDatabricks</id> | ||
<properties> | ||
<included_buildvers> | ||
${noSnapshot.buildvers}, | ||
${databricks.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>noSnapshots</id> | ||
<properties> | ||
<included_buildvers> | ||
<!-- #if scala-2.12 --> | ||
${noSnapshot.buildvers} | ||
<!-- #endif scala-2.12 --> | ||
<!-- #if scala-2.13 --><!-- | ||
${noSnapshotScala213.buildvers} | ||
--><!-- #endif scala-2.13 --> | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>premergeUT1</id> | ||
<properties> | ||
|
@@ -127,14 +105,6 @@ | |
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>databricks</id> | ||
<properties> | ||
<included_buildvers> | ||
${databricks.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<!-- | ||
https://spark.apache.org/versioning-policy.html | ||
|
@@ -161,58 +131,6 @@ | |
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>snapshots</id> | ||
<properties> | ||
<included_buildvers> | ||
${noSnapshot.buildvers}, | ||
${snapshot.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>snapshotOnly</id> | ||
<properties> | ||
<included_buildvers> | ||
${snapshot.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>snapshotsWithDatabricks</id> | ||
<properties> | ||
<included_buildvers> | ||
${noSnapshot.buildvers}, | ||
${snapshot.buildvers}, | ||
${databricks.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>noSnapshotsScala213</id> | ||
<properties> | ||
<included_buildvers> | ||
${noSnapshotScala213.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>snapshotsScala213</id> | ||
<properties> | ||
<included_buildvers> | ||
${noSnapshotScala213.buildvers}, | ||
${snapshotScala213.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>snapshotScala213Only</id> | ||
<properties> | ||
<included_buildvers> | ||
${snapshotScala213.buildvers} | ||
</included_buildvers> | ||
</properties> | ||
</profile> | ||
<profile> | ||
<id>individual</id> | ||
<activation> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously snapshotXXX means snapshots+non-snapshots, I think currently it should be
snap_and_no_snap
here?