update the main branch for 2412 release #480

Merged
23 commits merged into NVIDIA:main on Dec 24, 2024

Conversation

nvliyuan
Collaborator

Update the main branch for the v2412 release.
Please create a merge commit, not a squash.

nvauto and others added 23 commits September 24, 2024 07:36
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
1. Add the variables 'SPARK_MASTER_URL' and 'DATA_ROOT' to support automated testing from CI/CD jobs.

2. 'output_prefix' is NOT referenced in the lines below

Signed-off-by: timl <[email protected]>
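
For illustration, a notebook could pick up these two variables roughly as follows. This is a minimal sketch, not the code from the commit: the helper name `resolve_test_config` and the fallback values `local[*]` and `/data` are assumptions made for the example.

```python
import os


def resolve_test_config(env=None):
    """Return (spark_master_url, data_root), preferring values set by a CI/CD job."""
    # Allow a plain dict to be passed in so the lookup can be exercised without
    # touching the real process environment.
    env = os.environ if env is None else env
    # Hypothetical defaults for interactive runs; the actual notebook may use others.
    spark_master_url = env.get("SPARK_MASTER_URL", "local[*]")
    data_root = env.get("DATA_ROOT", "/data")
    return spark_master_url, data_root


if __name__ == "__main__":
    master, root = resolve_test_config()
    print(f"master={master} data_root={root}")
```

A CI/CD job would export both variables before launching the notebook; an interactive user can leave them unset and get the fallbacks.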
[auto-merge] branch-24.10 to branch-24.12 [skip ci] [bot]
* Add a TPC-DS SF 10 Notebook for local Jupyter or Google Colab

Signed-off-by: Gera Shegalov <[email protected]>

* Update link to the current blob

Signed-off-by: Gera Shegalov <[email protected]>

---------

Signed-off-by: Gera Shegalov <[email protected]>
* Update "Open in Colab" link

* Update README.md

Signed-off-by: Gera Shegalov <[email protected]>

---------

Signed-off-by: Gera Shegalov <[email protected]>
* Add Tools Notebooks for EMR

* Update README

* Sign-off commit

Signed-off-by: Partho Sarthi <[email protected]>

---------

Signed-off-by: Partho Sarthi <[email protected]>
* add notebook

Signed-off-by: YanxuanLiu <[email protected]>

* change default kernel

Signed-off-by: YanxuanLiu <[email protected]>

* add test

Signed-off-by: YanxuanLiu <[email protected]>

* print

Signed-off-by: YanxuanLiu <[email protected]>

* change spark master

Signed-off-by: YanxuanLiu <[email protected]>

* change path of data path

Signed-off-by: YanxuanLiu <[email protected]>

* remove old notebook

Signed-off-by: YanxuanLiu <[email protected]>

* optimize default value

Signed-off-by: YanxuanLiu <[email protected]>

* clear all output

Signed-off-by: YanxuanLiu <[email protected]>

* remove id

Signed-off-by: YanxuanLiu <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
* change data path

Signed-off-by: YanxuanLiu <[email protected]>

* change data path

Signed-off-by: YanxuanLiu <[email protected]>

* save result to file

Signed-off-by: YanxuanLiu <[email protected]>

* change format of output

Signed-off-by: YanxuanLiu <[email protected]>

---------

Signed-off-by: YanxuanLiu <[email protected]>
* Support running optuna on Spark

* play around with optuna + xgboost + joblibspark

* update

* update

* update

* Optuna: Add how optuna + xgboost + spark works

* deploy optuna examples on databricks

* Cleanup and prepare for open source

* Update README.md

* Update run-optuna-spark-xgboost.sh

* Update README.md with chmod for run scripts

* update/add copyright

* Replace RDD example with Dataframe example, cleanups to README and repo structure

* Move files to separate dir for merge to spark-rapids-examples

* separate dir

* Merge branch-24.10

* Remove gitignore

Signed-off-by: Rishi Chandra <[email protected]>

* remove username

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Fix run-sparkrapids-xgboost.sh implementation path

* Fix corrupt mysql installation in init_optuna.sh

* Fix corrupt mysql installation in init_optuna_xgboost.sh

* Update run-joblibspark-xgboost.sh

* Update run-sparkrapids-xgboost.sh

* Update run-joblibspark-simple.sh

* Update run-joblibspark-xgboost.sh

* Update run-sparkrapids-xgboost.sh

* Update to 24.10, include apt updates

* use gpu_hist for older xgb versions

* use gpu_hist for older xgb versions

* Update sparkrapids-xgboost-read-per-worker.py

* Update init script with mysql installation fixes

* Address comments

* Add cluster startup script

* Move around files, add notebooks

* Repo renovations

* Update README, remove run scripts, fix PCA to use 24.10.1

* minor updates

* Final cleanups, runs passed on databricks

* README updates

* Updates to comments, cleanup outputs

* Address comments, minor reordering, update README

* comment fix

* remove unnecessary imports

* tuning max bins and n_estimators bug fixes

* 'max_bin' != 'max_bins' 🤦

* ensure QDM and XGB bins are the same

* typos

* cleanup

* Address comments

* Note about sampler serialization

* Add link

* Add link

* Undo benchmark commit

* typo

---------

Signed-off-by: Rishi Chandra <[email protected]>
Co-authored-by: Bobby Wang (SW-TEGRA) <[email protected]>
Co-authored-by: Bobby Wang <[email protected]>
Co-authored-by: Erik Ordentlich <[email protected]>
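
The "use gpu_hist for older xgb versions" and "'max_bin' != 'max_bins'" commits above touch on two XGBoost parameter details. The sketch below is not taken from the PR; the helper name `gpu_tree_params` is hypothetical. It relies on two facts about XGBoost: the `device` parameter was introduced in 2.0 (where `gpu_hist` was deprecated), and the histogram bin count parameter is spelled `max_bin`.

```python
def gpu_tree_params(xgb_version, max_bin=256):
    """Build GPU training parameters appropriate for the given XGBoost version string."""
    major = int(xgb_version.split(".")[0])
    if major >= 2:
        # XGBoost 2.0+ selects the GPU through the separate `device` parameter.
        params = {"tree_method": "hist", "device": "cuda"}
    else:
        # Older releases select the GPU through the tree method itself.
        params = {"tree_method": "gpu_hist"}
    # The parameter name is `max_bin` (singular), not `max_bins`.
    params["max_bin"] = max_bin
    return params


if __name__ == "__main__":
    # In practice the version would come from `xgboost.__version__`.
    print(gpu_tree_params("1.7.6"))
    print(gpu_tree_params("2.1.0"))
```

When the training data is wrapped in a `QuantileDMatrix`, the same `max_bin` value would be passed to its constructor so that the quantized data and the booster agree on the number of bins.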
@nvliyuan nvliyuan merged commit e863522 into NVIDIA:main Dec 24, 2024
2 of 3 checks passed
9 participants