[GPartition/SageMaker] Improvements to gpartition and SageMaker partition. #880

Merged
merged 1 commit into from
Jun 27, 2024

Conversation

thvasilo
Contributor

We use this commit as a base for range partitioning.

Issue #, if available:

Fixes #802

Description of changes:

  • Download files in parallel from S3 for SageMaker partition.
  • Copy raw_id_mappings from the input path to the generated DGL graph path, for both SageMaker and local execution.
  • Change the dtype of the random partition assignment pyarrow array from int64 to uint8 when num_parts < 256, and to uint16 otherwise.
  • Support getting SageMaker job information for both SageMaker Processing jobs and SageMaker Training jobs, allowing us to use more instance types.
  • Make gpartition unit test verification re-usable across partition algorithm implementations.
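
The dtype rule for the random partition assignment can be illustrated with a minimal sketch. This is not the code from this PR: it uses the standard-library `array` module as a stand-in for the pyarrow array, and the names `partition_id_typecode` and `random_assignment` are hypothetical. The upper bound of 65536 partitions is an assumption added here so that uint16 never overflows; the PR description only states the `num_parts < 256` threshold.

```python
import random
from array import array


def partition_id_typecode(num_parts: int) -> str:
    """Pick the narrowest unsigned typecode that can hold every partition id."""
    if num_parts < 1:
        raise ValueError("num_parts must be positive")
    if num_parts < 256:
        return "B"  # unsigned 8-bit; the pyarrow counterpart would be pa.uint8()
    if num_parts <= 65536:  # assumption: guard not stated in the PR description
        return "H"  # unsigned 16-bit; the pyarrow counterpart would be pa.uint16()
    raise ValueError("partition ids would not fit in an unsigned 16-bit integer")


def random_assignment(num_nodes: int, num_parts: int, seed: int = 0) -> array:
    """Assign each node a uniformly random partition id in [0, num_parts)."""
    rng = random.Random(seed)
    typecode = partition_id_typecode(num_parts)
    return array(typecode, (rng.randrange(num_parts) for _ in range(num_nodes)))


if __name__ == "__main__":
    narrow = random_assignment(num_nodes=1000, num_parts=8)
    wide = array("q", narrow)  # the same ids stored as signed 64-bit integers
    print(narrow.itemsize, wide.itemsize)
```

Storing each partition id in one or two bytes instead of eight reduces the size of the assignment arrays accordingly, which matters when there is one entry per node.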

We use this PR as a base for RangePartition.
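
The parallel S3 download listed in the description could follow a pattern like the sketch below. It is an illustration under stated assumptions, not the implementation merged here: `download_all` and its parameters are hypothetical names, and the actual transfer is delegated to a caller-supplied `download_one(key, local_path)` callable, which in practice might wrap the boto3 S3 client's `download_file`. Threads are used because the work is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List


def download_all(
    keys: Iterable[str],
    dest_dir: str,
    download_one: Callable[[str, str], None],  # hypothetical: (object key, local path)
    max_workers: int = 8,
) -> List[str]:
    """Download every key into dest_dir concurrently and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def fetch(key: str) -> str:
        # Keep only the final path component of the key as the local file name.
        local_path = dest / Path(key).name
        download_one(key, str(local_path))
        return str(local_path)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises the first worker exception.
        return list(pool.map(fetch, keys))
```

Passing the downloader in as a callable keeps the concurrency logic testable without network access; a failed download surfaces as an exception from `download_all` rather than being silently skipped.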

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@thvasilo thvasilo added the ready able to trigger the CI label Jun 17, 2024
@thvasilo thvasilo self-assigned this Jun 17, 2024
…ageMaker.

We use this commit as a base for range partitioning.
Collaborator

@jalencato jalencato left a comment

Some small comments

Contributor

@classicsong classicsong left a comment

LGTM

Collaborator

@jalencato jalencato left a comment

LGTM

@thvasilo thvasilo merged commit f009041 into awslabs:main Jun 27, 2024
11 checks passed
@thvasilo thvasilo deleted the gsf-partition-improvements branch June 27, 2024 18:34
Labels
0.3.1 ready able to trigger the CI
Development

Successfully merging this pull request may close these issues.

DistPartition should move the raw id mappings to the partition output
3 participants