Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python Dataset Reader #2414

Merged
merged 25 commits into from
Apr 4, 2024

Conversation

fiedorowicz1
Copy link
Contributor

@fiedorowicz1 fiedorowicz1 commented Jan 8, 2024

Implements a new python data reader that is cleaner, more general, and supports DistConv. It also supports parallel pre-fetching of multiple mini-batches.

@fiedorowicz1 fiedorowicz1 marked this pull request as draft January 8, 2024 23:10
@fiedorowicz1 fiedorowicz1 force-pushed the distconv-python-data-reader branch from eee429e to 40c516b Compare January 17, 2024 18:48
@fiedorowicz1 fiedorowicz1 marked this pull request as ready for review January 19, 2024 01:12
Copy link
Contributor

@tbennun tbennun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good, some comments

src/data_ingestion/readers/data_reader_python_v2.cpp Outdated Show resolved Hide resolved
@@ -0,0 +1,137 @@
import os
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the pytest version?
Would be nice to extend the test infrastructure (maybe at a later PR?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify what you mean here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly. What I meant is that our pytest version of the tests (i.e., @test_util.lbann_test) might have a nicer syntax here. However, it is designed to use the existing Python data reader and may be extended as such to use the dataset reader (for example, with a flag @test_util.lbann_test(dataset=MyObject()) or similar).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should change the test framework to use the new python dataset reader, but I think that should be a separate PR.

src/data_ingestion/readers/data_reader_python_dataset.cpp Outdated Show resolved Hide resolved
src/data_ingestion/readers/data_reader_python_dataset.cpp Outdated Show resolved Hide resolved
src/data_ingestion/readers/data_reader_python_dataset.cpp Outdated Show resolved Hide resolved
src/data_ingestion/readers/data_reader_python_dataset.cpp Outdated Show resolved Hide resolved
src/data_ingestion/readers/data_reader_python_dataset.cpp Outdated Show resolved Hide resolved
@fiedorowicz1 fiedorowicz1 force-pushed the distconv-python-data-reader branch from 40c516b to 861931c Compare February 27, 2024 06:32
@fiedorowicz1 fiedorowicz1 changed the title Python Data Reader 2.0 Python Dataset Reader Mar 7, 2024
Copy link
Contributor

@tbennun tbennun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

Copy link
Collaborator

@ndryden ndryden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! :)

Copy link
Contributor

@tbennun tbennun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed this

python/lbann/util/data.py Outdated Show resolved Hide resolved
@tbennun tbennun self-requested a review March 21, 2024 00:29
send_rank = recv_rank = send_rank_count = recv_rank_count = 0;
uint64_t send_rank_max_count =
local_distconv_mb_size + (distconv_extra_samples > 0);
uint64_t recv_rank_max_count = local_mb_size + (extra_samples > 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiedorowicz1 this looks like an Alltoall collective. Do you think using it would be more effective instead of the loop?

@fiedorowicz1 fiedorowicz1 force-pushed the distconv-python-data-reader branch from e0cf5f2 to 75aac4e Compare April 4, 2024 00:51
@fiedorowicz1 fiedorowicz1 merged commit 1db91a2 into LLNL:develop Apr 4, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants