Create a GeneratorStep from a dataset using a helper function #812
Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-812/
CodSpeed Performance Report: Merging #812 will not alter performance.
LGTM! Just some comments regarding the quickstart in the docs, the docstrings, and some suggestions for the code.
I think we could leave here the new example that you've added, @plaguss, as it's the quickest way to get started. WDYT @dvsrepo @davidberenstein1957?
Co-authored-by: Gabriel Martín Blázquez <[email protected]>
710337c to 64e4ff2
Description
This PR simplifies creating a generator step from a dataset:
1. Helper function to create the GeneratorStep from an already processed dataset:
From the example by @dvsrepo, we can update the code like so:
The resulting step can be integrated in a pipeline as:
New entry in the docs:
2. Pass the dataset via pipeline.run(dataset=....)
Internally, the step is created for you (this is less flexible, but more direct, and easier if you don't need the flexibility):
Example in the docs with the new functionality:
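As a rough, library-independent sketch of the two options above (this is not the code from this PR): every name here (ToyGeneratorStep, toy_make_generator_step, ToyPipeline) is hypothetical and only illustrates the idea. It assumes a generator step yields batches as (rows, is_last_batch) tuples, and that passing a dataset to run() just builds such a step internally.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

Row = Dict[str, Any]
Batch = Tuple[List[Row], bool]  # (rows, is_last_batch); assumed batch contract


class ToyGeneratorStep:
    """Hypothetical stand-in for a generator step: yields the dataset in batches."""

    def __init__(self, rows: List[Row], batch_size: int = 50) -> None:
        self.rows = rows
        self.batch_size = batch_size

    def process(self) -> Iterator[Batch]:
        total = len(self.rows)
        for start in range(0, total, self.batch_size):
            end = start + self.batch_size
            # The flag tells downstream steps that no more batches will come.
            yield self.rows[start:end], end >= total


def toy_make_generator_step(rows: Iterable[Row], batch_size: int = 50) -> ToyGeneratorStep:
    """Option 1 (hypothetical helper): build the step explicitly from processed rows."""
    return ToyGeneratorStep(list(rows), batch_size=batch_size)


class ToyPipeline:
    """Hypothetical pipeline: one optional generator plus row-wise steps."""

    def __init__(
        self,
        steps: Optional[List[Callable[[Row], Row]]] = None,
        generator: Optional[ToyGeneratorStep] = None,
    ) -> None:
        self.steps = steps or []
        self.generator = generator

    def run(self, dataset: Optional[Iterable[Row]] = None) -> List[Row]:
        generator = self.generator
        if dataset is not None:
            if generator is not None:
                raise ValueError("Pass either a generator step or a dataset, not both.")
            # Option 2: the generator step is created internally from the dataset.
            generator = toy_make_generator_step(dataset)
        if generator is None:
            raise ValueError("No generator step and no dataset were provided.")

        output: List[Row] = []
        for batch, _is_last in generator.process():
            for row in batch:
                for step in self.steps:
                    row = step(row)
                output.append(row)
        return output
```

With this sketch, ToyPipeline(steps=[...], generator=toy_make_generator_step(rows)).run() and ToyPipeline(steps=[...]).run(dataset=rows) produce the same rows; the first form leaves room to configure the step (for example its batch size), which is the flexibility trade-off mentioned in option 2.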
A new example that will appear in the quick start with the new simplifications (the original pipeline can be compared here):
The pipeline from this blogpost of yours, @dvsrepo, could now be written in fewer lines: