Is your feature request related to a problem? Please describe.
My data source is a list of JSON files. I have to concatenate them into one file before using the JsonDataSource constructor.
Describe the solution you'd like
Allow the JsonDataSource constructor to accept the path to a folder
Describe alternatives you've considered
Doing the aggregation of all files into one big file myself, before feeding it to JsonDataSource
Additional context
It is not uncommon to receive data sources as a collection of files instead of one big file. A convenience method to handle this common case would be appreciated.
In Tribuo 4.0.X you can use AggregateDataSource to aggregate across files if you're happy with it round-robining the iterators and that the provenance won't be able to produce a configuration.
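Until folder support exists, aggregating across files means supplying one path per source. A minimal sketch of collecting the JSON files in a folder in a stable order, using only java.nio; the class and method names here are hypothetical and not part of Tribuo. Each returned path could then back its own JsonDataSource, with the resulting sources handed to AggregateDataSource.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of Tribuo.
public class JsonFolderLister {

    /**
     * Returns the regular files directly inside the folder whose names
     * end in ".json", sorted by path so the order is reproducible.
     */
    public static List<Path> listJsonFiles(Path folder) throws IOException {
        // Files.list holds a directory handle open, so close it when done.
        try (Stream<Path> entries = Files.list(folder)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".json"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway folder so the example runs anywhere.
        Path folder = Files.createTempDirectory("json-demo");
        Files.write(folder.resolve("b.json"), new byte[0]);
        Files.write(folder.resolve("a.json"), new byte[0]);
        Files.write(folder.resolve("notes.txt"), new byte[0]);

        for (Path p : listJsonFiles(folder)) {
            System.out.println(p.getFileName());
        }
    }
}
```

Sorting matters here because the order in which the files are listed determines the example order once the sources are aggregated sequentially.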
In the short term (i.e. potentially before 4.1) we had planned to add an enum to AggregateDataSource which lets the iterators work sequentially (so it will preserve the example ordering according to the order you specify the files in), but it looks like it's already sequential, so we'll update the docs to make that clear. We'll also add a version which is configurable and operates on ConfigurableDataSource, which will allow the provenance to convert into a configuration for re-running the experiment. Both of these changes are straightforward and easy to make compatible with existing config files & provenance objects.
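To make the ordering distinction concrete, here is a plain-Java sketch of the two strategies over generic iterables. It is an illustration only, not Tribuo's implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration, not Tribuo code.
public class AggregationOrder {

    /** Drains each source completely before moving to the next one. */
    public static <T> List<T> sequential(List<? extends Iterable<T>> sources) {
        List<T> out = new ArrayList<>();
        for (Iterable<T> source : sources) {
            for (T item : source) {
                out.add(item);
            }
        }
        return out;
    }

    /** Takes one item from each source in turn until all are exhausted. */
    public static <T> List<T> roundRobin(List<? extends Iterable<T>> sources) {
        List<Iterator<T>> iterators = new ArrayList<>();
        for (Iterable<T> source : sources) {
            iterators.add(source.iterator());
        }
        List<T> out = new ArrayList<>();
        boolean progressed = true;
        while (progressed) {
            progressed = false;
            for (Iterator<T> it : iterators) {
                if (it.hasNext()) {
                    out.add(it.next());
                    progressed = true;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three stand-in "files" of different lengths.
        List<List<String>> files = List.of(
                List.of("a1", "a2", "a3"),
                List.of("b1"),
                List.of("c1", "c2"));

        System.out.println(sequential(files)); // [a1, a2, a3, b1, c1, c2]
        System.out.println(roundRobin(files)); // [a1, b1, c1, a2, c2, a3]
    }
}
```

Sequential aggregation keeps every example from the first file ahead of the second file's examples, which is what preserves the file order you specify.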
Longer term (i.e. after 4.1) I agree that extending those data sources to operate on folders would be a good change. It's a longer term thing because we'll need to think through the implications for existing provenance & configuration files and try to evolve those DataSources in a compatible way. Issue #70 involves extending the loading in a few ways, some of which could be integrated into such a change (e.g. supporting compressed files), so we'd prefer to do any more substantial refactor once and cover more of the different extensions at the same time.
The PR is out for the short term work which extends AggregateDataSource and adds AggregateConfigurableDataSource. Once it's merged you'll be able to put a collection of JsonDataSources into a config file, and have a single aggregate source which collects all of them up. The individual JsonDataSources can share the RowProcessor provided the json files all have the same fields.
As I mentioned above we'll look into a bigger refactoring of the data sources to allow them to iterate multiple files, and to operate on compressed files in a future release.