[FEA]: Unify the FileSourceStage, MultiFileSource and DirectoryWatcher functionality
#976
Labels
feature request
New feature or request
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem this feature solves
Currently, there are 2 stages and a utility class which can read files and push them into the pipeline: MultiFileSource, FileSourceStage, and DirectoryWatcher. All 3 are very similar but have slightly different features. Having very similar but slightly different functionality can be confusing, and it makes it difficult to combine features from 2 of them at the same time (e.g. DirectoryWatcher with multiple search patterns).
Describe your ideal solution
This should combine the features of all 3 into a single stage to make it easier for users. Instead of needing to decide which stage to use based on the features a user wants, there will be 1 stage with the capability of all 3 and options to configure the functionality.
For example, the FileSourceStage should be able to support the following:
FileSource(watch=True, files=["my_directory/*.json"])
FileSource(watch=True, files=["s3://my_bucket/my_directory/*.json"])
FileSource(files=["local_directory1/*.json", "local_directory2/*.json"])
The end goal is a single stage which has the capability of all 3.
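To make the requested behavior concrete, here is a minimal sketch of how one object could cover both the multi-pattern case and the watch case. It is not the existing Morpheus API: the class name FileSourceSketch, the poll_interval option, and the poll/batches methods are all hypothetical, and only local glob patterns are handled (remote URLs such as s3:// would need a filesystem abstraction, which is left out here).

```python
import glob
import time
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set


@dataclass
class FileSourceSketch:
    """Hypothetical stand-in for the proposed unified stage (not the real API)."""

    files: List[str]            # one or more glob patterns, as in the issue
    watch: bool = False         # keep polling for new files when True
    poll_interval: float = 1.0  # assumed option, not named in the issue
    _seen: Set[str] = field(default_factory=set, init=False, repr=False)

    def _expand(self) -> List[str]:
        # Expand every pattern and return sorted, de-duplicated matches.
        matches: Set[str] = set()
        for pattern in self.files:
            matches.update(glob.glob(pattern))
        return sorted(matches)

    def poll(self) -> List[str]:
        # Return only the files that no earlier poll has reported.
        new_files = [f for f in self._expand() if f not in self._seen]
        self._seen.update(new_files)
        return new_files

    def batches(self, max_polls: Optional[int] = None) -> Iterator[List[str]]:
        # watch=False: poll once and stop.
        # watch=True: keep polling, yielding a batch whenever new files appear.
        polls = 0
        while True:
            batch = self.poll()
            if batch:
                yield batch
            polls += 1
            if not self.watch or (max_polls is not None and polls >= max_polls):
                return
            time.sleep(self.poll_interval)
```

With this shape, the choice between "read once" and "watch" is a constructor flag rather than a choice of stage, and multiple search patterns work in both modes because pattern expansion is shared.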
Describe any alternatives you have considered
No response
Additional context
This is a follow-on issue that will help with #975.