Releases: epoch8/datapipe
v0.13.0-beta.4
- Fix `RedisStore` serialization for Ray
v0.13.0-beta.2
- Refactor all database writes to `insert on conflict update`
- Remove check for non-overlapping input indices because they are supported now
- Add `transform_keys` to `DatatableBatchTransform`
- Fix `BatchTransformStep.get_full_process_ids` ids duplication
- Add `MetaTable.get_changed_rows_count_after_timestamp`
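
For readers unfamiliar with the "insert on conflict update" pattern named above, here is a minimal sketch of the idea. It uses the standard-library `sqlite3` module and a hypothetical `items` table purely for illustration; it is not datapipe's actual write path, and the helper name `upsert_rows` is an assumption.

```python
import sqlite3


def upsert_rows(conn, rows):
    """Insert (id, value) rows; when an id already exists, update its value."""
    conn.executemany(
        """
        INSERT INTO items (id, value) VALUES (?, ?)
        ON CONFLICT(id) DO UPDATE SET value = excluded.value
        """,
        rows,
    )
    conn.commit()


# Hypothetical table, used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, value INTEGER)")

upsert_rows(conn, [("a", 1), ("b", 2)])
upsert_rows(conn, [("b", 20), ("c", 3)])  # "b" is updated, "c" is inserted

result = conn.execute("SELECT id, value FROM items ORDER BY id").fetchall()
```

A single statement handles both new and existing keys, so the writer does not need a separate "does this row exist" query before each write.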
v0.13.0-beta.1
- Split `core_steps` into `step.batch_transform`, `step.batch_generate`, `step.datatable_transform`, `step.update_external_table`
- Move `metatable.MetaTable` to `datatable`
v0.13.0-alpha.8
- Fix SingleThreadExecutor initialization
- Fix CLI `table migrate-transform-tables` for complex case
- Add magic `ds` inject into `BatchGenerate`
v0.13.0-alpha.7
- Try to set up logging in RayExecutor (fails so far)
- Lazy initialisation of Ray to speed things up in CLI
- Add `ExecutorConfig.parallelism` parameter
- Add `name` parameter to `executor.run_process_batch` to customize task name in ray dashboard
- Migrate `run_changelist` to executor, possible parallelisation
- Limit number of in-flight Ray tasks in one `run_process_batch` to 100
- Fix batch count in tqdm in `run_changelist`
- Add `--start-step` parameter to `step run-changelist` CLI
- Move `--executor` parameter from `datapipe step` to `datapipe` command
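
The in-flight task limit mentioned above can be illustrated with a small sketch. It deliberately swaps Ray for the standard-library `concurrent.futures` so that it runs without a cluster; the function name `run_bounded` and its parameters are hypothetical and do not reflect the `RayExecutor` implementation.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

MAX_IN_FLIGHT = 100  # the limit named in the release note


def run_bounded(func, batches, max_in_flight=MAX_IN_FLIGHT, workers=8):
    """Submit one task per batch, keeping at most max_in_flight tasks pending."""
    results = []
    in_flight = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches:
            if len(in_flight) >= max_in_flight:
                # Block until at least one pending task finishes.
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            in_flight.add(pool.submit(func, batch))
        # Drain whatever is still pending.
        results.extend(f.result() for f in in_flight)
    return results  # completion order, not submission order


doubled = sorted(run_bounded(lambda x: x * 2, range(10), max_in_flight=3))
```

With Ray, the same shape is usually built around `ray.wait(refs, num_returns=1)` in place of `wait(...)`. Bounding the number of pending tasks keeps scheduler and memory pressure flat when a run contains many batches.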
v0.13.0-alpha.6
- Move batch functions to `BaseBatchTransformStep`
- Fix `index_difference` index assert
v0.13.0-alpha.5
- Allow passing empty dfs when idx is passed to func
v0.13.0-alpha.4
WIP 0.13.0
Changes
Core
- Add `datapipe.metastore.TransformMetaTable`. Now each transform gets its own meta table that tracks status of each transformation
- Generalize `BatchTransform` and `DatatableBatchTransform` through `BaseBatchTransformStep`
- Add `transform_keys` to `*BatchTransform`
- Move changed idx computation out of `DataStore` to `BaseBatchTransformStep`
- Add column `priority` to transform meta table, sort work by priority
- Switch from vanilla `tqdm` to `tqdm_loggable` for better display in logs
- `TableStoreFiledir` constructor accepts new argument `fsspec_kwargs`
- Add `filters`, `order_by`, `order` arguments to `*BatchTransformStep`
- Add magic injection of `ds`, `idx`, `run_config` to transform function via parameters introspection
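
The "magic injection via parameters introspection" item can be pictured roughly as follows. This is a sketch of the general technique using `inspect.signature`, not datapipe's code; `call_with_injection` and the two example transform functions are hypothetical names.

```python
import inspect


def call_with_injection(func, *args, **available):
    """Call func, passing only those extras whose names appear in its signature."""
    params = inspect.signature(func).parameters
    injected = {name: value for name, value in available.items() if name in params}
    return func(*args, **injected)


# Hypothetical transform that asks for idx and run_config, but not ds.
def transform(df, idx, run_config=None):
    return (df, idx, run_config)


# Hypothetical transform that asks for nothing extra.
def plain(df):
    return df


extras = {"ds": "store", "idx": [1, 2], "run_config": {"k": "v"}}
out_transform = call_with_injection(transform, "df", **extras)
out_plain = call_with_injection(plain, "df", **extras)
```

Because the caller inspects the function signature, existing transform functions that accept only data frames keep working, while newer ones opt in to extra arguments simply by naming them.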
CLI
- Add `step reset-metadata` CLI command
- Add `step fill-metadata` CLI command that populates transform meta-table with all indices to process
- Add `step run-idx` CLI command
- CLI `step run_changelist` command accepts new argument `--chunk-size`
- New CLI command `table migrate_transform_tables` for `0.13` migration
Execution
- Executors: `datapipe.executor.SingleThreadExecutor`, `datapipe.executor.ray.RayExecutor`
Deployment
- Add helm chart for running regular loops in k8s as `CronJob`
Bugfixes
- Fix `QdrantStore.read_rows` when no idx is specified
v0.13.0-alpha.3
WIP 0.13.0
Major changes
- Add `datapipe.metastore.TransformMetaTable`. Now each transform gets its own meta table that tracks status of each transformation
- Generalize `BatchTransform` and `DatatableBatchTransform` through `BaseBatchTransformStep`
- Add `transform_keys` to `*BatchTransform`
- Move changed idx computation out of `DataStore` to `BaseBatchTransformStep`
- Add column `priority` to transform meta table, sort work by priority
New features
- Add `step reset-metadata` CLI command
- Add `step fill-metadata` CLI command that populates transform meta-table with all indices to process
- Add helm chart for running regular loops in k8s as `CronJob`
- Switch from vanilla `tqdm` to `tqdm_loggable` for better display in logs
- Add `step run-idx` CLI command
- Executors: `datapipe.executor.SingleThreadExecutor`, `datapipe.executor.ray.RayExecutor`
Bugfixes
- Fix `QdrantStore.read_rows` when no idx is specified
v0.12.0-alpha.2
WIP 0.12.0
Breaking changes
- Move cli from `datapipe-app` to `datapipe`
- Remove separate `datapipe step status` command, now it's a flag: `datapipe step list --status`
- `DatatableTransform` moved from `datapipe.compute` to `datapipe.core_steps`
New features
- Add `datapipe.store.qdrant.QdrantStore`
- Add `DatatableBatchTransform` pipeline step
Refactorings
- Add `labels` arg and property to `ComputeStep` base class
- Add `labels` arg to `BatchTransform` and `BatchTransformStep`
- Add `labels` arg to `BatchGenerate` and `DatatableTransformStep`
- Add `labels` arg to `UpdateExternalTable` and `DatatableTransformStep`
- Large refactoring, `ComputeStep` now contains pieces of overridable functions for `run_full` and `run_changelist`
- Add prototype events logging for steps, add `event_logger.log_step_full_complete`, add table `datapipe_step_events`