v0.13.0

@elephantum elephantum released this 02 Sep 08:29

Changes

Core

  • Add datapipe.metastore.TransformMetaTable. Each transform now gets its own
    meta table that tracks the status of each transformation
  • Generalize BatchTransform and DatatableBatchTransform through
    BaseBatchTransformStep
  • Add transform_keys to *BatchTransform
  • Move changed idx computation out of DataStore to BaseBatchTransformStep
  • Add column priority to transform meta table, sort work by priority
  • Switch from vanilla tqdm to tqdm_loggable for better display in logs
  • TableStoreFiledir constructor accepts new argument fsspec_kwargs
  • Add filters, order_by, order arguments to *BatchTransformStep
  • Add magic injection of ds, idx, run_config to transform function via
    parameters introspection to BatchTransform
  • Add magic injection of ds into BatchGenerate
  • Split core_steps into step.batch_transform, step.batch_generate,
    step.datatable_transform, step.update_external_table
  • Move metatable.MetaTable to datatable
  • Enable WAL mode for sqlite database by default
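
The "magic injection via parameters introspection" item above can be illustrated with a minimal sketch. This is not datapipe's actual implementation: the helper name call_with_injection and the sample functions are hypothetical, and the sketch only assumes that injection means "pass ds, idx, run_config to the transform function only if it declares a parameter with that name".

```python
import inspect
from typing import Any, Callable


def call_with_injection(func: Callable[..., Any], *args: Any, **available: Any) -> Any:
    """Call func, passing only those `available` values whose names it declares.

    Hypothetical helper for illustration; not part of the datapipe API.
    """
    params = inspect.signature(func).parameters
    # Keep only the context values that the function explicitly asks for by name.
    injected = {name: value for name, value in available.items() if name in params}
    return func(*args, **injected)


def plain(df):
    # Declares no context parameters, so nothing is injected.
    return df


def with_context(df, ds, run_config=None):
    # Declares ds and run_config, so both are injected; idx is not.
    return (df, ds, run_config)


# Placeholder values standing in for a real datastore, index and run config.
context = {"ds": "datastore", "idx": [1, 2], "run_config": {"debug": True}}

plain_result = call_with_injection(plain, "data", **context)
context_result = call_with_injection(with_context, "data", **context)
```

With this approach existing transform functions that take only their input data keep working unchanged, while new ones opt in to extra context simply by naming the parameter.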

CLI

  • Add step reset-metadata CLI command
  • Add step fill-metadata CLI command that populates transform meta-table with
    all indices to process
  • Add step run-idx CLI command
  • CLI step run_changelist command accepts new argument --chunk-size
  • New CLI command table migrate_transform_tables for 0.13 migration
  • Add --start-step parameter to step run-changelist CLI
  • Move --executor parameter from datapipe step to datapipe command

Execution

  • Executors: datapipe.executor.SingleThreadExecutor,
    datapipe.executor.ray.RayExecutor

Bugfixes

  • Fix QdrantStore.read_rows when no idx is specified
  • Fix RedisStore serialization for Ray