Releases: epoch8/datapipe

v0.13.0-beta.4

21 Aug 14:50
7f165b7
Pre-release
  • Fix RedisStore serialization for Ray

v0.13.0-beta.2

05 Aug 17:39
18855a0
Pre-release
  • Refactor all database writes to insert on conflict update
  • Remove check for non-overlapping input indices because they are supported now
  • Add transform_keys to DatatableBatchTransform
  • Fix duplicated ids in BatchTransformStep.get_full_process_ids
  • Add MetaTable.get_changed_rows_count_after_timestamp
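
As an illustration of the "insert on conflict update" pattern named above, here is a minimal sketch using the standard-library sqlite3 module. The table name, columns, and the upsert_rows helper are hypothetical and are not taken from datapipe, which may implement its writes differently.

```python
import sqlite3

# Hypothetical table, used only to demonstrate the upsert pattern.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id TEXT PRIMARY KEY, value TEXT, update_ts REAL)"
)


def upsert_rows(conn, rows):
    """Insert rows; on a primary-key conflict, overwrite the existing row."""
    conn.executemany(
        """
        INSERT INTO items (id, value, update_ts)
        VALUES (?, ?, ?)
        ON CONFLICT (id) DO UPDATE SET
            value = excluded.value,
            update_ts = excluded.update_ts
        """,
        rows,
    )
    conn.commit()


# The first write inserts two rows; the second updates row "a" in place.
upsert_rows(conn, [("a", "first", 1.0), ("b", "second", 1.0)])
upsert_rows(conn, [("a", "updated", 2.0)])

rows = conn.execute(
    "SELECT id, value, update_ts FROM items ORDER BY id"
).fetchall()
```

A single statement of this kind replaces a separate "check for existence, then insert or update" sequence, so the same write path covers both new and already-known rows.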

v0.13.0-beta.1

02 Aug 18:13
f4707a8
Pre-release
  • Split core_steps into step.batch_transform, step.batch_generate, step.datatable_transform, step.update_external_table
  • Move metatable.MetaTable to datatable

v0.13.0-alpha.8

31 Jul 17:09
Pre-release
  • Fix SingleThreadExecutor initialization
  • Fix CLI table migrate-transform-tables for complex case
  • Add magic ds injection into BatchGenerate

v0.13.0-alpha.7

30 Jul 17:19
c085abd
Pre-release
  • Try to set up logging in RayExecutor (fails so far)
  • Lazy initialisation of Ray to speed things up in CLI
  • Add ExecutorConfig.parallelism parameter
  • Add name parameter to executor.run_process_batch to customize task name in Ray dashboard
  • Migrate run_changelist to executor, which makes parallelisation possible
  • Limit number of in-flight Ray tasks in one run_process_batch to 100
  • Fix batch count in tqdm in run_changelist
  • Add --start-step parameter to step run-changelist CLI
  • Move --executor parameter from datapipe step to datapipe command
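
The in-flight limit mentioned above can be sketched as follows. This is a generic illustration that uses the standard-library thread pool as a stand-in for Ray; the run_bounded helper and its parameters are hypothetical and do not reflect the actual RayExecutor code.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_bounded(func, items, max_in_flight=100, max_workers=8):
    """Apply func to each item while keeping at most max_in_flight
    submitted-but-unfinished tasks at any time."""
    results = []
    in_flight = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item in items:
            if len(in_flight) >= max_in_flight:
                # Block until at least one task finishes before submitting more.
                done, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
                results.extend(f.result() for f in done)
            in_flight.add(pool.submit(func, item))
        # Drain whatever is still running.
        done, _ = wait(in_flight)
        results.extend(f.result() for f in done)
    return results


# Results arrive in completion order, so sort before comparing.
squares = sorted(run_bounded(lambda x: x * x, range(10), max_in_flight=3))
```

Capping the number of outstanding tasks keeps the scheduler queue and the driver's memory bounded when a single batch run fans out into a very large number of chunks.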

v0.13.0-alpha.6

24 Jul 12:21
Pre-release
  • Move batch functions to BaseBatchTransformStep
  • Fix index_difference index assert

v0.13.0-alpha.5

20 Jul 14:13
Pre-release
  • Allow passing empty dfs when idx is passed to func

v0.13.0-alpha.4

19 Jul 20:26
2acb2fd
Pre-release

WIP 0.13.0

Changes

Core

  • Add datapipe.metastore.TransformMetaTable. Now each transform gets its own
    meta table that tracks the status of each transformation
  • Generalize BatchTransform and DatatableBatchTransform through
    BaseBatchTransformStep
  • Add transform_keys to *BatchTransform
  • Move changed idx computation out of DataStore to BaseBatchTransformStep
  • Add column priority to transform meta table, sort work by priority
  • Switch from vanilla tqdm to tqdm_loggable for better display in logs
  • TableStoreFiledir constructor accepts new argument fsspec_kwargs
  • Add filters, order_by, order arguments to *BatchTransformStep
  • Add magic injection of ds, idx, run_config to transform function via
    parameters introspection
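
To illustrate what injection via parameter introspection can look like, here is a minimal sketch built on inspect.signature. The call_with_injection helper is hypothetical, and plain lists stand in for dataframes; datapipe's actual mechanism may differ.

```python
import inspect


def call_with_injection(func, dfs, *, ds=None, idx=None, run_config=None):
    """Call func(*dfs), passing ds, idx, and run_config only when func
    declares a parameter with that name."""
    available = {"ds": ds, "idx": idx, "run_config": run_config}
    params = inspect.signature(func).parameters
    injected = {
        name: value for name, value in available.items() if name in params
    }
    return func(*dfs, **injected)


def plain(df):
    # Declares no special parameters, so nothing is injected.
    return df


def wants_idx(df, idx):
    # Declares idx, so the caller's idx is passed in automatically.
    return [row for row in df if row in idx]


result_plain = call_with_injection(plain, [[1, 2, 3]], idx=[2])
result_idx = call_with_injection(wants_idx, [[1, 2, 3]], idx=[2, 3])
```

With this approach, existing transform functions that take only dataframes keep working unchanged, while functions that need extra context opt in simply by naming the parameter.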

CLI

  • Add step reset-metadata CLI command
  • Add step fill-metadata CLI command that populates transform meta-table with
    all indices to process
  • Add step run-idx CLI command
  • CLI step run_changelist command accepts new argument --chunk-size
  • New CLI command table migrate_transform_tables for 0.13 migration

Execution

  • Executors: datapipe.executor.SingleThreadExecutor,
    datapipe.executor.ray.RayExecutor

Deployment

  • Add helm chart for running regular loops in k8s as CronJob

Bugfixes

  • Fix QdrantStore.read_rows when no idx is specified

v0.13.0-alpha.3

19 Jul 19:29
3099861
Pre-release

WIP 0.13.0

Major changes

  • Add datapipe.metastore.TransformMetaTable. Now each transform gets its own
    meta table that tracks the status of each transformation
  • Generalize BatchTransform and DatatableBatchTransform through
    BaseBatchTransformStep
  • Add transform_keys to *BatchTransform
  • Move changed idx computation out of DataStore to BaseBatchTransformStep
  • Add column priority to transform meta table, sort work by priority

New features

  • Add step reset-metadata CLI command
  • Add step fill-metadata CLI command that populates transform meta-table with
    all indices to process
  • Add helm chart for running regular loops in k8s as CronJob
  • Switch from vanilla tqdm to tqdm_loggable for better display in logs
  • Add step run-idx CLI command
  • Executors: datapipe.executor.SingleThreadExecutor,
    datapipe.executor.ray.RayExecutor

Bugfixes

  • Fix QdrantStore.read_rows when no idx is specified

v0.12.0-alpha.2

10 Jul 22:00
Pre-release

WIP 0.12.0

Breaking changes

  • Move cli from datapipe-app to datapipe
  • Remove separate datapipe step status command; it is now a flag: datapipe step list --status
  • DatatableTransform moved from datapipe.compute to datapipe.core_steps

New features

  • Add datapipe.store.qdrant.QdrantStore
  • Add DatatableBatchTransform pipeline step

Refactorings

  • Add labels arg and property to ComputeStep base class
  • Add labels arg to BatchTransform and BatchTransformStep
  • Add labels arg to BatchGenerate and DatatableTransformStep
  • Add labels arg to UpdateExternalTable and DatatableTransformStep
  • Large refactoring: ComputeStep now contains pieces of overridable functions
    for run_full and run_changelist
  • Add prototype event logging for steps: add
    event_logger.log_step_full_complete and the table datapipe_step_events