pytimetk 0.2.0

@mdancho84 mdancho84 released this 03 Nov 17:39

Anomaly Detection

  • anomalize(): A new scalable function for detecting time series anomalies (outliers).
  • plot_anomalies(): A scalable visualization tool for inspecting anomalies in time series data.
  • plot_anomalies_decomp(): A scalable visualization tool for inspecting the observed, seasonal, trend, and remainder components of the decomposition, which helps you judge whether anomalies are being detected the way you intend.
  • plot_anomalies_cleaned(): A scalable visualization tool for comparing the series before and after its anomalies are cleaned.

New Functions:

  • apply_by_time(): For complex apply-style aggregations by time.
  • augment_rolling_apply(): For complex rolling operations using apply-style data frame functions.
  • augment_expanding(): For expanding calculations with single-column functions (e.g. mean).
  • augment_expanding_apply(): For complex expanding operations with apply-style data frame functions.
  • augment_hilbert(): Hilbert features for signal processing.
  • augment_wavelet(): Wavelet transform features.
  • get_frequency(): Infer a pandas-like frequency. More robust than pandas.infer_freq.
  • get_seasonal_frequency(): Infer the pandas-like seasonal frequency (periodicity) for the time series.
  • get_trend_frequency(): Infer the pandas-like trend frequency for the time series.
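
To make the difference between the rolling and expanding helpers concrete, here is a minimal sketch in plain pandas. It does not call pytimetk, and the column names (id, date, value), the toy data, and the window size are illustrative assumptions. An expanding window grows from the first row of each group, while a rolling window keeps a fixed width.

```python
import pandas as pd

# Hypothetical toy data: two series identified by `id`, already sorted by date.
df = pd.DataFrame({
    "id": ["a"] * 4 + ["b"] * 4,
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-04"] * 2
    ),
    "value": [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0],
})

grouped = df.groupby("id")["value"]

# Expanding mean: each row averages every row seen so far in its group.
df["value_expanding_mean"] = grouped.transform(lambda s: s.expanding().mean())

# Rolling mean: each row averages only the last 2 rows in its group,
# so the first row of each group has no full window and is NaN.
df["value_rolling_mean_2"] = grouped.transform(
    lambda s: s.rolling(window=2).mean()
)

print(df)
```

The "apply" variants listed above cover the case where the function needs more than one column at a time, which a single-column transform like this cannot express.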

New Finance Module

More coming soon.

  • augment_ewm(): Exponentially weighted augmentation

Speed Improvements

Polars Engines:

  • summarize_by_time(): Gains a polars engine.
    • 3X to 10X speed improvements.
  • augment_lags() and augment_leads(): Gain a polars engine. The speedup grows with the number of lags/leads.
    • 6.5X speed improvement with 100 lags.
  • augment_rolling(): Gains a polars engine. 10X speed improvement.
  • augment_expanding(): Gains a polars engine.
  • augment_timeseries_signature(): Gains a polars engine. 3X speed improvement.
  • augment_holiday_signature(): Gains a polars engine.

Parallel Processing and Vectorized Optimizations:

  • pad_by_time(): Complete overhaul. Uses a vectorized Cartesian product to improve speed. Thousands of time series can now be padded in seconds.
    • Independent review: Time went from over 90 minutes to 13 seconds for a 500X speedup on 10M rows.
  • future_frame(): Complete overhaul. Uses vectorization when possible. Grouped parallel processing. Set threads = -1 to use all cores.
    • Independent review: Time went from 11 minutes to 2.5 minutes for a 4.4X speedup on 10M rows.
  • ts_features(): Uses concurrent futures to parallelize tasks. Set threads = -1 to use all cores.
  • ts_summary(): Uses concurrent futures to parallelize tasks. Set threads = -1 to use all cores.
  • anomalize(): Uses concurrent futures to parallelize tasks. Set threads = -1 to use all cores.
  • augment_rolling() and augment_rolling_apply(): Use concurrent futures to parallelize tasks. Set threads = -1 to use all cores.
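
The Cartesian product idea behind the pad_by_time() overhaul can be illustrated with a short pandas sketch. This is not pytimetk's implementation. The column names and toy data are assumptions, and for simplicity the sketch pads every series to the global date range. The point is that one grid of every (id, date) pair is built up front, so no per-group Python loop is needed.

```python
import pandas as pd

# Hypothetical toy data: series "b" is missing 2023-01-02.
df = pd.DataFrame({
    "id": ["a", "a", "a", "b", "b"],
    "date": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-01", "2023-01-03"]
    ),
    "value": [1.0, 2.0, 3.0, 10.0, 30.0],
})

# Build every (id, date) combination once instead of looping over groups.
all_dates = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
full_index = pd.MultiIndex.from_product(
    [df["id"].unique(), all_dates], names=["id", "date"]
)

# Reindexing against the full grid inserts the missing rows with NaN values.
padded = (
    df.set_index(["id", "date"])
      .reindex(full_index)
      .reset_index()
)

print(padded)
```

Because the grid and the reindex are both single vectorized operations, the cost grows with the number of rows rather than with the number of groups, which is the property the speedup above relies on.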

Helpful Utilities:

  • parallel_apply(): Mimics the pandas apply() function with concurrent futures.
  • progress_apply(): Adds a progress bar to pandas apply().
  • glimpse(): Mimics the tidyverse (tibble) glimpse function.
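
As a rough illustration of the pattern behind parallel_apply() and the threads = -1 convention, the sketch below applies a function to each group with Python's standard concurrent.futures module. The helper names (resolve_workers, parallel_group_apply) and the dictionary-of-lists stand-in for a grouped data frame are hypothetical. The choice of a thread pool is also an assumption, since these notes do not say whether pytimetk uses threads or processes internally.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def resolve_workers(threads: int) -> int:
    """Map a `threads` argument to a worker count; -1 means all cores."""
    if threads == -1:
        return os.cpu_count() or 1
    return max(1, threads)


def parallel_group_apply(groups: dict, func, threads: int = 1) -> dict:
    """Apply `func` to each group's values concurrently, keyed by group name."""
    with ThreadPoolExecutor(max_workers=resolve_workers(threads)) as pool:
        # Executor.map preserves input order, so results line up with the keys.
        results = pool.map(func, groups.values())
        return dict(zip(groups.keys(), results))


# Hypothetical toy groups standing in for a grouped data frame.
sales_by_store = {"store_1": [10, 12, 14], "store_2": [100, 90, 110]}
group_means = parallel_group_apply(sales_by_store, mean, threads=-1)
print(group_means)
```

In CPython, threads mainly help when the applied function releases the GIL (for example, inside NumPy or I/O calls). For pure-Python, CPU-bound functions a process pool is the usual alternative.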

New Data Sets:

  • expedia: Expedia hotel searches time series data set

4 New Applied Tutorials:

  1. Sales Analysis Tutorial
  2. Finance Analysis Tutorial
  3. Demand Forecasting Tutorial
  4. Anomaly Detection Tutorial

Final Deprecations:

  • summarize_by_time(): The kind = "period" option has been removed for consistency with the rest of pytimetk. "timestamp" is the default.
  • augment_rolling(): The use_independent_variables argument has been removed. It is replaced by augment_rolling_apply().