Releases: equinor/gordo

0.30.0

02 Sep 07:39
  • Add simple dataframe serialization ability to server serializer utils (0b2b84d)
  • Remove logic that y must be a subset of X for anomaly detection. (6d75c37)

0.29.0

30 Aug 08:07
dd390e1
  • Make client auto choose prediction endpoint (6690160)
  • Add **kwargs to sub_providers (iroc and ncs_reader) (63c53ef)
  • Remove explicit threads=None argument from DataLakeProvider init (697a5b2)
  • Fix overwrite of y in KerasLSTMAutoEncoder.fit (dd390e1)

Release 0.28.0 of gordo-components

23 Aug 12:46

New release of gordo-components!

Small changes:

  • All dependencies are updated, including pandas (0.24.2->0.25.0)
  • Fix issue where IROC reader used 1 thread by default. (#409)
  • Add exponential retries to influx forwarder (#413)
  • Filter bad data (code 0) from the datalake (#423)
  • Wrapper enabling use of standard scikit-learn scorers (#427)

Major change:
All our Keras neural networks now take an explicit y instead of using the passed (and possibly scaled) X as the target.
This gives more freedom in several ways:

  • It allows training towards an unscaled y with a scaled X, or having them scaled in different ways.
  • It allows the y and X to be different sets of tags. The target y can be a subset of X or even a completely different set of tags.
  • It follows the standard scikit-learn pattern, making it easier to use e.g. standard scikit-learn scorers. (more about this below)

But it also involves some changes in the model definitions to get the same behavior as before.

Change in model format:

Previous model definition:

model:
  sklearn.pipeline.Pipeline:
      steps:
      - sklearn.preprocessing.data.MinMaxScaler
      - gordo_components.model.models.KerasLSTMAutoEncoder:
          kind: lstm_hourglass
          lookback_window: 10

New model definition:

model:
  gordo_components.model.anomaly.diff.DiffBasedAnomalyDetector:
    base_estimator:
      sklearn.compose.TransformedTargetRegressor:
        transformer: sklearn.preprocessing.data.MinMaxScaler
        regressor:
          sklearn.pipeline.Pipeline:
              steps:
              - sklearn.preprocessing.data.MinMaxScaler
              - gordo_components.model.models.KerasLSTMAutoEncoder:
                  kind: lstm_hourglass
                  lookback_window: 10

Explanation:

The first class, gordo_components.model.anomaly.diff.DiffBasedAnomalyDetector, takes a base estimator as a parameter and provides a new method, anomaly, in addition to any methods the base_estimator already has (like fit and predict). In DiffBasedAnomalyDetector, the call to anomaly(X, y) is implemented by calling predict on the base_estimator, scaling the output, scaling the passed y, calculating the absolute value of the differences, and then calculating the norm. The output of anomaly(X, y) is a multi-level dataframe with the original input to and output of the base_estimator, in addition to the per-sensor errors (absolute differences) and the total error score. The major difference from before is that the error calculations are now an explicit class which can be used in e.g. notebooks, instead of existing as a function in the server class.
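
The calculation described above can be sketched in plain NumPy and scikit-learn. This is a minimal illustration, not the DiffBasedAnomalyDetector implementation: the function name diff_anomaly, the use of LinearRegression as a stand-in base estimator, the choice of MinMaxScaler, and the L2 norm are all assumptions made for the example, and it returns arrays rather than a multi-level dataframe.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler


def diff_anomaly(base_estimator, scaler, X, y):
    """Hypothetical helper: return (per_sensor_error, total_error)."""
    y_pred = base_estimator.predict(X)
    # Scale prediction and target with the same fitted scaler so that
    # sensors with different units contribute comparably.
    scaled_pred = scaler.transform(y_pred)
    scaled_true = scaler.transform(y)
    # Per-sensor error: absolute value of the differences.
    per_sensor_error = np.abs(scaled_pred - scaled_true)
    # Total error: one norm per sample, taken across sensors (L2 assumed).
    total_error = np.linalg.norm(per_sensor_error, axis=1)
    return per_sensor_error, total_error


rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X.copy()  # autoencoder-style target: reconstruct the input

estimator = LinearRegression().fit(X, y)  # stand-in for the Keras model
scaler = MinMaxScaler().fit(y)
per_sensor_error, total_error = diff_anomaly(estimator, scaler, X, y)
```

Here per_sensor_error has one column per sensor and total_error has one value per sample.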

The second new class in the config above is sklearn.compose.TransformedTargetRegressor. This is a standard scikit-learn class which scales the target y before the model is fitted, and then inverse-scales the output of the wrapped regressor when predict is called. This class is needed if you want the Keras network to train towards a scaled y, as it did before. If you do not want this, you can omit sklearn.compose.TransformedTargetRegressor.
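
The effect of TransformedTargetRegressor can be seen in isolation with a small scikit-learn example. This is only a sketch: LinearRegression stands in for the Keras network, and the data and step names are made up for illustration.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
X = rng.rand(50, 4)
# Two target columns on a much larger scale than X.
y = 100.0 * X[:, :2] + 5.0

model = TransformedTargetRegressor(
    # The inner pipeline scales X and is trained against the scaled y.
    regressor=Pipeline(
        [("scale_x", MinMaxScaler()), ("reg", LinearRegression())]
    ),
    # The transformer scales y before fit and inverse-scales predictions.
    transformer=MinMaxScaler(),
)
model.fit(X, y)

# Predictions come back in the original units of y, not in scaled units.
y_pred = model.predict(X)
```

The inner regressor only ever sees the scaled target, while callers of predict get values on the original scale of y.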

Using scikit-learn scorers

It is now possible to use standard scikit-learn scorers with a simple wrapper.
Example:

from gordo_components import serializer
import yaml
import numpy
from sklearn.metrics import r2_score

config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
        steps:
          - sklearn.preprocessing.data.MinMaxScaler
          - gordo_components.model.models.KerasLSTMAutoEncoder:
              kind: lstm_hourglass
              lookback_window: 10
              epochs: 20
    """
)
model = serializer.pipeline_from_definition(config)

X = numpy.random.rand(100,10)
y = numpy.random.rand(100,10)

model.fit(X, y)

# Calling r2_score directly fails, since the output and the target are of different lengths:
# r2_score(X, model.predict(X))
# The fix:
from gordo_components.model.utils import metric_wrapper
metric_wrapper(r2_score)(X, model.predict(X))
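
To illustrate the kind of alignment such a wrapper needs to do, here is a minimal sketch. It is not the gordo_components metric_wrapper itself: the name align_to_prediction is hypothetical, and it assumes the model simply produces no output for the first rows while its lookback window fills, so that the prediction lines up with the last rows of the target.

```python
import functools

import numpy as np
from sklearn.metrics import r2_score


def align_to_prediction(metric):
    """Hypothetical wrapper: let a scikit-learn metric accept a shorter y_pred."""

    @functools.wraps(metric)
    def wrapper(y_true, y_pred, *args, **kwargs):
        # Keep only the last len(y_pred) rows of y_true, assuming the model
        # dropped the first rows while filling its lookback window.
        return metric(y_true[-len(y_pred):], y_pred, *args, **kwargs)

    return wrapper


y_true = np.arange(20, dtype=float).reshape(10, 2)
# Pretend the model produced no output for the first 3 rows.
y_pred = y_true[3:]

# r2_score(y_true, y_pred) would raise ValueError (10 rows vs 7 rows).
score = align_to_prediction(r2_score)(y_true, y_pred)
```

Because the wrapped function keeps the (y_true, y_pred) signature, it can be used wherever a plain scikit-learn metric is expected.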

Release 0.27.0 of gordo-components

12 Aug 11:16
  • Watchman: Handle empty events gracefully (#381)
  • Support custom aggregation methods in TimeSeriesDataset (#369)

Release 0.26.1 of gordo-components

02 Aug 07:32
  • Generalize IROC reader (#375)
  • Fix implicit changing of columns in the server base view which affected Grafana tag to data assignments (#380)

Release 0.26 of gordo-components

10 Jul 08:17
  • Pass keyword arguments onto Keras compile, allowing more flexibility
  • Add Gullfaks A as new asset
  • Add "infinity" imputer
  • Add pushing of "latest" tag for docker images, making it easier to always test latest build of master
  • Optimize ML server post data processing, speeding it up
  • Add pytest-benchmark

Release 0.25 of gordo-components

26 Jun 12:28
  • Change default keras activation functions to tanh (#346)
  • Server: Log timings and return as header (#345)
  • Added PERA (Peregrino) as new asset
  • Allow TimeseriesDataset to take and output target tags (#327)
  • Support sklearn.multioutput.MultiOutputRegressor (#321)
  • Add output activation function for feedforward NN as a parameter (#352)

Release 0.24 of gordo-components

24 Jun 12:43
  • More robust and scalable watchman - using K8S updates
  • Fix a bug that made the automatic client fail on IROC projects if train_start was not UTC
  • Add multithreaded download of NCS data from datalake
  • Support dry-run mode on the ncs_provider load_series

Release 0.23.0 of gordo-components

12 Jun 13:52
  • Support building models without scoring/cross val (#326)
  • Fix issue where the serializer drops parameters to keras (#333)
  • Refactor ML Server into modular model views (#288)
  • Rename auto encoders .transform() -> .predict() (#288)
  • Upgrade scikit-learn ~=0.21

Note: This depends on a compatible version of gordo-infrastructure, the soon-to-be-released 0.24.0

Release 0.22.0 of gordo-components

04 Jun 11:37
  • Replace generate_window with keras functionality, speeding up predict and improving fit (#299)
  • Ensure model-config in builder is fully expanded (#313)
  • Fix setup.py license and supported python versions (#314)