training not able to start #150

langerma · 2019-10-05T16:56:21Z

error:

Oct  5 18:51:11 monitor loudmld: INFO:root:job[a6eba68c-30a3-41ec-a404-e4f9b94372da] starting, nice=5

Oct  5 18:51:11 monitor loudmld: Process Process-7:
Oct  5 18:51:11 monitor loudmld: Traceback (most recent call last):
Oct  5 18:51:11 monitor loudmld: File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
Oct  5 18:51:11 monitor loudmld: self.run()
Oct  5 18:51:11 monitor loudmld: File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
Oct  5 18:51:11 monitor loudmld: self._target(*self._args, **self._kwargs)
Oct  5 18:51:11 monitor loudmld: File "/usr/local/lib64/python3.6/site-packages/pebble/pool/process.py", line 389, in worker_process
Oct  5 18:51:11 monitor loudmld: send_result(channel, Result(task.id, result))
Oct  5 18:51:11 monitor loudmld: File "/usr/local/lib64/python3.6/site-packages/pebble/common.py", line 183, in send_result
Oct  5 18:51:11 monitor loudmld: pipe.send(data)
Oct  5 18:51:11 monitor loudmld: File "/usr/local/lib64/python3.6/site-packages/pebble/pool/channel.py", line 98, in unix_send
Oct  5 18:51:11 monitor loudmld: return self.writer.send(obj)
Oct  5 18:51:11 monitor loudmld: File "/usr/lib64/python3.6/multiprocessing/connection.py", line 206, in send
Oct  5 18:51:11 monitor loudmld: self._send_bytes(_ForkingPickler.dumps(obj))
Oct  5 18:51:11 monitor loudmld: File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
Oct  5 18:51:11 monitor loudmld: cls(buf, protocol).dump(obj)
Oct  5 18:51:11 monitor loudmld: AttributeError: Can't pickle local object '_compile_scalar.<locals>.validate_value'
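
For readers unfamiliar with this class of error: the last frame shows the worker failing while pickling its result to send it back over a pipe. Python's pickle cannot serialize a function defined inside another function, and the name `_compile_scalar.<locals>.validate_value` appears to refer to such a nested function (it looks like it comes from the voluptuous schema library listed in the pip freeze output further down, though that is an assumption). A minimal sketch using only the standard library; `compile_scalar` and `try_pickle` are hypothetical stand-ins, not loudml or voluptuous code:

```python
import pickle


def compile_scalar(expected):
    """Hypothetical validator factory that returns a nested function."""
    def validate_value(value):
        if value != expected:
            raise ValueError("not a valid value")
        return value
    return validate_value


def try_pickle(obj):
    """Return None if obj pickles, otherwise the error message."""
    try:
        pickle.dumps(obj)
    except (AttributeError, pickle.PicklingError) as exc:
        # The exception type differs between Python versions.
        return str(exc)
    return None


validator = compile_scalar(1)
print(try_pickle(validator))  # message naming compile_scalar.<locals>.validate_value
print(try_pickle(len))        # None: importable callables pickle by reference
```

If this reading is right, the pickling failure is a secondary symptom: some object that holds a reference to the validator (for example a validation error) is being returned from the worker, and the original cause is hidden behind the pickling error.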

model:

{
    "bucket_interval": "1m",
    "default_bucket": "influx",
    "features": [
        {
            "anomaly_type": "low_high",
            "default": "previous",
            "field": "load1",
            "io": "io",
            "match_all": [
                {
                    "tag": "host",
                    "value": "monitor"
                }
            ],
            "measurement": "system",
            "metric": "mean",
            "name": "mean_load1"
        }
    ],
    "forecast": 5,
    "grace_period": 0,
    "interval": "60s",
    "max_evals": 10,
    "max_threshold": 0,
    "min_threshold": 0,
    "name": "telegraf_system_mean_load1_host_monitor_1m",
    "offset": "10s",
    "run": {
        "detect_anomalies": true,
        "save_prediction": true
    },
    "seasonality": {
        "daytime": true,
        "weekday": true
    },
    "span": 100,
    "type": "donut"
}

command:

train-model --from now-14d --to now telegraf_system_mean_load1_host_monitor_1m

running master


regel · 2019-10-06T06:29:08Z

I could not reproduce. Can you run pip freeze and provide the output? Do you see any major differences compared to the file base/vendor/requirements.txt?

The problem could be the networkx, h5py, or numpy version. If you are in a virtualenv, I recommend running pip install -r base/vendor/requirements.txt to set up the dependencies.
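
One way to spot version differences without reading both lists by eye is a small script that compares pinned versions. This is only a sketch: `parse_pins` and `diff_pins` are hypothetical helpers, it handles only simple `name==version` lines, and the "required" versions in the example are placeholders, not the real contents of base/vendor/requirements.txt.

```python
def parse_pins(lines):
    """Map lowercase package name to pinned version for 'name==version' lines."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blank lines and unpinned requirements
        name, version = line.split("==", 1)
        # lstrip handles the '===' form that pip freeze sometimes emits
        pins[name.strip().lower()] = version.strip().lstrip("=")
    return pins


def diff_pins(installed, required):
    """Return {name: (installed_version_or_None, required_version)} for mismatches."""
    return {
        name: (installed.get(name), wanted)
        for name, wanted in required.items()
        if installed.get(name) != wanted
    }


# Installed versions as they would come from `pip freeze`.
installed = parse_pins(["networkx==1.11", "numpy==1.16.4", "h5py==2.9.0"])
# PLACEHOLDER requirement values, for illustration only.
required = parse_pins(["networkx==0.0", "numpy==1.16.4", "somepkg==0.0"])

for name, (have, want) in sorted(diff_pins(installed, required).items()):
    print(f"{name}: installed={have} required={want}")
```

In practice the two inputs would be the lines of the pip freeze output and the lines of the requirements file.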

You can also try the following:

generate data (warning: this command will also clear the database)

loudml-wave --from now-3d --to now --shape sin --amplitude 10 --base 5 --clear --tags host:monitor --field load1 influx

train the model:

train-model -f now-3d -t now -m1 telegraf_system_mean_load1_host_monitor_1m

regel · 2019-10-07T18:50:18Z

I hope this helps. Let me know if this is still a blocking point.

langerma · 2019-10-07T19:10:33Z

pip freeze output as follows

absl-py==0.8.0
aniso8601==8.0.0
astor==0.8.0
boto3==1.9.238
botocore==1.12.238
cached-property==1.5.1
certifi==2019.9.11
chardet==3.0.4
Click==7.0
cycler==0.10.0
daiquiri==1.6.0
dateutils==0.6.6
decorator==4.4.0
dictdiffer==0.8.0
dill==0.3.1.1
docker==2.6.1
docker-compose==1.18.0
docker-pycreds==0.2.1
dockerpty==0.4.1
docopt==0.6.2
docutils==0.15.2
elasticsearch==6.3.1
Flask==1.1.1
Flask-RESTful==0.3.7
future==0.17.1
gast==0.3.2
gevent==1.4.0
greenlet==0.4.15
grpcio==1.24.0
h5py==2.9.0
hyperopt==0.1.2
idna==2.8
influxdb==5.2.3
itsdangerous==1.1.0
Jinja2==2.10.1
jmespath==0.9.4
jsonschema==2.5.1
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
loudml===1.6.0-446132b4
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.0.3
mock==3.0.5
networkx==1.11
numpy==1.16.4
pbr==5.4.3
Pebble==4.4.0
protobuf==3.9.2
pycrypto==2.6.1
pymongo==3.9.0
pyparsing==2.4.2
PySocks==1.6.8
python-dateutil==2.8.0
pytz==2019.2
PyYAML==5.1.2
requests==2.22.0
requests-aws4auth==0.9
s3transfer==0.2.1
schedule==0.6.0
scipy==1.3.1
six==1.12.0
stevedore==1.31.0
tensorboard==1.13.1
tensorflow==1.13.2
tensorflow-estimator==1.13.0
termcolor==1.1.0
texttable==1.6.2
tqdm==4.36.1
urllib3==1.25.6
virtualenv==15.1.0
virtualenv-clone==0.5.3
virtualenvwrapper==4.8.4
voluptuous==0.10.5
warp10client==1.0.1
websocket-client==0.47.0
Werkzeug==0.16.0

langerma · 2019-10-07T19:12:19Z

I have an older version running on a Jetson Nano with tensorflow-gpu, which works fine (also with TF 1.13.1, which is the last 1.13 version for the Jetson Nano),

but the newer version (1.6) shows the same error on the Jetson.

toni-moreno · 2020-07-31T07:05:07Z

Hello @regel @langerma, I've reproduced this error when the input bucket lacks measurement and/or annotation_db.

With this configuration it works fine:

  - name: influxdb-linux #input
    type: influxdb
    addr: influxdb:8086
    database: telegraf
    retention_policy: autogen
    measurement: loudml #review
    dbuser: admin #https://github.com/regel/loudml/issues/377 
    dbuser_password: admin1234
    annotation_db: loudml
    create_database: false

With the config below, this error occurred: AttributeError: Can't pickle local object '_compile_scalar.<locals>.validate_value'

  - name: influxdb-linux #input
    type: influxdb
    addr: influxdb:8086
    database: telegraf
    retention_policy: autogen
    #measurement: loudml <---------- commented 
    dbuser: admin #
    dbuser_password: admin1234
    #annotation_db: loudml <-----------commented
    create_database: false

@langerma I don't know if you have the same config.

Anyway, in this context I have serious doubts about how the data flow works:

[input-bucket] --> [LOUDML] ---> [output-bucket]

With this flow, measurement and annotation_db (and also create_database) make no sense in the input bucket if it is only queried (related to #377).

@regel it would be great if we could discuss this data flow.

Thank you very much

langerma · 2020-08-03T07:23:25Z

@toni-moreno i am not using influxdb anymore :-( so i cannot contribute any more
