fix sqlite migration #3

Open. Wants to merge 5 commits into base: master

@fabm3n fabm3n commented Oct 23, 2022

Found some bugs while migrating my SQLite database to InfluxDB.

Maaxion previously approved these changes Dec 12, 2022
@maxime1992

maxime1992 commented Jan 31, 2023

Hey guys, first of all thanks for the repo and thanks for this PR.

I'm trying to run the script in a Docker container, as I don't have anything Python-related installed directly on my machine; it's not a language I use or know personally.

I've created the following Dockerfile based on the readme:

FROM ubuntu:18.04

RUN apt update -y

RUN apt install python3 python3.7-dev python3-venv python3-pip git -y

WORKDIR /home

COPY . .

RUN git clone --depth=1 https://github.com/home-assistant/core.git home-assistant-core

RUN python3 -m venv .venv

RUN . .venv/bin/activate

RUN python3 -m pip install --upgrade --force pip

RUN pip3 install -r home-assistant-core/requirements.txt

RUN pip3 install -r requirements.txt

But when it runs the pip3 install -r home-assistant-core/requirements.txt command, it fails saying:

Collecting atomicwrites-homeassistant==1.4.1
  Downloading atomicwrites_homeassistant-1.4.1-py2.py3-none-any.whl (7.1 kB)
ERROR: Cannot install awesomeversion==22.9.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested awesomeversion==22.9.0
    The user requested (constraint) awesomeversion==22.9.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

I have no idea what to do with this 😓 Could anyone help me out to run the migration code? Thanks a lot!

Oh and the build command I'm using: docker build . -t homeassistant2influxdb

@Maaxion
Owner

Maaxion commented Feb 2, 2023

A bit busy with work right now, but in your Dockerfile you're pulling the latest home-assistant core.

This PR is up to date with release 2022.6.7.

You can try to instead pull that version using git clone --depth 1 --branch <tag_name> <repo_url>.

This script would need to be verified to work with the latest home-assistant core release.

@fabm3n
Author

fabm3n commented Feb 2, 2023

> A bit busy with work right now, but in your Dockerfile you're pulling the latest home-assistant core.
>
> This PR is up to date with release 2022.6.7.
>
> You can try to instead pull that version using git clone --depth 1 --branch <tag_name> <repo_url>.
>
> This script would need to be verified to work with the latest home-assistant core release.

The issue is more likely that he is trying to use an old Python version.

@maxime1992 try to use the python image (untested):

FROM python:3.11

RUN apt update -y

RUN apt install git -y

WORKDIR /home

COPY . .

RUN git clone --depth=1 https://github.com/home-assistant/core.git home-assistant-core

RUN python3 -m pip install --upgrade --force pip

RUN pip3 install -r home-assistant-core/requirements.txt

RUN pip3 install -r requirements.txt
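
To test the Python-version theory before rebuilding an image, a small check like the one below can be run inside the container. This is only a sketch: `ASSUMED_MIN_PYTHON` is a placeholder and not a value stated in this thread, and `python_is_new_enough` is a hypothetical helper. For reference, the `python3` package on Ubuntu 18.04 is Python 3.6.

```python
import sys

# Placeholder minimum (an assumption): check the Home Assistant release
# you actually clone for the version it really requires.
ASSUMED_MIN_PYTHON = (3, 9)


def python_is_new_enough(version_info=sys.version_info, minimum=ASSUMED_MIN_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Example use inside the container:
#   print(sys.version_info[:3], python_is_new_enough())
```

If this returns False for the interpreter that `pip3` belongs to, switching the base image (as suggested above) is probably simpler than installing a newer Python by hand.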

@maxime1992

Hey guys, I've managed early this morning to get it working 🙏

It got about 90% of the way through; I think I ran out of RAM after that, but I'll just kill a few things and re-run it, and it should be perfect 👌.

When I get some time I'll explain how I've done it with Docker, which should help other people trying to run this script :)

Thanks for your answers and both of your work on this!

@maxime1992

Ok so I can run the script, but while running it I'm getting errors like this:

Failed extracting data from ('binary_sensor.plug_0_washing_machine_update_available', 'off', None, 'state_changed', datetime.datetime(2022, 5, 26, 16, 10, 15, 160660)): the JSON object must be str, bytes or bytearray, not NoneType.
Attributes: None
Failed extracting data from ('switch.plug_0_dryer', 'off', None, 'state_changed', datetime.datetime(2022, 5, 26, 16, 10, 15, 163585)): the JSON object must be str, bytes or bytearray, not NoneType.
Attributes: None

Any idea what I should do to fix this?

@fabm3n
Author

fabm3n commented Feb 2, 2023

> Ok so I can run the script, but while running it I'm getting errors like this:
>
> Failed extracting data from ('binary_sensor.plug_0_washing_machine_update_available', 'off', None, 'state_changed', datetime.datetime(2022, 5, 26, 16, 10, 15, 160660)): the JSON object must be str, bytes or bytearray, not NoneType.
> Attributes: None
> Failed extracting data from ('switch.plug_0_dryer', 'off', None, 'state_changed', datetime.datetime(2022, 5, 26, 16, 10, 15, 163585)): the JSON object must be str, bytes or bytearray, not NoneType.
> Attributes: None
>
> Any idea what I should do to fix this?

I can provide a fix tomorrow.

@maxime1992

Wow that's fantastic @fabm3n thank you so much! 🙏

I forgot to post it but there's also this in the error logs:

Attributes: None
Traceback (most recent call last):
  File "/usr/src/homeassistant2influxdb.py", line 189, in main
    _attributes = rename_friendly_name(json.loads(_attributes_raw))
  File "/usr/local/lib/python3.10/json/__init__.py", line 339, in loads
    raise TypeError(f'the JSON object must be str, bytes or bytearray, '
TypeError: the JSON object must be str, bytes or bytearray, not NoneType

I've used dbeaver to look at the database and the different attributes. I don't think they've changed so yup I think I'm lost here... Thank you so much :)
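
The traceback shows `json.loads` receiving `None` for rows whose attributes column is NULL. A minimal sketch of a guard follows. `parse_attributes` is a hypothetical helper, not the fix that was committed to this PR, and treating a missing or undecodable value as an empty dict is an assumption about what the migration should do with such rows.

```python
import json


def parse_attributes(raw):
    """Decode a JSON attributes column, treating NULL or bad JSON as no attributes."""
    if raw is None:
        # NULL in the database arrives as None in Python.
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Attributes are expected to be a JSON object; anything else is dropped here.
    return parsed if isinstance(parsed, dict) else {}
```

Returning a dict in every case also means later checks such as `"friendly_name" in attributes` cannot fail on `None`.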

@fabm3n
Author

fabm3n commented Feb 2, 2023

I will send you a quick mail. Can you send me your database so I can try it on my own?

@fabm3n
Author

fabm3n commented Feb 3, 2023

> Ok so I can run the script, but while running it I'm getting errors like this:
>
> Failed extracting data from ('binary_sensor.plug_0_washing_machine_update_available', 'off', None, 'state_changed', datetime.datetime(2022, 5, 26, 16, 10, 15, 160660)): the JSON object must be str, bytes or bytearray, not NoneType.
> Attributes: None
> Failed extracting data from ('switch.plug_0_dryer', 'off', None, 'state_changed', datetime.datetime(2022, 5, 26, 16, 10, 15, 163585)): the JSON object must be str, bytes or bytearray, not NoneType.
> Attributes: None
>
> Any idea what I should do to fix this?

Can you check why these entities have no attributes?

@maxime1992

@fabm3n can you give some guidance on what to do, please?

@maxime1992

OK, I'm not sure how helpful this info is going to be, but I ran this query: SELECT COUNT(*) from states WHERE state is NULL and it returned 694. If I remove the WHERE clause, I get... a lot, lot, lot, lot, lot more.

I've taken a look at what they might be, and I think they were mostly sensors:

  • From devices that are not part of my zigbee network anymore
  • From an old phone that's not connected to HA anymore
  • From integrations that have been deprecated or stopped working

So essentially... I'm pretty sure I don't care about these 600 values, and I suspect your fix might help already. I'll try to have a go!
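
To see which entities those NULL rows belong to, rather than only how many there are, the count can be grouped by entity. The sketch below assumes a `states` table with `entity_id` and `state` columns, as in the query above. The in-memory table and its rows are made up for illustration, and both function names are hypothetical.

```python
import sqlite3


def null_state_counts(conn):
    """Return (entity_id, row_count) pairs for rows whose state is NULL, largest first."""
    query = """
        SELECT entity_id, COUNT(*) AS n
        FROM states
        WHERE state IS NULL
        GROUP BY entity_id
        ORDER BY n DESC, entity_id
    """
    return conn.execute(query).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in for the recorder database (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (entity_id TEXT, state TEXT)")
    conn.executemany(
        "INSERT INTO states VALUES (?, ?)",
        [
            ("switch.plug_0_dryer", "off"),
            ("sensor.old_phone_battery", None),
            ("sensor.old_phone_battery", None),
            ("sensor.removed_zigbee_device", None),
        ],
    )
    return conn
```

Against a real database, `sqlite3.connect("home-assistant_v2.db")` would replace `demo_connection()`; working on a copy of the file is the safer choice.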

@maxime1992

Ok I tried with all your latest updates, here's the error I got:

Traceback (most recent call last):                                                                                                                                                                                                                                        
  File "/usr/src/homeassistant2influxdb.py", line 371, in <module>
    main()
  File "/usr/src/homeassistant2influxdb.py", line 235, in main
    if "friendly_name" in _attributes:
TypeError: argument of type 'NoneType' is not iterable

@maxime1992

maxime1992 commented Feb 3, 2023

I'll try to edit the line 235 from

if "friendly_name" in _attributes:

to

if _attributes is not None and "friendly_name" in _attributes:

EDIT to avoid further spamming.... 👀

image

@maxime1992

Victory!!! It's all working, and I've got all the data in Influx! Thanks to both of you for the help and patience 🙏 ❤️ !

I'll try to explain (soon) what I've done with Docker as it helped to avoid some Python issue, in case someone else needs it.

@maxime1992

@Maaxion I reckon this is good for merging and it's been tested for a migration while I was on HA 2023.1.4 FYI :)

@maxime1992

maxime1992 commented Feb 4, 2023

As promised, in case anyone else wants to run this with Docker:

  • git clone https://github.com/Maaxion/homeassistant2influxdb.git h2i && cd h2i

Note: If this PR is not merged yet, make sure to apply the changes first (or, instead of cloning from Maaxion's repo, do this: git clone https://github.com/fabm3n/homeassistant2influxdb.git h2i && cd h2i && git checkout fix-sqlite-migration)

  • Within the h2i folder: git clone https://github.com/home-assistant/core.git home-assistant-core

  • Edit the Dockerfile in home-assistant-core and replace the first 2 lines (ARG and FROM) with this: FROM ghcr.io/home-assistant/amd64-homeassistant-base:2022.11.0. You can pick a different architecture or tag in that string if needed; see the HA documentation

  • Build the HA core Docker image, as we'll use it as the base: cd home-assistant-core && docker build . -t home-assistant-core

  • Go back to the previous folder: cd ..

  • Fill in the influxdb.yaml file

  • Create a Dockerfile with the following content:

FROM home-assistant-core

RUN apk add build-base

WORKDIR /usr/src

RUN mv homeassistant home-assistant-core

COPY . .

RUN pip3 install -r requirements.txt

  • Create a .dockerignore file with the following content:

home-assistant-core
mariadb

  • Build the image: docker build . -t h2i

  • Run the container with some extra RAM compared to default as it may crash if you've got loads of data to migrate: docker run -it -m 4096m --network host h2i bash

  • From the container, run: time python homeassistant2influxdb.py --type MariaDB --user homeassistant --password your-password-here --host your-host-here --database ha_db

Notes on the above:

  • Of course, feel free to change the type and other arguments
  • For the host, if you want to reach a MariaDB (or other) instance that runs in a container on the same system, your best option is to run docker inspect mariadb | grep -i IPAddress (replacing mariadb with the name of your container, of course) to get its IP address and use that as the host
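
As an alternative to grepping, the JSON that `docker inspect` prints can be parsed directly. This is a sketch only: `container_ips` is a hypothetical helper, and it assumes the usual output shape, a JSON array with the addresses under `NetworkSettings.Networks`.

```python
import json


def container_ips(inspect_output):
    """Map network name to IP address from the JSON printed by `docker inspect`."""
    ips = {}
    for container in json.loads(inspect_output):
        networks = container.get("NetworkSettings", {}).get("Networks") or {}
        for name, settings in networks.items():
            address = settings.get("IPAddress")
            if address:  # skip networks that report an empty address
                ips[name] = address
    return ips


# Example use (requires Docker; "mariadb" is the container name from the note above):
#   import subprocess
#   out = subprocess.run(["docker", "inspect", "mariadb"],
#                        capture_output=True, text=True, check=True).stdout
#   print(container_ips(out))
```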

As explained in the comments above, I had issues when trying to run a plain Python Docker container myself, hence building the HA image first and then using it as the base for the homeassistant2influxdb code :)

@max5962

max5962 commented Feb 9, 2023

Hello @fabm3n @maxime1992,
I'm trying to use this branch to migrate my entire SQLite database to an InfluxDB instance, but I get this error:

Traceback (most recent call last):
  File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 371, in <module>
    main()
  File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 225, in main
    time_fired=datetime.strptime(_time_fired, "%Y-%m-%d %H:%M:%S.%f")
TypeError: strptime() argument 1 must be str, not None

Any idea?
Thanks

@maxime1992

Hey, I'm no Python expert (I've actually never written any), but look at the last few commits made by @fabm3n. It's probably a matter of doing just the same for the field in question.

@maxime1992

maxime1992 commented Feb 9, 2023

On a separate note, I started to use Kibana and noticed a difference between the data from before the migration and the new data coming straight out of HA:

image

image

image

I'm unsure why they're not considered the same 🤔 Does anyone have an idea of what to do here? Fixing the data in Influx would be ideal, but if it's too much trouble I can just re-export the entire database, as I've kept it running in parallel on HA in case I had any issues.

EDIT: In case anyone else comes across this, I've managed to fix my Grafana query by adding this at the end of it: |> drop(columns:["source", "friendly_name"]). I believe these 2 columns are added by this script, so it may be a good idea to comment that out or remove it in the first place, as it's not yet possible to drop columns with InfluxDB...

@fabm3n
Author

fabm3n commented Feb 15, 2023

> OK, I'm not sure how helpful this info is going to be, but I ran this query: SELECT COUNT(*) from states WHERE state is NULL and it returned 694. If I remove the WHERE clause, I get... a lot, lot, lot, lot, lot more.
>
> I've taken a look at what they might be, and I think they were mostly sensors:
>
>   • From devices that are not part of my zigbee network anymore
>   • From an old phone that's not connected to HA anymore
>   • From integrations that have been deprecated or stopped working
>
> So essentially... I'm pretty sure I don't care about these 600 values, and I suspect your fix might help already. I'll try to have a go!

That's good! So ignoring in your case is not a problem.

@fabm3n
Author

fabm3n commented Feb 15, 2023

> I'll try to edit the line 235 from
>
> if "friendly_name" in _attributes:
>
> to
>
> if _attributes is not None and "friendly_name" in _attributes:
>
> EDIT to avoid further spamming.... 👀
>
> image

Do you still have your "old" database, and can you check which data has a friendly name set to null, like you did for the state?

@fabm3n
Author

fabm3n commented Feb 15, 2023

> Hello @fabm3n @maxime1992,
> I'm trying to use this branch to migrate my entire SQLite database to an InfluxDB instance, but I get this error:
>
> Traceback (most recent call last):
>   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 371, in <module>
>     main()
>   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 225, in main
>     time_fired=datetime.strptime(_time_fired, "%Y-%m-%d %H:%M:%S.%f")
> TypeError: strptime() argument 1 must be str, not None
>
> Any idea? Thanks

Hi @max5962,
can you open your database with something like DB Browser for SQLite to check why some timestamps are null? Maybe you can identify a sensor or something else that causes this issue for you.

@fabm3n
Author

fabm3n commented Feb 15, 2023

> On a separate note, I started to use Kibana and noticed a difference between the data from before the migration and the new data coming straight out of HA:
>
> image
>
> image
>
> image
>
> I'm unsure why they're not considered the same 🤔 Does anyone have an idea of what to do here? Fixing the data in Influx would be ideal, but if it's too much trouble I can just re-export the entire database, as I've kept it running in parallel on HA in case I had any issues.
>
> EDIT: In case anyone else comes across this, I've managed to fix my Grafana query by adding this at the end of it: |> drop(columns:["source", "friendly_name"]). I believe these 2 columns are added by this script, so it may be a good idea to comment that out or remove it in the first place, as it's not yet possible to drop columns with InfluxDB...

I didn't go deep into my data, so I never had such an issue. That's why I'm not trying to fix this :)

@maxime1992

> Do you still have your "old" database, and can you check which data has a friendly name set to null, like you did for the state?

I have, I may have a look at some point but not right now

> I didn't go deep into my data, so I never had such an issue. That's why I'm not trying to fix this :)

Oh yeah it's fine, as per my edit in the message you quoted, the only fix needed is to remove that column. And a simple fix in this script would be to drop the line that adds the "source" and sets it to HA.
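
A sketch of that idea, assuming the script builds a plain dict of tags for each point before writing it. That is an assumption: the script's real structure is not shown in this thread, and `without_extra_tags` is a hypothetical helper. The two tag names are the ones mentioned in the EDIT above.

```python
# The two columns named in the thread as being added by the migration script.
EXTRA_TAGS = ("source", "friendly_name")


def without_extra_tags(tags, extra=EXTRA_TAGS):
    """Return a copy of a tag mapping with the migration-only tags removed."""
    return {key: value for key, value in tags.items() if key not in extra}
```

Filtering at write time keeps migrated series shaped like the ones HA writes live, so queries would not need the `drop(columns: ...)` workaround.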

@max5962

max5962 commented Feb 16, 2023

> > Hello @fabm3n @maxime1992,
> > I'm trying to use this branch to migrate my entire SQLite database to an InfluxDB instance, but I get this error:
> >
> > Traceback (most recent call last):
> >   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 371, in <module>
> >     main()
> >   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 225, in main
> >     time_fired=datetime.strptime(_time_fired, "%Y-%m-%d %H:%M:%S.%f")
> > TypeError: strptime() argument 1 must be str, not None
> >
> > Any idea? Thanks
>
> Hi @max5962, can you open your database with something like DB Browser for SQLite to check why some timestamps are null? Maybe you can identify a sensor or something else that causes this issue for you.

Last week I managed to retrieve my data by changing the query as shown in the code that follows.
I added a WHERE clause because I only want to retrieve my energy data.

With the previous database format, time_fired was always NULL, so I converted last_updated_ts to the desired format. I only want to use my historical data in Grafana, so I don't care if it's not totally exact (cf. the appended '.33195').

if table == "states":
    # Using two different SQL queries in a Union to support data made with older HA db schema:
    # https://github.com/home-assistant/core/pull/71165
    # Modified here: a single query, filtered to one entity, that builds time_fired
    # from last_updated_ts and appends a fixed fractional part so strptime's %f matches.
    sql_query = """select states.entity_id,
                          states.state,
                          state_attributes.shared_attrs as attributes,
                          'state_changed',
                          DATETIME(states.last_updated_ts, 'unixepoch') || '.33195' as time_fired
                   from states, state_attributes
                   where event_id is null
                     and states.attributes_id = state_attributes.attributes_id
                     and states.entity_id = 'MY_ENTITY_ID';"""

@xaviergriffon

xaviergriffon commented Mar 12, 2023

> > Hello @fabm3n @maxime1992,
> > I'm trying to use this branch to migrate my entire SQLite database to an InfluxDB instance, but I get this error:
> >
> > Traceback (most recent call last):
> >   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 371, in <module>
> >     main()
> >   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 225, in main
> >     time_fired=datetime.strptime(_time_fired, "%Y-%m-%d %H:%M:%S.%f")
> > TypeError: strptime() argument 1 must be str, not None
> >
> > Any idea? Thanks
>
> Hi @max5962, can you open your database with something like DB Browser for SQLite to check why some timestamps are null? Maybe you can identify a sensor or something else that causes this issue for you.

Hi everyone,
I had the same problem as @max5962. From my analysis, the date columns were migrated to timestamps and renamed XXX_ts as of 2023.2. (The migration started in version 2023.2, but the statistics stayed in datetime format until 2023.3 (1st import tested).)
I modified the queries and adapted the interpretation of the time_fired column:
time_fired=datetime.fromtimestamp(_time_fired)
The import worked as a result. Thanks a lot @fabm3n and @maxime1992.
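
For readers who want one code path that copes with both storage formats, here is a hedged sketch. `parse_time_fired` is a hypothetical helper, not the change made in the comment above. It also differs from the one-liner there by returning timezone-aware UTC values, which assumes the recorder stores UTC.

```python
from datetime import datetime, timezone

OLD_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format string taken from the traceback above


def parse_time_fired(value):
    """Convert a recorder timestamp to an aware UTC datetime, or None when missing.

    Accepts the older string form and the newer epoch-seconds form
    stored in the *_ts columns.
    """
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return datetime.fromtimestamp(value, tz=timezone.utc)
    # Assumption: the older string values are UTC as well.
    return datetime.strptime(value, OLD_FORMAT).replace(tzinfo=timezone.utc)
```

Rows for which this returns `None` still need a decision from the caller, for example skipping them or falling back to another timestamp column.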

@segdy

segdy commented Apr 17, 2023

> > > Hello @fabm3n @maxime1992,
> > > I'm trying to use this branch to migrate my entire SQLite database to an InfluxDB instance, but I get this error:
> > >
> > > Traceback (most recent call last):
> > >   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 371, in <module>
> > >     main()
> > >   File "C:\TEMPORAIRE\migration\homeassistant2influxdb\homeassistant2influxdb.py", line 225, in main
> > >     time_fired=datetime.strptime(_time_fired, "%Y-%m-%d %H:%M:%S.%f")
> > > TypeError: strptime() argument 1 must be str, not None
> > >
> > > Any idea? Thanks
> >
> > Hi @max5962, can you open your database with something like DB Browser for SQLite to check why some timestamps are null? Maybe you can identify a sensor or something else that causes this issue for you.
>
> Hi everyone, I had the same problem as @max5962. From my analysis, the date columns were migrated to timestamps and renamed XXX_ts as of 2023.2. (The migration started in version 2023.2, but the statistics stayed in datetime format until 2023.3 (1st import tested).) I modified the queries and adapted the interpretation of the time_fired column: time_fired=datetime.fromtimestamp(_time_fired) The import worked as a result. Thanks a lot @fabm3n and @maxime1992.

Hi @xaviergriffon, would you be so kind as to show how you changed your code?
I have the exact same issue.
Really appreciated!

@waltonbp

waltonbp commented Aug 5, 2023

Hi, I need some help: the Python script seems to execute, but the SQLite entity_id data is failing some validation.

Any idea what's happening here? I'm not very skilled at Python.

(.venv) pi@Pi4:~/homeassistant2influxdb $ python3.11 homeassistant2influxdb.py --type SQLite --database home-assistant_v2.db
Migrating home assistant database statistics, states to Influx database home-assistant_v2.db and bucket Home Assistant
Running SQL query on database table statistics.This may take longer than a few minutes, depending on how many rows there are in the database.
Processing rows from table statistics and writing to InfluxDB.
Running SQL query on database table states.This may take longer than a few minutes, depending on how many rows there are in the database.
Processing rows from table states and writing to InfluxDB.
0%| | 0.00/222k [00:00<?, ? rows/s]BW plus row count
Traceback (most recent call last):
  File "/home/pi/homeassistant2influxdb/homeassistant2influxdb.py", line 374, in <module>
    main()
  File "/home/pi/homeassistant2influxdb/homeassistant2influxdb.py", line 213, in main
    state = State(
            ^^^^^^
  File "/home/pi/homeassistant2influxdb/.venv/lib/python3.11/site-packages/homeassistant/core.py", line 1259, in __init__
    if validate_entity_id and not valid_entity_id(entity_id):
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pi/homeassistant2influxdb/.venv/lib/python3.11/site-packages/homeassistant/core.py", line 184, in valid_entity_id
    return VALID_ENTITY_ID.match(entity_id) is not None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected string or bytes-like object, got 'NoneType'

Is the code trying to write to the Influx database? Mine runs in Docker.

I'm not sure how the code knows which database name to use to access an existing InfluxDB instance that already has some data loaded.

(.venv) pi@Pi4:~/homeassistant2influxdb $ docker exec -it influxdb influx
Connected to http://localhost:8086 version 1.2.2
InfluxDB shell version: 1.2.2

USE home_assistant
Using database home_assistant
SHOW SERIES
key


W,domain=sensor,entity_id=myenergi_eddi_23468676_internal_load_ct1
W,domain=sensor,entity_id=myenergi_eddi_23468676_power_ct_internal_load
W,domain=sensor,entity_id=myenergi_harvi_11938877_generation_battery_ct2
W,domain=sensor,entity_id=myenergi_harvi_11938877_grid_ct1
W,domain=sensor,entity_id=myenergi_harvi_11938877_power_ct_generation_battery
W,domain=sensor,entity_id=myenergi_harvi_11938877_power_ct_grid
kWh,domain=sensor,entity_id=myenergi_eddi_23468676_ct_internal_load_today
kWh,domain=sensor,entity_id=myenergi_eddi_23468676_energy_consumed_session
kWh,domain=sensor,entity_id=myenergi_eddi_23468676_energy_used_today
°C,domain=sensor,entity_id=myenergi_eddi_23468676_temp_tank_1

Any help appreciated...
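
The traceback above shows `State(...)` being handed an `entity_id` of `None`. One way to get past that, sketched below, is to skip such rows before building the state object. `rows_with_entity_id` is a hypothetical helper, and it assumes `entity_id` is the first column of each row, as in the row tuples printed earlier in this thread. It hides the symptom only and does not explain why the IDs are missing.

```python
def rows_with_entity_id(rows, entity_index=0):
    """Yield only the rows whose entity_id column is a non-empty string."""
    for row in rows:
        entity_id = row[entity_index]
        if isinstance(entity_id, str) and entity_id:
            yield row
```

Counting how many rows get skipped is worthwhile: if it is most of the table, the query is probably reading the wrong column for this database version rather than hitting a few bad rows.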

@nanderson97651

> Hi, I need some help: the Python script seems to execute, but the SQLite entity_id data is failing some validation.

I'm seeing this same issue. Were you able to resolve it?

@waltonbp

Sorry, no updates... I have been on holiday.

I will have another look in the next few weeks... I'm guessing some data is not in the correct format.

@MPSMPS

MPSMPS commented Feb 2, 2024

@waltonbp Hi, having the same issue. Have you found a solution?

@maxime1992

I'll just re-share the message I posted a while ago: #3 (comment)

I think it may have been overlooked and could help a few people. I'm really not an expert in that area and it took me quite a few hours to come up with that, so I'm afraid I won't be able to help much further. But as soon as I succeeded, I wrote that message while everything was still fresh in my mind, so you may want to have a look.

@nabeelr

nabeelr commented Feb 16, 2024

Hey all! Any solution to this? I have issues with dependencies not working, which then means the script won't run.

@maxime1992

> Hey all! Any solution to this? I have issues with dependencies not working, which then means the script won't run.

Have you checked my comment above?

It's not as simple as one may like but I'm pretty sure that'd still work

@nabeelr

nabeelr commented Feb 16, 2024

I resolved all the dependencies issues, and got the script mostly running, but now it chokes when it hits a specific line of code.

Something about the sql query not being a valid one.

Would running it in Docker help that?

I'm trying to import from a SQLite database file into InfluxDB using their v2 API


10 participants