-
Notifications
You must be signed in to change notification settings - Fork 119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎇 BLE matching in the pipeline #965
Conversation
This will provide a way for the pipeline to access the dynamic config. the first time `get_dynamic_config` is called, it fetches the config associated with the study from Github. The study is defined by the environment variable `STUDY_CONFIG`. The downloaded config is cached within one session (each pipeline run is a different session), so subsequent calls to `get_dynamic_config` will not need to re-fetch.
Testing done: -loaded a dump of dfc-fermata from 24 april into my local database -ran `. setup/setup.sh` to update dependencies (e-mission-common is a new dependency) -located one user (the opcode I've been using the most for my own testing). cleared pipeline for that user via `./e-mission-py.bash bin/reset_pipeline.py -e nrelop_dfc-fermata_test_*` -ran pipeline for that user, supplying the STUDY_CONFIG as an environment variable `STUDY_CONFIG=dfc-fermata ./e-mission-py.bash bin/debug/intake_single_user.py -e nrelop_dfc-fermata_test_*` -observed log output. BLE modes were detected for at least some of the sections ``` 2024-05-02 17:11:52,566:DEBUG:140704482301888:Returning cached dynamic config for dfc-fermata at version 5 2024-05-02 17:11:52,566:DEBUG:140704482301888:getting BLE sensed vehicle for section from 1713192674 to 1713193146 2024-05-02 17:11:52,593:DEBUG:140704482301888:After filtering, 471 BLE ranging entries during the section 2024-05-02 17:11:52,595:DEBUG:140704482301888:after counting, ble_beacon_counts = {'dfc0:fff0': 471} 2024-05-02 17:11:52,595:DEBUG:140704482301888:found vehicle Jack's Mazda 3 with BLE beacon dfc0:fff0 ``` Some sections still have a ble_sensed_mode of None but this is plausible because we hadn't integrated the BLE sensing until a couple weeks ago and I've been using this opcode longer than that. - observed that the sections have `ble_sensed_mode`: ``` ts = esta.TimeSeries.get_time_series(user_id) entries = ts.find_entries(["analysis/cleaned_section"]) print([e['data']['ble_sensed_mode'] for e in entries]) ``` Output shows 31 sections with 'None' and 9 sections with the following: ``` {'value': 'car_jacks_mazda3', 'bluetooth_major_minor': ['dfc0:fff0'], 'text': "Jack's Mazda 3", 'baseMode': 'CAR', 'met_equivalent': 'IN_VEHICLE', 'kgCo2PerKm': 0.16777, 'vehicle_info': {'type': 'car', 'license': 'JHK ****', 'make': 'Mazda', 'model': '3', 'year': 2014, 'color': 'red', 'engine': 'ICE', 'mpg': 33}} ```
Since sections now have a `ble_sensed_mode` property in addition to `sensed_mode`, we can use the `ble_sensed_mode` as the basis for a new kind of section summary. To enable this, the `get_section_summary` function was refactored to accept an additional argument, which can be either `sensed_mode` or `ble_sensed_mode`. If it's `ble_sensed_mode`, the summary values will be summed and grouped by the `baseMode` of the vehicle that was sensed by BLE. If the additional argument is not given, it just defaults to `sensed_mode` (same behavior as before) Testing done: -using a dump of dfc-fermata from 24 april loaded into my local database -reset and re-ran the pipeline for a user -inspected the `ble_sensed_summary`s of the resulting confirmed trips: ``` ts = esta.TimeSeries.get_time_series(user_id) entries = ts.find_entries(["analysis/confirmed_trip"]) for e in entries: print(e['data']['ble_sensed_summary']) ``` Output: ``` {'distance': {'UNKNOWN': 4294.125850161847}, 'duration': {'UNKNOWN': 10379.947390556335}, 'count': {'UNKNOWN': 2}} {'distance': {'UNKNOWN': 1511402.9404509964}, 'duration': {'UNKNOWN': 131559.69582343102}, 'count': {'UNKNOWN': 4}} {'distance': {'UNKNOWN': 34462.99760301494}, 'duration': {'UNKNOWN': 977.1180453300476}, 'count': {'UNKNOWN': 3}} {'distance': {'UNKNOWN': 19485.795458445176}, 'duration': {'UNKNOWN': 3718.8894975185394}, 'count': {'UNKNOWN': 2}} {'distance': {'UNKNOWN': 850.4852679201318}, 'duration': {'UNKNOWN': 15861.93504524231}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 47455.78820030493}, 'duration': {'UNKNOWN': 7241.158731937408}, 'count': {'UNKNOWN': 3}} {'distance': {'UNKNOWN': 1551245.9810044689}, 'duration': {'UNKNOWN': 13537.040816783905}, 'count': {'UNKNOWN': 5}} {'distance': {'UNKNOWN': 713.4707286619786}, 'duration': {'UNKNOWN': 1448.6543893814087}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 3981236.406827432}, 'duration': {'UNKNOWN': 26685.684997320175}, 'count': {'UNKNOWN': 2}} {'distance': {'UNKNOWN': 333.2678850005484}, 'duration': {'UNKNOWN': 21.80204725265503}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 2866.186136018516}, 'duration': {'UNKNOWN': 3249.8193860054016}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 2645.5964919933926}, 'duration': {'UNKNOWN': 750.7993273735046}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 2442.091053542208}, 'duration': {'UNKNOWN': 489.8975965976715}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 4449.832978365483}, 'duration': {'UNKNOWN': 2710.8742623329163}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 6409.15214382765}, 'duration': {'UNKNOWN': 745.2548396587372}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 163.76752941202727}, 'duration': {'UNKNOWN': 12.731597900390625}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 4440.681532305813}, 'duration': {'UNKNOWN': 93.64632654190063}, 'count': {'UNKNOWN': 3}} {'distance': {'UNKNOWN': 366.71884583356365}, 'duration': {'UNKNOWN': 1299.6981484889984}, 'count': {'UNKNOWN': 1}} {'distance': {'UNKNOWN': 2078.9642225193907}, 'duration': {'UNKNOWN': 11704.498789787292}, 'count': {'UNKNOWN': 1}} {'distance': {'E_CAR': 2328.846316296626, 'UNKNOWN': 6971.404103278014}, 'duration': {'E_CAR': 3389.3658237457275, 'UNKNOWN': 945.0989944934845}, 'count': {'E_CAR': 1, 'UNKNOWN': 2}} {'distance': {'UNKNOWN': 155.63066751202888}, 'duration': {'UNKNOWN': 116.02340650558472}, 'count': {'UNKNOWN': 1}} {'distance': {'E_CAR': 7177.555954021382, 'UNKNOWN': 43.50518475446066}, 'duration': {'E_CAR': 1140.2288630008698, 'UNKNOWN': 175.48782801628113}, 'count': {'E_CAR': 1, 'UNKNOWN': 1}} {'distance': {'UNKNOWN': 846.2578386118126}, 'duration': {'UNKNOWN': 766.3598084449768}, 'count': {'UNKNOWN': 1}} {'distance': {'E_CAR': 104.02576498063684}, 'duration': {'E_CAR': 38.734825134277344}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 8250.614829636044}, 'duration': {'E_CAR': 825.4048693180084}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 6171.384206967509}, 'duration': {'E_CAR': 781.507465839386}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 1065.2594223724116, 'UNKNOWN': 73.02336166659416}, 'duration': {'E_CAR': 491.0299062728882, 'UNKNOWN': 105.17022156715393}, 'count': {'E_CAR': 1, 'UNKNOWN': 1}} {'distance': {'E_CAR': 733.2246673403289}, 'duration': {'E_CAR': 845.3612954616547}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 4275.13516142465}, 'duration': {'E_CAR': 541.7157530784607}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 7391.707788417107}, 'duration': {'E_CAR': 611.8730540275574}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 10035.89912694906}, 'duration': {'E_CAR': 1205.663957118988}, 'count': {'E_CAR': 1}} {'distance': {'E_CAR': 9085.234840541858}, 'duration': {'E_CAR': 3597.4890780448914}, 'count': {'E_CAR': 1}} ```
Ok. The The
I just need to regenerate the 'expected output' snapshots |
Ideally, I should add some new test data under After I fix the existing tests, I'll try to find a day of my own travel that has multiple BLE trips and not too many gaps / not any untracked time |
When conda installs dependencies, it will attempt to use git for any dependencies that reference a git repository (e.g. `git+https://github.com/my/repo`) So git must be installed on the container before dependencies are installed, or it will fail.
`ble_sensed_summary` is a new property for confirmed trips, so tests fail on the old snapshots. New generated snapshots are the same, but with different uuid and oids, and with empty ble_sensed_summary values in each confirmed trip. Currently we don't have any tests with BLE sensor data, so there are no tests that have a `ble_sensed_summary` that is filled in. Coming later...
There is a script to download the expected values, under |
If you're referring to the |
On master,
|
I am not sure why this happened. Will have to look at
|
@JGreenlee I am also confused as to why you end up with
when the basemode for your entry is
|
I used a different user (not me) to demonstrate the I encountered an error in running the pipeline on my opcode, causing trips to not get generated. not a regression Planning to follow up on it, but I ran out of time on Friday |
This test is failing because it expects shankari_2016-06-20.expected_confirmed_trips to have a specific UUID, which is hardcoded in the test. This came up the last time these "expected" snapshots were re-generated: e-mission@3c52375 I think it will be annoying to keep replacing the user_id every time the expected outputs change. So I just reworked the test to use whatever user_id is in the file rather than some arbitrary one
BTW there was nothing wrong with the UUID format or the way I was creating the dumps. The test that was failing (
|
Still need to add a test that runs the pipeline with some BLE sensor data & validates the expected But this PR can be reviewed & merged to unblock others if needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with this in general. The secret sauce is e-mission-common, which I have not reviewed yet, but at least this will generate the correct data structures to unblock @Abby-Wheelis
# sys.exit(1) | ||
# TODO what to do here? What if Github is down or something? | ||
# If we terminate, will the pipeline just try again later? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should throw an exception instead of terminating. That will cause the pipeline to catch the exception and mark that state as incomplete, which will ensure that any unprocessed data will be processed in the next run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO
@@ -233,6 +236,7 @@ def create_confirmed_entry(ts, tce, confirmed_key, input_key_list): | |||
tce["data"]["cleaned_trip"]) | |||
confirmed_object_data['inferred_section_summary'] = get_section_summary(ts, cleaned_trip, "analysis/inferred_section") | |||
confirmed_object_data['cleaned_section_summary'] = get_section_summary(ts, cleaned_trip, "analysis/cleaned_section") | |||
confirmed_object_data['ble_sensed_summary'] = get_section_summary(ts, cleaned_trip, "analysis/inferred_section", mode_prop="ble_sensed_mode") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would assume that you would want to use analysis/cleaned_section
here since the inferred section is not based on raw/smoothed data from sensors, but by running an ML pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sensed data goes with sensor data, makes sense
TODO
@JGreenlee I merged this and then reset and re-ran the pipeline on the production
There are a few users for whom there is no |
Ah I know, it's because I don't have the
|
We use a `git` based reference to the `em-common` repo. The related PR added git to the test dockerfile #965 to get the tests to pass. We need to add a similar command to the production Dockerfile
`ble_sensed_mode` and `ble_sensed_summary` were added to sections and confirmed trips in e-mission/e-mission-server#965. We are going to need `ble_sensed_mode` to determine vehicle info. The section summaries (`ble_sensed_summary`, `cleaned_section_summary` and `inferred_section_summary`) are not used in unprocessed trips currently, but I am adding them so there is less of a gap between composite trips and unprocessed trips
We do not need to manually install em-common since it was already installed in the base server image as part of e-mission/e-mission-server#965 and e-mission/e-mission-server@d33ce2e So we removed it in 334c95b But we do need to bump up the base image to include it
add dynamic_config.py
This will provide a way for the pipeline to access the dynamic config.
the first time
get_dynamic_config
is called, it fetches the config associated with the study from Github. The study is defined by the environment variableSTUDY_CONFIG
.The downloaded config is cached within one session (each pipeline run is a different session), so subsequent calls to
get_dynamic_config
will not need to re-fetch.add ble_sensed_mode to sections
Testing done:
-loaded a dump of dfc-fermata from 24 april into my local database
-ran
. setup/setup.sh
to update dependencies (e-mission-common is a new dependency)-located one user (the opcode I've been using the most for my own testing). cleared pipeline for that user via
./e-mission-py.bash bin/reset_pipeline.py -e nrelop_dfc-fermata_test_*
-ran pipeline for that user, supplying the STUDY_CONFIG as an environment variable
STUDY_CONFIG=dfc-fermata ./e-mission-py.bash bin/debug/intake_single_user.py -e nrelop_dfc-fermata_test_*
-observed log output. BLE modes were detected for at least some of the sections
Some sections still have a ble_sensed_mode of None but this is plausible because we hadn't integrated the BLE sensing until a couple weeks ago and I've been using this opcode longer than that.
ble_sensed_mode
:Output shows 31 sections with 'None' and 9 sections with the following:
add ble_sensed_summary to confirmed trips
Since sections now have a
ble_sensed_mode
property in addition tosensed_mode
, we can use theble_sensed_mode
as the basis for a new kind of section summary.To enable this, the
get_section_summary
function was refactored to accept an additional argument, which can be eithersensed_mode
orble_sensed_mode
. If it'sble_sensed_mode
, the summary values will be summed and grouped by thebaseMode
of the vehicle that was sensed by BLE. If the additional argument is not given, it just defaults tosensed_mode
(same behavior as before)Testing done:
-using a dump of dfc-fermata from 24 april loaded into my local database
-reset and re-ran the pipeline for a user
-inspected the
ble_sensed_summary
s of the resulting confirmed trips:Output: