Feat/Segment trip time page #61
Conversation
@TTalex I think that this is a great feature, and super well documented! We just deployed a release, but are planning a new one in a couple of weeks for the API upgrade, and it would be great to include this as well. I'll review and merge this weekend.
FYI, map matching has been postponed in favor of helping finish up "count every trip".
Since `$near` doesn't seem to work with it (e-mission/op-admin-dashboard#61 (comment)):

```
pymongo.errors.OperationFailure: $geoNear, $near, and $nearSphere are not allowed in this context, full error: {'ok': 0.0, 'errmsg': '$geoNear, $near, and $nearSphere are not allowed in this context', 'code': 2, 'codeName': 'BadValue'}
```

`$geoWithin` seems to be fine. Test passes.
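For reference, a rough sketch of what a `$geoWithin` query against the analysis timeseries can look like; the polygon coordinates and the `data.loc` field name are illustrative assumptions, not the PR's exact code:

```python
import emission.core.get_database as edb

# Illustrative polygon (GeoJSON ring, lon/lat order); real zones come from the
# shapes the user draws on the map.
start_zone = {
    "type": "Polygon",
    "coordinates": [[[2.350, 48.850], [2.360, 48.850], [2.360, 48.860],
                     [2.350, 48.860], [2.350, 48.850]]],
}
query = {
    "metadata.key": "analysis/recreated_location",
    "data.loc": {"$geoWithin": {"$geometry": start_zone}},
}
points_in_start_zone = list(edb.get_analysis_timeseries_db().find(query))
```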
…and end zones

Rewrote data fetching to use the geoquery sdk.
Quoting myself from the first message of this PR:
This could be solved by having a precomputed `distance_from_start` value on each recreated location. I believe it could be implemented in `add_dist_heading_speed`:

```diff
+ import itertools

# [...]

def add_dist_heading_speed(points_df):
    # type: (pandas.DataFrame) -> pandas.DataFrame
    """
    Returns a new dataframe with an added "speed" column.
    The speed column has the speed between each point and its previous point.
    The first row has a speed of zero.
    """
    point_list = [ad.AttrDict(row) for row in points_df.to_dict('records')]
    zipped_points_list = list(zip(point_list, point_list[1:]))
    distances = [pf.calDistance(p1, p2) for (p1, p2) in zipped_points_list]
    distances.insert(0, 0)
+   distances_from_start = list(itertools.accumulate(distances))
    speeds = [pf.calSpeed(p1, p2) for (p1, p2) in zipped_points_list]
    speeds.insert(0, 0)
    headings = [pf.calHeading(p1, p2) for (p1, p2) in zipped_points_list]
    headings.insert(0, 0)
    with_distances_df = pd.concat([points_df, pd.Series(distances, name="distance")], axis=1)
+   with_distances_from_start_df = pd.concat([with_distances_df, pd.Series(distances_from_start, name="distance_from_start")], axis=1)
-   with_speeds_df = pd.concat([with_distances_df, pd.Series(speeds, name="speed")], axis=1)
+   with_speeds_df = pd.concat([with_distances_from_start_df, pd.Series(speeds, name="speed")], axis=1)
    if "heading" in with_speeds_df.columns:
        with_speeds_df.drop("heading", axis=1, inplace=True)
    with_headings_df = pd.concat([with_speeds_df, pd.Series(headings, name="heading")], axis=1)
    return with_headings_df
```

Recreated locations would then look like the following examples:

Computing distance from the second point (idx 1) to the last one (idx 3) would only require both points, skipping fetching idx 2:

Maybe it's a bit too specific for this use case to induce a change to the Location model (and might require a patch on existing database entries for consistency 😕).
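To illustrate the point about skipping intermediate fetches, here is a tiny sketch with made-up cumulative distances (not real entries):

```python
import pandas as pd

# Made-up distance_from_start values for four recreated locations; with the
# cumulative distance stored, the distance between idx 1 and idx 3 is just a
# difference, with no need to fetch idx 2.
points_df = pd.DataFrame({"distance_from_start": [0.0, 120.0, 250.0, 400.0]})
dist_1_to_3 = (points_df.loc[3, "distance_from_start"]
               - points_df.loc[1, "distance_from_start"])
print(dist_1_to_3)  # 280.0
```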
@TTalex we have created an interface for the … We can then change the implementation at will depending on the scalability vs. reuse tradeoff.
This is an interesting thought. Adding new entries to the data model and patching existing entries is work but fairly straightforward conceptually. If we can come up with a second use case that needs this functionality (maybe map matching), I am happy to include it. Not sure if we want to do a one-off change before that though...
Sweet, thanks, I've made the swap.
I wouldn't change it either if I were you; that's why I didn't bother doing a PR :) There might be some use cases in end-user UIs for it, where the change would bring slight performance improvements. For example, with point-by-point visualization such as this one: But I'm not confident it would be an improvement at all, since the full list of points has to be loaded anyway.
While testing against my own data, I ran into the following error:

Looking at the logs, we have:

so it must be one of these sections.

Bingo!

This is almost certainly due to … My current guess is that it might be due to multiple calls to overpass failing, but in that previous issue, I see:
It's really weird that there are still exactly 807 matching entries. Maybe we can spend a little time today to investigate (at least on the side).
```python
times = pd.to_datetime(df['start_fmt_time'], errors='coerce', utc=True)
duration_per_hour = format_duration_df(
    df.groupby(times.dt.hour).agg({'duration': 'median', 'section': 'count'}),
    time_column_name='Hour',
)
```
I just tried to use this myself, and having the hours in UTC is very annoying. We already have the split-out components in local time in the `data.start_local_dt` field.

From `emission/core/wrapper/section.py`:

```python
"start_local_dt": ecwb.WrapperBase.Access.WORM, # ********searchable datatime in local time of start location
```
I will spend ~ 10 mins trying to fix this myself while merging the change, but will file a cleanup issue if I can't get that to work.
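A minimal sketch of what that fix might look like, grouping by a precomputed local-hour column instead of converting to UTC; the flattened column name `start_local_dt_hour` is an assumption about how `data.start_local_dt` would surface in this dataframe:

```python
# A minimal sketch (not the merged fix): group by the precomputed local-time
# hour instead of converting 'start_fmt_time' to UTC. 'df' and
# 'format_duration_df' are the same names used in the hunk above.
local_hours = df['start_local_dt_hour']
duration_per_hour = format_duration_df(
    df.groupby(local_hours).agg({'duration': 'median', 'section': 'count'}),
    time_column_name='Hour',
)
```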
After resetting and re-running the pipeline, we get 14k points at the start and 824 points at the end.
Ran into the duplicate entries for another user. We might want to write a check for this and run it on production before pushing it out. Let's switch to open access for a bit to see if things are better.
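A rough sketch of what such a check could look like, counting inferred sections per (user_id, cleaned_section) with a MongoDB aggregation (illustrative, not an agreed-upon implementation):

```python
import emission.core.get_database as edb

# Report any cleaned section that has more than one inferred section,
# which would indicate duplicates like the ones seen above.
pipeline = [
    {'$match': {'metadata.key': 'analysis/inferred_section'}},
    {'$group': {
        '_id': {'user_id': '$user_id', 'cleaned_section': '$data.cleaned_section'},
        'count': {'$sum': 1},
    }},
    {'$match': {'count': {'$gt': 1}}},
]
for dup in edb.get_analysis_timeseries_db().aggregate(pipeline):
    print(dup['_id'], dup['count'])
```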
@achasmita I don't see how big these zones are. How did you pick them?
While selecting zones, I checked the trip table to find the best areas, based on latitude and longitude, that had the most trips:
Expanded version, dictated to @shankari
This shows us that the trips that are within the start and end polygons are shown in the trip time table. Concretely, if the trips that you found were … My concern is not that the trip time table is inaccurate, but that it is incomplete.
I tried exploring start and end zones with different segment sizes:
@achasmita Thank you for adding additional examples with the results of your investigation. I have some more questions.
Can you also expand on what you did in "I also explored data in both zone(start/end)"? How did you explore the data, and what were the results?
The above screenshot was just to make sure I was selecting the correct size of zone; I will find the other screenshot for 25-30 trips and post it soon. For the data, I printed the top 50 and bottom 50 rows after removing duplicates and compared them with the data in the trip table to see if I could figure out whether any data was missing.
Can you expand on this? |
My main concern with this is that we were getting very few trips displayed for basically any start/end combo.
@achasmita was not able to get (1) without making the polygons really large and wasn't able to come up with (2). As a concrete example, on staging, I would expect to see a lot of trips from my house to the library, to the grocery store nearby, or to my kids' school. In particular, I would expect to see at least 100 trips from my house to the local school. Similarly, in the Denver area, you could see the locations that are hotspots in the heatmap and try to see if there are trips between them.
While trying this branch, I initially got this error:
I realized this is because this branch was using an old image of e-mission-server. Updated to the most recent (shankari/e-mission-server:master_2024-02-10--19-38) and rebuilt. Now there's a different error:
There are 807 inferred sections for one cleaned section?? I inspected the logs to see which UUID + section this is happening for. It happens on:

Sure enough, there are 807 inferred section entries for that UUID and that cleaned section.

```python
query = {
    'metadata.key': 'analysis/inferred_section',
    'user_id': UUID('d83a43a1-df6b-42ed-986f-f5b5f6150221'),
    'data.cleaned_section': ObjectId('644df8edea199f1d0473e301')
}
r = edb.get_analysis_timeseries_db().find(query)
for i in r:
    print(i)
```
I removed those duplicates and tried again, but there appear to be duplicates for all the other sections too. I am unsure why there are so many duplicates.
I removed the duplicates manually with this script:

```python
import emission.core.get_database as edb

# get cleaned sections
cleaned = edb.get_analysis_timeseries_db().find({
    'metadata.key': 'analysis/cleaned_section',
})
cleaned_ct = 0
for c in cleaned:
    cleaned_ct += 1
    query_inferred = {
        'metadata.key': 'analysis/inferred_section',
        'user_id': c['user_id'],
        'data.cleaned_section': c['_id']
    }
    first_inferred = edb.get_analysis_timeseries_db().find_one(query_inferred)
    if first_inferred is None:
        print(f"cleaned section {cleaned_ct} had no inferred sections")
        continue
    # remove all of those entries unless the ID is the first inferred section
    dedup_query = {
        'metadata.key': 'analysis/inferred_section',
        'user_id': c['user_id'],
        'data.cleaned_section': c['_id'],
        '_id': {'$ne': first_inferred['_id']}
    }
    delete_result = edb.get_analysis_timeseries_db().delete_many(dedup_query)
    print(f"removed {delete_result.deleted_count} duplicates from cleaned section {cleaned_ct}")
```

It took a while to run. Now I am finally able to test the Segment Trip Time page.
For trips from home to school, I found 222 trips spanning from July 2022 to December 2023. This seems to align with expectations because there are about 180 school days in 1 year. The boxes I used were about the size of 1 block. I will follow up with smaller boxes.
Home to school

<image> <image>

The trips span from July 2022 to December 2023. This seems to align with expectations because there are about 180 school days in 1 year. So I do think this is probably a pretty comprehensive measure of this repeated trip.

Home to viola class

<image> <image> <image>

Methodology for drawing boxes

I found these usage guidelines quite helpful and accurate:

For reference, below are the heatmaps around those 3 areas of interest. I drew the boxes considering where the locus of activity seems to be for each area (and considering the guidelines above).

(I wonder if this tool could potentially be even more useful and easier to use if a heatmap was actually overlaid on the start/end selection area? I found myself switching back and forth often.)

<image>

Conclusion

Based on this, the tool does appear to work as expected. It captured a fairly comprehensive, if not fully comprehensive, picture of the above recurring trips. I also briefly validated the tool against my own travel data.

The only changes I might suggest would include a heatmap overlay to make it easier to identify places of activity while drawing boxes, and potentially a toggle to "swap" start and end locations. (I see a common use case where the user has seen the duration from A to B; now they want to see the duration from B to A, but they don't want to have to re-draw the boxes.)
@JGreenlee thanks for the comprehensive review! Given the length of time that this has been pending, I will merge the changes now for the next round and we can address the UX improvements in a subsequent round.
@shankari Great!
Actually it looks like you just did that a few days ago!
I resolved the merge conflicts for this feature and updated it to observe the global filters (which were added since this feature was created).
@shankari |
Hey,
This feature adds a new page helping users compute average trip duration between two selected points.
I wanted to experiment with a simple level of service indicator based on e-mission data.
The use case comes from feedback from a local authority in France. They expressed a need for alternative ways of gathering travel time information that don't rely on buying data from incumbent providers.
I do believe that e-mission can be pretty good for this, since trip completeness isn't required to compute average durations. We can get pretty good results with low amounts of data.
The initial idea sparked a discussion around map matching. While results would probably be more accurate with map matching, simple curve-fitting, as done when creating `analysis/recreated_locations` entries, seems to already be performing well enough.

User interaction
Here is a quick demo of the new page (the database only has one user's data, mine).
The user is asked to draw a start zone and an end zone on the map.
Using this information, queries are made to fetch recreated_locations matching either point. The resulting data is then displayed.
The proof of concept includes the following tables:
More useful stats could be added in the future, for example separating between weekdays and weekends.
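As a sketch of the weekday/weekend idea (the column names mirror the ones already used on this page; the toy dataframe is only for illustration):

```python
import pandas as pd

# Toy sections dataframe; 'start_fmt_time', 'duration', and 'section' mirror
# the columns already used by this page, the values are made up.
df = pd.DataFrame({
    'start_fmt_time': ['2023-05-01T08:10:00', '2023-05-06T09:30:00'],
    'duration': [600, 750],
    'section': ['a', 'b'],
})
times = pd.to_datetime(df['start_fmt_time'], errors='coerce', utc=True)
# Monday=0 .. Sunday=6, so 5 and above are weekend days
day_type = times.dt.dayofweek.map(lambda d: 'Weekend' if d >= 5 else 'Weekday')
duration_by_day_type = df.groupby(day_type).agg({'duration': 'median', 'section': 'count'})
print(duration_by_day_type)
```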
A commonly used statistic is average vehicle speed. This requires information on the true distance travelled. However, distance is complex to compute with recreated_locations: since we only fetch start and end points, we lose the distance information from the intermediary points. Computing it would require further queries to sum the distances of all intermediary points. The same is true for speeds. This is not complex, but could be heavy on the database / memory. In reality, the distance is most likely already known by the user, at least for the "usual" path.
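For completeness, a sketch of what those further queries could look like; the timestamp bounds, key names, and missing user filter are simplifications for illustration, not part of this PR:

```python
import emission.core.get_database as edb

# Hypothetical bounds for one matched trip segment (unix timestamps)
segment_start_ts = 1.6e9
segment_end_ts = 1.6e9 + 900

# Sum the per-point 'distance' values of the intermediary recreated locations.
# A real query would also filter by user_id and the specific trip/section.
query = {
    "metadata.key": "analysis/recreated_location",
    "data.ts": {"$gte": segment_start_ts, "$lte": segment_end_ts},
}
true_distance = sum(p["data"]["distance"]
                    for p in edb.get_analysis_timeseries_db().find(query))
```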
New project requirements
I have added the `dash_leaflet` library because built-in mapbox plots aren't great:

Permissions
Three configurations are linked to this new page:

- `segment_trip_time`: User can view this page (default `true`)
- `segment_trip_time_full_trips`: User can see the table containing non-aggregated data (default `true`)
- `segment_trip_time_min_users`: Minimal number of distinct users in the data required to display anything (value is a number, default `0`). This parameter should help with guaranteeing some kind of anonymity; otherwise a user could target a specific house as a start point and leak personal travel data that way.
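As an illustration of the `segment_trip_time_min_users` semantics (the dict and the `should_display` helper below are a sketch, not how the PR actually reads its configuration):

```python
# Example permission values as they might be configured; illustrative only.
permissions = {
    "segment_trip_time": True,
    "segment_trip_time_full_trips": True,
    "segment_trip_time_min_users": 5,
}

def should_display(results_df):
    """Only show results if enough distinct users contributed data."""
    return results_df["user_id"].nunique() >= permissions["segment_trip_time_min_users"]
```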
Dev notes

Up until now, the admin-dashboard was querying data once at startup. This PR behaves differently, with database queries on user actions. The code includes a few comments on why this was done this way, and the performance implications.
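A minimal sketch of that "query on user action" pattern with a Dash callback; the component ids and the `fetch_segment_durations` helper are hypothetical, not the PR's actual code:

```python
from dash import Input, Output, callback

def fetch_segment_durations(start_zone, end_zone):
    # stand-in for the geoquery-based fetch; a real implementation would hit
    # the database here with the drawn polygons
    return []

@callback(
    Output("segment-trip-time-table", "data"),
    Input("start-zone", "value"),
    Input("end-zone", "value"),
)
def update_table(start_zone, end_zone):
    # The database is queried only when the user changes the zones,
    # instead of once at dashboard startup.
    if not start_zone or not end_zone:
        return []
    return fetch_segment_durations(start_zone, end_zone)
```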
Hope this is useful for someone else :)