🗃️ Add an interface to support returning the cleaned section <-> inferred section mapping for a set of cleaned sections #970
One problem with using a range for this specific use case is that we get the list of section ids from the list of points that were within a polygon. This could span a wide date range (e.g. months) or a wide geographic range (the trajectories passed through a location but the start and end could be anywhere). An intermediate tradeoff could be that we still use the time range, but split it up so that we don't have to read too many sections at a time. Note that this is similar to the
we can then continue to use the timeseries-based data model but not have to make
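The intermediate tradeoff described above (keep the time range, but read it in bounded chunks) could look roughly like this sketch; `query_fn` and the default chunk size are assumptions for illustration, not the server's actual API:

```python
from datetime import datetime, timedelta

def query_in_chunks(query_fn, start, end, chunk=timedelta(days=7)):
    # Split [start, end) into bounded windows so that no single
    # timeseries read has to pull in months of sections at once.
    # `query_fn(window_start, window_end)` is a hypothetical stand-in
    # for the actual section query.
    results = []
    cur = start
    while cur < end:
        window_end = min(cur + chunk, end)
        results.extend(query_fn(cur, window_end))
        cur = window_end
    return results
```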
@MukuFlash03 here's the next issue for you to work on
I see that the current implementation with the loop requires a user_id and a section_id to be passed in. For the batch method, you can take in a list of user_ids and section_ids, or a list of {user_id, section_id} dictionaries. Essentially, you can go from one of those representations to the other either by doing a zip or a list comprehension that splits it out. Or take only section ids and just implement the performance optimization for now. I think we will have to tweak the interface a bit over time and polish it depending on new use cases that come in.
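The two batch representations mentioned above are interconvertible with a zip or a list comprehension; a minimal sketch (key names are illustrative):

```python
def to_pairs(user_ids, section_ids):
    # Parallel lists -> list of {user_id, section_id} dicts.
    return [{"user_id": u, "section_id": s}
            for u, s in zip(user_ids, section_ids)]

def to_lists(pairs):
    # List of dicts -> parallel lists.
    return ([p["user_id"] for p in pairs],
            [p["section_id"] for p in pairs])
```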
Since the initial code implementation was ready, I thought of first adding the required functionality before optimizing, and also worked on the tests. I saw that the functionality involved the keys
My doubt is whether this is the right file that I should be using for testing the section queries. I have this concern since the sample data format does not match the query being formed to fetch data in With respect to this data file,
Will work on the code implementation again for now, then move back to testing.
I also see that the code uses the analysis timeseries db to query for data, so I believe this is not the right way to test functionality involving the analysis timeseries db. The other functions in emission/storage/decorations/section_queries.py, like get_sections_for_trip() and get_sections_for_trip_list(), are tested by creating a new section and inserting it into the analysis timeseries db using So, I am now also considering this testing approach, but need to see how sensed_mode is to be set and accessed.
After setting up the example, you need to run the pipeline. That will create the analysis results.
The issue I am facing is that, after setting up the example and running the intake pipeline, the analysis timeseries data contains only analysis/cleaned_section keys with sections, but no analysis/inferred_section keys. I also think the function

I am unsure whether I first have to manually convert cleaned_section data to inferred_section. How would I first create my test data containing inferred_sections?

So, I have been trying to understand the entire data flow, from setting up example datasets to obtaining the analysis data in the appropriate timeseries dbs. I found the pipeline implementation in

However, comments here say that the intake pipeline for mode inference testing may not be correct. I do see that a sample dataset exists which progresses towards getting inferred_section keys, but I'm not sure how the inferred_section file was created. The 1st one is the raw data, while the 2nd one results from running the intake pipeline.
Also, I do see emission.run model pipeline for mode inference, but the code looks incomplete and is not used anywhere else. Still trying to understand how to generate inferred_sections.
We currently have two mode inference algorithms. One is based on a Random Forest trained on sensor data (speed, acceleration, ...); seed_model.json is the saved random forest model. The alternative is GIS-based: use the GIS-based testing branch, which may eventually become the master branch, with the current branch becoming the random-forest branch.
Fixed in e-mission/e-mission-server#937
This will allow us to have a generic interface for use by the dashboards while optimizing the implementation later.
This is currently needed for:
e-mission/op-admin-dashboard@6cdf8e6#diff-1c6b8e6d103286796ce21a8276c4a4d8b258e29d6b9cc6df516a92accf4674d1R201
The desired interface would be something like `cleaned2inferred_section_list`, similar to the current `cleaned2inferred_section`, but with a list passed in. The initial implementation could be the simple loop at:
e-mission/op-admin-dashboard@6cdf8e6#diff-1c6b8e6d103286796ce21a8276c4a4d8b258e29d6b9cc6df516a92accf4674d1R201-R206

A performance optimization would be the original implementation with:
e-mission/op-admin-dashboard@6cdf8e6#diff-1c6b8e6d103286796ce21a8276c4a4d8b258e29d6b9cc6df516a92accf4674d1L199-L202
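The initial, loop-based version might look like this sketch; `lookup_one` stands in for the existing single-section `cleaned2inferred_section` helper, whose real signature may differ:

```python
def cleaned2inferred_section_list(lookup_one, user_id, section_ids):
    # Initial, unoptimized implementation: one lookup per section id,
    # returning a {section_id: inferred mode} mapping.
    return {sid: lookup_one(user_id, sid) for sid in section_ids}
```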
However, given our data model, I would prefer an optimization in which we retrieve potentially matching inferred modes by time range or geo-range and then match them up in memory. In general, with the timeseries data model, we want to avoid using the linkages (the foreign keys) between collections, because they would not necessarily be searchable in a real timeseries database. They are more of a relational data model concept.
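The range-then-match idea could be sketched like this; the `cleaned_section` back-pointer field on the inferred-section entries is an assumption for illustration:

```python
def match_in_memory(cleaned_sections, inferred_sections):
    # Index the inferred sections fetched by time/geo range in one pass,
    # then do an in-memory join instead of per-section foreign-key queries.
    by_cleaned_id = {inf["cleaned_section"]: inf for inf in inferred_sections}
    return {c["_id"]: by_cleaned_id.get(c["_id"]) for c in cleaned_sections}
```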
If we did go with the timeseries approach, we could also close
e-mission/e-mission-server#934
@TTalex