Determine how to support running analyses on the dataset #24
I can think of three options for how to do this:
|
Playing around with NEDB: an example notebook where:
2021-03-26 20:14:27,545:DEBUG:123145524649984:Found 14 messages in response to query {'user_id': UUID('4aebf2e0-f097-4845-8652-2ada3a76dadd'), '$or': [{'metadata.key': 'statemachine/transition'}], 'metadata.write_ts': {'$lte': 1581048094.196096, '$gte': 1581005599.791271}}
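For anyone who wants to see what that query selects without a database behind it, here is a minimal sketch that applies the same filter to plain Python dicts, as you would after loading entries from files. The `get_path` and `matches` helpers and the three sample entries are made up for illustration; they are not part of the e-mission codebase or of NEDB, and the matcher only handles the operators that appear in the log line above (`$or`, `$gte`, `$lte`, and equality).

```python
from uuid import UUID


def get_path(doc, dotted):
    """Resolve a dotted field name such as 'metadata.key' in a nested dict."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur


def matches(doc, query):
    """Hypothetical matcher: supports equality, $or, $gte and $lte only."""
    for field, cond in query.items():
        if field == "$or":
            # At least one of the sub-queries must match.
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = get_path(doc, field)
            if value is None:
                return False
            if "$gte" in cond and not value >= cond["$gte"]:
                return False
            if "$lte" in cond and not value <= cond["$lte"]:
                return False
        elif get_path(doc, field) != cond:
            return False
    return True


user = UUID("4aebf2e0-f097-4845-8652-2ada3a76dadd")

# The query from the log line above.
query = {
    "user_id": user,
    "$or": [{"metadata.key": "statemachine/transition"}],
    "metadata.write_ts": {"$lte": 1581048094.196096, "$gte": 1581005599.791271},
}

# Made-up sample entries, not real data.
entries = [
    {"user_id": user,
     "metadata": {"key": "statemachine/transition", "write_ts": 1581010000.0}},
    {"user_id": user,  # different key, so it is filtered out
     "metadata": {"key": "background/location", "write_ts": 1581010000.0}},
    {"user_id": user,  # written after the $lte bound, so it is filtered out
     "metadata": {"key": "statemachine/transition", "write_ts": 1581050000.0}},
]

found = [e for e in entries if matches(e, query)]
```

Only the first sample entry satisfies the user, key, and timestamp constraints together.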
|
to use the timeseries. You will need to add the e-mission server directory to your |
Let's do the easy design first.
|
The goal of this project is to make it easier for others to come up with their own algorithms. I think that the two options are:
We have time to implement one option, not two. |
I'm leaning toward the second option: having the e-mission algorithms published would make it easier for others to come up with their own algorithms based on our provided implementation. Refactoring the e-mission codebase so that core and storage are more compartmentalized would probably also make things easier for us to work with in the long run. Am I headed in the right direction? |
I'm actually leaning towards the first option. The main difference is that it is by no means clear to me that anybody wants to come up with new algorithms based on our provided implementation. I think that people want to start with the data, explore it, and try out ML libraries ( We have one customer request: did he ask for mongodump (which would have used the database and the existing algorithms) or files (which would be more in line with the ML library approach)? |
I see, that makes sense. The customer asked for files @shankari. |
@singhish ok, let's try to get a second data point with you pretending to be a customer since you are not as close to the data as I am. Let's say you want to enter a challenge in which you need to segment a trip into multiple unimodal segments. Would you prefer to work with notebooks that had a simpler embedded baseline, or try to work with that code to understand and improve it? |
This might be a personal preference, but as a developer I would probably choose the latter: working with the code directly feels nicer to me than dealing with the overhead of running a notebook. Data scientists might prefer the former, though. @shankari |
Specifically, the notebooks ending in _master are failing due to how our current analysis pipeline is set up. A decision needs to be made regarding how to handle the analysis that is currently being done by using the analysis-related keys on E-Mission.