synced with main, cleaned outputs #134

Draft · wants to merge 6 commits into base `main`
1,699 changes: 1,699 additions & 0 deletions replacement_mode_modeling/01_extract_db_data.ipynb
508 changes: 508 additions & 0 deletions replacement_mode_modeling/02_run_trip_level_models.py
1,243 changes: 1,243 additions & 0 deletions replacement_mode_modeling/03_user_level_models.ipynb
1,166 changes: 1,166 additions & 0 deletions replacement_mode_modeling/04_FeatureClustering.ipynb
929 changes: 929 additions & 0 deletions replacement_mode_modeling/05_biogeme_modeling.ipynb
42 changes: 42 additions & 0 deletions replacement_mode_modeling/README.md
@@ -0,0 +1,42 @@

# Efforts towards predicting the replaced mode without user labels

## Prerequisites:
- These experiments were conducted on top of the `emission` Anaconda environment. Please ensure that this environment is available to you before re-running the code.
- In addition, some notebooks use `seaborn` for plotting and `pandarallel` for parallelized pandas processing. These packages can be installed as follows:

```shell
# after activating the emission conda env
pip3 install pandarallel==1.6.5
pip3 install seaborn==0.12.2
```

- Ensure you have the following data sources loaded into your MongoDB Docker container:
- Stage_database (All CEO)
- Durham
- Masscec
- Ride2own
- UPRM NICR

- Additionally, procure the CanBikeCO survey CSV file and place it in the `viz_scripts/` directory.

- Once these data sources are procured and loaded into your Mongo container, you will need to add the inferred sections to the data. To do this, run the [add_sections_and_summaries_to_trips.py](https://github.com/e-mission/e-mission-server/blob/master/bin/historical/migrations/add_sections_and_summaries_to_trips.py) script. **NOTE**: If you see many errors in the log, try re-running the script after modifying the following line:

```python
# Before
eps.dispatch(split_lists, skip_if_no_new_data=False, target_fn=add_sections_to_trips)

# After
eps.dispatch(split_lists, skip_if_no_new_data=False, target_fn=None)
```

This will trigger the intake pipeline for the current DB and add the inferred sections.

- Note 2: The script above did not work for the All CEO data for me, so I obtained the section durations using the `get_section_durations` method I've written in the first notebook. This method takes a long time to run, so it is advisable to cache its output.
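Since recomputing the durations on every run is expensive, one lightweight way to cache the output is a pickle-backed decorator. A minimal sketch; the `cached` helper and the wrapped function below are hypothetical illustrations, not part of the notebook:

```python
import pickle
from pathlib import Path

def cached(path):
    """Cache the wrapped function's return value as a pickle at `path`.

    On the first call the function runs and its result is pickled; on
    subsequent calls the pickle is loaded instead of recomputing.
    """
    def decorator(fn):
        def wrapper(*args, **kwargs):
            p = Path(path)
            if p.exists():
                with p.open("rb") as f:
                    return pickle.load(f)
            result = fn(*args, **kwargs)
            p.parent.mkdir(parents=True, exist_ok=True)
            with p.open("wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator

# Hypothetical usage around the expensive notebook step:
# @cached("data/section_durations.pkl")
# def compute_durations(trips):
#     return get_section_durations(trips)
```

Deleting the pickle file forces a fresh recomputation on the next run.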

## Running the experiments
The order in which the experiments are to be run is denoted by the number prefix. The following is a brief summary of each notebook:
1. `01_extract_db_data.ipynb`: This notebook extracts the data, performs the necessary preprocessing, updates availability indicators, computes cost estimates, and stores the preprocessed data in `data/filtered_trips`.
2. `02_run_trip_level_models.py`: This script reads all the preprocessed data, fits trip-level models with different stratifications, generates the outputs, and stores them in `outputs/benchmark_results/`.
3. `03_user_level_models.ipynb`: This notebook explores user fingerprints, similarity searching, and naive user-level models.
4. `04_FeatureClustering.ipynb`: This notebook performs two functions: (a) clusters users based on demographics/trip feature summaries and checks for target distributions across clusters, and (b) clusters users by grouping w.r.t. the target and checks for feature homogeneity within clusters.
5. `05_biogeme_modeling.ipynb`: This notebook fits discrete choice models with `biogeme`.
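The steps above can be executed headlessly in numbered order. A sketch, assuming the `emission` conda env is active and `jupyter nbconvert` is installed; adjust paths to your checkout:

```shell
# run the pipeline in numbered order from the repo root
cd replacement_mode_modeling
jupyter nbconvert --to notebook --execute --inplace 01_extract_db_data.ipynb
python 02_run_trip_level_models.py
jupyter nbconvert --to notebook --execute --inplace 03_user_level_models.ipynb
jupyter nbconvert --to notebook --execute --inplace 04_FeatureClustering.ipynb
jupyter nbconvert --to notebook --execute --inplace 05_biogeme_modeling.ipynb
```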
1 change: 1 addition & 0 deletions replacement_mode_modeling/data/README.md
@@ -0,0 +1 @@
Temporary folder