Update clustering.py (e-mission#37)
* Update clustering.py

Changes in clustering.py to shift the dependency from hlu09's tour_model_extended to the main-branch trip_model. The type of data passed to the fit function still needs to change for this to work.

* moving clustering_examples.ipynb to trip_model

All dependencies of this notebook on the custom branch are removed. There currently seem to be no errors while generating maps in the clustering_examples notebook.

* Removing changes in builtimeseries.py

With these changes, no change in e-mission-server should be required.

* Changes to support TRB_Label_Assist

Passing the way of clustering to e-mission-server. It was 'origin-destination' by default; it can now take one of three values: 'origin', 'destination', or 'origin-destination'.
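
A minimal sketch of how the three accepted values could be validated, and how a value could be derived from `loc_type` when none is given. The helper name `resolve_clustering_way` and the fallback to 'origin-destination' are assumptions for illustration; they are not taken from `clustering.py` or e-mission-server.

```python
# Hypothetical helper, for illustration only.
VALID_CLUSTERING_WAYS = ("origin", "destination", "origin-destination")


def resolve_clustering_way(clustering_way=None, loc_type=None):
    """Return a validated clustering_way.

    If clustering_way is omitted, derive it from loc_type ('start' or 'end').
    With neither argument, fall back to 'origin-destination', the previous default.
    """
    if clustering_way is None:
        if loc_type == "start":
            return "origin"
        if loc_type == "end":
            return "destination"
        return "origin-destination"
    if clustering_way not in VALID_CLUSTERING_WAYS:
        raise ValueError(
            f"clustering_way must be one of {VALID_CLUSTERING_WAYS}, "
            f"got {clustering_way!r}"
        )
    return clustering_way


print(resolve_clustering_way(loc_type="end"))
print(resolve_clustering_way("origin-destination"))
```

Validating the string up front would surface a typo in a notebook call immediately, rather than deep inside the binning code.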

* suggestions

Previous suggestions to improve readability.

* Revert "suggestions"

This reverts commit 3e19b32.

* Improving readability

Suggestions from previous comments to improve readability.

* making `cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` compatible with changes in `clustering.py` and `mapping.py` files. Also porting these 3 notebooks to trip_model

`cluster_performance.ipynb`, `generate_figs_for_poster` and `SVM_decision_boundaries` now have no dependence on the custom branch. Plot results are attached to show no difference between their previous and current outputs.

* Unified Interface for fit function

Unified interface for the fit function across all models. 'Entry'-type data is now passed from the notebooks all the way to the binning functions. Default set to 'none'.
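
One way such a unified interface could look, as a sketch only: every model accepts the same `fit(train_df, ct_entry=None)` call, and models that do not need the Entry-type data ignore it. The class names `TripClassifier`, `CoordinateModel` and `BinningModel` are hypothetical and do not correspond to the classes in `models.py`.

```python
from abc import ABC, abstractmethod


class TripClassifier(ABC):
    """Hypothetical base class: every model exposes the same fit signature."""

    @abstractmethod
    def fit(self, train_df, ct_entry=None):
        """Train on tabular trips; ct_entry optionally carries Entry-type trips."""


class CoordinateModel(TripClassifier):
    def fit(self, train_df, ct_entry=None):
        # Works from the tabular data alone and ignores ct_entry.
        self.n_rows = len(train_df)
        return self


class BinningModel(TripClassifier):
    def fit(self, train_df, ct_entry=None):
        # A binning-style model needs the Entry-type trips to be supplied.
        if ct_entry is None:
            raise ValueError("BinningModel requires ct_entry")
        self.n_entries = len(ct_entry)
        return self


def fit_all(models, train_df, ct_entry=None):
    """Callers use one call shape regardless of the model type."""
    return [m.fit(train_df, ct_entry=ct_entry) for m in models]


rows = [{"distance": 1200.0}, {"distance": 5300.0}]  # stand-in for a DataFrame
entries = [{"data": r} for r in rows]                # stand-in for Entry objects
fitted = fit_all([CoordinateModel(), BinningModel()], rows, ct_entry=entries)
```

With a shared signature, cross-validation code can loop over all models without special-casing the ones that need Entry data.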

* Fixing `models.py` to support `regenerate_classification_performance_results.py`

Prior to this update, `NaiveBinningClassifier` in `models.py` depended on both tour model and trip model. Now, this classifier depends only on trip model. All the other notebooks (except `classification_performance.ipynb`) were tested as well and they are working as usual.

 Other minor fixes to support previous changes.

* [PARTIALLY TESTED] Single database read and Code Cleanup

1. Removed mentions of `tour_model` or `tour_model_first_only`.

2. Removed two reads from the database.

3. Removed notebook outputs (this could be the reason a few diffs are too big to view).
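
The single-read idea can be sketched as follows: fetch the entries once per user, then build the tabular view from those same entries instead of querying again. `fetch_confirmed_trips` and `entries_to_rows` are hypothetical stand-ins, not e-mission-server functions, and the sample records are invented purely to make the sketch runnable.

```python
def fetch_confirmed_trips(user_id):
    """Hypothetical stand-in for the one database query made per user."""
    fetch_confirmed_trips.calls += 1
    return [
        {"_id": 1, "data": {"distance": 1200.0,
                            "user_input": {"mode_confirm": "walk"}}},
        {"_id": 2, "data": {"distance": 5300.0, "user_input": {}}},
    ]


fetch_confirmed_trips.calls = 0  # counts how often the "database" is hit


def entries_to_rows(entries):
    """Flatten entries into table-like rows without another database read."""
    return [{"_id": e["_id"], **e["data"]} for e in entries]


entries = fetch_confirmed_trips("user-a")      # the only read
rows = entries_to_rows(entries)                # tabular view of the same data
labeled = [r for r in rows if r["user_input"]]  # keep trips with labels
```

Both the Entry-style objects and the derived rows come from one query, so they cannot drift apart between reads.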

* Delete TRB_label_assist/first_trial_results/cv results DBSCAN+SVM (destination).csv

not required.

* Reverting Notebook

Reverting notebooks to their initial state, since running them in the browser messed up the cell index numbers. This was causing unnecessary git diffs even when no changes were made. Running in VS Code should resolve this. Will make the subsequent changes in VS Code and commit again.
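
As an illustration of why these diffs appear and one way to avoid them: in the notebook JSON format, each code cell stores an `execution_count` and an `outputs` list, both of which change on every run. The sketch below clears them before committing. It is not part of this commit; `strip_notebook` and `strip_notebook_file` are hypothetical helper names.

```python
import json


def strip_notebook(nb):
    """Clear outputs and execution counts from all code cells, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_notebook_file(path):
    """Rewrite the .ipynb file at path with outputs and counts cleared."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    strip_notebook(nb)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1, ensure_ascii=False)
        f.write("\n")


example = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "execution_count": 7,
         "outputs": [{"output_type": "stream", "text": ["hi\n"]}],
         "source": ["print('hi')"]},
    ]
}
strip_notebook(example)
```

After stripping, re-running a notebook without editing its source leaves the committed file unchanged.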

* [Partially Tested] Handled Whitespaces

Whitespaces corrected.

* [Partially Tested] Suggested changes implemented

`Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py, will be tested once either of the above two is run.

* Revert "[Partially Tested] Suggested changes implemented"

This reverts commit bb404e9.

* [Partially Tested] Suggested changes implemented

`Classification_performance` and `regenerate_classification_performance_results.py` are not tested yet as they would take too long to run. The itertools removal in these two files is tested in other notebooks and it works. Other files, like models.py, will be tested once either of the above two is run.

* Minor variable fixes

Fixed names of variables to be more self-explanatory

* [TESTED] All the notebooks and files are tested

1. Change in models file according to changes in greedy_similarity_binning in e-mission-server

2. Minor fixes

* Minor Fixes

Minor Fixes to improve readability.

* Minor Fixes in models.py

Improved readability
humbleOldSage authored Nov 25, 2023
1 parent 86f2fa3 commit 4788f27
Showing 8 changed files with 76 additions and 28 deletions.
7 changes: 6 additions & 1 deletion TRB_label_assist/SVM_decision_boundaries.ipynb
@@ -30,6 +30,7 @@
"import emission.storage.timeseries.abstract_timeseries as esta\n",
"import emission.storage.decorations.trip_queries as esdtq\n",
"import emission.core.get_database as edb\n",
"import emission.analysis.modelling.trip_model.run_model as eamtr\n",
"\n",
"import data_wrangling\n",
"from clustering import add_loc_clusters"
@@ -60,10 +61,12 @@
"uuids = [suburban_uuid, college_campus_uuid]\n",
"confirmed_trip_df_map = {}\n",
"labeled_trip_df_map = {}\n",
"ct_entry={}\n",
"expanded_trip_df_map = {}\n",
"for u in uuids:\n",
" ts = esta.TimeSeries.get_time_series(u)\n",
" ct_df = ts.get_data_df(\"analysis/confirmed_trip\")\n",
" ct_entry[u]=eamtr._get_training_data(u,None)\n",
" ct_df = ts.to_data_df(\"analysis/confirmed_trip\",ct_entry[u])\n",
" confirmed_trip_df_map[u] = ct_df\n",
" labeled_trip_df_map[u] = esdtq.filter_labeled_trips(ct_df)\n",
" expanded_trip_df_map[u] = esdtq.expand_userinputs(labeled_trip_df_map[u])"
@@ -110,6 +113,8 @@
" df_for_cluster = all_trips_df if cluster_unlabeled else labeled_trips_df\n",
"\n",
" df_for_cluster = add_loc_clusters(df_for_cluster,\n",
" ct_entry,\n",
" clustering_way='destination',\n",
" radii=radii,\n",
" alg=alg,\n",
" loc_type=loc_type,\n",
9 changes: 5 additions & 4 deletions TRB_label_assist/classification_performance.ipynb
@@ -19,15 +19,14 @@
"import pandas as pd\n",
"import numpy as np\n",
"from uuid import UUID\n",
"\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# import logging\n",
"# logging.basicConfig(level=logging.DEBUG)\n",
"\n",
"import emission.storage.timeseries.abstract_timeseries as esta\n",
"import emission.storage.decorations.trip_queries as esdtq\n",
"\n",
"import emission.analysis.modelling.trip_model.run_model as eamtr\n",
"from performance_eval import get_clf_metrics, cv_for_all_algs, PREDICTORS"
]
},
@@ -49,10 +48,11 @@
"labeled_trip_df_map = {}\n",
"expanded_labeled_trip_df_map = {}\n",
"expanded_all_trip_df_map = {}\n",
"ct_entry={}\n",
"for u in all_users:\n",
" ts = esta.TimeSeries.get_time_series(u)\n",
" ct_df = ts.get_data_df(\"analysis/confirmed_trip\")\n",
"\n",
" ct_entry[u]=eamtr._get_training_data(u,None)\n",
" ct_df = ts.to_data_df(\"analysis/confirmed_trip\",ct_entry[u])\n",
" confirmed_trip_df_map[u] = ct_df\n",
" labeled_trip_df_map[u] = esdtq.filter_labeled_trips(ct_df)\n",
" expanded_labeled_trip_df_map[u] = esdtq.expand_userinputs(\n",
@@ -132,6 +132,7 @@
"# load in all runs\n",
"model_names = list(PREDICTORS.keys())\n",
"cv_results = cv_for_all_algs(\n",
" ct_entry,\n",
" uuid_list=all_users,\n",
" expanded_trip_df_map=expanded_labeled_trip_df_map,\n",
" model_names=model_names,\n",
12 changes: 8 additions & 4 deletions TRB_label_assist/cluster_performance.ipynb
@@ -15,11 +15,10 @@
"source": [
"%load_ext autoreload\n",
"%autoreload 2\n",
"\n",
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"from matplotlib.gridspec import GridSpec\n",
"\n",
"import emission.analysis.modelling.trip_model.run_model as eamtr\n",
"import emission.storage.timeseries.abstract_timeseries as esta\n",
"import emission.storage.decorations.trip_queries as esdtq\n",
"import performance_eval\n",
@@ -45,10 +44,11 @@
"labeled_trip_df_map = {}\n",
"expanded_labeled_trip_df_map = {}\n",
"expanded_all_trip_df_map = {}\n",
"ct_entry={}\n",
"for u in all_users:\n",
" ts = esta.TimeSeries.get_time_series(u)\n",
" ct_df = ts.get_data_df(\"analysis/confirmed_trip\")\n",
"\n",
" ct_entry[u]=eamtr._get_training_data(u,None) \n",
" ct_df = ts.to_data_df(\"analysis/confirmed_trip\",ct_entry[u]) \n",
" confirmed_trip_df_map[u] = ct_df\n",
" labeled_trip_df_map[u] = esdtq.filter_labeled_trips(ct_df)\n",
" expanded_labeled_trip_df_map[u] = esdtq.expand_userinputs(\n",
@@ -87,6 +87,8 @@
"\n",
" all_results_df = performance_eval.run_eval_cluster_metrics(\n",
" expanded_labeled_trip_df_map,\n",
" ct_entry,\n",
" clustering_way='destination',\n",
" user_list=all_users,\n",
" radii=radii,\n",
" loc_type='end',\n",
@@ -265,6 +267,8 @@
"\n",
"SVM_results_df = performance_eval.run_eval_cluster_metrics(\n",
" expanded_labeled_trip_df_map,\n",
" ct_entry,\n",
" clustering_way=\"destination\",\n",
" user_list=all_users,\n",
" radii=radii,\n",
" loc_type='end',\n",
21 changes: 17 additions & 4 deletions TRB_label_assist/clustering_examples.ipynb
@@ -26,12 +26,11 @@
"%autoreload 2\n",
"\n",
"from uuid import UUID\n",
"\n",
"import emission.storage.timeseries.abstract_timeseries as esta\n",
"import emission.storage.decorations.trip_queries as esdtq\n",
"import emission.core.get_database as edb\n",
"\n",
"import mapping"
"import emission.analysis.modelling.trip_model.run_model as eamtr\n",
"import mapping\n"
]
},
{
@@ -60,9 +59,11 @@
"confirmed_trip_df_map = {}\n",
"labeled_trip_df_map = {}\n",
"expanded_trip_df_map = {}\n",
"ct_entry={}\n",
"for u in uuids:\n",
" ts = esta.TimeSeries.get_time_series(u)\n",
" ct_df = ts.get_data_df(\"analysis/confirmed_trip\")\n",
" ct_entry[u]=eamtr._get_training_data(u,None) \n",
" ct_df = ts.to_data_df(\"analysis/confirmed_trip\",ct_entry[u]) \n",
" confirmed_trip_df_map[u] = ct_df\n",
" labeled_trip_df_map[u] = esdtq.filter_labeled_trips(ct_df)\n",
" expanded_trip_df_map[u] = esdtq.expand_userinputs(labeled_trip_df_map[u])"
@@ -83,8 +84,10 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[suburban_uuid],\n",
" ct_entry[suburban_uuid],\n",
" alg='naive',\n",
" loc_type='end',\n",
" clustering_way=\"destination\",\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150])\n",
@@ -98,8 +101,10 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[college_campus_uuid],\n",
" ct_entry[college_campus_uuid],\n",
" alg='naive',\n",
" loc_type='end',\n",
" clustering_way=\"destination\",\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150])\n",
@@ -121,9 +126,11 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[suburban_uuid],\n",
" ct_entry[suburban_uuid],\n",
" alg='DBSCAN',\n",
" SVM=False,\n",
" loc_type='end',\n",
" clustering_way=\"destination\",\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150, 200])\n",
@@ -137,9 +144,11 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[college_campus_uuid],\n",
" ct_entry[college_campus_uuid],\n",
" alg='DBSCAN',\n",
" SVM=False,\n",
" loc_type='end',\n",
" clustering_way=\"destination\",\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150, 200])\n",
@@ -161,9 +170,11 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[suburban_uuid],\n",
" ct_entry[suburban_uuid],\n",
" alg='DBSCAN',\n",
" SVM=True,\n",
" loc_type='end',\n",
" clustering_way=\"destination\",\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150, 200])\n",
@@ -177,9 +188,11 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[college_campus_uuid],\n",
" ct_entry[college_campus_uuid],\n",
" alg='DBSCAN',\n",
" SVM=True,\n",
" loc_type='end',\n",
" clustering_way=\"destination\",\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150, 200])\n",
21 changes: 17 additions & 4 deletions TRB_label_assist/generate_figs_for_poster.ipynb
@@ -29,15 +29,14 @@
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import matplotlib\n",
"\n",
"from sklearn.pipeline import make_pipeline\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn import svm\n",
"\n",
"import emission.storage.timeseries.abstract_timeseries as esta\n",
"import emission.storage.decorations.trip_queries as esdtq\n",
"import emission.core.get_database as edb\n",
"\n",
"import emission.analysis.modelling.trip_model.run_model as eamtr\n",
"import mapping\n",
"import data_wrangling\n",
"from clustering import add_loc_clusters"
@@ -67,9 +66,11 @@
"confirmed_trip_df_map = {}\n",
"labeled_trip_df_map = {}\n",
"expanded_trip_df_map = {}\n",
"ct_entry={}\n",
"for u in uuids:\n",
" ts = esta.TimeSeries.get_time_series(u)\n",
" ct_df = ts.get_data_df(\"analysis/confirmed_trip\")\n",
" ct_entry[u]=eamtr._get_training_data(u,None) \n",
" ct_df = ts.to_data_df(\"analysis/confirmed_trip\",ct_entry[u]) \n",
" confirmed_trip_df_map[u] = ct_df\n",
" labeled_trip_df_map[u] = esdtq.filter_labeled_trips(ct_df)\n",
" expanded_trip_df_map[u] = esdtq.expand_userinputs(labeled_trip_df_map[u])"
@@ -98,8 +99,10 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[user1_uuid],\n",
" ct_entry[user1_uuid],\n",
" alg='naive',\n",
" loc_type='end',\n",
" clustering_way='destination',\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[50, 100, 150])\n",
@@ -137,9 +140,11 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[user2_uuid],\n",
" ct_entry[user2_uuid],\n",
" alg='DBSCAN',\n",
" SVM=False,\n",
" loc_type='end',\n",
" clustering_way='destination',\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[150])\n",
@@ -161,9 +166,11 @@
"outputs": [],
"source": [
"fig = mapping.find_plot_clusters(expanded_trip_df_map[user2_uuid],\n",
" ct_entry[user2_uuid],\n",
" alg='DBSCAN',\n",
" SVM=True,\n",
" loc_type='end',\n",
" clustering_way='destination',\n",
" plot_unlabeled=False,\n",
" cluster_unlabeled=False,\n",
" radii=[150])\n",
@@ -289,8 +296,14 @@
"\n",
" labeled_trips_df = all_trips_df.loc[all_trips_df.user_input != {}]\n",
" df_for_cluster = all_trips_df if cluster_unlabeled else labeled_trips_df\n",
"\n",
" if loc_type=='start':\n",
" clustering_way='origin'\n",
" else:\n",
" clustering_way='destination'\n",
" \n",
" df_for_cluster = add_loc_clusters(df_for_cluster,\n",
" ct_entry,\n",
" clustering_way=clustering_way,\n",
" radii=radii,\n",
" alg=alg,\n",
" loc_type=loc_type,\n",
8 changes: 5 additions & 3 deletions TRB_label_assist/get_performance_for_poster.ipynb
@@ -25,6 +25,7 @@
"\n",
"import emission.storage.timeseries.abstract_timeseries as esta\n",
"import emission.storage.decorations.trip_queries as esdtq\n",
"import emission.analysis.modelling.trip_model.run_model as eamtr\n",
"\n",
"from performance_eval import get_clf_metrics, cv_for_all_algs, PREDICTORS"
]
@@ -48,10 +49,11 @@
"labeled_trip_df_map = {}\n",
"expanded_labeled_trip_df_map = {}\n",
"expanded_all_trip_df_map = {}\n",
"ct_entry={}\n",
"for u in all_users:\n",
" ts = esta.TimeSeries.get_time_series(u)\n",
" ct_df = ts.get_data_df(\"analysis/confirmed_trip\")\n",
"\n",
" ct_entry[u]=eamtr._get_training_data(u,None) \n",
" ct_df = ts.to_data_df(\"analysis/confirmed_trip\",ct_entry[u]) \n",
" confirmed_trip_df_map[u] = ct_df\n",
" labeled_trip_df_map[u] = esdtq.filter_labeled_trips(ct_df)\n",
" expanded_labeled_trip_df_map[u] = esdtq.expand_userinputs(\n",
@@ -113,7 +115,7 @@
" 'random forests (O-D, destination clusters)',\n",
" 'random forests (coordinates)'\n",
"]\n",
"cv_results = cv_for_all_algs(\n",
"cv_results = cv_for_all_algs(ct_entry,\n",
" uuid_list=all_users,\n",
" expanded_trip_df_map=expanded_labeled_trip_df_map,\n",
" model_names=model_names,\n",