Filtered out rows with dictionary in user_label_df
The idea is to check the data type with isinstance() and apply that check to the whole data frame in one pass, instead of iterating over it row by row, which is much slower.

The dict rows are then filtered out of the original data frame, leaving only the non-dict rows.
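
A minimal sketch of the filtering described above, run on made-up data. The column `mode_confirm`, the sample values, and the names `is_dict` and `filtered_df` are assumptions for illustration only; the commit itself touches just `user_label_df` and its `trip_user_input` column.

```python
import pandas as pd

# Hypothetical sample: two multilabel rows and one survey response stored as a dict.
user_label_df = pd.DataFrame({
    "mode_confirm": ["bike", "walk", "car"],
    "trip_user_input": [None, {"survey": "response"}, None],
})

# Build a boolean mask over the whole column: True where the value is a dict.
is_dict = user_label_df["trip_user_input"].apply(lambda x: isinstance(x, dict))

# Keep only the rows whose 'trip_user_input' is not a dict.
filtered_df = user_label_df.loc[~is_dict]

print(filtered_df)
```

Building the mask once and indexing with `.loc` avoids an explicit Python loop over the rows, which is the approach the commit message describes.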
Mahadik, Mukul Chandrakant authored and Mahadik, Mukul Chandrakant committed Jan 26, 2024
1 parent 54659fb commit a9c8257
Showing 1 changed file with 3 additions and 0 deletions.
@@ -295,6 +295,9 @@ def _generate_predictions(self):
# compute unique label sets and their probabilities in one cluster
# 'p' refers to probability
group_cols = user_label_df.columns.tolist()
# Filtering out rows from the user_label_df if they are dictionary objects which come from the survey inputs provided by the users instead of multilabels
if 'trip_user_input' in group_cols:
user_label_df = user_label_df.loc[user_label_df['trip_user_input'].apply(lambda x: not isinstance(x, dict))]
unique_labels = user_label_df.groupby(group_cols).size().reset_index(name='uniqcount')
unique_labels['p'] = unique_labels.uniqcount / sum_trips
labels_columns = user_label_df.columns.to_list()