How to create a Label that indicates the Future Type (or state) of something #214
Thanks for the question! Would a row-based window size be a good modeling approach? The row-based window size can get you the current purchase and the next purchase. Then, you can compare the times for labeling. I'll go through an example using this data.
import composeml as cp
import pandas as pd
df = pd.read_csv(
    'data.csv',
    parse_dates=['transaction_time'],
    index_col='transaction_id',
)
df
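This labeling function will get a data slice with two rows -- the current purchase and the next purchase. It also has a
def next_purchase(df, within):
    # A slice with fewer than two rows has no next purchase.
    if len(df) < 2:
        return False
    within = pd.Timedelta(within)
    # The slice is indexed by transaction time, so this is the time between the two purchases.
    next_time = df.index[1] - df.index[0]
    return within >= next_time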
lm = cp.LabelMaker(
    target_entity='customer_id',
    time_index='transaction_time',
    labeling_function=next_purchase,
    window_size=2,  # two rows to get current and next purchase
)
When running the search, the gap is set to one so that each data slice starts on the next purchase.
lt = lm.search(
    df=df.sort_values('transaction_time'),
    num_examples_per_instance=-1,
    gap=1,  # one row to start on next purchase
    within='3 days',
    verbose=False,
)
lt
Let me know if this approach can work. |
Interesting approach. Thanks for sharing! Questions: Why can you use |
The data frame slices that are given to the labeling function do have the time index set as the index. During the search, the label maker sets the time index as the data frame index. Does a row-based window size work for your use case? |
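To make the point about the index concrete, here is a minimal pandas-only sketch. It assumes made-up transactions for one customer and imitates two-row slices with a plain loop; it does not call Compose, and Compose's actual slicing may differ in details such as whether a final one-row slice is emitted.

```python
import pandas as pd

def next_purchase(df, within):
    # A slice with fewer than two rows has no next purchase.
    if len(df) < 2:
        return False
    within = pd.Timedelta(within)
    # The index holds transaction times, so subtraction yields a Timedelta.
    next_time = df.index[1] - df.index[0]
    return within >= next_time

# Hypothetical transactions for a single customer (made-up values).
transactions = pd.DataFrame(
    {
        "transaction_time": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-10"]
        ),
        "amount": [10.0, 20.0, 30.0],
    }
).set_index("transaction_time")

# Rough imitation of window_size=2 with gap=1: two-row slices starting at each row.
labels = [
    bool(next_purchase(transactions.iloc[i:i + 2], within="3 days"))
    for i in range(len(transactions))
]
```

Because the time index is the data frame index inside each slice, `df.index[1] - df.index[0]` is the time between the two purchases.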
May I ask how you would extend your approach to situations where there is only one product type to consider? To stick with the above example: assume we are only interested in Department==Computer transactions. The row-based approach will take two neighboring rows, while what is actually needed is a check of whether a Computer transaction happens at any time within the specified time window. Would be interested to hear your thoughts on this. |
@S-UP thanks for the question! In that case, I think it'd make sense to isolate the computer department before generating labels. We can group by the department and select computers.
computers = df.groupby('department').get_group('computers')
lt = lm.search(
    df=computers.sort_values('transaction_time'),
    num_examples_per_instance=-1,
    gap=1,  # one row to start on next purchase
    within='3 days',
    verbose=False,
) |
Thanks. I realize I should have been more explicit. I still want to create labels per Transaction ID or, potentially, Transaction Date (i.e. aggregating all transactions into a single transaction date). So, if a customer purchases from Garden and does not purchase from Computer within the specified window, then there shall be a Next Purchase == False flag for the Garden transaction. |
Ah okay, in that case, you can use the
def next_purchase(df):
    department = df.iloc[0].department
    return df.department.eq(department).sum() > 1
lm = cp.LabelMaker(
    target_entity='customer_id',
    time_index='transaction_time',
    labeling_function=next_purchase,
    window_size='3d',  # time window
)
lt = lm.search(
    df=df.sort_values('transaction_time'),
    num_examples_per_instance=-1,
    gap=1,  # one to iterate over each transaction
    verbose=False,
)
If you only want labels for a single department, you can also make it a parameter to the labeling function.
def next_purchase(df, department):
    return df.department.eq(department).sum() > 1
lt = lm.search(
    ...,
    department='computers',
) |
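One detail to watch with the parameterized version: when the first row of the window is a Garden purchase and exactly one Computer purchase follows, `sum() > 1` is False because only one row matches. If the intent is "does any later transaction in the window belong to the given department", a variant that ignores the first row may fit better. The sketch below is an assumption, not code from this thread; the function name `next_purchase_in` and the sample rows are hypothetical.

```python
import pandas as pd

def next_purchase_in(df, department):
    # Skip the first row (the current transaction) and check the rest of the window.
    return bool(df["department"].iloc[1:].eq(department).any())

# Hypothetical three-day window for one customer (made-up values).
window = pd.DataFrame(
    {"department": ["garden", "computers", "garden"]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
)

# Garden purchase followed by a computers purchase inside the window.
garden_then_computers = next_purchase_in(window, "computers")
# Garden purchases only: no computers purchase follows.
garden_only = next_purchase_in(window.iloc[[0, 2]], "computers")
```

It should be usable as the `labeling_function` in the same way as the functions above, with `department` passed through the search call.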
I wonder about the best approach to create a Label that generates forward-looking classes.
Example: A customer might purchase 12 times (on different dates).
I want to assign a label that says he/she will do a next purchase (within X months) after a given event was observed. Thus, after having observed the first transaction, will this customer come back and register another transaction? If so, he/she should receive a label 'will purchase again'. Else 'will NOT purchase again'.
From what I've seen Compose always constructs labels using all events up until (but excluding) another event (for which the label is then set). So I wonder how to generate a label for the last transaction observed in the above example. The 12th transaction is the last recorded and thus we would label a 'will NOT purchase again' here as we know the customer will not transact again.
The overall goal is to identify customers who are most likely to re-engage. Maybe there is also a more suitable modeling approach to this.
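The rule described above (a transaction is positive when the same customer's next transaction falls within the window, and the last recorded transaction is always negative) can be sketched in plain pandas. This is only an illustration under assumed, made-up data. It does not use Compose, and the 3-day window and the `will_purchase_again` column name are placeholders.

```python
import pandas as pd

# Hypothetical transactions (made-up values).
df = pd.DataFrame(
    {
        "customer_id": [1, 1, 1, 2],
        "transaction_time": pd.to_datetime(
            ["2021-01-01", "2021-01-02", "2021-01-10", "2021-01-05"]
        ),
    }
).sort_values(["customer_id", "transaction_time"])

# Time of each customer's following transaction; NaT for their last one.
next_time = df.groupby("customer_id")["transaction_time"].shift(-1)
time_to_next = next_time - df["transaction_time"]

# Comparisons against NaT evaluate to False, so a customer's last
# transaction is labeled False.
df["will_purchase_again"] = time_to_next <= pd.Timedelta("3 days")
```

A table like this can serve as a cross-check for labels produced by a label maker on the same data.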