Review/generic loss questions #60

Closed
34 changes: 27 additions & 7 deletions cyclic_boosting/generic_loss.py
@@ -65,31 +65,51 @@ def calc_parameters(
float, float
estimated parameters and their uncertainties
"""
sorting = feature.lex_binned_data.argsort()
sorted_bins = feature.lex_binned_data[sorting]
bins, split_indices = np.unique(sorted_bins, return_index=True)
split_indices = split_indices[1:]
# ! TODO: [Q] Why these operations?
Collaborator: The general idea is to do an independent optimization in each bin of the feature at hand. And the outcome of each of these optimizations is the factor (or summand for additive CB modes) and its uncertainty.

# I probably do not understand the CB algo, high level explainer would be great
Collaborator: The difference from the other (older) CB modes is that we do the optimizations here numerically, by explicitly minimizing a loss, rather than analytically. But in principle, it is the same thing.
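
For intuition, a minimal sketch of what "numerically minimizing a loss" in a single bin could look like; the function name fit_one_bin, the squared loss, and the curvature-based uncertainty are illustrative assumptions, not the library's actual optimization routine (that is self.optimization in generic_loss.py):

import numpy as np
from scipy.optimize import minimize_scalar

def fit_one_bin(y, prev_pred, weights):
    """Hypothetical per-bin fit: factor minimizing a weighted squared loss."""
    def loss(f):
        return np.sum(weights * (y - f * prev_pred) ** 2)

    res = minimize_scalar(loss)
    # rough uncertainty from the curvature of the loss around the minimum (illustration only)
    eps = 1e-4
    curvature = (loss(res.x + eps) - 2.0 * loss(res.x) + loss(res.x - eps)) / eps ** 2
    return res.x, 1.0 / np.sqrt(max(curvature, 1e-12))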

sorting = feature.lex_binned_data.argsort() # 1. get element index row-wise ordered from smallest to greatest
Collaborator: correct

sorted_bins = feature.lex_binned_data[sorting] # 2. return the bins sorted from smallest to greatest
Collaborator: correct

# do not quite understand how this works
# as my example with a3=np.random.rand(3,10) and a3[a3.argsort()] was returning an IndexError
Collaborator: lex_binned_data is just a vector.
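
For illustration of why the 2-D experiment fails while the 1-D case used here works:

import numpy as np

v = np.array([3, 1, 2])          # 1-D, like feature.lex_binned_data
print(v[v.argsort()])            # [1 2 3]

a3 = np.random.rand(3, 10)
# a3[a3.argsort()] raises an IndexError: argsort works along the last axis here,
# so the index array contains values up to 9, which are out of bounds for axis 0 (size 3)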

bins, split_indices = np.unique(
    sorted_bins, return_index=True
)  # 3. return only the unique values for each bin ordered
Collaborator: This returns the unique values (only needed for the special case of empty bins in multi-dimensional features) and their indices. The latter are needed to split all the target and prediction values into the different bins.

split_indices = split_indices[1:]  # 5. drop the leading zero index
Collaborator: We are looking for bin ranges.
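
A toy example (with made-up bin values) of why the leading index is dropped: np.split expects the boundaries between chunks, and keeping the 0 would only add an empty chunk at the front:

import numpy as np

sorted_bins = np.array([0, 0, 1, 1, 1, 3, 3])
bins, split_indices = np.unique(sorted_bins, return_index=True)  # bins=[0 1 3], split_indices=[0 2 5]
np.split(sorted_bins, split_indices[1:])  # [array([0, 0]), array([1, 1, 1]), array([3, 3])]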


y_pred = np.hstack((y[..., np.newaxis], self.unlink_func(pred.predict_link())[..., np.newaxis]))
# 6. joining the values of the target variable with those of the predictions
Collaborator: correct

y_pred = np.hstack((y_pred, self.weights[..., np.newaxis]))
# 7. joining the previous matrix with the weights (of each input variable?)
Collaborator: correct

y_pred_bins = np.split(y_pred[sorting], split_indices)
# 8. sort the predictions according to the bins (of the input variable?) and split this into bins
Collaborator: Yes, split the target and prediction values into the bins of the feature considered here. This is done to perform independent optimizations in the following step.
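
A self-contained toy version of the stacking and splitting above (all values made up), showing one (n_i, 3) block of (y, y_hat, weights) rows per non-empty bin:

import numpy as np

binned = np.array([1, 0, 3, 1, 0, 1, 3])   # toy bin index per sample
y = np.arange(7.0)                         # toy target
y_hat = np.ones(7)                         # toy prior prediction
w = np.ones(7)                             # toy weights

sorting = binned.argsort()
bins, split_indices = np.unique(binned[sorting], return_index=True)
y_pred = np.column_stack((y, y_hat, w))                     # same stacking as the hstack calls above
y_pred_bins = np.split(y_pred[sorting], split_indices[1:])  # one block per non-empty bin: sizes 2, 3, 2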


# keep potential empty bins in multi-dimensional features
all_bins = range(max(feature.lex_binned_data) + 1)
empty_bins = list(set(bins) ^ set(all_bins))
empty_bins = set(bins) ^ set(all_bins)
# 9. returns the elements which are either in set(bins)
# or set(all_bins).
Collaborator: This is an exclusive or. The idea is to find empty bins: these are in all_bins but not in bins. Empty bins can occur in multi-dimensional features (which are mapped to a one-dimensional structure before this function).
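
A quick illustration of the symmetric difference; since bins is always a subset of all_bins here, it amounts to the set difference (toy bin numbers):

bins = {0, 1, 3}              # bins that actually contain data
all_bins = set(range(5))      # 0 .. max(lex_binned_data)
empty_bins = bins ^ all_bins  # {2, 4}, i.e. the bins without any data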

# ! TODO: list can be removed as only iterator is used below
Collaborator: fair enough

# why does this return the empty bins though?
Collaborator: see comment above about multi-dimensional features

# because all_bins is a superset of the values in sorted_bins, this is tantamount to finding
# the values which are not in bins; bins holds all the unique values
# check, for example, a5 = np.array([[i*j + 1 for i in range(0, 3)] for j in range(0, 3)])
# bins, split_indices = np.unique(a5, return_index=True)
Collaborator: The point is, you do not find empty bins in lex_binned_data. So there can be multi-dimensional bins which we would miss here. But we have to include them in order not to mess up the multi-dimensional binning structure.

for i in empty_bins:
y_pred_bins.insert(i, np.zeros((0, 3)))
y_pred_bins.insert(i, np.zeros((0, 3))) # ! TODO: [Q] Is the (0,3) format due to (y, y_hat, weights)?
Collaborator: yes
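
To illustrate: the placeholder has zero rows but keeps the three columns, so the [:, 0] / [:, 1] / [:, 2] indexing in the optimization loop below still works and the bin positions stay aligned (toy blocks):

import numpy as np

y_pred_bins = [np.ones((2, 3)), np.ones((4, 3))]  # toy non-empty bins
y_pred_bins.insert(1, np.zeros((0, 3)))           # empty bin 1: zero rows of (y, y_hat, weights)
y_pred_bins[1][:, 0]                              # array([], dtype=float64) -- still valid, just empty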


n_bins = len(y_pred_bins)
parameters = np.zeros(n_bins)
uncertainties = np.zeros(n_bins)

# 10. Try to minimize a loss function given y, y_pred and the weights?
Collaborator: correct

Collaborator: But do this independently for each bin of the feature at hand.

for bin in range(n_bins):
    parameters[bin], uncertainties[bin] = self.optimization(
        y_pred_bins[bin][:, 0], y_pred_bins[bin][:, 1], y_pred_bins[bin][:, 2]
    )
# ! TODO: What parameters are being returned?
Collaborator: The parameters returned are the factors (or summands for additive CB modes) of the different bins and their uncertainties (needed for the smoothing).
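
As a hedged illustration of how such per-bin factors would eventually act on a prediction in a multiplicative mode (the real update, including the smoothing of the factors, happens elsewhere in the library):

import numpy as np

factors = np.array([0.8, 1.1, 1.3])   # toy fitted factors, one per bin of this feature
binned = np.array([0, 2, 1, 1, 0])    # bin index of each sample
prior = np.full(5, 10.0)              # prior prediction
updated = prior * factors[binned]     # multiplicative per-sample update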


neutral_factor = self.unlink_func(np.array(self.neutral_factor_link))
# 11. if there is one more bin corresponding to the neutral factor, then add it to the parameters
Collaborator: correct

if n_bins + 1 == feature.n_bins:
    parameters = np.append(parameters, neutral_factor)
    uncertainties = np.append(uncertainties, 0)
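
For intuition on the neutral factor, assuming a log link for a multiplicative mode (so unlink is exp and neutral_factor_link is 0); this is an assumption for illustration, not taken from the diff:

import numpy as np

neutral_factor_link = 0.0                     # neutral element in link space (assumed log link)
neutral_factor = np.exp(neutral_factor_link)  # 1.0, i.e. "leave the prediction unchanged"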
@@ -404,7 +424,7 @@ def quantile_global_scale(
weights: np.ndarray,
prior_prediction_column: Union[str, int, None],
link_func,
) -> None:
) -> Tuple:
Collaborator: right

"""
Calculation of the global scale for quantile regression, corresponding
to the (continuous approximation of the) respective quantile of the