Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Categorical trust regions #865

Merged
merged 68 commits into from
Aug 27, 2024
Merged

Categorical trust regions #865

merged 68 commits into from
Aug 27, 2024

Conversation

uri-granta
Copy link
Collaborator

@uri-granta uri-granta commented Aug 7, 2024

Related issue(s)/PRs:

Summary

Follows on from #864.

Fully backwards compatible: yes / no

PR checklist

  • The quality checks are all passing
  • The bug case / new feature is covered by tests
  • Any new features are well-documented (in docstrings or notebooks)

Base automatically changed from uri/experiment_with_encoded_models to develop August 21, 2024 07:30
@uri-granta uri-granta marked this pull request as ready for review August 21, 2024 07:35
@@ -1624,7 +1626,13 @@ def __init__(
self._y_min = tf.constant(np.inf, dtype=self.location.dtype)

def _init_eps(self) -> None:
self.eps = self._zeta * (self.global_search_space.upper - self.global_search_space.lower)
if not isinstance(self.global_search_space, HasOneHotEncoder):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an isinstance check in this file to detect categorical spaces makes me a bit uncomfortable, as it forces developers to have to inherit from HasOneHotEncoder, and it feels too much of a special case. I think our original design decision was to use a property like is_categorical to do this. I don't have a strong objection, but just wanted to highlight that. I think we already discussed this before, but can't remeber the conclusion.

Copy link
Collaborator Author

@uri-granta uri-granta Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remind me: are there any categorical spaces (where we would wish to use Hamming distances) that are also numerically bounded? Because looking at it now, it seems much more natural to write

Suggested change
if not isinstance(self.global_search_space, HasOneHotEncoder):
if self.global_search_space.has_bounds:

Certainly we shouldn't calculate eps for unbounded spaces.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. We won't have categorical spaces that have bounds. However, we could potentially have the reverse, i.e. spaces that are not bounded and are also not categorical.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we can't use eps as written for unbounded spaces, even if they're not categorical, as it uses the bounds. Is there any reason not to default to Hamming distance for those cases, at least for now?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems fine to me. @vpicheny what do you think?

Comment on lines 2279 to 2285
# use Hamming distance for categorical spaces
return tf.math.reduce_sum(
tf.where(tf.expand_dims(points, -2) == tf.expand_dims(points, -3), 0, 1),
axis=-1,
keepdims=True, # (keep last dim for distance calculation below)
) # [num_points, num_points, 1]
else:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we should add more of an explanation here, as the categorical and numerical cases are slightly inconsistent.

The size of the last dimension for the numerical case is D, i.e. the distance in each dimension is calculated separately. Each dimesion is then separately tested against distance in _get_points_within_distance and it selects neighbors if all dimensions are within distance (i.e. reduce_all below).

For the categorical case the last dimension is 1 as we do a reduce_sum. I can see why that is, as we want to effectively do a reduce_any in _get_points_within_distance, i.e. the neighbors are selected if they are within distance in any dimension.

So we can add an explanation, or alternatively do the selection below and explicitly add a reduce_any.

tests/unit/acquisition/test_rule.py Outdated Show resolved Hide resolved
@uri-granta uri-granta merged commit 2c725f9 into develop Aug 27, 2024
12 checks passed
@uri-granta uri-granta deleted the uri/categorical_trust_regions branch August 27, 2024 10:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants