Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added class AttrStandardizer #15

Merged
merged 15 commits into from
Sep 13, 2024
Merged

Added class AttrStandardizer #15

merged 15 commits into from
Sep 13, 2024

Conversation

saanikat
Copy link
Member

@saanikat saanikat commented Aug 9, 2024

No description provided.

@saanikat
Copy link
Member Author

saanikat commented Aug 9, 2024

#8: Created a class AttrStandardizer, model gets instantiated only once.

# Instantiate the AttrStandardizer class with the specified schema
model = AttrStandardizer("ENCODE")

# Call standardize to get predictions
results = model.standardize(pep ="geo/gse178283:default")

attribute_standardizer/attr_standardizer_class.py

@saanikat
Copy link
Member Author

saanikat commented Aug 9, 2024

I have changed the CONFIDENCE_THREHOLD for prediction to 0.51 and user is given top 3 predictions provided all are greater than the CONFIDENCE_THRESHOLD else only the predictions greater than CONFIDENCE_THRESHOLD are returned.

@saanikat saanikat changed the title Likely solved #8 and #10, added top 3 predictions, changed CONFIDENCE THRESHOLD Added class AttrStandardizer Aug 9, 2024
Copy link
Member

@nleroy917 nleroy917 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems great. I had some small comments, but overall definitely an improvement over before. I think the class-based approach is not only more ergonomic, but also enables only loading the model once.

README.md Outdated Show resolved Hide resolved
attribute_standardizer/attr_standardizer_class.py Outdated Show resolved Hide resolved
Comment on lines 80 to 87
(
input_size_values,
input_size_values_embeddings,
input_size_headers,
hidden_size,
output_size,
dropout_prob,
) = self._get_parameters()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge deal... but this feels like it should be a dictionary, not a tuple returning a ton of params like this.

attribute_standardizer/attribute_standardizer.py Outdated Show resolved Hide resolved
import warnings


# TODO : convert to single np array before converting to tensor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

@nleroy917
Copy link
Member

@saanikat

I have changed the CONFIDENCE_THREHOLD for prediction to 0.51 and user is given top 3 predictions provided all are greater than the CONFIDENCE_THRESHOLD else only the predictions greater than CONFIDENCE_THRESHOLD are returned.

Should this be a parameter in the .standardize() function? That way the user can specify it and try to tune things...

@nleroy917 nleroy917 closed this Aug 10, 2024
@nleroy917 nleroy917 reopened this Aug 10, 2024
@khoroshevskyi khoroshevskyi self-requested a review August 27, 2024 17:05
@saanikat
Copy link
Member Author

Added BEDBASE schema as option.

Changed the calculations used for generating the embeddings using get_top_cluster_average

@khoroshevskyi khoroshevskyi changed the base branch from master to dev September 12, 2024 18:35
@khoroshevskyi khoroshevskyi mentioned this pull request Sep 12, 2024
@saanikat
Copy link
Member Author

#18 and #19 solved.

Copy link
Member

@khoroshevskyi khoroshevskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks good, just small things to be changed

Comment on lines +117 to +119
except Exception as e:
logger.error(f"Error loading the model: {str(e)}")
raise
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a little bit incorrect. First you are catching exception, and then raising exception without providing actual error.
Solution for that is:

  1. Do not use try except
  2. Do not raise exception
  3. Raise custom error (bedmess exception)

attribute_standardizer/attr_standardizer.py Outdated Show resolved Hide resolved
@saanikat
Copy link
Member Author

bedmess has been changed to bedms in the README.md in saanika branch.
The function name has been changed from show_available_schemas to get_available_schemas as requested by @khoroshevskyi

Copy link
Member

@khoroshevskyi khoroshevskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@khoroshevskyi khoroshevskyi merged commit bc78cf6 into dev Sep 13, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants