Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to support end-to-end run with laiss-resspect classifier #90

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

drewoldag
Copy link
Collaborator

@drewoldag drewoldag commented Dec 5, 2024

This is a relatively small change, but an important one in order to retrieve data used by the LAISS classifier.

This is a result of the work that @rknop has done, and summarized in this snippet from Slack: "You need to pass the parameter include_hostinfo (set to 1, or, really, something true) to get this additional information."

However, it is also indicative of a broader concern, that each external feature extractor may need to provide a definition of a specific TOM query that is necessary for that specific feature extractor.

@drewoldag drewoldag self-assigned this Dec 5, 2024
Copy link

github-actions bot commented Dec 5, 2024

Before [cd86501] After [26cc825] Ratio Benchmark (Parameter)
136±3ms 137±5ms 1.01 benchmarks.time_feature_creation
193M 193M 1 benchmarks.peakmem_learn_loop('KNN')
196M 196M 1 benchmarks.peakmem_learn_loop('RandomForest')
3.16±0.01s 3.17±0.01s 1 benchmarks.time_learn_loop('RandomForest', 'RandomSampling')
3.19±0.02s 3.15±0.02s 0.99 benchmarks.time_learn_loop('RandomForest', 'UncSampling')
169±1ms 165±3ms 0.98 benchmarks.time_learn_loop('KNN', 'RandomSampling')
176±2ms 165±1ms 0.94 benchmarks.time_learn_loop('KNN', 'UncSampling')

Click here to view all benchmarks.

@drewoldag drewoldag changed the title Updating the TOM query to include data for laiss classifier. WIP - DO NOT MERGE - Updating the TOM query to include data for laiss classifier. Dec 5, 2024
@drewoldag drewoldag changed the title WIP - DO NOT MERGE - Updating the TOM query to include data for laiss classifier. WIP - DO NOT MERGE - Updates to support end-to-end run with laiss-resspect classifier Dec 5, 2024
… - need to dynamically retrieve the canonical id column name.
…s commit, replacing with dynamic column names.
@drewoldag drewoldag changed the title WIP - DO NOT MERGE - Updates to support end-to-end run with laiss-resspect classifier Updates to support end-to-end run with laiss-resspect classifier Dec 6, 2024
@@ -391,6 +391,7 @@ def request_TOM_data(url: str = "https://desc-tom-2.lbl.gov", username: str = No
dic['mjd_now'] = mjdnow
if cheat_gentypes is not None:
dic['cheat_gentypes'] = cheat_gentypes
dic['include_hostinfo'] = True
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AmandaWasserman I wanted to check with you to make sure this is ok to keep here. I assume that this will just pull down more data from the TOM and that it wouldn't hurt anything, but if you can confirm that, I would appreciate it.

drewoldag and others added 3 commits December 11, 2024 15:14
* Initial commit to implement a "certainty sampler".

* Fix spelling error.

* Updating comments and logging messages.

* Add test for certainty sampling.
…RY` instead of the hardcoded `VALID_STRATEGY` list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant