You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I encountered an assert issue when training RMI on very small (100 items) datasets for testing purposes:
thread '<unnamed>' panicked at 'start index was 100 but end index was 100', [...]/RMI/rmi_lib/src/train/two_layer.rs:27:5
Upon closer investigation, I think I have found an off-by-one error in the train_two_layer function implementation. I did not take a look at the context beyond the function, therefore take what I am about to say with a grain of salt:
An obvious fix would be to change the condition in [3.] to split_idx >= md_container.len() - 1, however I am not entirely certain whether that leads to issues in other contexts. I guess a similar issue would happen if similar_idx == 0, only for the first call. I changed the condition in my local version and re-ran the tests - it seems to work just fine:
Running test cache_fix_osm
Test cache_fix_osm finished.
Running test cache_fix_wiki
Test cache_fix_wiki finished.
Running test max_size_wiki
Test max_size_wiki finished.
Running test radix_model_wiki
Test radix_model_wiki finished.
Running test simple_model_osm
Test simple_model_osm finished.
Running test simple_model_wiki
Test simple_model_wiki finished.
============== TEST RESULTS ===============
python3 report.py
PASS cache_fix_osm
PASS cache_fix_wiki
PASS max_size_wiki
PASS radix_model_wiki
PASS simple_model_osm
PASS simple_model_wiki
I can open a pull request with that fix if you would like.
The text was updated successfully, but these errors were encountered:
Hi,
I encountered an assert issue when training
RMI
on very small (100 items) datasets for testing purposes:Upon closer investigation, I think I have found an off-by-one error in the
train_two_layer
function implementation. I did not take a look at the context beyond the function, therefore take what I am about to say with a grain of salt:Link to source file
split_idx
is calculated here:RMI/rmi_lib/src/train/two_layer.rs
Line 132 in 5fdff45
split_idx
should be in the interval[0, md_container.len())
RMI/rmi_lib/src/train/two_layer.rs
Line 139 in 5fdff45
Now lets look at the case where
split_idx == md_container.len() - 1
, which is valid per [2.]:split_idx < md_container.len()
RMI/rmi_lib/src/train/two_layer.rs
Line 147 in 5fdff45
split_idx + 1
(== md_container.len()
) is passed tobuild_models_from
asstart_idx
Link 4.1
Link 4.2
md_container.len() > md_container.len()
is falseLink 5
An obvious fix would be to change the condition in [3.] to
split_idx >= md_container.len() - 1
, however I am not entirely certain whether that leads to issues in other contexts. I guess a similar issue would happen ifsimilar_idx == 0
, only for the first call. I changed the condition in my local version and re-ran the tests - it seems to work just fine:I can open a pull request with that fix if you would like.
The text was updated successfully, but these errors were encountered: