
test_5lm_feature_experiment fails #117

Open
Teuling opened this issue Dec 17, 2024 · 4 comments
Labels: bug (Something isn't working), triaged (This issue or pull request was triaged)

Comments

Teuling commented Dec 17, 2024

Describe the bug

tests/unit/graph_learning_test.py:1714: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/tbp/monty/frameworks/experiments/monty_experiment.py:522: in train
    self.run_epoch()
src/tbp/monty/frameworks/experiments/monty_experiment.py:486: in run_epoch
    self.run_episode()
src/tbp/monty/frameworks/experiments/object_recognition_experiments.py:41: in run_episode
    last_step = self.run_episode_steps()
src/tbp/monty/frameworks/experiments/object_recognition_experiments.py:106: in run_episode_steps
    self.model.step(observation)
src/tbp/monty/frameworks/models/monty_base.py:141: in step
    self._matching_step(observation)
src/tbp/monty/frameworks/models/abstract_monty_classes.py:25: in _matching_step
    self._vote()
src/tbp/monty/frameworks/models/graph_matching.py:428: in _vote
    self.send_vote_to_lm(self.learning_modules[i], i, combined_votes)
src/tbp/monty/frameworks/models/graph_matching.py:82: in send_vote_to_lm
    lm.receive_votes(combined_votes[lm_id])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tbp.monty.frameworks.models.feature_location_matching.FeatureGraphLM object at 0x7ab2462f7730>
vote_data = {'neg_object_id_votes': {'new_object0': 0}, 'pos_location_votes': {'new_object0': array([[ 1.01622774e-02,  1.49800395...tion.Rotation object at 0x7ab23f4cbe40>, <scipy.spatial.transform._rotation.Rotation object at 0x7ab23f4ed030>, ...]]}}

    def receive_votes(self, vote_data):
        """Use votes to remove objects and poses from possible matches.
    
        NOTE: Add object back into possible matches if majority of other modules
                think it is correct? Could help with dealing with noise but may
                also prevent LMs from narrowing down quickly. Since we are not
                working with this LM anymore, we probably wont add that.
    
        Args:
            vote_data: positive and negative votes on object IDs + positive
                votes for locations and rotations on the object.
        """
        if (vote_data is not None) and (
            self.buffer.get_num_observations_on_object() > 0
        ):
            current_possible_matches = self.get_possible_matches()
            for possible_obj in current_possible_matches:
                if (
>                   vote_data["neg_object_id_votes"][possible_obj]
                    > vote_data["pos_object_id_votes"][possible_obj]
                ):
E               KeyError: 'new_object1'

Ubuntu 24.04.1 LTS.
(The rest of the tests run fine.)
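
For context, the lookup that fails can be reproduced with plain dictionaries: receive_votes iterates the receiving LM's own possible matches but indexes directly into vote dictionaries keyed by whatever objects the sending LMs have learned, so any object the sender never learned raises a KeyError. A minimal standalone sketch of that pattern (illustrative values only, not tbp.monty code):

    # Minimal sketch of the failing lookup (illustrative only, not tbp.monty code).
    # The receiving LM still considers 'new_object1' a possible match, but the
    # combined vote dicts only contain keys for objects the sending LMs learned.
    vote_data = {
        "neg_object_id_votes": {"new_object0": 0},
        "pos_object_id_votes": {"new_object0": 2},
    }
    possible_matches = ["new_object0", "new_object1"]  # receiving LM's own matches

    for possible_obj in possible_matches:
        try:
            # Direct indexing mirrors receive_votes; 'new_object1' is missing here.
            if (vote_data["neg_object_id_votes"][possible_obj]
                    > vote_data["pos_object_id_votes"][possible_obj]):
                print(f"would remove {possible_obj} from possible matches")
        except KeyError as err:
            print(f"KeyError: {err}")  # reproduces the failure seen in the test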

Teuling added the bug label Dec 17, 2024
tristanls added the triaged label Dec 17, 2024
tristanls (Contributor) commented
Hi @Teuling, thank you for sharing the test failure. Since my own test runs don't encounter this issue, it is difficult for me to identify what's happening.

Have you identified the series of events that created the new_object1 key in your test run?

FYI @nielsleadholm, @vkakerbeck, maybe you've come across this circumstance before?

Teuling (Author) commented Dec 18, 2024

I only followed these instructions: https://thousandbrainsproject.readme.io/docs/getting-started
I made no changes to the code myself whatsoever.

Maybe this gives a clue: https://gist.github.com/Teuling/b71599c6a6cc6874f04135450e643348#file-gistfile1-txt

My packages:
name: tbp.monty
channels:

  • pyg
  • aihabitat
  • pytorch
  • conda-forge
  • defaults
  • https://repo.anaconda.com/pkgs/main
  • https://repo.anaconda.com/pkgs/r
dependencies:
  • _libgcc_mutex=0.1=main
  • _openmp_mutex=5.1=1_gnu
  • attrs=24.2.0=py38h06a4308_0
  • blas=1.0=mkl
  • brotli-python=1.0.9=py38h6a678d5_8
  • bzip2=1.0.8=h5eee18b_6
  • c-ares=1.19.1=h5eee18b_0
  • ca-certificates=2024.11.26=h06a4308_0
  • certifi=2024.8.30=py38h06a4308_0
  • charset-normalizer=3.3.2=pyhd3eb1b0_0
  • cmake=3.26.4=h96355d8_0
  • cudatoolkit=11.5.1=h59c8dcf_10
  • cycler=0.11.0=pyhd3eb1b0_0
  • expat=2.6.4=h6a678d5_0
  • ffmpeg=4.3=hf484d3e_0
  • freetype=2.12.1=h4a9f257_0
  • gitdb=4.0.7=pyhd3eb1b0_0
  • gitpython=3.1.43=py38h06a4308_0
  • gmp=6.2.1=h295c915_3
  • gnutls=3.6.15=he1e5248_0
  • habitat-sim-mutex=1.0=display_bullet
  • idna=3.7=py38h06a4308_0
  • imageio=2.33.1=py38h06a4308_0
  • imageio-ffmpeg=0.5.1=pyhd8ed1ab_0
  • importlib-metadata=8.5.0=pyha770c72_0
  • importlib_metadata=8.5.0=hd3eb1b0_0
  • intel-openmp=2023.1.0=hdb19cb5_46306
  • jinja2=3.1.4=py38h06a4308_0
  • joblib=1.4.2=py38h06a4308_0
  • jpeg=9e=h5eee18b_3
  • kiwisolver=1.4.4=py38h6a678d5_0
  • krb5=1.20.1=h143b758_1
  • lame=3.100=h7b6447c_0
  • lcms2=2.12=h3be6417_0
  • ld_impl_linux-64=2.40=h12ee557_0
  • lerc=3.0=h295c915_0
  • libcurl=8.9.1=h251f7ec_0
  • libdeflate=1.17=h5eee18b_1
  • libedit=3.1.20230828=h5eee18b_0
  • libev=4.33=h7f8727e_1
  • libffi=3.4.4=h6a678d5_1
  • libgcc-ng=11.2.0=h1234567_1
  • libgfortran-ng=11.2.0=h00389a5_1
  • libgfortran5=11.2.0=h1234567_1
  • libgomp=11.2.0=h1234567_1
  • libiconv=1.16=h5eee18b_3
  • libidn2=2.3.4=h5eee18b_0
  • libllvm14=14.0.6=hecde1de_4
  • libnghttp2=1.57.0=h2d74bed_0
  • libpng=1.6.39=h5eee18b_0
  • libssh2=1.11.1=h251f7ec_0
  • libstdcxx-ng=11.2.0=h1234567_1
  • libtasn1=4.19.0=h5eee18b_0
  • libtiff=4.5.1=h6a678d5_0
  • libunistring=0.9.10=h27cfd23_0
  • libuv=1.48.0=h5eee18b_0
  • libwebp-base=1.3.2=h5eee18b_1
  • libxcb=1.13=h1bed415_1
  • llvmlite=0.41.0=py38he621ea3_0
  • lz4-c=1.9.4=h6a678d5_1
  • markupsafe=2.1.3=py38h5eee18b_0
  • matplotlib=3.3.2=0
  • matplotlib-base=3.3.2=py38h817c723_0
  • mkl=2023.1.0=h213fc3f_46344
  • mkl-service=2.4.0=py38h5eee18b_1
  • mkl_fft=1.3.8=py38h5eee18b_0
  • mkl_random=1.2.4=py38hdb19cb5_0
  • ncurses=6.4=h6a678d5_0
  • nettle=3.7.3=hbbd107a_1
  • numba=0.58.1=py38h6a678d5_0
  • numpy=1.24.3=py38hf6e8229_1
  • numpy-base=1.24.3=py38h060ed82_1
  • openh264=2.1.1=h4ff587b_0
  • openjpeg=2.5.2=he7f1fd0_0
  • openssl=3.0.15=h5eee18b_0
  • packaging=24.1=py38h06a4308_0
  • pillow=10.4.0=py38h5eee18b_0
  • pip=24.2=py38h06a4308_0
  • platformdirs=3.10.0=py38h06a4308_0
  • pooch=1.7.0=py38h06a4308_0
  • pyg=2.1.0=py38_torch_1.11.0_cu115
  • pyparsing=3.1.2=py38h06a4308_0
  • pysocks=1.7.1=py38h06a4308_0
  • python=3.8.20=he870216_0
  • python-dateutil=2.9.0post0=py38h06a4308_2
  • python_abi=3.8=2_cp38
  • pytorch=1.11.0=py3.8_cuda11.5_cudnn8.3.2_0
  • pytorch-cluster=1.6.0=py38_torch_1.11.0_cu115
  • pytorch-mutex=1.0=cuda
  • pytorch-scatter=2.0.9=py38_torch_1.11.0_cu115
  • pytorch-sparse=0.6.15=py38_torch_1.11.0_cu115
  • quaternion=2023.0.3=py38h7f0c24c_0
  • readline=8.2=h5eee18b_0
  • requests=2.32.3=py38h06a4308_0
  • rhash=1.4.3=hdbd6064_0
  • scipy=1.10.1=py38hf6e8229_1
  • setuptools=75.1.0=py38h06a4308_0
  • six=1.16.0=pyhd3eb1b0_1
  • smmap=3.0.5=pyhd3eb1b0_0
  • sqlite=3.45.3=h5eee18b_0
  • tbb=2021.8.0=hdb19cb5_0
  • threadpoolctl=3.5.0=py38h2f386ee_0
  • tk=8.6.14=h39e8969_0
  • torchvision=0.12.0=py38_cu115
  • tornado=6.4.1=py38h5eee18b_0
  • tqdm=4.66.5=py38h2f386ee_0
  • urllib3=2.2.3=py38h06a4308_0
  • wget=1.24.5=h251f7ec_0
  • wheel=0.44.0=py38h06a4308_0
  • withbullet=2.0=0
  • xorg-fixesproto=5.0=h7f98852_1002
  • xorg-inputproto=2.3.2=h7f98852_1002
  • xorg-kbproto=1.0.7=h7f98852_1002
  • xorg-libx11=1.7.2=h7f98852_0
  • xorg-libxau=1.0.9=h7f98852_0
  • xorg-libxcursor=1.2.0=h7f98852_0
  • xorg-libxext=1.3.4=h7f98852_1
  • xorg-libxfixes=5.0.3=h7f98852_1004
  • xorg-libxi=1.7.10=h7f98852_0
  • xorg-libxinerama=1.1.4=h9c3ff4c_1001
  • xorg-libxrandr=1.5.2=h7f98852_1
  • xorg-libxrender=0.9.10=h7f98852_1003
  • xorg-randrproto=1.5.0=h7f98852_1001
  • xorg-renderproto=0.11.1=h7f98852_1002
  • xorg-xextproto=7.3.0=h7f98852_1002
  • xorg-xproto=7.0.31=h27cfd23_1007
  • xz=5.4.6=h5eee18b_1
  • zipp=3.20.2=py38h06a4308_0
  • zlib=1.2.13=h5eee18b_1
  • zstd=1.5.6=hc292b87_0
  • pip:
    • annotated-types==0.7.0
    • click==8.1.7
    • coverage==7.6.1
    • deptry==0.20.0
    • docker-pycreds==0.4.0
    • eval-type-backport==0.2.0
    • execnet==2.1.1
    • habitat-sim==0.2.2
    • importlib-resources==6.4.5
    • iniconfig==2.0.0
    • lazy-loader==0.4
    • mpmath==1.3.0
    • mypy==1.11.2
    • mypy-extensions==1.0.0
    • networkx==3.1
    • pandas==2.0.3
    • pluggy==1.5.0
    • protobuf==5.29.1
    • psutil==6.1.0
    • py==1.11.0
    • pydantic==2.10.3
    • pydantic-core==2.27.1
    • pytest==7.1.1
    • pytest-cov==3.0.0
    • pytest-forked==1.6.0
    • pytest-xdist==2.5.0
    • pytz==2024.2
    • pywavelets==1.4.1
    • pyyaml==6.0.2
    • ruff==0.7.1
    • scikit-image==0.21.0
    • scikit-learn==1.3.2
    • sentry-sdk==2.19.2
    • setproctitle==1.3.4
    • sympy==1.13.3
    • tbp-monty==0.0.0
    • tifffile==2023.7.10
    • tomli==2.2.1
    • typing-extensions==4.12.2
    • tzdata==2024.2
    • wandb==0.19.1

nielsleadholm (Contributor) commented
@tristanls I can't say I have, sorry. Given the isolated nature of it, some mismatch in package versions seems like a possible culprit. Otherwise, @vkakerbeck might have come across something similar when implementing these tests?

vkakerbeck (Contributor) commented
I haven't had much time to look into the detailed logs, but from the error it looks like an LM is expecting an incoming vote on one object (new_object1) but isn't getting one. This could happen if the receiving LM has created a model for that object while the sending LM hasn't. The most likely reason this would suddenly show up in this unit test is that the simulator on your machine returns different values than it does when we run it, and in your setup this leads to the receiving LM learning an object model that the sender didn't learn.
This test is for the feature matching LM (not the more recent evidence LM), so I don't have the details fresh in my mind; I would have to dig a bit deeper to figure out why exactly this is happening. But the simulator returning (maybe just slightly) different observations is where I would start.
Unfortunately, some of these unit tests are not very robust. Writing more actual unit tests instead of these end-to-end tests is something we should do eventually to get to the root of this problem.
Honestly, in your case, if all the other tests pass, I would probably just skip this one and chalk it up to numerical imprecision and the fickleness of this test.
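
If skipping the test locally is the pragmatic choice, deselecting it on the command line should work, for example pytest tests/unit/graph_learning_test.py -k "not test_5lm_feature_experiment" (this relies only on pytest's standard -k expression filtering). For anyone who would rather keep the test running, a more tolerant lookup in the vote handling would at least avoid the crash when the sending and receiving LMs disagree on which objects they have learned. A minimal sketch of that idea, with assumed names and defaults, offered as a local workaround rather than a committed fix:

    # Workaround sketch (illustrative only): treat a missing object key as zero
    # votes so an object unknown to the sending LMs does not raise a KeyError.
    def tally_votes(vote_data, possible_obj):
        neg = vote_data["neg_object_id_votes"].get(possible_obj, 0)
        pos = vote_data["pos_object_id_votes"].get(possible_obj, 0)
        return neg, pos

    combined = {
        "neg_object_id_votes": {"new_object0": 0},
        "pos_object_id_votes": {"new_object0": 2},
    }
    print(tally_votes(combined, "new_object1"))  # (0, 0)

With defaults of zero, an object the sender never learned gets as many positive as negative votes, so the receiving LM neither crashes nor removes it from its possible matches on that vote alone.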
