
test_5lm_feature_experiment fails #117

Open
Teuling opened this issue Dec 17, 2024 · 4 comments
Labels: bug (Something isn't working), triaged (This issue or pull request was triaged)

Comments

Teuling commented Dec 17, 2024

Describe the bug

tests/unit/graph_learning_test.py:1714: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/tbp/monty/frameworks/experiments/monty_experiment.py:522: in train
    self.run_epoch()
src/tbp/monty/frameworks/experiments/monty_experiment.py:486: in run_epoch
    self.run_episode()
src/tbp/monty/frameworks/experiments/object_recognition_experiments.py:41: in run_episode
    last_step = self.run_episode_steps()
src/tbp/monty/frameworks/experiments/object_recognition_experiments.py:106: in run_episode_steps
    self.model.step(observation)
src/tbp/monty/frameworks/models/monty_base.py:141: in step
    self._matching_step(observation)
src/tbp/monty/frameworks/models/abstract_monty_classes.py:25: in _matching_step
    self._vote()
src/tbp/monty/frameworks/models/graph_matching.py:428: in _vote
    self.send_vote_to_lm(self.learning_modules[i], i, combined_votes)
src/tbp/monty/frameworks/models/graph_matching.py:82: in send_vote_to_lm
    lm.receive_votes(combined_votes[lm_id])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <tbp.monty.frameworks.models.feature_location_matching.FeatureGraphLM object at 0x7ab2462f7730>
vote_data = {'neg_object_id_votes': {'new_object0': 0}, 'pos_location_votes': {'new_object0': array([[ 1.01622774e-02,  1.49800395...tion.Rotation object at 0x7ab23f4cbe40>, <scipy.spatial.transform._rotation.Rotation object at 0x7ab23f4ed030>, ...]]}}

    def receive_votes(self, vote_data):
        """Use votes to remove objects and poses from possible matches.
    
        NOTE: Add object back into possible matches if majority of other modules
                think it is correct? Could help with dealing with noise but may
                also prevent LMs from narrowing down quickly. Since we are not
                working with this LM anymore, we probably wont add that.
    
        Args:
            vote_data: positive and negative votes on object IDs + positive
                votes for locations and rotations on the object.
        """
        if (vote_data is not None) and (
            self.buffer.get_num_observations_on_object() > 0
        ):
            current_possible_matches = self.get_possible_matches()
            for possible_obj in current_possible_matches:
                if (
>                   vote_data["neg_object_id_votes"][possible_obj]
                    > vote_data["pos_object_id_votes"][possible_obj]
                ):
E               KeyError: 'new_object1'

Ubuntu 24.04.1 LTS.
(The rest of the tests run fine.)
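
For context, the lookup that fails can be reproduced with plain dictionaries: receive_votes iterates the receiving LM's own possible matches but indexes directly into vote dictionaries keyed by whatever objects the sending LMs have learned, so any object the sender never learned raises a KeyError. A minimal standalone sketch of that pattern (illustrative values only, not tbp.monty code):

    # Minimal sketch of the failing lookup (illustrative only, not tbp.monty code).
    # The receiving LM still considers 'new_object1' a possible match, but the
    # combined vote dicts only contain keys for objects the sending LMs learned.
    vote_data = {
        "neg_object_id_votes": {"new_object0": 0},
        "pos_object_id_votes": {"new_object0": 2},
    }
    possible_matches = ["new_object0", "new_object1"]  # receiving LM's own matches

    for possible_obj in possible_matches:
        try:
            # Direct indexing mirrors receive_votes; 'new_object1' is missing here.
            if (vote_data["neg_object_id_votes"][possible_obj]
                    > vote_data["pos_object_id_votes"][possible_obj]):
                print(f"would remove {possible_obj} from possible matches")
        except KeyError as err:
            print(f"KeyError: {err}")  # reproduces the failure seen in the test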

Teuling added the bug label Dec 17, 2024
tristanls added the triaged label Dec 17, 2024
tristanls (Contributor) commented
Hi @Teuling, thank you for sharing the test failure. Since my own test runs don't encounter this issue, it is difficult for me to identify what's happening.

Have you identified the series of events that created the new_object1 key in your test run?

FYI @nielsleadholm, @vkakerbeck, maybe you've come across this circumstance before?

Teuling (Author) commented Dec 18, 2024

I only followed these instructions: https://thousandbrainsproject.readme.io/docs/getting-started
I made no changes to the code myself whatsoever.

Maybe this gives a clue: https://gist.github.com/Teuling/b71599c6a6cc6874f04135450e643348#file-gistfile1-txt

My packages:
name: tbp.monty
channels:

  • pyg
  • aihabitat
  • pytorch
  • conda-forge
  • defaults
  • https://repo.anaconda.com/pkgs/main
  • https://repo.anaconda.com/pkgs/r
dependencies:
  • _libgcc_mutex=0.1=main
  • _openmp_mutex=5.1=1_gnu
  • attrs=24.2.0=py38h06a4308_0
  • blas=1.0=mkl
  • brotli-python=1.0.9=py38h6a678d5_8
  • bzip2=1.0.8=h5eee18b_6
  • c-ares=1.19.1=h5eee18b_0
  • ca-certificates=2024.11.26=h06a4308_0
  • certifi=2024.8.30=py38h06a4308_0
  • charset-normalizer=3.3.2=pyhd3eb1b0_0
  • cmake=3.26.4=h96355d8_0
  • cudatoolkit=11.5.1=h59c8dcf_10
  • cycler=0.11.0=pyhd3eb1b0_0
  • expat=2.6.4=h6a678d5_0
  • ffmpeg=4.3=hf484d3e_0
  • freetype=2.12.1=h4a9f257_0
  • gitdb=4.0.7=pyhd3eb1b0_0
  • gitpython=3.1.43=py38h06a4308_0
  • gmp=6.2.1=h295c915_3
  • gnutls=3.6.15=he1e5248_0
  • habitat-sim-mutex=1.0=display_bullet
  • idna=3.7=py38h06a4308_0
  • imageio=2.33.1=py38h06a4308_0
  • imageio-ffmpeg=0.5.1=pyhd8ed1ab_0
  • importlib-metadata=8.5.0=pyha770c72_0
  • importlib_metadata=8.5.0=hd3eb1b0_0
  • intel-openmp=2023.1.0=hdb19cb5_46306
  • jinja2=3.1.4=py38h06a4308_0
  • joblib=1.4.2=py38h06a4308_0
  • jpeg=9e=h5eee18b_3
  • kiwisolver=1.4.4=py38h6a678d5_0
  • krb5=1.20.1=h143b758_1
  • lame=3.100=h7b6447c_0
  • lcms2=2.12=h3be6417_0
  • ld_impl_linux-64=2.40=h12ee557_0
  • lerc=3.0=h295c915_0
  • libcurl=8.9.1=h251f7ec_0
  • libdeflate=1.17=h5eee18b_1
  • libedit=3.1.20230828=h5eee18b_0
  • libev=4.33=h7f8727e_1
  • libffi=3.4.4=h6a678d5_1
  • libgcc-ng=11.2.0=h1234567_1
  • libgfortran-ng=11.2.0=h00389a5_1
  • libgfortran5=11.2.0=h1234567_1
  • libgomp=11.2.0=h1234567_1
  • libiconv=1.16=h5eee18b_3
  • libidn2=2.3.4=h5eee18b_0
  • libllvm14=14.0.6=hecde1de_4
  • libnghttp2=1.57.0=h2d74bed_0
  • libpng=1.6.39=h5eee18b_0
  • libssh2=1.11.1=h251f7ec_0
  • libstdcxx-ng=11.2.0=h1234567_1
  • libtasn1=4.19.0=h5eee18b_0
  • libtiff=4.5.1=h6a678d5_0
  • libunistring=0.9.10=h27cfd23_0
  • libuv=1.48.0=h5eee18b_0
  • libwebp-base=1.3.2=h5eee18b_1
  • libxcb=1.13=h1bed415_1
  • llvmlite=0.41.0=py38he621ea3_0
  • lz4-c=1.9.4=h6a678d5_1
  • markupsafe=2.1.3=py38h5eee18b_0
  • matplotlib=3.3.2=0
  • matplotlib-base=3.3.2=py38h817c723_0
  • mkl=2023.1.0=h213fc3f_46344
  • mkl-service=2.4.0=py38h5eee18b_1
  • mkl_fft=1.3.8=py38h5eee18b_0
  • mkl_random=1.2.4=py38hdb19cb5_0
  • ncurses=6.4=h6a678d5_0
  • nettle=3.7.3=hbbd107a_1
  • numba=0.58.1=py38h6a678d5_0
  • numpy=1.24.3=py38hf6e8229_1
  • numpy-base=1.24.3=py38h060ed82_1
  • openh264=2.1.1=h4ff587b_0
  • openjpeg=2.5.2=he7f1fd0_0
  • openssl=3.0.15=h5eee18b_0
  • packaging=24.1=py38h06a4308_0
  • pillow=10.4.0=py38h5eee18b_0
  • pip=24.2=py38h06a4308_0
  • platformdirs=3.10.0=py38h06a4308_0
  • pooch=1.7.0=py38h06a4308_0
  • pyg=2.1.0=py38_torch_1.11.0_cu115
  • pyparsing=3.1.2=py38h06a4308_0
  • pysocks=1.7.1=py38h06a4308_0
  • python=3.8.20=he870216_0
  • python-dateutil=2.9.0post0=py38h06a4308_2
  • python_abi=3.8=2_cp38
  • pytorch=1.11.0=py3.8_cuda11.5_cudnn8.3.2_0
  • pytorch-cluster=1.6.0=py38_torch_1.11.0_cu115
  • pytorch-mutex=1.0=cuda
  • pytorch-scatter=2.0.9=py38_torch_1.11.0_cu115
  • pytorch-sparse=0.6.15=py38_torch_1.11.0_cu115
  • quaternion=2023.0.3=py38h7f0c24c_0
  • readline=8.2=h5eee18b_0
  • requests=2.32.3=py38h06a4308_0
  • rhash=1.4.3=hdbd6064_0
  • scipy=1.10.1=py38hf6e8229_1
  • setuptools=75.1.0=py38h06a4308_0
  • six=1.16.0=pyhd3eb1b0_1
  • smmap=3.0.5=pyhd3eb1b0_0
  • sqlite=3.45.3=h5eee18b_0
  • tbb=2021.8.0=hdb19cb5_0
  • threadpoolctl=3.5.0=py38h2f386ee_0
  • tk=8.6.14=h39e8969_0
  • torchvision=0.12.0=py38_cu115
  • tornado=6.4.1=py38h5eee18b_0
  • tqdm=4.66.5=py38h2f386ee_0
  • urllib3=2.2.3=py38h06a4308_0
  • wget=1.24.5=h251f7ec_0
  • wheel=0.44.0=py38h06a4308_0
  • withbullet=2.0=0
  • xorg-fixesproto=5.0=h7f98852_1002
  • xorg-inputproto=2.3.2=h7f98852_1002
  • xorg-kbproto=1.0.7=h7f98852_1002
  • xorg-libx11=1.7.2=h7f98852_0
  • xorg-libxau=1.0.9=h7f98852_0
  • xorg-libxcursor=1.2.0=h7f98852_0
  • xorg-libxext=1.3.4=h7f98852_1
  • xorg-libxfixes=5.0.3=h7f98852_1004
  • xorg-libxi=1.7.10=h7f98852_0
  • xorg-libxinerama=1.1.4=h9c3ff4c_1001
  • xorg-libxrandr=1.5.2=h7f98852_1
  • xorg-libxrender=0.9.10=h7f98852_1003
  • xorg-randrproto=1.5.0=h7f98852_1001
  • xorg-renderproto=0.11.1=h7f98852_1002
  • xorg-xextproto=7.3.0=h7f98852_1002
  • xorg-xproto=7.0.31=h27cfd23_1007
  • xz=5.4.6=h5eee18b_1
  • zipp=3.20.2=py38h06a4308_0
  • zlib=1.2.13=h5eee18b_1
  • zstd=1.5.6=hc292b87_0
  • pip:
    • annotated-types==0.7.0
    • click==8.1.7
    • coverage==7.6.1
    • deptry==0.20.0
    • docker-pycreds==0.4.0
    • eval-type-backport==0.2.0
    • execnet==2.1.1
    • habitat-sim==0.2.2
    • importlib-resources==6.4.5
    • iniconfig==2.0.0
    • lazy-loader==0.4
    • mpmath==1.3.0
    • mypy==1.11.2
    • mypy-extensions==1.0.0
    • networkx==3.1
    • pandas==2.0.3
    • pluggy==1.5.0
    • protobuf==5.29.1
    • psutil==6.1.0
    • py==1.11.0
    • pydantic==2.10.3
    • pydantic-core==2.27.1
    • pytest==7.1.1
    • pytest-cov==3.0.0
    • pytest-forked==1.6.0
    • pytest-xdist==2.5.0
    • pytz==2024.2
    • pywavelets==1.4.1
    • pyyaml==6.0.2
    • ruff==0.7.1
    • scikit-image==0.21.0
    • scikit-learn==1.3.2
    • sentry-sdk==2.19.2
    • setproctitle==1.3.4
    • sympy==1.13.3
    • tbp-monty==0.0.0
    • tifffile==2023.7.10
    • tomli==2.2.1
    • typing-extensions==4.12.2
    • tzdata==2024.2
    • wandb==0.19.1

nielsleadholm (Contributor) commented
@tristanls I can't say I have, sorry. Given the isolated nature of it, some mismatch in package versions seems like a possible culprit. Otherwise, @vkakerbeck might have come across something similar when implementing these tests?

vkakerbeck (Contributor) commented
I haven't had much time to look into the detailed logs, but from the error it looks like an LM is expecting an incoming vote on one object (new_object1) but isn't getting one. This could happen if the receiving LM has created a model for that object while the sending LM hasn't. The most likely reason this would suddenly show up in this unit test is that the simulator on your machine returns different values than it does when we run it, and in your setup this leads to the receiving LM learning an object model that the sender didn't learn.
This test is for the feature matching LM (not the more recent evidence LM), so I don't have the details fresh in my mind; I would have to dig a bit deeper to figure out why exactly this is happening. But the simulator returning (maybe just slightly) different observations is where I would start.
Unfortunately, some of these unit tests are not very robust. Writing more actual unit tests instead of these end-to-end tests is something we should do eventually to get to the root of this problem.
Honestly, in your case, if all the other tests pass, I would probably just skip this one and chalk it up to numerical imprecision and the fickleness of this test.
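
If skipping the test locally is the pragmatic choice, deselecting it on the command line should work, for example pytest tests/unit/graph_learning_test.py -k "not test_5lm_feature_experiment" (this relies only on pytest's standard -k expression filtering). For anyone who would rather keep the test running, a more tolerant lookup in the vote handling would at least avoid the crash when the sending and receiving LMs disagree on which objects they have learned. A minimal sketch of that idea, with assumed names and defaults, offered as a local workaround rather than a committed fix:

    # Workaround sketch (illustrative only): treat a missing object key as zero
    # votes so an object unknown to the sending LMs does not raise a KeyError.
    def tally_votes(vote_data, possible_obj):
        neg = vote_data["neg_object_id_votes"].get(possible_obj, 0)
        pos = vote_data["pos_object_id_votes"].get(possible_obj, 0)
        return neg, pos

    combined = {
        "neg_object_id_votes": {"new_object0": 0},
        "pos_object_id_votes": {"new_object0": 2},
    }
    print(tally_votes(combined, "new_object1"))  # (0, 0)

With defaults of zero, an object the sender never learned gets as many positive as negative votes, so the receiving LM neither crashes nor removes it from its possible matches on that vote alone.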
