
ValueError in Loading dataset... #22

Open
hxu105 opened this issue May 11, 2023 · 5 comments
hxu105 commented May 11, 2023

Hello,

Thank you for sharing this fantastic work. I have run into some issues reproducing it: as the error below shows, the dataset is not set up correctly. I followed your instructions to download the dataset archive crossdocked_pocket10.tar.gz and the split file split_by_name.pt from https://drive.google.com/drive/folders/1CzwxmTpjbrt83z_wBzcQncq84OVDPurM, and then extracted the TAR archive.

(screenshot of the error traceback)

Could you help fix this issue? Any suggestions would be appreciated.

HX

@pengxingang
Owner

Kind of strange to see this error message. It seems the error was raised when executing `train_iterator = inf_itertor(DataLoader(...))`. But the output `Indexing: 0it` before the error message indicates that the program was running and failed in the `_precompute_name2id` function (line 90 of `utils/datasets/pl.py`), which is only called when the training/validation dataset is initialized for the first time (line 27 of `utils/datasets/pl.py`). Did you make any modifications to the related code? Besides, I suggest removing the files `xxx_processed.lmdb` and `xxx_name2id.pt` (if they exist) and rerunning the script.
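The cache-removal step above can be sketched as follows. This is a hypothetical helper, not code from the repo: the function name `clear_dataset_cache` and the `./data` directory are assumptions, and the actual cache file prefix depends on your dataset path.

```python
import shutil
from pathlib import Path

def clear_dataset_cache(data_dir):
    """Delete cached preprocessing artifacts so the dataset is rebuilt from scratch."""
    removed = []
    for pattern in ("*_processed.lmdb", "*_name2id.pt"):
        for cache in Path(data_dir).glob(pattern):
            # LMDB stores can be a single file or a directory depending on settings
            if cache.is_dir():
                shutil.rmtree(cache)
            else:
                cache.unlink()
            removed.append(cache.name)
    return removed

# Example: clear_dataset_cache("./data"), then rerun the training script
```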

@hxu105
Author

hxu105 commented May 12, 2023

Thank you for the response. I want to mention that the processing stage for setting up the data actually skips a lot of instances: the screenshot shows it skipped 183468 instances out of 184057. I suspect there may be errors hidden by the try-except statement. By the way, I just downloaded the repo and kept all the code unchanged. I also tried removing the lmdb file and the pt file, but I get the same error.

@pengxingang
Owner

That might be where the problem is: the raw molecule data is not being processed properly, and it is abnormal to skip so many instances. You can find the actual errors by debugging the processing code. It is also possible that some packages are not installed properly.
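One way to surface the swallowed errors is to temporarily log the exception inside the processing loop's try-except. A minimal sketch of the idea, where `parse_instance` and the loop structure are stand-ins rather than the repo's actual code:

```python
import traceback

def parse_instance(raw):
    # Stand-in for the real per-instance parsing, which may raise on bad input
    if raw is None:
        raise ValueError("unparseable instance")
    return raw

def process_all(instances):
    processed, num_skipped = [], 0
    for raw in instances:
        try:
            processed.append(parse_instance(raw))
        except Exception:
            num_skipped += 1
            traceback.print_exc()  # temporarily print the real error while debugging
    return processed, num_skipped

data, skipped = process_all([1, None, 3])
print(f"kept {len(data)}, skipped {skipped}")  # kept 2, skipped 1
```

With the `print_exc()` line in place, the first skipped instance shows the underlying exception instead of being counted silently.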

@Octopus125

Hi, I think we have the same problem. In the process of data preprocessing, most of the data was skipped because of this error:

FutureWarning: In the future 'np.long' will be defined as the corresponding NumPy scalar.
'element': np.array(self.element, dtype=np.long)

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same version as the author. After that I was able to run the data preprocessing normally.
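For context on why pinning works: `np.long` was a deprecated alias that NumPy removed in 1.24, so on newer versions `np.array(..., dtype=np.long)` raises an error that the preprocessing try-except silently swallows. If you would rather not downgrade, replacing `np.long` with `np.int64` should behave the same on older versions too. A hedged sketch, not the repo's actual code:

```python
import numpy as np

element = [6, 7, 8]  # example atomic numbers
# np.int64 matches what np.long produced on older NumPy on most platforms
# (np.long was an alias of Python int, which NumPy maps to a 64-bit dtype there)
arr = np.array(element, dtype=np.int64)
print(arr.dtype)  # int64
```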

@Yuning598

Hi, I think we have the same problem. In the process of data preprocessing, most of the data was skipped because of this error:

FutureWarning: In the future 'np.long' will be defined as the corresponding NumPy scalar.
'element': np.array(self.element, dtype=np.long)

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same version as the author. After that I was able to run the data preprocessing normally.

Yes, it works! Thanks!
