
ValueError in Loading dataset... #22

Open
hxu105 opened this issue May 11, 2023 · 5 comments
hxu105 commented May 11, 2023

Hello,

Thank you for sharing this fantastic work. I have run into some issues reproducing it: as the error below shows, the dataset is not set up correctly. I followed your instructions to download the dataset archive crossdocked_pocket10.tar.gz and the split file split_by_name.pt from https://drive.google.com/drive/folders/1CzwxmTpjbrt83z_wBzcQncq84OVDPurM, and then extracted the TAR archive.

(screenshot of the error traceback)

Could you help fix this issue? Any suggestions would be appreciated.

HX

@pengxingang
Owner

Kind of strange to see this error message. It seems the error was raised when executing `train_iterator = inf_itertor(DataLoader(...))`. But the output `Indexing: 0it` before the error message indicates that the program was running and failed in the `_precompute_name2id` function (line 90 of `utils/datasets/pl.py`), which is only called when the training/validation dataset is initialized for the first time (line 27 of `utils/datasets/pl.py`). Did you make any modifications to the related code? Besides, I suggest removing the files `xxx_processed.lmdb` and `xxx_name2id.pt` (if they exist) and rerunning the script.
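The cache-removal step above can be sketched as follows. This is a hypothetical helper, not code from the repo: the function name `clear_dataset_cache` and the `./data` directory are assumptions, and the actual cache file prefix depends on your dataset path.

```python
import shutil
from pathlib import Path

def clear_dataset_cache(data_dir):
    """Delete cached preprocessing artifacts so the dataset is rebuilt from scratch."""
    removed = []
    for pattern in ("*_processed.lmdb", "*_name2id.pt"):
        for cache in Path(data_dir).glob(pattern):
            # LMDB stores can be a single file or a directory depending on settings
            if cache.is_dir():
                shutil.rmtree(cache)
            else:
                cache.unlink()
            removed.append(cache.name)
    return removed

# Example: clear_dataset_cache("./data"), then rerun the training script
```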

@hxu105
Author

hxu105 commented May 12, 2023

Thank you for the response. I want to mention that the processing stage for setting up the data actually skips a lot of instances: the screenshot shows it skipped 183468 instances out of 184057. I suspect there may be errors hidden by the try-except statement. By the way, I just downloaded the repo and kept all the code unchanged. I also tried removing the lmdb file and the pt file, but I get the same error.

@pengxingang
Owner

That might be where the problem is: the raw molecule data is not being processed properly, and it is abnormal to skip so many instances. You can find the actual errors by debugging the processing code. It is also possible that some packages are not installed properly.
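One way to surface the swallowed errors is to temporarily log the exception inside the processing loop's try-except. A minimal sketch of the idea, where `parse_instance` and the loop structure are stand-ins rather than the repo's actual code:

```python
import traceback

def parse_instance(raw):
    # Stand-in for the real per-instance parsing, which may raise on bad input
    if raw is None:
        raise ValueError("unparseable instance")
    return raw

def process_all(instances):
    processed, num_skipped = [], 0
    for raw in instances:
        try:
            processed.append(parse_instance(raw))
        except Exception:
            num_skipped += 1
            traceback.print_exc()  # temporarily print the real error while debugging
    return processed, num_skipped

data, skipped = process_all([1, None, 3])
print(f"kept {len(data)}, skipped {skipped}")  # kept 2, skipped 1
```

With the `print_exc()` line in place, the first skipped instance shows the underlying exception instead of being counted silently.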

@Octopus125

Hi, I think we have the same problem. In the process of data preprocessing, most of the data was skipped because of this error:

FutureWarning: In the future 'np.long' will be defined as the corresponding NumPy scalar.
'element': np.array(self.element, dtype=np.long)

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same version as the author. After that I was able to run the data preprocessing normally.
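For context on why pinning works: `np.long` was a deprecated alias that NumPy removed in 1.24, so on newer versions `np.array(..., dtype=np.long)` raises an error that the preprocessing try-except silently swallows. If you would rather not downgrade, replacing `np.long` with `np.int64` should behave the same on older versions too. A hedged sketch, not the repo's actual code:

```python
import numpy as np

element = [6, 7, 8]  # example atomic numbers
# np.int64 matches what np.long produced on older NumPy on most platforms
# (np.long was an alias of Python int, which NumPy maps to a 64-bit dtype there)
arr = np.array(element, dtype=np.int64)
print(arr.dtype)  # int64
```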

@Yuning598

Hi, I think we have the same problem. In the process of data preprocessing, most of the data was skipped because of this error:

FutureWarning: In the future 'np.long' will be defined as the corresponding NumPy scalar.
'element': np.array(self.element, dtype=np.long)

This error is caused by the numpy version. I uninstalled numpy and installed version 1.22.3, the same version as the author. After that I was able to run the data preprocessing normally.

Yes, it works! Thanks!
