Wrong typecasting in records #766

chrisiacovella · 2023-10-05T21:28:40Z

Describe the bug

As I mentioned in the meeting the other day, I came across what I think are a few bugs in the records of the following SPICE single point datasets on the ML server. They seem to specifically affect "spec_6" data, for the following properties:

current energy <class 'str'>
dispersion correction energy <class 'str'>
2-body dispersion correction energy <class 'str'>
b3lyp-d3(bj) dispersion correction energy <class 'str'>

For this dataset, it appears those 4 properties all store the same energy (and it is identical to 'return_energy', which is properly typed as a float). I'll note that the lists of values (e.g., the fields related to gradients) are correctly constructed of floats.

The following datasets have this issue for spec_6:

SPICE Solvated Amino Acids Single Points Dataset v1.0 spec_6
SPICE DES Monomers Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.0 spec_6
SPICE Dipeptides Single Points Dataset v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.1 spec_6
SPICE DES Monomers Single Points Dataset v1.1 spec_6
SPICE Dipeptides Single Points Dataset v1.1 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.0 spec_6
SPICE Solvated Amino Acids Single Points Dataset v1.1 spec_6
SPICE DES370K Single Points Dataset v1.0 spec_6
SPICE PubChem Set 1 Single Points Dataset v1.2 spec_6
SPICE Dipeptides Single Points Dataset v1.2 spec_6
SPICE DES370K Single Points Dataset Supplement v1.0 spec_6
SPICE PubChem Set 2 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 3 Single Points Dataset v1.2 spec_6
SPICE Pubchem Set 4 Single Points Dataset v1.2 spec_6
SPICE PubChem Set 5 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.0 spec_6
SPICE PubChem Set 6 Single Points Dataset v1.2 spec_6
SPICE Ion Pairs Single Points Dataset v1.1 spec_6

To Reproduce

Just a quick script to loop over all the SPICE single point datasets.

from qcportal import PortalClient

client = PortalClient("ml.qcarchive.molssi.org")
dataset_type = "singlepoint"

datasets = client.list_datasets()

# Collect the names of all SPICE single point datasets.
datasets_to_consider = []
for dataset in datasets:
    if dataset['dataset_type'] == dataset_type:
        if 'SPICE' in dataset['dataset_name']:
            datasets_to_consider.append(dataset['dataset_name'])

spec = 'spec_6'
for dataset_name in datasets_to_consider:
    ds = client.get_dataset(dataset_type=dataset_type, dataset_name=dataset_name)

    entry_names = ds.entry_names

    # Only check the first entry of each dataset.
    max_val = 1

    # Each item is an (entry name, specification name, record) tuple.
    for record in ds.iterate_records(entry_names[0:max_val], specification_names=[spec]):
        properties = record[2].dict()['properties']
        has_strings = False
        for k in properties.keys():
            if isinstance(properties[k], str):
                has_strings = True
                #print(k, type(properties[k]))
        if has_strings:
            print(f'{dataset_name} {spec}')
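
The type check itself can be tried without touching the server. Below is a minimal sketch that pulls the check into a small helper and runs it on a plain dict. The helper name find_string_properties is hypothetical (it is not part of qcportal), and the values in the example dict are made up for illustration, not copied from any record.

```python
from typing import Any, Dict, List


def find_string_properties(properties: Dict[str, Any]) -> List[str]:
    """Return the (sorted) keys whose values are stored as str."""
    return sorted(k for k, v in properties.items() if isinstance(v, str))


# Made-up values for illustration only; not taken from an actual record.
example = {
    "return_energy": -1.5,
    "current energy": "-1.5",
    "dispersion correction energy": "-1.5",
    "current gradient": [0.0, 0.1, -0.1],
}

for key in find_string_properties(example):
    print(key, type(example[key]))
```

In the loop above, the same helper could presumably be applied to record[2].dict()['properties'] to list the affected keys instead of only flagging the dataset.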

bennybp · 2023-10-06T14:49:08Z

This seems to apply only to DFTD3 calculations, where the values are converted to strings: https://github.com/MolSSI/QCEngine/blob/1b27a14255817f13092ae846593b0fb7c975625b/qcengine/programs/dftd3.py#L273C41-L273C41

@loriab is looking to clean that up in qcengine soon. I can convert the existing values in the database next week.

(The DFTD3 calculations come from specifying b3lyp-d3 calculations. In the legacy version, this caused two separate records/specifications to be created - one for b3lyp and one for the d3 correction. The new version makes these existing records explicit, but no longer does the splitting for new calculations. It's a bit complicated...)
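
Until the stored values are converted, a client-side workaround might look like the sketch below. It assumes the properties have already been pulled into a plain dict (e.g. via record.dict()['properties'] as in the script above); the function name coerce_numeric_strings is hypothetical and not part of qcportal or QCEngine. Strings that do not parse as floats are left untouched, and the input dict is not modified.

```python
from typing import Any, Dict


def coerce_numeric_strings(properties: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `properties` with numeric strings converted to float."""
    fixed: Dict[str, Any] = {}
    for key, value in properties.items():
        if isinstance(value, str):
            try:
                fixed[key] = float(value)
            except ValueError:
                # Not a number (e.g. a label); keep the original string.
                fixed[key] = value
        else:
            fixed[key] = value
    return fixed


# Made-up values for illustration only.
cleaned = coerce_numeric_strings({"current energy": "-1.5", "return_energy": -1.5})
print(type(cleaned["current energy"]))
```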
