Auto-QChem Database User Guide

Auto-QChem stores molecular descriptors in a MongoDB database. A small web-based user interface is provided for extracting descriptors from the database into .xlsx files for further analysis.

1. Query available molecules

Navigate to the landing page

landing page

1.1 Query Form

a) Fields

The query form has two fields, both optional:

  • Select tags (multiple choice) - each molecule in the DB has an associated tag (or a list of tags), which marks the collections of molecules it belongs to. If you select multiple tags, molecules from every selected tag are displayed. If left blank, all molecules in the DB are queried.
  • SMARTS substructure - filters the molecules by substructure using a SMARTS query (SMILES strings are a subset of SMARTS). A quick reference to the SMARTS query language is available at https://www.daylight.com/dayhtml_tutorials/languages/smarts/index.html

b) Buttons

There are two buttons, Query and Export.

  • Query - queries the DB and displays the table of queried molecules
  • Export - downloads the displayed table as an .xlsx file; use it after running Query

Result of an example query on a single tag with 1166 molecules, filtered with a SMARTS query for anhydrides:

query result
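The tag semantics described under Fields (several tags are combined, no tags means everything) map naturally onto a MongoDB filter document. The sketch below only illustrates that idea: the field name `tag` and the helper `build_tag_filter` are hypothetical and do not come from the Auto-QChem code base, and the SMARTS filtering step is not shown.

```python
from typing import Any, Dict, Iterable


def build_tag_filter(tags: Iterable[str], tag_field: str = "tag") -> Dict[str, Any]:
    """Return a MongoDB-style filter document for the selected tags.

    `tag_field` is a hypothetical field name, not the real Auto-QChem schema.
    """
    selected = sorted(set(tags))  # drop duplicates, keep a stable order
    if not selected:
        # An empty filter document matches every document in a collection.
        return {}
    # `$in` matches documents whose field equals (or, for an array field,
    # contains) at least one of the listed values.
    return {tag_field: {"$in": selected}}


print(build_tag_filter([]))                    # {}
print(build_tag_filter(["set_b", "set_a"]))    # {'tag': {'$in': ['set_a', 'set_b']}}
```

A filter built this way could be passed to a driver call such as pymongo's `collection.find(...)`; the collection itself is left out here because its name and layout are not documented in this guide.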

1.2 Descriptors Lookup

For each entry in the table, a link to a descriptors lookup is available in the rightmost column. It displays the QChem descriptors for the given molecule. If the molecule has multiple conformations, the Boltzmann average of all descriptors is shown.

descriptors lookup

2. Descriptors extraction

Once molecules have been queried, their descriptors can be extracted into an .xlsx file by toggling the Download descriptors bar and filling in the form.

download form

2.1 Download Form

a) Fields

All fields are required

  • Descriptor Presets (multiple choice) - the following presets are available, choose as many as needed:
    • Global - molecule-level descriptors, e.g. HOMO energy, dipole moment, molecular weight.
    • Min Max Atomic - minimum and maximum of each atomic-level descriptor over the atoms within the molecule, e.g. buried volume, Mulliken charge, NMR shift.
    • Substructure Core - atomic-level descriptors for the common core of atoms within the dataset. The common core is determined using the maximum common substructure (MCS) procedure from RDKit. If a substructure has been used for filtering, the common core will include that substructure and potentially more atoms.
    • Substructure Labeled - atomic-level descriptors for labeled molecules. The labels must be consistent, i.e. each molecule must carry exactly the same labels, for example 1, 2, 3, 4. The labeled elements can differ; only the numbering scheme must be consistent.
    • Transitions - top 10 excited-state transitions, ordered by oscillator strength
  • Conformer option (single choice) - choose one of the following options:
    • Boltzmann - Boltzmann average
    • Max - lowest energy conformer (maximum weight conformer)
    • Min - highest energy conformer (minimum weight conformer)
    • Mean - arithmetic average
    • Std - standard deviation over the conformers
    • Any - randomly chosen conformer
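
The conformer options above can be sketched for a single descriptor as follows. This is an illustration under stated assumptions, not the Auto-QChem implementation: conformer energies are assumed to be in Hartree, the temperature is assumed to be 298.15 K, Std is taken as the population standard deviation, and the function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Optional, Sequence

K_B_HARTREE_PER_K = 3.166811563e-6  # Boltzmann constant in Hartree per kelvin


def boltzmann_weights(energies: Sequence[float], temperature: float = 298.15) -> List[float]:
    """Normalized Boltzmann weights; energies in Hartree (assumed unit)."""
    e_min = min(energies)  # shift by the minimum for numerical stability
    kt = K_B_HARTREE_PER_K * temperature
    factors = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def aggregate(values: Sequence[float], energies: Sequence[float], option: str,
              temperature: float = 298.15,
              rng: Optional[random.Random] = None) -> float:
    """Reduce one descriptor over conformers according to the chosen option."""
    weights = boltzmann_weights(energies, temperature)
    if option == "boltzmann":
        return sum(w * v for w, v in zip(weights, values))
    if option == "max":   # maximum weight, i.e. lowest energy conformer
        return values[weights.index(max(weights))]
    if option == "min":   # minimum weight, i.e. highest energy conformer
        return values[weights.index(min(weights))]
    if option == "mean":
        return statistics.fmean(values)
    if option == "std":   # population standard deviation (an assumption)
        return statistics.pstdev(values)
    if option == "any":
        return (rng or random).choice(values)
    raise ValueError(f"unknown conformer option: {option!r}")


# Made-up example: three conformers of one molecule, one descriptor value each.
energies = [0.000, 0.001, 0.002]
dipoles = [1.0, 2.0, 3.0]
for opt in ("boltzmann", "max", "min", "mean", "std"):
    print(opt, aggregate(dipoles, energies, opt))
```

Note that Max and Min refer to the Boltzmann weight, not to the descriptor value, so Max returns the descriptor of the lowest energy conformer.
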
b) Buttons

  • Download - downloads the descriptors to an .xlsx file. Note: when extracting descriptors for hundreds of molecules, this operation can take up to a few minutes, depending on the server load.
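
As a rough illustration of the Min Max Atomic preset listed among the descriptor presets, the sketch below reduces per-atom values to one minimum and one maximum per descriptor. The input layout, the output column names, and the function name `min_max_atomic` are hypothetical; the actual columns in the exported .xlsx file may differ.

```python
from typing import Dict, List


def min_max_atomic(atom_descriptors: Dict[str, List[float]]) -> Dict[str, float]:
    """Collapse per-atom descriptor lists into molecule-level min and max values."""
    reduced: Dict[str, float] = {}
    for name, per_atom in atom_descriptors.items():
        reduced[f"{name}_min"] = min(per_atom)  # smallest value over all atoms
        reduced[f"{name}_max"] = max(per_atom)  # largest value over all atoms
    return reduced


# Made-up values for a three-atom molecule.
example = {
    "mulliken_charge": [-0.42, 0.10, 0.32],
    "buried_volume": [0.25, 0.61, 0.40],
}
print(min_max_atomic(example))
```

Because the result no longer depends on atom ordering or atom count, molecules of different sizes end up with the same set of columns.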