Methods and Levels of Theory
- 1 Introduction
- 2 Methods
- 2.1. Molecular dynamics (MD)
- 2.2. Semi-empirical
- 2.3. Hartree-Fock
- 2.4. DFT
- 2.5. Wave function
- 2.5.1. Configuration Interaction (CI)
- 2.5.2. Coupled Cluster (CC), CCSD(T) and DLPNO-CCSD(T)
- 2.5.3. Multi-reference
- 3 Basis sets
Computational chemistry is a branch of chemistry that uses computer simulation to assist in solving chemical problems. It uses methods of theoretical chemistry, incorporated into efficient computer programs, to calculate the structures and properties of molecules and solids. While computational results normally complement the information obtained by chemical experiments, they can, in some cases, predict unobserved chemical phenomena. Computational chemistry is widely used in the design of new drugs and materials.
The methods used cover both static and dynamic situations. In all cases, the computer time and other resources (such as memory and disk space) increase rapidly with the size of the system being studied. That system can be one molecule, a group of molecules, or a solid. Computational chemistry methods range from very approximate to highly accurate; the latter are usually feasible only for small systems.
A “level of theory” refers to the method(s) we use and the assumptions we make to describe a physicochemical system. Usually, there is a method for solving the system and a basis set for constructing the electron orbitals, and we denote the corresponding level as “method/basis”. For example, a DFT method called “wb97xd” with a basis set called “def2-tzvp” is denoted “wb97xd/def2-tzvp”, and a coupled cluster method called “CCSD(T)-F12” with a basis set called “cc-pvtz-f12” is denoted “CCSD(T)-F12/cc-pvtz-f12”. It’s OK if these names don’t mean much to you at this point; they are explained below.
A complete “level of theory” specifies both the single-point energy (sp) level and the geometry optimization (opt) level, and we denote it as “sp//opt”. For example, if the above DFT level was used for the optimization and the above coupled cluster level was used for the single-point energy calculation, the corresponding level of theory would be “CCSD(T)-F12/cc-pvtz-f12//wb97xd/def2-tzvp”.
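The “sp//opt” and “method/basis” notation can be unpacked mechanically. The following is a minimal Python sketch; the function name `parse_level_of_theory` and the layout of the returned dictionary are illustrative assumptions and are not taken from ARC or any other package.

```python
def parse_level_of_theory(level: str) -> dict:
    """Split 'sp_method/sp_basis//opt_method/opt_basis' into its parts.

    Hypothetical helper, for illustration only. If no '//' is present,
    the same level is assumed for both the sp and the opt calculation.
    """
    sp, _, opt = level.partition("//")
    opt = opt or sp

    def split_level(text: str) -> dict:
        # 'method/basis' -> method and basis; basis is None if it is missing.
        method, _, basis = text.partition("/")
        return {"method": method, "basis": basis or None}

    return {"sp": split_level(sp), "opt": split_level(opt)}


parsed = parse_level_of_theory("CCSD(T)-F12/cc-pvtz-f12//wb97xd/def2-tzvp")
print("single point:", parsed["sp"])
print("optimization:", parsed["opt"])
```

Reading the string from left to right, the part before “//” is the level used for the energy and the part after it is the level used to obtain the geometry.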
The structures, properties and energies of a molecule are better understood through the use of a “mechanical” molecular model. This model uses a simple molecular mechanics energy equation: a sum of energy terms for the bonds, angles and torsions between bonded atoms, as well as for the interactions between nonbonded atoms. Such models are referred to as “force fields”, and they also serve as a simple “descriptor” for vibrations in molecules.
A force field (FF) is a collection of equations and associated constants designed to reproduce the 3D molecular geometry and selected properties of tested structures. In other words, it is a set of models that use an empirical, algebraic, atomistic energy function for chemical systems.
Since FFs are algebraic and simple in form, one can simulate systems of up to millions of atoms. The best computers and the best algorithms can evaluate trillions to quadrillions of configurations, simulating them within seconds of real time. The main limitation is typically the accuracy of the energy function: how accurate it can be across a wide range of chemical systems. Another limitation is the scope for which the parameters were developed; using a force field outside the range of systems it was intended to simulate gives very poor accuracy.
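To make the idea of an algebraic energy function concrete, here is a minimal Python sketch of two common force field terms: a harmonic bond stretch and a 12-6 Lennard-Jones nonbonded term. Angle and torsion terms are left out for brevity, and every numerical parameter below is made up for illustration; none of them comes from a real force field.

```python
import math


def bond_energy(r, r0, k):
    """Harmonic bond stretch: E = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """Nonbonded 12-6 Lennard-Jones term: 4*eps*((s/r)^12 - (s/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)


def total_energy(coords, bonds, nonbonded):
    """Sum bonded and nonbonded terms over the listed atom pairs.

    coords:    list of (x, y, z) tuples
    bonds:     list of (i, j, r0, k)
    nonbonded: list of (i, j, epsilon, sigma)
    """
    energy = 0.0
    for i, j, r0, k in bonds:
        energy += bond_energy(math.dist(coords[i], coords[j]), r0, k)
    for i, j, epsilon, sigma in nonbonded:
        energy += lennard_jones(math.dist(coords[i], coords[j]), epsilon, sigma)
    return energy


# Three atoms on a line; all parameters are hypothetical.
coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (4.0, 0.0, 0.0)]
bonds = [(0, 1, 1.0, 300.0)]      # atoms 0 and 1 are bonded
nonbonded = [(0, 2, 0.1, 3.4)]    # atoms 0 and 2 interact without a bond
print(total_energy(coords, bonds, nonbonded))
```

Each term is a cheap closed-form expression of the interatomic distances, which is why such models can be applied to very large systems.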
See video on the topic.
As its name suggests, this method is partly empirical and partly theoretical: it relies in part on experimental data and is not entirely ab initio. These methods are efficient for treating large molecules, but not very accurate. Their parameters are fitted, normally to give results that best agree with experimental data, but sometimes to agree with ab initio results (ab initio calculations provide a convenient source of reference data).
Semi-empirical calculations are much faster than their ab initio counterparts, mostly due to the use of the zero differential overlap approximation. Their results, however, can be very wrong if the molecule being computed is not similar enough to the molecules in the database used to parametrize the method.
These methods speed up the computation by neglecting the electrons that are very close to the nucleus (the core electrons) and explicitly treating only the outer (valence) electrons. In practice, we might use such methods to quickly optimize geometries with something better than force fields but not as expensive as DFT.
Two main semi-empirical methods that we think you should know about are:
AM1: developed to improve MNDO (an earlier method) by adding a stabilizing Gaussian function to the core-core interaction to represent the hydrogen bond.
PM6: Several modifications were made to the NDDO (Neglect of Diatomic Differential Overlap) core-core interaction term and to the method of parameter optimization. These changes resulted in a more complete parameter optimization, called PM6, which in turn allowed 70 elements to be parameterized.
The Schrödinger equation in quantum mechanics is the equivalent of Newton’s laws of motion in classical physics: it predicts momentum, location, and other properties for systems with small masses and small dimensions. It is very challenging to arrive at an exact solution of the Schrödinger equation (for large systems with many electrons it is considered infeasible), which motivates us to simplify it and use assumptions to solve it in practice for the systems and molecules we care about.
Hartree-Fock theory is fundamental to much of electronic structure theory. It is the basis of molecular orbital (MO) theory, which posits that each electron’s motion can be described by a single-particle function (orbital) which does not depend explicitly on the instantaneous motions of the other electrons. The Hartree-Fock method relies on the following simplifications:
- The Born–Oppenheimer approximation is based on the fact that the nuclei are much heavier than the electrons, and hence electron motion is much faster. This approximation therefore assumes that the electrons and the nuclei can be treated separately in our equations, significantly simplifying the system we need to solve.
- Relativistic effects are completely neglected.
- The solution is assumed to be a linear combination of a finite number of basis functions.
- Each energy eigenfunction is assumed to be describable by a single Slater determinant, an antisymmetrized product of one-electron wave functions (i.e., orbitals).
- The system is assumed to be solvable via a Self-Consistent Field (SCF) procedure, also called the mean-field approximation: each electron is treated as moving in the average field of all the other electrons. Since that field depends on the solution itself, SCF starts from a guess, solves the equations, and repeats iteratively until convergence (hence “self-consistent”: at convergence the solution is consistent with the field that produced it).
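The self-consistent loop described in the last item can be sketched on a toy problem. The Python code below is not a molecular Hartree-Fock program; it is a two-site, two-electron mean-field model chosen only because the 2x2 eigenproblem can be solved by hand. The site energies `e1` and `e2`, the coupling `t` (assumed non-zero), the repulsion `u` and the damping factor `mixing` are all illustrative assumptions.

```python
import math


def lowest_orbital(f11, f22, f12):
    """Lowest eigenpair of the symmetric matrix [[f11, f12], [f12, f22]].

    Assumes f12 != 0, so the eigenvector below is never the zero vector.
    """
    eps = 0.5 * (f11 + f22) - math.hypot(0.5 * (f11 - f22), f12)
    c1, c2 = f12, eps - f11          # unnormalized eigenvector
    norm = math.hypot(c1, c2)
    return eps, c1 / norm, c2 / norm


def scf(e1, e2, t, u, mixing=0.5, tol=1e-10, max_iter=200):
    """Iterate site occupations until they reproduce themselves."""
    n1, n2 = 1.0, 1.0                # initial guess: one electron per site
    for iteration in range(1, max_iter + 1):
        # Build the mean-field matrix from the current density:
        # an electron on site i feels the average opposite-spin density n_i/2.
        f11 = e1 + 0.5 * u * n1
        f22 = e2 + 0.5 * u * n2
        # Solve the one-electron problem in that field.
        _, c1, c2 = lowest_orbital(f11, f22, t)
        # Both electrons occupy the lowest orbital, giving a new density.
        new_n1, new_n2 = 2.0 * c1 ** 2, 2.0 * c2 ** 2
        if abs(new_n1 - n1) < tol and abs(new_n2 - n2) < tol:
            return new_n1, new_n2, iteration, True
        # Damped update: mix old and new densities for stable convergence.
        n1 = (1.0 - mixing) * n1 + mixing * new_n1
        n2 = (1.0 - mixing) * n2 + mixing * new_n2
    return n1, n2, max_iter, False


n1, n2, iterations, converged = scf(e1=-0.5, e2=0.5, t=-1.0, u=1.0)
print(f"converged={converged} after {iterations} iterations")
print(f"occupations: n1={n1:.6f}, n2={n2:.6f}")
```

The structure (guess a density, build the field, solve, compare, repeat) is the same as in a real SCF calculation; only the size of the matrices and the way the field is built differ.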
Density functional theory (DFT) is a computational quantum mechanical modelling method used in physics, chemistry and materials science to investigate the electronic structure (or nuclear structure), principally in the ground state, of many-body systems, in particular atoms and molecules. Using this theory, the properties of a many-electron system can be determined by using functionals (a functional is a function of another function). In the case of DFT, these are functionals of the spatially dependent electron density (i.e., the electron density is a function of the types and positions of the nuclei and the overall charge, and the computed energy is a function of the electron density). DFT is among the most popular and versatile methods available in condensed-matter physics, computational physics, and computational chemistry.
Any atom other than hydrogen has many electrons, and most systems we work with have tens, hundreds or thousands of electrons. We therefore have to introduce some approximations to make the calculations feasible.
- The Born–Oppenheimer approximation, which reduces the degrees of freedom. The electronic wave function depends on the nuclear positions but not on their velocities: nuclear motion is so much slower than electron motion that the nuclei can be treated as fixed (just like in HF).
- The Hohenberg-Kohn theorem: the electron density of any system determines all ground-state properties of the system.
- If we know the electron density, we can calculate all other ground-state properties.
See video on the topic.
Fundamental particles, such as electrons, may be described as particles or waves (known as “the wave-particle duality”). The wave function, which is the solution we get by solving the Schrödinger equation, gives a mathematical description of the shape of the wave.
The wave function is a complex-valued probability amplitude, and the probabilities for the possible results of measurements made on the system can be derived from it. We represent the wave function by the Greek letter ψ or Ψ (lower-case and capital psi, respectively). It carries crucial information about the electron it is associated with: from the wave function we obtain the electron's energy, angular momentum, and orbital orientation in the form of the quantum numbers n, l, and m_l.
The wave function can give us the probability of finding electrons in a certain region. |Ψ|² is the probability density. It tells us where the electron is most likely to be found in the space around the nucleus.
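As a worked illustration of the probability density, the Python sketch below uses the well-known hydrogen 1s wave function in atomic units, ψ(r) = exp(-r)/√π. It checks numerically that the total probability is close to 1 and locates the most probable distance from the nucleus. The grid spacing and the 20 Bohr cutoff are arbitrary choices made for this example.

```python
import math


def psi_1s(r):
    """Hydrogen 1s wave function in atomic units (r in Bohr)."""
    return math.exp(-r) / math.sqrt(math.pi)


def radial_probability(r):
    """Probability per unit radius: 4*pi*r^2 * |psi|^2."""
    return 4.0 * math.pi * r ** 2 * abs(psi_1s(r)) ** 2


dr = 0.001                                   # grid spacing in Bohr
radii = [i * dr for i in range(1, 20001)]    # 0.001 to 20 Bohr

# Simple Riemann sum of the radial probability over the grid.
total = sum(radial_probability(r) for r in radii) * dr
# Grid point where the electron is most likely to be found.
most_probable = max(radii, key=radial_probability)

print(f"total probability: {total:.4f}")
print(f"most probable radius: {most_probable:.3f} Bohr")
```

The maximum of the radial probability falls at one Bohr radius, even though |ψ|² itself is largest at the nucleus; the r² factor accounts for the growing volume of each spherical shell.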
To compute thermodynamic data of species and reaction rate coefficients, we’ll need to perform many types of computations: finding the correct 3D conformation of the molecule, geometry optimization, frequency calculations, various scans, and single-point energy calculations. All of these are automated by ARC.
Each of these computations has its own nuances and challenges. For now, let’s just focus on the single-point energy calculation. This calculation takes an arrangement of atoms in 3D space (a molecule) and calculates its relative energy. There are many methods for conducting this computation. Among the more approximate ones are the Molecular Dynamics (Section 2.1) and Semi-Empirical (Section 2.2) approaches.
A more advanced method is Density Functional Theory (DFT) (Section 2.4), an “ab initio” (from first principles) approach. It uses quantum mechanics and computes energies based on the electron density of a many-electron body (a molecule). There are many DFT methods. Generally more accurate than DFT are “wave function” methods (Section 2.5). Two common wave function methods are Configuration Interaction (CI) (Section 2.5.1) and Coupled Cluster (CC) (Section 2.5.2).
Configuration interaction (CI) is a post-Hartree–Fock linear variational method for solving the nonrelativistic Schrödinger equation within the Born–Oppenheimer approximation for a quantum chemical multi-electron system.
In contrast to the Hartree–Fock method, in order to account for electron correlation, CI uses a variational wave function that is a linear combination of configuration state functions (CSFs) built from spin orbitals.
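To get a feel for how large such a linear combination becomes if every possible configuration is included (commonly called full CI), one can count Slater determinants, the building blocks of CSFs. The Python sketch below uses the standard combinatorial count and ignores any reduction from spatial or spin symmetry; the orbital counts in the loop are arbitrary examples.

```python
from math import comb


def n_determinants(n_orbitals, n_alpha, n_beta):
    """Number of ways to place alpha and beta electrons in spatial orbitals."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


# A 10-electron closed-shell system (5 alpha, 5 beta electrons)
# in increasingly large sets of orbitals.
for n_orbitals in (7, 14, 28):
    print(n_orbitals, n_determinants(n_orbitals, 5, 5))
```

The count grows combinatorially with the number of orbitals, which is why practical CI calculations truncate the expansion.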
See video on the topic.
The CC method tries to compute the electron correlation energy by exciting some of the electrons and calculating the resulting energy. If the user asks to excite electrons only one at a time (called Singles), the method is abbreviated CCS. If the user asks to also excite pairs of electrons (called Doubles), the method is abbreviated CCSD. If the user asks to also excite triples of electrons, the method is abbreviated CCSDT.
Now we have a problem of scalability. The advanced methods scale poorly with the system size (they become incredibly “expensive” in terms of the time it takes to solve the system and run the computation). For example, CCSD scales approximately as N^6 (where N is the number of electrons), and CCSDT scales as N^8. This makes CCSDT impractical for all but very small molecules (you can do methane, but not hexane). Several approximations have therefore been proposed in the literature. The first is to estimate the effect of the triples perturbatively; the resulting method is called CCSD(T), with the T in parentheses to show that it is estimated. This method works well, especially since the assumptions cause a favorable “cancellation of errors”: it scales a bit better (N^7) and is relatively accurate. The CCSD(T) method is widely used, and many call it “the gold standard of computational chemistry”. We can use it for small or medium systems, but for large systems CCSD(T) is impractical.
Here another approximation, called Pair Natural Orbitals (PNO), comes into play, where selected pairs of electronic orbitals are used instead of all electrons together. A more developed method is called domain-based local pair natural orbital (DLPNO), and we’ll therefore use DLPNO-CCSD(T) to calculate energies. DLPNO-CCSD(T) magically scales almost linearly with the system size (see Figure below), so we can use it on very large molecules and expect to get reasonably accurate energies.
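The practical meaning of these exponents can be checked with a little arithmetic. The Python sketch below takes the formal scaling exponents quoted above and compares methane (10 electrons) with hexane (50 electrons). It ignores prefactors and implementation details, so the ratios are rough estimates of relative cost only, not timings.

```python
# Formal scaling exponents as quoted in the text above.
SCALING = {"CCSD": 6, "CCSD(T)": 7, "CCSDT": 8}


def relative_cost(method, size_ratio):
    """Cost multiplier when the system grows by size_ratio (prefactors ignored)."""
    return size_ratio ** SCALING[method]


methane_electrons = 10   # CH4: 6 + 4 * 1
hexane_electrons = 50    # C6H14: 6 * 6 + 14 * 1
ratio = hexane_electrons / methane_electrons

for method in SCALING:
    print(f"{method}: hexane costs about {relative_cost(method, ratio):,.0f} "
          f"times more than methane")
```

A five-fold increase in the number of electrons therefore multiplies the formal cost by four to six orders of magnitude, depending on the method.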
Figure: The scaling behavior of DLPNO-CCSD(T) and DLPNO-CCSD(T0) for linear alkane chains with the def2-TZVP basis set. The timings for the (T)/(T0) steps and the overall DLPNO-CCSD(T)/(T0) calculations are plotted separately. Source: https://aip.scitation.org/doi/full/10.1063/1.5011798