Examples¶

Generating representations using the `Compound` class¶

The following example demonstrates how to generate a representation via the qml.Compound class.

# Read in an xyz or cif file.
water = Compound(xyz="water.xyz")

# Generate a molecular coulomb matrices sorted by row norm.
water.generate_coulomb_matrix(size=5, sorting="row-norm")

print(water.representation)

Might print the following representation:

[ 73.51669472   8.3593106    0.5          8.35237809   0.66066557   0.5
   0.           0.           0.           0.           0.           0.           0.
   0.           0.        ]

Generating representations via the `qml.representations` module¶

import numpy as np
from qml.representations import *

# Dummy coordinates for a water molecule
coordinates = np.array([[1.464, 0.707, 1.056],
                        [0.878, 1.218, 0.498],
                        [2.319, 1.126, 0.952]])

# Oxygen, Hydrogen, Hydrogen
nuclear_charges = np.array([8, 1, 1])

# Generate a molecular coulomb matrices sorted by row norm.
cm1 = generate_coulomb_matrix(nuclear_charges, coordinates,
                                size=5, sorting="row-norm")
print(cm1)

The resulting Coulomb-matrix for water:

[ 73.51669472   8.3593106    0.5          8.35237809   0.66066557   0.5
   0.           0.           0.           0.           0.           0.           0.
   0.           0.        ]

# Generate all atomic coulomb matrices sorted by distance to
# query atom.
cm2 = generate_atomic_coulomb_matrix(atomtypes, coordinates,
                                size=5, sort="distance")
print cm2

[[ 73.51669472   8.3593106    0.5          8.35237809   0.66066557   0.5
    0.           0.           0.           0.           0.           0.
    0.           0.           0.        ]
 [  0.5          8.3593106   73.51669472   0.66066557   8.35237809   0.5
    0.           0.           0.           0.           0.           0.
    0.           0.           0.        ]
 [  0.5          8.35237809  73.51669472   0.66066557   8.3593106    0.5
    0.           0.           0.           0.           0.           0.
    0.           0.           0.        ]]

Calculating a Gaussian kernel¶

The input for most of the kernels in QML is a numpy array, where the first dimension is the number of representations, and the second dimension is the size of each representation. An brief example is presented here, where compounds is a list of Compound() objects:

import numpy as np
from qml.kernels import gaussian_kernel

# Generate a numpy-array of the representation
X = np.array([c.representation for c in compounds])

# Kernel-width
sigma = 100.0

# Calculate the kernel-matrix
K = gaussian_kernel(X, X, sigma)

Calculating a Gaussian kernel using a local representation¶

The easiest way to calculate the kernel matrix using an explicit, local representation is via the wrappers module. Note that here the sigmas is a list of sigmas, and the result is a kernel for each sigma. The following examples currently work with the atomic coulomb matrix representation and the local SLATM representation:

import numpy as np
from qml.kernels import get_local_kernels_gaussian

# Assume the QM7 dataset is loaded into a list of Compound()
for compound in qm7:

    # Generate the desired representation for each compound
    compound.generate_atomic_coulomb_matrix(size=23, sort="row-norm")

# Make a big array with all the atomic representations
X = np.concatenate([mol.representation for mol in qm7])

# Make an array with the number of atoms in each compound
N = np.array([mol.natoms for mol in qm7])

# List of kernel-widths
sigmas = [50.0, 100.0, 200.0]

# Calculate the kernel-matrix
K = get_local_kernels_gaussian(X, X, N, N, sigmas)

print(K.shape)

(3, 7101, 7101)

Note that mol.representation is just a 1D numpy array.

Generating the SLATM representation¶

The Spectrum of London and Axillrod-Teller-Muto potential (SLATM) representation requires additional input to reduce the size of the representation. This input (the types of many-body terms) is generate via the get_slatm_mbtypes() function. The function takes a list of the nuclear charges for each molecule in the dataset as input. E.g.:

from qml.representations import get_slatm_mbtypes

# Assume 'qm7' is a list of Compound() objects.
mbtypes = get_slatm_mbtypes([mol.nuclear_charges for compound in qm7])

# Assume the QM7 dataset is loaded into a list of Compound()
for compound in qm7:

    # Generate the desired representation for each compound
    compound.generate_slatm(mbtypes, local=True)

The local keyword in this example specifies that a local representation is produced. Alternatively the SLATM representation can be generate via the qml.representations module:

from qml.representations import generate_slatm

# Dummy coordinates
coordinates = ...

# Dummy nuclear charges
nuclear_charges = ...

# Dummy mbtypes
mbtypes = get_slatm_mbtypes( ... )

# Generate one representation
rep = generate_slatm(coordinates, nuclear_charges, mbtypes)

Here coordinates is an Nx3 numpy array, and nuclear_charges is simply a list of charges.

Generating the FCHL representation¶

The FCHL representation does not have an explicit representation in the form of a vector, and the kernel elements must be calculated analytically in a separate kernel function. The syntax is analogous to the explicit representations (e.g. Coulomb matrix, BoB, SLATM, etc), but is handled by kernels from the separate qml.fchl module.

The code below show three ways to create the input representations for the FHCL kernel functions.

First using the Compound class:

# Assume the dataset is loaded into a list of Compound()
for compound in mols:

    # Generate the desired representation for each compound, cut off in angstrom
    compound.generate_fchl_representation(size=23, cut_off=10.0)

# Make Numpy array of the representation, which can be parsed to the kernel
X = np.array([c.representation for c in mols])

The dimensions of the array should be (number_molecules, size, 5, size), where size is the size keyword used when generating the representations.

In addition to using the Compound class to generate the representations, FCHL representations can also be generated via the qml.fchl.generate_fchl_representation() function, using similar notation to the functions in the qml.representations.* functions.

from qml.fchl import generate_representation

# Dummy coordinates for a water molecule
coordinates = np.array([[1.464, 0.707, 1.056],
                        [0.878, 1.218, 0.498],
                        [2.319, 1.126, 0.952]])

# Oxygen, Hydrogen, Hydrogen
nuclear_charges = np.array([8, 1, 1])

rep = generate_representation(coordinates, nuclear_charges)

To create the representation for a crystal, the notation is as follows:

from qml.fchl import generate_representation

# Dummy fractional coordinates
fractional_coordinates = np.array(
        [[ 0.        ,  0.        ,  0.        ],
         [ 0.75000042,  0.50000027,  0.25000015],
         [ 0.15115386,  0.81961403,  0.33154037],
         [ 0.51192691,  0.18038651,  0.3315404 ],
         [ 0.08154025,  0.31961376,  0.40115401],
         [ 0.66846017,  0.81961403,  0.48807366],
         [ 0.08154025,  0.68038678,  0.76192703],
         [ 0.66846021,  0.18038651,  0.84884672],
         [ 0.23807355,  0.31961376,  0.91846033],
         [ 0.59884657,  0.68038678,  0.91846033],
         [ 0.50000031,  0.        ,  0.50000031],
         [ 0.25000015,  0.50000027,  0.75000042]]
    )

# Dummy nuclear charges
nuclear_charges = np.array(
        [58, 58, 8, 8, 8, 8, 8, 8, 8, 8, 23, 23]
    )

# Dummy unit cell
unit_cell = np.array(
        [[ 3.699168,  3.699168, -3.255938],
         [ 3.699168, -3.699168,  3.255938],
         [-3.699168, -3.699168, -3.255938]]
    )

# Generate the representation
rep = generate_representation(fractional_coordinates, nuclear_charges,
        cell=unit_cell, neighbors=100, cut_distance=7.0)

The neighbors keyword is the max number of atoms with the cutoff-distance

Generating the FCHL kernel¶

The following example demonstrates how to calculate the local FCHL kernel elements between FCHL representations. X1 and X2 are numpy arrays with the shape (number_compounds,max_size, 5,neighbors), as generated in one of the previous examples. You MUST use the same, or larger, cut-off distance to generate the representation, as to calculate the kernel.

from qml.fchl import get_local_kernels

# You can get kernels for multiple kernel-widths
sigmas = [2.5, 5.0, 10.0]

# Calculate the kernel-matrices for each sigma
K = get_local_kernels(X1, X2, sigmas, cut_distance=10.0)

print(K.shape)

As output you will get a kernel for each kernel-width.

(3, 100, 200)

In case X1 and X2 are identical, K will be symmetrical. This is handled by a separate function with exploits this symmetry (thus being twice as fast).

from qml.fchl import get_local_symmetric_kernels

# You can get kernels for multiple kernel-widths
sigmas = [2.5, 5.0, 10.0]

# Calculate the kernel-matrices for each sigma
K = get_local_kernels(X1, sigmas, cut_distance=10.0)

print(K.shape)

(3, 100, 100)

In addition to the local kernel, the FCHL module also provides kernels for atomic properties (e.g. chemical shifts, partial charges, etc). These have the name “atomic”, rather than “local”.

from qml.fchl import get_atomic_kernels
from qml.fchl import get_atomic_symmetric_kernels

The only difference between the local and atomic kernels is the shape of the input. Since the atomic kernel outputs kernels with atomic resolution, the atomic input has the shape (number_atoms, 5, size).