Welcome to IOData’s documentation!

IOData is a free and open-source Python library for parsing, storing, and converting various file formats commonly used by quantum chemistry, molecular dynamics, and plane-wave density-functional-theory software programs. It also supports a flexible framework for generating input files for various software packages.

Please use the following citation in any publication using the IOData library:

“IOData: A python library for reading, writing, and converting computational chemistry file formats and generating input files.”, T. Verstraelen, W. Adams, L. Pujal, A. Tehrani, B. D. Kelly, L. Macaya, F. Meng, M. Richer, R. Hernandez‐Esparza, X. D. Yang, M. Chan, T. D. Kim, M. Cools‐Ceuppens, V. Chuiko, E. Vohringer‐Martinez, P. W. Ayers, F. Heidar‐Zadeh, J Comput Chem. 2021; 42: 458–464.

For the list of file formats that can be loaded or dumped by IOData, see Supported File Formats. The two tables below summarize the file formats and features supported by IOData.

| Code | Definition |
|---|---|
| L | loading is supported |
| D | dumping is supported |
| (d) | attribute may be derived from other attributes |
| R | attribute is always read |
| r | attribute is read if present |
| W | attribute is always written |
| w | attribute is written if present |

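| Attribute | fchk: LD | json: LD | qchemlog: L | extxyz: L | wfx: LD | mwfn: L | gamess: L | wfn: LD | pdb: LD | molden: LD | cp2klog: L | orcalog: L | molekel: LD | mol2: LD | locpot: L | gromacs: L | fcidump: LD | cube: LD | chgcar: L | charmm: L | sdf: LD | poscar: LD | xyz: LD | gaussianinput: L | gaussianlog: L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| atcharges | Rw | . | . | . | . | . | . | . | . | . | . | . | rw | Rw | . | . | . | . | . | . | . | . | . | . | . |
| atcoords | Rw | RW | R | r | RW | R | R | RW | RW | RW | R | R | RW | RW | R | R | . | RW | R | R | RW | RW | RW | R | . |
| atcorenums (d) | RW | Rw | . | . | W | R | . | . | . | Rw | R | . | . | . | . | . | . | Rw | . | . | . | . | . | . | . |
| atffparams | . | . | . | . | . | . | . | . | Rw | . | . | . | . | Rw | . | R | . | . | . | R | . | . | . | . | . |
| atfrozen | rw | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| atgradient | rw | . | . | r | Rw | . | R | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| athessian | rw | . | r | . | . | . | R | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| atmasses | rw | rw | R | r | . | . | R | . | . | . | . | . | . | . | . | . | . | . | . | R | . | . | . | . | . |
| atnums | RW | RW | R | r | RW | R | R | RW | RW | RW | R | R | RW | RW | R | . | . | RW | R | . | RW | RW | RW | R | . |
| basisdef | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| bonds | . | rw | . | . | . | . | . | . | rw | . | . | . | . | . | . | . | . | . | . | . | Rw | . | . | . | . |
| cellvecs | . | . | . | r | . | . | . | . | . | . | . | . | . | . | R | R | . | R | R | . | . | RW | . | . | . |
| charge (d) | w | RW | . | r | W | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| core_energy | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Rw | . | . | . | . | . | . | . | . |
| cube | . | . | . | . | . | . | . | . | . | . | . | . | . | . | R | . | . | RW | R | . | . | . | . | . | . |
| energy | rw | r | R | r | Rw | R | R | RW | . | . | R | R | . | . | . | . | . | . | . | . | . | . | . | . | . |
| extcharges | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| extra | rw | rw | R | r | Rw | R | . | RW | RW | . | . | R | . | . | . | R | . | . | . | R | . | . | . | . | . |
| g_rot | . | rw | R | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| lot | Rw | r | R | . | w | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| mo | Rw | . | R | . | RW | R | . | RW | . | RW | R | . | RW | . | . | . | . | . | . | . | . | . | . | . | . |
| moments | rw | . | . | . | . | . | . | . | . | . | . | R | . | . | . | . | . | . | . | . | . | . | . | . | . |
| natom (d) | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| nelec (d) | . | R | . | . | . | . | . | . | . | . | . | . | . | . | . | . | Rw | . | . | . | . | . | . | . | . |
| obasis | R | r | . | . | RW | R | . | RW | . | RW | R | . | RW | . | . | . | . | . | . | . | . | . | . | . | . |
| obasis_name | Rw | r | R | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| one_ints | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | RW | . | . | . | . | . | . | . | r |
| one_rdms | rw | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| run_type | R | . | R | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| spinpol (d) | . | RW | . | . | w | . | . | . | . | . | . | . | . | . | . | . | Rw | . | . | . | . | . | . | . | . |
| title | R | rw | . | R | Rw | R | R | RW | rw | rw | . | . | . | rw | R | R | . | w | R | r | Rw | Rw | Rw | R | . |
| two_ints | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | RW | . | . | . | . | . | . | . | r |
| two_rdms | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |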
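As a reading aid, the letter codes used in the attribute table can be decoded mechanically. The following is a minimal sketch; the helper describe_code is hypothetical and not part of IOData.

```python
# Hypothetical helper (not part of IOData): decode one cell of the
# attribute table, using the definitions from the legend.
MEANINGS = {
    "R": "always read",
    "r": "read if present",
    "W": "always written",
    "w": "written if present",
}


def describe_code(code):
    """Return the meanings of the letters in a table cell such as 'Rw'."""
    if code == ".":
        return []  # the attribute is not supported by this format
    return [MEANINGS[letter] for letter in code]


print(describe_code("Rw"))
```

For example, the cell "Rw" means that the attribute is always read when loading and written only if it is present when dumping.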

User Documentation

Installation

Stable releases

Warning

We are preparing a 1.0 release. Until then, these instructions for installing a stable release will not work yet. If you enjoy living on the edge, try the development release as explained in the “Latest git revision” section below.

Python 3 (>=3.6) must be installed before you can install IOData. In addition, IOData has the following dependencies:

Normally, you don’t need to install these dependencies manually. They will be installed automatically when you follow the instructions below.

Installation with Ana- or Miniconda

To install IOData using the conda package management system, install miniconda or anaconda first, and then:

# Activate your main conda environment if it is not loaded in your .bashrc.
# E.g. run the following if you have miniconda installed in e.g. ~/miniconda3
source ~/miniconda3/bin/activate

# Create a horton3 conda environment. (optional, recommended)
conda create -n horton3
source activate horton3

# Install the stable release.
conda install -c theochem qc-iodata

# Unstable releases
# (Only do this if you understand the implications.)
# Install the testing release. (beta)
conda install -c theochem/label/test qc-iodata
# Install the development release. (alpha)
conda install -c theochem/label/dev qc-iodata

Installation with Pip

  1. You can work in a virtual environment:

    # Create a virtual environment in ~/horton3
    # Feel free to change the path.
    python3 -m venv ~/horton3
    
    # Activate the virtual environment.
    source ~/horton3/bin/activate
    
    # Install the stable release in the venv horton3.
    pip3 install qc-iodata
    # alternative: python3 -m pip install qc-iodata
    
    # For developers, install a pre-release (alpha or beta).
    # (Only do this if you understand the implications.)
    pip3 install --pre qc-iodata
    # alternative: python3 -m pip install --pre qc-iodata
    
  2. You can install into your ${HOME} directory, without creating a virtual environment.

    # Install the stable release in your home directory.
    pip3 install qc-iodata --user
    # alternative: python3 -m pip install qc-iodata --user
    
    # For developers, install a pre-release (alpha or beta).
    # (Only do this if you understand the implications.)
    pip3 install --pre qc-iodata --user
    # alternative: python3 -m pip install --pre qc-iodata --user
    

    This is by far the simplest method, ideal to get started, but you have only one home directory. If the installation breaks due to some experimentation, it is harder to make a clean start in comparison to the other options.

In case the pip3 executable is not found, pip may be installed in a directory which is not included in your ${PATH} variable. This seems to be a common issue on macOS. A simple workaround is to replace pip3 by python3 -m pip.

In case Python and your operating system are up to date, you may also use pip instead of pip3 or python instead of python3. The 3 is only used to avoid potential confusion with Python 2. Note that the 3 is only present in names of executables, not names of Python modules.

Latest git revision

This section shows how to install the latest revision of IOData from the git repository. This kind of installation comes with some risks (sudden API changes, bugs, …), so be prepared to accept them when using the following installation instructions.

There are two installation methods:

  1. Quick and dirty. This method has four variants, depending on the correctness of your PATH variable and the presence of a virtual or conda environment. These scenarios are explained in more detail in the previous section.

    # with env, correct PATH
    pip install git+https://github.com/theochem/iodata.git
    # with env, broken PATH
    python -m pip install git+https://github.com/theochem/iodata.git
    # without env, correct PATH
    pip install git+https://github.com/theochem/iodata.git --user
    # without env, broken PATH
    python -m pip install git+https://github.com/theochem/iodata.git --user
    
  2. Slow and smart. In addition to the four variations of the quick and dirty method, the slow and smart method can be used with pip or just with setup.py. You can also choose between the SSH and HTTPS protocols to clone the git repository. Pick whichever works best for you.

    # A) Clone git repo with https OR ssh:
    # The second one only works if you have ssh set up for Github
    #  A1) https
    git clone https://github.com/theochem/iodata.git
    #  A2) ssh
    git clone git@github.com:theochem/iodata.git
    # B) Optionally write the version string
    pip install roberto  # or any of the three other ways of running pip, see above.
    rob write-version
    # C) Actual install, 6 different methods.
    #  C1) setup.py, with env
    python setup.py install
    #  C2) pip, with env, correct PATH
    pip install .
    #  C3) pip, with env, broken PATH
    python -m pip install .
    #  C4) setup.py, without env
    python setup.py install --user
    #  C5) pip, without env, correct PATH
    pip install . --user
    #  C6) pip, without env, broken PATH
    python -m pip install . --user
    

Testing

The tests are automatically run when we build packages with conda, but you may try them again on your own machine after installation.

With Ana- or Miniconda:

# Install pytest in your conda env.
conda install pytest pytest-xdist
# Then run the tests.
pytest --pyargs iodata -n auto

With Pip:

# Install pytest in your virtual environment ...
pip install pytest pytest-xdist
# ... and refresh the virtual environment.
# This is a venv quirk. Without it, pytest may not find IOData.
deactivate && source ~/horton3/bin/activate

# Alternatively, install pytest in your home directory.
pip install pytest pytest-xdist --user

# Finally, run the tests.
pytest --pyargs iodata -n auto

Getting Started

IOData can be used to read and write different quantum chemistry file formats.

Script usage

The simplest way to use IOData, without writing any code, is to use the iodata-convert script.

iodata-convert in.fchk out.molden

See the --help option for more details on usage.

Code usage

More complex use cases can be implemented in Python, using IOData as a library. The loading functions store the data read from a file in an IOData object.

Reading

To read a file, use something like this:

from iodata import load_one

mol = load_one('water.xyz')  # XYZ files contain atomic coordinates in Angstrom
print(mol.atcoords)  # print coordinates in Bohr.

Note that IOData automatically converts units from the file format’s official specification to atomic units (the unit system used throughout HORTON3).

The file format is inferred from the extension, but one can override the detection mechanism by manually specifying the format:

from iodata import load_one

mol = load_one('water.foo', 'xyz')  # XYZ file with unusual extension
print(mol.atcoords)
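To illustrate how a format can be inferred from a file name and overridden by an explicit argument, here is a minimal sketch that matches the filename patterns listed under Supported File Formats. It is not IOData's implementation: the function guess_format and the PATTERNS table are hypothetical, only a subset of formats is included, and the xyz pattern is an assumption.

```python
import os
from fnmatch import fnmatchcase

# Filename patterns quoted from the "Supported File Formats" section.
# Only a subset is listed; the "xyz" entry is an assumption.
PATTERNS = {
    "charmm": ["*.crd"],
    "chgcar": ["CHGCAR*", "AECCAR*"],
    "cp2klog": ["*.cp2k.out"],
    "cube": ["*.cube", "*.cub"],
    "extxyz": ["*.extxyz"],
    "fchk": ["*.fchk", "*.fch"],
    "fcidump": ["*FCIDUMP*"],
    "gaussianlog": ["*.log"],
    "xyz": ["*.xyz"],  # assumed pattern
}


def guess_format(filename, fmt=None):
    """Return fmt if given, else the first format whose pattern matches."""
    if fmt is not None:
        return fmt  # an explicit format overrides the detection
    basename = os.path.basename(filename)
    for name, patterns in PATTERNS.items():
        if any(fnmatchcase(basename, pattern) for pattern in patterns):
            return name
    raise ValueError(f"Cannot infer the file format of {filename!r}")


print(guess_format("water.fchk"))
print(guess_format("water.foo", "xyz"))
```

Patterns such as CHGCAR* are matched against the base name only, so a leading directory does not interfere with the detection.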

IOData also has basic support for loading databases of molecules. For example, the following will iterate over all frames in an XYZ file:

from iodata import load_many

# print the title line from each frame in the trajectory.
for mol in load_many('trajectory.xyz'):
    print(mol.title)

Writing

IOData can also be used to write different file formats:

from iodata import load_one, dump_one

mol = load_one('water.fchk')
# Here you may put some code to manipulate mol before writing the data
# to a different file.
dump_one(mol, 'water.molden')

One could also convert (and manipulate) an entire trajectory. The following example converts a geometry optimization trajectory from a Gaussian FCHK file to an XYZ file:

from iodata import load_many, dump_many

# Conversion without manipulation.
dump_many((mol for mol in load_many('water_opt.fchk')), 'water_opt.xyz')

If you wish to perform some manipulations before writing the trajectory, the simplest way is to load the entire trajectory in a list of IOData objects and dump it later:

from iodata import load_many, dump_many

# Read the trajectory
trj = list(load_many('water_opt.fchk'))
# Manipulate if desired
# ...
# Write the trajectory
dump_many(trj, 'water_opt.xyz')

For very large trajectories, you may want to avoid loading it as a whole in memory. For this, one should avoid making the list object in the above example. The following approach would be more memory efficient.

from iodata import load_many, dump_many

def itermols():
    for mol in load_many("traj1.xyz"):
        # Do some manipulations on mol here, then pass it on.
        yield mol

dump_many(itermols(), "traj2.xyz")
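
The memory saving comes from lazy evaluation: each frame is read, modified and written before the next one is requested. The sketch below shows the same pattern without IOData, using plain dictionaries as stand-ins for IOData objects. All function names are hypothetical.

```python
import io


def read_frames(nframe):
    """Stand-in for load_many: yield one frame at a time."""
    for iframe in range(nframe):
        yield {"title": f"frame {iframe}", "energy": -76.0 - 0.01 * iframe}


def relabel(frames):
    """Modify frames lazily, without building a list of all frames."""
    for frame in frames:
        frame["title"] = frame["title"].upper()
        yield frame


def write_frames(frames, stream):
    """Stand-in for dump_many: consume the iterator and write each frame."""
    count = 0
    for frame in frames:
        stream.write(f"{frame['title']} {frame['energy']:.2f}\n")
        count += 1
    return count


buffer = io.StringIO()
nwritten = write_frames(relabel(read_frames(3)), buffer)
print(nwritten)
print(buffer.getvalue())
```

No list of frames is ever created, so the peak memory usage does not grow with the length of the trajectory.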
Input files

IOData can be used to write input files for quantum-chemistry software. By default, minimal settings are used, which can be changed if needed. For example, the following will prepare a Gaussian input for a HF/STO-3G calculation from a PDB file:

from iodata import load_one, write_input

write_input(load_one("water.pdb"), "water.com", fmt="gaussian")

The level of theory and other settings can be modified by setting corresponding attributes in the IOData object:

from iodata import load_one, write_input

mol = load_one("water.pdb")
mol.lot = "B3LYP"
mol.obasis_name = "6-31g*"
mol.run_type = "opt"
write_input(mol, "water.com", fmt="gaussian")

The run types can be any of the following: energy, energy_force, opt, scan or freq. These are translated into program-specific keywords when the file is written.
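
As a rough illustration of such a translation, the sketch below maps the run types onto Gaussian route keywords. The keywords are standard Gaussian ones, but the mapping itself and the function translate_run_type are assumptions made for this example and may differ from what IOData writes.

```python
# Hypothetical mapping from run types to Gaussian route keywords.
# This table is an assumption, not taken from IOData.
GAUSSIAN_KEYWORDS = {
    "energy": "sp",
    "energy_force": "force",
    "opt": "opt",
    "scan": "scan",
    "freq": "freq",
}


def translate_run_type(run_type):
    """Translate a run type into a program-specific keyword."""
    try:
        return GAUSSIAN_KEYWORDS[run_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported run type: {run_type!r}") from None


print(translate_run_type("energy_force"))
```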

It is possible to define a custom input file template to allow for specialized commands. This is done by passing a template string using the optional template keyword, placing each IOData attribute (or additional keyword, as shown below) in curly brackets:

from iodata import load_one, write_input

mol = load_one("water.pdb")
mol.lot = "B3LYP"
mol.obasis_name = "Def2QZVP"
mol.run_type = "opt"
custom_template = """\
%NProcShared=4
%mem=16GB
%chk=B3LYP_def2qzvp_H2O
#n {lot}/{obasis_name} scf=(maxcycle=900,verytightlineq,xqc) integral=(grid=ultrafinegrid) pop=(cm5, hlygat, mbs, npa, esp)

{title}

{charge} {spinmult}
{geometry}

"""
write_input(mol, "water.com", fmt="gaussian", template=custom_template)

The input file template may also include keywords that are not part of the IOData object:

from iodata import load_one, write_input

mol = load_one("water.pdb")
mol.lot = "B3LYP"
mol.obasis_name = "Def2QZVP"
mol.run_type = "opt"
custom_template = """\
%chk={chk_name}
#n {lot}/{obasis_name} {run_type}

{title}

{charge} {spinmult}
{geometry}

"""
# Custom keywords as arguments (best for few extra arguments)
write_input(mol, "water.com", fmt="gaussian", template=custom_template, chk_name="B3LYP_def2qzvp_water")

# Custom keywords from a dict (in cases with many extra arguments)
custom_keywords = {"chk_name": "B3LYP_def2qzvp_waters"}
write_input(mol, "water.com", fmt="gaussian", template=custom_template, **custom_keywords)
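
The placeholder mechanism resembles Python's built-in str.format. The following sketch, which does not use IOData and is not its implementation, fills a template from a dictionary of fields plus extra keywords. The function fill_template and the hard-coded field values are hypothetical.

```python
def fill_template(template, fields, **extra):
    """Substitute {name} placeholders using fields, then extra keywords."""
    return template.format(**{**fields, **extra})


# Values that a writer could derive from a molecule (hard-coded here).
fields = {
    "lot": "B3LYP",
    "obasis_name": "Def2QZVP",
    "run_type": "opt",
    "title": "water",
    "charge": 0,
    "spinmult": 1,
    "geometry": "O 0.0 0.0 0.0\nH 0.0 0.8 0.6\nH 0.0 -0.8 0.6",
}
template = """\
%chk={chk_name}
#n {lot}/{obasis_name} {run_type}

{title}

{charge} {spinmult}
{geometry}

"""
text = fill_template(template, fields, chk_name="B3LYP_def2qzvp_water")
print(text)
```

With str.format, a placeholder without a matching field or keyword raises a KeyError, which makes typos in a template easy to spot.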

In some cases, it may be preferable to load the template from file, instead of defining it in the script:

from iodata import load_one, write_input

mol = load_one("water.pdb")
mol.lot = "B3LYP"
mol.obasis_name = "6-31g*"
mol.run_type = "opt"
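# Read the template with a context manager, so the file is closed afterwards.
with open("my_template.com") as fh:
    template = fh.read()
write_input(mol, "water.com", fmt="gaussian", template=template)

Data storage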

IOData can be used to store data in a consistent format, so that it can be written to a file later.

import numpy as np
from iodata import IOData

mol = IOData(title="water")
mol.atnums = np.array([8, 1, 1])
mol.atcoords = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]])  # in Bohr

Unit conversion

IOData always represents all quantities in atomic units and unit conversion constants are defined in iodata.utils. Conversion to atomic units is done by multiplication with a unit constant. This convention can be easily remembered with the following examples:

  • When you say “this bond length is 1.5 Å”, the IOData equivalent is bond_length = 1.5 * angstrom.

  • The conversion from atomic units is similar to axes labels in old papers. For example, a bond length in angstrom is printed as “Bond length / Å”. Expressing this with IOData’s conventions gives print("Bond length in Angstrom:", bond_length / angstrom)

(This is rather different from the ASE conventions.)
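
The two conventions can be combined in a short worked example. To keep the sketch runnable without IOData, the constant angstrom is defined locally from the CODATA 2018 Bohr radius; this local definition is an assumption, and in practice one would use the constants defined in iodata.utils.

```python
# Assumption: the constant is defined locally so the example runs without
# IOData. The Bohr radius is the CODATA 2018 value in meters.
BOHR_RADIUS_M = 5.29177210903e-11
angstrom = 1.0e-10 / BOHR_RADIUS_M  # one angstrom expressed in bohr

# To atomic units: multiply by the unit.
bond_length = 1.5 * angstrom
print("Bond length in bohr:", round(bond_length, 4))

# From atomic units: divide by the unit.
print("Bond length in Angstrom:", round(bond_length / angstrom, 4))
```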

Supported File Formats

CHARMM crd file format (charmm)

CHARMM coordinate files contain information about the location of each atom in Cartesian space. The format of the ASCII (CARD) CHARMM coordinate files is: Title line(s), number of atoms in file and the coordinate lines (one for each atom in the file).

The coordinate lines contain specific information about each atom. These have the following structure: Atom number (sequential), residue number (specified relative to first residue in the PSF), residue name, atom type, x-coordinate, y-coordinate, z-coordinate, segment identifier, residue identifier and a weighting array value.
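
The field order described above can be sketched as a small parser. This is a simplified illustration, not IOData's loader: it splits on whitespace (real files use fixed-width columns), the container CrdAtom and the function parse_crd_line are hypothetical, and the sample line is made up.

```python
from typing import NamedTuple


class CrdAtom(NamedTuple):
    """One coordinate line of a CHARMM CARD file (hypothetical container)."""

    number: int
    resnum: int
    resname: str
    atype: str
    x: float
    y: float
    z: float
    segid: str
    resid: str
    weight: float


def parse_crd_line(line):
    """Split a coordinate line on whitespace and convert each field."""
    words = line.split()
    if len(words) != 10:
        raise ValueError(f"Expected 10 fields, found {len(words)}")
    return CrdAtom(
        int(words[0]), int(words[1]), words[2], words[3],
        float(words[4]), float(words[5]), float(words[6]),
        words[7], words[8], float(words[9]),
    )


# Made-up sample line, following the field order described above.
atom = parse_crd_line(
    "    1    1 TIP3 OH2    0.00000   0.06556   0.00000 W    1      0.00000"
)
print(atom.atype, atom.y)
```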

Filename patterns: *.crd

iodata.formats.charmm.load_one()
  • Always loads atcoords, atffparams, atmasses, extra

  • May load title

VASP 5 CHGCAR file format (chgcar)

This format is used by VASP 5.X and VESTA.

Note that even though the CHGCAR and LOCPOT files look very similar, they require different conversions to atomic units.

Filename patterns: CHGCAR*, AECCAR*

iodata.formats.chgcar.load_one()
  • Always loads atcoords, atnums, cellvecs, cube, title

CP2K ATOM output file format (cp2klog)

Filename patterns: *.cp2k.out

iodata.formats.cp2klog.load_one()
  • Always loads atcoords, atcorenums, atnums, energy, mo, obasis

This function assumes that the following subsections are present in the CP2K ATOM input file, in the section ATOM%PRINT:

&PRINT
  &POTENTIAL
  &END POTENTIAL
  &BASIS_SET
  &END BASIS_SET
  &ORBITALS
  &END ORBITALS
&END PRINT

Gaussian Cube file format (cube)

Cube files are generated by various QC codes these days, including Gaussian, CP2K, GPAW, Q-Chem, …

Note that the second column in the geometry specification of the cube file is interpreted as the effective core charges.

Filename patterns: *.cube, *.cub

iodata.formats.cube.load_one()
  • Always loads atcoords, atcorenums, atnums, cellvecs, cube

iodata.formats.cube.dump_one()
  • Requires atcoords, atnums, cube

  • May dump title, atcorenums

Extended XYZ file format (extxyz)

The extended XYZ file format is defined in the ASE documentation.

Usually, the different frames in a trajectory describe different geometries of the same molecule, with atoms in the same order. The load_many function below can also handle an XYZ with different molecules, e.g. a molecular database.

Filename patterns: *.extxyz

iodata.formats.extxyz.load_one()
  • Always loads title

  • May load atcoords, atgradient, atmasses, atnums, cellvecs, charge, energy, extra

iodata.formats.extxyz.load_many()
  • Always loads title

  • May load atcoords, atgradient, atmasses, atnums, cellvecs, charge, energy, extra

Gaussian FCHK file format (fchk)

Filename patterns: *.fchk, *.fch

iodata.formats.fchk.load_one()
  • Always loads atcharges, atcoords, atnums, atcorenums, lot, mo, obasis, obasis_name, run_type, title

  • May load energy, atfrozen, atgradient, athessian, atmasses, one_rdms, extra, moments

iodata.formats.fchk.dump_one()
  • Requires atnums, atcorenums

  • May dump atcharges, atcoords, atfrozen, atgradient, athessian, atmasses, charge, energy, lot, mo, one_rdms, obasis_name, extra, moments

iodata.formats.fchk.load_many()
  • Always loads atcoords, atgradient, atnums, atcorenums, energy, extra, title

Trajectories from a Gaussian optimization, relaxed scan or IRC calculation are written in groups of frames, called “points” in the Gaussian world, e.g. to discriminate between different values of the constraint in a relaxed geometry. In most cases, e.g. IRC or conventional optimization, there is only one “point”. Within one “point”, one can have multiple geometries and their properties. This information is stored in the extra attribute:

  • ipoint is the counter for a point

  • npoint is the total number of points.

  • istep is the counter within one “point”

  • nstep is the total number of geometries within a “point”.

  • reaction_coordinate is only present in case of an IRC calculation.

Molpro 2012 FCIDUMP file format (fcidump)

Notes
  1. This function works only for restricted wave-functions.

  2. One- and two-electron integrals are stored in chemists’ notation in an FCIDUMP file, while IOData internally uses physicists’ notation.

  3. Keep in mind that the FCIDUMP format changed in MOLPRO 2012, so files generated with older versions are not supported.
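
The two notations for two-electron integrals differ only in the order of the orbital indices: ⟨pq|rs⟩ in physicists’ notation equals (pr|qs) in chemists’ notation. The sketch below applies this reindexing to integrals stored in a dictionary keyed by index tuples. The storage layout and the function name are assumptions made for illustration; IOData itself stores arrays.

```python
def chemists_to_physicists(chem):
    """Reindex two-electron integrals using <pq|rs> = (pr|qs)."""
    return {(p, q, r, s): value for (p, r, q, s), value in chem.items()}


# A made-up integral (01|23) in chemists' notation.
chem = {(0, 1, 2, 3): 0.25}
phys = chemists_to_physicists(chem)
print(phys)
```

For a four-index NumPy array, the same reindexing is a transpose with the axes order (0, 2, 1, 3).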

Filename patterns: *FCIDUMP*

iodata.formats.fcidump.load_one()
  • Always loads core_energy, one_ints, nelec, spinpol, two_ints

iodata.formats.fcidump.dump_one()
  • Requires one_ints, two_ints

  • May dump core_energy, nelec, spinpol

The dictionary one_ints must contain a field core_mo. Similarly, two_ints must contain two_mo.

GAMESS punch file format (gamess)

Filename patterns: *.dat

iodata.formats.gamess.load_one()
  • Always loads title, energy, grot, atgradient, athessian, atmasses, atnums, atcoords

Gaussian input format (gaussianinput)

Filename patterns: *.com, *.gjf

iodata.formats.gaussianinput.load_one()
  • Always loads atcoords, atnums, title

Gaussian Log file format (gaussianlog)

To write out the integrals in a Gaussian log file, which can be loaded with this module, you need to use the following Gaussian command line:

scf(conventional) iop(3/33=5) extralinks=l316 iop(3/27=999)

Filename patterns: *.log

iodata.formats.gaussianlog.load_one()
  • Always loads

  • May load one_ints, two_ints

GROMACS gro file format (gromacs)

Files with the gro file extension contain a molecular structure in Gromos87 format. GROMACS gro files can be used as a trajectory by simply concatenating files.

http://manual.gromacs.org/current/reference-manual/file-formats.html#gro

Filename patterns: *.gro

iodata.formats.gromacs.load_one()
  • Always loads atcoords, atffparams, cellvecs, extra, title

iodata.formats.gromacs.load_many()
  • Always loads atcoords, atffparams, cellvecs, extra, title

QCSchema JSON file format (json)

QCSchema defines four different subschema:

  • Molecule: specifying a molecular system

  • Input: specifying QC program input for a specific Molecule

  • Output: specifying QC program output for a specific Molecule

  • Basis: specifying a basis set for a specific Molecule

General Usage

The QCSchema format is intended to be a catch-all file format for storing and sharing QC calculation data. Because a single file can hold a wide range of data, not every field in a QCSchema file corresponds directly to an IOData attribute. For example, qcschema_output files allow for many fields capturing different energy contributions, especially for coupled-cluster calculations. Rather than guessing the intent of the user, IOData ensures that every field in the file is stored in a structured manner. When a QCSchema field does not correspond to an IOData attribute, that data is stored in the extra dict, in a dictionary corresponding to the subschema where that data was found. When multiple subschema contain the relevant field (e.g. the Output subschema contains the entirety of the Input subschema), the data is stored under the smallest subschema (for the example above, in IOData.extra["input"], not IOData.extra["output"]).

Dumping an IOData instance to a QCSchema file involves adding relevant required (and optional, if needed) fields to the necessary dictionaries in the extra dict. One exception is the provenance field: if the only desired provenance data is the creation of the file by IOData, that data will be added automatically.

The following sections will describe the requirements of each subschema and the behaviour to expect from IOData when loading in or dumping out a QCSchema file.

Schema Definitions
Provenance Information

The provenance field contains information about how the associated QCSchema object and its attributes were generated, provided, and manipulated. A provenance entry expects these fields:

| Field | Description |
|---|---|
| creator | Required. The program that generated, provided, or manipulated this file. |
| version | The version of the creator. |
| routine | The routine of the creator. |

In QCElemental, only a single provenance entry is permitted. When generating a QCSchema file for use with QCElemental, the easiest way to ensure compliance is to leave the provenance field blank, to allow the dump_one function to generate the correct provenance information. However, allowing only one entry for provenance information limits the ability to properly trace a file through several operations during complex workflows. With this in mind, IOData supports an enhanced provenance field, in the form of a list of provenance entries, with new entries appended to the end of the list.
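
The list-based provenance described above can be sketched as follows. The helper append_provenance is hypothetical and only illustrates the idea of appending new entries at the end; the values of the second entry are illustrative and not what IOData writes.

```python
def append_provenance(existing, entry):
    """Return provenance data with entry added at the end.

    existing may be None, a single entry (dict) or a list of entries.
    """
    if "creator" not in entry:
        raise ValueError("A provenance entry requires a creator field.")
    if existing is None:
        return entry  # a single entry stays a plain dict
    if isinstance(existing, dict):
        return [existing, entry]
    return list(existing) + [entry]


first = {"creator": "HORTON3", "routine": "Manual validation"}
second = {"creator": "IOData", "routine": "dump_one"}  # illustrative values
print(append_provenance(first, second))
```

Keeping a single entry as a plain dict preserves compatibility with parsers that permit only one provenance entry.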

Molecule Schema

The qcschema_molecule subschema describes a molecular system, and contains the data necessary to specify a molecular system and support I/O and manipulation processes.

The following is an example of a minimal qcschema_molecule file:

{
  "schema_name": "qcschema_molecule",
  "schema_version": 2,
  "symbols":  ["Li", "Cl"],
  "geometry": [0.000000, 0.000000, -1.631761, 0.000000, 0.000000, 0.287958],
  "molecular_charge": 0,
  "molecular_multiplicity": 1,
  "provenance": {
    "creator": "HORTON3",
    "routine": "Manual validation"
  }
}

The required fields and corresponding types for a qcschema_molecule file are:

| Field | Type | IOData attr. | Description |
|---|---|---|---|
| schema_name | str | N/A | The name of the QCSchema subschema. Fixed as qcschema_molecule. |
| schema_version | str | N/A | The version of the subschema specification. 2.0 is the current version. |
| symbols | list(N_at) | atnums | An array of the atomic symbols for the system. |
| geometry | list(3*N_at) | atcoords | An ordered array of XYZ atomic coordinates, corresponding to the order of symbols. The first three elements correspond to atom one, the second three to atom two, etc. |
| molecular_charge | float | charge | The net electrostatic charge of the molecule. Some writers assume a default of 0. |
| molecular_multiplicity | int | spinpol | The total multiplicity of this molecule. Some writers assume a default of 1. |
| provenance | dict or list | N/A | Information about how the file was generated, provided, and manipulated. See Provenance section above for more details. |

Note: N_at corresponds to the number of atoms in the molecule, as defined by the length of symbols.
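
To make the layout of the required fields concrete, the sketch below parses the minimal example with the standard json module, checks the required fields and regroups the flat geometry list into one row per atom. It is not IOData's loader: the function parse_molecule is hypothetical, and the conversion from multiplicity to spin polarization (multiplicity minus one) is stated as an assumption.

```python
import json

MINIMAL_MOLECULE = """
{
  "schema_name": "qcschema_molecule",
  "schema_version": 2,
  "symbols": ["Li", "Cl"],
  "geometry": [0.0, 0.0, -1.631761, 0.0, 0.0, 0.287958],
  "molecular_charge": 0,
  "molecular_multiplicity": 1,
  "provenance": {"creator": "HORTON3", "routine": "Manual validation"}
}
"""

REQUIRED = ["schema_name", "schema_version", "symbols", "geometry",
            "molecular_charge", "molecular_multiplicity", "provenance"]


def parse_molecule(text):
    """Check the required fields and regroup the flat geometry list."""
    data = json.loads(text)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"Missing required fields: {missing}")
    natom = len(data["symbols"])
    flat = data["geometry"]
    if len(flat) != 3 * natom:
        raise ValueError("geometry must contain 3 * N_at numbers")
    coords = [flat[3 * i:3 * i + 3] for i in range(natom)]
    # Assumption: the spin polarization is the number of unpaired
    # electrons, i.e. the multiplicity minus one.
    spinpol = data["molecular_multiplicity"] - 1
    return data["symbols"], coords, data["molecular_charge"], spinpol


symbols, coords, charge, spinpol = parse_molecule(MINIMAL_MOLECULE)
print(symbols, coords, charge, spinpol)
```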

The optional fields and corresponding types for a qcschema_molecule file are:

| Field | Type | IOData attr. | Description |
|---|---|---|---|
| atom_labels | list(N_at) | N/A | Additional per-atom labels. Typically used for model conversions, not user assignment. The indices of this array correspond to the symbols ordering. |
| atomic_numbers | list(N_at) | atnums | An array of atomic numbers for each atom. Typically inferred from symbols. |
| comment | str | N/A | Additional comments for this molecule. These comments are intended for user information, not any computational tasks. |
| connectivity | list | bonds | The connectivity information between each atom in the symbols array. Each entry in this array is a 3-item array, [index_a, index_b, bond_order], where the indices correspond to the atom indices in symbols. |
| extras | dict | N/A | Extra information to associate with this molecule. |
| fix_symmetry | str | g_rot | Maximal point group symmetry with which the molecule should be treated. |
| fragments | list(N_fr) | N/A | An array that designates which sets of atoms are fragments within the molecule. This is a nested array, with the indices of the base array corresponding to the values in fragment_charges and fragment_multiplicities and the values in the nested arrays corresponding to the indices of symbols. |
| fragment_charges | list(N_fr) | N/A | The total charge of each fragment in fragments. The indices of this array correspond to the fragments ordering. |
| fragment_multiplicities | list(N_fr) | N/A | The multiplicity of each fragment in fragments. The indices of this array correspond to the fragments ordering. |
| id | str | N/A | A unique identifier for this molecule. |
| identifiers | dict | N/A | Additional identifiers by which this molecule can be referenced, such as INCHI, SMILES, etc. |
| real | list(N_at) | atcorenums | An array indicating whether each atom is real (true) or a ghost/virtual atom (false). The indices of this array correspond to the symbols ordering. |
| mass_numbers | list(N_at) | atmasses | An array of atomic mass numbers for each atom. The indices of this array correspond to the symbols ordering. |
| masses | list(N_at) | atmasses | An array of atomic masses [u] for each atom. Typically inferred from symbols. The indices of this array correspond to the symbols ordering. |
| name | str | title | An arbitrary, common, or human-readable name to assign to this molecule. |

Note: N_at corresponds to the number of atoms in the molecule, as defined by the length of symbols; N_fr corresponds to the number of fragments in the molecule, as defined by the length of fragments. Fragment data is stored in a sub-dictionary, fragments.

The following are additional optional keywords used in QCElemental’s QCSchema implementation. These keywords mostly correspond to specific QCElemental functionality, and may not necessarily produce similar results in other QCSchema parsers.

| Field | Type | Description |
|---|---|---|
| fix_com | bool | An indicator to prevent pre-processing the molecule by translating the COM to (0,0,0) in Euclidean coordinate space. |
| fix_orientation | bool | An indicator to prevent pre-processing the molecule by orienting via the inertia tensor. |
| validated | bool | An indicator that the input molecule data has been previously checked for schema and physics (e.g. non-overlapping atoms, feasible multiplicity) compliance. Generally should only be true when set by a trusted validator. |

Input Schema

The qcschema_input subschema describes all data necessary to generate and parse a QC program input file for a given molecule.

The following is an example of a minimal qcschema_input file:

{
  "schema_name": "qcschema_input",
  "schema_version": 2.0,
  "molecule": {
    "schema_name": "qcschema_molecule",
    "schema_version": 2.0,
    "symbols":  ["Li", "Cl"],
    "geometry": [0.000000, 0.000000, -1.631761, 0.000000, 0.000000, 0.287958],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
    "provenance": {
      "creator": "HORTON3",
      "routine": "Manual validation"
    }
  },
  "driver": "energy",
  "model": {
    "method": "B3LYP",
    "basis": "Def2TZVP"
  }
}

The required fields and corresponding types for a qcschema_input file are:

| Field | Type | IOData attr. | Description |
|---|---|---|---|
| schema_name | str | N/A | The QCSchema specification to which this model conforms. Fixed as qcschema_input. |
| schema_version | float | N/A | The version number of schema_name to which this model conforms, currently 2. |
| molecule | dict | N/A | QCSchema Molecule instance. |
| driver | str | N/A | The type of calculation being performed. One of energy, gradient, hessian, or properties. |
| model | dict | N/A | The quantum chemistry model specification for a given operation to compute against. See Model section below. |

The optional fields and corresponding types for a qcschema_input file are:

Field

Type

IOData attr.

Description

extras

dict

N/A

Extra information associated with the input.

id

str

N/A

An identifier for the input object.

keywords

dict

N/A

QC program-specific keywords to be used for a computation. See details below for IOData-specific usages.

protocols

dict

N/A

Protocols regarding the manipulation of the output that results from this input. See Protocols section below.

provenance

dict or list

N/A

Information about how the file was generated, provided, and manipulated. See Provenance section above for more information.

IOData currently supports the following keywords for qcschema_input files:

Keyword

Type

IOData attr.

Description

run_type

str

run_type

The type of calculation that led to the results stored in IOData, which must be one of the following: energy, energy_force, opt, scan, freq or None.

Model Subschema

The model dict contains the following fields:

Field

Type

IOData attr.

Description

method

str

lot

The level of theory used for the computation (e.g. B3LYP, PBE, CCSD(T)).

basis

str or dict

N/A

The quantum chemistry basis set to evaluate (e.g. 6-31G, cc-pVDZ). Can be ‘none’ for methods without basis sets. Must be either a string specifying the basis set name (the same as its name in the Basis Set Exchange, when possible) or a qcschema_basis instance.

Protocols Subschema

The protocols dict contains the following fields:

Field

Type

IOData attr.

Description

wavefunction

str

N/A

Specification of the wavefunction properties to keep from the resulting output. One of all, orbitals_and_eigenvalues, return_results, or none.

keep_stdout

bool

N/A

An indicator of whether to keep the program's standard output in the resulting output.

Output Schema

The qcschema_output subschema describes all data necessary to generate and parse a QC program’s output file for a given molecule.

The following is an example of a minimal qcschema_output file:

{
  "schema_name": "qcschema_output",
  "schema_version": 2.0,
  "molecule": {
    "schema_name": "qcschema_molecule",
    "schema_version": 2.0,
    "symbols":  ["Li", "Cl"],
    "geometry": [0.000000, 0.000000, -1.631761, 0.000000, 0.000000, 0.287958],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
    "provenance": {
      "creator": "HORTON3",
      "routine": "Manual validation"
    }
  },
  "driver": "energy",
  "model": {
    "method": "HF",
    "basis": "STO-4G"
  },
  "properties": {},
  "return_result": -464.626219879,
  "success": true
}

The required fields and corresponding types for a qcschema_output file are:

Field

Type

IOData attr.

Description

schema_name

str

N/A

The QCSchema specification to which this model conforms. Fixed as qcschema_output.

schema_version

float

N/A

The version number of schema_name to which this model conforms, currently 2.

molecule

dict

N/A

QCSchema Molecule instance.

driver

str

N/A

The type of calculation being performed. One of energy, gradient, hessian, or properties.

model

dict

N/A

The quantum chemistry model specification for a given operation to compute against.

properties

dict

N/A

Named properties of quantum chemistry computations. See Properties section below.

return_result

varies

N/A

The result requested by the driver. The type depends on the driver.

success

bool

N/A

An indicator for the success of the QC program’s execution.

The optional fields and corresponding types for a qcschema_output file are:

Field

Type

IOData attr.

Description

error

dict

N/A

A complete description of an error-terminated computation. See Error section below.

extras

dict

N/A

Extra information associated with the input. Also specified for qcschema_input.

id

str

N/A

An identifier for the input object. Also specified for qcschema_input.

keywords

dict

N/A

QC program-specific keywords to be used for a computation. See details below for IOData-specific usages. Also specified for qcschema_input.

protocols

dict

N/A

Protocols regarding the manipulation of the output that results from this input. See Protocols section above. Also specified for qcschema_input.

provenance

dict or list

N/A

Information about how the file was generated, provided, and manipulated. See Provenance section above for more information. Also specified for qcschema_input.

stderr

str

N/A

The standard error (stderr) of the associated computation.

stdout

str

N/A

The standard output (stdout) of the associated computation.

wavefunction

dict

N/A

The wavefunction properties of a QC computation. All matrices appear in column-major order. See Wavefunction section below.

Properties Subschema

The properties dict contains named properties of quantum chemistry computations. Because the contents of an output file vary widely, IOData does not guess which properties the user wants and stores all properties in the extra["output"]["properties"] dict for easy retrieval. The current QCSchema standard provides names for the following properties:

Field

Description

calcinfo_nbasis

The number of basis functions for the computation.

calcinfo_nmo

The number of molecular orbitals for the computation.

calcinfo_nalpha

The number of alpha electrons in the computation.

calcinfo_nbeta

The number of beta electrons in the computation.

calcinfo_natom

The number of atoms in the computation.

nuclear_repulsion_energy

The nuclear repulsion energy term.

return_energy

The energy of the requested method, identical to return_result for energy computations.

scf_one_electron_energy

The one-electron (core Hamiltonian) energy contribution to the total SCF energy.

scf_two_electron_energy

The two-electron energy contribution to the total SCF energy.

scf_vv10_energy

The VV10 functional energy contribution to the total SCF energy.

scf_xc_energy

The functional (XC) energy contribution to the total SCF energy.

scf_dispersion_correction_energy

The dispersion correction appended to an underlying functional when a DFT-D method is requested.

scf_dipole_moment

The X, Y, and Z dipole components.

scf_total_energy

The total electronic energy of the SCF stage of the calculation.

scf_iterations

The number of SCF iterations taken before convergence.

mp2_same_spin_correlation_energy

The portion of MP2 doubles correlation energy from same-spin (i.e. triplet) correlations.

mp2_opposite_spin_correlation_energy

The portion of MP2 doubles correlation energy from opposite-spin (i.e. singlet) correlations.

mp2_singles_energy

The singles portion of the MP2 correlation energy. Zero except in ROHF.

mp2_doubles_energy

The doubles portion of the MP2 correlation energy including same-spin and opposite-spin correlations.

mp2_total_correlation_energy

The MP2 correlation energy.

mp2_correlation_energy

The MP2 correlation energy.

mp2_total_energy

The total MP2 energy (MP2 correlation energy + HF energy).

mp2_dipole_moment

The MP2 X, Y, and Z dipole components.

ccsd_same_spin_correlation_energy

The portion of CCSD doubles correlation energy from same-spin (i.e. triplet) correlations.

ccsd_opposite_spin_correlation_energy

The portion of CCSD doubles correlation energy from opposite-spin (i.e. singlet) correlations.

ccsd_singles_energy

The singles portion of the CCSD correlation energy. Zero except in ROHF.

ccsd_doubles_energy

The doubles portion of the CCSD correlation energy including same-spin and opposite-spin correlations.

ccsd_correlation_energy

The CCSD correlation energy.

ccsd_total_energy

The total CCSD energy (CCSD correlation energy + HF energy).

ccsd_dipole_moment

The CCSD X, Y, and Z dipole components.

ccsd_iterations

The number of CCSD iterations taken before convergence.

ccsd_prt_pr_correlation_energy

The CCSD(T) correlation energy.

ccsd_prt_pr_total_energy

The total CCSD(T) energy (CCSD(T) correlation energy + HF energy).

ccsd_prt_pr_dipole_moment

The CCSD(T) X, Y, and Z dipole components.

ccsd_prt_pr_iterations

The number of CCSD(T) iterations taken before convergence.

ccsdt_correlation_energy

The CCSDT correlation energy.

ccsdt_total_energy

The total CCSDT energy (CCSDT correlation energy + HF energy).

ccsdt_dipole_moment

The CCSDT X, Y, and Z dipole components.

ccsdt_iterations

The number of CCSDT iterations taken before convergence.

ccsdtq_correlation_energy

The CCSDTQ correlation energy.

ccsdtq_total_energy

The total CCSDTQ energy (CCSDTQ correlation energy + HF energy).

ccsdtq_dipole_moment

The CCSDTQ X, Y, and Z dipole components.

ccsdtq_iterations

The number of CCSDTQ iterations taken before convergence.

Error Subschema

The error dict contains the following fields:

Field

Type

IOData attr.

Description

error_type

str

N/A

The type of error raised during the computation.

error_message

str

N/A

Additional information related to the error, such as the backtrace.

extras

dict

N/A

Additional data associated with the error.
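The success flag and the error dict are typically used together when reading an output. The following sketch shows one way to do so on a plain dict; the function name result_or_raise and the error contents are hypothetical and are not part of QCSchema or IOData.

```python
def result_or_raise(output):
    """Return return_result of a qcschema_output dict, or raise if the run failed."""
    if output.get("success"):
        return output["return_result"]
    # The error dict is optional, so fall back to generic text when it is absent.
    error = output.get("error", {})
    raise RuntimeError("{}: {}".format(
        error.get("error_type", "unknown_error"),
        error.get("error_message", "no message provided"),
    ))


# A successful output, reduced to the two fields used here.
ok = {"success": True, "return_result": -464.626219879}
# A failed output with hypothetical error contents, for illustration only.
failed = {
    "success": False,
    "error": {
        "error_type": "convergence_error",
        "error_message": "SCF did not converge",
    },
}

print(result_or_raise(ok))
try:
    result_or_raise(failed)
except RuntimeError as exc:
    print("computation failed:", exc)
```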

Wavefunction Subschema

The wavefunction subschema contains the wavefunction properties of a QC computation. All matrices appear in column-major order. The current QCSchema standard provides names for the following wavefunction properties:

The CCA ordering as implemented in libint is described at https://github.com/evaleev/libint/wiki/using-modern-CPlusPlus-API#solid-harmonic-gaussians-ordering-and-normalization

Field

Description

basis

A qcschema_basis instance for the one-electron AO basis set. AO basis functions are ordered according to the CCA standard as implemented in libint.

restricted

An indicator for a restricted calculation (alpha == beta). When true, all beta quantities are omitted, since quantity_b == quantity_a.

h_core_a

Alpha-spin core (one-electron) Hamiltonian.

h_core_b

Beta-spin core (one-electron) Hamiltonian.

h_effective_a

Alpha-spin effective core (one-electron) Hamiltonian.

h_effective_b

Beta-spin effective core (one-electron) Hamiltonian.

scf_orbitals_a

Alpha-spin SCF orbitals.

scf_orbitals_b

Beta-spin SCF orbitals.

scf_density_a

Alpha-spin SCF density matrix.

scf_density_b

Beta-spin SCF density matrix.

scf_fock_a

Alpha-spin SCF Fock matrix.

scf_fock_b

Beta-spin SCF Fock matrix.

scf_eigenvalues_a

Alpha-spin SCF eigenvalues.

scf_eigenvalues_b

Beta-spin SCF eigenvalues.

scf_occupations_a

Alpha-spin SCF orbital occupations.

scf_occupations_b

Beta-spin SCF orbital occupations.

orbitals_a

Keyword for the primary return alpha-spin orbitals.

orbitals_b

Keyword for the primary return beta-spin orbitals.

density_a

Keyword for the primary return alpha-spin density.

density_b

Keyword for the primary return beta-spin density.

fock_a

Keyword for the primary return alpha-spin Fock matrix.

fock_b

Keyword for the primary return beta-spin Fock matrix.

eigenvalues_a

Keyword for the primary return alpha-spin eigenvalues.

eigenvalues_b

Keyword for the primary return beta-spin eigenvalues.

occupations_a

Keyword for the primary return alpha-spin orbital occupations.

occupations_b

Keyword for the primary return beta-spin orbital occupations.

Filename patterns: *.json

iodata.formats.json.load_one()
  • Always loads atnums, atcorenums, atcoords, charge, nelec, spinpol

  • May load atmasses, bonds, energy, g_rot, lot, obasis, obasis_name, title, extra

iodata.formats.json.dump_one()
  • Requires atnums, atcoords, charge, spinpol

  • May dump title, atcorenums, atmasses, bonds, g_rot, extra

VASP 5 LOCPOT file format (locpot)

This format is used by VASP 5.X and VESTA.

Note that even though the CHGCAR and LOCPOT files look very similar, they require different conversions to atomic units.

Filename patterns: LOCPOT*

iodata.formats.locpot.load_one()
  • Always loads atcoords, atnums, cellvecs, cube, title

MOL2 file format (mol2)

Several variants of the mol2 format exist. This implementation targets compatibility with the AMBER software: the main objective is to write files with atomic charges that can be used by antechamber.

Filename patterns: *.mol2

iodata.formats.mol2.load_one()
  • Always loads atcoords, atnums, atcharges, atffparams

  • May load title

iodata.formats.mol2.dump_one()
  • Requires atcoords, atnums

  • May dump atcharges, atffparams, title

iodata.formats.mol2.load_many()
  • Always loads atcoords, atnums, atcharges, atffparams

  • May load title

iodata.formats.mol2.dump_many()
  • Requires atcoords, atnums, atcharges

  • May dump title

Molden file format (molden)

Many QC codes can write out Molden files, e.g. Molpro, Orca, PSI4, Molden, Turbomole. Keep in mind that several of these write incorrect versions of the file format, but these errors are corrected when loading them with IOData.

Filename patterns: *.molden.input, *.molden

iodata.formats.molden.load_one()
  • Always loads atcoords, atnums, atcorenums, mo, obasis

  • May load title

  • Keyword arguments norm_threshold

iodata.formats.molden.dump_one()
  • Requires atcoords, atnums, mo, obasis

  • May dump atcorenums, title

Molekel file format (molekel)

This format is used by two programs: Molekel and Orca.

Filename patterns: *.mkl

iodata.formats.molekel.load_one()
  • Always loads atcoords, atnums, mo, obasis

  • May load atcharges

  • Keyword arguments norm_threshold

iodata.formats.molekel.dump_one()
  • Requires atcoords, atnums, mo, obasis

  • May dump atcharges

Multiwfn MWFN file format (mwfn)

Filename patterns: *.mwfn

iodata.formats.mwfn.load_one()
  • Always loads atcoords, atnums, atcorenums, energy, mo, obasis, extra, title

Orca output file format (orcalog)

Filename patterns: *.out

iodata.formats.orcalog.load_one()
  • Always loads atcoords, atnums, energy, moments, extra

PDB file format (pdb)

Several versions of the pdb format exist. The convention used here is the most recent one, described at: http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html

Filename patterns: *.pdb

iodata.formats.pdb.load_one()
  • Always loads atcoords, atnums, atffparams, extra

  • May load title, bonds

iodata.formats.pdb.dump_one()
  • Requires atcoords, atnums, extra

  • May dump atffparams, title, bonds

iodata.formats.pdb.load_many()
  • Always loads atcoords, atnums, atffparams, extra

  • May load title

iodata.formats.pdb.dump_many()
  • Requires atcoords, atnums, extra

  • May dump atffparams, title

VASP 5 POSCAR file format (poscar)

This format is used by VASP 5.X and VESTA.

Filename patterns: POSCAR*

iodata.formats.poscar.load_one()
  • Always loads atcoords, atnums, cellvecs, title

iodata.formats.poscar.dump_one()
  • Requires atcoords, atnums, cellvecs

  • May dump title

Q-Chem Log file format (qchemlog)

This module loads a Q-Chem log file into IOData.

Filename patterns: *.qchemlog

iodata.formats.qchemlog.load_one()
  • Always loads atcoords, atmasses, atnums, energy, g_rot, mo, lot, obasis_name, run_type, extra

  • May load athessian

SDF file format (sdf)

Usually, the different frames in a trajectory describe different geometries of the same molecule, with atoms in the same order. The load_many and dump_many functions below can also handle an SDF file with different molecules, e.g. a molecular database.

The SDF format is somewhat documented on the following page: http://www.nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx

This format is one of the chemical table file formats: https://en.wikipedia.org/wiki/Chemical_table_file

Filename patterns: *.sdf

iodata.formats.sdf.load_one()
  • Always loads atcoords, atnums, bonds, title

iodata.formats.sdf.dump_one()
  • Requires atcoords, atnums

  • May dump title, bonds

iodata.formats.sdf.load_many()
  • Always loads atcoords, atnums, bonds, title

iodata.formats.sdf.dump_many()
  • Requires atcoords, atnums

  • May dump title, bonds

Gaussian/GAMESS-US WFN file format (wfn)

Only use this format if the program that generated it does not offer any alternatives that IOData can load. The WFN format has the disadvantage that it cannot represent contractions and therefore expands all orbitals into a decontracted basis. This makes the post-processing less efficient compared to formats that do support contractions of Gaussian functions.

Filename patterns: *.wfn

iodata.formats.wfn.load_one()
  • Always loads atcoords, atnums, energy, mo, obasis, title, extra

iodata.formats.wfn.dump_one()
  • Requires atcoords, atnums, energy, mo, obasis, title, extra

AIM/AIMAll WFX file format (wfx)

See http://aim.tkgristmill.com/wfxformat.html

Filename patterns: *.wfx

iodata.formats.wfx.load_one()
  • Always loads atcoords, atgradient, atnums, energy, extra, mo, obasis, title

iodata.formats.wfx.dump_one()
  • Requires atcoords, atnums, atcorenums, mo, obasis, charge

  • May dump title, energy, spinpol, lot, atgradient, extra

XYZ file format (xyz)

Usually, the different frames in a trajectory describe different geometries of the same molecule, with atoms in the same order. The load_many and dump_many functions below can also handle an XYZ file with different molecules, e.g. a molecular database.

The load_* and dump_* functions all accept the optional argument atom_columns. This argument fixes the meaning of the columns to be loaded from or dumped to an XYZ file. The following example defines, in addition to the conventional columns, also a column with atomic charges and three columns with atomic forces.

import iodata.formats.xyz
from iodata import load_one

atom_columns = iodata.formats.xyz.DEFAULT_ATOM_COLUMNS + [
    # Atomic charges are stored in a dictionary atcharges and the key
    # refers to the name of the partitioning method.
    ("atcharges", "mulliken", (), float, float, "{:10.5f}".format),
    # Note that in IOData, the energy gradient is stored, which contains the
    # negative forces.
    ("atgradient", None, (3,), float,
     (lambda word: -float(word)),
     (lambda value: "{:15.10f}".format(-value)))
]

# Load an XYZ file that contains the extra charge and force columns.
mol = load_one("test.xyz", atom_columns=atom_columns)
# The following attributes are present:
print(mol.atnums)
print(mol.atcoords)
print(mol.atcharges["mulliken"])
print(mol.atgradient)

When defining atom_columns, no columns can be skipped, such that all information loaded from a file can also be written back out when dumping it.
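The last two items of each column definition in the example above are the converters used when loading and dumping. The sketch below replays the force column converters on a single value to show the sign flip and the round trip; the input value is hypothetical and the two function names are introduced only for illustration.

```python
def load_word(word):
    """Convert a force component read from the file into a gradient component."""
    return -float(word)


def dump_value(value):
    """Convert a gradient component back into the text of a force component."""
    return "{:15.10f}".format(-value)


force_in_file = "0.0125000000"       # hypothetical force component, as text
gradient = load_word(force_in_file)  # value kept in memory (negative force)
written = dump_value(gradient)       # text written back to the file

print(gradient)
print(repr(written))
```

Because the two converters are inverses of each other, dumping a loaded value reproduces the original text apart from the leading padding.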

Filename patterns: *.xyz

iodata.formats.xyz.load_one()
  • Always loads atcoords, atnums, title

  • Keyword arguments atom_columns

iodata.formats.xyz.dump_one()
  • Requires atcoords, atnums

  • May dump title

  • Keyword arguments atom_columns

iodata.formats.xyz.load_many()
  • Always loads atcoords, atnums, title

  • Keyword arguments atom_columns

iodata.formats.xyz.dump_many()
  • Requires atcoords, atnums

  • May dump title

  • Keyword arguments atom_columns

Supported Input Formats

Gaussian Input Module (gaussian)

iodata.formats.gaussian.write_input()
  • Requires atnums, atcoords

  • May use title, run_type, lot, obasis_name, spinmult, charge

Default Template
'''\
#n {lot}/{obasis_name} {run_type}

{title}

{charge} {spinmult}
{geometry}


'''
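The default template is an ordinary Python format string, so its behavior can be illustrated with str.format alone. This is a sketch under stated assumptions: the field values are hypothetical, and the layout of the geometry lines is an assumption rather than the exact text that write_input() produces.

```python
# The default Gaussian template shown above.
TEMPLATE = '''\
#n {lot}/{obasis_name} {run_type}

{title}

{charge} {spinmult}
{geometry}


'''

# Hypothetical field values; the geometry line format is an assumption.
fields = {
    "lot": "B3LYP",
    "obasis_name": "6-31G",
    "run_type": "opt",
    "title": "Water",
    "charge": 0,
    "spinmult": 1,
    "geometry": "\n".join([
        "O   0.000000   0.000000   0.117790",
        "H   0.000000   0.755453  -0.471161",
        "H   0.000000  -0.755453  -0.471161",
    ]),
}

# Substitute every {field} placeholder in the template.
text = TEMPLATE.format(**fields)
print(text)
```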

Orca Input Module (orca)

iodata.formats.orca.write_input()
  • Requires atnums, atcoords

  • May use title, run_type, lot, obasis_name, spinmult, charge

Default Template
'''\
! {lot} {obasis_name} {run_type}
# {title}
*xyz {charge} {spinmult}
{geometry}
*
'''

Basis set conventions

IOData can load molecular orbital coefficients, density matrices and atomic orbital basis sets from various file formats, and it can also write orbitals and the basis sets in the Molden format. To achieve an unambiguous numerical representation of these objects, conventions for the ordering of basis functions (within one shell) and the normalization of Gaussian primitives must be fixed.

IOData does not use hard-coded conventions but keeps track of them in attributes of IOData.obasis. This attribute is an instance of the iodata.basis.MolecularBasis class, of which the conventions and primitive_normalization attributes contain all the relevant information.

For the time being, the primitive_normalization is always set to 'L2', meaning that the contraction coefficients assume L2-normalized Gaussian primitives. However, IOData does not enforce normalized contractions.

The first subsection provides a mathematical definition of the Gaussian basis functions, which is followed by the specification of the conventions attribute of the MolecularBasis class.

Gaussian basis functions

IOData supports contracted Gaussian basis functions, which have in general the following form:

\[b(\mathbf{r}; D_1, \ldots, D_K, P, \alpha_1, \ldots, \alpha_K, \mathbf{r}_A) = \sum_{k=1}^K D_k N(\alpha_k, P) P(\mathbf{r} - \mathbf{r}_A) \exp(-\alpha_k \Vert \mathbf{r} - \mathbf{r}_A \Vert^2)\]

where \(K\) is the contraction length, \(D_k\) is a contraction coefficient, \(N\) is a normalization constant, \(P\) is a Cartesian polynomial, \(\alpha_k\) is an exponent and \(\mathbf{r}_A\) is the center of the basis function. The summation over \(k\) is conventionally called a contraction of primitive Gaussian basis functions. The L2-normalization of each primitive depends on both the polynomial and the exponent and is defined by the following relation:

\[\int \Bigl\vert N(\alpha_k, P) P(\mathbf{r} - \mathbf{r}_A) \exp(-\alpha_k \Vert \mathbf{r} - \mathbf{r}_A \Vert^2) \Bigr\vert^2 d\mathbf{r} = 1\]

Two types of polynomials will be defined below: Cartesian and pure (harmonic) basis functions.

Cartesian basis functions

When the polynomial consists of a single term as follows:

\[P(x,y,z) = x^{n_x} y^{n_y} z^{n_z}\]

with \(n_x\), \(n_y\), \(n_z\) non-negative integer powers, one speaks of Cartesian Gaussian basis functions. One refers to the sum of the powers as the angular momentum of the Cartesian Gaussian basis.

The normalization constant of a primitive function is:

\[N(\alpha_k, n_x, n_y, n_z) = \sqrt{\frac {(2\alpha_k/\pi)^{3/2} (4\alpha_k)^{n_x+n_y+n_z}} {(2n_x-1)!! (2n_y-1)!! (2n_z-1)!!} }\]

In practice one combines all basis functions of a given angular momentum (or algebraic order) into one shell. A basis specification typically only mentions the total angular momentum, and it is assumed that all polynomials of that order are included in the basis set. The number of basis functions, i.e. the number of polynomials, for a given angular momentum, \(\ell=n_x+n_y+n_z\), is \((\ell+1)(\ell+2)/2\).
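The normalization constant and the shell size given above are simple enough to evaluate directly. The following sketch is not IOData's implementation; the function names are hypothetical. It also checks the s-type and p-type cases against the analytic Gaussian integrals.

```python
import math


def double_factorial(n):
    """Return n!!, with the convention (-1)!! = 0!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def cartesian_norm(alpha, nx, ny, nz):
    """L2-normalization constant of a Cartesian Gaussian primitive."""
    numer = (2 * alpha / math.pi) ** 1.5 * (4 * alpha) ** (nx + ny + nz)
    denom = (
        double_factorial(2 * nx - 1)
        * double_factorial(2 * ny - 1)
        * double_factorial(2 * nz - 1)
    )
    return math.sqrt(numer / denom)


def num_cartesian(angmom):
    """Number of Cartesian basis functions in a shell with the given angular momentum."""
    return (angmom + 1) * (angmom + 2) // 2


alpha = 0.5
# Analytic integrals of exp(-2 alpha r^2) and x^2 exp(-2 alpha r^2) over all space.
integral_s = (math.pi / (2 * alpha)) ** 1.5
integral_px = integral_s / (4 * alpha)
print(cartesian_norm(alpha, 0, 0, 0) ** 2 * integral_s)   # close to 1
print(cartesian_norm(alpha, 1, 0, 0) ** 2 * integral_px)  # close to 1
print([num_cartesian(angmom) for angmom in range(4)])
```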

Pure or harmonic basis functions

When the polynomial is a real regular solid harmonic, one speaks of pure Gaussian basis functions:

\[P(r,\theta,\phi) = C_{\ell m}(r,\theta,\phi) \quad \text{or} \quad P(r,\theta,\phi) = S_{\ell m}(r,\theta,\phi)\]

where \(C_{\ell m}\) and \(S_{\ell m}\) are cosine- and sine-like real regular solid harmonics, defined for \(\ell \ge 0\) as follows:

\[\begin{split}C_{\ell 0}(r,\theta,\phi) &= R_\ell^0(r,\theta,\phi) \\ C_{\ell m}(r,\theta,\phi) &= \sqrt{2} (-1)^m \operatorname{Re} R_\ell^m(r,\theta,\phi) \quad m = 1\ldots \ell \\ S_{\ell m}(r,\theta,\phi) &= \sqrt{2} (-1)^m \operatorname{Im} R_\ell^m(r,\theta,\phi) \quad m = 1\ldots \ell\end{split}\]

where \(R_\ell^m\) are the regular solid harmonics, which have in general complex function values. The factor \((-1)^m\) undoes the Condon-Shortley phase. In these equations, spherical coordinates are used:

\[\begin{split}x &= r\sin\theta\cos\phi \\ y &= r\sin\theta\sin\phi \\ z &= r\cos\theta\end{split}\]

The regular solid harmonics are derived from the standard spherical harmonics, \(Y_\ell^m\), as follows:

\[\begin{split}R_\ell^m(r, \theta, \phi) &= \sqrt{\frac{4\pi}{2\ell+1}} \, r^\ell \, Y_\ell^m(\theta, \phi) \\ &= \sqrt{\frac{(\ell-m)!}{(\ell+m)!}} \, r^\ell \, P_\ell^m(\cos{\theta}) \, e^{i m \phi}\end{split}\]

where \(P_\ell^m\) are the associated Legendre functions. After substituting this definition of the regular solid harmonics into the real forms, one obtains:

\[\begin{split}C_{\ell 0}(r,\theta,\phi) & = P_\ell^0(\cos{\theta}) \, r^\ell \\ C_{\ell m}(r,\theta,\phi) & = (-1)^m \sqrt{\frac{2(\ell-m)!}{(\ell+m)!}} \, r^\ell \, P_\ell^m(\cos{\theta}) \, \cos(m \phi) \quad m = 1\ldots \ell \\ S_{\ell m}(r,\theta,\phi) & = (-1)^m \sqrt{\frac{2(\ell-m)!}{(\ell+m)!}} \, r^\ell \, P_\ell^m(\cos{\theta}) \, \sin(m \phi) \quad m = 1\ldots \ell \\\end{split}\]

Also here, the factor \((-1)^m\) cancels out the Condon-Shortley phase. These expressions show that cosine-like functions contain a factor \(\cos(m \phi)\), and similarly the sine-like functions contain a factor \(\sin(m \phi)\). The factor \(r^\ell\) causes real regular solid harmonics to be homogeneous Cartesian polynomials, i.e. linear combinations of the Cartesian polynomials defined in the previous subsection.

Real regular solid harmonics are used because the pure s- and p-type functions are consistent with their Cartesian counterparts:

\[\begin{split}C_{00}(x,y,z) & = 1 \\ C_{10}(x,y,z) & = z \\ C_{11}(x,y,z) & = x \\ S_{11}(x,y,z) & = y \\ \dots &\end{split}\]

The normalization constant of a pure Gaussian basis function is:

\[N(\alpha_k, \ell) = \sqrt{\frac {(2\alpha_k/\pi)^{3/2} (4\alpha_k)^\ell} {(2\ell-1)!!} }\]

In practical applications, all the basis functions of a given angular momentum are used and grouped into a shell. A basis specification typically only mentions the total angular momentum, and it is assumed that all polynomials of that order are included in the basis set. The number of basis functions, i.e. the number of polynomials, for a given angular momentum, \(\ell\), is \(2\ell+1\).

The conventions attribute

Different file formats supported by IOData have an incompatible ordering of basis functions within one shell. Also the sign conventions may differ from the definitions given above. The conventions attribute of iodata.basis.MolecularBasis specifies the ordering and sign flips relative to the above definitions. It is a dictionary,

  • whose keys are tuples denoting a shell type (angmom, char) where angmom is a non-negative integer denoting the angular momentum and char is either 'c' or 'p' for Cartesian or pure, respectively

  • and whose values are lists of basis function strings, where each string denotes one basis function.

A basis function string has a one-to-one correspondence to the Cartesian or pure polynomials defined above.

  • In case of Cartesian functions, \(x^{n_x} y^{n_y} z^{n_z}\) is represented by the string 'x' * nx + 'y' * ny + 'z' * nz, except for the s-type function, which is represented by '1'.

  • In case of pure functions, \(C_{\ell m}\) is represented by 'c{}'.format(m) and \(S_{\ell m}\) by 's{}'.format(m). The angular momentum quantum number is not included because it is implied by the key in the conventions dictionary.

Each basis function string can be prefixed with a minus sign, to denote a sign flip with respect to the definitions on this page. The order of the string in the list defines the order of the corresponding basis functions within one shell.
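The string representation described above can be generated programmatically. The sketch below uses hypothetical helper names and picks one arbitrary ordering (alphabetical for Cartesian functions, increasing m for pure functions); it does not reproduce the convention of any particular file format.

```python
def cartesian_strings(angmom):
    """Return alphabetically ordered strings for the Cartesian functions of one shell."""
    if angmom == 0:
        # The s-type function is represented by '1'.
        return ["1"]
    result = []
    for nx in range(angmom, -1, -1):
        for ny in range(angmom - nx, -1, -1):
            nz = angmom - nx - ny
            result.append("x" * nx + "y" * ny + "z" * nz)
    return result


def pure_strings(angmom):
    """Return strings for the pure functions of one shell, ordered by increasing m."""
    result = ["c0"]
    for m in range(1, angmom + 1):
        result.append("c{}".format(m))
        result.append("s{}".format(m))
    return result


print(cartesian_strings(2))
print(pure_strings(2))
```

The list lengths agree with the shell sizes given earlier: (angmom + 1)(angmom + 2)/2 Cartesian functions and 2 angmom + 1 pure functions.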

For example, pure and Cartesian s, p and d functions in Gaussian FCHK files adhere to the following convention:

conventions = {
    (0, 'c'): ['1'],
    (1, 'c'): ['x', 'y', 'z'],
    (2, 'c'): ['xx', 'yy', 'zz', 'xy', 'xz', 'yz'],
    (2, 'p'): ['c0', 'c1', 's1', 'c2', 's2'],
}

(Pure s and p functions are never used in a Gaussian FCHK file.)
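Converting data between two conventions amounts to a permutation combined with sign changes. The following sketch derives both from two lists of basis function strings. It is an independent illustration, not IOData's own conversion code, and the function name, the alphabetical target ordering and the coefficient values are hypothetical.

```python
def permutation_and_signs(source, target):
    """For each function in target, return its index in source and the relative sign."""
    # Map each unsigned name in the source convention to (position, sign).
    lookup = {}
    for index, name in enumerate(source):
        sign = -1 if name.startswith("-") else 1
        lookup[name.lstrip("-")] = (index, sign)
    permutation = []
    signs = []
    for name in target:
        sign = -1 if name.startswith("-") else 1
        index, source_sign = lookup[name.lstrip("-")]
        permutation.append(index)
        signs.append(sign * source_sign)
    return permutation, signs


# Cartesian d functions: FCHK ordering (from the example above) to alphabetical ordering.
fchk_d = ["xx", "yy", "zz", "xy", "xz", "yz"]
alphabetical_d = ["xx", "xy", "xz", "yy", "yz", "zz"]
perm, signs = permutation_and_signs(fchk_d, alphabetical_d)

# Hypothetical coefficients stored in the FCHK ordering.
coeffs = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
converted = [sign * coeffs[index] for index, sign in zip(perm, signs)]
print(perm, signs)
print(converted)
```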

Notes on other conventions

To avoid confusion, negative magnetic quantum numbers are never used to label pure functions in IOData. The basis strings contain ‘c’ and ‘s’ instead. In the literature, e.g. in the book Molecular Electronic-Structure Theory by Helgaker, Jørgensen and Olsen, negative magnetic quantum numbers for pure functions usually refer to sine-like functions:

\[\begin{split}R_{\ell, m} &= C_{\ell m} \quad m = 0 \ldots \ell \\ R_{\ell, -m} &= S_{\ell m} \quad m = 1 \ldots \ell\end{split}\]

Note that \(\ell\) and \(m\) both appear as subscripts in \(R_{\ell, m}\) and \(R_{\ell, -m}\) to tell them apart from their complex counterparts.

Transformation from Cartesian to pure functions

Pure Gaussian primitives can be written as linear combinations of Cartesian ones. Hence, integrals over Cartesian functions can also be transformed into integrals over pure primitives. This transformation is the last step in the calculation of the overlap matrix in IOData:

  1. Integrals are first computed for Gaussian primitives without normalization.

  2. Normalization constants for Cartesian primitives are multiplied into the integrals.

  3. Integrals over primitives are contracted.

  4. Optionally, the integrals for Cartesian functions are transformed into integrals for pure functions.

For the last step, pre-computed transformation matrices (generated by tools/harmonics.py) are stored in iodata/overlap_cartpure.py using the HORTON2_CONVENTIONS. The derivation of these transformation matrices is explained below.

Recursive computation of real regular solid harmonics

First, we construct two sets of recursion relations for \(\phi\) and \(\theta\) separately. These will be combined to form the final set of recursion relations that directly operate on the real regular solid harmonics. In these two sets, the notation \(\rho = \sqrt{x^2 + y^2}\) is used.

The first set of recursion relations starts from a fairly trivial idea:

\[\begin{split}\rho^m [\cos(m\phi) + i\sin(m\phi)] &= \rho^m \exp(im\phi) \\ &= \rho \exp(i\phi) \; \rho^{m-1}\exp(i(m-1)\phi) \\ &= (x + iy) \; \rho^{m-1} [\cos((m-1)\phi) + i\sin((m-1)\phi)]\end{split}\]

Taking the real and imaginary parts and dividing by \(\rho^{m-1}\) gives:

\[\begin{split}\rho \cos(\phi) &= x \\ \rho \sin(\phi) &= y \\ \rho \cos(m\phi) &= x \cos((m-1)\phi) - y \sin((m-1)\phi) \\ \rho \sin(m\phi) &= x \sin((m-1)\phi) + y \cos((m-1)\phi)\end{split}\]

Second, recursion relations for associated Legendre functions can be modified to contain \(r\), \(z\) and \(\rho\), such that \(\cos\theta\) does not appear explicitly:

\[\begin{split}P_0^0(\cos\theta) &= 1 \\ r^\ell P_\ell^\ell(\cos\theta) &= -(2\ell - 1) \rho \; r^{\ell-1} P_{\ell-1}^{\ell-1}(\cos\theta) \\ r^{\ell} P_{\ell}^{\ell-1}(\cos\theta) &= (2\ell - 1) z \; r^{\ell-1} P_{\ell-1}^{\ell-1}(\cos\theta) \\ r^\ell P_{\ell}^{m}(\cos\theta) &= \frac{2\ell - 1}{\ell - m} z \; r^{\ell-1} P_{\ell-1}^{m}(\cos\theta) -\frac{\ell + m - 1}{\ell - m} r^2 \; r^{\ell-2} P_{\ell-2}^{m}(\cos\theta)\end{split}\]

The two sets could be used separately to construct real regular solid harmonics, but they feature \(\rho=\sqrt{x^2+y^2}\), while the regular solid harmonics should be homogeneous polynomials. We can get rid of \(\rho\) by combining the two sets into one:

\[\begin{split}C_{0,0} ={}& 1 \\ C_{1,0} ={}& z \\ C_{1,1} ={}& x \\ S_{1,1} ={}& y \\ C_{\ell,\ell} ={}& \sqrt{\frac{2\ell-1}{2\ell}} \; \Bigl[x C_{\ell-1,\ell-1} - y S_{\ell-1,\ell-1} \Bigr] \quad \forall \; \ell > 1 \\ S_{\ell,\ell} ={}& \sqrt{\frac{2\ell-1}{2\ell}} \; \Bigl[x S_{\ell-1,\ell-1} + y C_{\ell-1,\ell-1} \Bigr] \quad \forall \; \ell > 1 \\ \{CS\}_{\ell,\ell-1} ={}& z \sqrt{2\ell-1} \; \{CS\}_{\ell-1, \ell-1} \quad \forall \; \ell > 1 \\ \{CS\}_{\ell,m} ={}& \frac{(2\ell - 1)z}{\sqrt{(\ell+m)(\ell-m)}} \{CS\}_{\ell-1,m} \nonumber \\ & - r^2 \sqrt{\frac{(\ell - m - 1)(\ell + m - 1)}{(\ell + m)(\ell - m)}} \{CS\}_{\ell - 2,m} \nonumber \\ & \quad \forall \; \ell > m + 1 \text{ and } m \ge 0\end{split}\]

These equations show that real regular solid harmonics are homogeneous polynomials in \(x\), \(y\) and \(z\). Advantages of this approach are (i) the absence of trigonometric expressions and (ii) the similarity between cosine and sine expressions. (Coefficients can be reused.) These recursion relations should be numerically stable for the computation of real regular solid harmonics as a function of Cartesian coordinates. They can also be used to build a transformation matrix from Cartesian monomials into real regular solid harmonics.
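The combined recursion relations can be evaluated numerically with a short function. This is a minimal sketch written directly from the equations above, not the code in tools/harmonics.py; the function name and the returned data layout are hypothetical. Sine-like entries with m = 0 are stored as zeros so that the same recursion can be applied to both sets.

```python
import math


def real_solid_harmonics(lmax, x, y, z):
    """Evaluate C_{l,m} and S_{l,m} for l <= lmax; returns two dicts keyed by (l, m)."""
    r2 = x * x + y * y + z * z
    c = {(0, 0): 1.0}
    s = {(0, 0): 0.0}
    if lmax >= 1:
        c[(1, 0)], c[(1, 1)] = z, x
        s[(1, 0)], s[(1, 1)] = 0.0, y
    for l in range(2, lmax + 1):
        # Highest order, m = l.
        factor = math.sqrt((2 * l - 1) / (2 * l))
        c[(l, l)] = factor * (x * c[(l - 1, l - 1)] - y * s[(l - 1, l - 1)])
        s[(l, l)] = factor * (x * s[(l - 1, l - 1)] + y * c[(l - 1, l - 1)])
        # Next-to-highest order, m = l - 1.
        factor = z * math.sqrt(2 * l - 1)
        c[(l, l - 1)] = factor * c[(l - 1, l - 1)]
        s[(l, l - 1)] = factor * s[(l - 1, l - 1)]
        # Remaining orders, l > m + 1.
        for m in range(l - 1):
            f1 = (2 * l - 1) * z / math.sqrt((l + m) * (l - m))
            f2 = r2 * math.sqrt((l - m - 1) * (l + m - 1) / ((l + m) * (l - m)))
            c[(l, m)] = f1 * c[(l - 1, m)] - f2 * c[(l - 2, m)]
            s[(l, m)] = f1 * s[(l - 1, m)] - f2 * s[(l - 2, m)]
    return c, s


# Compare with closed-form polynomials at an arbitrary point.
x, y, z = 0.3, -0.7, 1.1
c, s = real_solid_harmonics(3, x, y, z)
print(c[(2, 0)], z * z - 0.5 * (x * x + y * y))
print(s[(2, 2)], math.sqrt(3) * x * y)
print(c[(3, 0)], z ** 3 - 1.5 * z * (x * x + y * y))
```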

Transformation matrices without normalization

The above recursion relations result in the following transformation matrices. These were obtained by running:

python tools/harmonics.py none latex 3
\[\begin{split}\left(\begin{array}{c} b(C_{20}) \\ b(C_{21}) \\ b(S_{21}) \\ b(C_{22}) \\ b(S_{22}) \end{array}\right) &= \left(\begin{array}{cccccc} - \frac{1}{2} & \cdot & \cdot & - \frac{1}{2} & \cdot & 1 \\ \cdot & \cdot & \sqrt{3} & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \sqrt{3} & \cdot \\ \frac{\sqrt{3}}{2} & \cdot & \cdot & - \frac{\sqrt{3}}{2} & \cdot & \cdot \\ \cdot & \sqrt{3} & \cdot & \cdot & \cdot & \cdot \\ \end{array}\right) \left(\begin{array}{c} b(xx) \\ b(xy) \\ b(xz) \\ b(yy) \\ b(yz) \\ b(zz) \end{array}\right) \\ \left(\begin{array}{c} b(C_{30}) \\ b(C_{31}) \\ b(S_{31}) \\ b(C_{32}) \\ b(S_{32}) \\ b(C_{33}) \\ b(S_{33}) \end{array}\right) &= \left(\begin{array}{cccccccccc} \cdot & \cdot & - \frac{3}{2} & \cdot & \cdot & \cdot & \cdot & - \frac{3}{2} & \cdot & 1 \\ - \frac{\sqrt{6}}{4} & \cdot & \cdot & - \frac{\sqrt{6}}{4} & \cdot & \sqrt{6} & \cdot & \cdot & \cdot & \cdot \\ \cdot & - \frac{\sqrt{6}}{4} & \cdot & \cdot & \cdot & \cdot & - \frac{\sqrt{6}}{4} & \cdot & \sqrt{6} & \cdot \\ \cdot & \cdot & \frac{\sqrt{15}}{2} & \cdot & \cdot & \cdot & \cdot & - \frac{\sqrt{15}}{2} & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & \sqrt{15} & \cdot & \cdot & \cdot & \cdot & \cdot \\ \frac{\sqrt{10}}{4} & \cdot & \cdot & - \frac{3 \sqrt{10}}{4} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \frac{3 \sqrt{10}}{4} & \cdot & \cdot & \cdot & \cdot & - \frac{\sqrt{10}}{4} & \cdot & \cdot & \cdot \\ \end{array}\right) \left(\begin{array}{c} b(xxx) \\ b(xxy) \\ b(xxz) \\ b(xyy) \\ b(xyz) \\ b(xzz) \\ b(yyy) \\ b(yyz) \\ b(yzz) \\ b(zzz) \end{array}\right)\end{split}\]
Taking into account normalization

For the calculation of the overlap matrix, the transformations need to be modified, to transform normalized Cartesian functions into normalized pure functions. Accounting for normalization yields slightly different matrices shown below. These were obtained by running:

python tools/harmonics.py L2 latex 3
\[\begin{split}\left(\begin{array}{c} b(C_{20}) \\ b(C_{21}) \\ b(S_{21}) \\ b(C_{22}) \\ b(S_{22}) \end{array}\right) &= \left(\begin{array}{cccccc} - \frac{1}{2} & \cdot & \cdot & - \frac{1}{2} & \cdot & 1 \\ \cdot & \cdot & 1 & \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & 1 & \cdot \\ \frac{\sqrt{3}}{2} & \cdot & \cdot & - \frac{\sqrt{3}}{2} & \cdot & \cdot \\ \cdot & 1 & \cdot & \cdot & \cdot & \cdot \\ \end{array}\right) \left(\begin{array}{c} b(xx) \\ b(xy) \\ b(xz) \\ b(yy) \\ b(yz) \\ b(zz) \end{array}\right) \\ \left(\begin{array}{c} b(C_{30}) \\ b(C_{31}) \\ b(S_{31}) \\ b(C_{32}) \\ b(S_{32}) \\ b(C_{33}) \\ b(S_{33}) \end{array}\right) &= \left(\begin{array}{cccccccccc} \cdot & \cdot & - \frac{3 \sqrt{5}}{10} & \cdot & \cdot & \cdot & \cdot & - \frac{3 \sqrt{5}}{10} & \cdot & 1 \\ - \frac{\sqrt{6}}{4} & \cdot & \cdot & - \frac{\sqrt{30}}{20} & \cdot & \frac{\sqrt{30}}{5} & \cdot & \cdot & \cdot & \cdot \\ \cdot & - \frac{\sqrt{30}}{20} & \cdot & \cdot & \cdot & \cdot & - \frac{\sqrt{6}}{4} & \cdot & \frac{\sqrt{30}}{5} & \cdot \\ \cdot & \cdot & \frac{\sqrt{3}}{2} & \cdot & \cdot & \cdot & \cdot & - \frac{\sqrt{3}}{2} & \cdot & \cdot \\ \cdot & \cdot & \cdot & \cdot & 1 & \cdot & \cdot & \cdot & \cdot & \cdot \\ \frac{\sqrt{10}}{4} & \cdot & \cdot & - \frac{3 \sqrt{2}}{4} & \cdot & \cdot & \cdot & \cdot & \cdot & \cdot \\ \cdot & \frac{3 \sqrt{2}}{4} & \cdot & \cdot & \cdot & \cdot & - \frac{\sqrt{10}}{4} & \cdot & \cdot & \cdot \\ \end{array}\right) \left(\begin{array}{c} b(xxx) \\ b(xxy) \\ b(xxz) \\ b(xyy) \\ b(xyz) \\ b(xzz) \\ b(yyy) \\ b(yyz) \\ b(yzz) \\ b(zzz) \end{array}\right)\end{split}\]
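
Comparing the matrices with and without normalization suggests that the column of the monomial \(x^a y^b z^c\) is rescaled by \(\sqrt{(2a-1)!!\,(2b-1)!!\,(2c-1)!!/(2l-1)!!}\). This factor is inferred here from the matrices themselves; it is an assumption and was not read from tools/harmonics.py. The sketch below checks it for l = 2 with hypothetical helper names.

```python
import math


def double_factorial(n):
    """Return n!! for odd n >= -1, with (-1)!! = 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def column_scale(a, b, c):
    """Assumed scale factor for the column of the monomial x^a y^b z^c."""
    angmom = a + b + c
    numer = (double_factorial(2 * a - 1) * double_factorial(2 * b - 1)
             * double_factorial(2 * c - 1))
    return math.sqrt(numer / double_factorial(2 * angmom - 1))


SQ3 = math.sqrt(3.0)
# Exponents (a, b, c) of xx, xy, xz, yy, yz, zz.
EXPONENTS_L2 = [(2, 0, 0), (1, 1, 0), (1, 0, 1), (0, 2, 0), (0, 1, 1), (0, 0, 2)]
# l = 2 matrix without normalization, rows C20, C21, S21, C22, S22.
UNNORMALIZED_L2 = [
    [-0.5, 0.0, 0.0, -0.5, 0.0, 1.0],
    [0.0, 0.0, SQ3, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, SQ3, 0.0],
    [SQ3 / 2, 0.0, 0.0, -SQ3 / 2, 0.0, 0.0],
    [0.0, SQ3, 0.0, 0.0, 0.0, 0.0],
]
# l = 2 matrix with L2 normalization, copied from the equation above.
NORMALIZED_L2 = [
    [-0.5, 0.0, 0.0, -0.5, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
    [SQ3 / 2, 0.0, 0.0, -SQ3 / 2, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
]

# Rescale every column of the unnormalized matrix and compare.
rescaled = [
    [coeff * column_scale(*abc) for coeff, abc in zip(row, EXPONENTS_L2)]
    for row in UNNORMALIZED_L2
]
matches = all(
    math.isclose(got, expected, abs_tol=1e-12)
    for got_row, expected_row in zip(rescaled, NORMALIZED_L2)
    for got, expected in zip(got_row, expected_row)
)
```

The same factor is consistent with the l = 3 entries, for example \(-\tfrac{3}{2}\sqrt{3/15} = -\tfrac{3\sqrt{5}}{10}\) for the \(xxz\) column of \(C_{30}\).
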

IOData Changelog

Version 1.0.0

Originally, IOData was a subpackage of HORTON2. It has since been factored out, modernized and ported to Python 3. In this process, the API was thoroughly refactored and essentially redesigned from scratch. Compared to HORTON2, IOData 1.0.0 contains the following API-breaking changes:

  • The user-facing API is now a set of five functions: iodata.api.load_one(), iodata.api.dump_one(), iodata.api.load_many(), iodata.api.dump_many() and iodata.api.write_input().

  • The iodata.iodata.IOData object is implemented with the attrs module, which facilitates type hinting and checking.

  • The load_many and dump_many functions can handle trajectories and database formats. (At the time of writing, only XYZ and FCHK are supported.)

  • The write_input function can be used to prepare inputs for quantum chemistry software. This function supports user-provided templates.

  • IOData does not impose a specific ordering of the atomic orbital basis functions (within one shell). Practically all possible conventions are supported and one can easily convert from one to another.

  • All attributes of IOData are either built-in Python types, Numpy arrays or NamedTuples defined in IOData. It no longer relies on other parts of HORTON2 to define these data types. (This is most relevant for the orbital basis, the molecular orbitals and the cube data.)

  • Nearly all attributes of the IOData class have been renamed to more systematic terminology.

  • All file format modules have an identical API (and therefore do not fit into a single namespace).

  • Ghost atoms are now loaded as atoms with a zero effective core charge (atcorenums).

  • Spin multiplicity is no longer used. Instead, the spin polarization, abs(nalpha - nbeta), is stored.

  • The internal HDF5 file format support has been removed.

  • Many smaller changes have been made, which would be too tedious to be listed here.
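
For code that still carries a spin multiplicity, the relation to the spin polarization is simple: a multiplicity of 2S + 1 corresponds to abs(nalpha - nbeta) = 2S. A minimal sketch follows; both helper functions are hypothetical and not part of IOData.

```python
def spinpol_from_multiplicity(multiplicity):
    """Convert a spin multiplicity (2S + 1) to abs(nalpha - nbeta)."""
    if multiplicity < 1:
        raise ValueError("The spin multiplicity must be at least 1.")
    return multiplicity - 1


def occupations(nelec, spinpol):
    """Return (nalpha, nbeta) for an electron count and a spin polarization.

    In this sketch, the alpha channel holds the excess electrons.
    """
    if spinpol > nelec or (nelec - spinpol) % 2 != 0:
        raise ValueError("nelec and spinpol are inconsistent.")
    nbeta = (nelec - spinpol) // 2
    return nbeta + spinpol, nbeta


# Triplet O2 has 16 electrons: 9 alpha and 7 beta.
nalpha, nbeta = occupations(16, spinpol_from_multiplicity(3))
```
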

In addition, several new attributes were added to the IOData class, and several of them can also be read from file formats we already supported previously. This work will be expanded upon in future releases.

Acknowledgments

This software was developed using funding from a variety of international sources including, but not limited to: Canarie, Canada Research Chairs, Compute Canada, European Union’s Horizon 2020 Marie Sklodowska-Curie Actions (Individual Fellowship No 800130), Foundation of Scientific Research–Flanders (FWO), McMaster University, Queen’s University, Natural Sciences and Engineering Research Council of Canada (NSERC), National Fund for Scientific and Technological Development of Chile (FONDECYT), and Research Board of Ghent University (BOF).

Developer Documentation

Contributing

We’d love you to contribute. Here are some practical hints to help out.

This document assumes you are familiar with Bash and Python.

General recommendations

  • Please, be careful with tools like autopep8, black or yapf. They may result in a massive number of changes, making pull requests harder to review. Also, when using them, use a maximum line length of 100. To avoid confusion, only clean up the code you are working on. A safer option is to use cardboardlint -F -r master. This will only clean code where you have already made changes.

  • Do not add module-level pylint: disable=... lines, except for the no-member warning in the unit test modules. When adding pylint exceptions, place them as locally as possible and make sure they are justified.

  • Use type hinting to document the types of function (and method) arguments and return values. This is not yet consistently done throughout IOData at the moment, but it would be helpful to do so in future pull requests. Avoid using strings to postpone the evaluation of the type. (See PEP 0563 for more details on postponed type annotation.)

  • In unit testing, use np.testing.assert_allclose and np.testing.assert_equal for comparing floating-point and integer numpy arrays respectively. np.testing.assert_allclose can also be used for comparing floating point scalars. In all other cases (not involving floating point numbers), the simple assert a == b works equally well and is more readable.

  • IOData always uses atomic units internally. See Unit conversion for details.

Adding new file formats

Each file format is implemented in a module of the package iodata.formats. These modules all follow the same API. Please consult existing formats for guidance, e.g. iodata.formats.xyz is a simple but complete example. From the following list, PATTERNS and at least one of the functions must be implemented:

  • PATTERNS = [ ... ]: a list of glob patterns used to recognize file formats from the file names. This is used to select the correct module from iodata.formats in functions in iodata.api.

  • load_one: load a single IOData object.

  • dump_one: dump a single IOData object.

  • load_many: load multiple IOData objects (iterator) from a single file.

  • dump_many: dump multiple IOData objects (iterator) to a single file.
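
The role of PATTERNS can be sketched with the standard-library fnmatch module. The registry below is hypothetical: in IOData, each module in iodata.formats defines its own PATTERNS list, and the values shown here are only examples. The select_format helper is likewise an illustration, not the function used by iodata.api.

```python
import os
from fnmatch import fnmatchcase

# Hypothetical registry: module name -> PATTERNS list of that module.
FORMAT_PATTERNS = {
    "xyz": ["*.xyz"],
    "fchk": ["*.fchk"],
    "poscar": ["POSCAR*"],
}


def select_format(filename):
    """Return the name of the first format whose patterns match the file name."""
    basename = os.path.basename(filename)
    for name, patterns in FORMAT_PATTERNS.items():
        if any(fnmatchcase(basename, pattern) for pattern in patterns):
            return name
    raise ValueError(f"Cannot determine the file format of {filename}")


selected = [select_format(fn) for fn in ("water.xyz", "run/POSCAR_relaxed")]
```

Matching on the base name only is a design choice of this sketch, so that directory names do not influence the selected format.
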

load_one function: reading a single IOData object from a file

In order to read from a new file format, the module must contain a load_one function with the following signature:

@document_load_one("format", ['list', 'of', 'guaranteed', 'attributes'],
                   ['list', 'of', 'attributes', 'which', 'may', 'be', 'read'],
                   notes)
def load_one(lit: LineIterator) -> dict:
    """Do not edit this docstring. It will be overwritten."""
    # Actual code to read the file

The LineIterator instance provides a convenient interface for reading files and can be found in iodata.utils. As a rule of thumb, always use next(lit) to read a new line from the file. You can use this iterator in a few ways:

# When you need to read one line.
line = next(lit)

# When sections appear in a file in fixed order, you can use helper functions.
data1 = _load_helper_section1(lit)
data2 = _load_helper_section2(lit)

# When you intend to read everything in a file (not for trajectories).
for line in lit:
    # do something with line.

# When you just need to read a section.
for line in lit:
    # do something with line
    if done_with_section:
        break

# When you need a fixed number of lines, say 10.
for i in range(10):
    line = next(lit)

# More complex example, in which you detect several sections and call other
# functions to parse those sections. The code is not sensitive to the
# order of the sections.
while True:
    line = next(lit)
    if end_pattern in line:
        break
    elif line == 'section1':
        data1 = _load_helper_section1(lit)
    elif line == 'section2':
        data2 = _load_helper_section2(lit)

# Same as above, but reading till end of file. You cannot use a for loop
# when multiple lines must be read in one iteration.
while True:
    try:
        line = next(lit)
    except StopIteration:
        break
    if end_pattern in line:
        break
    elif line == 'section1':
        data1 = _load_helper_section1(lit)
    elif line == 'section2':
        data2 = _load_helper_section2(lit)

In some cases, one may have to push back a line because it was read too early. For example, in the Molden format, this is sometimes unavoidable. When needed you can push back the line for later reading with lit.back(line).

# When you just need to read a section
for line in lit:
    # do something with line
    if done_with_section:
        # only now it becomes clear that you've read one line too far
        lit.back(line)
        break

When you encounter a file-format error while reading the file, call lit.error(msg), where msg is a short message describing the problem. The error appearing on screen will automatically also contain the filename and line number.
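
The push-back and error-reporting behaviour described above can be tried out with a small stand-in. The MiniLineIterator class below is a simplified illustration written for this guide, not the real iodata.utils.LineIterator; in particular, the exception type raised by error is an assumption.

```python
class MiniLineIterator:
    """Simplified stand-in for iodata.utils.LineIterator (illustration only)."""

    def __init__(self, lines, filename="<memory>"):
        self._lines = iter(lines)
        self._stack = []
        self.filename = filename
        self.lineno = 0

    def __iter__(self):
        return self

    def __next__(self):
        # Lines that were pushed back are returned before new ones are read.
        line = self._stack.pop() if self._stack else next(self._lines)
        self.lineno += 1
        return line

    def back(self, line):
        """Push a line back, so the next call to next() returns it again."""
        self._stack.append(line)
        self.lineno -= 1

    def error(self, msg):
        """Raise an error that mentions the file name and line number."""
        raise ValueError(f"{self.filename}:{self.lineno} {msg}")


def read_section(lit):
    """Collect lines until the next section header, which is pushed back."""
    result = []
    for line in lit:
        if line.startswith("["):
            lit.back(line)
            break
        result.append(line)
    return result


lit = MiniLineIterator(["1.0", "2.0", "[next]", "3.0"])
section = read_section(lit)
header = next(lit)
```

After read_section returns, the header line is still available to whichever function parses the next section.
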

dump_one functions: writing a single IOData object to a file

The dump_one functions are conceptually simpler: they just receive an open file object and an IOData instance as arguments, and should write the data to the open file.

@document_dump_one("format", ['guaranteed', 'attributes'], ['optional', 'attributes'], notes)
def dump_one(f: TextIO, data: IOData):
    """Do not edit this docstring. It will be overwritten."""
    # code to write data to f.

load_many function: reading multiple IOData objects from a single file

This function works essentially in the same way as load_one, but can load multiple molecules. For example:

@document_load_many("XYZ", ['atcoords', 'atnums', 'title'])
def load_many(lit: LineIterator) -> Iterator[dict]:
    """Do not edit this docstring. It will be overwritten."""
    # XYZ trajectory files are a simple concatenation of individual XYZ files,
    # making it trivial to load many frames.
    while True:
        try:
            yield load_one(lit)
        except StopIteration:
            return

The XYZ trajectory format is simply a concatenation of individual XYZ files, such that one can use the load_one function to read a single frame. In some file formats, more complicated approaches are needed. In any case, one must use the yield keyword for every frame read from a file.
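
The generator pattern can be tried without IOData installed. The sketch below reimplements a stripped-down load_one for XYZ frames on a plain iterator over lines; it returns dictionaries with simplified keys and performs no unit conversion, so it is an illustration of the control flow rather than the actual implementation.

```python
def load_one(lit):
    """Read one XYZ frame from an iterator over lines (simplified sketch)."""
    natom = int(next(lit))  # Raises StopIteration at the end of the file.
    title = next(lit).strip()
    symbols, coords = [], []
    for _ in range(natom):
        words = next(lit).split()
        symbols.append(words[0])
        coords.append([float(word) for word in words[1:4]])
    return {"title": title, "symbols": symbols, "coords": coords}


def load_many(lit):
    """Yield one dictionary per frame until the lines are exhausted."""
    while True:
        try:
            yield load_one(lit)
        except StopIteration:
            return


TRAJECTORY = """\
2
frame 0
H 0.0 0.0 0.0
H 0.0 0.0 0.74
2
frame 1
H 0.0 0.0 0.0
H 0.0 0.0 0.80
"""
frames = list(load_many(iter(TRAJECTORY.splitlines())))
```

Note that in this sketch a truncated last frame also ends the iteration silently, because StopIteration is caught regardless of where in the frame it occurs.
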

dump_many function: writing multiple IOData objects to a single file

The dump_many function is very similar to dump_one, but takes an iterator over multiple IOData instances as argument. It is expected to write all of these to a single open file object. For example:

@document_dump_many("XYZ", ['atcoords', 'atnums'], ['title'])
def dump_many(f: TextIO, datas: Iterator[IOData]):
    """Do not edit this docstring. It will be overwritten."""
    # Similar to load_many, this is relatively easy.
    for data in datas:
        dump_one(f, data)

Also here, we take advantage of the simple structure of the XYZ trajectory format, i.e. the simple concatenation of individual XYZ files. For other formats, this could become more complicated.
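
Because dump_many receives an iterator, frames can be generated lazily and written one at a time. The following sketch shows this with an in-memory file and plain dictionaries instead of IOData instances; the dictionary keys and the make_frames helper are hypothetical.

```python
import io


def dump_one(f, data):
    """Write one frame, given as a dict, in XYZ layout (simplified sketch)."""
    print(len(data["symbols"]), file=f)
    print(data.get("title", ""), file=f)
    for symbol, (x, y, z) in zip(data["symbols"], data["coords"]):
        print(f"{symbol:2s} {x:15.10f} {y:15.10f} {z:15.10f}", file=f)


def dump_many(f, datas):
    """Write all frames from an iterator to the same open file object."""
    for data in datas:
        dump_one(f, data)


def make_frames(distances):
    """Generate frames lazily, so only one frame is in memory at a time."""
    for i, distance in enumerate(distances):
        yield {
            "title": f"frame {i}",
            "symbols": ["H", "H"],
            "coords": [[0.0, 0.0, 0.0], [0.0, 0.0, distance]],
        }


buffer = io.StringIO()
dump_many(buffer, make_frames([0.74, 0.80, 0.86]))
text = buffer.getvalue()
```
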

Github work flow

Before diving into technicalities: if you intend to make major changes, beyond fixing bugs and small functionality improvements, please open a Github issue first, so we can discuss before coding. Please explain what you intend to accomplish and why. That often saves a lot of time and trouble in the long run.

Use the issue to plan your changes. Try to solve only one problem at a time, instead of fixing several issues and adding different features in a single shot. Small changes are easier to handle, also for the reviewer in the last step below.

Mention in the corresponding issue when you are working on it. “Claim” the issue to avoid duplicate efforts.

  1. Check your GitHub settings and your local git configuration:

    • If you don’t have an SSH key pair yet, create one with the following terminal command:

      ssh-keygen -t rsa -b 4096 -C "your_email@example.com"
      

      A suitable name for this key would be id_rsa_github. An empty pass phrase is convenient and should be fine. This will generate a private and a public key in ${HOME}/.ssh.

    • Upload your public SSH key to https://github.com/settings/keys. This is a single long line in id_rsa_github.pub, which you can copy-paste into the browser.

    • Configure SSH to use this key pair for authentication when pushing branches to Github. Add the following to your .ssh/config file:

      Host github.com
          Hostname github.com
          ForwardX11 no
          IdentityFile /home/your_user_name/.ssh/id_rsa_github
      

      (Make sure you have the correct path to the private key file.)

    • Configure git to use the name and e-mail address tied to your Github account:

      git config --global user.name "Your Name"
      git config --global user.email "youremail@yourdomain.com"
      
  2. Install Roberto, which is the driver for our CI setup. It can also replicate the continuous integration on your local machine, which makes it easier to prepare a passable pull request. See https://theochem.github.io/roberto/.

  3. Make a fork of the project, using the Github “fork” feature.

  4. Clone the original repository on your local machine and enter the directory:

    git clone git@github.com:theochem/iodata.git
    cd iodata
    
  5. Add your fork as a second remote to your local repository, for which we will use the short name mine below, but any short name is fine:

    git remote add mine git@github.com:<your-github-account>/iodata.git
    
  6. Make a new branch, with a name that hints at the purpose of your modification:

    git checkout -b new-feature
    
  7. Make changes to the source. Please, make it easy for others to understand your code. Also, add tests that verify your code works as intended. Rules of thumb:

    • Write transparent code, e.g. self-explaining variable names.

    • Add comments to passages that are not easy to understand at first glance.

    • Write docstrings explaining the API.

    • Add unit tests when feasible.

  8. Commit your changes with a meaningful commit message. The first line is a short summary, written in the imperative mood. Optionally, this can be followed by an empty line and a longer description.

    If you feel the summary line is too short to describe what you did, it may be better to split your changes into multiple commits.

  9. Run Roberto and fix all problems it reports. Either one of the following should work:

    rob                 # Normal case
    python3 -m roberto  # Only if your PATH is not set correctly
    

    Style issues, failing tests and packaging issues should all be detected at this stage.

  10. Push your branch to your forked repository on Github:

    git push mine -u new-feature
    

    A link should be printed on screen, which will take the next step for you.

  11. Make a pull request from your branch new-feature in your forked repository to the master branch in the original repository.

  12. Wait for the tests on Travis-CI to complete. These should pass. Also coverage analysis will be shown, but this is merely indicative. Normally, someone should review your pull request in a few days. Ideally, the review results in minor corrections at worst. We’ll do our best to avoid larger problems in step 1.

Notes on attrs

IOData uses the attrs library, not to be confused with the attr library, for classes representing data loaded from files: IOData, MolecularBasis, Shell, MolecularOrbitals and Cube. This enables basic attribute validation, which eliminates potentially silly bugs. (See iodata/attrutils.py and the usage of validate_shape in all those classes.)

The following attrs functions could be convenient when working with these classes:

  • The data can be turned into plain Python data types with the attr.asdict function. Make sure you add the retain_collection_types=True option, to avoid the following issue: https://github.com/python-attrs/attrs/issues/646. For example:

    from iodata import load_one
    import attr
    iodata = load_one("example.xyz")
    fields = attr.asdict(iodata, retain_collection_types=True)
    

    A similar astuple function works as you would expect.

  • A shallow copy with a few modified attributes can be created with the evolve method, which is a wrapper for attr.evolve:

    from iodata import load_one
    import attr
    iodata1 = load_one("example.xyz")
    iodata2 = attr.evolve(iodata1, title="another title")
    

    The usage of evolve becomes mandatory when you want to change two or more attributes whose shapes need to be consistent. For example, the following would fail:

    from iodata import IOData
    iodata = IOData(atnums=[7, 7], atcoords=[[0, 0, 0], [2, 0, 0]])
    # The next line will fail because the size of atnums and atcoords
    # becomes inconsistent.
    iodata.atnums = [8, 8, 8]
    iodata.atcoords = [[0, 0, 0], [2, 0, 1], [4, 0, 0]]
    

    The following code, which has the same intent, does work:

    from iodata import IOData
    import attr
    iodata1 = IOData(atnums=[7, 7], atcoords=[[0, 0, 0], [2, 0, 0]])
    iodata2 = attr.evolve(
        iodata1,
        atnums=[8, 8, 8],
        atcoords=[[0, 0, 0], [2, 0, 1], [4, 0, 0]],
    )

    For brevity, lists (of lists) were used in these examples. These are always converted to arrays by the constructor or when assigning them to attributes.
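
The reason why sequential assignment fails while evolve succeeds can be reproduced with the standard library alone. In the sketch below, a dataclass with a validating __setattr__ stands in for the attrs-based classes, and dataclasses.replace plays the role of attr.evolve. This is an analogy under those assumptions, not the way IOData implements validation.

```python
from dataclasses import dataclass, replace


@dataclass
class Molecule:
    """Toy class that rejects atnums and atcoords of different lengths."""

    atnums: list
    atcoords: list

    def __setattr__(self, name, value):
        # Validate against the other attribute once it has been set.
        other = {"atnums": "atcoords", "atcoords": "atnums"}.get(name)
        if other is not None and other in self.__dict__:
            if len(value) != len(self.__dict__[other]):
                raise TypeError(f"Length of {name} is inconsistent with {other}.")
        super().__setattr__(name, value)


mol1 = Molecule(atnums=[7, 7], atcoords=[[0, 0, 0], [2, 0, 0]])
try:
    # Fails: three atomic numbers while atcoords still describes two atoms.
    mol1.atnums = [8, 8, 8]
    assignment_failed = False
except TypeError:
    assignment_failed = True

# Works: a new object is built in which both attributes change together.
mol2 = replace(
    mol1,
    atnums=[8, 8, 8],
    atcoords=[[0, 0, 0], [2, 0, 1], [4, 0, 0]],
)
```
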
    

API Reference

iodata

iodata package

Subpackages
iodata.formats package
Submodules
iodata.formats.charmm module

CHARMM crd file format.

CHARMM coordinate files contain information about the location of each atom in Cartesian space. The format of the ASCII (CARD) CHARMM coordinate files is: Title line(s), number of atoms in file and the coordinate lines (one for each atom in the file).

The coordinate lines contain specific information about each atom. These have the following structure: Atom number (sequential), residue number (specified relative to first residue in the PSF), residue name, atom type, x-coordinate, y-coordinate, z-coordinate, segment identifier, residue identifier and a weighting array value.
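
The column layout of a coordinate line can be illustrated with a short parser. The sketch below splits on whitespace, which is an assumption (actual files use fixed-width columns), and the example line is made up. No unit conversion is performed, and parse_crd_line is a hypothetical helper rather than the function used by this module.

```python
CRD_FIELDS = (
    "atom_number", "residue_number", "residue_name", "atom_type",
    "x", "y", "z", "segment_id", "residue_id", "weight",
)

# Made-up example line, not taken from a real file.
EXAMPLE_LINE = (
    "    1    1 ALA  N     -0.67800   -1.23000   -0.49100 PEPT 1      0.00000"
)


def parse_crd_line(line):
    """Split one coordinate line of a CRD file into named fields."""
    words = line.split()
    if len(words) != len(CRD_FIELDS):
        raise ValueError(f"Expected {len(CRD_FIELDS)} columns, got {len(words)}.")
    record = dict(zip(CRD_FIELDS, words))
    for name in ("atom_number", "residue_number"):
        record[name] = int(record[name])
    for name in ("x", "y", "z", "weight"):
        record[name] = float(record[name])
    return record


record = parse_crd_line(EXAMPLE_LINE)
```
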

load_one(lit)

Load a single frame from a CRD file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atffparams, atmasses, extra. The following may be loaded if present in the file: title.

Return type:

dict

Notes

iodata.formats.chgcar module

VASP 5 CHGCAR file format.

This format is used by VASP 5.X and VESTA.

Note that even though the CHGCAR and LOCPOT files look very similar, they require different conversions to atomic units.

load_one(lit)

Load a single frame from a VASP 5 CHGCAR file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, cellvecs, cube, title.

Return type:

dict

Notes

iodata.formats.cp2klog module

CP2K ATOM output file format.

load_one(lit)

Load a single frame from a CP2K ATOM output file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atcorenums, atnums, energy, mo, obasis.

Return type:

dict

Notes

This function assumes that the following subsections are present in the CP2K ATOM input file, in the section ATOM%PRINT:

&PRINT
  &POTENTIAL
  &END POTENTIAL
  &BASIS_SET
  &END BASIS_SET
  &ORBITALS
  &END ORBITALS
&END PRINT

iodata.formats.cube module

Gaussian Cube file format.

Cube files are generated by various QC codes these days, including Gaussian, CP2K, GPAW, Q-Chem, …

Note that the second column in the geometry specification of the cube file is interpreted as the effective core charges.

dump_one(f, data)

Dump a single frame into a Gaussian Cube file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, cube. If the following attributes are present, they are also dumped into the file: title, atcorenums.

Notes

load_one(lit)

Load a single frame from a Gaussian Cube file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atcorenums, atnums, cellvecs, cube.

Return type:

dict

Notes

iodata.formats.extxyz module

Extended XYZ file format.

The extended XYZ file format is defined in the ASE documentation.

Usually, the different frames in a trajectory describe different geometries of the same molecule, with atoms in the same order. The load_many function below can also handle an XYZ with different molecules, e.g. a molecular database.

load_many(lit)

Load multiple frames from an EXTXYZ file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: title. The following may be loaded if present in the file: atcoords, atgradient, atmasses, atnums, cellvecs, charge, energy, extra.

Return type:

Iterator[dict]

Notes

load_one(lit)

Load a single frame from an EXTXYZ file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: title. The following may be loaded if present in the file: atcoords, atgradient, atmasses, atnums, cellvecs, charge, energy, extra.

Return type:

dict

Notes

iodata.formats.fchk module

Gaussian FCHK file format.

dump_one(f, data)

Dump a single frame into a Gaussian Formatted Checkpoint file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atnums, atcorenums. If the following attributes are present, they are also dumped into the file: atcharges, atcoords, atfrozen, atgradient, athessian, atmasses, charge, energy, lot, mo, one_rdms, obasis_name, extra, moments.

Notes

load_many(lit)

Load multiple frames from a Gaussian Formatted Checkpoint file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atgradient, atnums, atcorenums, energy, extra, title.

Return type:

Iterator[dict]

Notes

Trajectories from a Gaussian optimization, relaxed scan or IRC calculation are written in groups of frames, called “points” in the Gaussian world, e.g. to discriminate between different values of the constraint in a relaxed geometry. In most cases, e.g. IRC or conventional optimization, there is only one “point”. Within one “point”, one can have multiple geometries and their properties. This information is stored in the extra attribute:

  • ipoint is the counter for a point

  • npoint is the total number of points.

  • istep is the counter within one “point”

  • nstep is the total number of geometries within a “point”.

  • reaction_coordinate is only present in case of an IRC calculation.

load_one(lit)

Load a single frame from a Gaussian Formatted Checkpoint file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcharges, atcoords, atnums, atcorenums, lot, mo, obasis, obasis_name, run_type, title. The following may be loaded if present in the file: energy, atfrozen, atgradient, athessian, atmasses, one_rdms, extra, moments.

Return type:

dict

Notes

iodata.formats.fcidump module

Molpro 2012 FCIDUMP file format.

Notes

  1. This function works only for restricted wave-functions.

  2. One- and two-electron integrals are stored in chemists’ notation in an FCIDUMP file, while IOData internally uses physicists’ notation.

  3. Keep in mind that the FCIDUMP format changed in MOLPRO 2012, so files generated with older versions are not supported.
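
The two notations for two-electron integrals differ only in the order of the indices: \(\langle pq|rs\rangle = (pr|qs)\). The sketch below applies this standard relation to nested lists. The helper name is hypothetical and the code does not reproduce how IOData stores or converts the integrals.

```python
def chemist_to_physicist(chem):
    """Convert (pq|rs) in chemists' notation to <pq|rs> in physicists' notation.

    The result satisfies phys[p][q][r][s] == chem[p][r][q][s].
    """
    n = len(chem)
    return [
        [[[chem[p][r][q][s] for s in range(n)] for r in range(n)] for q in range(n)]
        for p in range(n)
    ]


# Toy array in which every element encodes its own indices, for easy checking.
n = 2
chem = [
    [[[1000 * p + 100 * q + 10 * r + s for s in range(n)] for r in range(n)]
     for q in range(n)]
    for p in range(n)
]
phys = chemist_to_physicist(chem)
# Swapping the two middle indices twice restores the original array.
roundtrip = chemist_to_physicist(phys)
```

With NumPy arrays, the same reordering corresponds to swapping the two middle axes, e.g. chem.transpose(0, 2, 1, 3).
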

dump_one(f, data)

Dump a single frame into a Molpro 2012 FCIDUMP file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: one_ints, two_ints. If the following attributes are present, they are also dumped into the file: core_energy, nelec, spinpol.

Notes

The dictionary one_ints must contain a field core_mo. Similarly, two_ints must contain two_mo.

load_one(lit)

Load a single frame from a Molpro 2012 FCIDUMP file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: core_energy, one_ints, nelec, spinpol, two_ints.

Return type:

dict

Notes

iodata.formats.gamess module

GAMESS punch file format.

load_one(lit)

Load a single frame from a PUNCH file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: title, energy, grot, atgradient, athessian, atmasses, atnums, atcoords.

Return type:

dict

Notes

iodata.formats.gaussianinput module

Gaussian input format.

load_one(lit)

Load a single frame from a Gaussian input file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, title.

Return type:

dict

Notes

iodata.formats.gaussianlog module

Gaussian Log file format.

To write out the integrals in a Gaussian log file, which can be loaded with this module, you need to use the following Gaussian command line:

scf(conventional) iop(3/33=5) extralinks=l316 iop(3/27=999)

load_one(lit)

Load a single frame from a Gaussian Log file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. No attributes are guaranteed to be loaded. The following may be loaded if present in the file: one_ints, two_ints.

Return type:

dict

Notes

iodata.formats.gromacs module

GROMACS gro file format.

Files with the gro file extension contain a molecular structure in Gromos87 format. GROMACS gro files can be used as trajectory by simply concatenating files.

http://manual.gromacs.org/current/reference-manual/file-formats.html#gro

load_many(lit)

Load multiple frames from a GRO file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atffparams, cellvecs, extra, title.

Return type:

Iterator[dict]

Notes

load_one(lit)

Load a single frame from a GRO file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atffparams, cellvecs, extra, title.

Return type:

dict

Notes

iodata.formats.json module

QCSchema JSON file format.

QCSchema defines four different subschema:

  • Molecule: specifying a molecular system

  • Input: specifying QC program input for a specific Molecule

  • Output: specifying QC program output for a specific Molecule

  • Basis: specifying a basis set for a specific Molecule

General Usage

The QCSchema format is intended to be a catch-all file format for storing and sharing QC calculation data. Because a single file can hold a wide variety of data, not every field in a QCSchema file directly corresponds to an IOData attribute. For example, qcschema_output files allow for many fields capturing different energy contributions, especially for coupled-cluster calculations. IOData therefore does not try to guess the intent of the user; instead, it ensures that every field in the file is stored in a structured manner. When a QCSchema field does not correspond to an IOData attribute, that data is stored in the extra dict, in a dictionary corresponding to the subschema where that data was found. In cases where multiple subschema contain the relevant field (e.g. the Output subschema contains the entirety of the Input subschema), the data will be found in the smallest subschema (for the example above, in IOData.extra["input"], not IOData.extra["output"]).

Dumping an IOData instance to a QCSchema file involves adding relevant required (and optional, if needed) fields to the necessary dictionaries in the extra dict. One exception is the provenance field: if the only desired provenance data is the creation of the file by IOData, that data will be added automatically.

The following sections will describe the requirements of each subschema and the behaviour to expect from IOData when loading in or dumping out a QCSchema file.

Schema Definitions
Provenance Information

The provenance field contains information about how the associated QCSchema object and its attributes were generated, provided, and manipulated. A provenance entry expects these fields:

| Field | Description |
|---|---|
| creator | Required. The program that generated, provided, or manipulated this file. |
| version | The version of the creator. |
| routine | The routine of the creator. |

In QCElemental, only a single provenance entry is permitted. When generating a QCSchema file for use with QCElemental, the easiest way to ensure compliance is to leave the provenance field blank, to allow the dump_one function to generate the correct provenance information. However, allowing only one entry for provenance information limits the ability to properly trace a file through several operations during complex workflows. With this in mind, IOData supports an enhanced provenance field, in the form of a list of provenance entries, with new entries appended to the end of the list.
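
The enhanced provenance field can be pictured as follows: a single entry (a dict) is promoted to a list, and each later operation appends its own entry. The helper below is a hypothetical illustration of that bookkeeping, not the code used by dump_one, and the field values in the example are invented.

```python
def append_provenance(provenance, entry):
    """Return a list of provenance entries with a new entry appended."""
    if "creator" not in entry:
        raise ValueError("A provenance entry requires a creator field.")
    if not provenance:
        history = []
    elif isinstance(provenance, dict):
        # A single entry, as permitted by QCElemental, becomes a list.
        history = [provenance]
    else:
        history = list(provenance)
    return history + [entry]


history = append_provenance(
    {"creator": "HORTON3", "routine": "Manual validation"},
    {"creator": "IOData", "routine": "dump_one"},  # invented example values
)
```

The oldest entry stays first, so the list reads as a chronological record of the programs that touched the file.
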

Molecule Schema

The qcschema_molecule subschema describes a molecular system, and contains the data necessary to specify a molecular system and support I/O and manipulation processes.

The following is an example of a minimal qcschema_molecule file:

{
  "schema_name": "qcschema_molecule",
  "schema_version": 2,
  "symbols":  ["Li", "Cl"],
  "geometry": [0.000000, 0.000000, -1.631761, 0.000000, 0.000000, 0.287958],
  "molecular_charge": 0,
  "molecular_multiplicity": 1,
  "provenance": {
    "creator": "HORTON3",
    "routine": "Manual validation"
  }
}

The required fields and corresponding types for a qcschema_molecule file are:

Field

Type

IOData attr.

Description

schema_name

str

N/A

The name of the QCSchema subschema. Fixed as qcschema_molecule.

schema_version

str

N/A

The version of the subschema specification. 2.0 is the current version.

symbols

list(N_at)

atnums

An array of the atomic symbols for the system.

geometry

list(3*N_at)

atcoords

An ordered array of XYZ atomic coordinates, corresponding to the order of symbols. The first three elements correspond to atom one, the second three to atom two, etc.

molecular_charge

float

charge

The net electrostatic charge of the molecule. Some writers assume a default of 0.

molecular_multiplicity

int

spinpol

The total multiplicity of this molecule. Some writers assume a default of 1.

provenance

dict or list

N/A

Information about how the file was generated, provided, and manipulated. See Provenance section above for more details.

Note: N_at corresponds to the number of atoms in the molecule, as defined by the length of symbols.
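As a rough illustration of the required fields listed above, the following sketch parses the minimal example with the standard json module, checks that every required field is present, and groups the flat geometry array into one coordinate triple per atom. The function check_molecule is a hypothetical helper and does not reflect how IOData validates files internally.

```python
import json

REQUIRED_FIELDS = (
    "schema_name", "schema_version", "symbols", "geometry",
    "molecular_charge", "molecular_multiplicity", "provenance",
)


def check_molecule(mol):
    """Check the required fields and return one (x, y, z) tuple per atom."""
    missing = [key for key in REQUIRED_FIELDS if key not in mol]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if mol["schema_name"] != "qcschema_molecule":
        raise ValueError("schema_name must be 'qcschema_molecule'")
    natom = len(mol["symbols"])  # N_at is defined by the length of symbols
    geometry = mol["geometry"]
    if len(geometry) != 3 * natom:
        raise ValueError("geometry must contain 3*N_at values")
    return [tuple(geometry[3 * i:3 * i + 3]) for i in range(natom)]


text = """
{
  "schema_name": "qcschema_molecule",
  "schema_version": 2,
  "symbols": ["Li", "Cl"],
  "geometry": [0.0, 0.0, -1.631761, 0.0, 0.0, 0.287958],
  "molecular_charge": 0,
  "molecular_multiplicity": 1,
  "provenance": {"creator": "HORTON3", "routine": "Manual validation"}
}
"""
coords = check_molecule(json.loads(text))
print(coords)
```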

The optional fields and corresponding types for a qcschema_molecule file are:

Field

Type

IOData attr.

Description

atom_labels

list(N_at)

N/A

Additional per-atom labels. Typically used for model conversions, not user assignment. The indices of this array correspond to the symbols ordering.

atomic_numbers

list(N_at)

atnums

An array of atomic numbers for each atom. Typically inferred from symbols.

comment

str

N/A

Additional comments for this molecule. These comments are intended for user information, not any computational tasks.

connectivity

list

bonds

The connectivity information between each atom in the symbols array. Each entry in this array is a 3-item array, [index_a, index_b, bond_order], where the indices correspond to the atom indices in symbols.

extras

dict

N/A

Extra information to associate with this molecule.

fix_symmetry

str

g_rot

Maximal point group symmetry with which the molecule should be treated.

fragments

list(N_fr)

N/A

An array that designates which sets of atoms are fragments within the molecule. This is a nested array, with the indices of the base array corresponding to the values in fragment_charges and fragment_multiplicities and the values in the nested arrays corresponding to the indices of symbols.

fragment_charges

list(N_fr)

N/A

The total charge of each fragment in fragments. The indices of this array correspond to the fragments ordering.

fragment_multiplicities

list(N_fr)

N/A

The multiplicity of each fragment in fragments. The indices of this array correspond to the fragments ordering.

id

str

N/A

A unique identifier for this molecule.

identifiers

dict

N/A

Additional identifiers by which this molecule can be referenced, such as INCHI, SMILES, etc.

real

list(N_at)

atcorenums

An array indicating whether each atom is real (true) or a ghost/virtual atom (false). The indices of this array correspond to the symbols ordering.

mass_numbers

list(N_at)

atmasses

An array of atomic mass numbers for each atom. The indices of this array correspond to the symbols ordering.

masses

list(N_at)

atmasses

An array of atomic masses [u] for each atom. Typically inferred from symbols. The indices of this array correspond to the symbols ordering.

name

str

title

An arbitrary, common, or human-readable name to assign to this molecule.

Note: N_at corresponds to the number of atoms in the molecule, as defined by the length of symbols; N_fr corresponds to the number of fragments in the molecule, as defined by the length of fragments. Fragment data is stored in a sub-dictionary, fragments.
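The relationships between the per-fragment arrays and symbols can be checked with a short sketch like the one below. It assumes zero-based atom indices, which the text above does not state explicitly. The helper check_fragments and the helium dimer data are hypothetical and serve only to illustrate the array layout.

```python
def check_fragments(mol):
    """Check fragment and connectivity arrays against symbols.

    Returns the number of fragments (N_fr).
    """
    natom = len(mol["symbols"])
    fragments = mol.get("fragments", [])
    # Per-fragment arrays must follow the ordering of ``fragments``.
    for key in ("fragment_charges", "fragment_multiplicities"):
        if key in mol and len(mol[key]) != len(fragments):
            raise ValueError(f"{key} must have one entry per fragment")
    # Values in the nested arrays are indices into ``symbols``.
    for fragment in fragments:
        for index in fragment:
            if not 0 <= index < natom:
                raise ValueError(f"atom index {index} is out of range")
    # Each connectivity entry is [index_a, index_b, bond_order].
    for index_a, index_b, _bond_order in mol.get("connectivity", []):
        if not (0 <= index_a < natom and 0 <= index_b < natom):
            raise ValueError("connectivity refers to a missing atom")
    return len(fragments)


dimer = {  # hypothetical example data
    "symbols": ["He", "He"],
    "fragments": [[0], [1]],
    "fragment_charges": [0.0, 0.0],
    "fragment_multiplicities": [1, 1],
}
print(check_fragments(dimer))
```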

The following are additional optional keywords used in QCElemental’s QCSchema implementation. These keywords mostly correspond to specific QCElemental functionality, and may not necessarily produce similar results in other QCSchema parsers.

| Field | Type | Description |
| --- | --- | --- |
| fix_com | bool | An indicator to prevent pre-processing the molecule by translating the center of mass (COM) to (0,0,0) in Euclidean coordinate space. |
| fix_orientation | bool | An indicator to prevent pre-processing the molecule by orienting via the inertia tensor. |
| validated | bool | An indicator that the input molecule data has been previously checked for schema and physics (e.g. non-overlapping atoms, feasible multiplicity) compliance. Generally should only be true when set by a trusted validator. |

Input Schema

The qcschema_input subschema describes all data necessary to generate and parse a QC program input file for a given molecule.

The following is an example of a minimal qcschema_input file:

{
  "schema_name": "qcschema_input",
  "schema_version": 2.0,
  "molecule": {
    "schema_name": "qcschema_molecule",
    "schema_version": 2.0,
    "symbols":  ["Li", "Cl"],
    "geometry": [0.000000, 0.000000, -1.631761, 0.000000, 0.000000, 0.287958],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
    "provenance": {
      "creator": "HORTON3",
      "routine": "Manual validation"
    }
  },
  "driver": "energy",
  "model": {
    "method": "B3LYP",
    "basis": "Def2TZVP"
  }
}

The required fields and corresponding types for a qcschema_input file are:

Field

Type

IOData attr.

Description

schema_name

str

N/A

The QCSchema specification to which this model conforms. Fixed as qcschema_input.

schema_version

float

N/A

The version number of schema_name to which this model conforms, currently 2.

molecule

dict

N/A

QCSchema Molecule instance.

driver

str

N/A

The type of calculation being performed. One of energy, gradient, hessian, or properties.

model

dict

N/A

The quantum chemistry model specification for a given operation to compute against. See Model section below.
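A qcschema_input document with the required fields above can be assembled from plain Python dictionaries, as in the following sketch. The function make_input is a hypothetical helper, not part of IOData. It only checks the driver value against the four options listed in the table.

```python
import json

ALLOWED_DRIVERS = {"energy", "gradient", "hessian", "properties"}


def make_input(molecule, driver, method, basis):
    """Assemble a dict with the required qcschema_input fields."""
    if driver not in ALLOWED_DRIVERS:
        raise ValueError(f"unsupported driver: {driver!r}")
    return {
        "schema_name": "qcschema_input",
        "schema_version": 2.0,
        "molecule": molecule,
        "driver": driver,
        "model": {"method": method, "basis": basis},
    }


molecule = {
    "schema_name": "qcschema_molecule",
    "schema_version": 2.0,
    "symbols": ["Li", "Cl"],
    "geometry": [0.0, 0.0, -1.631761, 0.0, 0.0, 0.287958],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
    "provenance": {"creator": "HORTON3", "routine": "Manual validation"},
}
qc_input = make_input(molecule, "energy", "B3LYP", "Def2TZVP")
print(json.dumps(qc_input, indent=2))
```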

The optional fields and corresponding types for a qcschema_input file are:

Field

Type

IOData attr.

Description

extras

dict

N/A

Extra information associated with the input.

id

str

N/A

An identifier for the input object.

keywords

dict

N/A

QC program-specific keywords to be used for a computation. See details below for IOData-specific usages.

protocols

dict

N/A

Protocols regarding the manipulation of the output that results from this input. See Protocols section below.

provenance

dict or list

N/A

Information about how the file was generated, provided, and manipulated. See Provenance section above for more information.

IOData currently supports the following keywords for qcschema_input files:

| Keyword | Type | IOData attr. | Description |
| --- | --- | --- | --- |
| run_type | str | run_type | The type of calculation that led to the results stored in IOData, which must be one of the following: energy, energy_force, opt, scan, freq or None. |

Model Subschema

The model dict contains the following fields:

| Field | Type | IOData attr. | Description |
| --- | --- | --- | --- |
| method | str | lot | The level of theory used for the computation (e.g. B3LYP, PBE, CCSD(T), etc.) |
| basis | str or dict | N/A | The quantum chemistry basis set to evaluate (e.g. 6-31G, cc-pVDZ, etc.) Can be ‘none’ for methods without basis sets. Must be either a string specifying the basis set name (the same as its name in the Basis Set Exchange, when possible) or a qcschema_basis instance. |

Protocols Subschema

The protocols dict contains the following fields:

| Field | Type | IOData attr. | Description |
| --- | --- | --- | --- |
| wavefunction | str | N/A | Specification of the wavefunction properties to keep from the resulting output. One of all, orbitals_and_eigenvalues, return_results, or none. |
| keep_stdout | bool | N/A | An indicator to keep the output file from the resulting output. |

Output Schema

The qcschema_output subschema describes all data necessary to generate and parse a QC program’s output file for a given molecule.

The following is an example of a minimal qcschema_output file:

{
  "schema_name": "qcschema_output",
  "schema_version": 2.0,
  "molecule": {
    "schema_name": "qcschema_molecule",
    "schema_version": 2.0,
    "symbols":  ["Li", "Cl"],
    "geometry": [0.000000, 0.000000, -1.631761, 0.000000, 0.000000, 0.287958],
    "molecular_charge": 0.0,
    "molecular_multiplicity": 1,
    "provenance": {
      "creator": "HORTON3",
      "routine": "Manual validation"
    }
  },
  "driver": "energy",
  "model": {
    "method": "HF",
    "basis": "STO-4G"
  },
  "properties": {},
  "return_result": -464.626219879,
  "success": true
}

The required fields and corresponding types for a qcschema_output file are:

Field

Type

IOData attr.

Description

schema_name

str

N/A

The QCSchema specification to which this model conforms. Fixed as qcschema_output.

schema_version

float

N/A

The version number of schema_name to which this model conforms, currently 2.

molecule

dict

N/A

QCSchema Molecule instance.

driver

str

N/A

The type of calculation being performed. One of energy, gradient, hessian, or properties.

model

dict

N/A

The quantum chemistry model specification for a given operation to compute against.

properties

dict

N/A

Named properties of quantum chemistry computations. See Properties section below.

return_result

varies

N/A

The result requested by the driver. The type depends on the driver.

success

bool

N/A

An indicator for the success of the QC program’s execution.

The optional fields and corresponding types for a qcschema_output file are:

Field

Type

IOData attr.

Description

error

dict

N/A

A complete description of an error-terminated computation. See Error section below.

extras

dict

N/A

Extra information associated with the input. Also specified for qcschema_input.

id

str

N/A

An identifier for the input object. Also specified for qcschema_input.

keywords

dict

N/A

QC program-specific keywords to be used for a computation. See details below for IOData-specific usages. Also specified for qcschema_input.

protocols

dict

N/A

Protocols regarding the manipulation of the output that results from this input. See Protocols section above. Also specified for qcschema_input.

provenance

dict or list

N/A

Information about how the file was generated, provided, and manipulated. See Provenance section above for more information. Also specified for qcschema_input.

stderr

str

N/A

The standard error (stderr) of the associated computation.

stdout

str

N/A

The standard output (stdout) of the associated computation.

wavefunction

dict

N/A

The wavefunction properties of a QC computation. All matrices appear in column-major order. See Wavefunction section below.

Properties Subschema

The properties dict contains named properties of quantum chemistry computations. Because the contents of an output file vary widely, IOData does not guess which properties the user wants and instead stores all properties in the extra["output"]["properties"] dict for easy retrieval. The current QCSchema standard provides names for the following properties:

Field

Description

calcinfo_nbasis

The number of basis functions for the computation.

calcinfo_nmo

The number of molecular orbitals for the computation.

calcinfo_nalpha

The number of alpha electrons in the computation.

calcinfo_nbeta

The number of beta electrons in the computation.

calcinfo_natom

The number of atoms in the computation.

nuclear_repulsion_energy

The nuclear repulsion energy term.

return_energy

The energy of the requested method, identical to return_result for energy computations.

scf_one_electron_energy

The one-electron (core Hamiltonian) energy contribution to the total SCF energy.

scf_two_electron_energy

The two-electron energy contribution to the total SCF energy.

scf_vv10_energy

The VV10 functional energy contribution to the total SCF energy.

scf_xc_energy

The functional (XC) energy contribution to the total SCF energy.

scf_dispersion_correction_energy

The dispersion correction appended to an underlying functional when a DFT-D method is requested.

scf_dipole_moment

The X, Y, and Z dipole components.

scf_total_energy

The total electronic energy of the SCF stage of the calculation.

scf_iterations

The number of SCF iterations taken before convergence.

mp2_same_spin_correlation_energy

The portion of MP2 doubles correlation energy from same-spin (i.e. triplet) correlations.

mp2_opposite_spin_correlation_energy

The portion of MP2 doubles correlation energy from opposite-spin (i.e. singlet) correlations.

mp2_singles_energy

The singles portion of the MP2 correlation energy. Zero except in ROHF.

mp2_doubles_energy

The doubles portion of the MP2 correlation energy including same-spin and opposite-spin correlations.

mp2_total_correlation_energy

The MP2 correlation energy.

mp2_correlation_energy

The MP2 correlation energy.

mp2_total_energy

The total MP2 energy (MP2 correlation energy + HF energy).

mp2_dipole_moment

The MP2 X, Y, and Z dipole components.

ccsd_same_spin_correlation_energy

The portion of CCSD doubles correlation energy from same-spin (i.e. triplet) correlations.

ccsd_opposite_spin_correlation_energy

The portion of CCSD doubles correlation energy from opposite-spin (i.e. singlet) correlations.

ccsd_singles_energy

The singles portion of the CCSD correlation energy. Zero except in ROHF.

ccsd_doubles_energy

The doubles portion of the CCSD correlation energy including same-spin and opposite-spin correlations.

ccsd_correlation_energy

The CCSD correlation energy.

ccsd_total_energy

The total CCSD energy (CCSD correlation energy + HF energy).

ccsd_dipole_moment

The CCSD X, Y, and Z dipole components.

ccsd_iterations

The number of CCSD iterations taken before convergence.

ccsd_prt_pr_correlation_energy

The CCSD(T) correlation energy.

ccsd_prt_pr_total_energy

The total CCSD(T) energy (CCSD(T) correlation energy + HF energy).

ccsd_prt_pr_dipole_moment

The CCSD(T) X, Y, and Z dipole components.

ccsd_prt_pr_iterations

The number of CCSD(T) iterations taken before convergence.

ccsdt_correlation_energy

The CCSDT correlation energy.

ccsdt_total_energy

The total CCSDT energy (CCSDT correlation energy + HF energy).

ccsdt_dipole_moment

The CCSDT X, Y, and Z dipole components.

ccsdt_iterations

The number of CCSDT iterations taken before convergence.

ccsdtq_correlation_energy

The CCSDTQ correlation energy.

ccsdtq_total_energy

The total CCSDTQ energy (CCSDTQ correlation energy + HF energy).

ccsdtq_dipole_moment

The CCSDTQ X, Y, and Z dipole components.

ccsdtq_iterations

The number of CCSDTQ iterations taken before convergence.
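The sketch below illustrates how named properties might be retrieved from the nested extra["output"]["properties"] layout described above. It uses plain dictionaries in place of a loaded IOData object, so the exact structure is an assumption based on the text. The helper get_property is hypothetical, and the values are taken from the minimal output example.

```python
def get_property(extra, name, default=None):
    """Look up a named property in the nested ``extra`` dictionary."""
    return extra.get("output", {}).get("properties", {}).get(name, default)


output = {
    "driver": "energy",
    "properties": {"calcinfo_natom": 2, "return_energy": -464.626219879},
    "return_result": -464.626219879,
    "success": True,
}

# Mimic the storage layout described in the text.
extra = {"output": {"properties": dict(output["properties"])}}

natom = get_property(extra, "calcinfo_natom")
energy = get_property(extra, "return_energy")
niter = get_property(extra, "scf_iterations")  # not present in this file
print(natom, energy, niter)

# For an energy driver, return_energy should match return_result.
if output["driver"] == "energy" and energy is not None:
    assert abs(energy - output["return_result"]) < 1e-10
```

Returning a default for absent names keeps the lookup safe, since no single property is guaranteed to be present in every output file.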

Error Subschema

The error dict contains the following fields:

| Field | Type | IOData attr. | Description |
| --- | --- | --- | --- |
| error_type | str | N/A | The type of error raised during the computation. |
| error_message | str | N/A | Additional information related to the error, such as the backtrace. |
| extras | dict | N/A | Additional data associated with the error. |
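A consumer of qcschema_output data would typically check the success flag before using return_result, and fall back to the error dict otherwise. The following sketch shows one way to do this with plain dictionaries. The function summarize_output and the error values shown are hypothetical.

```python
def summarize_output(output):
    """Return a one-line status string for a qcschema_output-like dict."""
    if output.get("success"):
        return f"ok: return_result = {output.get('return_result')}"
    error = output.get("error") or {}
    error_type = error.get("error_type", "unknown")
    error_message = error.get("error_message", "")
    return f"failed ({error_type}): {error_message}"


failed = {
    "success": False,
    "error": {
        "error_type": "convergence_error",  # hypothetical value
        "error_message": "SCF did not converge.",  # hypothetical value
    },
}
print(summarize_output(failed))
print(summarize_output({"success": True, "return_result": -464.626219879}))
```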

Wavefunction Subschema

The wavefunction subschema contains the wavefunction properties of a QC computation. All matrices appear in column-major order. The current QCSchema standard provides names for the following wavefunction properties:

The ordering and normalization of solid-harmonic Gaussians in libint are described at: https://github.com/evaleev/libint/wiki/using-modern-CPlusPlus-API#solid-harmonic-gaussians-ordering-and-normalization

Field

Description

basis

A qcschema_basis instance for the one-electron AO basis set. AO basis functions are ordered according to the CCA standard as implemented in libint.

restricted

An indicator for a restricted calculation (alpha == beta). When true, all beta quantities are omitted, since quantity_b == quantity_a.

h_core_a

Alpha-spin core (one-electron) Hamiltonian.

h_core_b

Beta-spin core (one-electron) Hamiltonian.

h_effective_a

Alpha-spin effective core (one-electron) Hamiltonian.

h_effective_b

Beta-spin effective core (one-electron) Hamiltonian.

scf_orbitals_a

Alpha-spin SCF orbitals.

scf_orbitals_b

Beta-spin SCF orbitals.

scf_density_a

Alpha-spin SCF density matrix.

scf_density_b

Beta-spin SCF density matrix.

scf_fock_a

Alpha-spin SCF Fock matrix.

scf_fock_b

Beta-spin SCF Fock matrix.

scf_eigenvalues_a

Alpha-spin SCF eigenvalues.

scf_eigenvalues_b

Beta-spin SCF eigenvalues.

scf_occupations_a

Alpha-spin SCF orbital occupations.

scf_occupations_b

Beta-spin SCF orbital occupations.

orbitals_a

Keyword for the primary return alpha-spin orbitals.

orbitals_b

Keyword for the primary return beta-spin orbitals.

density_a

Keyword for the primary return alpha-spin density.

density_b

Keyword for the primary return beta-spin density.

fock_a

Keyword for the primary return alpha-spin Fock matrix.

fock_b

Keyword for the primary return beta-spin Fock matrix.

eigenvalues_a

Keyword for the primary return alpha-spin eigenvalues.

eigenvalues_b

Keyword for the primary return beta-spin eigenvalues.

occupations_a

Keyword for the primary return alpha-spin orbital occupations.

occupations_b

Keyword for the primary return beta-spin orbital occupations.
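Two conventions from this subschema are easy to get wrong: matrices are in column-major order, and beta quantities are omitted for restricted calculations. The sketch below shows both. It assumes each matrix is stored as a flat list together with a known shape, which the text does not state explicitly. The helper names and the numbers in the example are hypothetical.

```python
def unflatten_column_major(flat, nrow, ncol):
    """Convert a flat column-major list into a list of rows."""
    if len(flat) != nrow * ncol:
        raise ValueError("array size does not match the requested shape")
    # Element (row, col) is stored at position col * nrow + row.
    return [[flat[col * nrow + row] for col in range(ncol)] for row in range(nrow)]


def get_spin_quantity(wfn, name, spin):
    """Return e.g. scf_orbitals_b, falling back to alpha when restricted."""
    key = f"{name}_{spin}"
    if spin == "b" and key not in wfn and wfn.get("restricted"):
        key = f"{name}_a"  # quantity_b == quantity_a for restricted cases
    return wfn[key]


wfn = {  # hypothetical data
    "restricted": True,
    "scf_eigenvalues_a": [-0.5, 0.3],
    "scf_orbitals_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
}
orbitals_b = unflatten_column_major(get_spin_quantity(wfn, "scf_orbitals", "b"), 2, 3)
print(orbitals_b)
```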

dump_one(f, data)[source]

Dump a single frame into a QCSchema file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atnums, atcoords, charge, spinpol. If the following attributes are present, they are also dumped into the file: title, atcorenums, atmasses, bonds, g_rot, extra.

Notes

load_one(lit)[source]

Load a single frame from a QCSchema file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atnums, atcorenums, atcoords, charge, nelec, spinpol. The following may be loaded if present in the file: atmasses, bonds, energy, g_rot, lot, obasis, obasis_name, title, extra.

Return type:

dict

Notes

iodata.formats.locpot module

VASP 5 LOCPOT file format.

This format is used by VASP 5.X and VESTA.

Note that even though the CHGCAR and LOCPOT files look very similar, they require different conversions to atomic units.

load_one(lit)[source]

Load a single frame from a VASP 5 LOCPOT file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, cellvecs, cube, title.

Return type:

dict

Notes

iodata.formats.mol2 module

MOL2 file format.

There are several variants of the MOL2 format. The main objective of this implementation is compatibility with the AMBER software: it writes out files with atomic charges as used by antechamber.

dump_many(f, datas)[source]

Dump multiple frames into a MOL2 file.

Parameters:
  • f (TextIO) – A writeable file object.

  • datas (Iterator[IOData]) – An iterator over IOData instances which must have the following attributes initialized: atcoords, atnums, atcharges. If the following attributes are present, they are also dumped into the file: title.

Notes

dump_one(f, data)[source]

Dump a single frame into a MOL2 file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums. If the following attributes are present, they are also dumped into the file: atcharges, atffparams, title.

Notes

load_many(lit)[source]

Load multiple frames from a MOL2 file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, atcharges, atffparams. The following may be loaded if present in the file: title.

Return type:

Iterator[dict]

Notes

load_one(lit)[source]

Load a single frame from a MOL2 file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, atcharges, atffparams. The following may be loaded if present in the file: title.

Return type:

dict

Notes

iodata.formats.molden module

Molden file format.

Many QC codes can write out Molden files, e.g. Molpro, Orca, PSI4, Molden, Turbomole. Keep in mind that several of these write incorrect versions of the file format, but these errors are corrected when loading them with IOData.

dump_one(f, data)[source]

Dump a single frame into a Molden file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, mo, obasis. If the following attributes are present, they are also dumped into the file: atcorenums, title.

Notes

load_one(lit, norm_threshold=0.0001)[source]

Load a single frame from a Molden file.

Parameters:
  • lit (LineIterator) – The line iterator to read the data from.

  • norm_threshold (float) – When the normalization of one of the orbitals exceeds norm_threshold, a correction is attempted or an error is raised when no suitable correction can be found.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, atcorenums, mo, obasis. The following may be loaded if present in the file: title.

Return type:

dict

Notes
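The role of norm_threshold can be illustrated with a small sketch. It assumes that the "normalization" being tested is the deviation of the orbital norm c^T S c from one, where c holds the orbital coefficients and S is the overlap matrix of the basis functions. This is an interpretation of the parameter description, not a copy of IOData's code, and the function names and numbers are hypothetical.

```python
import math


def orbital_norm(coeffs, overlap):
    """Compute the norm c^T S c of one orbital."""
    n = len(coeffs)
    return sum(
        coeffs[i] * overlap[i][j] * coeffs[j] for i in range(n) for j in range(n)
    )


def needs_correction(coeffs, overlap, norm_threshold=1e-4):
    """Return True when the norm deviates from 1 by more than the threshold."""
    return abs(orbital_norm(coeffs, overlap) - 1.0) > norm_threshold


# Two basis functions with overlap 0.5 (hypothetical numbers).
overlap = [[1.0, 0.5], [0.5, 1.0]]
good = [1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]  # norm is 1
bad = [1.0, 1.0]  # norm is 3, e.g. due to a different convention

print(needs_correction(good, overlap), needs_correction(bad, overlap))
```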

iodata.formats.molekel module

Molekel file format.

This format is used by two programs: Molekel and Orca.

dump_one(f, data)[source]

Dump a single frame into a Molekel file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, mo, obasis. If the following attributes are present, they are also dumped into the file: atcharges.

Notes

load_one(lit, norm_threshold=0.0001)[source]

Load a single frame from a Molekel file.

Parameters:
  • lit (LineIterator) – The line iterator to read the data from.

  • norm_threshold (float) – When the normalization of one of the orbitals exceeds norm_threshold, a correction is attempted or an error is raised when no suitable correction can be found.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, mo, obasis. The following may be loaded if present in the file: atcharges.

Return type:

dict

Notes

iodata.formats.mwfn module

Multiwfn MWFN file format.

load_one(lit)[source]

Load a single frame from an MWFN file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, atcorenums, energy, mo, obasis, extra, title.

Return type:

dict

Notes

iodata.formats.orcalog module

Orca output file format.

load_one(lit)[source]

Load a single frame from an Orca output file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, energy, moments, extra.

Return type:

dict

Notes

iodata.formats.pdb module

PDB file format.

There are different versions of the PDB format. The convention used here is the most recent one (version 3.3), which is described at: http://www.wwpdb.org/documentation/file-format-content/format33/v3.3.html

dump_many(f, datas)[source]

Dump multiple frames into a PDB file.

Parameters:
  • f (TextIO) – A writeable file object.

  • datas (Iterator[IOData]) – An iterator over IOData instances which must have the following attributes initialized: atcoords, atnums, extra. If the following attributes are present, they are also dumped into the file: atffparams, title.

Notes

dump_one(f, data)[source]

Dump a single frame into a PDB file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, extra. If the following attributes are present, they are also dumped into the file: atffparams, title, bonds.

Notes

load_many(lit)[source]

Load multiple frames from a PDB file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, atffparams, extra. The following may be loaded if present in the file: title.

Return type:

Iterator[dict]

Notes

load_one(lit)[source]

Load a single frame from a PDB file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, atffparams, extra. The following may be loaded if present in the file: title, bonds.

Return type:

dict

Notes

iodata.formats.poscar module

VASP 5 POSCAR file format.

This format is used by VASP 5.X and VESTA.

dump_one(f, data)[source]

Dump a single frame into a VASP 5 POSCAR file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, cellvecs. If the following attributes are present, they are also dumped into the file: title.

Notes

load_one(lit)[source]

Load a single frame from a VASP 5 POSCAR file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, cellvecs, title.

Return type:

dict

Notes

iodata.formats.qchemlog module

Q-Chem Log file format.

This module loads a Q-Chem log file into IOData.

load_one(lit)[source]

Load a single frame from a qchemlog file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atmasses, atnums, energy, g_rot, mo, lot, obasis_name, run_type, extra. The following may be loaded if present in the file: athessian.

Return type:

dict

Notes

load_qchemlog_low(lit)[source]

Load the information from a Q-Chem log file.

Return type:

dict

iodata.formats.sdf module

SDF file format.

Usually, the different frames in a trajectory describe different geometries of the same molecule, with atoms in the same order. The load_many and dump_many functions below can also handle an SDF file with different molecules, e.g. a molecular database.

The SDF format is somewhat documented on the following page: http://www.nonlinear.com/progenesis/sdf-studio/v0.9/faq/sdf-file-format-guidance.aspx

This format is one of the chemical table file formats: https://en.wikipedia.org/wiki/Chemical_table_file

dump_many(f, datas)[source]

Dump multiple frames into an SDF file.

Parameters:
  • f (TextIO) – A writeable file object.

  • datas (Iterator[IOData]) – An iterator over IOData instances which must have the following attributes initialized: atcoords, atnums. If the following attributes are present, they are also dumped into the file: title, bonds.

Notes

dump_one(f, data)[source]

Dump a single frame into an SDF file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums. If the following attributes are present, they are also dumped into the file: title, bonds.

Notes

load_many(lit)[source]

Load multiple frames from an SDF file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, bonds, title.

Return type:

Iterator[dict]

Notes

load_one(lit)[source]

Load a single frame from an SDF file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, bonds, title.

Return type:

dict

Notes

iodata.formats.wfn module

Gaussian/GAMESS-US WFN file format.

Only use this format if the program that generated it does not offer any alternatives that IOData can load. The WFN format has the disadvantage that it cannot represent contractions and therefore expands all orbitals into a decontracted basis. This makes the post-processing less efficient compared to formats that do support contractions of Gaussian functions.

build_obasis(icenters, type_assignments, exponents, lit)[source]

Construct a basis set using the arrays read from a WFN or WFX file.

Parameters:
  • icenters (ndarray) – The center indices for all basis functions. shape=(nbasis,). Lowest index is zero.

  • type_assignments (ndarray) – Integer codes for basis function names. shape=(nbasis,). Lowest index is zero.

  • exponents (ndarray) – The Gaussian exponents of all basis functions. shape=(nbasis,)

Return type:

Tuple[MolecularBasis, ndarray]

dump_one(f, data)[source]

Dump a single frame into a WFN file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, energy, mo, obasis, title, extra.

Return type:

None

Notes

get_mocoeff_scales(obasis)[source]

Get the L2-normalization of the un-normalized Cartesian basis functions.

Parameters:

obasis (MolecularBasis) – The molecular orbital basis.

Returns:

Scaling factors to be multiplied into the molecular orbital coefficients.

Return type:

scales
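For reference, the L2-normalization constant of a primitive Cartesian Gaussian x^nx y^ny z^nz exp(-alpha r^2) has a standard closed form, sketched below. Whether get_mocoeff_scales returns these constants, their inverses, or a related quantity is not stated here, so this only illustrates the underlying formula. The function names are hypothetical.

```python
import math


def double_factorial(n):
    """Return n!!, with (-1)!! = 0!! = 1 by convention."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def cartesian_norm(alpha, nx, ny, nz):
    """Normalization constant of x^nx y^ny z^nz exp(-alpha r^2)."""
    angmom = nx + ny + nz
    numer = (2.0 * alpha / math.pi) ** 0.75 * (4.0 * alpha) ** (angmom / 2.0)
    denom = math.sqrt(
        double_factorial(2 * nx - 1)
        * double_factorial(2 * ny - 1)
        * double_factorial(2 * nz - 1)
    )
    return numer / denom


# Normalization constants for an s and a p_x function with alpha = 1.
print(cartesian_norm(1.0, 0, 0, 0), cartesian_norm(1.0, 1, 0, 0))
```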

load_one(lit)[source]

Load a single frame from a WFN file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, energy, mo, obasis, title, extra.

Return type:

dict

Notes

load_wfn_low(lit)[source]

Load data from a WFN file into arrays.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Return type:

Tuple

iodata.formats.wfx module

AIM/AIMAll WFX file format.

See http://aim.tkgristmill.com/wfxformat.html

dump_one(f, data)[source]

Dump a single frame into a WFX file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums, atcorenums, mo, obasis, charge. If the following attributes are present, they are also dumped into the file: title, energy, spinpol, lot, atgradient, extra.

Notes

load_data_wfx(lit)[source]

Process loaded WFX data.

Return type:

dict

load_one(lit)[source]

Load a single frame from a WFX file.

Parameters:

lit (LineIterator) – The line iterator to read the data from.

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atgradient, atnums, energy, extra, mo, obasis, title.

Return type:

dict

Notes

parse_wfx(lit, required_tags=None)[source]

Load data in all sections existing in the given WFX file LineIterator.

Return type:

dict

iodata.formats.xyz module

XYZ file format.

Usually, the different frames in a trajectory describe different geometries of the same molecule, with atoms in the same order. The load_many and dump_many functions below can also handle an XYZ file with different molecules, e.g. a molecular database.

The load_* and dump_* functions all accept the optional argument atom_columns. This argument fixes the meaning of the columns to be loaded from or dumped to an XYZ file. The following example defines, in addition to the conventional columns, a column with atomic charges and three columns with atomic forces.

import iodata.formats.xyz
from iodata import load_one

atom_columns = iodata.formats.xyz.DEFAULT_ATOM_COLUMNS + [
    # Atomic charges are stored in a dictionary atcharges and the key
    # refers to the name of the partitioning method.
    ("atcharges", "mulliken", (), float, float, "{:10.5f}".format),
    # Note that in IOData, the energy gradient is stored, which contains the
    # negative forces.
    ("atgradient", None, (3,), float,
     (lambda word: -float(word)),
     (lambda value: "{:15.10f}".format(-value)))
]

mol = load_one("test.xyz", atom_columns=atom_columns)
# The following attributes are present:
print(mol.atnums)
print(mol.atcoords)
print(mol.atcharges["mulliken"])
print(mol.atgradient)

When defining atom_columns, no columns can be skipped, such that all information loaded from a file can also be written back out when dumping it.
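To make the six-item column tuples more concrete, the sketch below mimics how one line of an XYZ file could be split into fields and written back out. It is a simplified stand-in written with the standard library only, not IOData's parser. The functions parse_line and format_line are hypothetical, and a plain string column named "symbol" replaces the default element column.

```python
from math import prod

# (attribute, key, shape, dtype, load_word, dump_word)
COLUMNS = [
    ("symbol", None, (), str, str, "{:2s}".format),
    ("atcoords", None, (3,), float, float, "{:15.10f}".format),
    ("atcharges", "mulliken", (), float, float, "{:10.5f}".format),
]


def parse_line(line, columns):
    """Split one line into fields; every column must be consumed."""
    words = line.split()
    fields = {}
    for attribute, key, shape, _dtype, load_word, _dump_word in columns:
        count = prod(shape)  # prod(()) == 1 for scalar columns
        if len(words) < count:
            raise ValueError(f"not enough columns for {attribute}")
        values = [load_word(word) for word in words[:count]]
        words = words[count:]
        name = attribute if key is None else f"{attribute}[{key}]"
        fields[name] = values[0] if shape == () else values
    if words:
        raise ValueError("line contains columns that are not described")
    return fields


def format_line(fields, columns):
    """Write the fields back out in the same column order."""
    words = []
    for attribute, key, shape, _dtype, _load_word, dump_word in columns:
        name = attribute if key is None else f"{attribute}[{key}]"
        values = [fields[name]] if shape == () else fields[name]
        words.extend(dump_word(value) for value in values)
    return " ".join(words)


fields = parse_line("H 0.0 0.0 0.74 0.1", COLUMNS)
print(fields)
print(format_line(fields, COLUMNS))
```

Because every column carries both a load_word and a dump_word function, the same list can drive reading and writing, which is why no column may be skipped.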

dump_many(f, datas, atom_columns=None)[source]

Dump multiple frames into an XYZ file.

Parameters:
  • f (TextIO) – A writeable file object.

  • datas (Iterator[IOData]) – An iterator over IOData instances which must have the following attributes initialized: atcoords, atnums. If the following attributes are present, they are also dumped into the file: title.

  • atom_columns – A list of atomic fields to be dumped. Each field is a tuple with the following items: attribute (str), key (None or str, when str the IOData attribute is a dict), shape for one atom (tuple), dtype, load_word (function taking string and returning a value with the correct type), dump_word (function taking a value and returning a formatted string).

dump_one(f, data, atom_columns=None)[source]

Dump a single frame into an XYZ file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atcoords, atnums. If the following attributes are present, they are also dumped into the file: title.

  • atom_columns – A list of atomic fields to be dumped. Each field is a tuple with the following items: attribute (str), key (None or str, when str the IOData attribute is a dict), shape for one atom (tuple), dtype, load_word (function taking string and returning a value with the correct type), dump_word (function taking a value and returning a formatted string).

load_many(lit, atom_columns=None)[source]

Load multiple frames from an XYZ file.

Parameters:
  • lit (LineIterator) – The line iterator to read the data from.

  • atom_columns – A list of atomic fields to be loaded. Each field is a tuple with the following items: attribute (str), key (None or str, when str the IOData attribute is a dict), shape for one atom (tuple), dtype, load_word (function taking string and returning a value with the correct type), dump_word (function taking a value and returning a formatted string).

Yields:

result (dict) – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, title.

Return type:

Iterator[dict]

load_one(lit, atom_columns=None)[source]

Load a single frame from an XYZ file.

Parameters:
  • lit (LineIterator) – The line iterator to read the data from.

  • atom_columns – A list of atomic fields to be loaded. Each field is a tuple with the following items: attribute (str), key (None or str, when str the IOData attribute is a dict), shape for one atom (tuple), dtype, load_word (function taking string and returning a value with the correct type), dump_word (function taking a value and returning a formatted string).

Returns:

result – A dictionary with IOData attributes. The following attributes are guaranteed to be loaded: atcoords, atnums, title.

Return type:

dict

Module contents
iodata.inputs package
Submodules
iodata.inputs.common module

Utilities for writing input files.

populate_fields(data)[source]

Generate a dictionary with fields to replace in the template.

Return type:

dict

iodata.inputs.gaussian module

Gaussian Input Module.

write_input(f, data, template=None, **kwargs)[source]

Write a GAUSSIAN input file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atnums, atcoords. If the following attributes are present, they are also written into the file: title, run_type, lot, obasis_name, spinmult, charge. If these attributes are not assigned, internal default values are used.

  • template (Optional[str]) – A template input string.

iodata.inputs.orca module

Orca Input Module.

write_input(f, data, template=None, **kwargs)[source]

Write an ORCA input file.

Parameters:
  • f (TextIO) – A writeable file object.

  • data (IOData) – An IOData instance which must have the following attributes initialized: atnums, atcoords. If the following attributes are present, they are also written into the file: title, run_type, lot, obasis_name, spinmult, charge. If these attributes are not assigned, internal default values are used.

  • template (Optional[str]) – A template input string.

Module contents
iodata.test package
Subpackages
iodata.test.data package
Module contents
Submodules
iodata.test.common module
iodata.test.test_attrutils module
iodata.test.test_basis module
iodata.test.test_charmm module

Test iodata.formats.charmm module.

test_load_crambin()[source]
iodata.test.test_chgcar module

Test iodata.formats.chgcar module.

test_load_chgcar_oxygen()[source]
test_load_chgcar_water()[source]
iodata.test.test_cli module

Unit tests for iodata.__main__.

test_convert_many_autofmt(tmpdir)[source]
test_convert_many_manfmt(tmpdir)[source]
test_convert_one_autofmt(tmpdir)[source]
test_convert_one_manfmt(tmpdir)[source]
test_script_many_autofmt(tmpdir)[source]
test_script_many_manfmt(tmpdir)[source]
test_script_one_autofmt(tmpdir)[source]
test_script_one_manfmt(tmpdir)[source]
iodata.test.test_cp2klog module
iodata.test.test_cube module

Test iodata.formats.cube module.

test_load_aelta()[source]
test_load_dump_ch4_6points(tmpdir)[source]
test_load_dump_h2o_5points(tmpdir)[source]
test_load_dump_load_aelta(tmpdir)[source]
test_load_dump_nh3_7points(tmpdir)[source]
iodata.test.test_extxyz module

Test iodata.formats.extxyz module.

test_load_fcc_extended()[source]
test_load_many_extended()[source]
test_load_mgo()[source]
iodata.test.test_fchk module
iodata.test.test_fcidump module

Test iodata.formats.fcidump module.

test_dump_load_fcidimp_consistency_ao(tmpdir)[source]
test_load_fcidump_molpro_h2()[source]
test_load_fcidump_psi4_h2()[source]
iodata.test.test_gamess module

Test iodata.formats.gamess module.

test_load_one_gamess_punch()[source]
iodata.test.test_gaussianinput module

Test iodata.formats.gaussianinput module.

check_water(mol, title)[source]

Test water molecule attributes.

test_load_error()[source]
test_load_multi_route()[source]
test_load_multi_title()[source]
test_load_water_com()[source]
test_load_water_gjf()[source]
iodata.test.test_gaussianlog module

Test iodata.formats.gaussianlog module.

load_log_helper(fn_log)[source]

Load a testing Gaussian log file with iodata.load_one.

test_load_operators_water_ccpvdz_pure_hf_g03()[source]
test_load_operators_water_sto3g_hf_g03()[source]
iodata.test.test_gromacs module

Test iodata.formats.gromacs module.

check_water(mol)[source]

Test some things on a water file.

test_load_many()[source]
test_load_water()[source]
iodata.test.test_inputs module

Test iodata.inputs module.

check_load_input_and_compare(fname, fname_expected)[source]

Load saved input file and compare to expected input file.

Parameters:
  • fname (str) – Path to input file name to load.

  • fname_expected (str) – Path to expected input file to load.

test_input_gaussian_from_fchk(tmpdir)[source]
test_input_gaussian_from_iodata(tmpdir)[source]
test_input_gaussian_from_xyz(tmpdir)[source]
test_input_orca_from_iodata(tmpdir)[source]
test_input_orca_from_molden(tmpdir)[source]
test_input_orca_from_xyz(tmpdir)[source]
iodata.test.test_iodata module
iodata.test.test_json module
iodata.test.test_locpot module

Test iodata.formats.locpot module.

test_load_locpot_oxygen()[source]
iodata.test.test_mol2 module
iodata.test.test_molden module
iodata.test.test_molekel module
iodata.test.test_mwfn module

Test iodata.formats.mwfn module.

load_helper(fn)[source]

Load a test file with iodata.iodata.load_one.

test_load_mwfn_ch3_hf_g03()[source]
test_load_mwfn_ch3_rohf_g03()[source]
test_load_mwfn_he_spdfgh_g03()[source]
test_nelec_charge()[source]
iodata.test.test_orbitals module
iodata.test.test_orcalog module

Test iodata.formats.orcalog module.

test_load_water_number()[source]
iodata.test.test_overlap module
iodata.test.test_pdb module
iodata.test.test_poscar module

Test iodata.formats.poscar module.

test_load_dump_consistency(tmpdir)[source]
test_load_poscar_cubicbn_cartesian()[source]
test_load_poscar_cubicbn_direct()[source]
test_load_poscar_water()[source]
iodata.test.test_qchemlog module

Test iodata.formats.qchemlog module.

test_load_one_h2o_dimer_eda2()[source]

Test load_one with h2o_dimer_eda_qchem5.3.out.

test_load_one_qchemlog_freq()[source]
test_load_qchemlog_low_h2o()[source]

Test load_qchemlog_low with water_hf_ccpvtz_freq_qchem.out.

test_load_qchemlog_low_qchemlog_h2o_dimer_eda2()[source]

Test load_qchemlog_low with h2o_dimer_eda_qchem5.3.out.

iodata.test.test_sdf module
iodata.test.test_utils module

Unit tests for iodata.utils.

test_amu()[source]
iodata.test.test_wfn module
iodata.test.test_wfx module
iodata.test.test_xyz module

Test iodata.formats.xyz module.

check_load_dump_consistency(tmpdir, fn, atom_columns=None)[source]

Check if dumping and loading an XYZ file results in the same data.

check_water(mol)[source]

Test some things on a water file.

test_dump_xyz_fcc(tmpdir)[source]
test_dump_xyz_water_element(tmpdir)[source]
test_dump_xyz_water_number(tmpdir)[source]
test_load_dump_consistency(tmpdir)[source]
test_load_dump_many_consistency(tmpdir)[source]
test_load_fcc_columns()[source]
test_load_many()[source]
test_load_many_dataset_emptylines()[source]
test_load_water_element()[source]
test_load_water_number()[source]
Module contents
Submodules
iodata.api module

Functions to be used by end users.

dump_many(iodatas, filename, fmt=None, **kwargs)[source]

Write multiple IOData instances to a file.

This routine uses the extension or prefix of the filename to determine the file format. For each file format, a specialized function is called that does the real work.

Parameters:
  • iodatas (Iterator[IOData]) – An iterator over IOData instances.

  • filename (str) – The file to write the data to.

  • fmt (Optional[str]) – The name of the file format module to use.

  • **kwargs – Keyword arguments are passed on to the format-specific dump_many function.

dump_one(iodata, filename, fmt=None, **kwargs)[source]

Write data to a file.

This routine uses the extension or prefix of the filename to determine the file format. For each file format, a specialized function is called that does the real work.

Parameters:
  • iodata (IOData) – The object containing the data to be written.

  • filename (str) – The file to write the data to.

  • fmt (Optional[str]) – The name of the file format module to use. When not given, it is guessed from the filename.

  • **kwargs – Keyword arguments are passed on to the format-specific dump_one function.

load_many(filename, fmt=None, **kwargs)[source]

Load multiple IOData instances from a file.

This function uses the extension or prefix of the filename to determine the file format. When the file format is detected, a specialized load function is called for the heavy lifting.

Parameters:
  • filename (str) – The file to load data from.

  • fmt (Optional[str]) – The name of the file format module to use. When not given, it is guessed from the filename.

  • **kwargs – Keyword arguments are passed on to the format-specific load_many function.

Yields:

out – An instance of IOData with data for one frame loaded from the file.

Return type:

Iterator[IOData]

load_one(filename, fmt=None, **kwargs)[source]

Load data from a file.

This function uses the extension or prefix of the filename to determine the file format. When the file format is detected, a specialized load function is called for the heavy lifting.

Parameters:
  • filename (str) – The file to load data from.

  • fmt (Optional[str]) – The name of the file format module to use. When not given, it is guessed from the filename.

  • **kwargs – Keyword arguments are passed on to the format-specific load_one function.

Returns:

out – The instance of IOData with data loaded from the input file.

Return type:

IOData
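
The filename-based format detection used by the load and dump functions can be pictured with the following sketch. The PATTERNS table and the function guess_format are hypothetical; IOData's actual patterns and lookup logic may differ.

```python
from fnmatch import fnmatchcase
from os.path import basename

# Hypothetical mapping from format name to filename patterns.
# Extensions match a suffix ("*.xyz"); prefixes match the start ("POSCAR*").
PATTERNS = {
    "xyz": ["*.xyz"],
    "fchk": ["*.fchk"],
    "poscar": ["POSCAR*"],
}


def guess_format(filename, fmt=None):
    """Return fmt when given, else the first format whose pattern matches."""
    if fmt is not None:
        return fmt
    name = basename(filename)
    for name_fmt, patterns in PATTERNS.items():
        if any(fnmatchcase(name, pattern) for pattern in patterns):
            return name_fmt
    raise ValueError(f"Could not guess the file format of {filename!r}.")


print(guess_format("water.xyz"))
print(guess_format("run1/POSCAR.relaxed"))
print(guess_format("data.txt", fmt="xyz"))
```

Passing fmt explicitly bypasses the guess, which is useful when a file has an unconventional name.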

write_input(iodata, filename, fmt, template=None, **kwargs)[source]

Write input file using an instance of IOData for the specified software format.

Parameters:
  • iodata (IOData) – An IOData instance containing the information needed to write input.

  • filename (str) – The input file name.

  • fmt (str) – The name of the software for which input file is generated.

  • template (Optional[str]) – The template input string.

  • **kwargs – Keyword arguments are passed on to the input-specific write_input function.
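
As an illustration of template-based input generation, the sketch below fills a template string with named fields. It assumes the template is an ordinary Python format string; the template text, the geometry field, and the function fill_template are hypothetical and only mimic the idea, and the coordinates are placeholders.

```python
import io

# Hypothetical template; the field names are illustrative only.
TEMPLATE = """\
#n {lot}/{obasis_name} {run_type}

{title}

{charge} {spinmult}
{geometry}

"""


def fill_template(f, fields, template=TEMPLATE):
    """Write the template to f with every {name} replaced by fields[name]."""
    f.write(template.format(**fields))


fields = {
    "lot": "hf",
    "obasis_name": "sto-3g",
    "run_type": "sp",
    "title": "Water",
    "charge": 0,
    "spinmult": 1,
    # Placeholder coordinates, not an optimized geometry.
    "geometry": "O 0.0 0.0 0.0\nH 0.0 0.0 1.0\nH 1.0 0.0 0.0",
}
buffer = io.StringIO()
fill_template(buffer, fields)
print(buffer.getvalue())
```

A custom template only needs to use the same field names; any field missing from the dictionary raises a KeyError, which makes incomplete input easy to spot.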

iodata.attrutils module

Utilities for building attr classes.

convert_array_to(dtype)[source]

Return a function to convert arrays to the given type.

validate_shape(*shape_requirements)[source]

Return a validator for the shape of an array or the length of an iterable.

Parameters:

shape_requirements (tuple) – Specifications for the required shape. Every item of the tuple describes the required size of the corresponding axis of an array. Also the number of items should match the dimensionality of the array. When the validator is used for general iterables, this tuple should contain just one element. Possible values for each item are explained in the “Notes” section below.

Returns:

A validator function for the attr library.

Return type:

validator

Notes

Every element of shape_requirements defines the expected size of an array along the corresponding axis. An item in this tuple at position (or index) i can be one of the following:

  1. An integer, which is taken as the expected size along axis i.

  2. None. In this case, the size of the array along axis i is not checked.

  3. A string, which should be the name of another integer attribute with the expected size along axis i. The other attribute is always an attribute of the same object as the attribute being checked.

  4. A 2-tuple containing a name and an integer. In this case, the name refers to another attribute which is an array or an iterable. When the integer is 0, just the length of the other attribute is used. When the integer is non-zero, the other attribute must be an array and the integer selects an axis. The size of the other array along the selected axis is then used as the expected size of the array being checked along axis i.
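
The four kinds of requirement can be sketched in plain Python as follows. The functions expected_sizes and check_shape are hypothetical stand-ins for the real validator and work on shape tuples directly, so no array library is needed.

```python
from types import SimpleNamespace


def expected_sizes(obj, shape_requirements):
    """Translate shape requirements into expected sizes (None = unchecked)."""
    sizes = []
    for item in shape_requirements:
        if item is None or isinstance(item, int):
            # Cases 1 and 2: a fixed size, or no check at all.
            sizes.append(item)
        elif isinstance(item, str):
            # Case 3: the name of another integer attribute.
            sizes.append(getattr(obj, item))
        else:
            # Case 4: (name, axis) refers to another array or iterable.
            name, axis = item
            other = getattr(obj, name)
            sizes.append(len(other) if axis == 0 else other.shape[axis])
    return sizes


def check_shape(obj, shape, shape_requirements):
    """Raise TypeError when shape does not satisfy the requirements."""
    sizes = expected_sizes(obj, shape_requirements)
    if len(shape) != len(sizes):
        raise TypeError("Wrong number of dimensions.")
    for actual, expected in zip(shape, sizes):
        if expected is not None and actual != expected:
            raise TypeError(f"Expected shape {sizes}, got {shape}.")


mol = SimpleNamespace(natom=3, atnums=[8, 1, 1])
check_shape(mol, (3, 3), ["natom", 3])           # passes
check_shape(mol, (3, 7), [("atnums", 0), None])  # passes, axis 1 unchecked
```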

iodata.basis module

Utility functions for working with basis sets.

Notes

Basis set conventions and terminology are documented in Basis set conventions.

class MolecularBasis(shells, conventions, primitive_normalization)[source]

Bases: object

A complete molecular orbital or density basis set.

shells

A list of objects of type Shell which can support generalized contractions.

conventions

A dictionary specifying the ordered basis functions for a given angular momentum and kind. The key is a tuple of angular momentum integer and kind character (‘c’ for Cartesian and ‘p’ for pure/spherical) and the value is a list of basis function strings. For example,

{
    ### Conventions for Cartesian functions
    # E.g., alphabetically ordered Cartesian functions.
    (0, 'c'): ['1'],
    (1, 'c'): ['x', 'y', 'z'],
    (2, 'c'): ['xx', 'xy', 'xz', 'yy', 'yz', 'zz'],
    ### Conventions for pure functions.
    # The notation is referring to real solid spherical harmonics.
    # See https://en.wikipedia.org/wiki/Solid_harmonics#Real_form
    # 'c{m}' = solid harmonic containing cos(m phi)
    # 's{m}' = solid harmonic containing sin(m phi)
    # where m is the magnetic quantum number and phi is the
    # azimuthal angle.
    # For example, wikipedia-ordered real spherical harmonics,
    # see https://en.wikipedia.org/wiki/Spherical_harmonics#Real_form
    (2, 'p'): ['s2', 's1', 'c0', 'c1', 'c2'],
    # Different quantum-chemistry codes may use incompatible
    # orderings and sign conventions. E.g. Molden files written
    # by ORCA use the following convention for pure f functions:
    (3, 'p'): ['c0', 'c1', 's1', 'c2', 's2', '-c3', '-s3'],
    # Note that the minus sign in the last two basis functions
    # denotes that the signs of these harmonics have been changed.
}

The basis function strings in the conventions dictionary are documented in Basis set conventions.

primitive_normalization

The normalization convention of primitives, which can be ‘L2’ (orbitals) or ‘L1’ (densities) normalized.

__init__(shells, conventions, primitive_normalization)

Method generated by attrs for class MolecularBasis.

conventions: Dict[str, str]
get_segmented()[source]

Unroll generalized contractions.

property nbasis: int

Number of basis functions.

primitive_normalization: str
shells: List[Shell]
class Shell(icenter, angmoms, kinds, exponents, coeffs)[source]

Bases: object

A shell in a molecular basis representing (generalized) contractions with the same exponents.

icenter

An integer index specifying the row in the atcoords array of IOData object.

angmoms

An integer array of angular momentum quantum numbers, non-negative, with shape (ncon,).

kinds

List of strings describing the kind of contractions: ‘c’ for Cartesian and ‘p’ for pure. Pure functions are only allowed for angmom>1. The length equals the number of contractions: len(angmoms)=ncon.

exponents

The array containing the exponents of the primitives, with shape (nprim,).

coeffs

The array containing the coefficients of the normalized primitives in each contraction; shape = (nprim, ncon). These coefficients assume that the primitives are L2 (orbitals) or L1 (densities) normalized, but contractions are not necessarily normalized. (This depends on the code which generated the contractions.)

__init__(icenter, angmoms, kinds, exponents, coeffs)

Method generated by attrs for class Shell.

angmoms: List[int]
coeffs: ndarray
exponents: ndarray
icenter: int
kinds: List[str]
property nbasis: int

Number of basis functions (e.g. 3 for a P shell and 4 for an SP shell).

property ncon: int

Number of contractions. This is usually 1; e.g., it would be 2 for an SP shell.

property nprim: int

Number of primitives, also known as the contraction length.
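
The counts quoted for nbasis follow from the standard sizes of Cartesian and pure shells. The function shell_nbasis below is a hypothetical standalone sketch of that arithmetic, not the property itself.

```python
def shell_nbasis(angmoms, kinds):
    """Count basis functions in a shell from its angmoms and kinds."""
    total = 0
    for angmom, kind in zip(angmoms, kinds):
        if kind == "c":
            # Cartesian: number of monomials x^a y^b z^c with a+b+c = angmom.
            total += (angmom + 1) * (angmom + 2) // 2
        elif kind == "p":
            # Pure (real spherical harmonics): 2*angmom + 1 functions.
            total += 2 * angmom + 1
        else:
            raise ValueError(f"Unknown kind: {kind!r}")
    return total


print(shell_nbasis([1], ["c"]))          # P shell
print(shell_nbasis([0, 1], ["c", "c"]))  # SP shell
print(shell_nbasis([2], ["p"]))          # pure D shell
```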

angmom_its(angmom)[source]

Convert an angular momentum from integer to string representation.

Parameters:

angmom (Union[int, List[int]]) – The integer representation of the angular momentum.

Returns:

The string representation of the angular momentum. If a list of integer angmom is given, a list of str is returned.

Return type:

char

angmom_sti(char)[source]

Convert an angular momentum from string to integer format.

Parameters:

char (Union[str, List[str]]) – Character representation of angular momentum, (s, p, d, …)

Returns:

An integer representation of the angular momentum. If a list of str char is given, a list of integers is returned.

Return type:

angmom
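
A minimal sketch of the two conversions is given below. It assumes the conventional letter sequence s, p, d, f, g, h, i and covers only angular momenta 0 to 6; the names ANGMOM_LETTERS, angmom_to_letter and letter_to_angmom are hypothetical.

```python
# Assumption: the conventional letter sequence for angular momenta 0..6.
ANGMOM_LETTERS = "spdfghi"


def angmom_to_letter(angmom):
    """Convert an integer (or list of integers) to letters."""
    if isinstance(angmom, int):
        return ANGMOM_LETTERS[angmom]
    return [ANGMOM_LETTERS[value] for value in angmom]


def letter_to_angmom(char):
    """Convert a letter (or list of letters) to integers, ignoring case."""
    if isinstance(char, str):
        return ANGMOM_LETTERS.index(char.lower())
    return [ANGMOM_LETTERS.index(value.lower()) for value in char]


print(angmom_to_letter(2))
print(letter_to_angmom(["s", "P", "f"]))
```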

convert_convention_shell(conv1, conv2, reverse=False)[source]

Return a permutation vector and sign changes to convert from 1 to 2.

The transformation from convention 1 to convention 2 can be done applying the results of this function as follows:

vector2 = vector1[permutation]*signs

When using the option reverse=True, one can use the results to convert in the opposite sense:

vector1 = vector2[permutation]*signs
Parameters:
  • conv1 (List[str]) – Two lists, with the same strings (in different order), where each string may be prefixed with a ‘-‘.

  • conv2 (List[str]) – Two lists, with the same strings (in different order), where each string may be prefixed with a ‘-‘.

  • reverse – When true, the conversion from 2 to 1 is returned.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • permutation – An integer array that permutes basis function from 1 to 2.

  • signs – Sign changes when going from 1 to 2, must be applied after permutation
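
The relation vector2 = vector1[permutation]*signs can be checked with a small list-based sketch. The function permutation_and_signs is a hypothetical reimplementation for illustration (forward direction only) and uses plain lists instead of arrays.

```python
def permutation_and_signs(conv1, conv2):
    """Return (permutation, signs) such that
    vector2[i] == vector1[permutation[i]] * signs[i]."""
    # Split each label of convention 1 into its bare name and its sign.
    names1 = [label.lstrip("-") for label in conv1]
    signs1 = [-1 if label.startswith("-") else 1 for label in conv1]
    if sorted(names1) != sorted(label.lstrip("-") for label in conv2):
        raise ValueError("Both conventions must contain the same functions.")
    permutation = []
    signs = []
    for label in conv2:
        name = label.lstrip("-")
        sign2 = -1 if label.startswith("-") else 1
        index = names1.index(name)
        permutation.append(index)
        # The relative sign is -1 only when exactly one label has a minus.
        signs.append(sign2 * signs1[index])
    return permutation, signs


conv1 = ["c0", "c1", "s1"]
conv2 = ["s1", "-c0", "c1"]
permutation, signs = permutation_and_signs(conv1, conv2)
vector1 = [10.0, 20.0, 30.0]  # coefficients in convention 1
# Permute first, then apply the sign changes.
vector2 = [vector1[i] * s for i, s in zip(permutation, signs)]
print(permutation, signs, vector2)
```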

convert_conventions(molbasis, new_conventions, reverse=False)[source]

Return a permutation vector and sign changes to convert from 1 to 2.

The transformation from molbasis.conventions to the new conventions can be done by applying the results of this function as follows:

vector2 = vector1[permutation]*signs

When using the option reverse=True, one can use the results to convert in the opposite sense:

vector1 = vector2[permutation]*signs
Parameters:
  • molbasis (MolecularBasis) – The description of a molecular basis set.

  • new_conventions (Dict[str, List[str]]) – The new conventions for ordering and signs, to which data for the orbital basis needs to be converted.

  • reverse – When true, the conversion from 2 to 1 is returned.

Return type:

Tuple[ndarray, ndarray]

Returns:

  • permutation – An integer array that permutes basis function from 1 to 2.

  • signs – Sign changes when going from 1 to 2, must be applied after permutation

get_default_conventions()[source]

Produce conventions dictionaries compatible with HORTON2 and CCA.

Do not change this! Both conventions are also used by several file formats from other QC codes.

Common Component Architecture (CCA) conventions are defined in appendix B of the following article:

Kenny, J. P.; Janssen, C. L.; Valeev, E. F.; Windus, T. L. Components for Integral Evaluation in Quantum Chemistry. J. Comput. Chem. 2008, 29 (4), 562–577. https://doi.org/10.1002/jcc.20815.

The ordering of the spherical harmonics within one shell is rather vague in appendix B and a more precise description is given on the LibInt Wiki:

https://github.com/evaleev/libint/wiki/using-modern-CPlusPlus-API

Return type:

Tuple[Dict, Dict]

Returns:

  • horton2_conventions – A conventions dictionary for HORTON2, of which parts are used by various file formats.

  • cca_conventions – A conventions dictionary compatible with the Common Component Architecture (CCA).

iter_cart_alphabet(n)[source]

Loop over powers of Cartesian basis functions in alphabetical order.

See https://theochem.github.io/horton/2.1.1/tech_ref_gaussian_basis.html for details.

Parameters:

n (int) – The angular momentum, i.e. sum of Cartesian powers in this case.

Return type:

ndarray
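
Alphabetical ordering of Cartesian powers can be sketched with two nested loops. The generator cart_powers_alphabetical and the helper label are hypothetical; this version yields tuples, whereas the documented function returns arrays.

```python
def cart_powers_alphabetical(n):
    """Yield (nx, ny, nz) with nx + ny + nz == n in alphabetical order."""
    # Higher powers of x come first, then higher powers of y.
    for nx in range(n, -1, -1):
        for ny in range(n - nx, -1, -1):
            yield (nx, ny, n - nx - ny)


def label(powers):
    """Turn (nx, ny, nz) into a string such as 'xxy'; '1' for an s function."""
    nx, ny, nz = powers
    return "x" * nx + "y" * ny + "z" * nz or "1"


for n in range(3):
    print(n, [label(powers) for powers in cart_powers_alphabetical(n)])
```

For n = 2 this reproduces the ordering xx, xy, xz, yy, yz, zz used for Cartesian d functions in the conventions example of MolecularBasis.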

iodata.docstrings module

Docstring decorators for file format implementations.

document_dump_many(fmt, required, optional=None, kwdocs={}, notes=None)[source]

Decorate a dump_many function to generate a docstring.

Parameters:
  • fmt (str) – The name of the file format.

  • required (List[str]) – A list of mandatory IOData attributes needed to write the file.

  • optional (Optional[List[str]]) – A list of optional IOData attributes which can be included when writing the file.

  • kwdocs (Dict[str, str]) – A dictionary with documentation for keyword arguments. Each key is a keyword argument name and the corresponding value is text explaining the argument.

  • notes (Optional[str]) – Additional information to be added to the docstring.

Returns:

A decorator function.

Return type:

decorator

document_dump_one(fmt, required, optional=None, kwdocs={}, notes=None)[source]

Decorate a dump_one function to generate a docstring.

Parameters:
  • fmt (str) – The name of the file format.

  • required (List[str]) – A list of mandatory IOData attributes needed to write the file.

  • optional (Optional[List[str]]) – A list of optional IOData attributes which can be included when writing the file.

  • kwdocs (Dict[str, str]) – A dictionary with documentation for keyword arguments. Each key is a keyword argument name and the corresponding value is text explaining the argument.

  • notes (Optional[str]) – Additional information to be added to the docstring.

Returns:

A decorator function.

Return type:

decorator

document_load_many(fmt, guaranteed, ifpresent=None, kwdocs={}, notes=None)[source]

Decorate a load_many function to generate a docstring.

Parameters:
  • fmt (str) – The name of the file format.

  • guaranteed (List[str]) – A list of IOData attributes this format can certainly read.

  • ifpresent (Optional[List[str]]) – A list of IOData attributes this format reads if present in the file.

  • kwdocs (Dict[str, str]) – A dictionary with documentation for keyword arguments. Each key is a keyword argument name and the corresponding value is text explaining the argument.

  • notes (Optional[str]) – Additional information to be added to the docstring.

Returns:

A decorator function.

Return type:

decorator

document_load_one(fmt, guaranteed, ifpresent=None, kwdocs={}, notes=None)[source]

Decorate a load_one function to generate a docstring.

Parameters:
  • fmt (str) – The name of the file format.

  • guaranteed (List[str]) – A list of IOData attributes this format can certainly read.

  • ifpresent (Optional[List[str]]) – A list of IOData attributes this format reads if present in the file.

  • kwdocs (Dict[str, str]) – A dictionary with documentation for keyword arguments. Each key is a keyword argument name and the corresponding value is text explaining the argument.

  • notes (Optional[str]) – Additional information to be added to the docstring.

Returns:

A decorator function.

Return type:

decorator

document_write_input(fmt, required, optional=None, kwdocs={}, notes=None)[source]

Decorate a write_input function to generate a docstring.

Parameters:
  • fmt (str) – The name of the file format.

  • required (List[str]) – A list of mandatory IOData attributes needed to write the file.

  • optional (Optional[List[str]]) – A list of optional IOData attributes which can be included when writing the file.

  • kwdocs (Dict[str, str]) – A dictionary with documentation for keyword arguments. Each key is a keyword argument name and the corresponding value is text explaining the argument.

  • notes (Optional[str]) – Additional information to be added to the docstring.

Returns:

A decorator function.

Return type:

decorator
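
The general pattern behind these decorators, a function that takes documentation metadata and returns a decorator which sets the docstring, can be sketched as follows. The name document_dump and the generated text are hypothetical and much simpler than what IOData produces.

```python
def document_dump(fmt, required, optional=None, notes=None):
    """Return a decorator that writes a docstring for a dump function."""
    def decorator(func):
        lines = [
            f"Dump a single frame in the {fmt} format.",
            "",
            "Required attributes: " + ", ".join(required) + ".",
        ]
        if optional:
            lines.append("Optional attributes: " + ", ".join(optional) + ".")
        if notes:
            lines.extend(["", "Notes", "-----", notes])
        # Attach the generated text; the function itself is left unchanged.
        func.__doc__ = "\n".join(lines)
        return func
    return decorator


@document_dump("XYZ", ["atcoords", "atnums"], ["title"])
def dump_one(f, data):
    # Placeholder body; a real writer is outside the scope of this sketch.
    raise NotImplementedError


print(dump_one.__doc__)
```

Keeping the attribute lists in the decorator arguments means the same metadata can, in principle, feed both the docstring and summary tables of supported attributes.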

iodata.iodata module

Module for handling input/output from different file formats.

class IOData(atcharges={}, atcoords=None, atcorenums=None, atffparams={}, atfrozen=None, atgradient=None, athessian=None, atmasses=None, atnums=None, basisdef=None, bonds=None, cellvecs=None, charge=None, core_energy=None, cube=None, energy=None, extcharges=None, extra={}, g_rot=None, lot=None, mo=None, moments={}, nelec=None, obasis=None, obasis_name=None, one_ints={}, one_rdms={}, run_type=None, spinpol=None, title=None, two_ints={}, two_rdms={})[source]

Bases: object

A container class for data loaded from (or to be written to) a file.

In principle, the constructor accepts any keyword argument, which is stored as an attribute. All attributes are optional. Attributes can be set or removed after the IOData instance is constructed. The following attributes are supported by at least one of the io formats:

atcharges

A dictionary where keys are names of charge definitions and values are arrays with atomic charges (size N).

atcoords

A (N, 3) float array with Cartesian coordinates of the atoms.

atcorenums

A (N,) float array with pseudo-potential core charges. The elements corresponding to ghost atoms are zero.

atffparams

A dictionary with arrays of atomic force field parameters (typically non-bonded). Keys include ‘charges’, ‘vdw_radii’, ‘sigmas’, ‘epsilons’, ‘alphas’ (atomic polarizabilities), ‘c6s’, ‘c8s’, ‘c10s’, ‘buck_as’, ‘buck_bs’, ‘lj_as’, ‘core_charges’, ‘valence_charges’, ‘valence_widths’, etc. Not all of them have to be present, depending on the use case.

atfrozen

A (N,) bool array with frozen atoms. (All atoms are free if this attribute is not set.)

atgradient

A (N, 3) float array with the first derivatives of the energy w.r.t. Cartesian atomic displacements.

athessian

A (3*N, 3*N) array containing the energy Hessian w.r.t Cartesian atomic displacements.

atmasses

A (N,) float array with atomic masses.

atnums

A (N,) int vector with the atomic numbers.

basisdef

A basis set definition, i.e. a dictionary whose keys are symbols (of chemical elements), atomic numbers (similar to previous, str to make distinction with following) or an atom index (integer referring to a specific atom in a molecule). The format of the values is to be decided when implementing a load function for basis set definitions.

bonds

An (nbond, 3) array with the list of covalent bonds. Each row represents one bond and consists of three integers: the first atom index (starting from zero), the second atom index, and an optional bond type. Numerical values of bond types are defined in iodata.periodic.

cellvecs

A (NP, 3) array containing the (real-space) cell vectors describing periodic boundary conditions. A single vector corresponds to a 1D cell, e.g. for a wire. Two vectors describe a 2D cell, e.g. for a membrane. Three vectors describe a 3D cell, e.g. a crystalline solid.

charge

The net charge of the system. When possible, this is derived from atcorenums and nelec.

core_energy

The Hartree-Fock energy due to the core orbitals.

cube

An instance of Cube, describing the volumetric data from a cube (or similar) file.

energy

The total energy (electronic + nuclear-nuclear repulsion).

extcharges

Array with values of external charges, with shape (nextcharge, 4). First three columns for Cartesian X, Y and Z coordinates, last column for the actual charge.

extra

A dictionary with additional data loaded from a file. Any data which cannot be assigned to the other attributes belongs here. It may be decided in future to move some of the results from this dictionary to IOData attributes, with a more final name.

g_rot

The rotational symmetry number of the molecule.

lot

The level of theory used to compute the orbitals (and other properties).

mo

An instance of MolecularOrbitals.

moments

A dictionary with electrostatic multipole moments. Keys are (angmom, kind) tuples where angmom is an integer for the angular momentum and kind is ‘c’ for Cartesian or ‘p’ for pure functions (only for angmom >= 2). The corresponding values are 1D numpy arrays. The order of the components of the multipole moments follows the HORTON2_CONVENTIONS from iodata/basis.py

nelec

The number of electrons.

obasis

An instance of MolecularBasis describing the orbital basis set.

obasis_name

A name or DOI describing the basis set used for the orbitals in the mo attribute (if applicable). Should be consistent with www.basissetexchange.org.

one_ints

Dictionary where keys are names and values are numpy arrays with one-body operators, typically integrals of a one-body operator with a pair of (Gaussian) basis functions. Names can start with olp (overlap), kin (kinetic energy), na (nuclear attraction), core (core hamiltonian), etc., or one (general one-electron integral). When relevant, these names must have a suffix _ao or _mo to clarify in which basis the integrals are computed. _ao is used to denote integrals in a non-orthogonal (atomic orbital) basis. _mo is used to denote an orthogonal (molecular orbital) basis. For the overlap integrals, this suffix can be omitted because it is only useful to compute them in the atomic-orbital basis.

one_rdms

Dictionary where keys are names and values are one-particle density matrices. Names can be scf, post_scf, scf_spin, post_scf_spin. When relevant, these names must have a suffix _ao or _mo to clarify in which basis the RDMs are computed. _ao is used to denote a non-orthogonal (atomic orbital) basis. _mo is used to denote an orthogonal (molecular orbital) basis. For the SCF RDMs, this suffix can be omitted because it is only useful to compute them in the atomic-orbital basis.

run_type

The type of calculation that led to the results stored in IOData, which must be one of the following: ‘energy’, ‘energy_force’, ‘opt’, ‘scan’, ‘freq’ or None.

spinpol

The spin polarization. By default, its value is derived from the molecular orbitals (mo attribute), as abs(nalpha - nbeta). In this case, spinpol cannot be set. When no molecular orbitals are present, this attribute can be set.

title

A suitable name for the data.

two_ints

Dictionary where keys are names and values are numpy arrays with two-body operators, typically integrals of a two-body operator with four (Gaussian) basis functions. Names can start with er (electron repulsion) or two (general pairwise interaction). When relevant, these names must have a suffix _ao or _mo to clarify in which basis the integrals are computed. See one_ints for more details. Array indexes are in physicist’s notation.

two_rdms

Dictionary where keys are names and values are two-particle density matrices. Names can be post_scf or post_scf_spin. When relevant, these names must have a suffix _ao or _mo to clarify in which basis the RDMs are computed. See one_rdms for more details. Array indexes are in physicist’s notation.
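Because both two_ints and two_rdms use physicist’s notation, arrays coming from codes that use chemist’s notation need their middle two axes swapped. The sketch below uses a random array as a hypothetical stand-in for real integrals and relies on the standard relation <ij|kl> = (ik|jl).

```python
import numpy as np

rng = np.random.default_rng(0)
nbasis = 3

# Hypothetical four-index array in chemist's notation: chem[i, k, j, l] = (ik|jl)
chem = rng.normal(size=(nbasis,) * 4)

# Physicist's notation: phys[i, j, k, l] = <ij|kl> = (ik|jl)
phys = chem.transpose(0, 2, 1, 3)

# The same axis swap converts back to chemist's notation.
chem_again = phys.transpose(0, 2, 1, 3)
```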

__init__(atcharges={}, atcoords=None, atcorenums=None, atffparams={}, atfrozen=None, atgradient=None, athessian=None, atmasses=None, atnums=None, basisdef=None, bonds=None, cellvecs=None, charge=None, core_energy=None, cube=None, energy=None, extcharges=None, extra={}, g_rot=None, lot=None, mo=None, moments={}, nelec=None, obasis=None, obasis_name=None, one_ints={}, one_rdms={}, run_type=None, spinpol=None, title=None, two_ints={}, two_rdms={})

Method generated by attrs for class IOData.

atcharges: dict
atcoords: ndarray
property atcorenums: ndarray

Return effective core charges.

atffparams: dict
atfrozen: ndarray
atgradient: ndarray
athessian: ndarray
atmasses: ndarray
atnums: ndarray
basisdef: str
bonds: ndarray
cellvecs: ndarray
property charge: float

Return the net charge of the system.

core_energy: float
cube: Cube
energy: float
extcharges: ndarray
extra: dict
g_rot: float
lot: str
mo: MolecularOrbitals
moments: dict
property natom: int

Return the number of atoms.

property nelec: float

Return the number of electrons.

obasis: MolecularBasis
obasis_name: str
one_ints: dict
one_rdms: dict
run_type: str
property spinpol: float

Return the spin polarization.

Warning: for restricted wavefunctions, it is assumed that an occupation number in ]0, 2[ implies spin polarization, which may not always be a valid assumption.

title: str
two_ints: dict
two_rdms: dict
iodata.orbitals module

Data structure for molecular orbitals.

class MolecularOrbitals(kind, norba, norbb, occs=None, coeffs=None, energies=None, irreps=None)[source]

Bases: object

Class of Orthonormal Molecular Orbitals.

kind

Type of molecular orbitals, which can be ‘restricted’, ‘unrestricted’, or ‘generalized’.

norba

Number of (occupied and virtual) alpha molecular orbitals. Set to None in case of kind==’generalized’.

norbb

Number of (occupied and virtual) beta molecular orbitals. Set to None in case of kind==’generalized’. This is expected to be equal to norba for the restricted kind.

occs

Molecular orbital occupation numbers. The length equals the number of columns of coeffs.

coeffs

Molecular orbital coefficients. In case of restricted: shape = (nbasis, norba) = (nbasis, norbb). In case of unrestricted: shape = (nbasis, norba + norbb). In case of generalized: shape = (2 * nbasis, norb), where norb is the total number of orbitals.

energies

Molecular orbital energies. The length equals the number of columns of coeffs.

irreps

Irreducible representation. The length equals the number of columns of coeffs.

Warning: The interpretation of the occupation numbers may only be suitable for single-reference orbitals (not fractionally occupied natural orbitals). When an occupation number is in ]0, 1], it is assumed that an alpha orbital is (fractionally) occupied. When an occupation number is in ]1, 2], it is assumed that the alpha orbital is fully occupied and the beta orbital is (fractionally) occupied.
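The occupation-number convention described in this warning can be written down in a few lines. This is a minimal sketch of the stated rule for restricted orbitals, using NumPy only; the occupation numbers are invented, and the code is not taken from the IOData source.

```python
import numpy as np

# Invented occupation numbers of restricted orbitals, each in [0, 2].
occs = np.array([2.0, 2.0, 1.0, 0.5, 0.0])

# Occupation in ]0, 1]: only the alpha orbital is (fractionally) occupied.
# Occupation in ]1, 2]: alpha is full, beta holds the remainder.
occsa = np.clip(occs, 0.0, 1.0)
occsb = occs - occsa

# Spin polarization as abs(nalpha - nbeta).
spinpol = abs(occsa.sum() - occsb.sum())
```

For these numbers, the third and fourth orbitals are counted as alpha-only, which is exactly the assumption the warning says may not hold for fractionally occupied natural orbitals.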
__init__(kind, norba, norbb, occs=None, coeffs=None, energies=None, irreps=None)

Method generated by attrs for class MolecularOrbitals.

coeffs: ndarray
property coeffsa

Return alpha orbital coefficients.

property coeffsb

Return beta orbital coefficients.

energies: ndarray
property energiesa

Return alpha orbital energies.

property energiesb

Return beta orbital energies.

irreps: ndarray
property irrepsa

Return alpha irreps.

property irrepsb

Return beta irreps.

kind: str
property nbasis

Return the number of spatial basis functions.

property nelec: float

Return the total number of electrons.

property norb

Return the number of spatially distinct orbitals.

Notes

In case of restricted wavefunctions, this may be less than just the sum of norba and norbb, because alpha and beta orbitals share the same spatial dependence.
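The relation between kind, norba, norbb, norb and the shape of coeffs can be summarized in a small helper. The function name expected_coeffs_shape is hypothetical and not part of IOData; it only encodes the shapes stated in the attribute descriptions of this class.

```python
def expected_coeffs_shape(kind, nbasis, norba=None, norbb=None, norb=None):
    """Return the documented shape of coeffs for a given kind of orbitals."""
    if kind == "restricted":
        # Alpha and beta share the same spatial orbitals.
        if norba != norbb:
            raise ValueError("norba and norbb must be equal for restricted orbitals.")
        return (nbasis, norba)
    if kind == "unrestricted":
        # Alpha columns followed by beta columns.
        return (nbasis, norba + norbb)
    if kind == "generalized":
        # Each orbital has alpha and beta components, hence 2 * nbasis rows.
        return (2 * nbasis, norb)
    raise ValueError(f"Unknown kind: {kind}")


restricted_shape = expected_coeffs_shape("restricted", nbasis=10, norba=8, norbb=8)
unrestricted_shape = expected_coeffs_shape("unrestricted", nbasis=10, norba=8, norbb=7)
generalized_shape = expected_coeffs_shape("generalized", nbasis=10, norb=16)
```

In the restricted case the number of columns is 8 rather than 16, which is the situation the note on norb refers to.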

norba: int
norbb: int
occs: ndarray
property occsa

Return alpha occupation numbers.

property occsb

Return beta occupation numbers.

property spinpol: float

Return the spin polarization of the Slater determinant.

validate_norbab(mo, attribute, value)[source]

Validate the norba or norbb value assigned to a MolecularOrbitals object.

Parameters:
  • mo – The MolecularOrbitals instance.

  • attribute – Attribute instance being changed.

  • value – The new value.

iodata.overlap module

Module for computing overlap of atomic orbital basis functions.

class GaussianOverlap(n_max)[source]

Bases: object

Gaussian Overlap Class.

__init__(n_max)[source]

Initialize class.

Parameters:

n_max (int) – Maximum angular momentum.

compute_overlap_gaussian_1d(x1, x2, n1, n2, two_at)[source]

Compute overlap integral of two Gaussian functions in one dimension.
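For the simplest case, two unnormalized Gaussians without polynomial prefactor, the one-dimensional overlap has a well-known closed form. The sketch below compares that formula with a numerical integral. The function overlap_1d_s is hypothetical; it does not reproduce the signature of the method above, and the parameters n1, n2 and two_at are not modeled.

```python
import numpy as np


def overlap_1d_s(x1, x2, a1, a2):
    """Integral over x of exp(-a1 (x - x1)^2) * exp(-a2 (x - x2)^2)."""
    at = a1 + a2
    return np.sqrt(np.pi / at) * np.exp(-a1 * a2 / at * (x1 - x2) ** 2)


# Arbitrary centers and exponents, chosen for illustration.
x1, x2, a1, a2 = 0.3, -0.5, 0.8, 1.7
analytic = overlap_1d_s(x1, x2, a1, a2)

# Numerical check with a simple Riemann sum on a dense grid.
x = np.linspace(-20.0, 20.0, 200001)
integrand = np.exp(-a1 * (x - x1) ** 2 - a2 * (x - x2) ** 2)
numeric = integrand.sum() * (x[1] - x[0])
```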

compute_overlap(obasis0, atcoords0, obasis1=None, atcoords1=None)[source]

Compute overlap matrix for the given molecular basis set(s).

\[\langle \psi_{i} | \psi_{j} \rangle\]

When only one basis set is given, the overlap matrix of that basis (with itself) is computed. If a second basis set (with its atomic coordinates) is provided, the overlap between the two basis sets is computed.

This function takes into account the requested order of the basis functions in obasis0.conventions (and obasis1.conventions). Note that only L2 normalized primitives are supported at the moment.

Parameters:
  • obasis0 (MolecularBasis) – The orbital basis set.

  • atcoords0 (ndarray) – The atomic Cartesian coordinates (including those of ghost atoms).

  • obasis1 (Optional[MolecularBasis]) – An optional second orbital basis set.

  • atcoords1 (Optional[ndarray]) – An optional second array with atomic Cartesian coordinates (including those of ghost atoms).

Returns:

The matrix with overlap integrals, shape=(obasis0.nbasis, obasis1.nbasis).

Return type:

overlap

gob_cart_normalization(alpha, n)[source]

Compute the normalization constants of Cartesian Gaussian primitives.

Parameters:
  • alpha (ndarray) – Gaussian basis exponents

  • n (ndarray) – Cartesian subshell angular momenta

Returns:

The normalization constant for the Cartesian Gaussian basis.

Return type:

np.ndarray
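As an illustration, the textbook L2 normalization constant of a single Cartesian Gaussian primitive x^nx y^ny z^nz exp(-alpha r^2) can be written as below. This is a sketch under the assumption that the standard formula applies; the function names are hypothetical, and the scalar/tuple arguments differ from the array arguments documented above.

```python
import math


def double_factorial(k):
    """Return k!! for odd k >= -1, with (-1)!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def cart_normalization(alpha, n):
    """L2 normalization constant for exponent alpha and powers n = (nx, ny, nz)."""
    nx, ny, nz = n
    denom = (
        double_factorial(2 * nx - 1)
        * double_factorial(2 * ny - 1)
        * double_factorial(2 * nz - 1)
    )
    return math.sqrt((2 * alpha / math.pi) ** 1.5 * (4 * alpha) ** (nx + ny + nz) / denom)


norm_s = cart_normalization(1.3, (0, 0, 0))
norm_px = cart_normalization(1.3, (1, 0, 0))
```

For an s-type function this reduces to (2 alpha / pi)^(3/4).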

iodata.overlap_cartpure module

Transformation matrices from Cartesian to pure basis functions.

Both Cartesian and pure functions are assumed to be normalized. These matrices were generated with:

python tools/harmonics.py L2 python 7
iodata.periodic module

Periodic table module.

iodata.utils module

Utility functions module.

class Cube(origin, axes, data)[source]

Bases: object

The volumetric data from a cube (or similar) file.

origin

A 3D vector with the origin of the axes frame.

axes

A (3, 3) array where each row represents the spacing between two neighboring grid points along the first, second and third axis, respectively.

data

A (K, L, M) array of data on a uniform grid.

__init__(origin, axes, data)

Method generated by attrs for class Cube.

axes: ndarray
data: ndarray
origin: ndarray
property shape

Shape of the rectangular grid.
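The three attributes together define the position of every grid point: point (i, j, k) sits at origin + i * axes[0] + j * axes[1] + k * axes[2]. The sketch below computes all positions with plain NumPy arrays. The values are invented, and the Cube class itself is not used.

```python
import numpy as np

# Invented example values with the documented shapes.
origin = np.array([0.0, 0.0, 0.0])
axes = np.array([
    [0.5, 0.0, 0.0],   # step along the first grid axis
    [0.0, 0.5, 0.0],   # step along the second grid axis
    [0.0, 0.0, 0.25],  # step along the third grid axis
])
data = np.zeros((2, 3, 4))  # (K, L, M) grid values

# Integer indices (i, j, k) of every grid point, shape (K, L, M, 3).
indices = np.stack(
    np.meshgrid(*(np.arange(n) for n in data.shape), indexing="ij"), axis=-1
)

# Cartesian positions: origin + i * axes[0] + j * axes[1] + k * axes[2].
points = origin + indices @ axes
```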

exception FileFormatError[source]

Bases: OSError

Raised when incorrect content is encountered when loading files.

__init__(*args, **kwargs)
args
characters_written
errno

POSIX exception code

filename

exception filename

filename2

second exception filename

strerror

exception strerror

with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

exception FileFormatWarning[source]

Bases: Warning

Raised when incorrect content is encountered and fixed when loading files.

__init__(*args, **kwargs)
args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class LineIterator(filename)[source]

Bases: object

Iterator class for looping over lines and keeping track of the line number.

__init__(filename)[source]

Initialize a LineIterator.

Parameters:

filename (str) – The file that will be read.

back(line)[source]

Go one line back and decrease the lineno attribute by one.

error(msg)[source]

Raise an error while reading a file.

Parameters:

msg (str) – Message to raise alongside filename and line number.

warn(msg)[source]

Raise a warning while reading a file.

Parameters:

msg (str) – Message to raise alongside filename and line number.
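The idea behind this class, an iterator that counts lines and lets a parser push a line back, can be sketched without touching the file system. PushbackLines is a hypothetical stand-in that reads from an in-memory list and raises a plain ValueError; it is not the LineIterator implementation.

```python
class PushbackLines:
    """Iterate over lines, track the line number, and allow stepping back."""

    def __init__(self, lines, name="<memory>"):
        self.name = name
        self.lineno = 0
        self._lines = iter(lines)
        self._stack = []  # lines that were pushed back

    def __iter__(self):
        return self

    def __next__(self):
        # Serve pushed-back lines first, then continue with the source.
        line = self._stack.pop() if self._stack else next(self._lines)
        self.lineno += 1
        return line

    def back(self, line):
        """Push a line back and decrease the line number by one."""
        self._stack.append(line)
        self.lineno -= 1

    def error(self, msg):
        """Raise an error mentioning the source name and line number."""
        raise ValueError(f"{self.name}:{self.lineno} {msg}")


lit = PushbackLines(["header\n", "1.0 2.0\n", "end\n"])
first = next(lit)
second = next(lit)
lit.back(second)    # the parser decides it read one line too far
again = next(lit)   # the same line is returned once more
```

Pushing lines back is convenient when a section of a file ends only when the first line of the next section is encountered.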

check_dm(dm, overlap, eps=0.0001, occ_max=1.0)[source]

Check if the density matrix has eigenvalues in the proper range.

Parameters:
  • dm (ndarray) – The density matrix, shape=(nbasis, nbasis), dtype=float.

  • overlap (ndarray) – The overlap matrix, shape=(nbasis, nbasis), dtype=float.

  • eps (float) – The threshold on the eigenvalue inequalities.

  • occ_max (float) – The maximum occupation.

Raises:

ValueError – When the density matrix has wrong eigenvalues.
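One way to perform such a check is sketched below. It assumes that the relevant eigenvalues are the occupation numbers of the density matrix in the metric of the overlap matrix, and it obtains them through a Cholesky factorization. The function check_occupations is hypothetical, returns the occupations for illustration, and is not the library's implementation.

```python
import numpy as np


def check_occupations(dm, overlap, eps=1e-4, occ_max=1.0):
    """Raise ValueError if occupations fall outside [0, occ_max] within eps."""
    # With overlap = L L^T, the matrix L^T D L has the occupation numbers
    # of D as eigenvalues.
    chol = np.linalg.cholesky(overlap)
    occs = np.linalg.eigvalsh(chol.T @ dm @ chol)
    if occs.min() < -eps:
        raise ValueError("The density matrix has eigenvalues below zero.")
    if occs.max() > occ_max + eps:
        raise ValueError("The density matrix has eigenvalues above occ_max.")
    return occs


# Toy example: one normalized orbital c, occupied by one electron.
overlap = np.array([[1.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0]) / np.sqrt(3.0)  # c^T S c = 1
dm = np.outer(c, c)
occs = check_occupations(dm, overlap)
```

A spin-summed density matrix would need occ_max=2.0 to pass the same check.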

derive_naturals(dm, overlap)[source]

Derive natural orbitals from a given density matrix.

Parameters:
  • dm (ndarray) – The density matrix, shape=(nbasis, nbasis).

  • overlap (ndarray) – The overlap matrix, shape=(nbasis, nbasis).

Return type:

Tuple[ndarray, ndarray]

Returns:

  • coeffs – Orbital coefficients shape=(nbasis, nfn)

  • occs – Orbital occupations shape=(nfn, )
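Natural orbitals diagonalize the density matrix while being orthonormal with respect to the overlap matrix. The sketch below obtains them through symmetric orthogonalization with NumPy only. The function naturals_sketch is hypothetical; it illustrates the mathematics and is not necessarily the approach taken in IOData, and the descending sort order is a choice made here.

```python
import numpy as np


def naturals_sketch(dm, overlap):
    """Return (coeffs, occs) with D = C diag(occs) C^T and C^T S C = 1."""
    # Build S^(1/2) and S^(-1/2) from the eigendecomposition of the overlap.
    s_evals, s_evecs = np.linalg.eigh(overlap)
    s_half = s_evecs @ np.diag(np.sqrt(s_evals)) @ s_evecs.T
    s_inv_half = s_evecs @ np.diag(1.0 / np.sqrt(s_evals)) @ s_evecs.T
    # Diagonalize the density matrix in the orthogonalized basis.
    occs, vecs = np.linalg.eigh(s_half @ dm @ s_half)
    # Sort from highest to lowest occupation.
    order = np.argsort(occs)[::-1]
    return s_inv_half @ vecs[:, order], occs[order]


# Toy example: one doubly occupied normalized orbital.
overlap = np.array([[1.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0]) / np.sqrt(3.0)  # c^T S c = 1
dm = 2.0 * np.outer(c, c)
coeffs, occs = naturals_sketch(dm, overlap)
```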

set_four_index_element(four_index_object, i, j, k, l, value)[source]

Assign values to a four-index object, accounting for 8-fold index symmetry.

This function assumes physicists’ notation.

Parameters:
  • four_index_object (ndarray) – The four-index object. It will be written to. shape=(nbasis, nbasis, nbasis, nbasis), dtype=float

  • i (int) – The indices to assign to.

  • j (int) – The indices to assign to.

  • k (int) – The indices to assign to.

  • l (int) – The indices to assign to.

  • value (float) – The value of the matrix element to store.
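For real orbitals in physicists’ notation, the element <ij|kl> is invariant under swapping i with k, swapping j with l, and exchanging the two electrons, which gives up to eight equivalent index tuples. The sketch below assigns all of them. The function set_sym8 is a hypothetical illustration of that symmetry, not a copy of the library function.

```python
import numpy as np


def set_sym8(array, i, j, k, l, value):
    """Assign value to <ij|kl> and its symmetry-equivalent elements."""
    equivalent = {
        (i, j, k, l), (k, j, i, l), (i, l, k, j), (k, l, i, j),  # swaps within electrons
        (j, i, l, k), (l, i, j, k), (j, k, l, i), (l, k, j, i),  # electrons exchanged
    }
    for idx in equivalent:
        array[idx] = value


er = np.zeros((4, 4, 4, 4))
set_sym8(er, 0, 1, 2, 3, 0.25)
```

When some indices coincide, several of the eight tuples are identical, so fewer distinct elements are written.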

volume(cellvecs)[source]

Calculate the (generalized) cell volume.

Parameters:

cellvecs (ndarray) – A numpy array of shape (x, 3) where x is in {1, 2, 3}. Each row is one cell vector.

Returns:

In case of 3D, the cell volume. In case of 2D, the cell area. In case of 1D, the cell length.

Return type:

volume
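A single formula covers all three cases: the square root of the determinant of the Gram matrix of the cell vectors equals the length in 1D, the area in 2D and the volume in 3D. The sketch below uses this identity. The function generalized_volume is hypothetical and may differ from how the library computes the result.

```python
import numpy as np


def generalized_volume(cellvecs):
    """Return length, area or volume spanned by the rows of cellvecs."""
    cellvecs = np.atleast_2d(np.asarray(cellvecs, dtype=float))
    gram = cellvecs @ cellvecs.T  # matrix of dot products between cell vectors
    return float(np.sqrt(np.linalg.det(gram)))


length = generalized_volume([[0.0, 3.0, 4.0]])
area = generalized_volume([[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
vol = generalized_volume(np.diag([2.0, 3.0, 4.0]))
```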

Module contents

Input and Output Module.

Indices and tables