
Structure & Identifiers

A Structure is the core qcio object for representing a molecule or molecular superstructure in 3D space. Structure objects can be created directly from atomic symbols and a geometry (the geometry must be in Bohr), from SMILES strings, from XYZ files, or by opening Structure objects previously saved to disk.

qcio.Structure

Structure(**data: Any)

A Structure object with atoms and their corresponding Cartesian coordinates, charge, multiplicity, and identifiers such as name, smiles, etc.

Attributes:

symbols (list[str]): The atomic symbols of the structure.
geometry (SerializableNDArray): The geometry of the structure in Cartesian coordinates. Units are Bohr (AU).
identifiers (Identifiers): Identifiers for the structure such as name, smiles, etc.
charge (int): The molecular charge.
multiplicity (int): The molecular multiplicity.
connectivity (list[tuple[int, int, float]]): Explicit description of the bonds between atoms. Each tuple contains the indices of the atoms in the bond and the order of the bond. E.g., [(0, 1, 1.0), (1, 2, 2.0)] indicates a single bond between atoms 0 and 1 and a double bond between atoms 1 and 2.
extras (Dict[str, Any]): Additional information to bundle with the object. Use for schema development and scratch space.
ids (Identifiers): @property Shortcut to access identifiers.
geometry_angstrom (ndarray): @property The geometry of the structure in Angstrom.
atomic_numbers (list[int]): @property The atomic numbers of the atoms in the structure.
formula (str): @property The molecular formula of the structure using the Hill System.
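
The Hill System mentioned for formula lists carbon first, hydrogen second, and the remaining elements alphabetically; when no carbon is present, every element (hydrogen included) is listed alphabetically. The following is a minimal standalone sketch of that ordering. The helper name hill_formula is hypothetical and this is not qcio's implementation; details such as omitting a count of 1 are assumptions of the sketch.

```python
from collections import Counter


def hill_formula(symbols: list[str]) -> str:
    """Return a Hill-system formula for a list of atomic symbols (illustrative)."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, hydrogen second, everything else alphabetically.
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        order = sorted(counts)
    # Omit the count when it is 1 (an assumed convention for this sketch).
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order)


print(hill_formula(["H", "O", "H"]))            # H2O
print(hill_formula(["C", "H", "H", "H", "H"]))  # CH4
```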

Example
from qcio import Structure

structure = Structure(
    symbols=["H", "O", "H"],
    geometry=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
    charge=0,  # optional; defaults to 0
    multiplicity=1,  # optional; defaults to 1
    identifiers={"smiles": "O"},  # optional
)
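
Because geometry must be given in Bohr, coordinates that are available in Angstrom need to be converted before they are passed to Structure. A minimal sketch of that conversion is shown here. The helper angstrom_to_bohr is hypothetical, and the constant is the CODATA 2018 Bohr radius in Angstrom, which is assumed rather than taken from qcio; the coordinates are illustrative values.

```python
# CODATA 2018 Bohr radius in Angstrom (assumed here; qcio's own constant may differ slightly).
BOHR_TO_ANGSTROM = 0.529177210903


def angstrom_to_bohr(geometry_angstrom: list[list[float]]) -> list[list[float]]:
    """Convert Cartesian coordinates from Angstrom to Bohr (hypothetical helper)."""
    return [[coord / BOHR_TO_ANGSTROM for coord in atom] for atom in geometry_angstrom]


geometry_angstrom = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.96]]  # illustrative values
geometry_bohr = angstrom_to_bohr(geometry_angstrom)
# geometry_bohr can now be passed as geometry=... together with matching symbols.
print(geometry_bohr)
```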
Source code in src/qcio/models/structure.py
def __init__(self, **data: Any):
    """Create a new Structure object.

    Example:
        ```python
        from qcio import Structure

        structure = Structure(
            symbols=["H", "O", "H"],
            geometry=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
            charge=0,  # optional; defaults to 0
            multiplicity=1,  # optional; defaults to 1
            identifiers={"smiles": "O"},  # optional
        )

        ```
    """
    # Backwards compatibility for 'ids' attribute.
    if identifiers := data.pop("ids", None):
        warnings.warn(
            "Passing 'ids' is deprecated and will be removed in a future "
            "release. Please use 'identifiers' instead. Once instantiated, "
            "you can use structure.ids to access the identifiers as a shortcut.",
            category=FutureWarning,
            stacklevel=2,
        )
        data["identifiers"] = identifiers
    super().__init__(**data)

from_xyz classmethod

from_xyz(
    xyz_str: str,
    *,
    charge: int | None = None,
    multiplicity: int | None = None,
) -> Self

Create a Structure from an XYZ string, such as the contents of an XYZ file.

Parameters:

xyz_str (str, required): The XYZ string.
charge (int | None, default None): The molecular charge of the structure. If not provided, will read from the XYZ string if set or default to 0.
multiplicity (int | None, default None): The molecular multiplicity of the structure. If not provided, will read from the XYZ string if set or default to 1.

Note

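If the comments line (the second line of the XYZ string) contains qcio data in a qcio_key=value format, such as qcio_charge and qcio_multiplicity, those values are read. Keys of the form qcio__identifiers_* are read into the identifiers, and any remaining non-qcio comments are kept. Passing charge or multiplicity as an argument when the same value is also set in the comments line raises a ValueError. Coordinates are read as Angstrom and converted to Bohr.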

Example
struct = Structure.from_xyz(xyz_str)
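
The comments-line convention described in the note can be illustrated without qcio. The sketch below splits the second line of an XYZ string into structure keys, identifier keys, and free comments based on the qcio_ and qcio__identifiers_ prefixes. The function name parse_comment_line is hypothetical, and using str.partition (so that a value containing "=" stays intact) is a choice of this sketch rather than a statement about qcio's behavior. All values come back as strings.

```python
def parse_comment_line(comment: str) -> tuple[dict[str, str], dict[str, str], list[str]]:
    """Split an XYZ comment line into qcio keys, identifier keys, and free comments."""
    structure_kwargs: dict[str, str] = {}
    identifier_kwargs: dict[str, str] = {}
    other_comments: list[str] = []
    for item in comment.split():
        key, _, value = item.partition("=")
        # Check the longer prefix first, since it also starts with "qcio_".
        if key.startswith("qcio__identifiers_"):
            identifier_kwargs[key.removeprefix("qcio__identifiers_")] = value
        elif key.startswith("qcio_"):
            structure_kwargs[key.removeprefix("qcio_")] = value
        else:
            other_comments.append(item)
    return structure_kwargs, identifier_kwargs, other_comments


xyz_str = """3
qcio_charge=0 qcio_multiplicity=1 qcio__identifiers_name=water generated by hand
O 0.0 0.0 0.0
H 0.0 0.0 0.96
H 0.0 0.93 -0.24
"""
structure_kwargs, identifier_kwargs, other = parse_comment_line(xyz_str.split("\n")[1])
print(structure_kwargs)   # {'charge': '0', 'multiplicity': '1'}
print(identifier_kwargs)  # {'name': 'water'}
print(other)              # ['generated', 'by', 'hand']
```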
Source code in src/qcio/models/structure.py
@classmethod
def from_xyz(
    cls,
    xyz_str: str,
    *,
    charge: int | None = None,
    multiplicity: int | None = None,
) -> Self:
    """Create a Structure from an XYZ file or string.

    Args:
        xyz_str: The XYZ string.
        charge: The molecular charge of the structure. If not provided, will read
            from the XYZ string if set or default to 0.
        multiplicity: The molecular multiplicity of the structure. If not provided,
            will read from the XYZ string if set or default to 1.

    Note:
        Will read qcio data such as `charge` and `multiplicity` from the comments
        line with a `qcio_key=value` format (if it is present). Also will read in
        qcio__identifiers_* keys and additional non-qcio comments.

    Example:
        ```python
        struct = Structure.from_xyz(xyz_str)
        ```
    """

    lines = xyz_str.split("\n")

    num_atoms = int(lines[0])

    # Collect comments
    structure_kwargs: dict[str, Any] = {}
    identifier_kwargs: dict[str, Any] = {}
    other_comments: list[str] = []

    for item in lines[1].strip().split():
        if item.startswith("qcio__identifiers_"):
            key = item.split("=")[0].replace("qcio__identifiers_", "")
            value = item.split("=")[1]
            identifier_kwargs[key] = value
        elif item.startswith("qcio_"):
            key = item.split("=")[0].replace("qcio_", "")
            value = item.split("=")[1]
            structure_kwargs[key] = value
        else:
            other_comments.append(item)

    if charge is not None and "charge" in structure_kwargs:
        raise ValueError("Charge cannot be set in the file and as an argument.")
    if multiplicity is not None and "multiplicity" in structure_kwargs:
        raise ValueError(
            "Multiplicity cannot be set in the file and as an argument."
        )

    # Set charge and multiplicity if provided
    if charge is not None:
        structure_kwargs["charge"] = charge
    if multiplicity is not None:
        structure_kwargs["multiplicity"] = multiplicity

    symbols = []
    geometry = []
    for line in lines[2 : 2 + num_atoms]:
        split_line = line.split()
        symbols.append(split_line[0])
        geometry.append([float(val) / BOHR_TO_ANGSTROM for val in split_line[1:]])

    return cls(
        symbols=symbols,
        geometry=geometry,
        **structure_kwargs,
        identifiers=Identifiers(**identifier_kwargs),
        extras={cls._xyz_comment_key: other_comments},
    )

to_xyz

to_xyz(precision: int = 17) -> str

Return an xyz string representation of the structure.

Parameters:

precision (int, default 17): The number of decimal places to include in the XYZ output. Defaults to 17, which captures the full precision of float64.

Notes: Will add qcio data such as charge and multiplicity to the comments line with a qcio_key=value format.
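
To make the output layout concrete, here is a standalone sketch of how a precision argument can drive a fixed-width format spec and how key=value pairs end up on the comments line. The function format_xyz is a hypothetical stand-in, not qcio's method, and it assumes the coordinates passed in are already in Angstrom.

```python
def format_xyz(
    symbols: list[str],
    geometry_angstrom: list[list[float]],
    comment_data: dict[str, object],
    precision: int = 17,
) -> str:
    """Render an XYZ string with key=value pairs on the comment line (illustrative)."""
    lines = [str(len(symbols))]  # line 1: number of atoms
    lines.append(" ".join(f"{k}={v}" for k, v in comment_data.items()))  # line 2: comments
    for symbol, (x, y, z) in zip(symbols, geometry_angstrom):
        # Nested format spec: `precision` sets the number of decimal places.
        lines.append(
            f"{symbol:2s} {x: >18.{precision}f} {y: >18.{precision}f} {z: >18.{precision}f}"
        )
    lines.append("")  # so the joined string ends with a newline
    return "\n".join(lines)


text = format_xyz(
    ["O", "H"],
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.96]],
    {"qcio_charge": 0, "qcio_multiplicity": 1},
    precision=4,
)
print(text)
```

A lower precision such as 4 keeps files readable; the default of 17 is the safer choice when the string will be parsed back into a structure.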

Source code in src/qcio/models/structure.py
def to_xyz(self, precision: int = 17) -> str:
    """Return an xyz string representation of the structure.

    Args:
        precision: The number of decimal places to include in the xyz file. Default
            17 which captures all precision of float64.
    Notes:
        Will add qcio data such as charge and multiplicity to the comments line with
        a `qcio_key=value` format.
    """

    qcio_data = {  # These get added to comments line (line 2) in xyz file
        "qcio_charge": self.charge,
        "qcio_multiplicity": self.multiplicity,
    }

    # Add identifiers to qcio_data
    for key, value in self.identifiers.__dict__.items():
        if key != "extras" and value:
            qcio_data[f"qcio__identifiers_{key}"] = value

    assert isinstance(self.geometry, np.ndarray)  # For mypy
    geometry_angstrom = self.geometry * BOHR_TO_ANGSTROM

    xyz_lines = []
    xyz_lines.append(f"{len(self.symbols)}")
    # Add qcio data to comments line
    comments = f"{' '.join([f'{k}={v}' for k, v in qcio_data.items()])}"
    # Add any other comments
    if xyz_comments := self.extras.get(self._xyz_comment_key, []):
        comments += " " + " ".join(xyz_comments)
    xyz_lines.append(comments)

    # Create a format string using the precision parameter
    format_str = f"{{:2s}} {{: >18.{precision}f}} {{: >18.{precision}f}} {{: >18.{precision}f}}"  # noqa: E501

    for symbol, (x, y, z) in zip(self.symbols, geometry_angstrom):
        xyz_lines.append(format_str.format(symbol, x, y, z))
    xyz_lines.append("")  # Append newline to end of file
    return "\n".join(xyz_lines)

distance

distance(i: int, j: int, units: LengthUnit = BOHR) -> float

Calculate the distance between two atoms.

Parameters:

i (int, required): The index of the first atom.
j (int, required): The index of the second atom.
units (LengthUnit, default BOHR): The units to return the distance in. Defaults to "bohr". May be "bohr" or "angstrom".

Returns:

float: The distance between the atoms in the requested units (Bohr or Angstrom).

Example
struct.distance(0, 1)
1.34
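
The calculation is the Euclidean distance between two rows of the geometry, optionally scaled to Angstrom. A standalone sketch using only the standard library is given here. The function below is a hypothetical stand-in that takes plain strings for units, and the conversion constant is the CODATA 2018 value, assumed rather than taken from qcio.

```python
import math

BOHR_TO_ANGSTROM = 0.529177210903  # CODATA 2018 value, assumed here


def distance(geometry_bohr: list[list[float]], i: int, j: int, units: str = "bohr") -> float:
    """Distance between atoms i and j of a geometry given in Bohr (illustrative)."""
    d = math.dist(geometry_bohr[i], geometry_bohr[j])
    if units == "angstrom":
        return d * BOHR_TO_ANGSTROM
    return d


# Same geometry as the Structure example: three atoms on the z axis, 1 Bohr apart.
geometry = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]]
print(distance(geometry, 0, 2))              # 2.0
print(distance(geometry, 0, 2, "angstrom"))  # the same distance expressed in Angstrom
```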
Source code in src/qcio/models/structure.py
def distance(self, i: int, j: int, units: LengthUnit = LengthUnit.BOHR) -> float:
    """Calculate the distance between two atoms.

    Args:
        i: The index of the first atom.
        j: The index of the second atom.
        units: The units to return the distance in. Defaults to "bohr".
            May be "bohr" or "angstrom".

    Returns:
        The distance between the atoms in units (Bohr or Angstrom).

    Example:
        ```python
        struct.distance(0, 1)
        1.34
        ```
    """
    distance = np.linalg.norm(self.geometry[i] - self.geometry[j])
    if units == LengthUnit.ANGSTROM:
        return float(distance * BOHR_TO_ANGSTROM)
    return float(distance)

add_smiles

add_smiles(
    *, program: str = "rdkit", hydrogens: bool = False
) -> None

!! DEPRECATED !!

This helper has been moved to qcinf (see qcinf.structure_to_smiles). It will be removed from qcio in a future release.
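
In practice, calling the deprecated method does not fall back to any behavior: per the source, it emits a DeprecationWarning and then raises NotImplementedError. The sketch below uses a hypothetical stand-in function, add_smiles_stub, to show what a caller would observe; it does not import qcio and its messages are abbreviated.

```python
import warnings


def add_smiles_stub() -> None:
    """Stand-in that mimics the warn-then-raise behavior described above."""
    warnings.warn("add_smiles() has moved to qcinf.", DeprecationWarning, stacklevel=2)
    raise NotImplementedError("Structure.add_smiles() is removed.")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the DeprecationWarning is recorded
    try:
        add_smiles_stub()
    except NotImplementedError as exc:
        error_message = str(exc)

print(caught[0].category.__name__)  # DeprecationWarning
print(error_message)                # Structure.add_smiles() is removed.
```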

Source code in src/qcio/models/structure.py
def add_smiles(
    self: "Structure",
    *,
    program: str = "rdkit",
    hydrogens: bool = False,
) -> None:
    """
    !! DEPRECATED !!

    This helper has been moved to **qcinf** (see `qcinf.structure_to_smiles`).
    It will be removed from qcio in a future release.
    """
    warnings.warn(
        "`Structure.add_smiles()` has moved to `qcinf` and is no longer "
        "implemented here.\n\n"
        "Install qcinf and replace your call with:\n\n"
        "    from qcinf import structure_to_smiles\n"
        "    smiles = structure_to_smiles(struct, backend='rdkit|openbabel')\n"
        "    struct.add_identifiers(smiles=smiles)\n\n",
        DeprecationWarning,  # use FutureWarning if you want it visible by default
        stacklevel=2,
    )
    raise NotImplementedError(
        "Structure.add_smiles() is removed. "
        "Use qcinf.structure_to_smiles and struct.add_identifiers instead."
    )

qcio.Identifiers

Structure identifiers.

Attributes:

name (str | None): A human-readable, common name for the structure.
name_IUPAC (str | None): The IUPAC name of the structure.
smiles (str | None): The SMILES representation of the structure.
canonical_smiles (str | None): The canonical SMILES representation of the structure.
canonical_smiles_program (str | None): The program used to generate the canonical SMILES.
canonical_explicit_hydrogen_smiles (str | None): The canonical explicit hydrogen SMILES representation of the structure.
canonical_isomeric_smiles (str | None): The canonical isomeric SMILES representation of the structure.
canonical_isomeric_explicit_hydrogen_smiles (str | None): The canonical isomeric explicit hydrogen SMILES representation of the structure.
canonical_isomeric_explicit_hydrogen_mapped_smiles (str | None): The canonical isomeric explicit hydrogen mapped SMILES representation of the structure.
inchi (str | None): The InChI representation of the structure.
inchikey (str | None): The InChIKey representation of the structure.
pubchem_cid (str | None): The PubChem Compound ID of the structure.
pubchem_sid (str | None): The PubChem Substance ID of the structure.
pubchem_conformerid (str | None): The PubChem Conformer ID of the structure.
extras (Dict[str, Any]): Additional information to bundle with the object. Use for schema development and scratch space.