Structure & Identifiers

A Structure is the core qcio object for representing a molecule or molecular superstructure in 3D space. Structure objects can be created directly from symbols and a geometry (the geometry must be in Bohr), from SMILES strings, from XYZ files, or by opening Structure objects previously saved to disk.
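Because the geometry must be supplied in Bohr, coordinates taken from sources that use Angstrom need converting first. The following is a minimal standalone sketch of that conversion, not qcio's own code. The constant is the CODATA 2018 Bohr radius in Angstrom (qcio's internal `BOHR_TO_ANGSTROM` may differ in the last digits), and `angstrom_to_bohr` is a hypothetical helper name.

```python
# Assumption: CODATA 2018 Bohr radius in Angstrom; qcio's constant may differ slightly.
BOHR_TO_ANGSTROM = 0.529177210903


def angstrom_to_bohr(geometry_angstrom):
    """Convert a list of [x, y, z] rows from Angstrom to Bohr (hypothetical helper)."""
    return [[coord / BOHR_TO_ANGSTROM for coord in atom] for atom in geometry_angstrom]


# H2 with a bond length of 0.74 Angstrom along the z axis.
geometry_bohr = angstrom_to_bohr([[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]])
```

The resulting `geometry_bohr` is in the units expected by the `geometry` argument of `Structure`.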

qcio.Structure

Structure(**data: Any)

A Structure object holding atoms and their corresponding Cartesian coordinates, charge, multiplicity, and identifiers such as name, smiles, etc.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| symbols | List[str] | The atomic symbols of the structure. |
| geometry | SerializableNDArray | The geometry of the structure in Cartesian coordinates. Units are Bohr (AU). |
| identifiers | Identifiers | Identifiers for the structure such as name, smiles, etc. |
| charge | int | The molecular charge. |
| multiplicity | int | The molecular multiplicity. |
| connectivity | List[Tuple[int, int, float]] | Explicit description of the bonds between atoms. Each tuple contains the indices of the atoms in the bond and the order of the bond. E.g., [(0, 1, 1.0), (1, 2, 2.0)] indicates a single bond between atoms 0 and 1 and a double bond between atoms 1 and 2. |
| extras | Dict[str, Any] | Additional information to bundle with the object. Use for schema development and scratch space. |
| ids | Identifiers | @property Shortcut to access identifiers. |
| geometry_angstrom | ndarray | @property The geometry of the structure in Angstrom. |
| atomic_numbers | List[int] | @property The atomic numbers of the atoms in the structure. |
| formula | str | @property The molecular formula of the structure using the Hill System. |
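The Hill System used by the `formula` property orders elements as follows: if carbon is present, carbon comes first, hydrogen second, and all other elements follow alphabetically; if carbon is absent, every element (including hydrogen) is sorted alphabetically. The sketch below illustrates that ordering with the standard library only. `hill_formula` is a hypothetical helper, and the exact string qcio returns (for example, how counts of one are rendered) may differ.

```python
from collections import Counter


def hill_formula(symbols):
    """Build a Hill System formula from a list of atomic symbols (illustrative only)."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, then hydrogen, then the remaining elements alphabetically.
        rest = sorted(s for s in counts if s not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: plain alphabetical order, hydrogen included.
        order = sorted(counts)
    # Omit the count when an element appears exactly once.
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in order)


water = hill_formula(["H", "O", "H"])
ethanol = hill_formula(["C", "C", "O"] + ["H"] * 6)
```

Here `water` evaluates to `"H2O"` and `ethanol` to `"C2H6O"`.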

Example
from qcio import Structure

structure = Structure(
    symbols=["H", "O", "H"],
    geometry=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
    charge=0,  # optional; defaults to 0
    multiplicity=1,  # optional; defaults to 1
    identifiers={"smiles": "O"},  # optional; "O" is the SMILES for water (H, O, H)
)
Source code in qcio/models/structure.py
def __init__(self, **data: Any):
    """Create a new Structure object.

    Example:
        ```python
        from qcio import Structure

        structure = Structure(
            symbols=["H", "O", "H"],
            geometry=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]],
            charge=0,  # optional; defaults to 0
            multiplicity=1,  # optional; defaults to 1
            identifiers={"smiles": "O"},  # optional
        )

        ```
    """
    # Backwards compatibility for 'ids' attribute.
    if identifiers := data.pop("ids", None):
        warnings.warn(
            "Passing 'ids' is deprecated and will be removed in a future "
            "release. Please use 'identifiers' instead. Once instantiated, "
            "you can use structure.ids to access the identifiers as a shortcut.",
            category=FutureWarning,
            stacklevel=2,
        )
        data["identifiers"] = identifiers
    super().__init__(**data)

from_smiles classmethod

from_smiles(
    smiles: str,
    *,
    program: str = "rdkit",
    force_field: str = "MMFF94s",
    multiplicity: int = 1
) -> Self

Create a new Structure object from a SMILES string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| smiles | str | The SMILES string. | required |
| program | str | The program to use for the conversion. Defaults to "rdkit". | 'rdkit' |
| force_field | str | The force field to use. E.g., UFF, MMFF94, MMFF94s, etc. | 'MMFF94s' |
| multiplicity | int | The multiplicity of the structure. | 1 |

Returns:

| Type | Description |
| --- | --- |
| Self | A Structure object with identifiers for SMILES and canonical SMILES. |

Example
struct = Structure.from_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

print(struct.ids.smiles)
# Output: 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'

print(struct.ids.canonical_smiles)
# Output: 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
Source code in qcio/models/structure.py
@classmethod
def from_smiles(
    cls,
    smiles: str,
    *,
    program: str = "rdkit",
    force_field: str = "MMFF94s",
    multiplicity: int = 1,
) -> Self:
    """Create a new Structure object from a SMILES string.

    Args:
        smiles: The SMILES string.
        program: The program to use for the conversion. Defaults to "rdkit".
        force_field: The force field to use. E.g., UFF, MMFF94, MMFF94s, etc.
        multiplicity: The multiplicity of the structure.

    Returns:
        A Structure object with identifiers for SMILES and canonical SMILES.

    Example:
        ```python
        struct = Structure.from_smiles("CN1C=NC2=C1C(=O)N(C(=O)N2C)C")

        print(struct.ids.smiles)
        # Output: 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'

        print(struct.ids.canonical_smiles)
        # Output: 'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
        ```
    """
    dict_repr = smiles_to_structure(smiles, program, force_field)
    dict_repr["multiplicity"] = multiplicity
    return cls(**dict_repr)

from_xyz classmethod

from_xyz(
    xyz_str: str,
    *,
    charge: Optional[int] = None,
    multiplicity: Optional[int] = None
) -> Self

Create a Structure from an XYZ file or string.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| xyz_str | str | The XYZ string. | required |
| charge | Optional[int] | The molecular charge of the structure. If not provided, will read from the XYZ string if set or default to 0. | None |
| multiplicity | Optional[int] | The molecular multiplicity of the structure. If not provided, will read from the XYZ string if set or default to 1. | None |
Note

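Reads qcio data such as charge and multiplicity from the comment line (the second line of the XYZ string) when it contains entries in qcio_key=value format. Entries of the form qcio__identifiers_key=value are read into the identifiers, and any remaining non-qcio comments are kept as well. Passing charge or multiplicity as an argument when the same value is also set in the comment line raises a ValueError. Coordinates in the XYZ string are taken to be in Angstrom and are converted to Bohr.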

Example
struct = Structure.from_xyz(xyz_str)
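To make the expected input concrete, the sketch below shows an XYZ string whose comment line carries `qcio_` keys, together with a standalone parser for that line. It is an illustration written with the standard library, not qcio's implementation: `parse_comment_line` is a hypothetical helper, the water coordinates are illustrative values in Angstrom, and the sketch deliberately splits each entry on the first `=` only so that values containing `=` (common in SMILES) stay intact.

```python
from typing import Dict, List, Tuple

IDENTIFIER_PREFIX = "qcio__identifiers_"
STRUCTURE_PREFIX = "qcio_"


def parse_comment_line(comment: str) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Split an XYZ comment line into structure keys, identifier keys, and other comments."""
    structure_kwargs: Dict[str, str] = {}
    identifier_kwargs: Dict[str, str] = {}
    other_comments: List[str] = []
    for item in comment.split():
        # partition splits on the first "=" only, so "smiles=C=O" keeps "C=O".
        key, sep, value = item.partition("=")
        if sep and key.startswith(IDENTIFIER_PREFIX):
            identifier_kwargs[key[len(IDENTIFIER_PREFIX):]] = value
        elif sep and key.startswith(STRUCTURE_PREFIX):
            structure_kwargs[key[len(STRUCTURE_PREFIX):]] = value
        else:
            other_comments.append(item)
    return structure_kwargs, identifier_kwargs, other_comments


# Line 1: atom count. Line 2: comments. Remaining lines: symbol and x, y, z in Angstrom.
xyz_str = """3
qcio_charge=0 qcio_multiplicity=1 qcio__identifiers_name=water optimized
O  0.000000  0.000000  0.117790
H  0.000000  0.755453 -0.471161
H  0.000000 -0.755453 -0.471161
"""

structure_kwargs, identifier_kwargs, other_comments = parse_comment_line(
    xyz_str.split("\n")[1]
)
```

Values come back as strings here; converting them to integers is left to the consumer of the parsed dictionaries.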
Source code in qcio/models/structure.py
@classmethod
def from_xyz(
    cls,
    xyz_str: str,
    *,
    charge: Optional[int] = None,
    multiplicity: Optional[int] = None,
) -> Self:
    """Create a Structure from an XYZ file or string.

    Args:
        xyz_str: The XYZ string.
        charge: The molecular charge of the structure. If not provided, will read
            from the XYZ string if set or default to 0.
        multiplicity: The molecular multiplicity of the structure. If not provided,
            will read from the XYZ string if set or default to 1.

    Note:
        Will read qcio data such as `charge` and `multiplicity` from the comments
        line with a `qcio_key=value` format (if it is present). Also will read in
        qcio__identifiers_* keys and additional non-qcio comments.

    Example:
        ```python
        struct = Structure.from_xyz(xyz_str)
        ```
    """

    lines = xyz_str.split("\n")

    num_atoms = int(lines[0])

    # Collect comments
    structure_kwargs: Dict[str, Any] = {}
    identifier_kwargs: Dict[str, Any] = {}
    other_comments: List[str] = []

    for item in lines[1].strip().split():
        if item.startswith("qcio__identifiers_"):
            key = item.split("=")[0].replace("qcio__identifiers_", "")
            value = item.split("=")[1]
            identifier_kwargs[key] = value
        elif item.startswith("qcio_"):
            key = item.split("=")[0].replace("qcio_", "")
            value = item.split("=")[1]
            structure_kwargs[key] = value
        else:
            other_comments.append(item)

    if charge is not None and "charge" in structure_kwargs:
        raise ValueError("Charge cannot be set in the file and as an argument.")
    if multiplicity is not None and "multiplicity" in structure_kwargs:
        raise ValueError(
            "Multiplicity cannot be set in the file and as an argument."
        )

    # Set charge and multiplicity if provided
    if charge is not None:
        structure_kwargs["charge"] = charge
    if multiplicity is not None:
        structure_kwargs["multiplicity"] = multiplicity

    symbols = []
    geometry = []
    for line in lines[2 : 2 + num_atoms]:
        split_line = line.split()
        symbols.append(split_line[0])
        geometry.append([float(val) / BOHR_TO_ANGSTROM for val in split_line[1:]])

    return cls(
        symbols=symbols,
        geometry=geometry,
        **structure_kwargs,
        identifiers=Identifiers(**identifier_kwargs),
        extras={cls._xyz_comment_key: other_comments},
    )

to_smiles

to_smiles(
    program: str = "rdkit", hydrogens: bool = False
) -> str

Generate the canonical SMILES representation of the structure.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| program | str | The program to use for the conversion. Defaults to "rdkit". | 'rdkit' |
| hydrogens | bool | Whether to include hydrogens in the SMILES string. Defaults to False. | False |

Returns:

| Type | Description |
| --- | --- |
| str | The canonical SMILES representation of the structure. |

Example
struct.to_smiles()
'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
Source code in qcio/models/structure.py
def to_smiles(self, program: str = "rdkit", hydrogens: bool = False) -> str:
    """Generate the canonical SMILES representation of the structure.

    Args:
        program: The program to use for the conversion. Defaults to "rdkit".
        hydrogens: Whether to include hydrogens in the SMILES string. Defaults to
            False.

    Returns:
        The canonical SMILES representation of the structure.

    Example:
        ```python
        struct.to_smiles()
        'CN1C=NC2=C1C(=O)N(C(=O)N2C)C'
        ```
    """
    return structure_to_smiles(self, program=program, hydrogens=hydrogens)

to_xyz

to_xyz(precision: int = 17) -> str

Return an xyz string representation of the structure.

Parameters:

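| Name | Type | Description | Default |
| --- | --- | --- | --- |
| precision | int | The number of decimal places to include in the xyz file. Defaults to 17, which captures the full precision of float64. | 17 |

Note: Adds qcio data such as charge and multiplicity, plus any non-empty identifiers, to the comment line (the second line) in qcio_key=value format. Coordinates are written in Angstrom.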
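The effect of `precision` on the output layout can be seen in a small standalone sketch. It mirrors the layout described here (atom count, comment line, one row per atom in Angstrom, trailing newline) but is not qcio's code: `to_xyz_sketch` is a hypothetical function, identifiers are left out, and the conversion constant is the CODATA 2018 value, which may differ slightly from qcio's own.

```python
# Assumption: CODATA 2018 Bohr radius in Angstrom; qcio's constant may differ slightly.
BOHR_TO_ANGSTROM = 0.529177210903


def to_xyz_sketch(symbols, geometry_bohr, charge=0, multiplicity=1, precision=17):
    """Format symbols and a Bohr geometry as an XYZ string (illustrative only)."""
    # Each coordinate is right-aligned in an 18-character field with `precision` decimals.
    row = f"{{:2s}} {{: >18.{precision}f}} {{: >18.{precision}f}} {{: >18.{precision}f}}"
    lines = [
        str(len(symbols)),
        f"qcio_charge={charge} qcio_multiplicity={multiplicity}",
    ]
    for symbol, atom in zip(symbols, geometry_bohr):
        x, y, z = (coord * BOHR_TO_ANGSTROM for coord in atom)
        lines.append(row.format(symbol, x, y, z))
    lines.append("")  # Trailing newline at the end of the file
    return "\n".join(lines)


xyz_text = to_xyz_sketch(["H", "H"], [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]], precision=6)
```

A lower precision such as 6 gives compact, human-readable files, at the cost of losing digits when the string is read back.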

Source code in qcio/models/structure.py
def to_xyz(self, precision: int = 17) -> str:
    """Return an xyz string representation of the structure.

    Args:
        precision: The number of decimal places to include in the xyz file. Default
            17 which captures all precision of float64.
    Notes:
        Will add qcio data such as charge and multiplicity to the comments line with
        a `qcio_key=value` format.
    """

    qcio_data = {  # These get added to comments line (line 2) in xyz file
        "qcio_charge": self.charge,
        "qcio_multiplicity": self.multiplicity,
    }

    # Add identifiers to qcio_data
    for key, value in self.identifiers.__dict__.items():
        if key != "extras" and value:
            qcio_data[f"qcio__identifiers_{key}"] = value

    assert isinstance(self.geometry, np.ndarray)  # For mypy
    geometry_angstrom = self.geometry * BOHR_TO_ANGSTROM

    xyz_lines = []
    xyz_lines.append(f"{len(self.symbols)}")
    # Add qcio data to comments line
    comments = f"{' '.join([f'{k}={v}' for k, v in qcio_data.items()])}"
    # Add any other comments
    if xyz_comments := self.extras.get(self._xyz_comment_key, []):
        comments += " " + " ".join(xyz_comments)
    xyz_lines.append(comments)

    # Create a format string using the precision parameter
    format_str = f"{{:2s}} {{: >18.{precision}f}} {{: >18.{precision}f}} {{: >18.{precision}f}}"  # noqa: E501

    for symbol, (x, y, z) in zip(self.symbols, geometry_angstrom):
        xyz_lines.append(format_str.format(symbol, x, y, z))
    xyz_lines.append("")  # Append newline to end of file
    return "\n".join(xyz_lines)

distance

distance(
    i: int,
    j: int,
    units: DistanceUnits = DistanceUnits.bohr,
) -> float

Calculate the distance between two atoms.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| i | int | The index of the first atom. | required |
| j | int | The index of the second atom. | required |
| units | DistanceUnits | The units to return the distance in. Defaults to "bohr". May be "bohr" or "angstrom". | bohr |

Returns:

| Type | Description |
| --- | --- |
| float | The distance between the atoms in the requested units (Bohr or Angstrom). |

Example
struct.distance(0, 1)
1.34
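The calculation is the Euclidean norm of the difference between the two atoms' coordinates, optionally scaled from Bohr to Angstrom. The sketch below reproduces that logic with the standard library only. The `Units` enum and `distance_sketch` function are hypothetical stand-ins for `DistanceUnits` and `Structure.distance`, and the conversion constant is the CODATA 2018 value rather than qcio's own.

```python
import math
from enum import Enum

# Assumption: CODATA 2018 Bohr radius in Angstrom; qcio's constant may differ slightly.
BOHR_TO_ANGSTROM = 0.529177210903


class Units(str, Enum):
    """Hypothetical stand-in for qcio.DistanceUnits."""

    bohr = "bohr"
    angstrom = "angstrom"


def distance_sketch(geometry_bohr, i, j, units=Units.bohr):
    """Distance between atoms i and j of a geometry given in Bohr."""
    d = math.dist(geometry_bohr[i], geometry_bohr[j])  # Euclidean distance in Bohr
    if units == Units.angstrom:
        return d * BOHR_TO_ANGSTROM
    return d


geometry = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
d_bohr = distance_sketch(geometry, 0, 1)
d_angstrom = distance_sketch(geometry, 0, 1, units=Units.angstrom)
```

Because the geometry is stored in Bohr, the default return value needs no conversion; only the Angstrom branch multiplies by the constant.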
Source code in qcio/models/structure.py
def distance(
    self, i: int, j: int, units: DistanceUnits = DistanceUnits.bohr
) -> float:
    """Calculate the distance between two atoms.

    Args:
        i: The index of the first atom.
        j: The index of the second atom.
        units: The units to return the distance in. Defaults to "bohr".
            May be "bohr" or "angstrom".

    Returns:
        The distance between the atoms in units (Bohr or Angstrom).

    Example:
        ```python
        struct.distance(0, 1)
        1.34
        ```
    """
    distance = np.linalg.norm(self.geometry[i] - self.geometry[j])
    if units == DistanceUnits.angstrom:
        return float(distance * BOHR_TO_ANGSTROM)
    return float(distance)

add_smiles

add_smiles(
    *, program: str = "rdkit", hydrogens: bool = False
) -> None

Add SMILES data to the identifiers. The SMILES will be generated from the structure using the specified program.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| program | str | The program to use to generate the SMILES. Defaults to "rdkit". | 'rdkit' |
| hydrogens | bool | Whether to include hydrogens in the SMILES string. Defaults to False. | False |
Example
struct.add_smiles()
struct.ids.smiles
'CCO'
struct.ids.canonical_smiles
'CCO'
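Which identifier fields get populated depends on the `hydrogens` flag. The standalone sketch below mirrors the mapping visible in the source listing for this method: the generated string always goes into `smiles`, then into either `canonical_smiles` or `canonical_explicit_hydrogen_smiles`, and the program name is recorded. `smiles_identifiers` is a hypothetical helper, and the SMILES strings passed in are placeholders rather than output from any SMILES generator.

```python
from typing import Dict


def smiles_identifiers(smiles: str, program: str = "rdkit", hydrogens: bool = False) -> Dict[str, str]:
    """Build the identifier fields set for a generated SMILES string (illustrative only)."""
    identifiers = {"smiles": smiles}
    # The flag decides which canonical field receives the same string.
    if hydrogens:
        identifiers["canonical_explicit_hydrogen_smiles"] = smiles
    else:
        identifiers["canonical_smiles"] = smiles
    identifiers["canonical_smiles_program"] = program
    return identifiers


implicit = smiles_identifiers("CCO")
explicit = smiles_identifiers("[H]OC([H])([H])C([H])([H])[H]", hydrogens=True)
```

With `hydrogens=True` the `canonical_smiles` field is left unset, so callers that need both forms would call `add_smiles` once per setting.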
Source code in qcio/models/structure.py
def add_smiles(
    self,
    *,
    program: str = "rdkit",
    hydrogens: bool = False,
) -> None:
    """Add SMILES data to the identifiers. The SMILES will be generated from the
        structure using the specified program.

    Args:
        program: The program to use to generate the SMILES. Defaults to "rdkit".
        hydrogens: Whether to include hydrogens in the SMILES string. Defaults to
            False.

    Example:
        ```python
        struct.add_smiles()
        struct.ids.smiles
        'CCO'
        struct.ids.canonical_smiles
        'CCO'
        ```
    """
    smiles = self.to_smiles(program=program, hydrogens=hydrogens)
    identifiers = {"smiles": smiles}

    if hydrogens:
        identifiers["canonical_explicit_hydrogen_smiles"] = smiles
    else:
        identifiers["canonical_smiles"] = smiles

    identifiers["canonical_smiles_program"] = program
    self.add_identifiers(identifiers)
    # Ensure pydantic knows the field has been set
    self.__pydantic_fields_set__.add("identifiers")

qcio.Identifiers

Structure identifiers.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| name | Optional[str] | A human-readable, common name for the structure. |
| name_IUPAC | Optional[str] | The IUPAC name of the structure. |
| smiles | Optional[str] | The SMILES representation of the structure. |
| canonical_smiles | Optional[str] | The canonical SMILES representation of the structure. |
| canonical_smiles_program | Optional[str] | The program used to generate the canonical SMILES. |
| canonical_explicit_hydrogen_smiles | Optional[str] | The canonical explicit hydrogen SMILES representation of the structure. |
| canonical_isomeric_smiles | Optional[str] | The canonical isomeric SMILES representation of the structure. |
| canonical_isomeric_explicit_hydrogen_smiles | Optional[str] | The canonical isomeric explicit hydrogen SMILES representation of the structure. |
| canonical_isomeric_explicit_hydrogen_mapped_smiles | Optional[str] | The canonical isomeric explicit hydrogen mapped SMILES representation of the structure. |
| inchi | Optional[str] | The InChI representation of the structure. |
| inchikey | Optional[str] | The InChIKey representation of the structure. |
| pubchem_cid | Optional[str] | The PubChem Compound ID of the structure. |
| pubchem_sid | Optional[str] | The PubChem Substance ID of the structure. |
| pubchem_conformerid | Optional[str] | The PubChem Conformer ID of the structure. |
| extras | Dict[str, Any] | Additional information to bundle with the object. Use for schema development and scratch space. |

qcio.DistanceUnits

Distance units for the Structure.distance method.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| bohr | str | The distance in Bohr. |
| angstrom | str | The distance in Angstrom. |