Molecular Systems
Under construction. The chiLife.MolSys
module is not yet officially supported.
The MolSys object
The chiLife Molecular System, or MolSys
and related objects, are how chilife represents molecular systems including
proteins, nucleic acids, small molecules, course grained systems and more. MolSys
objects are generally created
from PDB files, MDAnalysis objects or existing chilife molecular systems. The design of the chiLife molecular system
objects are heavily influenced by MDAnalysis and shares many
attribute and method names to facilitate a compatible interface.
- class MolecularSystemBase
Base class for molecular systems containing attributes universal to
- select_atoms(selstr)
Select atoms from
MolSys
orAtomSelection
based on selection a selection string, similar to the select_atoms method from MDAnalysis- Parameters:
selstr (str) – Selection string
- Returns:
atoms – Selected atoms
- Return type:
- property fname
File name or functional name of the molecular system
- property positions
3-Dimensional cartesian coordinates of the atoms in the molecular system
- property coords
3-dimensional cartesian coordinates of the atoms in the molecular system
- property n_atoms
“Number of atoms in the molecular system
- property n_residues
Number of residues in the molecular system
- property n_chains
Number of chains in the molecular system
- property residues
ResidueSelection
for all residues of the molecular system
- property segments
SegmentSelection
for all segments (chains) of the molecular system
- property logic_keywords
Dictionary of logic keywords (strings) pertaining to
atom_selection()
method. Keywords are mapped to the associated logic function
- property molsys_keywords
Dictionary of molecular system keywords (strings) pertaining to
atom_selection()
method. Keywords are mapped to the associated molecular system attributes
- property universe
Wrapper attribute to allow molecular systems to behave like MDAnalysis AtomGroup objects
- copy()
Creates a deep copy of the underlying MolSys and returns an
AtomSelection
from the new copy matching the current selection if it exists.
- class MolSys(atomids: ndarray, names: ndarray, altlocs: ndarray, resnames: ndarray, resnums: ndarray, chains: ndarray, trajectory: ndarray, occupancies: ndarray, bs: ndarray, segs: ndarray, atypes: ndarray, charges: ndarray, bonds: ArrayLike | None = None, name: str = 'Noname_MolSys')
An object containing all attributes of a molecular system.
- Parameters:
atomids (np.ndarray) – Array of atom indices where each index is unique to each atom in the molecular system.
names (np.ndarray) – Array of atom names, e.g. ‘N’, ‘CA’ ‘C’.
altlocs (np.ndarray) – Array of alternative location identifiers for each atom.
resnames (np.ndarray) – Array of residue names for each atom.
resnums (np.ndarray) – Array of residue numbers for each atom.
chains (np.ndarray) – Array of chain identifiers for each atom.
trajectory (np.ndarray) – Cartesian coordinates of each atom for each state of an ensemble or for each timestep of a trajectory.
occupancies (np.ndarray) – Occupancy or q-factory for each atom.
bs (np.ndarray) – thermal, or b-factors for each atom.
segs (np.ndarray) – Segment/chain IDs for each atom.
atypes (np.ndarray) – Atom/element types.
charges (np.ndarray) – Formal or partial charges assigned to each atom.
bonds (ArrayLike) – Array of atom ID pairs corresponding to all atom bonds in the system.
name (str) – Name of molecular system
- classmethod from_pdb(file_name, sort_atoms=False)
Reads a pdb file and returns a MolSys object
- Parameters:
file_name (str, Path) – Path to the PDB file to read.
sort_atoms (bool) – Presort atoms for consisted internal coordinate definitions
- Returns:
cls – Molecular system object of the PDB structure.
- Return type:
- classmethod from_arrays(anames: ArrayLike, atypes: ArrayLike, resnames: ArrayLike, resindices: ArrayLike, resnums: ArrayLike, segindices: ArrayLike, segids: ArrayLike | None = None, trajectory: ArrayLike | None = None, **kwargs) MolSys
Create a MolSys object from a minimal set of arrays.
- Parameters:
anames (ArrayLike) – Array of atom names.
atypes (ArrayLike) – Array of atom types/elements.
resnames (ArrayLike) – Array of residue names for each atom.
resindices (ArrayLike) – Array of residue indices for each atom.
resnums (ArrayLike) – Array of residue numbers for each atom.
segindices (ArrayLike) – Array of segment indices for each atom.
segids (ArrayLike) – Array of segment IDs for each atom.
trajectory (ArrayLike) – Array of cartesian coordinates for each atom and each state of an ensemble or frame of a trajectory.
- Returns:
cls – Molecular system object of the PDB structure.
- Return type:
- classmethod from_atomsel(atomsel, frames=None)
Create a new MolSys object from an existing
AtomSelection
or MDAnalysis.AtomGroup object.- Parameters:
atomsel (
AtomSelection
, MDAnalysis.AtomGroup) – TheAtomSelection
or MDAnalysis.AtomGroup from which to create the new MolSys object.frames (int, Slice, ArrayLike) – Frame index, slice or array of frame indices corresponding to the frames of the atom selection you wish to extract into a new
MolSys
object.
- Returns:
cls – Molecular system object of the PDB structure.
- Return type:
- classmethod from_rdkit(mol)
Create a MolSys from an rdkit Mol object with embedded conformers.
- Parameters:
mol
- classmethod from_openMM(simulation)
Create a MolSys from an openMM Simulation object.
- Parameters:
simulation (openmm.simulation.Simulation) – The simulation object to create the MolSys object from.
- Returns:
sys – chiLife molecular system as a clone from
- Return type:
chilife.MolSys
- copy()
Create a deep copy of the MolSys.
- load_new(coordinates)
Load a new set of 3-dimensional coordinates into the MolSys.
- Parameters:
coordinates (ArrayLike) – Set of new coordinates to load into the MolSys
- class Trajectory(coordinates: ndarray, molsys: MolSys, timestep: float = 1)
Object containing and managing 3-dimensional coordinates of
MolSys
objects, particularly when there are multiple states or time steps.- Parameters:
- load_new(coordinates)
Load a new set (trajectory or ensemble) of 3-dimensional coordinates into the Trajectory.
- Parameters:
coordinates (ArrayLike) – Set of new coordinates to load into the MolSys
- property frame
The frame number the trajectory is currently on
- class ResidueSelection(molsys, mask)
An object containing a group of residues and all their atoms from a
MolSys
object. Note that if one atom of a residue is present in the AtomSelection used to create the group, all atoms of that residue will be present in the ResidueSelection object.Note
Unlike a regular
AtomSelection
object, theresnames
,resnums
,segids
andchains
attributes of this object are tracked with respect to residue not atom and changes made to these attributes on this object will not be present in the parent object and vice versa.
- class SegmentSelection(molsys, mask)
An object containing a group of segments and all their atoms from a
MolSys
object. Note that if one atom of a segment is present in the AtomSelection used to create the group, all atoms of that segment will be present in the SegmentSelection object.Note
Unlike a regular
AtomSelection
object, thesegids
andchains
attributes of this object are tracked with respect to segements not atoms and changes made to these attributes on this object will not be present in the parent object and vice versa.
- class Atom(molsys, mask)
An object containing information for a single atom.
- class Residue(molsys, mask)
An object representing a single residue.
- Parameters:
- phi_selection()
Get an
AtomSelection
of the atoms defining the Phi backbone dihedral angle of the residue.- Returns:
sel – An
AtomSelection
object containing the atoms that make up the Phi backbone dihedral angle of the selected residue.- Return type:
- psi_selection()
Get an
AtomSelection
of the atoms defining the Psi backbone dihedral angle of the residue.- Returns:
sel – An
AtomSelection
object containing the atoms that make up the Psi backbone dihedral angle of the selected residue.- Return type:
The MolSysIC object
While the MolSys
object contains cartesian coordinate and protein information, the MolSysIC
object contains information on internal coordinates.
- class MolSysIC(z_matrix: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], z_matrix_idxs: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], protein: MolecularSystemBase | Universe | AtomGroup, **kwargs)
A class for protein internal coordinates.
- Parameters:
z_matrix (np.ndarray) – Array of z-matricies for all frames of the ensemble/trajectory
z_matrix_idxs (np.ndarray) – Indices of the attoms that define the bond lengths, angles, and dihedrals of the z-matrix
protein
kwargs
- zmats
Dictionary of Z-matrices of the molecule. Each entry corresponds to a contiguous segment of bonded atoms.
- Type:
dict[np.ndarray]
- z_matrix_idxs
Dictionary of Z-matrix indices.
- Type:
dict[np.ndarray]
- atom_dict
Nested dictionaries containing indices of the atoms that define the coordinates (bond, angle dihedral) objects. Nesting pattern: [coord_type][chain][resi, stem][name] where coord_type is one of ‘bond’, ‘angle’ or ‘dihedral’, chain is the chainid, resi is the residue number, stem is a tuple of the names of the 3 atoms preceding the atom coordinate being defined.
- Type:
dict
- ICS
Nested dictionaries containing AtomIC objects. Nesting pattern: [chain][resi][stem][name] where chain is the chainid, resi is the residue number, stem is a tuple of the names of the 3 atoms preceding the atom coordinate being defined.
- Type:
dict
- atoms
List of AtomIC objects making up the protein in order of their indices.
- Type:
np.ndarray[AtomIC]
- atom_types
Array of atom types (elements) corresponding to the atom indices.
- Type:
np.ndarray
- atom_names
Array of atom names corresponding to the atom indices.
- Type:
np.ndarray
- resis
Ordered array of residue numbers.
- Type:
np.ndarray
- resnames
Dictionary mapping residue numbers to their residue names.
- Type:
dict[int]:str
- chains
Array of chainids
- Type:
np.ndarray
- bonded_pairs
Array of index pairs corresponding to atoms that are bonded.
- Type:
np.ndarray
- perturbed
Indicates if any internal coordinates have changed since the last time cartesian coordinates were calculated.
- Type:
bool
- dihedral_defs
List of major side chain and backbone dihedral definitions as defined by user defined and supported residues.
- Type:
list
- classmethod from_atoms(atoms: Universe | AtomGroup | MolecularSystemBase, preferred_dihedrals: List | None = None, bonds: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, **kwargs: Dict)
Generate a MolSysIC object from a
AtomSelection
or MDAnalysis.AtomGroup object.- Parameters:
atoms (MDAnalysis.Universe, MDAnalysis.AtomGroup, MolecularSystemBase) – MDA Universe, AtomGroup or chiLife Molecular System to create the internal coordinates from.
preferred_dihedrals (List[List[str]]) – List of preferred dihedral definitions to use when defining the internal coordinate system.
bonds (ArrayLike) – Array of tuples defining the atom pairs that are bonded.
kwargs (dict) –
Additional keyword arguments.
- ignore_waterbool
Ignore atoms that belong to water molecules.
- use_chain_operatorsbool
Allow for the use of a translation and rotation vectors to orient cains that are not covalently linked.
- Returns:
cls – A chiLife internal coordinates molecular system.
- Return type:
MDAnalysis.Universe
- copy()
Create a deep copy of an MolSysIC instance
- property chain_operator_idxs
An array of indices defining the starting and ending indices of the different chains.
- property chain_operators
A set of coordinate transformations that can orient multiple chains that are not covalently linked. e.g. structures with missing residues or protein complexes. The
chain_operator
property is a dictionary mapping chainIDs to a sub-dictionary containing a translation vector,ori
and rotation matrixmx
that will transform the protein coordinates from tie internal coordinate frame to some global frame.- Type:
dict
- property z_matrix
The z-matrix (internal) coordinates of the molecular system for the current frame.
- property coords
The cartesian coordinates of the molecular system for the current frame
- property nonbonded
Array of atom index pairs of atoms that are not bonded
- Type:
np.ndarray
- to_cartesian()
Method to convert z-matrix (internal) coordinate to cartesian coordinate
- set_dihedral(dihedrals: float | _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], resi: int, atom_list: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], chain: int | str | None = None)
Set one or more dihedral angles of a single residue in internal coordinates for the atoms defined in atom list.
- Parameters:
dihedrals (float, ArrayLike) – Angle or array of angles to set the dihedral(s) to.
resi (int) – Residue number of the site being altered
atom_list (ArrayLike) – Names or array of names of atoms involved in the dihedral(s)
chain (int, str, optional) – Chain containing the atoms that are defined by the dihedrals that are being set. Only necessary when there is more than one chain in the proteinIC object.
- Returns:
self – ProteinIC object with new dihedral angle(s)
- Return type:
- batch_set_dihedrals(idxs: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], dihedrals: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], resi: int, atom_list: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], chain: int | str | None = None)
Sets dihedral angles of a single residue in internal coordinates for the atoms defined in atom list and in trajectory indices in idxs.
- Parameters:
idxs (ArrayLike) – an array in indices corresponding to the progenitor structure in the MolSysIC trajectory.
dihedrals (ArrayLike) – an array of angles for each idx in
idxs
and for each dihedral inatom_list
. Should have the shape(len(idxs), len(atom_list))
resi (int) – Residue number of the site being altered
atom_list (ArrayLike) – Names or array of names of atoms involved in the dihedrals
chain (int, str, optional) – Chain containing the atoms that are defined by the dihedrals that are being set. Only necessary when there is more than one chain in the proteinIC object.
- Returns:
z_matrix – the z_matrix for every frame in
idxs
with the new dihedral angel values defined indihedrals
- Return type:
ArrayLike
- get_dihedral(resi: int, atom_list: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], chain: int | str | None = None)
Get the dihedral angle(s) of one or more atom sets at the specified residue. Dihedral angles are returned in radians
- Parameters:
resi (int) – Residue number of the site being altered
atom_list (ArrayLike) – Names or array of names of atoms involved in the dihedral(s)
chain (int, str) – Chain identifier. required if there is more than one chain in the protein. Default value = None
- Returns:
angles – Array of dihedral angles corresponding to the atom sets in atom list.
- Return type:
numpy.ndarray
- get_z_matrix_idxs(resi: int, atom_list: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], chain: int | str | None = None)
Get the z-matrix indices of the dihedral angle(s) defined by
atom_list
and the specified residue and chain. Dihedral angles are returned in radians.- Parameters:
resi (int) – Residue number of the site being altered
atom_list (ArrayLike) – Names or array of names of atoms involved in the dihedral(s)
chain (int, str) – Chain identifier. required if there is more than one chain in the protein. Default value = None
- Returns:
dihedral_idxs – Array of dihedral angles corresponding to the atom sets in atom list.
- Return type:
numpy.ndarray
- has_clashes(distance: float = 1.5) bool
Checks for an internal clash between non-bonded atoms.
- Parameters:
distance (float) – Minimum distance allowed between non-bonded atoms in angstroms. (Default value = 1.5).
- Returns:
has_clashes – True if there are any atoms within
distance
angstroms from each other that would constitute a clash otherwise False- Return type:
bool
- phi_idxs(resnums: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, chain: int | str | None = None)
Method to return the Z-matrix indices of Phi backbone dihedrals
- Parameters:
resnums (int, ArrayLike[int]) – Residues for which to return the Phi value index of the z-matrix
chain (int, str) – Chain corresponding to the resnums of interest
- Returns:
idxs – Array of indices corresponding to the phi dihedral angles of the selected residues on the chain
- Return type:
ndarray[int]
- psi_idxs(resnums: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, chain: int | str | None = None)
Method to return the Z-matrix indices of Psi backbone dihedrals
- Parameters:
resnums (int, ArrayLike[int]) – Residues for which to return the Psi value index of the z-matrix
chain (int, str) – Chain corresponding to the resnums of interest
- Returns:
idxs – Array of indices corresponding to the Psi dihedral angles of the selected residues on the chain
- Return type:
ndarray[int]
- omega_idxs(resnums: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, chain: int | str | None = None)
Method to return the Z-matrix indices of Psi backbone dihedrals
- Parameters:
resnums (int, ArrayLike[int]) – Residues for which to return the Psi value index of the z-matrix
chain (int, str) – Chain corresponding to the resnums of interest
- Returns:
idxs – Array of indices corresponding to the Psi dihedral angles of the selected residues on the chain
- Return type:
ndarray[int]
- chi_idxs(resnums: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes] | None = None, chain: int | str | None = None)
Create a list of index arrays corresponding the Z-matrix indices of flexible side chain (chi) dihedrals. Note that only chi dihedrals defined in
chilife.dihedral_defs
will be considered.- Parameters:
resnums (int, ArrayLike[int]) – Residues for which to return the Psi value index of the z-matrix
chain (int, str) – Chain corresponding to the resnums of interest
- Returns:
chi_idxs – list of arrays containing the indices of the z-matrix dihedrals corresponding to the chi dihedrals. Each array in the list refers to a specific dihedral:
[χ1, χ2, ... χn]
. because different residues will have different numbers of chi dihedrals the arrays will not be the same size, but they can be concatenated using numpy:chi_idxs = np.concatenate(IC.chi_idxs(...)))
.- Return type:
List[np.ndarray]
- load_new(z_matrix, **kwargs)
Replace the current z-matrix ensemble/trajectory with new one.
- Parameters:
z_matrix (np.ndarray) – New z-matrix to load into the MolSysIC
kwargs (dict) –
Additional keyword arguments.
- opdict
New chain operators for the new z-matrix.
- apply_chain_operators(idx=None, from_list=False)
Apply chain operators to the specified frames (
idx
) of the MolSysIC trajectory. If noidx
is provided then all chain operators will be applied to all frames. :param from_list: A boolean argument to specify whether each frame previously had its own chain operators. :type from_list: bool :param idx: Frames or array of frames that should have the chain operators applied. :type idx: int, Array
- use_frames(idxs)
Remove all frames except those specified by idxs :param idxs: Index or array if indices to keep :type idxs: int, ArrayLike
- set_cartesian_coords(coords, mask)
Alter the coordinates of the current frame by assigning new cartwsian coordinates to the atoms set to True and
mask
.- Parameters:
coords (np.ndarray) – Cartesian coordinates to apply to the atoms defined by
mask
of the current frame.mask (np.ndarray) – Array of bool values defining which atoms of the MolSysIC are being set.
- shift_resnum(delta: int)
Alter the residue number of all residues in the MolSysIC by adding
delta
.- Parameters:
delta (int) – Amount to shift residue numbers by. Can be positive or negative.
- reconfigure_cap(cap, atom_idxs, bonds)
Helper function to reconfigure the “cap” region of bifunctional labels so that the internal coordinates are defined correctly.
- Parameters:
cap (ArrayLike) – Indices of the cap atoms in the molecule with atoms defined in atom_idxs.
atom_idxs (ArrayLike) – Indices of all atoms to be considered for reconfiguration containing the cap.
bonds (ArrayLike) – Array of tuples defining the bonds between atoms in atom_idxs.
- Returns:
atom_idxs – Array of atom indices rearranged so that the cap is terminal to the rest of the atoms.
- Return type:
ArrayLike
- zmatrix_idxs_to_local(zmatrix_idxs)
Convert Z-matrix indices so that they reference the indices of the z-matrix rather than the parental atom selection that was used to construct the z-matrix and may have an index offset or missing atoms.
- Parameters:
zmatrix_idxs (ArrayLike) – z-matrix indices
- Returns:
zmatrix_idxs – Array of indices modified for internal reference.
- Return type:
ArrayLike
- get_chainbreak_idxs(z_matrix_idxs)
Get indices of atoms that chain breaks :param z_matrix_idxs: index map of z_matrix :type z_matrix_idxs: ArrayLike
- Returns:
indices of atoms that start new chains
- Return type:
chainbreak_idxs
- ic_mx(*p: _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]) Tuple[_SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes], _SupportsArray[dtype[Any]] | _NestedSequence[_SupportsArray[dtype[Any]]] | bool | int | float | complex | str | bytes | _NestedSequence[bool | int | float | complex | str | bytes]]
Calculates a rotation matrix and translation to transform a set of atoms to global coordinate frame from a local coordinated frame defined by
p
. This function is a wrapper to quickly and easily transform cartesian coordinates, made from internal coordinates, to their appropriate location in the global frame.- Parameters:
p (ArrayLike) – Coordinates of the three atoms defining the local coordinate frame of the IC object
- Returns:
rotation_matrix (np.ndarray) – Rotation matrix to rotate from the internal coordinate frame to the global frame
origin (np.ndarray) – New origin position in 3-dimensional space
- get_z_matrix(coords, z_matrix_idxs, chain_operator_idxs=None)
Given a set of cartesian coordinates and indices defining a z-matrix, calcualte the z-matrix.
- Parameters:
coords (np.ndarray) – Array of cartesian coordinates.
z_matrix_idxs (np.ndarray) – Array of indices defining bond-angle-torsions of the z-matrix.
chain_operator_idxs (np.ndarray) – Array of indices defining different chains.
- Returns:
z_matrix – Z-matrix (internal) coordinates.
- Return type:
np.ndarray