API Reference

This page contains the full API reference generated from docstrings. If you are new to PepKit, start with Getting Started and the module guides.

Module map

  • Chemical Modeling – Parsing/standardization, conversion, properties, descriptors

  • Query – Fetch, constraint-based filtering

Chem Module

Conversion (pepkit.chem.conversion.conversion)

Tools for parsing peptide representations (FASTA/SMILES), standardizing sequences, and filtering non-canonical FASTA records.
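The functions in this module are not listed on this page. The sketch below illustrates the kind of processing the module covers. It parses FASTA text, standardizes each sequence by removing whitespace and uppercasing, and drops records that contain residues outside the 20 canonical amino acids. The names `parse_fasta`, `standardize`, and `filter_canonical` are hypothetical, this is not PepKit's implementation, and PepKit's standardization rules may differ.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# The 20 standard (canonical) amino acids, one-letter codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def standardize(seq: str) -> str:
    """Hypothetical standardization: remove all whitespace and uppercase."""
    return "".join(seq.split()).upper()


def filter_canonical(text: str) -> Dict[str, str]:
    """Keep only records whose standardized sequence is fully canonical."""
    kept: Dict[str, str] = {}
    for header, seq in parse_fasta(text):
        seq = standardize(seq)
        if seq and set(seq) <= CANONICAL:
            kept[header] = seq
    return kept


if __name__ == "__main__":
    fasta = ">pep1\nacdef\nGHIK\n>pep2\nACXDE\n"
    print(filter_canonical(fasta))  # pep2 is dropped because of the X
```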

Descriptor (pepkit.chem.desc.descriptor)

Calculation of molecular descriptors and physicochemical properties.
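This page does not enumerate the descriptors PepKit computes. The sketch below shows one common physicochemical descriptor as an illustration: GRAVY (grand average of hydropathy), computed with the Kyte-Doolittle scale. The function name `gravy` is hypothetical and is not part of PepKit's API.

```python
# Kyte-Doolittle hydropathy values for the 20 canonical amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-canonical residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    print(gravy("ACDEFGHIK"))
```

Positive values indicate a hydrophobic sequence on average, and negative values a hydrophilic one.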

Standardize (pepkit.chem.standardize)

Utilities for standardizing peptide sequences and molecular representations.

Query Module

Request (pepkit.query.request)

pepkit.query.request.retrieve_pdb(pdb_id, outdir='.', format='pdb')

Download a .pdb file from RCSB by PDB ID.

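Parameters:
  • pdb_id – PDB identifier of the entry to download.

  • outdir – Output directory for the downloaded file (default: '.').

  • format – Requested file format (default: 'pdb').

Return type:

Path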
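The sketch below shows what a download like this involves. It uses the public RCSB file service at https://files.rcsb.org/download, which serves entries as `<ID>.pdb` or `<ID>.cif`. The helpers `rcsb_download_url` and `download_entry` are hypothetical stand-ins. It is an assumption that `retrieve_pdb` uses this URL scheme, names files in uppercase, or accepts 'cif' as a format.

```python
import urllib.request
from pathlib import Path

# Public RCSB file-download service.
RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def rcsb_download_url(pdb_id: str, fmt: str = "pdb") -> str:
    """Build the download URL for one entry, e.g. .../1ABC.pdb or .../1ABC.cif."""
    if fmt not in {"pdb", "cif"}:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{RCSB_DOWNLOAD}/{pdb_id.strip().upper()}.{fmt}"


def download_entry(pdb_id: str, outdir: str = ".", fmt: str = "pdb") -> Path:
    """Hypothetical stand-in for retrieve_pdb: save one entry and return its path."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"{pdb_id.strip().upper()}.{fmt}"
    with urllib.request.urlopen(rcsb_download_url(pdb_id, fmt), timeout=30) as resp:
        target.write_bytes(resp.read())
    return target
```

With PepKit itself, the documented call is `retrieve_pdb(pdb_id, outdir='.', format='pdb')`, which returns a `Path`.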

Filter (pepkit.query.filter)

pepkit.query.filter.validate_complex_pdb(pdb_id, length_cutoff=50, canonical_check=False, hetatm_check=False)
Parameters:
  • pdb_id (str)

  • length_cutoff (int)

  • canonical_check (bool)

  • hetatm_check (bool)
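`validate_complex_pdb` is listed here without a description. The `query()` entry in this reference summarizes the validation criteria as peptide detection, length cutoff, canonical residues, and HETATM presence. The sketch below applies that decision logic to chains that have already been parsed. It assumes a valid complex has exactly one chain of at most `length_cutoff` residues (the peptide) plus at least one longer receptor chain, which may not match PepKit's real rules. `check_complex` and `ValidationResult` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class ValidationResult:
    valid: bool
    peptide_chain: Optional[str] = None
    reason: str = ""


def check_complex(chains: Dict[str, str], has_hetatm: bool, length_cutoff: int = 50,
                  canonical_check: bool = False,
                  hetatm_check: bool = False) -> ValidationResult:
    """Classify one parsed entry. chains maps chain ID -> one-letter sequence.

    Assumed rules (not PepKit's documented behavior): exactly one chain of at
    most length_cutoff residues is the peptide, and at least one longer chain
    is the receptor.
    """
    if hetatm_check and has_hetatm:
        return ValidationResult(False, reason="HETATM records present")
    peptides = [cid for cid, seq in chains.items() if len(seq) <= length_cutoff]
    receptors = [cid for cid, seq in chains.items() if len(seq) > length_cutoff]
    if len(peptides) != 1 or not receptors:
        return ValidationResult(False, reason="no single peptide plus receptor")
    if canonical_check and any(set(seq) - CANONICAL for seq in chains.values()):
        return ValidationResult(False, reason="non-canonical residues")
    return ValidationResult(True, peptide_chain=peptides[0])
```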

pepkit.query.filter.validate_complex_pdbs(pdb_ids, length_cutoff=50, canonical_check=False, hetatm_check=False, n_jobs=8)
Parameters:
  • pdb_ids (list)

  • length_cutoff (int)

  • canonical_check (bool)

  • hetatm_check (bool)

  • n_jobs (int)
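`validate_complex_pdbs` adds `n_jobs` so that the single-entry check can run across many IDs in parallel. The `query()` docstring notes that its `n_jobs` is passed to joblib.Parallel. To avoid an extra dependency, the sketch below substitutes the standard library's `ThreadPoolExecutor`. `validate_many` and the `validate_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def validate_many(pdb_ids: List[str], validate_one: Callable[[str], bool],
                  n_jobs: int = 8) -> Dict[str, bool]:
    """Apply validate_one to every ID using up to n_jobs worker threads.

    pool.map preserves input order, so results line up with pdb_ids.
    Duplicate IDs collapse to a single key in the returned dict.
    """
    with ThreadPoolExecutor(max_workers=max(1, n_jobs)) as pool:
        results = list(pool.map(validate_one, pdb_ids))
    return dict(zip(pdb_ids, results))
```

Threads fit here because per-entry validation is dominated by network I/O. With joblib, the equivalent call is `Parallel(n_jobs=n_jobs)(delayed(validate_one)(pid) for pid in pdb_ids)`.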

Constraint-based query (pepkit.query.query)

pepkit.query.query.query(quality, exp_method, release_date, length_cutoff, canonical_check, hetatm_check, csv_path, fasta_path, receptor_only, n_jobs)

Query, validate, and extract peptide–protein complexes from RCSB.

This function performs an end-to-end workflow:

  1. Query RCSB for candidate peptide–protein complexes using metadata constraints (resolution, experimental method, release date).

  2. Validate each PDB entry using structural and sequence-based criteria (peptide detection, length cutoff, canonical residues, HETATM presence).

  3. Write a metadata table (CSV) describing valid complexes.

  4. Extract corresponding sequences into a FASTA file for downstream modeling (e.g., AF-Multimer, docking, ML pipelines).
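Step 1 could be expressed as a single POST request to the RCSB Search API at https://search.rcsb.org/rcsbsearch/v2/query. The sketch below builds a request body from `quality`, `exp_method`, and `release_date`. The attribute and operator names are assumed to match the public RCSB Search API v2 schema and should be checked against the RCSB documentation. The reference does not show PepKit's actual request, including how it restricts hits to peptide–protein complexes. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, Union


def _text_node(attribute: str, operator: str, value: Any) -> Dict[str, Any]:
    """One terminal node for the attribute ("text") search service."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def release_date_node(release_date: Union[str, Dict[str, str]]) -> Dict[str, Any]:
    attr = "rcsb_accession_info.initial_release_date"
    if isinstance(release_date, str):
        # A bare date string acts as a lower bound, as documented for query().
        return _text_node(attr, "greater_or_equal", release_date)
    window = {
        "from": release_date["from"],
        "to": release_date["to"],
        "include_lower": True,
        "include_upper": True,
    }
    return _text_node(attr, "range", window)


def build_search_payload(quality: float, exp_method: str,
                         release_date: Union[str, Dict[str, str]]) -> Dict[str, Any]:
    """AND together the resolution, method, and release-date constraints."""
    nodes = [
        _text_node("rcsb_entry_info.resolution_combined", "less_or_equal", quality),
        _text_node("exptl.method", "exact_match", exp_method),
        release_date_node(release_date),
    ]
    return {
        "query": {"type": "group", "logical_operator": "and", "nodes": nodes},
        "return_type": "entry",
        "request_options": {"return_all_hits": True},  # otherwise results are paginated
    }


if __name__ == "__main__":
    payload = build_search_payload(
        3.0, "X-RAY DIFFRACTION", {"from": "2018-01-01", "to": "2018-01-08"}
    )
    print(json.dumps(payload, indent=2))
```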

The function works through side effects: results are written to disk (CSV and FASTA) rather than returned.

Parameters:
  • quality (float) – Maximum allowed experimental resolution (in Å) used to query RCSB. Lower values correspond to higher-quality structures. Example: 3.0.

  • exp_method (str) – Experimental method used to solve the structure. Must match RCSB metadata exactly. Example: "X-RAY DIFFRACTION".

  • release_date (dict or str) – Release date constraint for the RCSB query. Either:
      - a dict {"from": "YYYY-MM-DD", "to": "YYYY-MM-DD"}, or
      - a single date string, interpreted as a lower bound.

  • length_cutoff (int) – Maximum allowed sequence length used for peptide/protein filtering. Peptides are typically expected to be short (e.g., ≤ 50 residues).

  • canonical_check (bool) – If True, discard complexes containing non-canonical amino acids (e.g., X) in any retained chain.

  • hetatm_check (bool) – If True, discard PDB entries containing HETATM records (e.g., ligands, cofactors, modified residues).

  • csv_path (str or pathlib.Path) – Output path for the CSV metadata table describing valid peptide–protein complexes.

  • fasta_path (str or pathlib.Path) – Output path for the FASTA file containing extracted sequences. The exact content depends on receptor_only.

  • receptor_only (bool) – If True, only receptor (protein) chains are written to FASTA. If False, both peptide and protein chains are included.

  • n_jobs (int) – Number of parallel workers used for PDB validation. Passed to joblib.Parallel.

Raises:
  • RuntimeError – If the RCSB query fails or no valid complexes are found.

  • IOError – If output files cannot be written.

Side effects:
  • Writes csv_path (CSV metadata)

  • Writes fasta_path (FASTA sequences)

Example:
>>> from pepkit.query.query import query
>>> query(
...     quality=3.0,
...     exp_method="X-RAY DIFFRACTION",
...     release_date={"from": "2018-01-01", "to": "2018-01-08"},
...     length_cutoff=50,
...     canonical_check=True,
...     hetatm_check=True,
...     csv_path="demo.csv",
...     fasta_path="demo.fasta",
...     receptor_only=True,
...     n_jobs=4,
... )
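This reference describes the two outputs written to `csv_path` and `fasta_path` (steps 3 and 4 of the workflow) only as "CSV metadata" and "FASTA sequences". The hypothetical sketch below shows how `receptor_only` could shape the FASTA output, assuming each retained chain is tagged as "peptide" or "receptor". `ChainRecord`, the FASTA header format, and the CSV columns are invented for illustration and will not match PepKit's files exactly.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ChainRecord:
    """Hypothetical per-chain record (not a PepKit type)."""
    pdb_id: str
    chain_id: str
    role: str  # assumed to be "peptide" or "receptor"
    sequence: str


def to_fasta(records: Iterable[ChainRecord], receptor_only: bool, width: int = 60) -> str:
    """Render records as FASTA, keeping only receptor chains when receptor_only is True."""
    lines: List[str] = []
    for rec in records:
        if receptor_only and rec.role != "receptor":
            continue
        lines.append(f">{rec.pdb_id}_{rec.chain_id} {rec.role}")
        # Wrap the sequence at `width` residues per line.
        seq = rec.sequence
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


def to_csv(records: Iterable[ChainRecord]) -> str:
    """One metadata row per chain; the column set is illustrative only."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["pdb_id", "chain_id", "role", "length"])
    for rec in records:
        writer.writerow([rec.pdb_id, rec.chain_id, rec.role, len(rec.sequence)])
    return buf.getvalue()
```

Writing the files is then a matter of `Path(fasta_path).write_text(to_fasta(records, receptor_only))`, and likewise for the CSV.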

Modelling Module

Analysis (pepkit.modelling.af.post.analysis)

class pepkit.modelling.af.post.analysis.AnalysisInputs(json_path: 'Optional[Path]', pdb_path: 'Optional[Path]')

Bases: object

Parameters:
  • json_path (Path | None)

  • pdb_path (Path | None)

json_path: Path | None
pdb_path: Path | None

class pepkit.modelling.af.post.analysis.EntryMeta(length: 'Optional[int]', processing_time: 'Optional[float]')

Bases: object

Parameters:
  • length (int | None)

  • processing_time (float | None)

length: int | None
processing_time: float | None

class pepkit.modelling.af.post.analysis.BatchStats(ok: 'int' = 0, empty: 'int' = 0, error: 'int' = 0, dockq_ok: 'int' = 0, dockq_fail: 'int' = 0)

Bases: object

Parameters:
  • ok (int)

  • empty (int)

  • error (int)

  • dockq_ok (int)

  • dockq_fail (int)

ok: int = 0
empty: int = 0
error: int = 0
dockq_ok: int = 0
dockq_fail: int = 0

class pepkit.modelling.af.post.analysis.ProgressLogger(total, step_pct)

Bases: object

Log at K% increments (10%, 20%, …).

Parameters:
  • total – Total number of items to process.

  • step_pct – Logging interval in percent (K).

tick(i)
Parameters:

i (int)

Return type:

None
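The docstring only says that `ProgressLogger` logs at K% increments. The minimal sketch below implements that idea. It assumes `tick(i)` receives the 1-based count of finished items, although PepKit may pass a 0-based index. `StepProgress` is a hypothetical name, and its `emitted` list exists only to make the behavior easy to inspect.

```python
import logging
from typing import List

logger = logging.getLogger("progress_sketch")


class StepProgress:
    """Log once each time progress crosses the next step_pct boundary."""

    def __init__(self, total: int, step_pct: int = 10) -> None:
        if total <= 0 or not 0 < step_pct <= 100:
            raise ValueError("need total > 0 and 0 < step_pct <= 100")
        self.total = total
        self.step_pct = step_pct
        self.next_pct = step_pct
        self.emitted: List[int] = []  # percentages already logged

    def tick(self, i: int) -> None:
        """Report that i items are done; log every boundary crossed since last call."""
        pct = 100 * i / self.total
        while self.next_pct <= 100 and pct >= self.next_pct:
            logger.info("progress: %d%% (%d/%d)", self.next_pct, i, self.total)
            self.emitted.append(self.next_pct)
            self.next_pct += self.step_pct
```

The `while` loop matters when items are few or ticks are skipped. A single call can cross several boundaries, and each one is still logged exactly once.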

class pepkit.modelling.af.post.analysis.Analysis(json_path=None, pdb_path=None, peptide_chain_position='last', distance_cutoff=8.0, round_digits=2, *, pdockq2_d0=10.0, pdockq2_sym_pae=True)

Bases: BaseFeature

High-level feature aggregation for AF(-Multimer) outputs.

DockQ integration (via dockq.py):
  • Provide --mapping_csv, a CSV file with pdb_id and mapping columns, to enable DockQ.

  • DockQ is computed for EACH entry and EACH rank.

  • Results are written inside each rank dict:

      rankXXX["total_dockq"]
      rankXXX["avg_dockq"]
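The mapping CSV and the two per-rank keys can be sketched as follows. The sketch rests on several assumptions. The CSV header is literally `pdb_id,mapping`. The mapping column holds a DockQ chain-mapping string such as `AB:AB` (the MODELCHAINS:NATIVECHAINS form). `total_dockq` is the sum of per-interface DockQ scores and `avg_dockq` is their mean. `load_mapping` and `add_dockq` are hypothetical helpers, not PepKit functions.

```python
import csv
import io
from statistics import mean
from typing import Any, Dict, List


def load_mapping(csv_text: str) -> Dict[str, str]:
    """Read CSV text with pdb_id,mapping columns into {pdb_id: mapping}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["pdb_id"].strip(): row["mapping"].strip() for row in reader}


def add_dockq(rank: Dict[str, Any], interface_scores: List[float]) -> Dict[str, Any]:
    """Store aggregate DockQ in a rank dict.

    Assumption: total_dockq is the sum of per-interface DockQ scores and
    avg_dockq their mean (None when no interface was scored).
    """
    rank["total_dockq"] = sum(interface_scores)
    rank["avg_dockq"] = mean(interface_scores) if interface_scores else None
    return rank
```

The dict returned by `load_mapping` plays the role of the `mapping_by_pdbid` argument of `batch_analysis`. It is an assumption that PepKit uses this exact shape.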

Parameters:
  • json_path (Optional[str])

  • pdb_path (Optional[str])

  • peptide_chain_position (str)

  • distance_cutoff (float)

  • round_digits (int)

  • pdockq2_d0 (float)

  • pdockq2_sym_pae (bool)
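This reference does not spell out how `distance_cutoff`, `pdockq2_d0`, and `pdockq2_sym_pae` enter the computation. The sketch below shows one plausible reading. Receptor–peptide residue pairs within `distance_cutoff` Å define the interface. Each pair's PAE is then mapped to 1 / (1 + (PAE/d0)^2), the normalization used by pDockQ2, optionally after averaging the two PAE directions. The sketch assumes one representative coordinate per residue and that the peptide is the last chain (`peptide_chain_position='last'`). The full pDockQ2 also combines this term with interface pLDDT through a fitted sigmoid, which is omitted here. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interface_pairs(receptor: Sequence[Coord], peptide: Sequence[Coord],
                    distance_cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """Residue index pairs (receptor i, peptide j) within distance_cutoff Å."""
    return [
        (i, j)
        for i, a in enumerate(receptor)
        for j, b in enumerate(peptide)
        if math.dist(a, b) <= distance_cutoff
    ]


def normalized_pae(pae: List[List[float]], pairs: List[Tuple[int, int]],
                   n_receptor: int, d0: float = 10.0, sym_pae: bool = True) -> float:
    """Mean of 1 / (1 + (PAE/d0)^2) over interface pairs (pDockQ2-style PAE term)."""
    scores = []
    for i, j in pairs:
        r, p = i, n_receptor + j  # peptide assumed last: its rows follow the receptor
        err = (pae[r][p] + pae[p][r]) / 2 if sym_pae else pae[r][p]
        scores.append(1.0 / (1.0 + (err / d0) ** 2))
    return sum(scores) / len(scores) if scores else 0.0
```

A PAE of 0 Å scores 1.0 and a PAE equal to d0 scores 0.5, so `pdockq2_d0` sets the error scale at which interface confidence is halved.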

single_analysis()
Return type:

Dict[str, Any]

all_analysis(dir_path)
Parameters:

dir_path (str | Path)

Return type:

Dict[str, Any]

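batch_analysis(batch_dir, *, delete_zips=True, mapping_by_pdbid=None, native_pdb_dir=None, progress_step_pct=10)

Progress is logged every progress_step_pct percent; for example, progress_step_pct=10 logs at 10%, 20%, …, 100%.

Parameters:
  • batch_dir

  • delete_zips (bool)

  • mapping_by_pdbid

  • native_pdb_dir

  • progress_step_pct (int)

Return type: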

Dict[str, Any]

static args()
Return type:

ArgumentParser

pepkit.modelling.af.post.analysis.main()
Return type:

None