API Reference

This page contains the full API reference generated from docstrings. If you are new to PepKit, start with Getting Started and the module guides.

Module map

Module	Scope
Chemical Modeling	Parsing/standardization, conversion, properties, descriptors
Query	Fetch, constraint-based filtering
Modelling	Post-processing and analysis of AF(-Multimer) outputs

Chem Module 

Conversion (`pepkit.chem.conversion.conversion`)

Tools for parsing peptide representations (FASTA/SMILES), standardizing sequences, and filtering non-canonical FASTA records.
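A minimal plain-Python sketch of the FASTA parsing and non-canonical filtering described above follows. It does not call PepKit: `parse_fasta`, `filter_canonical` and `CANONICAL` are hypothetical names, and treating only the 20 standard one-letter codes as canonical is an assumption about the module's rule.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: "canonical" means the 20 standard one-letter amino-acid codes.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; multi-line sequences are joined.

    Sequence lines that appear before the first header are dropped.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_canonical(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep non-empty records whose residues are all canonical."""
    return [(h, s) for h, s in records if s and set(s.upper()) <= CANONICAL]


fasta = ">pep1\nACDKLM\n>pep2\nACXDK\n>pep3\nGGS\nWY\n"
kept = filter_canonical(parse_fasta(fasta))
print([h for h, _ in kept])  # ['pep1', 'pep3']
```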

Descriptor (`pepkit.chem.desc.descriptor`)

Calculation of molecular descriptors and physicochemical properties.
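As an illustration of one physicochemical descriptor, the sketch below computes GRAVY (grand average of hydropathy) from the Kyte-Doolittle scale. The `gravy` helper is hypothetical and not part of `pepkit.chem.desc.descriptor`; it only makes the idea of a per-sequence property concrete.

```python
# Kyte-Doolittle hydropathy scale, one value per standard residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        # No hydropathy value exists for non-standard residues such as X.
        raise ValueError(f"non-canonical residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


print(round(gravy("ILV"), 2))  # 4.17
```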

Standardize (`pepkit.chem.standardize`)

Utilities for standardizing peptide sequences and molecular representations.
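A sketch of two common standardization steps, assuming plain string input: cleaning a one-letter sequence and converting hyphen-separated three-letter codes. `clean_one_letter`, `three_to_one` and the choice to map unrecognised codes to `X` are illustrative assumptions, not documented PepKit behavior.

```python
# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def clean_one_letter(seq: str) -> str:
    """Uppercase a one-letter sequence and drop whitespace and gap characters."""
    return "".join(ch for ch in seq.upper() if not ch.isspace() and ch not in "-.")


def three_to_one(seq: str, sep: str = "-") -> str:
    """Convert e.g. 'Ala-Cys-Asp' to 'ACD'; unrecognised codes become 'X'."""
    tokens = (tok.strip().upper() for tok in seq.split(sep))
    return "".join(THREE_TO_ONE.get(tok, "X") for tok in tokens if tok)


print(clean_one_letter(" ac-dk\n"))  # ACDK
print(three_to_one("Ala-Orn-Gly"))   # AXG
```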

Query Module 

Request (`pepkit.query.request`)

pepkit.query.request.retrieve_pdb(pdb_id, outdir='.', format='pdb')

Download a structure file from RCSB by PDB ID (a .pdb file by default; see format).

Parameters:

pdb_id (str)
outdir (str | Path)
format (str)

Return type:

Path
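With PepKit itself, a call such as `retrieve_pdb("1ABC", outdir="structures")` matches the signature above. The sketch below shows, under stated assumptions, what such a download amounts to with only the standard library: it builds a URL on RCSB's public file server (`https://files.rcsb.org/download/<ID>.<format>`) and saves the file. `pdb_download_target` and `download` are hypothetical helpers; the four-character ID check and the uppercase file name are assumptions, not documented PepKit behavior.

```python
from pathlib import Path
from typing import Tuple
from urllib.request import urlretrieve

# RCSB's public file-download server.
RCSB_DOWNLOAD = "https://files.rcsb.org/download"


def pdb_download_target(pdb_id: str, outdir=".", fmt: str = "pdb") -> Tuple[str, Path]:
    """Return (url, destination path) for a structure file; no network access."""
    pdb_id = pdb_id.strip().upper()
    # Assumption: classic 4-character IDs only (extended IDs are not handled).
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    return f"{RCSB_DOWNLOAD}/{pdb_id}.{fmt}", Path(outdir) / f"{pdb_id}.{fmt}"


def download(pdb_id: str, outdir=".", fmt: str = "pdb") -> Path:
    """Fetch the file over HTTPS and return where it was saved."""
    url, dest = pdb_download_target(pdb_id, outdir, fmt)
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Network call; raises urllib.error.URLError (or its subclass HTTPError) on failure.
    urlretrieve(url, dest)
    return dest


print(pdb_download_target("1abc", "structures"))
```

Splitting URL construction from the network call keeps the naming logic testable offline.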

Filter (`pepkit.query.filter`)

pepkit.query.filter.validate_complex_pdb(pdb_id, length_cutoff=50, canonical_check=False, hetatm_check=False)

Parameters:

pdb_id (str)
length_cutoff (int)
canonical_check (bool)
hetatm_check (bool)

pepkit.query.filter.validate_complex_pdbs(pdb_ids, length_cutoff=50, canonical_check=False, hetatm_check=False, n_jobs=8)

Parameters:

pdb_ids (list)
length_cutoff (int)
canonical_check (bool)
hetatm_check (bool)
n_jobs (int)
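The sketch below illustrates the kind of checks these functions perform, using the criteria listed for `query` (peptide detection, length cutoff, canonical residues, HETATM presence). It works on a hypothetical pre-parsed `Entry` rather than a downloaded PDB file, assumes a peptide is any chain of at most `length_cutoff` residues and a receptor any longer chain, and fans out with a thread pool instead of joblib. It is not the PepKit implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List

CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class Entry:
    """Hypothetical pre-parsed PDB entry."""
    pdb_id: str
    chains: Dict[str, str]    # chain ID -> one-letter sequence
    has_hetatm: bool = False  # True if any HETATM record was present


def is_valid_complex(entry: Entry, length_cutoff: int = 50,
                     canonical_check: bool = False, hetatm_check: bool = False) -> bool:
    if hetatm_check and entry.has_hetatm:
        return False
    if canonical_check and any(set(s) - CANONICAL for s in entry.chains.values()):
        return False
    lengths = [len(s) for s in entry.chains.values()]
    # Assumption: peptide = chain of at most length_cutoff residues,
    # receptor = any longer chain; a complex needs one of each.
    return any(n <= length_cutoff for n in lengths) and any(n > length_cutoff for n in lengths)


def validate_many(entries: List[Entry], n_jobs: int = 8, **criteria) -> Dict[str, bool]:
    """Check entries concurrently and map each PDB ID to its verdict."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        verdicts = list(pool.map(lambda e: is_valid_complex(e, **criteria), entries))
    return {e.pdb_id: ok for e, ok in zip(entries, verdicts)}


entries = [
    Entry("1AAA", {"A": "M" * 120, "B": "ACDKL"}),
    Entry("2BBB", {"A": "M" * 120, "B": "ACXKL"}),
    Entry("3CCC", {"A": "M" * 120}),
]
print(validate_many(entries, n_jobs=2, canonical_check=True))
# {'1AAA': True, '2BBB': False, '3CCC': False}
```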

Constraint-based query (`pepkit.query.query`)

pepkit.query.query.query(quality, exp_method, release_date, length_cutoff, canonical_check, hetatm_check, csv_path, fasta_path, receptor_only, n_jobs)

Query, validate, and extract peptide–protein complexes from RCSB.

This function performs an end-to-end workflow:

1. Query RCSB for candidate peptide–protein complexes using metadata constraints (resolution, experimental method, release date).
2. Validate each PDB entry using structural and sequence-based criteria (peptide detection, length cutoff, canonical residues, HETATM presence).
3. Write a metadata table (CSV) describing valid complexes.
4. Extract the corresponding sequences into a FASTA file for downstream modeling (e.g., AF-Multimer, docking, ML pipelines).

The function works through side effects: results are written to disk (CSV and FASTA) rather than returned.
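The first step, querying RCSB with metadata constraints, can be sketched as building a request body for the RCSB Search API (v2). The attribute names (`rcsb_entry_info.resolution_combined`, `exptl.method`, `rcsb_accession_info.initial_release_date`) and operators are assumptions to check against the RCSB Search API documentation; `normalize_release_date` and `build_search_payload` are hypothetical and only mirror the `quality`, `exp_method` and `release_date` parameters.

```python
import json
from datetime import date
from typing import Optional, Tuple, Union


def normalize_release_date(release_date: Union[dict, str]) -> Tuple[str, Optional[str]]:
    """Return (start, end) ISO dates; end is None for an open upper bound."""
    if isinstance(release_date, str):
        start, end = release_date, None  # single string = lower bound
    elif isinstance(release_date, dict):
        start, end = release_date["from"], release_date.get("to")
    else:
        raise TypeError("release_date must be a dict or a str")
    for value in (start, end):
        if value is not None:
            date.fromisoformat(value)  # raises ValueError on malformed dates
    return start, end


def build_search_payload(quality: float, exp_method: str, release_date) -> dict:
    """Assemble an RCSB Search API (v2) request body; attribute names are assumptions."""
    start, end = normalize_release_date(release_date)
    if end is None:
        date_rule = {"operator": "greater_or_equal", "value": start}
    else:
        date_rule = {"operator": "range",
                     "value": {"from": start, "to": end,
                               "include_lower": True, "include_upper": True}}
    rules = [
        {"attribute": "rcsb_entry_info.resolution_combined",
         "operator": "less_or_equal", "value": quality},
        {"attribute": "exptl.method", "operator": "exact_match", "value": exp_method},
        {"attribute": "rcsb_accession_info.initial_release_date", **date_rule},
    ]
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [{"type": "terminal", "service": "text", "parameters": r} for r in rules],
        },
        "return_type": "entry",
    }


payload = build_search_payload(3.0, "X-RAY DIFFRACTION",
                               {"from": "2018-01-01", "to": "2018-01-08"})
print(json.dumps(payload, indent=2))
```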

Parameters:

quality (float) – Maximum allowed experimental resolution (in Å) used to query RCSB. Lower values correspond to higher-quality structures. Example: 3.0.
exp_method (str) – Experimental method used to solve the structure. Must match RCSB metadata exactly. Example: "X-RAY DIFFRACTION".
release_date (dict or str) – Release date constraint for the RCSB query: either a dict {"from": "YYYY-MM-DD", "to": "YYYY-MM-DD"} or a single date string, which is interpreted as a lower bound.
length_cutoff (int) – Maximum allowed sequence length used for peptide/protein filtering. Peptides are typically short (e.g., ≤ 50 residues).
canonical_check (bool) – If True, discard complexes containing non-canonical amino acids (e.g., X) in any retained chain.
hetatm_check (bool) – If True, discard PDB entries containing HETATM records (e.g., ligands, cofactors, modified residues).
csv_path (str or pathlib.Path) – Output path for the CSV metadata table describing valid peptide–protein complexes.
fasta_path (str or pathlib.Path) – Output path for the FASTA file containing extracted sequences. The exact content depends on receptor_only.
receptor_only (bool) – If True, only receptor (protein) chains are written to FASTA. If False, both peptide and protein chains are included.
n_jobs (int) – Number of parallel workers used for PDB validation. Passed to joblib.Parallel.

Raises:

RuntimeError – If RCSB query fails or no valid complexes are found.
IOError – If output files cannot be written.

Side effects:

Writes csv_path (CSV metadata)
Writes fasta_path (FASTA sequences)

Example:

>>> from pepkit.query.query import query
>>> query(
...     quality=3.0,
...     exp_method="X-RAY DIFFRACTION",
...     release_date={"from": "2018-01-01", "to": "2018-01-08"},
...     length_cutoff=50,
...     canonical_check=True,
...     hetatm_check=True,
...     csv_path="demo.csv",
...     fasta_path="demo.fasta",
...     receptor_only=True,
...     n_jobs=4,
... )
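The CSV and FASTA side effects can be pictured with the sketch below. The record layout (`pdb_id`, `peptide_chain`, `receptor_chain`, `peptide_seq`, `receptor_seq`) and the `>{pdb_id}_{chain}` FASTA headers are hypothetical, since the actual columns are not documented here; only the `receptor_only` switch and the `RuntimeError` on an empty result mirror the description above.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical record layout; the real CSV columns are not documented here.
FIELDS = ["pdb_id", "peptide_chain", "receptor_chain", "peptide_seq", "receptor_seq"]


def write_outputs(complexes: List[Dict[str, str]], csv_path, fasta_path,
                  receptor_only: bool = True) -> None:
    """Write one CSV row per complex and the chosen chains to FASTA."""
    if not complexes:
        raise RuntimeError("no valid complexes found")
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(complexes)
    with open(fasta_path, "w") as fh:
        for c in complexes:
            fh.write(f">{c['pdb_id']}_{c['receptor_chain']}\n{c['receptor_seq']}\n")
            if not receptor_only:  # include the peptide chain as well
                fh.write(f">{c['pdb_id']}_{c['peptide_chain']}\n{c['peptide_seq']}\n")


rows = [{"pdb_id": "1AAA", "peptide_chain": "B", "receptor_chain": "A",
         "peptide_seq": "ACDK", "receptor_seq": "MKTAYIAK"}]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    write_outputs(rows, out / "demo.csv", out / "demo.fasta", receptor_only=False)
    print((out / "demo.fasta").read_text())
```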

Modelling Module 

Analysis (`pepkit.modelling.af.post.analysis`)

class pepkit.modelling.af.post.analysis.AnalysisInputs(json_path: 'Optional[Path]', pdb_path: 'Optional[Path]')

Bases: object

Parameters:

json_path (Path | None)
pdb_path (Path | None)

json_path: Path | None

pdb_path: Path | None

class pepkit.modelling.af.post.analysis.EntryMeta(length: 'Optional[int]', processing_time: 'Optional[float]')

Bases: object

Parameters:

length (int | None)
processing_time (float | None)

length: int | None

processing_time: float | None

class pepkit.modelling.af.post.analysis.BatchStats(ok: 'int' = 0, empty: 'int' = 0, error: 'int' = 0, dockq_ok: 'int' = 0, dockq_fail: 'int' = 0)

Bases: object

Parameters:

ok (int)
empty (int)
error (int)
dockq_ok (int)
dockq_fail (int)

ok: int = 0

empty: int = 0

error: int = 0

dockq_ok: int = 0

dockq_fail: int = 0

class pepkit.modelling.af.post.analysis.ProgressLogger(total, step_pct)

Bases: object

Log progress at step_pct% increments of total (e.g., 10%, 20%, … for step_pct=10).

Parameters:

total (int)
step_pct (int)

tick(i)

Parameters: i (int)
Return type: None
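The step logic described for ProgressLogger can be sketched as follows. `StepProgress` is a hypothetical stand-in, not the PepKit class; it assumes `tick(i)` receives the 1-based count of finished items and logs once whenever the next `step_pct` boundary is crossed, skipping boundaries that a small `total` jumps over.

```python
import logging
from typing import Callable, Optional


class StepProgress:
    """Log progress each time another step_pct percent of items is done."""

    def __init__(self, total: int, step_pct: int = 10,
                 log: Optional[Callable[[str], None]] = None) -> None:
        if total <= 0 or step_pct <= 0:
            raise ValueError("total and step_pct must be positive")
        self.total = total
        self.step_pct = step_pct
        self.log = log or logging.getLogger(__name__).info
        self.next_pct = step_pct  # first boundary to report

    def tick(self, i: int) -> None:
        """Record that i items (1-based count) have finished."""
        pct = 100 * i // self.total
        if pct >= self.next_pct:
            self.log(f"{pct}% ({i}/{self.total})")
            # Jump past every boundary already crossed, so one tick never
            # logs more than once even when it crosses several boundaries.
            self.next_pct = (pct // self.step_pct + 1) * self.step_pct


messages = []
progress = StepProgress(total=20, step_pct=10, log=messages.append)
for i in range(1, 21):
    progress.tick(i)
print(messages[0], messages[-1])  # 10% (2/20) 100% (20/20)
```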

class pepkit.modelling.af.post.analysis.Analysis(json_path=None, pdb_path=None, peptide_chain_position='last', distance_cutoff=8.0, round_digits=2, *, pdockq2_d0=10.0, pdockq2_sym_pae=True)

Bases: BaseFeature

High-level feature aggregation for AF(-Multimer) outputs.

DockQ integration (via dockq.py):

Pass --mapping_csv with a CSV file containing pdb_id and mapping columns to enable DockQ.
DockQ is computed for each entry and each rank.
The results are written inside each rank dict as rankXXX["total_dockq"] and rankXXX["avg_dockq"].

Parameters:

json_path (Optional[str])
pdb_path (Optional[str])
peptide_chain_position (str)
distance_cutoff (float)
round_digits (int)
pdockq2_d0 (float)
pdockq2_sym_pae (bool)
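Two parameters hint at how interface confidence is scored: distance_cutoff defines contacts and pdockq2_d0 scales the predicted aligned error (PAE). The sketch below assumes the TM-score-style transform 1 / (1 + (PAE/d0)^2) used by the published pDockQ2 score, one representative coordinate per residue, a full-complex PAE matrix with the peptide chain last (peptide_chain_position='last'), and that pdockq2_sym_pae=True averages PAE[i][j] and PAE[j][i]. It computes only this intermediate term, not the final pDockQ2 value, and `interface_pairs` and `mean_pae_term` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def interface_pairs(receptor: Sequence[Coord], peptide: Sequence[Coord],
                    cutoff: float = 8.0) -> List[Tuple[int, int]]:
    """(receptor i, peptide j) index pairs whose representative atoms lie within cutoff (Å)."""
    return [(i, j)
            for i, a in enumerate(receptor)
            for j, b in enumerate(peptide)
            if math.dist(a, b) <= cutoff]


def mean_pae_term(pae: Sequence[Sequence[float]], pairs: List[Tuple[int, int]],
                  n_receptor: int, d0: float = 10.0, sym_pae: bool = True) -> float:
    """Mean of 1 / (1 + (PAE / d0)^2) over interface pairs.

    pae is the full-complex N x N matrix with the peptide residues placed
    after the receptor residues (peptide chain last).
    """
    if not pairs:
        return 0.0  # no interface contacts
    total = 0.0
    for i, j in pairs:
        pj = n_receptor + j  # peptide residue index in the full matrix
        err = (pae[i][pj] + pae[pj][i]) / 2 if sym_pae else pae[i][pj]
        total += 1.0 / (1.0 + (err / d0) ** 2)
    return total / len(pairs)


receptor = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
peptide = [(5.0, 0.0, 0.0)]
pairs = interface_pairs(receptor, peptide)  # [(0, 0)]
pae = [[0.0, 0.0, 10.0], [0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
print(mean_pae_term(pae, pairs, n_receptor=len(receptor)))  # 0.5
```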

single_analysis()

Return type: Dict[str, Any]

all_analysis(dir_path)

Parameters: dir_path (str | Path)
Return type: Dict[str, Any]

batch_analysis(batch_dir, *, delete_zips=True, mapping_by_pdbid=None, native_pdb_dir=None, progress_step_pct=10)

Progress is logged every progress_step_pct percent; the default of 10 logs at 10%, 20%, …, 100%.

Parameters:

batch_dir (str | Path)
delete_zips (bool)
mapping_by_pdbid (Dict[str, Dict[str, str]] | None)
native_pdb_dir (Path | None)
progress_step_pct (int)

Return type:

Dict[str, Any]
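The mapping_by_pdbid argument has type Dict[str, Dict[str, str]]. One plausible way to build it from the pdb_id,mapping CSV mentioned under DockQ integration is sketched below; indexing each row by pdb_id is an assumption, `load_mapping` is hypothetical, and the mapping strings are passed through unchanged because their DockQ format is not documented here.

```python
import csv
import io
from typing import Dict


def load_mapping(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Index the rows of a 'pdb_id,mapping' CSV by PDB ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {"pdb_id", "mapping"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # Each value is the whole row, e.g. {"pdb_id": ..., "mapping": ...};
    # mapping strings are kept verbatim for DockQ.
    return {row["pdb_id"].strip(): row for row in reader}


# Placeholder mapping values; real ones follow whatever format DockQ expects.
text = "pdb_id,mapping\n1AAA,<mapping-1>\n2BBB,<mapping-2>\n"
mapping_by_pdbid = load_mapping(text)
print(mapping_by_pdbid["1AAA"]["mapping"])  # <mapping-1>
```

For a file on disk, `load_mapping(Path("mapping.csv").read_text())` would produce the same structure, which could then be passed as `mapping_by_pdbid=` to `batch_analysis`.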

static args()

Return type: ArgumentParser

pepkit.modelling.af.post.analysis.main()

Return type: None
