API Reference
This page contains the full API reference generated from docstrings. If you are new to PepKit, start with Getting Started and the module guides.
Module map
Chem Module: Parsing/standardization, conversion, properties, descriptors
Query Module: Fetch, constraint-based filtering
Chem Module
Conversion (pepkit.chem.conversion.conversion)
Tools for parsing peptide representations (FASTA/SMILES), standardizing sequences, and filtering non-canonical FASTA records.
Descriptor (pepkit.chem.desc.descriptor)
Calculation of molecular descriptors and physicochemical properties.
Standardize (pepkit.chem.standardize)
Utilities for standardizing peptide sequences and molecular representations.
Query Module
Request (pepkit.query.request)
Filter (pepkit.query.filter)
- pepkit.query.filter.validate_complex_pdb(pdb_id, length_cutoff=50, canonical_check=False, hetatm_check=False)
Constraint-based query (pepkit.query.query)
- pepkit.query.query.query(quality, exp_method, release_date, length_cutoff, canonical_check, hetatm_check, csv_path, fasta_path, receptor_only, n_jobs)
Query, validate, and extract peptide–protein complexes from RCSB.
This function performs an end-to-end workflow:
1. Query RCSB for candidate peptide–protein complexes using metadata constraints (resolution, experimental method, release date).
2. Validate each PDB entry using structural and sequence-based criteria (peptide detection, length cutoff, canonical residues, HETATM presence).
3. Write a metadata table (CSV) describing valid complexes.
4. Extract corresponding sequences into a FASTA file for downstream modeling (e.g., AF-Multimer, docking, ML pipelines).
The function is side-effect driven: results are written to disk (CSV + FASTA) and not returned explicitly.
- Parameters:
quality (float) – Maximum allowed experimental resolution (in Å) used to query RCSB. Lower values correspond to higher-quality structures. Example: 3.0.
exp_method (str) – Experimental method used to solve the structure. Must match RCSB metadata exactly. Example: "X-RAY DIFFRACTION".
release_date (dict or str) – Release date constraint for the RCSB query. Can be either a dict with {"from": YYYY-MM-DD, "to": YYYY-MM-DD} or a single date string (interpreted as lower bound).
length_cutoff (int) – Maximum allowed sequence length used for peptide/protein filtering. Typically peptides are expected to be short (e.g. ≤ 50 residues).
canonical_check (bool) – If True, discard complexes containing non-canonical amino acids (e.g., X) in any retained chain.
hetatm_check (bool) – If True, discard PDB entries containing HETATM records (e.g., ligands, cofactors, modified residues).
csv_path (str or pathlib.Path) – Output path for the CSV metadata table describing valid peptide–protein complexes.
fasta_path (str or pathlib.Path) – Output path for the FASTA file containing extracted sequences. The exact content depends on receptor_only.
receptor_only (bool) – If True, only receptor (protein) chains are written to FASTA. If False, both peptide and protein chains are included.
n_jobs (int) – Number of parallel workers used for PDB validation. Passed to joblib.Parallel.
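The two accepted shapes of release_date (a from/to dict, or a single string treated as a lower bound) can be illustrated with a small normalization step. This is a minimal sketch, not PepKit's implementation: normalize_release_date is a hypothetical helper, and both the choice to represent an open upper bound as None and the check that "from" does not follow "to" are assumptions made for the example.

```python
from datetime import date


def normalize_release_date(release_date):
    """Hypothetical helper: return (lower, upper); upper is None if open-ended."""
    if isinstance(release_date, str):
        # A bare date string is interpreted as the lower bound only.
        return date.fromisoformat(release_date), None
    lower = date.fromisoformat(release_date["from"])
    upper = date.fromisoformat(release_date["to"])
    if lower > upper:
        # Assumed sanity check; the reference does not say how PepKit reacts.
        raise ValueError("release_date 'from' must not be after 'to'")
    return lower, upper
```

Both normalize_release_date("2018-01-01") and normalize_release_date({"from": "2018-01-01", "to": "2018-01-08"}) then yield a pair of bounds that can be compared uniformly.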
- Raises:
RuntimeError – If RCSB query fails or no valid complexes are found.
IOError – If output files cannot be written.
- Side effects:
Writes csv_path (CSV metadata)
Writes fasta_path (FASTA sequences)
- Example:
>>> query(
...     quality=3.0,
...     exp_method="X-RAY DIFFRACTION",
...     release_date={"from": "2018-01-01", "to": "2018-01-08"},
...     length_cutoff=50,
...     canonical_check=True,
...     hetatm_check=True,
...     csv_path="demo.csv",
...     fasta_path="demo.fasta",
...     receptor_only=True,
...     n_jobs=4,
... )
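Because query() writes its results to disk rather than returning them, downstream code has to read the output files back. The following is a minimal sketch using only the standard library. read_fasta is a hypothetical helper, not part of PepKit, and it assumes a conventional FASTA layout (a ">" header line followed by one or more sequence lines). The header format that PepKit writes is not documented here, so headers are kept as opaque strings.

```python
from pathlib import Path


def read_fasta(path):
    """Hypothetical helper: parse a FASTA file into {header: sequence}."""
    records = {}
    header = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may wrap over several lines; join them.
            records[header] += line
    return records
```

After the example call above, read_fasta("demo.fasta") would give one entry per written chain; with receptor_only=True these would be receptor chains only.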
Modelling Module
Analysis (pepkit.modelling.af.post.analysis)
- class pepkit.modelling.af.post.analysis.AnalysisInputs(json_path: 'Optional[Path]', pdb_path: 'Optional[Path]')
Bases: object
- class pepkit.modelling.af.post.analysis.EntryMeta(length: 'Optional[int]', processing_time: 'Optional[float]')
Bases: object
- class pepkit.modelling.af.post.analysis.BatchStats(ok: 'int' = 0, empty: 'int' = 0, error: 'int' = 0, dockq_ok: 'int' = 0, dockq_fail: 'int' = 0)
Bases: object
- class pepkit.modelling.af.post.analysis.ProgressLogger(total, step_pct)
Bases: object
Log at K% increments (10%, 20%, …).
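Logging at fixed percentage increments can be sketched as follows. This is an illustrative stand-in, not the ProgressLogger source: the class name PercentProgress, the update method, and the choice to return the crossed thresholds are all assumptions made so the behaviour is easy to inspect.

```python
import logging

logger = logging.getLogger(__name__)


class PercentProgress:
    """Illustrative stand-in: report progress every step_pct percent."""

    def __init__(self, total, step_pct=10):
        self.total = max(int(total), 1)  # guard against division by zero
        self.step_pct = step_pct
        self.done = 0
        self.next_pct = step_pct  # next threshold to report

    def update(self, n=1):
        """Record n finished items; return the thresholds crossed by this call."""
        self.done += n
        pct = 100 * self.done / self.total
        crossed = []
        # One update may cross several thresholds when total is small.
        while self.next_pct <= 100 and pct >= self.next_pct:
            crossed.append(self.next_pct)
            logger.info("progress: %d%% (%d/%d)", self.next_pct, self.done, self.total)
            self.next_pct += self.step_pct
        return crossed
```

Tracking the next threshold, rather than testing pct % step_pct == 0, avoids missing a log line when the item count does not divide evenly into the step size.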
- class pepkit.modelling.af.post.analysis.Analysis(json_path=None, pdb_path=None, peptide_chain_position='last', distance_cutoff=8.0, round_digits=2, *, pdockq2_d0=10.0, pdockq2_sym_pae=True)
Bases: BaseFeature
High-level feature aggregation for AF(-Multimer) outputs.
- DockQ integration (via dockq.py):
Provide --mapping_csv with columns pdb_id,mapping to enable DockQ.
DockQ is computed for each entry and each rank.
- Written inside each rank dict:
rankXXX["total_dockq"]
rankXXX["avg_dockq"]
- Parameters:
- batch_analysis(batch_dir, *, delete_zips=True, mapping_by_pdbid=None, native_pdb_dir=None, progress_step_pct=10)
With progress_step_pct=10, progress is logged at 10%, 20%, …, 100%.
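The DockQ notes mention a mapping CSV with pdb_id,mapping, and batch_analysis accepts a mapping_by_pdbid argument. One plausible way to bridge the two is sketched below. load_mapping_csv is a hypothetical helper, and two things are assumptions rather than documented behaviour: that mapping_by_pdbid is a dict keyed by PDB ID, and that the CSV has a header row naming the pdb_id and mapping columns. The mapping value is passed through as a plain string because its format is not specified in this reference.

```python
import csv


def load_mapping_csv(path):
    """Hypothetical helper: read pdb_id,mapping rows into {pdb_id: mapping}."""
    mapping_by_pdbid = {}
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            pdb_id = (row.get("pdb_id") or "").strip()
            if not pdb_id:
                continue  # skip rows without an ID
            # Keep the mapping verbatim; its format is not specified here.
            mapping_by_pdbid[pdb_id] = (row.get("mapping") or "").strip()
    return mapping_by_pdbid
```

The resulting dict could then be tried as the mapping_by_pdbid argument, provided the installed version accepts that shape.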