Usage
CAAT provides four entry points depending on your workflow needs.
Option A: Full End-to-End Pipeline
Run structure prediction and attention analysis in one command. This is the recommended starting point for most users.
poetry run python scripts/run_e2e_pipeline.py \
--query-seq-path <path/to/sequence.fasta> \
--query-name <protein_name> \
[OPTIONS]
Prediction Settings

| Argument | Default | Description |
|---|---|---|
| `--query-seq-path` | - | Path to MSA or FASTA file |
| `--query-name` | - | Required. Identifier for your query protein (e.g., XCL1) |
| `--target-name` | `None` | Identifier for target/reference protein (for comparative analysis) |
| `--target-seq-path` | `None` | Path to target sequence file (for comparative analysis) |
| `--alignment-path` | `None` | Path to MSA alignment file (for comparative analysis) |
| `--model-type` | `alphafold2` | AlphaFold model variant to use |
| `--num-models` | `5` | Number of models to generate |
| `--result-dir` | `results` | Output directory for PDB structures |
| `--save-attention-npy` | `False` | Export individual uncompressed attention heads |
| `--attention-output-dir` | `attention_outputs` | Directory for raw attention files |
| `--save-attention-compressed` | `False` | Save attention in compressed H5 format |
| `--save-intermediate-structures` | `None` | Directory for intermediate structure outputs |

Analysis Settings

| Argument | Default | Description |
|---|---|---|
| `--vis-output-dir` | `visualizations` | Output directory for plots |
| `--query-highlight-indices` | `None` | Comma-separated residue positions to highlight (1-indexed, e.g., 1,5,10) |
| `--target-highlight-indices` | `None` | Residue positions to highlight in target |
| `--query-highlight-color` | `#AE0639` | Hex color for query highlights |
| `--target-highlight-color` | `#1f77b4` | Hex color for target highlights |

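
The highlight options take 1-indexed residue positions as a comma-separated string. As a minimal sketch of what that convention implies when indexing into attention arrays, the hypothetical helper below (`parse_highlight_indices` is not part of CAAT) converts such a string into 0-based indices. How the scripts parse the value internally may differ.

```python
def parse_highlight_indices(spec):
    """Convert a 1-indexed, comma-separated string such as "1,5,10"
    into sorted, de-duplicated 0-based indices.

    Hypothetical helper for illustration; not a CAAT function.
    """
    positions = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing commas or stray spaces
        position = int(token)
        if position < 1:
            raise ValueError(f"Residue positions are 1-indexed, got {position}")
        positions.append(position - 1)  # shift to 0-based for array indexing
    return sorted(set(positions))


print(parse_highlight_indices("1,5,10"))  # [0, 4, 9]
```
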
Option B: Generate Attention Heads Only
Extract attention weights without visualization. Useful for custom downstream analysis.
poetry run python scripts/run_attention_heads.py \
--query-seq-path <path/to/sequence.fasta> \
--query-name <protein_name> \
[OPTIONS]
Arguments

| Argument | Default | Description |
|---|---|---|
| `--query-seq-path` | - | Path to input MSA (.a3m) or FASTA file |
| `--query-name` | - | Required. Protein identifier |
| `--model-type` | `alphafold2` | Model variant (e.g., alphafold2_multimer_v3) |
| `--attention-output-dir` | `attention_outputs` | Where to save .npy attention files |
| `--result-dir` | `results` | Directory for final PDB structures |
| `--num-models` | `5` | Number of models to run |
| `--save-attention-compressed` | `False` | Export compressed H5 format |
| `--save-intermediate-structures` | `None` | Save intermediate evoformer structures |

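
For custom downstream analysis, the saved attention files can be read back with NumPy. The sketch below assumes the .npy files sit directly inside the attention output directory and makes no assumption about their names or array shapes. `load_attention_arrays` is a hypothetical helper, not a CAAT function.

```python
from pathlib import Path

import numpy as np


def load_attention_arrays(attention_dir):
    """Load every .npy file in a directory into a dict keyed by file stem.

    Assumes a flat directory layout; switch to Path.rglob("*.npy")
    if the files turn out to be nested in subdirectories.
    """
    arrays = {}
    for path in sorted(Path(attention_dir).glob("*.npy")):
        arrays[path.stem] = np.load(path)
    return arrays


if __name__ == "__main__":
    # "attention_outputs" is the documented default of --attention-output-dir.
    for name, array in load_attention_arrays("attention_outputs").items():
        print(name, array.shape, array.dtype)
```

Printing the shape and dtype of each array is a quick way to learn the actual layout before writing any analysis against it.
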
Option C: Using ColabFold Directly
CAAT extends ColabFold with custom attention output capabilities. Use the standard colabfold_batch command with additional flags:
poetry run colabfold_batch \
<input> <results> \
--attention-output-dir <path> \
[STANDARD_COLABFOLD_OPTIONS]
CAAT-Specific Flags

| Argument | Description |
|---|---|
| `--attention-output-dir` | Directory to save attention head matrices (.npy files) |
| `--save-intermediate-structures` | Directory to save intermediate evoformer structures |

For full ColabFold options, see the ColabFold documentation or run:
poetry run colabfold_batch --help
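
When the command is launched from a script rather than typed by hand, it can be assembled as an argument list. The sketch below uses only the flags documented in this section. `build_colabfold_command` and the example paths are hypothetical, and any standard ColabFold options are passed through unchanged.

```python
import shlex


def build_colabfold_command(input_path, results_dir, attention_output_dir,
                            intermediate_dir=None, extra_options=()):
    """Assemble a colabfold_batch invocation as a list of arguments.

    Hypothetical helper; it only arranges the documented flags.
    """
    command = [
        "poetry", "run", "colabfold_batch",
        str(input_path), str(results_dir),
        "--attention-output-dir", str(attention_output_dir),
    ]
    if intermediate_dir is not None:
        command += ["--save-intermediate-structures", str(intermediate_dir)]
    command += list(extra_options)  # standard ColabFold options, untouched
    return command


command = build_colabfold_command("query.fasta", "results", "attention_outputs")
print(shlex.join(command))
# poetry run colabfold_batch query.fasta results --attention-output-dir attention_outputs
```

The resulting list can be handed to `subprocess.run(command, check=True)`, which avoids shell quoting problems with paths that contain spaces.
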
Option D: Analysis Only
Visualize and compare pre-computed attention heads. Use this when you already have .npy attention files.
poetry run python scripts/run_analysis_pipeline.py \
--query-attn-dir <path/to/attention_files> \
--query-name <protein_name> \
--query-seq-path <path/to/sequence.fasta> \
[OPTIONS]
Arguments

| Argument | Default | Description |
|---|---|---|
| `--query-attn-dir` | - | Required. Directory containing .npy attention files for query |
| `--query-name` | - | Required. Identifier for the query protein |
| `--query-seq-path` | - | Required. Path to query sequence (.a3m or .fasta) |
| `--target-attn-dir` | `None` | Attention directory for target protein (for comparative analysis) |
| `--target-name` | `None` | Target protein identifier (for comparative analysis) |
| `--target-seq-path` | `None` | Target sequence file (for comparative analysis) |
| `--alignment-path` | `None` | Alignment file mapping query to target (for comparative analysis) |
| `--output-dir` | `attention_visualizations` | Output directory for plots |
| `--query-highlight-indices` | `None` | Residues to highlight in query (1-indexed) |
| `--target-highlight-indices` | `None` | Residues to highlight in target (1-indexed) |
| `--query-highlight-color` | `#AE0639` | Hex color for query highlights |
| `--target-highlight-color` | `#1f77b4` | Hex color for target highlights |
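
Four of the options above are marked as being for comparative analysis. A wrapper script might check that they are supplied together before launching a run. The sketch below assumes a comparative run needs all four, which this section does not state explicitly, so the script itself may accept fewer. `missing_comparative_options` is a hypothetical helper.

```python
# Option names as they would appear after argument parsing (dashes become underscores).
COMPARATIVE_OPTIONS = (
    "target_attn_dir",
    "target_name",
    "target_seq_path",
    "alignment_path",
)


def missing_comparative_options(options):
    """Return the comparative-analysis options that are still unset.

    An empty list means either a single-protein run (none supplied)
    or a complete comparative run (all supplied).
    Assumption: all four are needed together; CAAT may be more lenient.
    """
    provided = [name for name in COMPARATIVE_OPTIONS if options.get(name) is not None]
    if not provided:
        return []  # single-protein run, nothing to check
    return [name for name in COMPARATIVE_OPTIONS if options.get(name) is None]


print(missing_comparative_options({"target_name": "target"}))
# ['target_attn_dir', 'target_seq_path', 'alignment_path']
```
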