█▀▀▄ █▀▀▄ ▄▀▀▄ ▀█▀ █  █ ▄▀▀▄ █▄  █
█▄▄█ █▄▄▀ █  █  █  █▄▄█ █  █ █ █ █
█    █  █ ▀▄▄▀  █  █  █ ▀▄▄▀ █  ▀█

docs-first project generator for AI-assisted Python development

A batteries-included uv Python project generator with AI alignment tooling that makes AI-powered development dramatically faster.

$ uv tool install "git+https://github.com/jackedney/prothon"
uv · ruff · ty · claude · pytest · hypothesis · copier · poe · bandit · mutmut · vulture · complexipy
Core Insight

AI alignment isn't about better prompts — it's about giving AI a durable source of truth and a verification loop that catches when code drifts from intent.

01 Modern Python Scaffolding

One command. Full toolchain. Same quality bar for human and AI code. 9 quality tools on every commit. copier update pulls upstream improvements without losing local changes.

poe check — single gate for hooks, CI, and AI
9 quality tools, including ruff, ty, pytest, hypothesis, mutmut, bandit, vulture, complexipy
uv, pyproject.toml-only, src/ layout, py.typed marker
copier update for upstream template improvements
single quality gate: pre-commit, CI, and AI agents all run the same poe check, which fans out to ruff, ty, pytest, hypothesis, mutmut, bandit, vulture, and complexipy; a sketch of the task wiring follows.
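A sketch of how the generated pyproject.toml might wire that gate together, using poethepoet's sequence-task syntax. Task names and flags here are illustrative, not the template's literal contents; hypothesis runs through pytest, so it needs no task of its own.

[tool.poe.tasks]
lint       = "ruff check ."     # style + lint
typecheck  = "ty check"         # static types
test       = "pytest"           # unit + hypothesis property tests
security   = "bandit -r src"    # security scan
deadcode   = "vulture src"      # unused code
complexity = "complexipy src"   # cognitive complexity
mutation   = "mutmut run"       # mutation testing
check      = ["lint", "typecheck", "test", "security", "deadcode", "complexity", "mutation"]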
prothon new my-project
my-project/
├── pyproject.toml          # uv, poethepoet, all tool config
├── .python-version
├── .gitignore
├── .pre-commit-config.yaml  # 9 quality tools
├── AGENTS.md               # canonical AI instructions
├── CLAUDE.md               # symlink to AGENTS.md
├── docs/
│   ├── SPEC.md              # requirements (highest authority)
│   ├── DESIGN.md            # architecture (traces to SPEC)
│   └── PATTERNS.md          # conventions (must not contradict SPEC/DESIGN)
├── src/my_project/
│   ├── __init__.py
│   └── py.typed              # PEP 561 marker
└── tests/

02 Hierarchical Documentation

Three documents with strict authority. Higher overrides lower. Each has a dedicated conversational skill that presents one decision at a time and hard-rejects content belonging at a different level.

Skills hard-reject content from other levels
SPEC change triggers DESIGN, then PATTERNS review
Conflicts resolve at doc level before code
After design or patterns, harmonizer cross-references all three levels
authority chain: SPEC.md (highest) → DESIGN.md (traces up to SPEC) → PATTERNS.md (must not conflict); changes cascade top-down. An illustrative trace follows.
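For illustration only (hypothetical content, not template output), one requirement traced down the chain might read:

docs/SPEC.md      →  "R3: users authenticate with short-lived JWTs"
docs/DESIGN.md    →  "Auth service (traces to R3): PyJWT, 15-minute access tokens"
docs/PATTERNS.md  →  "auth code lives in src/auth/; every handler gets a failure-path test"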

03 Design Workflow

Seven commands, run in sequence. Each launches an interactive session scoped to a single concern. Each produces a versioned artifact in the repo.

new — scaffold a fresh project from the template
init — add prothon to an existing project
spec — "What are you building, who is it for, and why?"
design — researches tech, presents trade-offs
patterns — code style, testing, conventions
execute — fresh subagents, verifies promises
compliance — evidence tables, code vs docs
workflow: new / init (bootstrap) → spec (what & why) → design (architecture) → patterns (code style) → execute (build it) → compliance (verify it). A sample session follows.
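Assuming each step runs as a prothon subcommand, as new does in section 01 (the command names come from the list above; the annotations are illustrative):

$ prothon new my-project    # or: prothon init, inside an existing repo
$ prothon spec              # interactive Q&A → docs/SPEC.md
$ prothon design            # trade-off research → docs/DESIGN.md
$ prothon patterns          # conventions → docs/PATTERNS.md
$ prothon execute           # writes the promise file, then fresh subagents build
$ prothon compliance        # evidence tables: code vs docs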

04 Drift Detection & Reconciliation

After design and patterns commands, subagent systems fire automatically to maintain doc consistency and generate reference material.

Harmonizer catches contradictions, scope creep, unchosen tech
Amends lower doc. SPEC never touched.
Tech Researcher generates reference skills from Context7, web, training data
Compliance runs at checkpoints: PASS/FAIL/PARTIAL with file:line evidence
auto-fire gates: harmonizer (doc ↔ doc, after design/patterns) · tech researcher (generates skills, after design). SPEC is never amended. An illustrative compliance entry follows.
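For illustration (the report layout here is hypothetical; only the PASS/FAIL/PARTIAL verdicts and file:line evidence come from the docs above):

SPEC §3 JWT auth       PASS     src/auth/handler.py:42 · tests/test_auth.py:10
SPEC §4 rate limiting  PARTIAL  middleware present, no tests (src/middleware.py:15)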

05 Skills Collection

After DESIGN is written, tech researcher generates reference skills for your exact stack. Queries Context7 live docs, falls back to web search, then training knowledge. Current material, not generic training data.

tech-* — library usage, idioms, gotchas, version-specific APIs
style-* — naming conventions, import organization, type annotations
optim-* — performance patterns, GPU batching, subprocess management
domain-* — field-specific concepts: geospatial, ML, finance, etc.
Auto-loaded during execution — no manual context switching

Example: ML + geospatial project

.agents/skills/
├── tech-pytorch.md
├── tech-fastapi.md
├── tech-polars.md
├── style-python.md
├── optim-gpu.md
└── domain-geospatial.md
research pipeline: DESIGN.md → Context7, then web → tech-* / style-* / optim-* / domain-* skills → auto-loaded during execute. A hypothetical excerpt follows.
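As a hypothetical excerpt (invented here for illustration, not actual generated output), a tech-* skill might open like:

# tech-polars.md
- Prefer lazy frames: pl.scan_parquet(...) over pl.read_parquet(...); call collect() once, at the end
- Version-specific: groupby() was renamed group_by() in polars 0.19
- Gotcha: with_columns() returns a new frame; nothing mutates in place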

06 Execution Promises

Before execution starts, the planner writes change_promise.toml — a contract that declares exactly what each task will produce. This turns open-ended code generation into a bounded, verifiable process.

Files to create, modify, remove — declared upfront
Line predictions force thinking through scope
Checked against git with ±30% or ±30 lines tolerance (sketched below)
3 attempts per task, fresh context each
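A minimal sketch of that tolerance rule in Python, assuming the looser of the two bounds wins; this illustrates the check as described above, not prothon's actual implementation:

def promise_satisfied(predicted: int, actual: int) -> bool:
    # Pass when the git-measured line delta is within 30% of the
    # prediction or within 30 absolute lines, whichever is looser.
    return abs(actual - predicted) <= max(round(predicted * 0.30), 30)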
1 Plan — read all docs + skills, scan the codebase for gaps, write the promise file
2 Execute — fresh subagent per task: implement → check → commit, verify promise (3 retries)
3 Verify — compliance check, full docs vs code, clean up the promise file

Why line predictions? Requiring the AI to predict line counts forces thoughtful scoping. If it predicts 50 lines but writes 300, either the plan was sloppy or execution went sideways: at 50 predicted, the tolerance rule accepts roughly 20 to 80 lines, so 300 fails the gate outright.

Example

docs/change_promise.toml
[metadata]
base_commit = "a3f2c1b"

[[tasks]]
title = "Add auth handler"
goal = "Implement JWT auth"
success_criteria = "Tests pass"
files_to_create = ["src/auth/handler.py"]
files_to_modify = ["src/__init__.py"]
files_to_remove = []
expected_lines_added = 85
expected_lines_removed = 0
context_files = ["src/config.py"]
doc_sections = ["docs/DESIGN.md#auth"]
reference_skills = ["tech-pyjwt"]
dependencies = []
completed = false
attempts = 0

[[tasks]]
title = "Add auth tests"
goal = "Test auth flows"
success_criteria = "100% coverage"
files_to_create = ["tests/test_auth.py"]
files_to_modify = []
files_to_remove = []
expected_lines_added = 120
expected_lines_removed = 0
context_files = ["src/auth/handler.py"]
doc_sections = []
reference_skills = []
dependencies = [0]
completed = false
attempts = 0

Don't rely on the AI remembering.
Make the instructions part of the repository.

$ uv tool install "git+https://github.com/jackedney/prothon"