Built for the Daytona HackSprint · DevSecOps

The security agent that
proves the bug before it patches it.

An autonomous DevSecOps loop that scans your repo, reproduces every finding with a failing regression test, writes the fix, re‑proves it green inside an isolated Daytona sandbox — then opens the Pull Request. No weaponized exploits. Ever.

See the live PR → Watch the loop run

python run.py · single Daytona sandbox

$ python run.py
[1/5] Creating Daytona sandbox ... 6a178f27 (snapshot: fixit-scanners)
[2/5] Cloning angseesiang/bugbounty-demo (private)
[SCAN] 13 findings: 2 SAST, 11 SCA
        SAST  app.py:48  ERROR  sqlalchemy-execute-raw-query
[SELECT] SQL injection at app.py:48 → reproducing & fixing
        $ pytest (pre-patch)   RED: 2 failed, 1 passed
        $ apply patch          attempt 1: APPLIED:git
        $ pytest (post-patch)  GREEN: 3 passed, 0 failed
RESULT:   PASS — reproduced and fixed in one clean sandbox  ✓
timing: clone 3.4s · scan 14.4s · verify 3.1s · total 20.9s

13 findings surfaced: 2 SAST · 11 SCA

RED→GREEN, proven then fixed: 2 failed → 3 passed

20.9s full loop, cold→done, in one throwaway sandbox

~90ms sandbox start from a pre‑baked snapshot

How it works

Deterministic where it counts.
Reasoned where it matters.

Established scanners find the bugs. Claude Code writes the proof and the patch. The sandbox runs everything untrusted — the host only reasons.

Scan

Semgrep (SAST) and pip‑audit (SCA) run inside the sandbox and emit JSON findings — no freeform “find bugs,” no guessing.
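A minimal sketch of how the two JSON reports could be folded into one findings list. The field names follow the JSON that Semgrep and pip‑audit are generally understood to emit (`results`, `check_id`, `dependencies`, `vulns`), but the function names, the normalized shape, and the sample data are illustrative assumptions, not taken from the project.

```python
import json


def semgrep_findings(raw: str) -> list[dict]:
    """Flatten a Semgrep JSON report into one dict per finding."""
    report = json.loads(raw)
    return [
        {
            "kind": "SAST",
            "rule": result["check_id"],
            "location": f'{result["path"]}:{result["start"]["line"]}',
            "severity": result.get("extra", {}).get("severity", "UNKNOWN"),
        }
        for result in report.get("results", [])
    ]


def pip_audit_findings(raw: str) -> list[dict]:
    """Flatten a pip-audit JSON report into one dict per vulnerability."""
    report = json.loads(raw)
    findings = []
    for dep in report.get("dependencies", []):
        # Dependencies without known vulnerabilities contribute nothing.
        for vuln in dep.get("vulns", []):
            findings.append(
                {
                    "kind": "SCA",
                    "rule": vuln["id"],
                    "location": f'{dep["name"]}=={dep["version"]}',
                    "severity": "UNKNOWN",  # assumption: no severity in this report
                }
            )
    return findings


# Illustrative sample reports (made-up data, not real scanner output).
SEMGREP_SAMPLE = json.dumps(
    {
        "results": [
            {
                "check_id": "sqlalchemy-execute-raw-query",
                "path": "app.py",
                "start": {"line": 48},
                "extra": {"severity": "ERROR"},
            }
        ]
    }
)
PIP_AUDIT_SAMPLE = json.dumps(
    {
        "dependencies": [
            {"name": "examplepkg", "version": "1.0.0", "vulns": [{"id": "EXAMPLE-0001"}]},
            {"name": "cleanpkg", "version": "2.0.0", "vulns": []},
        ]
    }
)

findings = semgrep_findings(SEMGREP_SAMPLE) + pip_audit_findings(PIP_AUDIT_SAMPLE)
```

Because every finding ends up with the same four keys, a later selection step can rank SAST and SCA results together without caring which scanner produced them.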

Reproduce RED

The agent writes a failing regression test that demonstrates the insecure behavior. If it doesn’t fail first, it isn’t proven.

Patch

A minimal, reviewable diff — e.g. a parameterized query replacing string‑formatted SQL. Applied cleanly with git apply.
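A sketch of the kind of change such a diff might contain, again using `sqlite3` as a stand‑in for the app's real database layer. The function and table names are assumptions for illustration; the original query is shown only as a comment.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list[str]:
    # Before (vulnerable): f"SELECT name FROM users WHERE name = '{name}'"
    # After: the driver binds `name` as data, so quotes in it cannot change the SQL.
    rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

legit = find_user(conn, "alice")                 # normal lookup still works
injected = find_user(conn, "nobody' OR '1'='1")  # payload is treated as a literal name
```

The fix touches one statement and leaves the function's signature and return type alone, which is what keeps the diff small enough to review at a glance.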

Verify GREEN

Re‑run the exact same test. Red → green is the money shot: the bug is gone, and nothing else broke.
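One way to turn the two pytest runs into a verdict is to compare their exit codes. pytest exits with 0 when all tests pass and 1 when tests ran and some failed; other codes signal interruptions, usage errors, or no tests collected. The function name and verdict labels below are illustrative assumptions.

```python
def loop_verdict(pre_patch_exit: int, post_patch_exit: int) -> str:
    """Classify a reproduce/verify pair of pytest exit codes."""
    if pre_patch_exit == 0:
        # The test passed before the patch, so it never demonstrated the bug.
        return "NOT PROVEN"
    if pre_patch_exit != 1:
        # Exit codes above 1 mean pytest itself had a problem, not a failing test.
        return "ERROR"
    if post_patch_exit == 0:
        # RED before, GREEN after: reproduced and fixed.
        return "PASS"
    return "FAIL"
```

For example, `loop_verdict(1, 0)` returns `"PASS"`, while `loop_verdict(0, 0)` returns `"NOT PROVEN"` because a test that was already green proves nothing about the patch.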

Isolate

Every run gets a fresh Daytona sandbox, created from a snapshot and deleted in a finally. Nothing untrusted touches the host.
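The create‑then‑always‑delete pattern can be sketched as a context manager. The `client.create(...)` and `client.delete(...)` calls below are a hypothetical interface, not the Daytona SDK's actual signatures, and `FakeClient` exists only so the sketch runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def throwaway_sandbox(client, snapshot: str):
    """Yield a fresh sandbox and delete it even if the body raises."""
    sandbox = client.create(snapshot)  # hypothetical call
    try:
        yield sandbox
    finally:
        client.delete(sandbox)  # hypothetical call; runs on success and on error


class FakeClient:
    """In-memory stand-in that tracks which sandboxes are still alive."""

    def __init__(self):
        self.live = set()

    def create(self, snapshot: str) -> str:
        sandbox = f"sandbox-from-{snapshot}"
        self.live.add(sandbox)
        return sandbox

    def delete(self, sandbox: str) -> None:
        self.live.discard(sandbox)
```

Used as `with throwaway_sandbox(client, "fixit-scanners") as sb: ...`, a crash in the middle of a scan still leaves no sandbox behind.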

Open PR

A fine‑grained PAT scoped to one repo opens the Pull Request with the fix, the proof, and the run artifacts attached.
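For orientation, this is roughly what a pull‑request call to the GitHub REST API looks like (`POST /repos/{owner}/{repo}/pulls`). The sketch only builds the request and does not send it; the helper name is an assumption, and the branch names and token in the usage note are placeholders.

```python
import json
import urllib.request


def build_pr_request(owner: str, repo: str, token: str,
                     head: str, base: str, title: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a 'create pull request' call for the GitHub REST API."""
    payload = {"title": title, "head": head, "base": base, "body": body}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/pulls",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # the fine-grained PAT goes here
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
```

A call such as `build_pr_request("angseesiang", "bugbounty-demo", token, head="fix-branch", base="main", title=..., body=...)` yields a request that `urllib.request.urlopen` could send. Keeping the token on the host side matches the principle that no credentials enter the sandbox.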

The core loop

scan → prove → patch → prove again → PR

scan: Semgrep + pip‑audit
→ reproduce: failing test (RED)
→ patch: minimal diff
→ verify: test passes (GREEN)
→ PR: opened on GitHub

One command

The whole demo is a single command.

From the project root, one invocation drives scan → select → reproduce → patch → verify → report in a single sandbox, then tears it down.

$ source .venv/bin/activate

$ python run.py

Claude Code on Max is the agent — no Anthropic API key in the loop.
The sandbox runs all untrusted code; the host only reasons.
Demo runs against our own seeded repo — never a stranger’s.

Open the resulting PR →

demo_transcript.txt

======== Fix-It Security Agent — one-command loop ========
[1/5] Creating Daytona sandbox ...
      sandbox id: 6a178f27 (snapshot: fixit-scanners)
[2/5] Cloning angseesiang/bugbounty-demo (private)
      cloned: README.md, app.py, requirements.txt

----- SCAN -----
[4/5] Semgrep (SAST) ...  2 findings
[4/5] pip-audit (SCA) ...  14 vuln records / 6 deps
[SCAN] 13 findings: 2 SAST, 11 SCA
      SAST app.py:48 ERROR sqlalchemy-execute-raw-query
[SELECT] SQL injection at app.py:48 → fixing

----- REPRODUCE → PATCH → VERIFY -----
      $ pytest (pre-patch)   RED: 2 failed, 1 passed
      $ apply patch          APPLIED:git
      $ pytest (post-patch)  GREEN: 3 passed, 0 failed

========= FULL LOOP (single sandbox) =========
 target:   angseesiang/bugbounty-demo
 SCAN:     13 findings (2 SAST, 11 SCA)
 SELECT:   SQL injection  app.py:48  (CWE-89)
 VERIFY:   RED 2 failed → APPLIED:git → GREEN 3 passed ✓
 RESULT:   PASS — reproduced and fixed in one clean sandbox
 timing:   clone 3.4s · scan 14.4s · verify 3.1s · total 20.9s
 PR:       github.com/.../bugbounty-demo/pull/1

==> Deleting sandbox ...  sandbox deleted.

The stack

Boring, deterministic tools.
A careful agent on top.

Agent + orchestrator

Claude Code on Max, driving Python. No API key.

Sandbox

Daytona Python SDK. Pre‑baked snapshot, ~90ms start.

SAST

Semgrep, --config auto, JSON output.

SCA

pip‑audit against pinned dependencies.

Proof

pytest regression tests — RED then GREEN.

VCS

GitHub REST / gh, fine‑grained single‑repo PAT.
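The scanner entries above map onto command lines along these lines. This is a sketch: the helper names and default paths are assumptions, and how the project actually invokes the tools inside the sandbox is not shown on this page.

```python
def semgrep_cmd(target: str = ".") -> list[str]:
    """Argument list for a Semgrep scan with auto-selected rules and JSON output."""
    return ["semgrep", "scan", "--config", "auto", "--json", target]


def pip_audit_cmd(requirements: str = "requirements.txt") -> list[str]:
    """Argument list for auditing a pinned requirements file with JSON output."""
    return ["pip-audit", "-r", requirements, "--format", "json"]
```

Passing argument lists rather than shell strings avoids quoting problems when paths come from a cloned repo. pip‑audit is expected to exit non‑zero when it finds vulnerabilities, so a caller would likely read its output instead of treating that exit code as a crash.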

Design principles

No weaponized exploits. Proof means a failing regression test — safer and more convincing than a live payload.

Sandbox runs tools; the host reasons. No credentials injected into the sandbox, ever.

Own repos only. Auto‑PRs onto projects we don’t own are out of scope.

Deterministic first. Established scanners over “find bugs” freeform.

Find it. Prove it. Fix it.
Ship the PR.

The complete loop — scan, reproduce, patch, verify, open PR — running in one isolated sandbox, in about twenty seconds.

View the live PR → Browse the code
