Cleanly: Production-grade backend skills for AI agents

The Problem

AI-generated backend code fails in predictable ways. Cleanly is designed to detect and prevent each of the failure modes below.

⚠

Happy-Path-Only

No error handling on external calls. No timeouts. No retry logic. No validation. No thought about what happens when things fail.
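As an illustration of what leaving the happy path looks like, here is a minimal sketch of a timeout wrapper and a retry loop with exponential backoff. The helper names withTimeout and withRetry are hypothetical and not part of Cleanly. The sketch assumes the wrapped call is safe to repeat.

```typescript
// Hypothetical helpers for illustration; not part of Cleanly.

/** Rejects if `work` has not settled within `ms` milliseconds. */
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

/** Calls `fn` up to `attempts` times, doubling the delay after each failure. */
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts: number,
  baseDelayMs: number,
): Promise<T> {
  let lastError: unknown = new Error("attempts must be at least 1");
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Back off before the next try: base, 2x base, 4x base, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * Math.pow(2, i)));
      }
    }
  }
  throw lastError;
}
```

Note that the timeout only stops waiting; it does not cancel the underlying work. Retrying is only safe when the operation is idempotent.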

🛡

Security Negligence

Secrets in code. String concatenation for SQL. No auth checks. No rate limiting. Rolling custom crypto instead of using libraries.
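To make the SQL point concrete, the sketch below contrasts string concatenation with a parameterized query. It builds the text-plus-values shape that drivers such as node-postgres accept. The users table and both function names are assumptions made for the example.

```typescript
// SQL text plus a separate array of bound values.
interface ParameterizedQuery {
  text: string;
  values: unknown[];
}

// Unsafe: user input becomes part of the SQL text itself.
function unsafeSearch(term: string): string {
  return "SELECT id, name FROM users WHERE name LIKE '%" + term + "%'";
}

// Safer: the SQL text is constant and the input travels as a bound value.
// `$1` is the PostgreSQL placeholder style; the `users` table is hypothetical.
function safeSearch(term: string): ParameterizedQuery {
  return {
    text: "SELECT id, name FROM users WHERE name LIKE $1",
    values: [`%${term}%`],
  };
}
```

Binding prevents injection, but it does not escape LIKE wildcards such as % and _ inside the term. Handle those separately if they matter.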

⚙

Over-Abstraction

AbstractServiceManagerProviderFactory. Generic CRUD wrappers. Repository pattern wrapping an ORM that already IS a repository.

📈

Data Access Sins

N+1 queries inside loops. SELECT * everywhere. No pagination. Missing indexes. ORM .save() in a loop instead of bulk operations.
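The N+1 pattern is easiest to see by counting round trips. The sketch below uses a hypothetical in-memory FakeDb (not a real driver or ORM) that increments a counter on every lookup, then compares a per-row loop with a single batched fetch.

```typescript
interface Post { id: number; authorId: number; }
interface User { id: number; name: string; }

// Hypothetical in-memory stand-in for a database; it only counts round trips.
class FakeDb {
  queries = 0;
  private users: User[];

  constructor(users: User[]) {
    this.users = users;
  }

  findUserById(id: number): User | undefined {
    this.queries++;
    return this.users.find((u) => u.id === id);
  }

  findUsersByIds(ids: number[]): User[] {
    this.queries++;
    return this.users.filter((u) => ids.indexOf(u.id) !== -1);
  }
}

// N+1: one lookup per post.
function authorsNPlusOne(db: FakeDb, posts: Post[]): string[] {
  return posts.map((p) => db.findUserById(p.authorId)?.name ?? "unknown");
}

// Batched: one lookup for all distinct author IDs, then an in-memory join.
function authorsBatched(db: FakeDb, posts: Post[]): string[] {
  const ids = Array.from(new Set(posts.map((p) => p.authorId)));
  const byId = new Map(
    db.findUsersByIds(ids).map((u): [number, string] => [u.id, u.name]),
  );
  return posts.map((p) => byId.get(p.authorId) ?? "unknown");
}
```

With a real ORM the batched version usually maps to an IN (...) query or the ORM's eager-loading option.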

👁

Observability Theater

console.log as the logging strategy. No structured logging. No correlation IDs. No metrics. Logging passwords and PII.
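A rough sketch of the alternative: one JSON object per log line, a correlation ID on every entry, and sensitive fields redacted before they are written. The function names and the list of sensitive keys are assumptions for the example; a real list depends on your data.

```typescript
// Hypothetical list of keys to mask; adjust to the data your service handles.
const SENSITIVE_KEYS = ["password", "token", "ssn", "email"];

type LogFields = Record<string, unknown>;

// Masks top-level sensitive keys only; nested objects are not inspected.
function redact(fields: LogFields): LogFields {
  const out: LogFields = {};
  for (const key of Object.keys(fields)) {
    const sensitive = SENSITIVE_KEYS.indexOf(key.toLowerCase()) !== -1;
    out[key] = sensitive ? "[REDACTED]" : fields[key];
  }
  return out;
}

// Builds one JSON log line. Context lives under its own key so that
// caller-supplied fields cannot overwrite level, message, or correlationId.
function formatLog(
  level: "info" | "warn" | "error",
  message: string,
  correlationId: string,
  fields: LogFields = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    correlationId,
    context: redact(fields),
  });
}
```

The correlation ID would typically be generated or read from a request header at the edge and passed along to every downstream call.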

🚧

Structural Smells

God endpoints doing 15 things. Business logic in route handlers. Validation, logic, persistence, and HTTP mixed in one function.

The Skills

14 skills organized by concern. Each one tackles a specific dimension of backend quality.

Write

/backend-design

Write production-grade code with error handling, security, performance, and observability baked in.

Example

"Build a REST API for user management"

Agent adds input validation, auth middleware, proper status codes, structured logging, and error boundaries to every endpoint.

Harden

/secure-backend

OWASP Top 10, dependency audit, secrets scanning, auth hardening, injection prevention.

Example

"Review auth for vulnerabilities"

Finds SQL injection vectors in search, missing rate limits on login, and a hardcoded API key in config.

/harden-backend

Input validation, auth checks, rate limiting, error boundaries, defensive patterns.

Example

"Harden the payments endpoint"

Adds Zod schema validation, 30s timeout on Stripe calls, idempotency keys for retries, rate limit of 5 req/min.
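For a sense of what a limit of 5 req/min involves, here is a minimal fixed-window rate limiter. It is a sketch under stated assumptions, not what the skill necessarily generates: the class name is hypothetical, state lives in process memory, and the clock is passed in so the behavior is easy to test.

```typescript
// Hypothetical fixed-window limiter; state is per process and never evicted.
class FixedWindowRateLimiter {
  private windows = new Map<string, { start: number; count: number }>();
  private limit: number;
  private windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  /** Returns true if the request is allowed, false if it should be rejected (HTTP 429). */
  allow(key: string, nowMs: number): boolean {
    let current = this.windows.get(key);
    // Start a fresh window on first sight of the key or once the old one has elapsed.
    if (!current || nowMs - current.start >= this.windowMs) {
      current = { start: nowMs, count: 0 };
      this.windows.set(key, current);
    }
    if (current.count >= this.limit) return false;
    current.count++;
    return true;
  }
}
```

Usage would look like new FixedWindowRateLimiter(5, 60_000) keyed by client IP or API key. Behind multiple instances the counters need a shared store, since in-memory state is per process.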

Optimize

/optimize-backend

Detect N+1 queries, improve caching, tune connection pools, reduce latency.

Example

"Why is the dashboard slow?"

Finds N+1 in user list (47 queries → 2), adds Redis cache for org data, suggests composite index on (org_id, created_at).

/scale-backend

Identify scalability bottlenecks and assess horizontal scaling readiness.

Example

"Can this handle 10x traffic?"

Flags in-memory sessions blocking horizontal scaling, single-writer DB bottleneck, and missing queue for email sends.

Observe

/observe-backend

Structured logging, metrics, distributed tracing, health checks, alerting rules.

Example

"Add observability to the API"

Replaces console.log with structured JSON, adds request correlation IDs, Prometheus histograms for latency, and /health endpoint.

/audit-backend

Read-only audit across security, performance, reliability, and observability.

Example

"Audit the codebase"

Produces a severity-rated report: 2 critical (SQL injection, no auth), 5 high (N+1, no timeouts), 8 medium. No code changes.

Quality

/test-backend

Identify test gaps, design strategies, generate scaffolding for unit and integration tests.

Example

"What's not tested?"

Maps coverage gaps, finds 0 tests for payment webhooks, generates integration test stubs with fixtures and assertions.

/polish-backend

Consistent naming, remove dead code, align patterns, improve readability.

Example

"Clean up the services directory"

Standardizes error format across 12 services, removes 3 dead exports, renames getUserData/fetchUser/loadUser to getUser.

/extract-backend

Extract shared services, middleware, utilities from duplicated code.

Example

"Reduce duplication in routes"

Extracts shared auth middleware used in 8 files, consolidates 4 duplicate validation helpers into one module.

Operate

/migrate-backend

Review migrations for zero-downtime safety, reversibility, data preservation.

Example

"Review this migration"

Flags an irreversible column drop, suggests adding the new column first, then backfilling, then renaming, and checks that index creation won't lock the table.

Understand

/explain-backend

Mermaid diagrams, annotated code, plain-language walkthroughs of architecture and flows.

Example

"Explain the order flow"

Generates sequence diagram from HTTP request through auth → validation → service → DB, annotates each middleware step.

/document-backend

Generate API docs, OpenAPI specs, README files, inline documentation.

Example

"Generate API docs"

Produces OpenAPI 3.1 spec with request/response examples, error codes, and auth requirements for all 23 endpoints.

/teach-backend

One-time setup to gather your stack, conventions, and infrastructure context.

Example

"Set up backend context"

Scans project, detects Express + Prisma + PostgreSQL, asks about deploy targets, writes Backend Context to CLAUDE.md.

Core Principles

Fail Explicitly, Recover Gracefully

Errors are data, not exceptions to hide. Return meaningful responses. Use circuit breakers. Serve stale cache when upstream fails.
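One way to read this principle in code: failures come back as values, a circuit breaker stops calling an upstream that keeps failing, and the caller falls back to the last known data. The sketch below is synchronous for brevity (a real breaker would wrap async calls), and all names in it are hypothetical.

```typescript
// Errors as data: a call either yields a value or a described failure.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Hypothetical breaker: opens after `threshold` consecutive failures and
// allows a trial call once `cooldownMs` has passed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private threshold: number;
  private cooldownMs: number;

  constructor(threshold: number, cooldownMs: number) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  call<T>(fn: () => T, nowMs: number): Result<T> {
    // While open and still cooling down, fail fast without touching upstream.
    if (this.openedAt !== null && nowMs - this.openedAt < this.cooldownMs) {
      return { ok: false, error: "circuit open" };
    }
    try {
      const value = fn();
      this.failures = 0;
      this.openedAt = null;
      return { ok: true, value };
    } catch (err) {
      this.failures++;
      if (this.failures >= this.threshold) this.openedAt = nowMs;
      return { ok: false, error: err instanceof Error ? err.message : String(err) };
    }
  }
}

// Serve the last known value, marked as stale, when the call failed.
function withStaleFallback<T>(result: Result<T>, lastKnown: T): { data: T; stale: boolean } {
  return result.ok
    ? { data: result.value, stale: false }
    : { data: lastKnown, stale: true };
}
```

Marking the response as stale lets the caller decide whether degraded data is acceptable for that request.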

Validate at the Boundary, Trust Internally

Validate ALL external input at the entry point. Internal function calls between trusted modules should not re-validate.
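A small sketch of the idea: untyped input is parsed once at the edge into a typed value, and internal functions accept only that type. The validation is hand-rolled here to keep the example dependency-free; a schema library such as Zod fills the same role. The CreateUserInput shape and function names are assumptions for the example.

```typescript
// Hypothetical request shape for illustration.
interface CreateUserInput {
  email: string;
  age: number;
}

type Parsed<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Boundary: accepts anything, returns either a typed value or a list of problems.
function parseCreateUser(body: unknown): Parsed<CreateUserInput> {
  if (typeof body !== "object" || body === null) {
    return { ok: false, errors: ["body must be an object"] };
  }
  const { email, age } = body as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof email !== "string" || !/^[^@\s]+@[^@\s]+$/.test(email)) {
    errors.push("email is invalid");
  }
  if (typeof age !== "number" || !Number.isInteger(age) || age < 0) {
    errors.push("age must be a non-negative integer");
  }
  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: { email: email as string, age: age as number } };
}

// Internal: takes the typed value and does not re-check it.
function greeting(user: CreateUserInput): string {
  return `Welcome, ${user.email} (${user.age})`;
}
```

Because greeting only accepts CreateUserInput, the type system pushes callers through the parser first.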

Design for Observability from Day One

Structured logging. Correlation IDs. Business metrics. Distributed tracing. If you can't observe it, you can't debug it.

Concurrency Is Not Optional

Race conditions happen in production. Use optimistic locking. Implement idempotency keys. Use atomic operations.
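Optimistic locking can be sketched with a version number and a compare-and-swap write. The in-memory AccountStore below is hypothetical; against a SQL database the same check is usually an UPDATE with the expected version in the WHERE clause.

```typescript
interface Account {
  id: string;
  balance: number;
  version: number;
}

// Hypothetical in-memory store standing in for a table with a version column.
class AccountStore {
  private rows = new Map<string, Account>();

  insert(account: Account): void {
    this.rows.set(account.id, { ...account });
  }

  // Returns a copy so callers cannot mutate stored state directly.
  read(id: string): Account | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  /** Writes only if the stored version still matches what the caller read. */
  update(id: string, expectedVersion: number, balance: number): boolean {
    const row = this.rows.get(id);
    if (!row || row.version !== expectedVersion) return false;
    this.rows.set(id, { id, balance, version: expectedVersion + 1 });
    return true;
  }
}
```

A caller whose update returns false re-reads the row and retries, or reports the conflict, instead of silently overwriting the other writer's change.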

Keep It Boring

Prefer battle-tested patterns over clever abstractions. A straightforward if/else beats a monad chain. Optimize for the reader.

Optimize for the Read Path

Most systems are read-heavy. Cache aggressively. Denormalize strategically. But always measure before optimizing.
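A minimal cache-aside sketch with a time-to-live, plus hit and miss counters so the effect can be measured rather than assumed. The TtlCache class is hypothetical and in-memory; the same read-through shape applies to a shared cache.

```typescript
// Hypothetical in-memory cache-aside helper with a fixed TTL.
class TtlCache<V> {
  hits = 0;
  misses = 0;
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  /** Returns the cached value if still fresh, otherwise loads and stores it. */
  getOrLoad(key: string, load: () => V, nowMs: number): V {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAt > nowMs) {
      this.hits++;
      return entry.value;
    }
    this.misses++;
    const value = load();
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
    return value;
  }
}
```

The hit ratio, hits / (hits + misses), is the number to watch before and after adding a cache. A low ratio suggests the TTL or the cache key is wrong.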

Install

One command. Every provider.

All tools

$ npx skills add S4M3R/cleanly

Auto-detects your AI harness and installs to the right location

Claude Code

$ /install-skills S4M3R/cleanly

Install directly from Claude Code

Works with

Cursor

Claude Code

Gemini CLI

Codex CLI

Copilot

FAQ

What languages/frameworks does this work with?

Cleanly is language- and framework-agnostic. The skills teach patterns and principles that apply to Node.js, Python, Go, Rust, Java, and other backend stacks. The agent adapts the guidance to whatever you're building with.

Will this slow down my AI agent?

No. Skills use progressive disclosure: only ~100 tokens of metadata load at startup. The full skill content loads only when the agent decides it's relevant to your task, and reference docs load on demand.

Do I need all 14 skills?

You don't have to choose. Install the full set and skills activate automatically based on what you're doing: if you're writing an endpoint, backend-design activates; if you're reviewing security, secure-backend kicks in. You don't manage them manually.

How is this different from just prompting "write production code"?

Prompts are vague and inconsistent. Skills provide structured, comprehensive checklists that the agent follows every time. It's the difference between "be careful" and a 50-point safety inspection.