How treating an AI assistant as a senior pair programmer turned a months-old performance backlog into two fully optimized APIs — with interactive architecture diagrams and every prompt that shaped the work.
A production REST API was responding in 10–12 seconds. The endpoint powered a high-traffic discovery flow on mobile and web — every second of latency directly impacted conversion. The “fix it” backlog had been growing for months because nobody had time to do a deep performance audit.
I decided to try something different: use an AI assistant as a pair programmer for the entire engagement — analysis, planning, implementation, and debugging. This is what happened.
Instead of treating AI as a “complete this snippet” tool, I treated it as a senior engineer who walked into the codebase for the first time. The workflow had six stages, each driven by a specific kind of prompt.
By asking “how would you build it?” instead of “fix this,” I forced the AI into architectural thinking. It traced the full request lifecycle, mapped every database query, identified every external dependency, and produced a structured report.
The planning prompt asked for an in-place optimization plan to get the endpoint under 500ms. “In-place” prevented the AI from suggesting a rewrite. “Under 500ms” gave it a measurable target. “Plan” forced it to break the work into shippable phases instead of producing one giant patch.
| Phase | Change | Expected Impact |
|---|---|---|
| 0 | Per-stage timing instrumentation | Measure before cutting |
| 1 | Database indexes on hot query paths | 3–5s reduction |
| 2 | Eliminate duplicate queries and lazy loads | 1–2s reduction |
| 3 | Push filters into SQL + database-level pagination | 1–2s reduction |
| 4 | Response cache on the deterministic endpoint | ~10s → ~50ms on cache hit |
| 5 | Configuration memoization + loop optimizations | 100–300ms reduction |
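Phase 0 is worth making concrete. Here's a minimal Python sketch of per-stage timing; the stage names and handler are illustrative, not the production code:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_stage(timings, stage):
    # Record wall-clock time for one stage of the request, in milliseconds.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = round((time.perf_counter() - start) * 1000, 1)

def handle_request():
    timings = {}
    with timed_stage(timings, "db_queries"):
        time.sleep(0.05)        # stand-in for the real query work
    with timed_stage(timings, "serialization"):
        time.sleep(0.01)        # stand-in for response building
    return timings

print(handle_request())  # e.g. {'db_queries': 50.4, 'serialization': 10.3}
```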
Interactive C4-style diagrams show the system from users down to individual queries. Click a tab and watch the data flow.
Discovery endpoint with 13 filter params — city, capacity, add-ons, pagination, distance sort. After optimization: 8 queries typical, 2–3 on holiday dates.
Authenticated detail endpoint. Original had 30–40 queries across 5 database sessions. Optimized version: ~5 queries in a single session.
The original code re-ran an identical filtered SELECT just to change the result ordering. Replaced with an in-memory sort. Saved one database round-trip per request.
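A minimal sketch of that pattern, with a hypothetical Venue model standing in for the real schema:

```python
from dataclasses import dataclass

@dataclass
class Venue:
    name: str
    price: int
    distance_km: float

def fetch_filtered_rows():
    # Stand-in for the single filtered SELECT that remains.
    return [Venue("A", 300, 2.5), Venue("B", 150, 8.0), Venue("C", 220, 1.2)]

rows = fetch_filtered_rows()                    # one round-trip, not two
by_price = sorted(rows, key=lambda v: v.price)  # re-order in memory
by_distance = sorted(rows, key=lambda v: v.distance_km)
```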
Capacity ranges, category type, add-on filters — all happening in application code after fetching every row. Pushing them into the WHERE clause cut the result set by an order of magnitude.
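In SQLAlchemy terms (assuming that's the ORM in play; the article doesn't name it), the pushdown might look like this, with hypothetical model and column names:

```python
from sqlalchemy import Integer, String, create_engine, select
from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

class Base(DeclarativeBase):
    pass

class Venue(Base):
    __tablename__ = "venues"
    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    city: Mapped[str] = mapped_column(String)
    capacity: Mapped[int] = mapped_column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add_all([Venue(city="Pune", capacity=50),
                     Venue(city="Pune", capacity=500)])
    session.commit()

    stmt = (
        select(Venue)
        .where(Venue.city == "Pune", Venue.capacity >= 100)  # filters in SQL...
        .limit(10)   # ...and pagination in SQL, not in Python
        .offset(0)
    )
    print([v.capacity for v in session.scalars(stmt)])  # [500]
```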
The endpoint executed five separate batch queries and then checked if the date was a public holiday. On holidays, all that work was thrown away. A pre-check skipped the batch queries entirely when applicable.
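A sketch of the pre-check, with a stand-in holiday set and batch function in place of the real ones:

```python
import datetime

PUBLIC_HOLIDAYS = {datetime.date(2024, 12, 25)}   # stand-in holiday table

def run_batch_queries(day):
    return [f"slot for {day}"]   # stand-in for the five batch queries

def availability_for(day):
    # Cheap check first: on a holiday the batch results were always
    # discarded anyway, so skip the five queries entirely.
    if day in PUBLIC_HOLIDAYS:
        return []
    return run_batch_queries(day)

print(availability_for(datetime.date(2024, 12, 25)))  # []
```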
Configuration data was being expanded on every single request. A 5-minute in-process cache eliminated that work.
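A minimal version of that cache; load_and_expand_config is a hypothetical stand-in for the real expansion work:

```python
import time

_CACHE = {}          # key -> (inserted_at, value)
TTL_SECONDS = 300    # the 5-minute window

def cached_config(key="default"):
    now = time.monotonic()
    hit = _CACHE.get(key)
    if hit is not None and now - hit[0] < TTL_SECONDS:
        return hit[1]                    # fresh enough: skip the expansion
    value = load_and_expand_config()     # the formerly per-request work
    _CACHE[key] = (now, value)
    return value

def load_and_expand_config():
    return {"expanded": True}            # stand-in for the real expansion
```

A library TTL cache would do the same job, but a hand-rolled dict also respects the zero-new-imports constraint the deployment failures later established.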
A nested loop with cubic complexity was rewritten using set membership checks. The complexity dropped to roughly linear in the input size.
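The original loop isn't shown, so this sketch only illustrates the technique: build sets once, and every membership check becomes constant time:

```python
items = ["a", "b", "c", "d"] * 125      # ~500 items, like the endpoint
allowed = ["b", "c"] * 50
tags = ["c", "d"] * 50

# Before (illustrative): every `in` on a list is a linear scan, so the
# work multiplies across the nested checks.
slow = [item for item in items if item in allowed and item in tags]

# After: build each set once; every membership check is then O(1),
# so the whole pass is roughly linear in the input size.
allowed_set, tag_set = set(allowed), set(tags)
fast = [item for item in items if item in allowed_set and item in tag_set]

assert slow == fast
```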
The same data structure was being serialized to logs three times per item — and the endpoint processed up to 500 items per request. That meant 1,500+ log writes per request, each performing serialization on a large object. Cut to one log per item.
A separate database lookup was being made for data that was already present in a previous query's result set. Eliminated the round-trip.
When AI code breaks production — and how each failure taught the AI something it couldn't learn from reading code alone.
This was the most instructive part of the entire engagement. The AI produced correct, well-reasoned code that passed all syntax checks — and it still broke the application three times.
The AI applied all optimizations at once. The container failed to boot. Standard library imports it had added were incompatible with the production runtime's cooperative scheduling model.
The AI couldn't diagnose it — the real error was hidden above the pasted logs. Then came the critical prompt: "After reverting your changes, the build went through."
This single sentence told the AI: (a) the issue is definitely in its code, not the environment, and (b) it needs to isolate which change caused the crash.
The AI removed the suspect import but kept other related ones. Before testing, upstream changes were merged in — so the new attempt was never tested in isolation.
The AI reverted everything to the known-good baseline. Then reapplied only four small, surgical logic fixes — zero new imports, zero new module-level code. Build succeeded on the first try.
This established the core constraint: in this runtime environment, any new module-level code is suspect until proven safe in deployment.
With the constraint now understood, the AI rewrote all the performance optimizations to use zero new import lines — using stdlib modules already imported, skipping instrumentation that would have required new dependencies.
The prompt behind that rewrite was "As this API is taking 10 to 12 seconds, I'm more interested in fixing performance", which reframed the AI from “be cautious” to “be fast, within the constraints you've now learned.”
After the performance optimizations deployed, the API returned empty results for certain query parameter combinations. The original application-side filter treated a value of 0 as “no filter applied” because of language truthiness rules. The new database-side filter treated 0 as “filter for the literal value zero,” which matched nothing.
Providing the exact failing input let the AI trace both code paths and find the semantic difference in one shot. One-line fix.
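Reduced to a sketch (the capacity parameter and row shape are hypothetical, but the semantics match the bug described above):

```python
rows = [{"capacity": 50}, {"capacity": 100}]

def app_side_filter(rows, capacity):
    # Original behavior: `if capacity:` is False for both 0 and None,
    # so capacity=0 silently meant "no filter".
    if capacity:
        return [r for r in rows if r["capacity"] == capacity]
    return rows

def db_side_filter_buggy(rows, capacity):
    # Naive pushdown: always applies the predicate, so capacity=0
    # becomes WHERE capacity = 0 and matches nothing.
    return [r for r in rows if r["capacity"] == capacity]

def db_side_filter_fixed(rows, capacity):
    if not capacity:    # one-line guard restoring the truthiness convention
        return rows
    return [r for r in rows if r["capacity"] == capacity]

assert app_side_filter(rows, 0) == rows
assert db_side_filter_buggy(rows, 0) == []       # the production regression
assert db_side_filter_fixed(rows, 0) == rows     # parity restored
```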
After the first API, the same methodology was applied to a second endpoint — simpler parameters but far worse internals.
The AI found 30–40 queries per request for just 10 items: three N+1 loops, five separate database sessions, and a function called N times with identical parameters.
Per-item queries replaced with batch queries. One batch function already existed in the codebase — it just wasn't wired up. Still 5 seconds.
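The batching pattern, sketched with hypothetical names in place of the codebase's actual helper:

```python
def attach_addons(items, fetch_addons_batch):
    # One WHERE item_id IN (...) query instead of one query per item.
    ids = [item["id"] for item in items]
    addons_by_item = fetch_addons_batch(ids)
    for item in items:
        item["addons"] = addons_by_item.get(item["id"], [])
    return items

def fake_batch(ids):
    # Stand-in for the batch helper that already existed in the codebase.
    return {i: [f"addon-{i}"] for i in ids}

items = [{"id": 1}, {"id": 2}]
print(attach_addons(items, fake_batch))
# [{'id': 1, 'addons': ['addon-1']}, {'id': 2, 'addons': ['addon-2']}]
```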
Three helper functions each opened their own database connection. Under a pool of 5, concurrent requests queued for connections. Inlined everything into one session. Down to ~2 seconds.
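A sketch of the consolidation, assuming a SQLAlchemy-style engine; the helpers and queries are illustrative:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")

# Before, each helper opened its own connection, so one request could
# hold several of the pool's five connections at once.
def load_owner(conn, item_id):
    return conn.execute(text("SELECT 'owner-' || :i"), {"i": item_id}).scalar()

def load_pricing(conn, item_id):
    return conn.execute(text("SELECT 'price-' || :i"), {"i": item_id}).scalar()

def handle_detail_request(item_id):
    # After: one connection per request, threaded through every helper.
    with engine.connect() as conn:
        owner = load_owner(conn, item_id)
        pricing = load_pricing(conn, item_id)
    return {"owner": owner, "pricing": pricing}

print(handle_detail_request(7))  # {'owner': 'owner-7', 'pricing': 'price-7'}
```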
Static lookup instead of a query. Pre-filtered config. Released connection before response building. Down to ~1 second.
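The static-lookup half of that phase, sketched with a hypothetical CATEGORY_LABELS constant; the comment marks where the connection can be handed back:

```python
# Before: SELECT id, label FROM categories on every request.
# After: a module-level constant, loaded once at import time.
CATEGORY_LABELS = {1: "banquet", 2: "lawn", 3: "rooftop"}

def fetch_rows(item_ids):
    return [{"id": i, "category_id": 1} for i in item_ids]  # stand-in query

def handle_request(item_ids):
    rows = fetch_rows(item_ids)   # all database work finishes here, so the
                                  # connection can be released before the
                                  # CPU-bound response building below
    return [
        {"id": r["id"], "category": CATEGORY_LABELS[r["category_id"]]}
        for r in rows
    ]

print(handle_request([10, 11]))
```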
| Metric | Before | After |
|---|---|---|
| DB sessions per request | 5 | 1 |
| Total queries (10 items) | 30–40 | ~5 |
| N+1 query loops | 3 | 0 |
| Implicit lazy loads | 2–4 per parent | 0 |
| Prompt | What It Taught the AI |
|---|---|
| "After reverting your changes, the build went through" | The problem is in the code, not the environment. Isolate. |
| "Still getting the same error, can you revert all your changes" | Stop guessing. Go back to known-good. Re-approach incrementally. |
| "[exact failing input]" | Concrete reproduction lets the AI diagnose in one shot instead of guessing. |
| "As this API is taking 10 to 12 seconds, I'm more interested in fixing performance" | Reframe from “be careful” to “be fast, within the constraints you've now learned.” |
| "Now re-analyze with the pulled code" | After upstream changes, prior analysis is no longer valid. Start fresh. |
| "Can you cross-check if the response structure is the same and not changed?" | Every optimization is suspect until the contract is verified field by field. |
| What Went Wrong | What to Do Instead |
|---|---|
| Applied all optimizations at once | Ship one change at a time, verify each |
| Added new dependencies without testing in deployment | Test the minimal change in the actual deployment environment first |
| Assumed edge values would behave the same in both filter implementations | Check how the original code interprets every edge value |
| Got distracted by downstream symptom logs instead of finding the root error | Always grab the full error log before guessing |
The most valuable output wasn't the code. It was the systematic analysis that mapped twenty bottlenecks across multiple files in a single pass.
The AI had to learn things it couldn't learn from reading code alone: which stdlib primitives are unsafe in the production runtime, that web framework default values don't protect against empty strings, that certain numeric edge values are treated as “no filter” by convention.
The big-bang approach failed. The surgical approach (small fixes with zero new imports) worked on first deploy. When the AI has earned a constraint the hard way, every subsequent change should respect it.
Every optimization was audited against the original response shape. Performance work that breaks the API isn't a win — it's a regression with extra steps.
Same methodology, second API: under an hour instead of a full day. The real value of AI-assisted engineering compounds as patterns get reused.
All the application-level optimizations together saved a few seconds. The remaining latency budget lives in database indexes and response caching. Sometimes the smartest engineering decision is recognizing where the real bottleneck lives.
Three things, mostly: