ENGINEERING CASE STUDY

From 10 Seconds to Sub-1-Second

How treating an AI assistant as a senior pair programmer turned a months-old performance backlog into two fully optimized APIs — with architecture diagrams and every prompt that shaped the work.

Before: 10–12s · After: <1s · APIs optimized: 2 · Queries (2nd API): 40 → 5

The Problem

A production REST API was responding in 10–12 seconds. The endpoint powered a high-traffic discovery flow on mobile and web — every second of latency directly impacted conversion. The “fix it” backlog had been growing for months because nobody had time to do a deep performance audit.

I decided to try something different: use an AI assistant as a pair programmer for the entire engagement — analysis, planning, implementation, and debugging. This is what happened.

The Approach: AI as a Pair Programmer

Instead of treating AI as a “complete this snippet” tool, I treated it as a senior engineer who walked into the codebase for the first time. The workflow had six stages, each driven by a specific kind of prompt.

1 Deep Analysis

The Prompt That Worked

"Analyze this API. It's taking 8 to 10 seconds as a response time. If you wanted to build the same thing, how would you build it?"

Why This Framing Mattered

By asking “how would you build it?” instead of “fix this,” I forced the AI into architectural thinking. It traced the full request lifecycle, mapped every database query, identified every external dependency, and produced a structured report.

What Came Back

The most valuable output wasn't code. It was the systematic analysis — the kind of audit that would have taken a senior engineer several hours of manual profiling. The AI did it in minutes because it could read every file in the call chain without context-switching fatigue.

2 Structured Planning

The Prompt That Worked

"In-place optimization plan — keep the current stack and files, but restructure queries, add caching and indexes to get the response time under ~500ms."

Why This Framing Mattered

“In-place” prevented the AI from suggesting a rewrite. “Under 500ms” gave it a measurable target. “Plan” forced it to break the work into shippable phases instead of producing one giant patch.

Phase | Change | Expected Impact
0 | Per-stage timing instrumentation | Measure before cutting
1 | Database indexes on hot query paths | 3–5s reduction
2 | Eliminate duplicate queries and lazy loads | 1–2s reduction
3 | Push filters into SQL + database-level pagination | 1–2s reduction
4 | Response cache on the deterministic endpoint | ~10s → ~50ms on cache hit
5 | Configuration memoization + loop optimizations | 100–300ms reduction
Each phase was independently shippable and independently measurable: deployed, validated, and rolled back in isolation. No big-bang rewrite.
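
Phase 0 is worth pausing on: nothing gets cut until each stage of the request is timed, so every later phase can prove its impact. A minimal sketch of that kind of instrumentation — the names here are illustrative, not the project's actual code:

    import time
    from contextlib import contextmanager

    @contextmanager
    def timed_stage(name, timings):
        """Record wall-clock duration for one stage of the request."""
        start = time.perf_counter()
        try:
            yield
        finally:
            timings[name] = time.perf_counter() - start

    # Hypothetical usage inside a request handler:
    timings = {}
    with timed_stage("db_queries", timings):
        time.sleep(0.05)   # stands in for the real work
    print(timings)         # e.g. {'db_queries': 0.05...}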

3 The Architecture

C4-style diagrams showing the system from users down to individual queries.

System Context

- People: Mobile User (iOS / Android; discovers and books items), Web User (browses via web app), Platform Admin (back-office user; manages inventory)
- System: Discovery Platform (web framework + ORM + relational DB; inventory discovery & reservation)
- External systems: Auth Service (JWT validation), Cache Service (pre-computed data), CRM Sync (downstream sync), Observability (traces & metrics), Log Aggregator (centralized logging)

Container View

- Client (mobile / web): issues GET requests to the app container
- WSGI Server: async workers, process pool
- Web Application: framework + ORM; routes, controllers, models
- Telemetry: auto-instrumentation for traces & metrics
- Async Worker: task-queue consumer, not in the hot request path
- Relational DB: behind the ORM; connection pool of 5; core business data
- Key-Value Store: in-memory cache for sessions and rate limiting
- In-Process Config Cache: a 5-minute-TTL dict in per-worker memory, added during optimization; no DB hit on a cache hit
- External: Auth Service, Cache Service, Trace Collector, Log Aggregator

API 1 — Flow

Discovery endpoint with 13 filter params — city, capacity, add-ons, pagination, distance sort. After optimization: 8 queries typical, 2–3 on holiday dates.

Request path:

- Client (mobile / web): 13 query params supported, calls over HTTPS
- Auth middleware: JWT decode + verify; no DB, no external HTTP
- Discovery Handler (controller): parses params, builds the response; calls three helpers:
  - Config Loader: in-process 5-minute cache
  - Availability Batcher: batched slot availability
  - Add-ons Batcher: 3 queries for all add-ons

Database queries:

- Query 1: config lookup (cached; 0 DB hits on a cache hit)
- Query 2: main items + location JOIN (WHERE filters pushed to SQL)
- Query 3: location names (batched)
- Query 4: categories (batched)
- Query 5: bookings JOIN (batched)
- Queries 6–8: add-ons (3 queries, batched)

External:

- Cache Service over HTTP: optional path, only called when cache=true

API 2 — Legacy vs Optimized

Authenticated detail endpoint. The original made 30–40 queries across 5 database sessions. Optimized version: ~5 queries in a single session.

LEGACY — 30–40 QUERIES, 5 SESSIONS

Sessions opened:

- Session 1: permission check (a helper opens its own DB session)
- Session 2: main session (restriction, items, statuses, user)
- Session 3: daypass status (called N times with identical params)
- Session 4: occupancy count (raw SQL, per item: N+1)
- Session 5: config load (on cache miss)

N+1 query loops:

- Loop 1: bookings per item (10 items = 10 queries)
- Loop 2: status check per item (same result every iteration — wasted)
- Loop 3: occupancy per item (only for certain types)

Bottlenecks:

- Pool starvation: pool = 5 connections; 5 sessions per request make concurrent requests queue
- Lazy loads: nested relationship access triggers cascading SELECTs
- Redundant queries: the same user queried twice (permission + account)

OPTIMIZED — ~5 QUERIES, 1 SESSION

Single session, everything inlined:

- User + permission: one query, checked in Python
- Items + location JOIN: eager-loaded via add_columns
- Daypass status: queried once, outside the loop (same for all items)
- Bookings + occupancy: batched with an IN clause and SQL GROUP BY (sketched below)

Batched queries:

1. Users (user + permission)
2. Items + location JOIN
3. Bookings (batched)
4. Status check (once)
5. Occupancy (GROUP BY)

Wins:

- Pool safe: 1 connection per request, no contention
- Static lookups: never-changing data replaced by an in-code dict
- Early release: response built after the session closes
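
For the batching step, a sketch of the IN-clause + GROUP BY pattern in SQLAlchemy-style Python. `Booking` is a hypothetical model standing in for the real schema, not the project's code:

    from collections import defaultdict
    from sqlalchemy import func

    def bookings_and_occupancy(session, item_ids):
        """Two batched queries replace two per-item loops.

        Assumes a hypothetical SQLAlchemy model Booking(id, item_id).
        """
        # IN clause: one query returns every item's bookings.
        rows = session.query(Booking).filter(Booking.item_id.in_(item_ids)).all()
        by_item = defaultdict(list)
        for booking in rows:
            by_item[booking.item_id].append(booking)

        # GROUP BY: occupancy counted in SQL, one row per item.
        occupancy = dict(
            session.query(Booking.item_id, func.count(Booking.id))
            .filter(Booking.item_id.in_(item_ids))
            .group_by(Booking.item_id)
            .all()
        )
        return by_item, occupancy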

4 Implementation

"Continue with your recommendations."

What Got Built

Killed a Duplicate Database Query (−1 query)

The original code re-ran an identical filtered SELECT just to change the result ordering. Replaced with an in-memory sort. Saved one database round-trip per request.
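
A sketch of the swap, assuming a SQLAlchemy-style query; `session` and `Item` are illustrative stand-ins:

    from operator import attrgetter

    # Before: the same filtered SELECT ran twice, differing only in ORDER BY.
    # After: one round-trip, then reorder the already-fetched rows in memory.
    rows = session.query(Item).filter(Item.active.is_(True)).all()
    rows_by_price = sorted(rows, key=attrgetter("price"))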

Pushed Application Filters into SQL (90% fewer rows)

Capacity ranges, category type, add-on filters — all happening in application code after fetching every row. Pushing them into the WHERE clause cut the result set by an order of magnitude.
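
The shape of the change is conditional filter composition; sketched with the same hypothetical `Item` model:

    def search(session, min_capacity=None, category=None):
        """Compose the WHERE clause from request params so the database
        returns only matching rows. `Item` is an illustrative model."""
        q = session.query(Item)
        if min_capacity is not None:   # guard semantics matter for 0 — see Attempt 5 below
            q = q.filter(Item.capacity >= min_capacity)
        if category is not None:
            q = q.filter(Item.category == category)
        return q.all()   # filtered rows only, instead of every row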

Holiday Short-Circuit (−5 to −8 queries)

The endpoint executed five separate batch queries and then checked if the date was a public holiday. On holidays, all that work was thrown away. A pre-check skipped the batch queries entirely when applicable.
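
The fix is an early return; `is_public_holiday` and `batch_load_slots` are hypothetical stand-ins for the real helpers:

    def slot_availability(session, on_date, item_ids):
        # Pre-check first: on a public holiday all five batch queries
        # below would have been executed and then thrown away.
        if is_public_holiday(on_date):                        # hypothetical helper
            return {item_id: [] for item_id in item_ids}      # nothing bookable
        return batch_load_slots(session, on_date, item_ids)   # hypothetical helper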

In-Process Configuration Cache (−1 query/request)

Configuration data was being expanded on every single request. A 5-minute in-process cache eliminated that work.
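
A minimal sketch of a 5-minute in-process cache of this kind (names illustrative):

    import time

    _CACHE = {}           # lives in per-worker process memory
    _TTL_SECONDS = 300    # 5 minutes

    def cached_config(key, loader):
        """Return cached config, re-running `loader` (the expensive
        per-request expansion) only after the TTL expires."""
        now = time.monotonic()
        entry = _CACHE.get(key)
        if entry is not None and now - entry[0] < _TTL_SECONDS:
            return entry[1]
        value = loader()
        _CACHE[key] = (now, value)
        return value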

Optimized Inner-Loop Computation

A nested loop with cubic complexity was rewritten using set membership checks. The complexity dropped to roughly linear in the input size.
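
The real loop was triple-nested, but the fix generalizes; a simplified sketch of swapping repeated list scans for set membership:

    def available_slow(items, booked, blocked):
        # `x in some_list` is a linear scan; nested inside a loop (and one
        # more level in the real code) the cost multiplies.
        return [i for i in items if i not in booked and i not in blocked]

    def available_fast(items, booked, blocked):
        booked_set, blocked_set = set(booked), set(blocked)   # built once
        # Set membership is O(1), so the whole pass is roughly linear.
        return [i for i in items if i not in booked_set and i not in blocked_set]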

Removed Redundant Logging (−1,500 I/O ops)

The same data structure was being serialized to logs three times per item — and the endpoint processed up to 500 items per request. That meant 1,500+ log writes per request, each performing serialization on a large object. Cut to one log per item.

Derived Data Instead of Re-Querying (−1 query)

A separate database lookup was being made for data that was already present in a previous query's result set. Eliminated the round-trip.
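
Illustratively (models again hypothetical): select the related column in the JOIN you already run, instead of looking it up again.

    # Before: a second lookup for a value the earlier JOIN already had.
    # location_name = session.query(Location).get(item.location_id).name

    # After: carry the column along in the existing query.
    rows = (
        session.query(Item, Location.name)
        .join(Location, Item.location_id == Location.id)
        .all()
    )
    for item, location_name in rows:
        ...   # no extra round-trip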

5 The Revert Cycle

When AI code breaks production — and how each failure taught the AI something it couldn't learn from reading code alone.

This was the most instructive part of the entire engagement. The AI produced correct, well-reasoned code that passed all syntax checks — and it still broke the application three times.

Attempt 1: Full Deploy → Container Boot Crash

The AI applied all optimizations at once. The container failed to boot. Standard library imports it had added were incompatible with the production runtime's cooperative scheduling model.

"I'm getting following error and docker is not coming up"

The AI couldn't diagnose it — the real error was hidden above the pasted logs. Then came the critical prompt:

"After reverting your changes, the build went through."

This single sentence told the AI: (a) the issue is definitely in its code, not the environment, and (b) it needs to isolate which change caused the crash.

Attempt 2: Fewer New Imports → Still Crashed

The AI removed the suspect import but kept other related ones. Before testing, upstream changes were merged in — so the new attempt was never tested in isolation.

Attempt 3: Logic Fixes Only → Success

"Still getting the same error, can you revert all your changes?"

The AI reverted everything to the known-good baseline. Then reapplied only four small, surgical logic fixes — zero new imports, zero new module-level code. Build succeeded on the first try.

This established the core constraint: in this runtime environment, any new module-level code is suspect until proven safe in deployment.

Attempt 4: Performance Optimizations, Reapplied Smartly

With the constraint now understood, the AI rewrote all the performance optimizations to use zero new import lines — using stdlib modules already imported, skipping instrumentation that would have required new dependencies.

"As this API is taking 10 to 12 seconds, I'm more interested in fixing performance and reducing response time below 1 second."

This reframed the AI from “be cautious” to “be fast, within the constraints you've now learned.”

Attempt 5: Empty Response Bug — Edge Values

The Edge-Value Trap

After the performance optimizations deployed, the API returned empty results for certain query parameter combinations. The original application-side filter treated a value of 0 as “no filter applied” because of language truthiness rules. The new database-side filter treated 0 as “filter for the literal value zero,” which matched nothing.

"I'm getting this response... [pasted the empty JSON response and the exact query parameters]"

Providing the exact failing input let the AI trace both code paths and find the semantic difference in one shot. One-line fix.
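
The trap is easy to reproduce in miniature; the data below is invented for the demo:

    items = [{"name": "hall", "capacity": 40}, {"name": "room", "capacity": 8}]
    params = {"capacity": 0}   # the failing input: an explicit zero

    # Legacy application-side filter: 0 is falsy, so no filter was applied.
    def legacy_filter(items, params):
        if params.get("capacity"):   # 0 -> skipped, by truthiness
            return [i for i in items if i["capacity"] == params["capacity"]]
        return items

    # The first SQL translation turned 0 into WHERE capacity = 0, which
    # matched nothing. The one-line fix: guard the SQL filter with the same
    # truthiness check, so 0 keeps its legacy meaning of "no filter".
    print(legacy_filter(items, params))   # both items: the old behavior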

The AI didn't get it right on the first try — or the second, or the third. But each failure narrowed the constraints: no unsafe imports, no new module-level code, edge values that appear identical can mean opposite things in different filter implementations. By attempt 5, the AI had internalized every constraint and produced code that passed on first deploy.

6 Second API — Proving the Playbook Scales

After the first API, the same methodology was applied to a second endpoint — simpler parameters but far worse internals.

The AI found 30–40 queries per request for just 10 items: three N+1 loops, five separate database sessions, and a function called N times with identical parameters.

Three Rounds

Round 1: Batch the N+1 Queries (30 → 7 queries)

Per-item queries replaced with batch queries. One batch function already existed in the codebase — it just wasn't wired up. Still 5 seconds.

Round 2: Single DB Session (5 sessions → 1)

Three helper functions each opened their own database connection. Under a pool of 5, concurrent requests queued for connections. Inlined everything into one session. Down to ~2 seconds.
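
A sketch of the restructured flow, with hypothetical names throughout; the point is that helpers receive the request's single session instead of opening their own:

    def detail(item_id, user_id):
        with SessionLocal() as session:    # the request's only connection
            user = load_user(session, user_id)       # was session 1 (+ duplicate in 2)
            items = load_items(session, item_id)     # was session 2
            status = daypass_status(session)         # was session 3, now called once
            counts = occupancy(session, [i.id for i in items])   # was session 4, batched
        # The session is closed here, so the connection is back in the pool
        # before response serialization starts.
        return build_response(user, items, status, counts)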

Round 3: Eliminate Round-Trips (7 → ~5 queries)

Static lookup instead of a query. Pre-filtered config. Released connection before response building. Down to ~1 second.

Results

Metric | Before | After
DB sessions per request | 5 | 1
Total queries (10 items) | 30–40 | ~5
N+1 query loops | 3 | 0
Implicit lazy loads | 2–4 per parent | 0
The playbook — analyze, batch, inline, validate — transferred directly. What took a full day of discovery on the first API took under an hour on the second because the patterns were already learned.

Before vs After — At a Glance

🔴 API 2 (Legacy)

Response time: ~10–12s
DB sessions / request: 5
Total DB queries (10 items): 30–40
N+1 query loops: 3
Lazy loads per parent: 2–4
Connection pool contention: High
Python sort overhead: Yes
Config DB hit: Every request

🟢 API 2 (Optimized)

Response time: ~1s
DB sessions / request: 1
Total DB queries (10 items): ~5
N+1 query loops: 0
Lazy loads per parent: 0
Connection pool contention: None
Python sort overhead: No (SQL ORDER BY)
Config DB hit: Cached 5 min

The Prompts That Taught the AI the Most

Prompt | What It Taught the AI
"After reverting your changes, the build went through" | The problem is in the code, not the environment. Isolate.
"Still getting the same error, can you revert all your changes" | Stop guessing. Go back to known-good. Re-approach incrementally.
"[exact failing input]" | Concrete reproduction lets the AI diagnose in one shot instead of guessing.
"As this API is taking 10 to 12 seconds, I'm more interested in fixing performance" | Reframe from "be careful" to "be fast, within the constraints you've now learned."
"Now re-analyze with the pulled code" | After upstream changes, prior analysis is no longer valid. Start fresh.
"Can you cross-check if the response structure is the same and not changed?" | Every optimization is suspect until the contract is verified field by field.

Anti-Patterns to Avoid

What Went Wrong | What to Do Instead
Applied all optimizations at once | Ship one change at a time, verify each
Added new dependencies without testing in deployment | Test the minimal change in the actual deployment environment first
Assumed edge values would behave the same in both filter implementations | Check how the original code interprets every edge value
Got distracted by downstream symptom logs instead of finding the root error | Always grab the full error log before guessing

Key Lessons

1. AI as an analyst, not just a coder

The most valuable output wasn't the code. It was the systematic analysis that mapped twenty bottlenecks across multiple files in a single pass.

2. Environment awareness matters

The AI had to learn things it couldn't learn from reading code alone: which stdlib primitives are unsafe in the production runtime, that web framework default values don't protect against empty strings, that certain numeric edge values are treated as “no filter” by convention.

3. Iterate in small, testable increments

The big-bang approach failed. The surgical approach (small fixes with zero new imports) worked on first deploy. When the AI has earned a constraint the hard way, every subsequent change should respect it.

4. Always verify the response contract

Every optimization was audited against the original response shape. Performance work that breaks the API isn't a win — it's a regression with extra steps.

5. The playbook transfers

Same methodology, second API: under an hour instead of a full day. The real value of AI-assisted engineering compounds as patterns get reused.

6. The biggest wins are often outside the code

All the application-level optimizations together saved a few seconds. The remaining latency budget lives in database indexes and response caching. Sometimes the smartest engineering decision is recognizing where the real bottleneck lives.

What This Demonstrates About AI-Assisted Engineering

Three things, mostly:

  1. AI shortens the analysis phase dramatically. What used to be a multi-day profiling exercise became a multi-hour conversation.
  2. AI doesn't replace judgment — it amplifies it. Every revert was a moment of human judgment. The AI proposed, the human disposed.
  3. The prompt is the spec. Bad prompts produce bad code. Concrete, contextual, constrained prompts produce reasoning that holds up under production scrutiny.