Edge-first MCP servers: why Pulse runs on Cloudflare Workers
Most MCP server implementations sit on AWS Lambda or Vercel Functions. Pulse runs on Cloudflare Workers. The reason is latency, and the constraints that follow.
I am building an AI marketing coordinator called Pulse. The product is a set of MCP tools that AI agents call to pull data from the marketing stack: Stripe, Klaviyo, Google Ads, Meta Ads, SimilarWeb, and so on. Eighteen tools total, each one a discrete capability.
When Claude (or another agent) drafts a weekly performance report, it might call seven or eight of those tools in sequence. Each call needs to feel snappy. Not for a human staring at a dashboard. For an agent waiting to decide what to call next.
That is a different latency requirement than most SaaS, and it shaped every architectural choice.
The latency math
A typical AWS Lambda or Vercel Function has a cold start of 200 to 1000 milliseconds. Once warm, it responds in whatever time the business logic takes, typically 50 to 150ms.
For a human dashboard, cold start is acceptable. The user clicks, waits half a second, sees the page. Fine.
For an agent chaining tool calls, cold start compounds. Eight calls, each starting up a fresh container, can add 2 to 4 seconds of pure cold-start latency to a sequence the user thinks is "one operation." Worse, agents have less patience than humans. A 2-second pause between tool calls breaks the flow and starts to feel slow.
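To make the compounding concrete, here is the back-of-envelope version, plugging in rough midpoints from the ranges above (illustrative numbers, not benchmarks):

```ts
// Rough compounding math: per-call cold start multiplied across a chained workflow.
// 300 ms is an illustrative midpoint of the 200-1000 ms range above, not a measurement.
const calls = 8;        // tools an agent chains for one report
const coldStartMs = 300;
const workMs = 100;     // typical business logic per call

const coldOverheadMs = calls * coldStartMs;        // 2400 ms of pure cold-start latency
const totalMs = calls * (coldStartMs + workMs);    // ~3200 ms for the whole sequence
console.log({ coldOverheadMs, totalMs });
```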
My target became: keep every single tool call under 100ms at p95, globally.
That ruled out Lambda. Probably ruled out Vercel Functions. What is left?
[Chart: cold-start latency by runtime. Figures are approximate and vary by region and config.]
Why Cloudflare Workers fit
Workers run on V8 isolates instead of containers. The cold start is roughly 5 milliseconds versus 200 to 1000 milliseconds for Lambda. The runtime is JavaScript, or anything that compiles to it; Pulse is written in TypeScript.
The deployment model is also distinct. A Worker deploys to 300-plus data centers globally with a single deploy command. There is no region selection. Every request hits the nearest data center.
For an MCP server, this matters because:
- Tool calls are stateless reads of remote APIs. The Worker just orchestrates and shapes the response.
- The bulk of the work is HTTP fetches. Workers' fetch API is fast.
- Cold start is irrelevant because every call IS effectively a cold start in the Workers model. There is no container to warm up.
In practice, Pulse tool calls return well under the 100ms p95 target globally. The variance is mostly upstream API latency (Stripe is fast, Meta Ads is not), not the Worker itself.
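Concretely, a tool call under this model is a single stateless fetch handler: parse the call, hit the upstream API, shape the response. A minimal sketch (the tool name, request shape, and env binding are hypothetical, not Pulse's actual handlers):

```ts
interface Env {
  STRIPE_API_KEY: string; // set with `wrangler secret put STRIPE_API_KEY`
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Hypothetical tool-call payload: { tool: "stripe_revenue_summary" }
    const { tool } = (await request.json()) as { tool: string };

    if (tool === "stripe_revenue_summary") {
      // The Worker is pure orchestration: one upstream fetch, then reshape.
      const upstream = await fetch("https://api.stripe.com/v1/charges?limit=100", {
        headers: { Authorization: `Bearer ${env.STRIPE_API_KEY}` },
      });
      const { data } = (await upstream.json()) as { data: { amount: number }[] };
      const totalMinorUnits = data.reduce((sum, charge) => sum + charge.amount, 0);
      return Response.json({ tool, charges: data.length, totalMinorUnits });
    }

    return Response.json({ error: `unknown tool: ${tool}` }, { status: 400 });
  },
};
```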
The tradeoffs
Workers are not free. The constraints I hit, in rough order of pain:
| Constraint | What I had to change |
|---|---|
| 50ms CPU time per request (default) | Bumped two heavier tools to the Unbound tier |
| 128MB memory per request | Fine for orchestration, would be tight for image work |
| No persistent DB connections | Switched to Turso (libSQL over HTTP) as primary store |
| No long-running connections | No WebSockets; SSE for streaming tools |
| Auth middleware libraries don't run | Hand-rolled bearer-token auth, no Auth.js |
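The last row is less work than it sounds. A sketch of the hand-rolled check, assuming the token lives in a Worker secret (the binding name and error shape are hypothetical):

```ts
interface Env {
  MCP_API_TOKEN: string; // set with `wrangler secret put MCP_API_TOKEN`
}

// Returns a 401 Response if the bearer token is missing or wrong, null otherwise.
// Call it at the top of the fetch handler before dispatching to tool handlers.
function authorize(request: Request, env: Env): Response | null {
  const header = request.headers.get("Authorization") ?? "";
  const token = header.startsWith("Bearer ") ? header.slice("Bearer ".length) : "";

  // A constant-time comparison is preferable in production; plain equality keeps the sketch short.
  if (token === "" || token !== env.MCP_API_TOKEN) {
    return Response.json({ error: "unauthorized" }, { status: 401 });
  }
  return null;
}
```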
The DB switch was the biggest one. I started on Neon Postgres and had a working prototype quickly, but the connection-pool model did not play well with the stateless Worker request lifecycle. Every request opened a new connection, hit pool limits at moderate load, and the latency floor was higher than I wanted.
Turso uses HTTP for queries, which fits the Worker model. Query latency from a Worker to a Turso replica in the same region is sub-10ms. Reads dominate the workload, so this is the right tradeoff.
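The query path from a Worker ends up looking roughly like this, using the libSQL HTTP client (binding names, table, and query are hypothetical; check the current @libsql/client docs for exact options):

```ts
import { createClient } from "@libsql/client/web"; // HTTP-only variant, no TCP sockets needed

interface Env {
  TURSO_DATABASE_URL: string; // e.g. libsql://<database>-<org>.turso.io
  TURSO_AUTH_TOKEN: string;
}

export default {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const db = createClient({
      url: env.TURSO_DATABASE_URL,
      authToken: env.TURSO_AUTH_TOKEN,
    });

    // Read-heavy workload: one parameterized SELECT per tool call, over HTTP.
    const result = await db.execute({
      sql: "SELECT account_id, synced_at FROM connections WHERE workspace_id = ?",
      args: ["ws_demo"],
    });

    return Response.json({ rows: result.rows });
  },
};
```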
When edge is the wrong call
Workers are not a default. They are a fit for a specific shape of product:
- API-shaped, not screen-shaped
- Read-heavy, light write
- Stateless or session-managed externally
- Global users where latency matters
- AI agents as a meaningful audience
If your product is screen-shaped (most SaaS) and AI agents are an afterthought, stay on Vercel. The constraints of the Workers model are not worth the latency win for a human-paced UX.
If your product does heavy compute (image generation, video processing), Workers will not work. The CPU and memory limits will bite immediately. Modal, RunPod, or a long-lived container service is the right choice.
If your product has complex multi-table Postgres transactions, the HTTP-DB model does not fit gracefully. Stay on a traditional Postgres-backed serverless setup.
What I would do differently
Two things. The first: I would start in Wrangler from day one. For the first three weeks I built locally in a plain Node environment and ported to Workers later. The runtime differences caused "works on my machine" bugs that took longer to track down than they should have. Wrangler's local emulator is good enough now to build in directly.
The second: invest in structured logging earlier. Workers logging is decent but not on par with Vercel's experience. wrangler tail was the bridge for the first month. Pushing structured logs to Logflare or BetterStack was the first thing I added once the product had a heartbeat.
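What "structured" meant in practice was just one JSON object per log line, so wrangler tail and whatever sink ingests the logs can filter on fields instead of grepping prose. A minimal sketch, with hypothetical field names:

```ts
type LogLevel = "debug" | "info" | "warn" | "error";

// One JSON object per event; wrangler tail and downstream log sinks can filter on fields.
function log(level: LogLevel, event: string, fields: Record<string, unknown> = {}): void {
  console.log(JSON.stringify({ level, event, ts: Date.now(), ...fields }));
}

// Usage inside a tool handler:
// log("info", "tool_call", { tool: "stripe_revenue_summary", durationMs: 42 });
```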
The general pattern
Edge-first is the right default for AI-native products where agents will be a significant fraction of the consumers. Cold start compounds for chained tool calls in a way it never does for human clicks.
If you are building an MCP server, evaluate Workers before defaulting to Lambda or Vercel. The constraints force discipline. The latency win pays for the discipline.