Architecting scalable web platforms.
Your web platform doesn't need to scale — until it does, suddenly, and you have four weeks to fix it. This is the practical guide we wish every founder read before their first architecture decision: what actually matters, what's a waste of effort, and where most platforms really break.
When scalability actually matters (and when it doesn't)
Most web platforms don't need "scale" — they need to work. A marketing site doing 5,000 visits a month isn't a scalability problem; it's a correctness problem. The common mistake we see in inherited codebases is founders who built for Reddit's traffic and ended up with a Kubernetes cluster running an app with seven active users and a $4,800 monthly AWS bill.
Scalability becomes a real concern when one of these is true:
- You're processing more than ~100 concurrent write operations per second.
- Your single-server database is spending more than 40% of its time on CPU or I/O wait.
- Your 95th-percentile response time exceeds 1 second under normal load.
- Your business model requires a viral growth curve (marketplaces, UGC platforms, consumer apps).
- Traffic patterns are spiky — think Black Friday, product launches, press hits.
If none of these are true, the right answer is usually "make it correct, make it simple, measure it, and come back when the numbers say you need to scale." Writing that down matters because premature scaling is one of the most expensive mistakes in software, right alongside premature abstraction.
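The measurable triggers in the list above can be checked mechanically. What follows is a minimal sketch, assuming you already collect write throughput, a database busy fraction, and per-request latencies. The function names and the nearest-rank percentile method are illustrative choices, not something a particular monitoring tool prescribes.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def scaling_triggers(
    write_ops_per_sec: float,
    db_busy_fraction: float,
    latencies_ms: list[float],
) -> list[str]:
    """Return which of the measurable thresholds from the list are currently exceeded."""
    triggers = []
    if write_ops_per_sec > 100:
        triggers.append("write throughput above ~100 ops/s")
    if db_busy_fraction > 0.40:
        triggers.append("database busy (CPU or I/O wait) above 40%")
    if percentile(latencies_ms, 95) > 1000:
        triggers.append("p95 latency above 1 second")
    return triggers
```

An empty result means the numbers are not yet asking you to scale. The business-model and traffic-shape triggers still need human judgment.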
The numbers that actually drive architecture decisions
We make architectural choices based on four operational numbers: requests per second (RPS), 95th-percentile latency, database connection usage, and egress cost. Every "should we use microservices?" conversation gets clearer when you know where you actually are on those axes.
Here's a rough mental model we use with clients when they ask whether they need to change their stack:
- < 10 RPS sustained: A single Node/Python/Go server on a $20/month VPS (or a serverless function stack) handles this comfortably. Postgres on the same box or a managed $15/month instance is fine. Focus on correctness, not scale.
- 10–200 RPS sustained: Now you want a managed database (Supabase, Neon, managed Postgres on DigitalOcean/AWS), a CDN in front (Vercel, Cloudflare, Fastly), and cached responses for your hot paths. One server still works.
- 200–2,000 RPS sustained: You need horizontal scaling (multiple app instances behind a load balancer), connection pooling (PgBouncer or your cloud provider's built-in pooler), and probably a read replica on the database. Redis for sessions and rate limiting becomes worth the operational cost.
- > 2,000 RPS sustained: You're in "real scale" territory — queue-based writes, sharded databases, edge compute, multi-region active-active, maybe microservices. You also need a team, not a studio, to operate it.
Most small business and B2B web platforms live in the first or second tier for years. Many die in the third tier because they tried to jump straight to the fourth.
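As a rough illustration, the tiers above reduce to a lookup on sustained RPS. This is a sketch of the mental model, not a sizing tool. The function name is made up, and assigning the exact boundary values (10, 200, 2,000) to one tier or the other is an assumption.

```python
def capacity_tier(sustained_rps: float) -> str:
    """Map sustained requests per second to the rough tiers described above."""
    if sustained_rps < 0:
        raise ValueError("RPS cannot be negative")
    if sustained_rps < 10:
        return "single server, database on the same box or a small managed instance"
    if sustained_rps < 200:
        return "managed database, CDN, cached hot paths, still one app server"
    if sustained_rps <= 2000:
        return "load-balanced app instances, connection pooling, read replica, Redis"
    return "queues, sharding, edge compute, multi-region, dedicated operations team"
```

Feed it a sustained figure, such as a daily peak-hour average, rather than a momentary spike. Spiky traffic is its own trigger.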
Monolith vs microservices: what we actually recommend
The microservices-vs-monolith debate has been one of the loudest and least helpful in software for a decade. Here's what we've seen across dozens of engagements:
We recommend a modular monolith for ~90% of the web platforms we build: one deployable unit, one database, clear internal module boundaries, and strict dependency rules between modules. Shopify scaled a modular monolith (Ruby on Rails) past 1,000 engineers. Stack Overflow reportedly still runs nine web servers serving ~6B monthly page views. Basecamp runs a monolith, and GitHub did until recently. The companies that made microservices work (Netflix, Uber, Amazon) had thousands of engineers and very specific organizational pressures that microservices solve.
A modular monolith gives you the scaling headroom to grow to tens of millions of requests per day without the operational complexity of twenty services that all need independent deploys, monitoring, schemas, and SLAs.
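"Strict dependency rules between modules" only hold if something enforces them. Here is a minimal sketch of such a check, assuming a Python codebase whose top-level packages are the modules. The module names (shared, catalog, billing) and the rule table are hypothetical, and real projects often use a dedicated linter for this instead.

```python
import ast

# Hypothetical modules and rules: each key may import only the modules in its set.
ALLOWED_DEPENDENCIES: dict[str, set[str]] = {
    "shared": set(),
    "catalog": {"shared"},
    "billing": {"catalog", "shared"},
}


def imported_top_level_modules(source: str) -> set[str]:
    """Collect the top-level package names imported by a piece of source code."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.add(node.module.split(".")[0])
    return found


def boundary_violations(module: str, source: str) -> set[str]:
    """Return internal modules that `module` imports but is not allowed to depend on."""
    internal = set(ALLOWED_DEPENDENCIES) - {module}
    used = imported_top_level_modules(source) & internal
    return used - ALLOWED_DEPENDENCIES[module]
```

Run over every file in CI, a non-empty result fails the build, which keeps the boundaries real rather than aspirational.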
Microservices start making sense when you have:
- Independent scaling needs for specific workloads (one heavy image-processing service, one lightweight CRUD API).
- Multiple teams that need to deploy independently without coordinating.
- Regulatory boundaries (e.g., a payments service with stricter compliance than the rest of the app).
- Different language/runtime requirements (a Python ML service alongside a Node web service).
If none of those are true and you have fewer than 10 engineers, use a modular monolith and revisit in two years.
Database: the place most sites actually break
In our experience auditing slow platforms, ~70% of real performance problems are database-layer problems. Application code is rarely the bottleneck; the database is. These are the patterns we see most often — and the fixes:
Missing indexes
Every WHERE, ORDER BY, and JOIN condition that runs regularly against a table of more than ~10,000 rows needs a supporting index. We routinely see production Postgres instances grinding at 100% CPU because a users.email lookup is a full sequential scan. Adding CREATE INDEX ON users(email) can take about 90 seconds and cut query time from 800ms to under 1ms. On a live table, use CREATE INDEX CONCURRENTLY so the build doesn't block writes. This is usually the first thing to check.
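The effect is easy to see in a query plan. The sketch below uses Python's built-in sqlite3 as a stand-in so it runs anywhere. That is an assumption made for portability, and on Postgres you would read EXPLAIN output instead. The table contents and the index name idx_users_email are made up for the demonstration.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's human-readable plan for a query (the 4th column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO users (email) VALUES (?)",
    ((f"user{i}@example.com",) for i in range(20_000)),
)

lookup = "SELECT id FROM users WHERE email = ?"
target = ("user19999@example.com",)

# Without an index the planner has to walk the whole table.
plan_before = query_plan(conn, lookup, target)

conn.execute("CREATE INDEX idx_users_email ON users (email)")

# With the index the planner can jump straight to the matching row.
plan_after = query_plan(conn, lookup, target)

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the table and the second a search using the index. That is the same shift you look for when a Postgres plan moves from a sequential scan to an index scan.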
N+1 queries
The classic ORM trap: you fetch 50 blog posts, then iterate and call post.author on each one, issuing 50 more queries. We've seen this pattern turn a 50ms response into an 8-second response. Fix it with eager loading (includes(:author) in Rails, select_related or prefetch_related in Django) or a manual JOIN. Most ORMs have a tool to log queries per request. Turn it on in staging; any request issuing > 10 queries is a red flag.
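To make the query count concrete, here is a small sketch without an ORM. It uses sqlite3 and a hypothetical CountingDb wrapper that counts statements, so the same 50 posts can be loaded both ways. The schema and names are illustrative only.

```python
import sqlite3


class CountingDb:
    """Thin wrapper that counts how many queries a code path issues."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def seed(db: CountingDb, n_posts: int = 50, n_authors: int = 5) -> None:
    db.conn.executescript(
        """
        CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            author_id INTEGER NOT NULL REFERENCES authors(id)
        );
        """
    )
    db.conn.executemany(
        "INSERT INTO authors (id, name) VALUES (?, ?)",
        [(i, f"Author {i}") for i in range(1, n_authors + 1)],
    )
    db.conn.executemany(
        "INSERT INTO posts (title, author_id) VALUES (?, ?)",
        [(f"Post {i}", (i % n_authors) + 1) for i in range(n_posts)],
    )


def posts_n_plus_one(db: CountingDb) -> list[tuple[str, str]]:
    """One query for the posts, then one more per post for its author."""
    result = []
    for title, author_id in db.query("SELECT title, author_id FROM posts ORDER BY id"):
        rows = db.query("SELECT name FROM authors WHERE id = ?", (author_id,))
        result.append((title, rows[0][0]))
    return result


def posts_joined(db: CountingDb) -> list[tuple[str, str]]:
    """The same data in a single round trip."""
    return db.query(
        "SELECT posts.title, authors.name FROM posts "
        "JOIN authors ON authors.id = posts.author_id ORDER BY posts.id"
    )
```

With 50 posts, the first function issues 51 queries and the second issues one, returning the same rows. Against a networked database each extra query also pays a round trip, which is where the seconds go.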
Unbounded result sets
The usual culprits are a paginated endpoint that loads all results into memory before slicing, or an admin dashboard that fetches "all orders" without a limit. This works in dev with 100 rows and fails in production with 100,000. Always paginate server-side, and always set a hard maximum (e.g., LIMIT 500 enforced at the query builder level).
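One way to enforce the cap is to clamp the requested page size in a single place and page by key rather than by offset. This is a sketch under assumptions: the orders table, the default of 50, and the cursor-on-id scheme are illustrative, and sqlite3 stands in for your real database.

```python
import sqlite3
from typing import Optional

MAX_LIMIT = 500      # hard ceiling, matching the LIMIT 500 example above
DEFAULT_LIMIT = 50   # assumed default when the client sends nothing


def clamp_limit(requested: Optional[int]) -> int:
    """Never trust a client-supplied page size."""
    if requested is None:
        return DEFAULT_LIMIT
    return max(1, min(int(requested), MAX_LIMIT))


def fetch_orders_page(
    conn: sqlite3.Connection,
    after_id: int = 0,
    limit: Optional[int] = None,
) -> tuple[list[tuple], Optional[int]]:
    """Return one page of orders plus the cursor for the next page (None when done)."""
    page_size = clamp_limit(limit)
    rows = conn.execute(
        "SELECT id, total_cents FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    # A full page suggests there may be more; a short page means we reached the end.
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor
```

A client asking for 10,000 rows gets 500 and a cursor. The database does the slicing, so memory use stays flat as the table grows.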
Connection pool exhaustion
Postgres caps concurrent connections with max_connections (100 by default, and commonly around that on small managed tiers). Serverless functions that open a fresh connection per invocation will blow through this at maybe 30-50 concurrent requests, because connections held by warm but idle instances count against the limit too. The fix is a connection pooler: Supabase provides one, Neon provides one, and you can run PgBouncer yourself. This is invisible at low load and catastrophic at medium load.
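The core idea of a pooler is a fixed set of connections that callers borrow and return. The sketch below shows that idea in-process, with sqlite3 as a stand-in and a hypothetical BoundedPool class. It is not how PgBouncer is implemented, and a real pooler sits between many processes and the database.

```python
import queue
import sqlite3
from contextlib import contextmanager
from typing import Callable, Iterator


class BoundedPool:
    """Hands out at most `size` connections; extra callers wait, then time out."""

    def __init__(self, size: int, connect: Callable[[], sqlite3.Connection]) -> None:
        self._idle: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout: float = 1.0) -> Iterator[sqlite3.Connection]:
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            # Fail fast in the application instead of overloading the database.
            raise TimeoutError("connection pool exhausted") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always return the connection, even on errors
```

Usage is `with pool.connection() as conn: ...`. The database only ever sees `size` connections, however many requests are in flight.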
Caching: the cheapest scaling lever
Before you rearchitect anything, cache. A 5-minute cache on a hot endpoint often turns a scaling problem into a non-issue. We layer caching at four levels:
- Browser cache (Cache-Control headers): Static assets get immutable, max-age=31536000. HTML gets short TTLs or must-revalidate.
- CDN edge cache (Vercel, Cloudflare): Cache full HTML responses on the edge for anonymous users. A blog post served from edge is ~10ms globally vs ~400ms from origin.
- Application cache (Redis, Memcached): Cache expensive DB queries and computed values. Target: anything that takes > 50ms to compute and doesn't change on every request.
- Database query cache (Postgres shared_buffers, Rails' query cache within a request): Usually automatic, but verify your shared_buffers is tuned for your RAM (25% is the common starting point, while the stock Postgres default of 128MB is far lower; for write-heavy workloads go lower, for read-heavy go higher).
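For the first two layers, most of the work is choosing the right Cache-Control header per response type. Here is a minimal sketch. The function name, the "static"/"html" labels, and the 5-minute edge TTL (s-maxage=300) are assumptions for illustration, and your CDN may expose its own controls as well.

```python
ONE_YEAR_SECONDS = 31_536_000


def cache_control(kind: str, authenticated: bool = False) -> str:
    """Pick a Cache-Control header value for a response.

    kind: "static" for fingerprinted assets, "html" for rendered pages.
    """
    if kind == "static":
        # Fingerprinted filenames change when content changes, so cache "forever".
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    if kind == "html":
        if authenticated:
            # Personalized pages must never be stored by shared caches.
            return "private, no-cache"
        # Browsers revalidate every time; the CDN may serve its copy for 5 minutes.
        return "public, max-age=0, s-maxage=300, must-revalidate"
    raise ValueError(f"unknown response kind: {kind!r}")
```

Keeping this decision in one function makes it harder for a personalized page to end up in the edge cache by accident.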
Cache invalidation is the hard part. The simplest strategy that works for most platforms: time-based expiry (TTL), plus explicit purge on writes to the affected keys. Don't invent complex invalidation schemes until you have to.
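TTL plus explicit purge fits in a few lines. The sketch below is an in-process version with a hypothetical TTLCache class. With Redis or Memcached the shape is the same, with the store's own expiry doing the TTL part. The injectable clock is there only to make the behavior testable.

```python
import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Time-based expiry, plus an explicit purge to call from write paths."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh
        value = compute()    # missing or expired: recompute and store
        self._entries[key] = (now + self._ttl, value)
        return value

    def purge(self, key: Hashable) -> None:
        """Call this right after writing the data the key was derived from."""
        self._entries.pop(key, None)
```

A read path calls `cache.get_or_compute(("post", post_id), load_post)` and the matching write path calls `cache.purge(("post", post_id))`. Anything the purge misses is bounded by the TTL.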
Observability: you can't scale what you can't see
Every platform we ship includes three observability pillars from day one:
- Structured logs sent to a central aggregator (Vercel's log drain, Better Stack, Axiom, or open-source ELK). Log requests with correlation IDs, response times, user IDs, and outcomes.
- Metrics on the four golden signals — latency, traffic, errors, saturation. Dashboards for each. Alerts when 95th-percentile latency exceeds your threshold for 5 minutes.
- Distributed tracing once you have more than one service. Even for a monolith, request-level tracing across database queries and external API calls is invaluable for debugging slow requests.
We set this up on project day one, not as an afterthought. You can't fix a scalability problem you can't measure, and with monitoring in place you usually find the problem three days before you hit the wall, not three hours after.
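A structured request log is just one JSON object per request with the fields named above. Here is a minimal sketch. The field names, the outcome labels, and the handle_with_logging wrapper are illustrative assumptions, and in a real service the emit callable would hand the line to your log aggregator's client.

```python
import json
import time
import uuid
from typing import Callable, Optional


def request_log_line(
    method: str,
    path: str,
    status: int,
    duration_ms: float,
    user_id: Optional[str] = None,
    correlation_id: Optional[str] = None,
) -> str:
    """Serialize one request as a single JSON log line."""
    if status >= 500:
        outcome = "server_error"
    elif status >= 400:
        outcome = "client_error"
    else:
        outcome = "ok"
    record = {
        "correlation_id": correlation_id or uuid.uuid4().hex,
        "method": method,
        "path": path,
        "status": status,
        "duration_ms": round(duration_ms, 1),
        "user_id": user_id,
        "outcome": outcome,
    }
    return json.dumps(record, sort_keys=True)


def handle_with_logging(
    handler: Callable[[], int],
    method: str,
    path: str,
    user_id: Optional[str] = None,
    correlation_id: Optional[str] = None,
    emit: Callable[[str], None] = print,
) -> int:
    """Run a handler that returns an HTTP status and log the request even if it raises."""
    started = time.perf_counter()
    status = 500  # reported if the handler raises before returning
    try:
        status = handler()
        return status
    finally:
        duration_ms = (time.perf_counter() - started) * 1000
        emit(request_log_line(method, path, status, duration_ms, user_id, correlation_id))
```

Pass the same correlation_id to every downstream call a request makes, and a slow request can be followed end to end by filtering on that one value.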
What "production-ready" actually includes
When a client asks "is the platform ready?" we check against this list. Most handed-over projects we inherit are missing half of it.
- Automated backups of the database, tested restore procedure, documented retention policy.
- Secret management (env vars through a secret store, not checked into the repo).
- CI/CD pipeline that runs tests on every PR and blocks merges on failures.
- Staging environment that mirrors production and is backed by a separate database.
- Error tracking (Sentry, Rollbar, or equivalent) with alerts to a channel a human watches.
- Uptime monitoring on the public URL with alert escalation.
- Documented runbook for the top 5 incident classes (DB down, API rate-limited, deploy failed, queue backed up, credentials rotated).
- Security headers (CSP, HSTS, X-Frame-Options, Permissions-Policy), HTTPS-only, rate limiting on public endpoints.
- A plan for what happens when the lead engineer gets hit by a bus. Usually: a one-page architecture doc, access to all third-party accounts in a shared password manager, and at least one other person who has deployed the app successfully.
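Of the checklist items, rate limiting on public endpoints is the one most often left as a TODO. A token bucket is one common way to do it. The sketch below is a single-process version with a hypothetical TokenBucket class. Across multiple app instances the counters would need to live in a shared store such as Redis, which is not shown here.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(
        self,
        rate_per_sec: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so the first burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, otherwise False."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never above capacity.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

Keep one bucket per client key (IP address or API key) and answer with HTTP 429 when allow() returns False.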
How we approach platform work at PIXIPACE
Our typical engagement for a custom platform is 4–8 weeks of fixed-scope work. Week 1 is discovery — we write a one-page architecture document, sketch the data model, identify the three biggest technical risks, and agree on the non-goals (what we're not shipping). Weeks 2–7 are build, with demos every Friday and continuous deploys to staging. Week 8 is hardening: load testing, observability setup, documentation, handover.
We use the stack that fits the problem: React or Next.js on the frontend, TypeScript end-to-end, Node or Python for the API layer, Postgres or Firestore for data depending on access patterns. Hosting on Vercel or Firebase by default — we self-host only when the cost or control requirements make it necessary.
We're opinionated about what we don't use. No Kubernetes unless you're already running it. No microservices unless the organizational pressure demands them. No custom auth when Clerk, Auth0, or Supabase Auth will do. No hand-rolled job queues when BullMQ or Cloud Tasks exist. The engineering discipline is knowing what not to build.
Every platform we ship comes with a one-page architecture doc, a video walkthrough of the code, and credentials handover on day one of launch. The person who takes over after us should be able to pick up where we left off without a month of archaeology.