Multi-Region PostgreSQL on GCP: Our Plan for Global Scale
This is the final post in our series on self-hosting Supabase on GCP. Throughout this series, we've covered:
- Why we self-host
- Infrastructure as Code with Pulumi
- Cloud SQL Enterprise Plus
- Security with VPN & Cloud Armor
- Auto-scaling Supabase Studio
Now we're looking ahead. Our client's application is growing internationally, and users in the US and Australia are experiencing higher latency than those in Europe (where our primary database lives).
This post outlines our plan for adding read replicas across regions. We haven't implemented this yet, but we're sharing our research and architecture decisions.
The problem: Latency across continents
Our primary database is in europe-west2 (London). Here's the typical latency for a database query:
| User Location | Round-trip Latency |
|---|---|
| London | ~5ms |
| New York | ~80ms |
| Sydney | ~280ms |
For a typical page load that runs 5-10 database queries in sequence, that's 1.4s+ of database latency alone for Australian users (5 × ~280ms). Not great.
The solution: Read replicas
Cloud SQL supports cross-region read replicas. The architecture:
            ┌─────────────────────────────┐
            │         Primary DB          │
            │       (europe-west2)        │
            │       Writes + Reads        │
            └──────────────┬──────────────┘
                           │
         ┌─────────────────┼─────────────────────┐
         │                 │                     │
         ▼                 ▼                     ▼
  ┌──────────────┐  ┌──────────────┐  ┌──────────────────────┐
  │   Replica    │  │   Replica    │  │       Replica        │
  │  (us-east1)  │  │  (us-west1)  │  │(australia-southeast1)│
  │    Reads     │  │    Reads     │  │        Reads         │
  └──────────────┘  └──────────────┘  └──────────────────────┘
Key points:
- Writes go to primary — All mutations hit the primary in London
- Reads can go to replicas — Queries can use the nearest replica
- Automatic replication — Cloud SQL replicates asynchronously, typically with under 1 second of lag
Our planned regions
Based on our user distribution:
| Region | Location | Purpose |
|---|---|---|
| europe-west2 | London | Primary (writes + reads) |
| us-east1 | South Carolina | Replica (US East Coast) |
| australia-southeast1 | Sydney | Replica (APAC) |
We might add us-west1 (Oregon) later if West Coast latency becomes an issue.
Infrastructure changes
Read replica resource
In Pulumi, read replicas are separate DatabaseInstance resources that reference the primary:
// Pseudocode - not yet implemented
const usEastReplica = new gcp.sql.DatabaseInstance(`${resourceName}-replica-us-east`, {
name: `${resourceName}-replica-us-east`,
masterInstanceName: primaryInstance.name,
region: "us-east1",
databaseVersion: "POSTGRES_18",
replicaConfiguration: {
failoverTarget: false, // Not for automatic failover
},
settings: {
tier: "db-perf-optimized-N-4", // Can be smaller than primary
edition: "ENTERPRISE_PLUS",
availabilityType: "ZONAL", // Replicas don't need regional HA
ipConfiguration: {
ipv4Enabled: false,
privateNetwork: usEastVpc.id,
},
dataCacheConfig: {
dataCacheEnabled: true, // Data Cache helps replicas too
},
},
});

Cross-region networking
Each replica needs private connectivity back to the primary and the application. Options:
- VPC Peering — Connect separate per-region VPCs to each other
- Shared/global VPC — GCP VPC networks are global (subnets are regional), so one network with a subnet per region covers every replica
- Private Service Connect — GCP-managed private connectivity
We're leaning toward a single global VPC for simplicity. One network with subnets in each region keeps the routing straightforward, as sketched below.
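A minimal Pulumi sketch of that layout (resource names and CIDR ranges here are illustrative, not from our actual stack):
import * as gcp from "@pulumi/gcp";

// One global VPC; we manage the per-region subnets ourselves
const dbNetwork = new gcp.compute.Network("db-network", {
  autoCreateSubnetworks: false,
});

const usEastSubnet = new gcp.compute.Subnetwork("db-subnet-us-east", {
  network: dbNetwork.id,
  region: "us-east1",
  ipCidrRange: "10.10.0.0/20", // illustrative range
});

const ausSubnet = new gcp.compute.Subnetwork("db-subnet-aus", {
  network: dbNetwork.id,
  region: "australia-southeast1",
  ipCidrRange: "10.20.0.0/20", // illustrative range
});
Note that Cloud SQL private IP also requires a private services access (service networking) connection, but that's configured once per VPC network, not per region.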
Replica sizing
Replicas can be smaller than the primary:
- Primary: db-perf-optimized-N-8 (8 vCPU, 64 GB)
- Replicas: db-perf-optimized-N-4 (4 vCPU, 32 GB)
Read replicas handle less load (no writes), and we can always scale up if needed.
Application layer: Routing reads to replicas
This is where it gets interesting. The application needs to:
- Route writes to the primary
- Route reads to the nearest replica
- Handle replication lag gracefully
Option 1: Prisma's read replica support
Prisma (our ORM of choice) supports read replicas via its first-party @prisma/extension-read-replicas client extension (configured on the client, not in schema.prisma):
import { PrismaClient } from "@prisma/client";
import { readReplicas } from "@prisma/extension-read-replicas";

// The base client's DATABASE_URL points at the primary
const prisma = new PrismaClient().$extends(
  readReplicas({
    url: [
      process.env.DATABASE_URL_US_EAST!,
      process.env.DATABASE_URL_AUSTRALIA!,
    ],
  })
);

// Usage in application code
const users = await prisma.user.findMany(); // Read: routed to a replica
const newUser = await prisma.user.create({
  // Writes always go to the primary
  data: { name: "Alice" },
});

The extension handles:
- Automatic routing: reads go to a replica, writes go to the primary
- A separate client (and connection pool) per replica URL
- An escape hatch ($primary()) for reads that must hit the primary
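That escape hatch looks like this; we'll lean on it for the read-after-write cases below:
// Force a read against the primary (e.g. right after a write)
const freshPosts = await prisma.$primary().post.findMany({
  where: { authorId: userId },
});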
Option 2: Manual routing with connection selection
For more control, we could manually select connections:
const primaryDb = new PrismaClient({ datasources: { db: { url: PRIMARY_URL } } });
const replicaDb = new PrismaClient({ datasources: { db: { url: REPLICA_URL } } });
// Write
await primaryDb.user.create({ data: { name: "Alice" } });
// Read
const users = await replicaDb.user.findMany();

More verbose but gives explicit control.
Option 3: Smart routing based on user location
For APIs, we could route based on the incoming request's region:
function getDatabaseClient(request: Request) {
const region = getRequestRegion(request); // From CF-IPCountry header, etc.
switch (region) {
case "US":
return usEastPrisma;
case "AU":
return australiaPrisma;
default:
return primaryPrisma;
}
}

This ensures users hit the closest replica.
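getRequestRegion isn't shown above; a hypothetical version based on Cloudflare's CF-IPCountry header (set when IP geolocation is enabled) might look like:
// Hypothetical region lookup; the country groupings are our own choice
function getRequestRegion(request: Request): string {
  const country = request.headers.get("CF-IPCountry") ?? "";
  if (["US", "CA", "MX"].includes(country)) return "US";
  if (["AU", "NZ"].includes(country)) return "AU";
  return "EU"; // default to the primary region
}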
Handling replication lag
Read replicas have some delay (typically under 1 second, but can be more under load). This matters for "read-after-write" scenarios:
// User creates a post
await prisma.post.create({ data: { title: "Hello" } });
// Immediately fetch their posts
const posts = await prisma.post.findMany({ where: { authorId: userId } });
// ⚠️ New post might not be visible yet if reading from a replica

Solutions
1. Read-your-writes consistency
After a write, temporarily route that user's reads to the primary:
async function createPost(userId: string, data: PostData) {
await primaryDb.post.create({ data });
// Set a short TTL flag
await cache.set(`read-primary:${userId}`, true, { ttl: 5 }); // 5 seconds
}
async function getPosts(userId: string) {
const shouldReadPrimary = await cache.get(`read-primary:${userId}`);
const db = shouldReadPrimary ? primaryDb : replicaDb;
return db.post.findMany({ where: { authorId: userId } });
}

2. Explicit primary reads for sensitive operations
Some queries should always hit primary:
// Dashboard showing user's just-created data
const myPosts = await primaryDb.post.findMany({
where: { authorId: currentUser.id },
});
// Public listing can use replica (slightly stale is fine)
const publicPosts = await replicaDb.post.findMany({
where: { published: true },
take: 20,
});

3. Accept eventual consistency
For many use cases, a 1-second delay is fine:
- Product listings
- Blog posts
- Search results
- Analytics dashboards
Document which operations tolerate staleness and which don't.
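One lightweight way to document it is in code, as a policy map the data layer consults (names here are illustrative):
type ReadSource = "primary" | "replica";

// Which query families tolerate ~1s of staleness
const readPolicy: Record<string, ReadSource> = {
  productListing: "replica",
  blogPosts: "replica",
  searchResults: "replica",
  analyticsDashboard: "replica",
  justCreatedContent: "primary", // read-after-write, must be fresh
};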
Monitoring replication lag
Cloud SQL exposes replication lag metrics. We'd set up alerts for:
- Lag exceeds 5 seconds — Warning, investigate
- Lag exceeds 30 seconds — Critical, potential replication issues
- Replica unavailable — Failover reads to primary
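The lag itself can also be measured directly on each replica. Here's a sketch of the getReplicationLag helper the health check below assumes, using Prisma's raw query API and Postgres's pg_last_xact_replay_timestamp():
import { PrismaClient } from "@prisma/client";

// pg_last_xact_replay_timestamp() is the commit time of the last transaction
// replayed on a standby; now() minus that approximates replication lag.
// Caveat: during write-idle periods this grows even when the replica is caught up.
async function getReplicationLag(replica: PrismaClient): Promise<number> {
  const rows = await replica.$queryRaw<Array<{ lag: number | null }>>`
    SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::float AS lag
  `;
  // NULL means nothing has been replayed yet (or we're not on a standby)
  return rows[0]?.lag ?? 0;
}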
// Pseudocode for a periodic health check, using the lag helper above
async function checkReplicaHealth(replica: PrismaClient): Promise<boolean> {
  const lag = await getReplicationLag(replica);
  if (lag > 30) {
    logger.error(`Replica lag critical: ${lag}s`);
    metrics.increment("replica.lag.critical");
    return false;
  }
  return true;
}

Cost considerations
Adding replicas increases costs:
| Component | Monthly Cost (est.) |
|---|---|
| Primary (N-8) | $800-1000 |
| US East Replica (N-4) | $400-500 |
| Australia Replica (N-4) | $400-500 |
| Cross-region egress | $50-100 |
Total: roughly 2x the database cost. But if latency is causing user churn, it's worth it.
Optimizing costs
- Right-size replicas — Start with N-2, scale up as needed
- Consider cascading replicas — US West replicates from US East, not primary
- Pause unused replicas — If Australia traffic is low at night, stop the replica
Deployment considerations
DNS and service discovery
Each replica needs to be discoverable:
// Environment variables per region
DATABASE_URL_PRIMARY=postgresql://primary.internal:5432/db
DATABASE_URL_US_EAST=postgresql://replica-us-east.internal:5432/db
DATABASE_URL_AUSTRALIA=postgresql://replica-aus.internal:5432/db

Connection pooling
Each replica should have its own connection pool. With Prisma, this happens automatically. With other tools, configure PgBouncer per replica.
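With Prisma, each pool's size can also be capped per replica through the standard connection_limit connection-string parameter:
// Cap this replica's pool at 10 connections
const usEastReplicaDb = new PrismaClient({
  datasources: {
    db: { url: `${process.env.DATABASE_URL_US_EAST}?connection_limit=10` },
  },
});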
Failover strategy
If a replica fails:
- Route reads to primary temporarily
- Alert on-call
- Investigate and restore replica
Replicas are for performance, not availability. The primary handles all traffic if replicas are down.
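A minimal sketch of that first step, reusing checkReplicaHealth from the monitoring section and the Option 2 clients (in practice we'd cache the health result rather than check per query):
// Degrade reads to the primary rather than failing the request
async function readDb(): Promise<PrismaClient> {
  const healthy = await checkReplicaHealth(replicaDb);
  return healthy ? replicaDb : primaryDb;
}

const posts = await (await readDb()).post.findMany({ take: 20 });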
What we're still figuring out
Edge deployments
With replicas in multiple regions, should we also deploy the application in those regions? Options:
- Single-region app — App in Europe, queries cross-region to replicas (still adds network hop)
- Multi-region app — App instances near each replica (lower latency, more complexity)
- Edge functions — Cloudflare Workers/Vercel Edge connecting to nearest replica
We're researching the tradeoffs.
Prisma Data Proxy
Prisma offers a Data Proxy that handles connection pooling and could simplify multi-region routing. We're evaluating if it's worth the additional dependency.
Write latency for remote users
Even with read replicas, writes still go to London. An Australian user creating content faces ~280ms latency. Options:
- Accept it — Writes are less frequent than reads
- Async writes — Queue locally, process async
- Multi-primary — Complex, usually not worth it for our scale
Timeline
We're planning to implement this in phases:
- Research complete (now) — Architecture decisions made
- Proof of concept — Single replica in us-east1
- Production rollout — Add replicas, update application
- Monitoring — Observe latency improvements, tune as needed
We'll update this post with implementation details once we've built it.
Key takeaways
If you're considering multi-region PostgreSQL:
- Measure first — Is latency actually your bottleneck?
- Start with one replica — Add complexity incrementally
- Plan for consistency — Read-after-write needs special handling
- Use built-in tooling — Prisma, PgBouncer, etc. handle the hard parts
- Monitor replication lag — It's the key metric for replica health
Multi-region databases aren't simple, but for global applications, they're often necessary. The key is understanding the tradeoffs and implementing incrementally.
Building a global application and need help with multi-region architecture? Get in touch — we love solving these kinds of infrastructure challenges.
