Cloud SQL PostgreSQL 18 Enterprise Plus: A Complete Setup Guide
When we decided to self-host Supabase on GCP, the database choice was straightforward: Cloud SQL with PostgreSQL 18 on the Enterprise Plus tier.
This post covers everything we learned setting it up — the configuration decisions, the gotchas, and the features that make Enterprise Plus worth considering.
Why Enterprise Plus?
GCP offers two Cloud SQL editions:
| Feature | Enterprise | Enterprise Plus |
|---|---|---|
| Availability SLA | 99.95% | 99.99% |
| Max instance size | 96 vCPUs / 624 GB RAM | 128 vCPUs / 864 GB RAM |
| Data Cache | ❌ | ✅ |
| Point-in-time recovery | ✅ | ✅ |
| Read replicas | ✅ | ✅ |
| Private networking | ✅ | ✅ |
The killer feature for us was Data Cache.
What is Data Cache?
Data Cache is an in-memory caching layer that sits in front of your database storage. It automatically caches frequently accessed data, reducing disk I/O and improving query latency.
Think of it as a built-in Redis layer, but:
- You don't manage it
- It's automatically populated based on access patterns
- It's included in the Enterprise Plus pricing
- It works transparently with your existing queries
For read-heavy workloads (which most web applications are), this is a significant performance boost without any code changes.
Our database configuration
Here's how we configure Cloud SQL in Pulumi:
```typescript
this.instance = new gcp.sql.DatabaseInstance(
`${resourceName}-db`,
{
name: `${resourceName}-db`,
databaseVersion: "POSTGRES_18",
region: region,
deletionProtection: deletionProtection,
settings: {
tier: tier,
edition: "ENTERPRISE_PLUS",
availabilityType: availabilityType,
// Storage
diskSize: diskSize,
diskType: "PD_SSD",
diskAutoresize: true,
diskAutoresizeLimit: 1000,
// Data Cache
dataCacheConfig: dataCacheEnabled ? { dataCacheEnabled: true } : undefined,
// Private IP
ipConfiguration: {
ipv4Enabled: false,
privateNetwork: args.network.id,
enablePrivatePathForGoogleCloudServices: true,
},
// Backups
backupConfiguration: {
enabled: true,
startTime: "03:00",
pointInTimeRecoveryEnabled: true,
transactionLogRetentionDays: 7,
backupRetentionSettings: {
retainedBackups: backupRetainedDays,
retentionUnit: "COUNT",
},
},
// Maintenance
maintenanceWindow: {
day: 7, // Sunday
hour: 4, // 4 AM UTC
updateTrack: "stable",
},
// Database flags
databaseFlags: [
{ name: "max_connections", value: maxConnections.toString() },
{ name: "cloudsql.iam_authentication", value: "on" },
],
// Query monitoring
insightsConfig: {
queryInsightsEnabled: true,
queryPlansPerMinute: 5,
queryStringLength: 1024,
recordApplicationTags: true,
recordClientAddress: true,
},
},
},
{
parent: this,
dependsOn: [args.privateVpcConnection],
},
);
```
Let's break down each section.
Instance tiers and sizing
Cloud SQL uses a tiered naming convention for machine types. For Enterprise Plus, the performance-optimized tiers are what you want:
```
db-perf-optimized-N-{vCPUs}
```
Common configurations:
| Tier | vCPUs | Memory | Use Case |
|---|---|---|---|
| db-perf-optimized-N-2 | 2 | 16 GB | Small dev environments |
| db-perf-optimized-N-4 | 4 | 32 GB | Dev/staging, light production |
| db-perf-optimized-N-8 | 8 | 64 GB | Production workloads |
| db-perf-optimized-N-16 | 16 | 128 GB | High-traffic applications |
| db-perf-optimized-N-32 | 32 | 256 GB | Enterprise workloads |
We use db-perf-optimized-N-4 for development (32 GB RAM is plenty for testing) and db-perf-optimized-N-8 for production.
The memory matters
With Enterprise Plus and Data Cache enabled, more memory means more cache. If your working dataset fits in memory, you'll see dramatically lower query latencies.
Storage configuration
```typescript
diskSize: diskSize,
diskType: "PD_SSD",
diskAutoresize: true,
diskAutoresizeLimit: 1000,
```
SSD storage
Always use PD_SSD. The IOPS and latency improvements over standard persistent disk are significant for database workloads. The cost difference is minor compared to the performance gain.
Auto-resize
Enable diskAutoresize to prevent running out of space. The database will automatically expand storage when it reaches 90% capacity.
We set diskAutoresizeLimit: 1000 (1 TB) as a safety cap. This prevents runaway costs if something goes wrong with data growth.
Warning: Disk size can only increase, never decrease. Start with a reasonable minimum and let it grow.
Availability and redundancy
```typescript
availabilityType: "REGIONAL", // or "ZONAL"
```
ZONAL (Single-zone)
- Instance runs in one zone
- Cheaper
- Downtime during zone failures
- Good for: Development, non-critical workloads
REGIONAL (Multi-zone)
- Primary instance + standby in different zones
- Automatic failover (usually under 60 seconds)
- Required for 99.99% SLA
- Good for: Production workloads
We use ZONAL for dev (to save costs) and REGIONAL for prod (to maximize availability).
Private networking
This is crucial for security. We never expose the database to the public internet.
```typescript
ipConfiguration: {
ipv4Enabled: false,
privateNetwork: args.network.id,
enablePrivatePathForGoogleCloudServices: true,
},
```
How it works
- VPC Peering: Cloud SQL creates a peering connection to your VPC
- Private IP: The database gets an internal IP address in your network
- No public exposure: ipv4Enabled: false ensures no public IP is assigned
The setup requirement
Before creating a Cloud SQL instance with private IP, you need to establish the private services connection:
```typescript
// Reserve an IP range for Cloud SQL
this.privateIpRange = new gcp.compute.GlobalAddress(`${resourceName}-private-ip-range`, {
purpose: "VPC_PEERING",
addressType: "INTERNAL",
prefixLength: 16,
network: this.vpc.id,
});
// Create the peering connection
this.privateVpcConnection = new gcp.servicenetworking.Connection(`${resourceName}-private-vpc-connection`, {
network: this.vpc.id,
service: "servicenetworking.googleapis.com",
reservedPeeringRanges: [this.privateIpRange.name],
});
```
The key detail: Cloud SQL depends on this connection. In Pulumi, we express this explicitly:
```typescript
{
dependsOn: [args.privateVpcConnection],
}
```
Without this dependency, Pulumi might try to create the database before the VPC peering is ready, and you'll get cryptic networking errors.
Private path for Google services
```typescript
enablePrivatePathForGoogleCloudServices: true,
```
This allows other Google services (like Cloud Run and GKE) to access the database through private networking without going through the public internet.
Backup configuration
Backups are non-negotiable for production databases.
```typescript
backupConfiguration: {
enabled: true,
startTime: "03:00", // 3 AM UTC
pointInTimeRecoveryEnabled: true,
transactionLogRetentionDays: 7,
backupRetentionSettings: {
retainedBackups: 35,
retentionUnit: "COUNT",
},
},
```
Daily automated backups
startTime: "03:00" schedules backups at 3 AM UTC. Choose a time when your database has low activity.
Point-in-time recovery (PITR)
When enabled, Cloud SQL continuously writes transaction logs to storage. You can restore to any point within the retention window — not just the daily backup.
transactionLogRetentionDays: 7 means you can restore to any moment in the last week.
Backup retention
We keep 35 daily backups (about 5 weeks). This gives us enough history to recover from issues that aren't noticed immediately.
Tip: Test your restore process regularly. Backups are worthless if you can't restore from them.
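One low-friction way to do that is to clone the instance to a point in time and run your checks against the copy. Here's a minimal sketch, assuming the pulumi-gcp provider's clone support on DatabaseInstance (the timestamp and names are illustrative; verify the field names against your provider version):
```typescript
// Hypothetical restore drill: clone the primary into a throwaway instance
// at a specific point in time, verify the data, then delete the clone.
const restoreDrill = new gcp.sql.DatabaseInstance(`${resourceName}-restore-drill`, {
  name: `${resourceName}-restore-drill`,
  region: region,
  clone: {
    sourceInstanceName: this.instance.name,
    pointInTime: "2025-06-01T02:30:00Z", // any moment inside the PITR window
  },
  deletionProtection: false, // the clone is disposable
});
```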
Maintenance windows
```typescript
maintenanceWindow: {
day: 7, // Sunday
hour: 4, // 4 AM UTC
updateTrack: "stable",
},
```
GCP applies security patches and updates during maintenance windows. By specifying day and hour, you control when this happens.
We choose Sunday at 4 AM UTC — lowest traffic for most business applications.
Update tracks
- stable: receives updates after they've been tested in production by other users
- canary: receives updates earlier (useful for testing)
Stick with stable for production.
Database flags
```typescript
databaseFlags: [
{ name: "max_connections", value: maxConnections.toString() },
{ name: "cloudsql.iam_authentication", value: "on" },
],
```
max_connections
The default is often too low for production. We set this to 200, but adjust based on your needs:
- Connection poolers like PgBouncer can reduce the connections you need
- Each connection uses memory, so don't set this arbitrarily high
- Monitor pg_stat_activity to see actual usage
IAM authentication
cloudsql.iam_authentication enables authentication using GCP IAM users and service accounts. This is useful for:
- Service accounts connecting from Cloud Run, GKE, etc.
- Developers connecting via Cloud SQL Auth Proxy
- Reducing password management overhead
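If you go this route, the database-side user is one more resource. A sketch using the pulumi-gcp gcp.sql.User resource with a hypothetical service account (Cloud SQL expects the email without the .gserviceaccount.com suffix):
```typescript
// IAM database user for a service account; requires the
// cloudsql.iam_authentication flag shown above.
const iamDbUser = new gcp.sql.User(
  `${resourceName}-iam-user`,
  {
    instance: this.instance.name,
    // "my-app@my-project.iam.gserviceaccount.com" with the suffix dropped:
    name: "my-app@my-project.iam",
    type: "CLOUD_IAM_SERVICE_ACCOUNT",
  },
  { parent: this },
);
```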
Query Insights
```typescript
insightsConfig: {
queryInsightsEnabled: true,
queryPlansPerMinute: 5,
queryStringLength: 1024,
recordApplicationTags: true,
recordClientAddress: true,
},
```
Query Insights is free and provides invaluable debugging information:
- Top queries by CPU, latency, and IO
- Query plans for slow queries
- Historical trends
We enable all options. The overhead is minimal, and the visibility is worth it.
Creating the database and user
The Cloud SQL instance is just the server. You also need a database and user:
```typescript
this.database = new gcp.sql.Database(
`${resourceName}-database`,
{
name: args.databaseName,
instance: this.instance.name,
charset: "UTF8",
collation: "en_US.UTF8",
},
{ parent: this },
);
this.user = new gcp.sql.User(
`${resourceName}-db-user`,
{
name: "postgres",
instance: this.instance.name,
password: args.password,
},
{ parent: this },
);
```
Character set and collation
UTF8 with en_US.UTF8 collation is the safe default for most applications. If you have specific locale requirements, adjust accordingly.
The postgres user
We use the default postgres superuser. For enhanced security, you could:
- Create a separate application user with limited privileges
- Disable the postgres user after setup
- Use IAM authentication for application access
For now, password authentication with secrets management (via Infisical) works well for our use case.
Environment presets
Rather than duplicating configuration, we use presets:
```typescript
export const DatabasePresets = {
dev: {
tier: "db-perf-optimized-N-4",
edition: "ENTERPRISE_PLUS",
availabilityType: "ZONAL" as const,
diskSize: 250,
dataCacheEnabled: true,
deletionProtection: false,
backupRetainedDays: 35,
},
prod: {
tier: "db-perf-optimized-N-8",
edition: "ENTERPRISE_PLUS",
availabilityType: "REGIONAL" as const,
diskSize: 250,
dataCacheEnabled: true,
deletionProtection: true,
backupRetainedDays: 35,
},
};
```
Key differences:
| Setting | Dev | Prod | Why |
|---|---|---|---|
| Tier | N-4 (4 vCPU) | N-8 (8 vCPU) | Prod handles more load |
| Availability | ZONAL | REGIONAL | Prod needs HA |
| Deletion protection | Off | On | Prevent accidental deletion in prod |
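For completeness, this is roughly how a preset gets wired in; the CloudSqlDatabase component name and its argument shape are illustrative, not verbatim from our codebase:
```typescript
// Illustrative wiring: pick a preset from the current stack and spread it
// into the database component's arguments.
const env = pulumi.getStack() === "prod" ? "prod" : "dev";

const db = new CloudSqlDatabase("app-db", {
  ...DatabasePresets[env],
  region: "europe-west1",   // example region
  network: vpc,             // the VPC from the networking stack
  privateVpcConnection,     // the service networking connection from earlier
  databaseName: "app",
  password: dbPassword,     // sourced from your secrets manager
});
```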
Connecting to the database
From within GCP
Applications running in the same VPC can connect directly using the private IP:
```typescript
const connectionString = pulumi.interpolate`postgresql://postgres:${password}@${database.privateIp}:5432/${dbName}`;
```
From outside GCP (development)
For local development, use the Cloud SQL Auth Proxy:
```bash
# Install the proxy
gcloud components install cloud-sql-proxy
# Connect (this creates a local socket)
cloud-sql-proxy --port 5432 PROJECT:REGION:INSTANCE_NAME
```
Then connect as if the database were local:
```bash
psql -h localhost -U postgres -d mydb
```
Via VPN
Our setup includes a VPN, so developers can:
- Connect to the VPN
- Access the database directly via private IP
This is convenient for tools that don't support Cloud SQL Auth Proxy.
Common gotchas
1. VPC peering takes time
The private VPC connection can take 5-10 minutes to establish. If your pulumi up seems stuck on the database creation, this is likely why.
2. Connection limits with serverless
Cloud Run and similar services can create many connections quickly. Use a connection pooler (PgBouncer, Prisma's built-in pooling) or Cloud SQL's built-in connection pooling.
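For Node services, keeping the per-instance pool small goes a long way. A sketch using node-postgres; the DATABASE_URL variable and the limits are assumptions to adapt to your workload:
```typescript
import { Pool } from "pg";

// A deliberately small pool: many Cloud Run instances multiplied by a large
// pool exhausts max_connections quickly.
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 5,                      // a few connections per instance is usually enough
  idleTimeoutMillis: 30_000,   // release idle connections promptly
  connectionTimeoutMillis: 5_000,
});

export async function query<T>(text: string, params: unknown[] = []): Promise<T[]> {
  const result = await pool.query(text, params);
  return result.rows as T[];
}
```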
3. Disk can only grow
Once you increase disk size, you can't decrease it. Start smaller and let auto-resize handle growth.
4. Maintenance can cause brief downtime
Even with REGIONAL availability, maintenance updates can cause brief connection interruptions (typically a few seconds). Design your application to handle database reconnection gracefully.
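In practice, "gracefully" mostly means retrying with backoff around transient connection errors. A minimal sketch (tune the attempt count, delays, and which errors you treat as retryable):
```typescript
// Retry a database operation a few times with exponential backoff.
// Intended for transient connection errors around failovers and maintenance.
async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // 250 ms, 500 ms, 1 s, ...
      await new Promise((resolve) => setTimeout(resolve, 250 * 2 ** i));
    }
  }
  throw lastError;
}

// Usage: await withRetry(() => pool.query("SELECT 1"));
```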
5. Backups aren't free
Backup storage costs money. 35 daily backups of a large database adds up. Monitor your storage costs.
Monitoring and alerts
Beyond Query Insights, set up alerts for:
- CPU utilization > 80% sustained
- Memory utilization > 90%
- Disk utilization > 80%
- Connection count approaching max_connections
- Replication lag (if using read replicas)
GCP Cloud Monitoring makes this straightforward.
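If your alerting lives in Pulumi as well, each item above is a small alert policy. Here's a sketch for the first one (sustained CPU above 80%), assuming the pulumi-gcp gcp.monitoring.AlertPolicy resource; the other metrics follow the same pattern with a different metric.type filter:
```typescript
// Alert when Cloud SQL CPU utilization stays above 80% for 10 minutes.
const cpuAlert = new gcp.monitoring.AlertPolicy("db-cpu-high", {
  displayName: "Cloud SQL CPU > 80%",
  combiner: "OR",
  conditions: [
    {
      displayName: "CPU utilization above 80% for 10 minutes",
      conditionThreshold: {
        filter:
          'resource.type = "cloudsql_database" AND ' +
          'metric.type = "cloudsql.googleapis.com/database/cpu/utilization"',
        comparison: "COMPARISON_GT",
        thresholdValue: 0.8,
        duration: "600s",
        aggregations: [{ alignmentPeriod: "60s", perSeriesAligner: "ALIGN_MEAN" }],
      },
    },
  ],
  notificationChannels: [], // add your notification channel IDs here
});
```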
Next steps
In upcoming posts, we'll cover:
- Read replicas: Setting up cross-region replicas for global deployments
- Performance tuning: PostgreSQL configuration for specific workloads
- Monitoring deep dive: Setting up comprehensive observability
Need help with your database architecture? Get in touch — we've deployed and optimized databases for applications at every scale.
