Tags: Infrastructure, GCP, Supabase, Auto-scaling, Docker

Auto-Scaling Supabase Studio on GCP with Managed Instance Groups

How we deploy Supabase Studio on GCP with auto-scaling, auto-healing, and zero-downtime updates using Managed Instance Groups and Container-Optimized OS.

January 9, 2026
10 min read
Pulore Team

In previous posts, we covered the database, networking, and security layers of our self-hosted Supabase setup. Now let's talk about the application layer: how we deploy and scale Supabase Studio.

Our goals:

  • Auto-scaling — Handle traffic spikes without manual intervention
  • Auto-healing — Replace unhealthy instances automatically
  • Zero-downtime updates — Deploy new versions without interruption
  • Cost efficiency — Scale down when traffic is low

What is Supabase Studio?

Supabase Studio is the web-based admin interface for managing your Supabase/PostgreSQL database. It includes:

  • SQL Editor — Run queries directly
  • Table Editor — Visual database management
  • API Documentation — Auto-generated from your schema
  • Auth Management — User administration

It's open source and runs as a Docker container, making it perfect for self-hosting.

The architecture

┌─────────────────────────────────────────────────────┐
│                 Global Load Balancer                │
│                 (SSL Termination)                   │
└───────────────────────┬─────────────────────────────┘
                        │
┌───────────────────────▼─────────────────────────────┐
│              Regional Instance Group                 │
│    ┌──────────────┐    ┌──────────────┐            │
│    │   Zone A     │    │   Zone B     │            │
│    │  ┌────────┐  │    │  ┌────────┐  │            │
│    │  │   VM   │  │    │  │   VM   │  │            │
│    │  │ Studio │  │    │  │ Studio │  │            │
│    │  └────────┘  │    │  └────────┘  │            │
│    └──────────────┘    └──────────────┘            │
└─────────────────────────────────────────────────────┘
                        │
                        ▼
┌─────────────────────────────────────────────────────┐
│              Cloud SQL (Private IP)                  │
└─────────────────────────────────────────────────────┘

Instance Template

The foundation of auto-scaling is the instance template. It defines what each VM looks like:

this.instanceTemplate = new gcp.compute.InstanceTemplate(
  `${resourceName}-template`,
  {
    namePrefix: `${resourceName}-template-`,
    machineType: machineType,
    region: region,
    tags: ["http-server", "allow-ssh"],
 
    disks: [
      {
        sourceImage: "projects/cos-cloud/global/images/family/cos-stable",
        autoDelete: true,
        boot: true,
        diskSizeGb: 50,
        diskType: "hyperdisk-balanced",
      },
    ],
 
    networkInterfaces: [
      {
        network: args.networking.vpc.id,
        subnetwork: args.networking.subnet.id,
        // No external IP - instances are private
      },
    ],
 
    serviceAccount: {
      email: this.serviceAccount.email,
      scopes: ["cloud-platform"],
    },
 
    metadataStartupScript: startupScript,
 
    metadata: {
      "google-logging-enabled": "true",
      "google-monitoring-enabled": "true",
    },
 
    shieldedInstanceConfig: {
      enableSecureBoot: false, // COS needs this off for Docker
      enableVtpm: true,
      enableIntegrityMonitoring: true,
    },
  },
  {
    parent: this,
    dependsOn: [args.database.instance, args.database.database, args.database.user],
  },
);

Key decisions

Container-Optimized OS

We use cos-stable (Container-Optimized OS). It's:

  • Minimal and secure
  • Optimized for running Docker
  • Auto-updates the OS
  • Maintained by Google

No external IP

Instances don't have public IPs. They access the internet through Cloud NAT (for pulling Docker images) and are accessed through the load balancer.
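
For reference, the Cloud NAT piece (covered in the networking post) boils down to roughly the following — a minimal standalone sketch with placeholder names, not our actual networking component:

import * as gcp from "@pulumi/gcp";

// Standalone sketch: in our stack the VPC and region come from the networking component.
const region = "us-central1";
const vpc = new gcp.compute.Network("studio-vpc", { autoCreateSubnetworks: false });

// Cloud NAT needs a Cloud Router on the same VPC and region.
const router = new gcp.compute.Router("studio-router", {
  region: region,
  network: vpc.id,
});

// NAT gives the private instances outbound access (Docker image pulls, OS updates)
// without assigning them external IPs.
new gcp.compute.RouterNat("studio-nat", {
  region: region,
  router: router.name,
  natIpAllocateOption: "AUTO_ONLY",
  sourceSubnetworkIpRangesToNat: "ALL_SUBNETWORKS_ALL_IP_RANGES",
});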

Hyperdisk Balanced

For the boot disk, we use hyperdisk-balanced. It offers better IOPS than standard persistent disk, which helps with container startup times.

Dependencies

The template depends on the database being ready. No point starting Studio if it can't connect to PostgreSQL.

The startup script

This is where the magic happens. When a VM boots, it runs this script to configure and start Supabase Studio:

#!/bin/bash
set -euo pipefail
 
exec > >(tee /var/log/supabase-startup.log) 2>&1
echo "Starting Supabase Studio setup at $(date)"
 
# Wait for Docker
for i in {1..30}; do
    if docker info > /dev/null 2>&1; then
        break
    fi
    echo "Waiting for Docker... ($i/30)"
    sleep 2
done
 
# Environment variables (replaced at deploy time)
DB_HOST="__DB_HOST__"
DB_NAME="__DB_NAME__"
DB_PASSWORD="__DB_PASSWORD__"
AUTH_USER="__AUTH_USER__"
AUTH_PASSWORD="__AUTH_PASSWORD__"
ANON_KEY="__ANON_KEY__"
SERVICE_ROLE_KEY="__SERVICE_ROLE_KEY__"
JWT_SECRET="__JWT_SECRET__"
 
# Create network for containers
docker network create supabase-network 2>/dev/null || true
 
# Start postgres-meta (API for database introspection)
docker run -d \
    --name postgres-meta \
    --restart always \
    --network supabase-network \
    -p 127.0.0.1:8080:8080 \
    -e "PG_META_HOST=0.0.0.0" \
    -e "PG_META_PORT=8080" \
    -e "PG_META_DB_HOST=${DB_HOST}" \
    -e "PG_META_DB_PORT=5432" \
    -e "PG_META_DB_NAME=${DB_NAME}" \
    -e "PG_META_DB_USER=postgres" \
    -e "PG_META_DB_PASSWORD=${DB_PASSWORD}" \
    supabase/postgres-meta:latest
 
# Wait for postgres-meta
for i in {1..30}; do
    if curl -s http://localhost:8080/health > /dev/null; then
        break
    fi
    echo "Waiting for postgres-meta... ($i/30)"
    sleep 2
done
 
# Start Supabase Studio
docker run -d \
    --name studio \
    --restart always \
    --network supabase-network \
    -p 3000:3000 \
    -e "STUDIO_PG_META_URL=http://postgres-meta:8080" \
    -e "SUPABASE_URL=http://localhost:8000" \
    -e "SUPABASE_PUBLIC_URL=http://localhost:8000" \
    -e "SUPABASE_ANON_KEY=${ANON_KEY}" \
    -e "SUPABASE_SERVICE_KEY=${SERVICE_ROLE_KEY}" \
    -e "AUTH_JWT_SECRET=${JWT_SECRET}" \
    -e "DEFAULT_ORGANIZATION_NAME=Acme Corp" \
    -e "DEFAULT_PROJECT_NAME=Production" \
    -e "NEXT_PUBLIC_SITE_URL=http://localhost:3000" \
    -e "NEXT_ANALYTICS_BACKEND_PROVIDER=postgres" \
    supabase/studio:latest
 
echo "Supabase Studio setup complete at $(date)"

Secret injection

Notice the __DB_HOST__, __DB_PASSWORD__, etc. placeholders. These are replaced at deployment time:

const startupScript = pulumi.all([args.secrets, args.database.privateIp]).apply(([s, dbHost]) => {
  return startupScriptTemplate
    .replace(/__DB_HOST__/g, dbHost)
    .replace(/__DB_NAME__/g, s.infra.postgresDb)
    .replace(/__DB_PASSWORD__/g, s.infra.postgresPassword)
    .replace(/__AUTH_USER__/g, s.studio.authUser)
    .replace(/__AUTH_PASSWORD__/g, s.studio.authPassword)
    .replace(/__ANON_KEY__/g, s.studio.anonKey)
    .replace(/__SERVICE_ROLE_KEY__/g, s.studio.serviceRoleKey)
    .replace(/__JWT_SECRET__/g, s.studio.jwtSecret);
});

This keeps secrets out of source control while still delivering them to each instance at boot. Note that the rendered startup script does end up in the instance template's metadata, so access to the Compute Engine console and metadata APIs should be locked down accordingly.

Two containers

We run two containers:

  1. postgres-meta — Provides the API that Studio uses for database introspection
  2. studio — The actual Supabase Studio web UI

They communicate over a Docker network.

Health checks

Health checks are critical for auto-healing. If an instance fails the health check, it gets replaced.

this.healthCheck = new gcp.compute.HealthCheck(
  `${resourceName}-health-check`,
  {
    name: `${resourceName}-health-check`,
    checkIntervalSec: 10,
    timeoutSec: 5,
    healthyThreshold: 2,
    unhealthyThreshold: 3,
    httpHealthCheck: {
      port: 3000,
      requestPath: "/api/platform/profile",
    },
  },
  { parent: this },
);

Configuration explained

Setting            | Value | Meaning
checkIntervalSec   | 10    | Check every 10 seconds
timeoutSec         | 5     | Request must respond within 5 seconds
healthyThreshold   | 2     | 2 consecutive successes = healthy
unhealthyThreshold | 3     | 3 consecutive failures = unhealthy

The health endpoint

We hit /api/platform/profile on port 3000. This endpoint:

  • Returns 200 if Studio is running and connected to the database
  • Returns an error if something's wrong

Choosing the right health endpoint matters. A simple / might return 200 even if the database connection is broken. We want to verify the full stack is working.
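
As a quick sanity check, a small script like this — hypothetical, run from a host that can reach the instances over the private network, not part of the deployment — exercises the same endpoint and timeout the health check uses:

// Hypothetical smoke test against the Studio health endpoint.
// STUDIO_URL is a placeholder; instances are private, so run this over the VPN.
const base = process.env.STUDIO_URL ?? "http://10.0.0.10:3000";

async function checkHealth(): Promise<void> {
  const res = await fetch(`${base}/api/platform/profile`, {
    signal: AbortSignal.timeout(5000), // mirror the health check's 5-second timeout
  });
  if (!res.ok) {
    throw new Error(`Unhealthy: HTTP ${res.status}`);
  }
  console.log("Studio responded and can reach the database");
}

checkHealth().catch((err) => {
  console.error(err);
  process.exit(1);
});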

Managed Instance Group

The instance group manages the VMs:

this.instanceGroupManager = new gcp.compute.RegionInstanceGroupManager(
  `${resourceName}-mig`,
  {
    name: `${resourceName}-mig`,
    region: region,
    baseInstanceName: resourceName,
    targetSize: minInstances,
    distributionPolicyZones: [`${region}-a`, `${region}-b`],
 
    versions: [{ instanceTemplate: this.instanceTemplate.selfLinkUnique }],
 
    namedPorts: [{ name: "http", port: 3000 }],
 
    autoHealingPolicies: {
      healthCheck: this.healthCheck.id,
      initialDelaySec: 300,
    },
 
    updatePolicy: {
      type: "PROACTIVE",
      minimalAction: "REPLACE",
      maxSurgeFixed: 2,
      maxUnavailableFixed: 0,
      replacementMethod: "SUBSTITUTE",
    },
  },
  { parent: this },
);

Regional distribution

We spread instances across two zones (region-a and region-b). If one zone has issues, the other keeps serving traffic.

Auto-healing

The autoHealingPolicies configuration:

  • Uses our health check to monitor instances
  • initialDelaySec: 300 — Give new instances 5 minutes to finish booting before auto-healing is allowed to replace them
  • Unhealthy instances are automatically terminated and replaced

Update policy

The updatePolicy controls how new versions roll out:

Setting             | Value      | Effect
type                | PROACTIVE  | Apply updates immediately
minimalAction       | REPLACE    | Create new instances (don't just restart)
maxSurgeFixed       | 2          | Create up to 2 extra instances during the update
maxUnavailableFixed | 0          | Never drop below the target instance count
replacementMethod   | SUBSTITUTE | Delete old, create new (vs. recreate in place)

This gives us zero-downtime deployments:

  1. New instances are created with the new template
  2. Once healthy, traffic shifts to them
  3. Old instances are terminated

Autoscaler

The autoscaler adjusts instance count based on load:

this.autoscaler = new gcp.compute.RegionAutoscaler(
  `${resourceName}-autoscaler`,
  {
    name: `${resourceName}-autoscaler`,
    region: region,
    target: this.instanceGroupManager.id,
    autoscalingPolicy: {
      minReplicas: minInstances,
      maxReplicas: maxInstances,
      cooldownPeriod: 60,
      cpuUtilization: { target: 0.7 },
    },
  },
  { parent: this },
);

Scaling policy

  • minReplicas — Never go below this (1 for dev, 2 for prod)
  • maxReplicas — Never exceed this (keeps costs bounded)
  • cooldownPeriod — Wait 60 seconds between scaling decisions
  • cpuUtilization.target — Scale up when average CPU exceeds 70%
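
If scale-in ever feels too aggressive, the autoscaling policy also accepts a scaleInControl block that caps how fast instances are removed. We don't use it today; this sketch (with a placeholder MIG reference) just shows the shape:

import * as gcp from "@pulumi/gcp";

// Placeholder self-link; in our stack this is instanceGroupManager.id.
const migSelfLink =
  "projects/my-project/regions/us-central1/instanceGroupManagers/studio-mig";

new gcp.compute.RegionAutoscaler("studio-autoscaler-sketch", {
  region: "us-central1",
  target: migSelfLink,
  autoscalingPolicy: {
    minReplicas: 2,
    maxReplicas: 6,
    cooldownPeriod: 60,
    cpuUtilization: { target: 0.7 },
    // Remove at most one instance per 10-minute window when scaling in.
    scaleInControl: {
      maxScaledInReplicas: { fixed: 1 },
      timeWindowSec: 600,
    },
  },
});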

Environment differences

Setting     | Dev | Prod
minReplicas | 1   | 2
maxReplicas | 1   | 2

For dev, we fix at 1 instance (cost savings). For prod, we ensure at least 2 for high availability.

Backend service

The backend service connects the load balancer to the instance group:

this.backendService = new gcp.compute.BackendService(
  `${resourceName}-backend`,
  {
    name: `${resourceName}-backend`,
    protocol: "HTTP",
    portName: "http",
    timeoutSec: 30,
    healthChecks: this.healthCheck.id,
    securityPolicy: securityPolicy?.selfLink,
    backends: [
      {
        group: this.instanceGroupManager.instanceGroup,
        balancingMode: "UTILIZATION",
        capacityScaler: 1.0,
      },
    ],
    logConfig: { enable: true, sampleRate: 1.0 },
  },
  { parent: this },
);

Connection draining

When an instance is being removed, GCP connection draining gives in-flight requests time to finish before the instance is deregistered. Note that timeoutSec: 30 is the per-request backend timeout (how long the load balancer waits for a response), not the draining window — that is configured separately via connectionDrainingTimeoutSec.

Logging

logConfig: { enable: true, sampleRate: 1.0 } logs every request. In production, you might reduce sampleRate to save on logging costs.
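
Both knobs — the draining grace period and the log sampling rate — can be set explicitly on the backend service. A sketch with illustrative values and placeholder resource references:

import * as gcp from "@pulumi/gcp";

// Placeholder references; in our stack these come from the health check and MIG above.
const healthCheckId = "projects/my-project/global/healthChecks/studio-health-check";
const instanceGroup =
  "projects/my-project/regions/us-central1/instanceGroups/studio-mig";

new gcp.compute.BackendService("studio-backend-sketch", {
  protocol: "HTTP",
  portName: "http",
  timeoutSec: 30,                   // per-request backend response timeout
  connectionDrainingTimeoutSec: 60, // grace period for in-flight requests on deregistering instances
  healthChecks: healthCheckId,
  backends: [
    {
      group: instanceGroup,
      balancingMode: "UTILIZATION",
      capacityScaler: 1.0,
    },
  ],
  // Log 10% of requests instead of all of them to cut logging costs.
  logConfig: { enable: true, sampleRate: 0.1 },
});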

Load balancer

We covered the load balancer in detail in the infrastructure post. Key points:

  • Global IP — Single static IP for DNS
  • SSL termination — Google-managed certificate
  • HTTP to HTTPS redirect — All traffic forced to HTTPS

this.loadBalancer = new LoadBalancer(
  `${resourceName}-lb`,
  {
    name: resourceName,
    domain: args.domain,
    backendService: this.backendService,
  },
  { parent: this },
);

Putting it all together

Here's the complete service component:

const supabaseStudio = new SupabaseStudioService("supabase-studio", {
  name: "supabase-studio",
  projectId: config.projectId,
  region: config.region,
  domain: `db.${config.environment}.example.com`,
  networking: networking,
  database: database,
  secrets: secrets,
  machineType: "c4d-standard-2",
  minInstances: config.environment === "prod" ? 2 : 1,
  maxInstances: config.environment === "prod" ? 2 : 1,
  vpnPublicIp: vpn?.publicIp,
});

Machine type

We use c4d-standard-2:

  • 2 vCPUs
  • 8 GB RAM
  • General-purpose C4D series (AMD-based)

This is plenty for Supabase Studio. The containers are lightweight.

Deployment workflow

When we run pulumi up:

  1. Template changes? If the startup script or machine config changed, a new template is created
  2. Rolling update — New instances are created with the new template
  3. Health check passes — Traffic shifts to new instances
  4. Old instances terminated — Previous version instances are deleted

The whole process takes 5-10 minutes and requires no manual intervention.

Monitoring

Built-in metrics

With google-monitoring-enabled: true, we get:

  • CPU utilization
  • Memory usage
  • Disk I/O
  • Network traffic

Custom health checks

For deeper monitoring, we could add:

  • Database connection latency
  • Container startup time
  • API response times

Logging

Container logs go to Cloud Logging via google-logging-enabled: true. We can:

  • Search and filter logs
  • Set up alerts on error patterns
  • Export to BigQuery for analysis
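
Alerting on error patterns usually starts from a log-based metric. A sketch — the filter is illustrative and depends on how the COS logging agent labels container output:

import * as gcp from "@pulumi/gcp";

// Hypothetical log-based metric counting error-level entries from the Studio VMs.
new gcp.logging.Metric("studio-container-errors", {
  name: "studio-container-errors",
  filter: [
    'resource.type="gce_instance"',
    "severity>=ERROR",
    'logName:"supabase"', // illustrative; match whatever log name the agent actually emits
  ].join(" AND "),
  metricDescriptor: {
    metricKind: "DELTA",
    valueType: "INT64",
  },
});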

Common issues and fixes

1. Startup too slow

If instances take too long to start, they might fail health checks during boot. Solutions:

  • Increase initialDelaySec in auto-healing policy
  • Optimize Docker image pulls (use regional mirrors)
  • Pre-pull images in a base image

2. Memory pressure

If instances run out of memory:

  • Increase machine type
  • Add swap (not ideal but works)
  • Optimize container memory limits

3. Database connection pool exhaustion

Each Studio instance opens connections to the database. With many instances:

  • Monitor max_connections in PostgreSQL
  • Consider connection pooling (PgBouncer)
  • Limit concurrent Studio instances
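
A quick way to keep an eye on connection usage is a small check against pg_stat_activity — a sketch using the node-postgres client and assuming DATABASE_URL points at the Cloud SQL private IP:

import { Client } from "pg";

// Sketch: report current connection usage against the server's max_connections.
async function checkConnections(): Promise<void> {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    const { rows } = await client.query(
      `SELECT (SELECT count(*)::int FROM pg_stat_activity) AS current,
              current_setting('max_connections')::int AS max`,
    );
    const current: number = rows[0].current;
    const max: number = rows[0].max;
    console.log(`Connections in use: ${current}/${max}`);
    if (current > max * 0.8) {
      console.warn("Over 80% of max_connections — consider PgBouncer");
    }
  } finally {
    await client.end();
  }
}

checkConnections().catch(console.error);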

4. Slow health check response

If the health endpoint is slow:

  • Check database connection latency
  • Increase health check timeout
  • Optimize the health endpoint

Cost optimization

Right-size instances

Start small. Monitor CPU and memory. Scale up only if needed.

Use committed use discounts

For predictable workloads, committed use discounts save 20-50%.

Preemptible/Spot instances

For dev environments, preemptible instances cost 60-80% less. They can be terminated with 30 seconds notice, but for non-production, that's usually fine.
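
In Pulumi, that's just a scheduling block on the instance template. A dev-only sketch, with a placeholder network (we'd reuse the real VPC and subnet):

import * as gcp from "@pulumi/gcp";

// Dev-only sketch: same shape as our template, but provisioned as Spot capacity.
new gcp.compute.InstanceTemplate("studio-dev-spot-template", {
  machineType: "c4d-standard-2",
  disks: [
    {
      sourceImage: "projects/cos-cloud/global/images/family/cos-stable",
      autoDelete: true,
      boot: true,
      diskSizeGb: 50,
      diskType: "hyperdisk-balanced",
    },
  ],
  networkInterfaces: [{ network: "default" }], // placeholder; use the real VPC/subnet
  scheduling: {
    provisioningModel: "SPOT",
    preemptible: true,        // Spot VMs are preemptible
    automaticRestart: false,  // preemptible VMs cannot auto-restart
    instanceTerminationAction: "STOP",
  },
});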

What's next

In the final post of this series, we'll preview our plans for multi-region read replicas — bringing the database closer to users around the world.


Need help deploying auto-scaling applications on GCP? Get in touch — we've built infrastructure for everything from MVPs to enterprise workloads.
