This page documents what the GCP Terraform modules provision and how the modules wire the resulting deployment together.
LangSmith on GCP deploys in up to five passes. Each pass adds a capability layer on top of the previous. All layers share the same GKE cluster and langsmith namespace.
| Pass | Layer | What it adds |
|---|---|---|
| 1 | GCP infrastructure | VPC, GKE, Cloud SQL, Memorystore, GCS, K8s bootstrap, cert-manager, KEDA, Envoy Gateway |
| 2 | LangSmith base | frontend, backend, platform-backend, queue, ace-backend, clickhouse, playground |
| 3 | LangSmith Deployment | host-backend, listener, operator + per-deployment pods |
| 4 | Agent Builder | agent-builder-tool-server, agent-builder-trigger-server + deep-agent LGP |
| 5 | Insights + Polly | Clio analytics (ClickHouse-backed), Polly eval agent |
Module descriptions
| Module | Path | Purpose |
|---|---|---|
| networking | modules/networking/ | VPC, subnet with secondary ranges, Cloud Router, Cloud NAT, private service connection for Cloud SQL and Memorystore |
| k8s-cluster | modules/k8s-cluster/ | GKE Standard or Autopilot cluster, node pool with autoscaling, Workload Identity enabled |
| postgres | modules/postgres/ | Cloud SQL PostgreSQL instance, HA standby replica, private IP, deletion protection |
| redis | modules/redis/ | Memorystore Redis STANDARD_HA tier, private IP within VPC |
| storage | modules/storage/ | GCS bucket with lifecycle rules for ttl_s/ (14 days) and ttl_l/ (400 days) prefixes |
| k8s-bootstrap | modules/k8s-bootstrap/ | langsmith namespace, Kubernetes Secrets for Postgres and Redis URLs, cert-manager and KEDA Helm releases |
| ingress | modules/ingress/ | Envoy Gateway Helm release, GatewayClass, HTTPRoute, optional HTTPS Gateway listener |
| iam | modules/iam/ | GCP service accounts and Workload Identity bindings for GCS access (wired by default) |
| dns | modules/dns/ | Cloud DNS managed zone and managed cert (optional, enable with enable_dns_module) |
| secrets | modules/secrets/ | Secret Manager secret bundle (optional, enable with enable_secret_manager_module) |
Deployment tiers
Light deploy (all in-cluster)
VPC
└── subnet (10.0.0.0/20 — GKE nodes only)
No Cloud SQL or Memorystore — chart pods handle both
GKE Cluster
├── langsmith namespace
│ ├── frontend, backend, platform-backend, queue, ace-backend, playground
│ ├── clickhouse (in-cluster)
│ ├── postgres (in-cluster)
│ └── redis (in-cluster)
├── cert-manager
├── keda
└── envoy-gateway-system
GCS Bucket (trace payloads, always external)
Set in terraform.tfvars:
postgres_source = "in-cluster"
redis_source = "in-cluster"
clickhouse_source = "in-cluster"
Production (external managed services)
VPC
├── subnet (10.0.0.0/20 — GKE nodes, pods, services)
│ └── Secondary ranges: pods 10.4.0.0/14, services 10.8.0.0/20
└── Private service connection (VPC peering to Google managed network)
├── Cloud SQL PostgreSQL (private IP, regional standby)
└── Memorystore Redis (private IP, STANDARD_HA tier)
GKE Cluster
├── langsmith namespace
│ ├── frontend, backend, platform-backend, queue, ace-backend, playground
│ └── clickhouse (in-cluster — use LangChain Managed for production scale)
├── cert-manager
├── keda
└── envoy-gateway-system
GCS Bucket (Workload Identity, no static keys)
Application core services
| Service | Purpose | Port | HPA | Workload Identity | Depends on |
|---|---|---|---|---|---|
| langsmith-frontend | React UI | 3000 | 1 to 10 | No | backend, platform-backend |
| langsmith-backend | Main API (traces, runs, projects, API keys, feedback) | 1984 | 3 to 10 | Yes (GCS) | Postgres, Redis, ClickHouse, GCS |
| langsmith-platform-backend | Org and user management, auth, billing, settings | 1986 | 1 to 10 | Yes (GCS) | Postgres, Redis, GCS |
| langsmith-playground | LLM prompt playground UI | 3001 | 1 to 10 | No | backend |
| langsmith-queue | Trace ingestion worker (Redis to ClickHouse + GCS) | — | 3 to 10 + KEDA | Yes | Redis, ClickHouse, GCS |
| langsmith-ingest-queue | Dedicated high-throughput ingestion worker | — | 3 to 10 + KEDA | Yes | Redis, GCS |
| langsmith-ace-backend | Async compute (dataset runs, evaluations, background jobs) | — | 1 to 5 | No | Postgres, Redis |
| langsmith-clickhouse | Columnar store (trace spans, run metadata, eval results) | — | StatefulSet, single replica | No | 500Gi premium-rwo PVC |
In-cluster ClickHouse is dev/POC only (single pod, no replication, no backups). For production, use LangChain Managed ClickHouse or a self-managed external cluster.
One-time jobs
| Job | Purpose |
|---|---|
| langsmith-backend-migrations | PostgreSQL schema migrations |
| langsmith-backend-ch-migrations | ClickHouse schema migrations |
| langsmith-backend-auth-bootstrap | Creates the initial org and admin account |
LangSmith Deployment add-on
| Service | Purpose | Workload Identity |
|---|---|---|
| langsmith-host-backend | LangGraph control plane API. Manages deployment lifecycle, serves deployment metadata. | Yes (GCS) |
| langsmith-listener | Watches host-backend for state changes, creates and updates LangGraphPlatform CRDs. | Yes (GCS) |
| langsmith-operator | Kubernetes operator. Reconciles LangGraphPlatform CRDs, creates and deletes Deployments and Services. | RBAC for Deployments and Services |
Each LangGraph deployment created in the UI produces a Kubernetes Deployment in the langsmith namespace, with pods running as the langsmith-ksa ServiceAccount. That ServiceAccount must carry the iam.gke.io/gcp-service-account annotation, which deploy.sh applies idempotently.
GCP managed services
When postgres_source = "external" and redis_source = "external" (the recommended production setting), Terraform provisions:
Cloud SQL PostgreSQL
- Default size db-custom-2-8192 (2 vCPU, 8 GB), private IP, port 5432.
- REGIONAL availability with automatic failover.
- Holds orgs, users, projects, API keys, settings.
- Terraform writes the connection URL directly to the langsmith-postgres Kubernetes Secret.
Memorystore Redis
- Default 5 GB, STANDARD_HA tier, private IP, port 6379.
- Trace ingestion queue, pub/sub, short-lived cache.
- No auth token required. Access is controlled by VPC private IP only.
- Terraform writes the connection URL directly to the langsmith-redis Kubernetes Secret.
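As a rough illustration of what those two Secrets hold, the sketch below assembles connection URLs in Python. The PostgreSQL URL shape follows the psql example in the verification commands. The function names, the sample private IPs, the redis:// scheme, and the percent-encoding of the password are assumptions for illustration, not taken from the Terraform modules.

```python
from urllib.parse import quote


def postgres_url(user: str, password: str, host: str, db: str, port: int = 5432) -> str:
    """Build a PostgreSQL URL. The credentials are percent-encoded so that
    characters such as '@' or '/' cannot break URL parsing."""
    return f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}/{db}"


def redis_url(host: str, port: int = 6379) -> str:
    """Build a Redis URL without credentials (no auth token is configured)."""
    return f"redis://{host}:{port}"


# Hypothetical private IPs and password, for illustration only.
pg = postgres_url("langsmith", "p@ss/word", "10.100.0.3", "langsmith")
rd = redis_url("10.100.1.4")
print(pg)
print(rd)
```

Percent-encoding matters mainly for generated passwords, which can contain characters that are reserved in URLs.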
Cloud Storage bucket
- Trace payloads: large inputs and outputs, attachments.
- Accessed via the S3-compatible API (apiURL: https://storage.googleapis.com, engine: S3).
- HMAC keys are required for the S3-compatible API even with Workload Identity. Create an HMAC key under Cloud Storage → Settings → Interoperability and pass its access ID and secret to Helm via config.blobStorage.accessKey and config.blobStorage.accessKeySecret.
- Lifecycle rules: ttl_s/ prefix (14 days default), ttl_l/ prefix (400 days default).
- Always required.
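To make the lifecycle defaults concrete, the following sketch estimates when an object under each prefix becomes eligible for deletion. It assumes the rules are age-based deletes keyed on object creation time. The object names and the helper function are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Prefix -> retention in days, using the defaults listed above.
RETENTION_DAYS = {"ttl_s/": 14, "ttl_l/": 400}


def expiry_date(object_name: str, created: date) -> Optional[date]:
    """Return the first date on which the object is eligible for deletion,
    or None when no lifecycle prefix matches."""
    for prefix, days in RETENTION_DAYS.items():
        if object_name.startswith(prefix):
            return created + timedelta(days=days)
    return None


created = date(2025, 1, 1)
print(expiry_date("ttl_s/example-object", created))  # short-lived prefix
print(expiry_date("ttl_l/example-object", created))  # long-lived prefix
print(expiry_date("other/example-object", created))  # no rule applies
```

GCS applies lifecycle actions asynchronously, so the computed date is a lower bound rather than an exact deletion time.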
Secret Manager (optional module)
- Stores Postgres password and generated secrets (LangSmith secret key, JWT secret) when enable_secret_manager_module = true.
- Core secrets (langsmith-postgres, langsmith-redis) are always stored in Kubernetes Secrets by k8s-bootstrap regardless of this module. Secret Manager provides an additional durable store for secrets that must survive cluster recreation.
Cluster infrastructure
| Service | Namespace | Installed by | Required for |
|---|---|---|---|
| Envoy Gateway | envoy-gateway-system | ingress module (install_ingress = true, default) | All ingress |
| KEDA | keda | k8s-bootstrap module when enable_langsmith_deployment = true | LangSmith Deployment add-on and later |
| cert-manager | cert-manager | k8s-bootstrap module when tls_certificate_source = "letsencrypt" or install_cert_manager = true | Let’s Encrypt TLS |
| External Secrets Operator | external-secrets | k8s-bootstrap module | Custom secret workflows (optional) |
The Gateway resource is managed by Terraform; the HTTPRoute is managed by Helm. Do not delete the Gateway resource manually. GCP releases the external IP when the Gateway is deleted, and a new IP is issued on recreate.
Workload Identity
GKE pods access GCS through Workload Identity. The Kubernetes ServiceAccount is bound to a GCP service account via an IAM binding; pods receive temporary credentials with no static keys in Secrets or environment variables.
GKE pod
└── Kubernetes ServiceAccount (annotated with iam.gke.io/gcp-service-account)
└── IAM binding: roles/iam.workloadIdentityUser
└── GCP Service Account
└── roles/storage.objectAdmin on the GCS bucket
| Component | Annotation | Permissions |
|---|---|---|
| langsmith-backend | iam.gke.io/gcp-service-account: <gsa> | GCS storage.objectAdmin on the LangSmith bucket |
| langsmith-platform-backend | Same | GCS storage.objectAdmin |
| langsmith-queue | Same | GCS storage.objectAdmin |
| langsmith-ingest-queue | Same | GCS storage.objectAdmin |
| langsmith-host-backend | Same | GCS storage.objectAdmin |
| langsmith-listener | Same | GCS storage.objectAdmin |
| langsmith-ksa (operator pods) | Same | GCS storage.objectAdmin |
The GSA is defined by the iam module and output as workload_identity_annotation. init-values.sh writes these annotations into values-overrides.yaml automatically.
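The two identifiers that tie a Kubernetes ServiceAccount to a GCP service account follow fixed GKE formats, sketched below. The project ID and the GSA name are hypothetical placeholders. The langsmith namespace and langsmith-ksa name come from this page. The helper functions are illustrative and are not part of the modules.

```python
def workload_identity_member(project_id: str, namespace: str, ksa_name: str) -> str:
    """IAM member string that is granted roles/iam.workloadIdentityUser on the GSA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


def ksa_annotation(gsa_name: str, project_id: str) -> dict:
    """Annotation placed on the Kubernetes ServiceAccount; the value is the GSA email."""
    return {
        "iam.gke.io/gcp-service-account": f"{gsa_name}@{project_id}.iam.gserviceaccount.com"
    }


# Hypothetical project and GSA names, for illustration only.
member = workload_identity_member("my-project", "langsmith", "langsmith-ksa")
annotation = ksa_annotation("langsmith-gsa", "my-project")
print(member)
print(annotation)
```

If pods fail to obtain GCS credentials, comparing the live annotation and the IAM binding member against these formats is a reasonable first check.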
GCS access via the S3-compatible API requires HMAC keys in addition to Workload Identity. Create the HMAC key under Cloud Storage → Settings → Interoperability and pass it to Helm.
Network topology
| Range | CIDR | Used by |
|---|---|---|
| Subnet | 10.0.0.0/20 | GKE nodes |
| Pods | 10.4.0.0/14 | GKE pod IPs (secondary range) |
| Services | 10.8.0.0/20 | GKE ClusterIP services (secondary range) |
| Private service connection | /16 allocated by Google | Cloud SQL, Memorystore private IPs |
Cloud SQL and Memorystore are accessed exclusively via private IP. The networking module establishes a private service connection (VPC peering to Google’s managed network) whenever postgres_source = "external" or redis_source = "external".
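When changing any of these CIDRs, it helps to confirm that the ranges stay disjoint. The sketch below checks the default ranges from the table with Python's ipaddress module. The helper name is illustrative, and the Google-allocated private service connection range is left out because its exact value is not stated here.

```python
from ipaddress import ip_network
from itertools import combinations

# Default ranges from the table above.
RANGES = {
    "subnet": ip_network("10.0.0.0/20"),
    "pods": ip_network("10.4.0.0/14"),
    "services": ip_network("10.8.0.0/20"),
}


def overlapping_pairs(ranges: dict) -> list:
    """Return the names of every pair of ranges that share at least one address."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(ranges.items(), 2)
        if net_a.overlaps(net_b)
    ]


for name, net in RANGES.items():
    print(f"{name}: {net} ({net.num_addresses} addresses)")
print("overlaps:", overlapping_pairs(RANGES))
```

The same check can be extended with the CIDRs of any peered or on-premises networks before applying a custom range.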
Traffic flow
Internet (HTTPS :443)
↓
Envoy Gateway (envoy-gateway-system, external LoadBalancer IP)
TLS terminated — cert-manager + Let's Encrypt or existing certificate
│
├── / → frontend:80
├── /api/* → backend:1984
└── /api/v1/deployments/* → host-backend:1985 (LangSmith Deployment add-on)
Internal traffic (private IPs, never leaving VPC):
backend → Cloud SQL:5432 via private IP
backend → Memorystore:6379 via private IP
backend → GCS via Workload Identity + HMAC keys
host-backend → K8s API reads deployment pod status
listener → K8s API creates and updates LangGraphPlatform CRDs
operator → K8s API creates and manages deployment pods
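The three external routes in the traffic flow above overlap, so the most specific path has to win. The sketch below models that as longest-prefix matching on whole path segments, which is how Gateway API PathPrefix matches are prioritized. The route table is transcribed from the diagram. The matching code is an illustration, not the chart's actual HTTPRoute definition.

```python
# (path prefix, upstream) pairs transcribed from the traffic flow diagram.
ROUTES = [
    ("/", "frontend:80"),
    ("/api", "backend:1984"),
    ("/api/v1/deployments", "host-backend:1985"),
]


def prefix_matches(prefix: str, path: str) -> bool:
    """Match on whole path segments, so '/api' matches '/api/x' but not '/apiary'."""
    if prefix == "/":
        return path.startswith("/")
    return path == prefix or path.startswith(prefix + "/")


def resolve(path: str) -> str:
    """Pick the upstream whose prefix is the longest one matching the path."""
    matches = [(p, target) for p, target in ROUTES if prefix_matches(p, path)]
    if not matches:
        raise ValueError(f"no route for {path!r}")
    return max(matches, key=lambda m: len(m[0]))[1]


for path in ("/", "/api/v1/runs", "/api/v1/deployments/abc"):
    print(path, "->", resolve(path))
```

Under this model, requests for deployment paths reach host-backend only when the LangSmith Deployment add-on route exists. Without it they fall through to backend.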
Component to storage mapping
| Component | PostgreSQL | Redis | ClickHouse | GCS |
|---|---|---|---|---|
| backend | Org config, run metadata | Ingestion queue | — | Trace objects |
| platform-backend | — | — | — | Blob routing |
| queue | — | Pops jobs | — | Writes trace blobs |
| clickhouse | — | — | Trace search index | — |
| host-backend | Deployment lifecycle state | — | — | — |
Secret Manager integration
Without Secret Manager:
terraform.tfvars → terraform apply → kubernetes_secret (postgres, redis)
With Secret Manager:
terraform.tfvars → terraform apply → Secret Manager secrets
→ ESO (External Secrets Operator)
→ kubernetes_secret (langsmith namespace)
Module dependency graph
google_project_service (APIs enabled)
└── module.networking
├── module.gke_cluster
│ └── null_resource.wait_for_cluster
│ ├── module.cloudsql (count = postgres_source == "external")
│ ├── module.redis (count = redis_source == "external")
│ ├── module.storage
│ ├── module.iam (count = enable_gcp_iam_module)
│ ├── module.secrets (count = enable_secret_manager_module)
│ ├── module.dns (count = enable_dns_module)
│ ├── module.k8s_bootstrap
│ └── module.ingress (count = install_ingress)
└── (private_service_connection when external services)
LangSmith itself is not deployed by Terraform; the chart is installed in the application stage via helm upgrade --install.
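The dependency tree above can be read as a directed graph in which each count expression switches a module on or off. The sketch below rebuilds that graph from a set of variable values and prints one valid apply order using graphlib (Python 3.9+). The edges mirror the tree as drawn. The example values are hypothetical and are not the modules' defaults.

```python
from graphlib import TopologicalSorter


def module_graph(tfvars: dict) -> dict:
    """Map each enabled module to the set of modules it waits on."""
    graph = {
        "networking": {"google_project_service"},
        "gke_cluster": {"networking"},
        "wait_for_cluster": {"gke_cluster"},
        "storage": {"wait_for_cluster"},
        "k8s_bootstrap": {"wait_for_cluster"},
    }
    external_pg = tfvars["postgres_source"] == "external"
    external_redis = tfvars["redis_source"] == "external"
    # Modules guarded by a count expression in the tree above.
    optional = {
        "cloudsql": external_pg,
        "redis": external_redis,
        "iam": tfvars["enable_gcp_iam_module"],
        "secrets": tfvars["enable_secret_manager_module"],
        "dns": tfvars["enable_dns_module"],
        "ingress": tfvars["install_ingress"],
    }
    for name, enabled in optional.items():
        if enabled:
            graph[name] = {"wait_for_cluster"}
    if external_pg or external_redis:
        graph["private_service_connection"] = {"networking"}
    return graph


# Hypothetical settings for illustration; not the modules' defaults.
example_tfvars = {
    "postgres_source": "external",
    "redis_source": "in-cluster",
    "enable_gcp_iam_module": True,
    "enable_secret_manager_module": False,
    "enable_dns_module": False,
    "install_ingress": True,
}

order = list(TopologicalSorter(module_graph(example_tfvars)).static_order())
print(order)
```

Terraform computes its own ordering from resource references, so this is only a reading aid for the tree, not a reproduction of the planner.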
Verification commands
# Cluster connectivity
gcloud container clusters get-credentials <cluster-name> --region <region> --project <project-id>
kubectl cluster-info
kubectl get nodes -o wide
# All LangSmith pods
kubectl get pods -n langsmith
# Envoy Gateway
kubectl get pods -n envoy-gateway-system
kubectl get svc -n envoy-gateway-system
# cert-manager
kubectl get pods -n cert-manager
kubectl get certificate -n langsmith
# KEDA (LangSmith Deployment add-on)
kubectl get pods -n keda
# Cloud SQL connectivity test
kubectl run psql-test --rm -it --image=postgres:15 -n langsmith -- \
psql "postgresql://langsmith:<password>@<cloud-sql-private-ip>:5432/langsmith" -c "SELECT version();"
# Memorystore connectivity test
kubectl run redis-test --rm -it --image=redis:7 -n langsmith -- \
redis-cli -h <redis-private-ip> ping
# GCS connectivity test
kubectl run gcs-test --rm -it --image=google/cloud-sdk -n langsmith -- \
gsutil ls gs://<bucket-name>