Deploy LangSmith on GCP with Terraform

Provision the GCP cloud foundation and install LangSmith with the public Terraform modules at github.com/langchain-ai/terraform/tree/main/modules/gcp. Plan for 35 to 45 minutes end to end on a clean project. The deployment runs in two stages: infrastructure (Terraform provisions VPC, GKE, Cloud SQL, Memorystore, GCS, Workload Identity) and application (Helm installs the LangSmith chart against the cluster). Add-ons are enabled with a flag and a redeploy.

Prerequisites

Required tools

Tool	Version	Purpose
Google Cloud SDK (`gcloud`)	450	Authenticate, query GCP resources, manage GKE credentials
Terraform	1.5	Run the infrastructure modules
`kubectl`	1.28	Inspect the GKE cluster
Helm	3.12	Install and manage the LangSmith chart

Install on macOS:

brew install --cask google-cloud-sdk
brew install kubectl helm
brew tap hashicorp/tap && brew install hashicorp/tap/terraform

gcloud version
terraform version
kubectl version --client
helm version
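The commands above print versions, but comparing them by eye is error-prone. The sketch below assumes the versions in the table are minimums (the table does not say so explicitly) and that version strings are dotted numbers. `version_ge` is a hypothetical helper, not part of the Terraform modules.

```shell
# version_ge ACTUAL MINIMUM
# Succeeds when ACTUAL >= MINIMUM. Compares dotted numeric versions field by
# field; a missing field counts as 0. Hypothetical helper for illustration.
version_ge() {
  _have=$1 _want=$2
  while [ -n "$_have" ] || [ -n "$_want" ]; do
    _h=${_have%%.*}; _w=${_want%%.*}
    [ "${_h:-0}" -gt "${_w:-0}" ] && return 0
    [ "${_h:-0}" -lt "${_w:-0}" ] && return 1
    # Drop the field just compared (and its dot, if any).
    case $_have in *.*) _have=${_have#*.} ;; *) _have= ;; esac
    case $_want in *.*) _want=${_want#*.} ;; *) _want= ;; esac
  done
  return 0
}

# Example with literal values; substitute the versions printed by the tools.
version_ge 1.6.2 1.5 && echo "terraform ok"
version_ge 3.11.0 3.12 || echo "helm too old"
```

Strip any leading `v` or build suffix from a tool's output before passing it in, since the helper only understands digits and dots.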

Required GCP APIs

Terraform enables these APIs automatically on first apply, but cloudresourcemanager.googleapis.com must already be enabled so that Terraform can enable the rest. To speed up the first run, enable all of them manually:

gcloud services enable \
  container.googleapis.com \
  sqladmin.googleapis.com \
  redis.googleapis.com \
  storage.googleapis.com \
  iam.googleapis.com \
  secretmanager.googleapis.com \
  certificatemanager.googleapis.com \
  servicenetworking.googleapis.com \
  cloudresourcemanager.googleapis.com \
  --project <your-project-id>
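To confirm which of these APIs are still disabled, compare the list against what the project reports. The sketch below is one way to do that; `missing_apis` and `REQUIRED_APIS` are hypothetical names, and the function reads enabled service names from stdin so it can be tried without a GCP project.

```shell
# Newline-separated list copied from the gcloud services enable command above.
REQUIRED_APIS='container.googleapis.com
sqladmin.googleapis.com
redis.googleapis.com
storage.googleapis.com
iam.googleapis.com
secretmanager.googleapis.com
certificatemanager.googleapis.com
servicenetworking.googleapis.com
cloudresourcemanager.googleapis.com'

# missing_apis: read enabled service names on stdin (one per line) and print
# every required API that is absent. Hypothetical helper for illustration.
missing_apis() {
  enabled=$(cat)
  for api in $REQUIRED_APIS; do
    printf '%s\n' "$enabled" | grep -xF "$api" >/dev/null || printf '%s\n' "$api"
  done
}

# Usage against a real project:
#   gcloud services list --enabled --format='value(config.name)' \
#     --project <your-project-id> | missing_apis
```

Empty output means every required API is already enabled.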

Required IAM roles

The principal running Terraform needs the following roles on the target project. Trim to least-privilege after the initial deployment is stable.

Role	Purpose
`roles/container.admin`	Create and manage GKE clusters
`roles/compute.networkAdmin`	Create VPC, subnets, firewall rules
`roles/iam.serviceAccountAdmin`	Create service accounts for Workload Identity
`roles/cloudsql.admin`	Create and manage Cloud SQL instances
`roles/redis.admin`	Create and manage Memorystore Redis
`roles/storage.admin`	Create GCS buckets and lifecycle policies
`roles/resourcemanager.projectIamAdmin`	Grant IAM bindings during provisioning
`roles/servicenetworking.networksAdmin`	Create private service connections (required for Cloud SQL and Redis)
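Granting eight roles by hand invites typos. The sketch below prints one `gcloud projects add-iam-policy-binding` command per role so the bindings can be reviewed before anything is run. `print_role_bindings` is a hypothetical helper, and `my-project` and `user:alice@example.com` are placeholder values.

```shell
# Roles copied from the table above.
TERRAFORM_ROLES='roles/container.admin
roles/compute.networkAdmin
roles/iam.serviceAccountAdmin
roles/cloudsql.admin
roles/redis.admin
roles/storage.admin
roles/resourcemanager.projectIamAdmin
roles/servicenetworking.networksAdmin'

# print_role_bindings PROJECT_ID MEMBER
# Prints (does not run) one gcloud command per role. MEMBER uses IAM member
# syntax, for example user:alice@example.com or serviceAccount:<email>.
print_role_bindings() {
  for role in $TERRAFORM_ROLES; do
    printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
      "$1" "$2" "$role"
  done
}

# Placeholder project and principal; review the output, then pipe it to sh.
print_role_bindings my-project user:alice@example.com
```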

Authenticate

gcloud auth login
gcloud config set project <your-project-id>
gcloud auth application-default login

You also need a LangSmith license key (contact sales) and a domain or subdomain whose DNS A record you can point at the Gateway IP that GCP assigns during deployment.

Rapid path

For the fastest path from zero to a running LangSmith instance, run these commands in order:

# 1. Clone the public modules
git clone https://github.com/langchain-ai/terraform.git
cd terraform/modules/gcp

# 2. Generate terraform.tfvars interactively (Enter accepts current values)
make quickstart

# 3. Load secrets into Secret Manager
#    Must be sourced, not executed
source infra/scripts/setup-env.sh

# 4. Validate environment
make preflight

# 5. Provision infrastructure (~25 to 35 min)
make init
make plan
make apply

# 6. Configure kubectl
make kubeconfig
kubectl get nodes

# 7. Deploy LangSmith via Helm (~8 to 12 min)
make init-values
make deploy

# 8. Get the Gateway IP for DNS
kubectl get gateway -n langsmith \
  -o jsonpath='{.items[0].status.addresses[0].value}'

The sections below cover each phase in detail.

Provision infrastructure

Provisioning the GCP cloud foundation takes 25 to 35 minutes on a clean project. Do not interrupt the apply.

What gets provisioned

Resource	Purpose
VPC + subnet + Cloud NAT	Private network for the cluster and managed services
Private service connection	VPC peering for Cloud SQL and Memorystore private IPs
GKE cluster (Standard or Autopilot)	Kubernetes compute, Workload Identity enabled
Cloud SQL PostgreSQL	LangSmith operational data, HA standby, private IP
Memorystore Redis	Queue and cache, STANDARD_HA tier, private IP
GCS bucket	Trace payload blob storage, lifecycle rules
Workload Identity service account	Per-pod GCP access without static keys
cert-manager, KEDA, Envoy Gateway	Bootstrap workloads installed alongside infrastructure

Clone and configure

git clone https://github.com/langchain-ai/terraform.git
cd terraform/modules/gcp

All subsequent commands run from modules/gcp/. Run make help for the full target list. Generate terraform.tfvars with the interactive wizard:

make quickstart

The wizard prompts for project ID, naming prefix, region, GKE sizing, TLS source, external vs in-cluster services, and the optional add-on flags. It writes infra/terraform.tfvars. Re-running the wizard pre-selects existing values; press Enter at each prompt to keep the current config. To edit the file manually instead:

cp infra/terraform.tfvars.example infra/terraform.tfvars
vi infra/terraform.tfvars

The minimum required variables:

project_id            = "<your-gcp-project-id>"
name_prefix           = "ls"
environment           = "prod"
langsmith_license_key = "<your-license-key>"
langsmith_domain      = "langsmith.example.com"

region = "us-west2"
zone   = "us-west2-a"

postgres_source   = "external"
postgres_password = "<strong-password>"   # or: export TF_VAR_postgres_password=...

redis_source = "external"

clickhouse_source = "in-cluster"

tls_certificate_source = "letsencrypt"
letsencrypt_email      = "ops@example.com"

enable_langsmith_deployment = true

See the GCP variables reference for every input variable.

Configure a remote state backend before applying. Copy infra/backend.tf.example to infra/backend.tf and point it at a GCS bucket you control. Local state is fragile and easily lost during directory restructuring.
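The contents of infra/backend.tf.example are not shown in this guide, so the sketch below is an assumption about its general shape: a standard Terraform `gcs` backend block with a `bucket` and a `prefix`. `render_gcs_backend`, the bucket name, and the prefix are hypothetical.

```shell
# render_gcs_backend BUCKET PREFIX
# Prints a Terraform gcs backend block. The bucket must already exist;
# Terraform does not create its own state bucket.
render_gcs_backend() {
  cat <<EOF
terraform {
  backend "gcs" {
    bucket = "$1"
    prefix = "$2"
  }
}
EOF
}

# Example with placeholder values. After reviewing the output, redirect it:
#   render_gcs_backend my-tfstate-bucket langsmith/gcp > infra/backend.tf
render_gcs_backend my-tfstate-bucket langsmith/gcp
```

If the example file in the repository differs, prefer it over this sketch.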

Load secrets into Secret Manager

source infra/scripts/setup-env.sh

The script reads terraform.tfvars, derives the secret prefix, and for each secret either reuses an exported value, reads the existing Secret Manager secret, auto-generates one (for salts and Fernet keys), or prompts you. The license key and admin password are the two values you supply interactively. The script must be sourced rather than run as a make target, because a child process such as make cannot export environment variables back to the parent shell. Verify the secrets are present:

make secrets
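The difference between sourcing and executing is easy to demonstrate with a throwaway script. The snippet below does not touch setup-env.sh; `DEMO_SECRET` and the temporary file are illustrative only.

```shell
# Create a throwaway script that exports one variable.
demo=$(mktemp)
printf 'export DEMO_SECRET=from-script\n' > "$demo"

unset DEMO_SECRET
sh "$demo"                       # child process: the export dies with it
executed=${DEMO_SECRET:-unset}

. "$demo"                        # sourced: runs in the current shell
sourced=${DEMO_SECRET:-unset}

rm -f "$demo"
echo "executed=$executed sourced=$sourced"
# prints: executed=unset sourced=from-script
```

The same applies to any wrapper, including make: variables exported inside the child never reach the shell that later runs make apply.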

Preflight checks

make preflight

make preflight validates that the active gcloud credentials can perform each required action, that the required GCP APIs are enabled, and that the target region has the SKUs the modules request. Catching gaps here is faster than discovering them mid-terraform apply.

Apply

make init
make plan
make apply

make plan shows the proposed diff. Review the output before applying. make apply provisions in dependency order: VPC and networking, then GKE (about 10 to 15 minutes), private service connection, Cloud SQL (about 10 minutes with HA), Memorystore, GCS, and the bootstrap workloads. Equivalent direct Terraform flow:

cd modules/gcp/infra

terraform init
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars

Configure kubectl

make kubeconfig
kubectl get nodes

All nodes should report Ready.

Verify bootstrap components

kubectl get pods -n cert-manager
kubectl get pods -n keda
kubectl get secrets -n langsmith

cert-manager, KEDA, and the LangSmith namespace secrets should all be in place.

Deploy LangSmith

Two paths are supported. Pick one.

Script-driven Helm deploy (recommended)

Two commands install the LangSmith chart with sensible defaults wired from Terraform outputs:

cd modules/gcp

make init-values
make deploy

init-values.sh prompts for the admin email, then reads sizing_profile and the enable_* flags from terraform.tfvars and copies matching values files from helm/values/examples/ into helm/values/. It also generates values-overrides.yaml with your hostname, Workload Identity annotations, and GCS bucket name. make deploy runs helm/scripts/deploy.sh, which refreshes the kubeconfig, runs preflight checks, applies the layered values files, and runs helm upgrade --install. Expect 8 to 12 minutes for the chart to install and pods to become ready.
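To preview which values a script would see in terraform.tfvars, a small reader is enough. The sketch below is not the logic init-values.sh uses (that script is not shown here). `tfvar` is a hypothetical helper that handles only simple single-line `key = value` assignments.

```shell
# tfvar KEY FILE
# Prints the value of the first `KEY = value` line in FILE, with surrounding
# double quotes and any trailing `# comment` removed. Limitations: no lists,
# maps, or multi-line values, and a `#` inside a quoted value is cut off.
tfvar() {
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" |
    sed -e 's/[[:space:]]*#.*$//' -e 's/^"//' -e 's/"$//' |
    head -n 1
}

# Usage from modules/gcp:
#   tfvar sizing_profile infra/terraform.tfvars
#   tfvar enable_agent_builder infra/terraform.tfvars
```

An empty result means the key is not set in the file, in which case the module's default applies.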

Manual Helm install

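Best for teams that run helm directly without the scripts. Run the commands from modules/gcp/infra so that terraform output and the relative ../helm/values paths resolve. Generate the required secrets first: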

export API_KEY_SALT=$(openssl rand -base64 32)
export JWT_SECRET=$(openssl rand -base64 32)
export AGENT_BUILDER_ENCRYPTION_KEY=$(python3 -c \
  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
export INSIGHTS_ENCRYPTION_KEY=$(python3 -c \
  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
export ADMIN_EMAIL="admin@example.com"
export ADMIN_PASSWORD="<strong-password>"

# GCS HMAC credentials (create in GCP Console: Storage > Settings > Interoperability)
export GCS_ACCESS_KEY="<your-hmac-access-key>"
export GCS_ACCESS_SECRET="<your-hmac-secret>"
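A Fernet key is 32 random bytes in URL-safe base64, which always comes out as 44 characters ending in `=`. Before passing the two encryption keys to Helm, a shape check can catch a truncated or mis-pasted value. `is_fernet_key` is a hypothetical helper; it checks format only and cannot tell whether a key matches the one used on a previous install.

```shell
# is_fernet_key VALUE
# Succeeds when VALUE has the shape of a Fernet key: 44 characters from the
# URL-safe base64 alphabet, ending with a single '=' pad.
is_fernet_key() {
  [ "${#1}" -eq 44 ] || return 1
  case $1 in
    *[!A-Za-z0-9_=-]*) return 1 ;;   # character outside the URL-safe alphabet
    *=) return 0 ;;                  # 32 bytes always encode with one '=' pad
    *) return 1 ;;
  esac
}

# Usage with the variables exported above:
#   is_fernet_key "$AGENT_BUILDER_ENCRYPTION_KEY" || echo "bad agent builder key" >&2
#   is_fernet_key "$INSIGHTS_ENCRYPTION_KEY" || echo "bad insights key" >&2
```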

helm repo add langchain https://langchain-ai.github.io/helm
helm repo update

helm upgrade --install langsmith langchain/langsmith \
  --namespace langsmith \
  --create-namespace \
  -f ../helm/values/values.yaml \
  -f ../helm/values/values-overrides.yaml \
  --set config.langsmithLicenseKey="<your-license-key>" \
  --set config.apiKeySalt="$API_KEY_SALT" \
  --set config.basicAuth.jwtSecret="$JWT_SECRET" \
  --set config.hostname="<your-langsmith-domain>" \
  --set config.basicAuth.initialOrgAdminEmail="$ADMIN_EMAIL" \
  --set config.basicAuth.initialOrgAdminPassword="$ADMIN_PASSWORD" \
  --set config.agentBuilder.encryptionKey="$AGENT_BUILDER_ENCRYPTION_KEY" \
  --set config.insights.encryptionKey="$INSIGHTS_ENCRYPTION_KEY" \
  --set config.blobStorage.bucketName="$(terraform output -raw storage_bucket_name)" \
  --set config.blobStorage.accessKey="$GCS_ACCESS_KEY" \
  --set config.blobStorage.accessKeySecret="$GCS_ACCESS_SECRET" \
  --set gateway.enabled=true \
  --set ingress.enabled=false \
  --wait --timeout 15m

Verify and configure DNS

kubectl get pods -n langsmith

EXTERNAL_IP=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=langsmith-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

echo "Create A record: $EXTERNAL_IP -> <your-langsmith-domain>"

kubectl get certificate -n langsmith

cert-manager cannot issue the Let’s Encrypt certificate until the DNS A record resolves to the Gateway IP. Create the record at your DNS provider, wait for propagation, then re-check the certificate status.
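Waiting for propagation can be scripted rather than checked by hand. The sketch below assumes `dig` is installed; `dns_matches` and `wait_for_dns` are hypothetical helpers, and the 30 attempts at 10-second intervals are arbitrary defaults.

```shell
# dns_matches EXPECTED_IP
# Reads resolved addresses on stdin (one per line) and succeeds when
# EXPECTED_IP is among them.
dns_matches() {
  grep -xF "$1" >/dev/null
}

# wait_for_dns DOMAIN EXPECTED_IP [ATTEMPTS]
# Polls the A record every 10 seconds until it matches or attempts run out.
wait_for_dns() {
  attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    dig +short A "$1" | dns_matches "$2" && return 0
    attempts=$((attempts - 1))
    sleep 10
  done
  return 1
}

# Usage with the EXTERNAL_IP captured above:
#   wait_for_dns <your-langsmith-domain> "$EXTERNAL_IP" && kubectl get certificate -n langsmith
```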

Sizing profiles

Set sizing_profile in terraform.tfvars, then re-run make init-values && make deploy.

sizing_profile = "production"   # default | minimum | dev | production | production-large

Profile	When to use
`default`	Chart defaults, no overlay applied
`minimum`	Absolute floor, fits `e2-standard-4`. Cost parking or CI smoke tests
`dev`	Single replica, minimal resources
`production`	Multi-replica with HPA. Recommended for real workloads
`production-large`	High memory, high CPU. 50+ users or 1000+ traces/sec

If you combine the minimum profile with LangSmith Deployment, run make patch-lgp after deploying to right-size the LangGraph Platform CRs. The operator overwrites patches made to Deployments, so the CRs must be targeted directly.

Expected pods

langsmith-ace-backend-xxx          1/1  Running    0
langsmith-backend-xxx              1/1  Running    0
langsmith-backend-auth-bootstrap   0/1  Completed  0
langsmith-backend-migrations       0/1  Completed  0
langsmith-clickhouse-0             1/1  Running    0
langsmith-frontend-xxx             1/1  Running    0
langsmith-ingest-queue-xxx         1/1  Running    0
langsmith-platform-backend-xxx     1/1  Running    0
langsmith-playground-xxx           1/1  Running    0
langsmith-queue-xxx                1/1  Running    0
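To compare a live namespace against this listing, filter the pod list for anything that is neither fully ready nor a completed Job. `not_ready_pods` is a hypothetical helper that reads `kubectl get pods --no-headers` output from stdin, so it can be tried on pasted text as well.

```shell
# not_ready_pods
# Reads lines of the form "NAME READY STATUS RESTARTS AGE" on stdin and prints
# the name of every pod that is not healthy. A pod counts as healthy when its
# status is Completed, or when it is Running with all containers ready.
not_ready_pods() {
  awk '{
    split($2, r, "/")
    ok = ($3 == "Completed") || ($3 == "Running" && r[1] == r[2])
    if (!ok) print $1
  }'
}

# Usage:
#   kubectl get pods -n langsmith --no-headers | not_ready_pods
```

Empty output means every pod matches the healthy pattern shown above.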

Enable add-ons

Each add-on is gated by a flag in infra/terraform.tfvars. Set the flag, re-apply Terraform, then re-run make init-values && make deploy.

LangSmith Deployment

Adds host-backend, listener, and operator. Required before enabling Agent Builder or Insights. KEDA is installed automatically when enable_langsmith_deployment = true.

# infra/terraform.tfvars
enable_deployments = true

cd modules/gcp

make apply        # push the enable_deployments flag
make init-values  # pick up the change
make deploy       # roll out host-backend + listener + operator

Verify:

kubectl get pods -n langsmith | grep -E "host-backend|listener|operator"
kubectl get lgp -n langsmith
kubectl get crd | grep langchain
kubectl get pods -n keda

config.deployment.url must include https://. Without the protocol, operator-spawned agents stay stuck in DEPLOYING indefinitely.
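Because a missing protocol fails silently, it is worth checking the value before deploying. `check_deployment_url` below is a hypothetical guard; how you obtain the configured URL (values file, Helm output) is up to you.

```shell
# check_deployment_url URL
# Succeeds only when URL starts with https:// and has something after it.
check_deployment_url() {
  case $1 in
    https://?*) return 0 ;;
    *)
      echo "config.deployment.url must start with https:// (got: $1)" >&2
      return 1
      ;;
  esac
}

# Usage with a placeholder hostname:
#   check_deployment_url "https://langsmith.example.com" && make deploy
```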

Agent Builder

Prerequisite: LangSmith Deployment healthy. Adds agent-builder-tool-server, agent-builder-trigger-server, and an agentBootstrap Job that registers the Polly agent URL.

# infra/terraform.tfvars
enable_agent_builder = true

make init-values
make deploy

Verify:

kubectl get pods -n langsmith | grep -E "tool-server|trigger-server|bootstrap"

Roll the frontend after agentBootstrap completes so it picks up the langsmith-polly-config ConfigMap:

kubectl rollout restart deployment langsmith-frontend -n langsmith

Skipping the frontend restart makes Polly show “Unable to connect to LangGraph server”.

Insights and Polly

Prerequisite: Agent Builder healthy. Insights enables ClickHouse-backed trace analytics. Polly is the AI eval and monitoring agent. Enable both together.

# infra/terraform.tfvars
enable_insights = true
enable_polly    = true

make init-values
make deploy

Verify:

kubectl get pods -n langsmith | grep -E "clio|polly"
kubectl get pods -n langsmith -w

insights_encryption_key and polly_encryption_key must never change after first enable. Rotating either permanently breaks existing encrypted data.

Expected pods by add-on

Add-on	Pods added
LangSmith Deployment	`langsmith-host-backend`, `langsmith-listener`, `langsmith-operator`
Agent Builder	`langsmith-agent-builder-tool-server`, `langsmith-agent-builder-trigger-server`, `langsmith-agent-builder-bootstrap` (Completed), `agent-builder-<hash>` (operator-spawned)
Insights and Polly	`clio-<hash>` (Insights analytics), `smith-polly-<hash>` (Polly agent), `lg-<hash>-0` (LangGraph StatefulSet)
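The add-ons build on each other: Agent Builder needs LangSmith Deployment, Insights and Polly need Agent Builder, and Insights and Polly are enabled together. A guard like the one below can reject an inconsistent set of flags before a redeploy. `check_addon_order` is a hypothetical helper that takes the four flag values as positional arguments, so it does not depend on the variable names in terraform.tfvars.

```shell
# check_addon_order DEPLOYMENT AGENT_BUILDER INSIGHTS POLLY
# Each argument is "true" or "false". Succeeds when the combination respects
# the prerequisites described in the add-on sections above.
check_addon_order() {
  if [ "$2" = true ] && [ "$1" != true ]; then
    echo "Agent Builder requires LangSmith Deployment" >&2
    return 1
  fi
  if { [ "$3" = true ] || [ "$4" = true ]; } && [ "$2" != true ]; then
    echo "Insights and Polly require Agent Builder" >&2
    return 1
  fi
  if [ "$3" != "$4" ]; then
    echo "Enable Insights and Polly together" >&2
    return 1
  fi
  return 0
}

# Example: Deployment and Agent Builder on, Insights and Polly off.
check_addon_order true true false false && echo "flag combination ok"
```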

Key watchouts

config.deployment.url must include https://. Without it, operator-spawned agents stay stuck in DEPLOYING.
config.deployment.enabled: true is required for LangSmith Deployment. Setting only the URL without enabled: true causes the chart to silently skip listener and operator.
Encryption keys must never change after first enable. Rotating insights_encryption_key or polly_encryption_key permanently breaks existing encrypted data.
Roll the frontend after first Polly enable. agentBootstrap creates the langsmith-polly-config ConfigMap after registering. Frontend pods started before bootstrap completes do not pick it up automatically.
Envoy Gateway IP changes on teardown. GCP releases the external IP when the Gateway is deleted. After a re-apply, a new IP is issued, so update your DNS A record.
langsmith-ksa annotation is not permanent. The operator creates langsmith-ksa at runtime; it does not survive namespace deletion. deploy.sh re-annotates it idempotently. Re-run make deploy if operator pods lose GCS access after a cluster rebuild.

Next steps

Reference the GCP variables and the quick reference.
Review the GCP architecture for module structure, traffic flow, and Workload Identity.
When something breaks, check the GCP troubleshooting guide.
Enable agent deployment in the UI with LangSmith Deployment.
