Provision the GCP cloud foundation and install LangSmith with the public Terraform modules at github.com/langchain-ai/terraform/tree/main/modules/gcp. Plan for 35 to 45 minutes end to end on a clean project. The deployment runs in two stages: infrastructure (Terraform provisions VPC, GKE, Cloud SQL, Memorystore, GCS, Workload Identity) and application (Helm installs the LangSmith chart against the cluster). Add-ons are enabled with a flag and a redeploy.

Prerequisites

Required tools

| Tool | Version | Purpose |
| --- | --- | --- |
| Google Cloud SDK (gcloud) | 450 | Authenticate, query GCP resources, manage GKE credentials |
| Terraform | 1.5 | Run the infrastructure modules |
| kubectl | 1.28 | Inspect the GKE cluster |
| Helm | 3.12 | Install and manage the LangSmith chart |
Install on macOS:
brew install --cask google-cloud-sdk
brew install kubectl helm
brew tap hashicorp/tap && brew install hashicorp/tap/terraform

gcloud version
terraform version
kubectl version --client
helm version
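
The comparison against the table can be scripted. The sketch below assumes the versions in the table are minimums and that each version is a plain dot-separated number such as 1.5.7 (strip any leading "v" or build suffix first). version_ge is a hypothetical helper, not part of the modules.

```shell
# version_ge ACTUAL MINIMUM: succeed when ACTUAL is at least MINIMUM.
# Hypothetical helper. Compares dot-separated numeric fields left to right
# and treats missing fields as 0, so 1.5 is equal to 1.5.0.
version_ge() {
  _a=$1
  _b=$2
  while [ -n "$_a" ] || [ -n "$_b" ]; do
    _x=${_a%%.*}
    _y=${_b%%.*}
    # Drop the field just read; clear the string when no dot remains.
    if [ "$_a" = "$_x" ]; then _a=; else _a=${_a#*.}; fi
    if [ "$_b" = "$_y" ]; then _b=; else _b=${_b#*.}; fi
    _x=${_x:-0}
    _y=${_y:-0}
    if [ "$_x" -gt "$_y" ]; then return 0; fi
    if [ "$_x" -lt "$_y" ]; then return 1; fi
  done
  return 0
}
```

For example, `version_ge 1.28 1.5` succeeds because fields are compared numerically rather than as strings, while `version_ge 1.4.9 1.5` fails.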

Required GCP APIs

Terraform enables these APIs automatically on the first apply, but cloudresourcemanager.googleapis.com must already be enabled so that Terraform can enable the rest. To speed up the first run, enable all of them manually:
gcloud services enable \
  container.googleapis.com \
  sqladmin.googleapis.com \
  redis.googleapis.com \
  storage.googleapis.com \
  iam.googleapis.com \
  secretmanager.googleapis.com \
  certificatemanager.googleapis.com \
  servicenetworking.googleapis.com \
  cloudresourcemanager.googleapis.com \
  --project <your-project-id>
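
To confirm afterwards that nothing was missed, the list of enabled APIs can be compared against the required set. This is a minimal sketch: missing_apis is a hypothetical helper, and it assumes the enabled API names arrive on stdin, one per line, for example from `gcloud services list --enabled --format='value(config.name)'`.

```shell
# Required APIs, copied from the gcloud services enable command above.
REQUIRED_APIS="container.googleapis.com
sqladmin.googleapis.com
redis.googleapis.com
storage.googleapis.com
iam.googleapis.com
secretmanager.googleapis.com
certificatemanager.googleapis.com
servicenetworking.googleapis.com
cloudresourcemanager.googleapis.com"

# missing_apis: read enabled API names on stdin (one per line) and print
# every required API that is absent. Hypothetical helper.
missing_apis() {
  _enabled=$(cat)
  for _api in $REQUIRED_APIS; do
    printf '%s\n' "$_enabled" | grep -qxF "$_api" || printf '%s\n' "$_api"
  done
}

# Usage (requires gcloud):
#   gcloud services list --enabled --project <your-project-id> \
#     --format='value(config.name)' | missing_apis
```

Empty output means every required API is already enabled.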

Required IAM roles

The principal running Terraform needs the following roles on the target project. Trim to least-privilege after the initial deployment is stable.
| Role | Purpose |
| --- | --- |
| roles/container.admin | Create and manage GKE clusters |
| roles/compute.networkAdmin | Create VPC, subnets, firewall rules |
| roles/iam.serviceAccountAdmin | Create service accounts for Workload Identity |
| roles/cloudsql.admin | Create and manage Cloud SQL instances |
| roles/redis.admin | Create and manage Memorystore Redis |
| roles/storage.admin | Create GCS buckets and lifecycle policies |
| roles/resourcemanager.projectIamAdmin | Grant IAM bindings during provisioning |
| roles/servicenetworking.networksAdmin | Create private service connections (required for Cloud SQL and Redis) |
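
One way to grant these roles is to generate the binding commands and review them before running anything. The sketch below only prints commands. print_role_grants is a hypothetical helper, and the member string (for example user:you@example.com) is an assumption you supply yourself.

```shell
# Roles copied from the table above.
REQUIRED_ROLES="roles/container.admin
roles/compute.networkAdmin
roles/iam.serviceAccountAdmin
roles/cloudsql.admin
roles/redis.admin
roles/storage.admin
roles/resourcemanager.projectIamAdmin
roles/servicenetworking.networksAdmin"

# print_role_grants PROJECT MEMBER: print one gcloud command per required
# role so the bindings can be reviewed first. Hypothetical helper; it does
# not call gcloud itself.
print_role_grants() {
  for _role in $REQUIRED_ROLES; do
    printf 'gcloud projects add-iam-policy-binding %s --member=%s --role=%s\n' \
      "$1" "$2" "$_role"
  done
}
```

After reviewing the output, pipe it to `sh` to apply the bindings. The person running it needs permission to modify the project's IAM policy.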

Authenticate

gcloud auth login
gcloud config set project <your-project-id>
gcloud auth application-default login
You also need a LangSmith license key (contact sales) and a domain or subdomain whose DNS records you control, so that you can point it at the Gateway IP later.

Rapid path

For the fastest path from zero to a running LangSmith instance, run these commands in order:
# 1. Clone the public modules
git clone https://github.com/langchain-ai/terraform.git
cd terraform/modules/gcp

# 2. Generate terraform.tfvars interactively (Enter accepts current values)
make quickstart

# 3. Load secrets into Secret Manager
#    Must be sourced, not executed
source infra/scripts/setup-env.sh

# 4. Validate environment
make preflight

# 5. Provision infrastructure (~25 to 35 min)
make init
make plan
make apply

# 6. Configure kubectl
make kubeconfig
kubectl get nodes

# 7. Deploy LangSmith via Helm (~8 to 12 min)
make init-values
make deploy

# 8. Get the Gateway IP for DNS
kubectl get gateway -n langsmith \
  -o jsonpath='{.items[0].status.addresses[0].value}'
The sections below cover each phase in detail.

Provision infrastructure

Provisioning the GCP cloud foundation takes 25 to 35 minutes on a clean project. Do not interrupt the apply.

What gets provisioned

| Resource | Purpose |
| --- | --- |
| VPC + subnet + Cloud NAT | Private network for the cluster and managed services |
| Private service connection | VPC peering for Cloud SQL and Memorystore private IPs |
| GKE cluster (Standard or Autopilot) | Kubernetes compute, Workload Identity enabled |
| Cloud SQL PostgreSQL | LangSmith operational data, HA standby, private IP |
| Memorystore Redis | Queue and cache, STANDARD_HA tier, private IP |
| GCS bucket | Trace payload blob storage, lifecycle rules |
| Workload Identity service account | Per-pod GCP access without static keys |
| cert-manager, KEDA, Envoy Gateway | Bootstrap workloads installed alongside infrastructure |

Clone and configure

git clone https://github.com/langchain-ai/terraform.git
cd terraform/modules/gcp
All subsequent commands run from modules/gcp/. Run make help for the full target list. Generate terraform.tfvars with the interactive wizard:
make quickstart
The wizard prompts for project ID, naming prefix, region, GKE sizing, TLS source, external vs in-cluster services, and the optional add-on flags, then writes infra/terraform.tfvars. Re-running the wizard pre-selects the existing values; press Enter at each prompt to keep the current config. To edit the file manually instead:
cp infra/terraform.tfvars.example infra/terraform.tfvars
vi infra/terraform.tfvars
The minimum required variables:
project_id            = "<your-gcp-project-id>"
name_prefix           = "ls"
environment           = "prod"
langsmith_license_key = "<your-license-key>"
langsmith_domain      = "langsmith.example.com"

region = "us-west2"
zone   = "us-west2-a"

postgres_source   = "external"
postgres_password = "<strong-password>"   # or: export TF_VAR_postgres_password=...

redis_source = "external"

clickhouse_source = "in-cluster"

tls_certificate_source = "letsencrypt"
letsencrypt_email      = "ops@example.com"

enable_langsmith_deployment = true
See the GCP variables reference for every input variable.
Configure a remote state backend before applying. Copy infra/backend.tf.example to infra/backend.tf and point it at a GCS bucket you control. Local state is fragile and easily lost during directory restructuring.
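
The contents of infra/backend.tf.example are not reproduced here. As a rough sketch, a standard Terraform gcs backend block can be rendered as follows. render_backend_tf is a hypothetical helper, and the bucket name and state prefix are placeholders you choose; the bucket must already exist.

```shell
# render_backend_tf BUCKET PREFIX: print a Terraform gcs backend block.
# Hypothetical helper. BUCKET is an existing GCS bucket you control and
# PREFIX is the path inside it under which state objects are stored.
render_backend_tf() {
  cat <<EOF
terraform {
  backend "gcs" {
    bucket = "$1"
    prefix = "$2"
  }
}
EOF
}
```

For example, `render_backend_tf <your-state-bucket> langsmith/gcp > infra/backend.tf`, followed by make init so Terraform picks up the backend.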

Load secrets into Secret Manager

source infra/scripts/setup-env.sh
The script reads terraform.tfvars, derives the secret prefix, and for each secret either reuses an exported value, reads the existing Secret Manager secret, auto-generates one (for salts and Fernet keys), or prompts you. The license key and admin password are the two values you supply interactively. The script must be sourced because make cannot export environment variables back to the parent shell. Verify the secrets are present:
make secrets
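
The difference between sourcing and executing can be seen with a throwaway script. This demonstration does not touch setup-env.sh; DEMO_SECRET and the temporary file are made up for illustration. A script run as a child process cannot change the environment of the shell that launched it, while a sourced script runs inside that shell.

```shell
# Create a throwaway script that exports one variable.
demo=$(mktemp)
printf 'export DEMO_SECRET=from-script\n' > "$demo"
unset DEMO_SECRET

# Executed: the export happens in a child process and is lost on exit.
sh "$demo"
after_exec=${DEMO_SECRET:-unset}

# Sourced: the export happens in the current shell and persists.
. "$demo"
after_source=${DEMO_SECRET:-unset}

rm -f "$demo"
echo "executed: $after_exec"    # prints "executed: unset"
echo "sourced:  $after_source"  # prints "sourced:  from-script"
```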

Preflight checks

make preflight
make preflight validates that the active gcloud credentials can perform each required action, that the required GCP APIs are enabled, and that the target region has the SKUs the modules request. Catching gaps here is faster than discovering them mid-terraform apply.

Apply

make init
make plan
make apply
make plan shows the proposed diff. Review the output before applying. make apply provisions in dependency order: VPC and networking, then GKE (about 10 to 15 minutes), private service connection, Cloud SQL (about 10 minutes with HA), Memorystore, GCS, and the bootstrap workloads. Equivalent direct Terraform flow:
cd infra   # from modules/gcp (modules/gcp/infra from the repository root)

terraform init
terraform plan -var-file=terraform.tfvars
terraform apply -var-file=terraform.tfvars

Configure kubectl

make kubeconfig
kubectl get nodes
All nodes should report Ready.
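
For a scripted check, the node list can be filtered for anything that is not Ready. This is a sketch: not_ready_nodes is a hypothetical helper that expects the output of `kubectl get nodes --no-headers` on stdin and assumes the default column layout (NAME, then STATUS).

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the name of every node whose STATUS column is not exactly "Ready".
# Cordoned nodes ("Ready,SchedulingDisabled") are reported as well.
not_ready_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage (requires kubectl):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Empty output means every node reports Ready.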

Verify bootstrap components

kubectl get pods -n cert-manager
kubectl get pods -n keda
kubectl get secrets -n langsmith
cert-manager, KEDA, and the LangSmith namespace secrets should all be in place.

Deploy LangSmith

Two paths are supported: a scripted deploy driven by make, and a manual Helm install. Pick one. On the scripted path, two commands install the LangSmith chart with sensible defaults wired from Terraform outputs:
cd modules/gcp

make init-values
make deploy
init-values.sh prompts for the admin email, then reads sizing_profile and the enable_* flags from terraform.tfvars and copies matching values files from helm/values/examples/ into helm/values/. It also generates values-overrides.yaml with your hostname, Workload Identity annotations, and GCS bucket name. make deploy runs helm/scripts/deploy.sh, which refreshes the kubeconfig, runs preflight checks, applies the layered values files, and runs helm upgrade --install. Expect 8 to 12 minutes for the chart to install and pods to become ready.

Manual Helm install

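Best for teams that run helm directly without the scripts. The relative values paths and the terraform output call below assume the commands run from modules/gcp/infra. Generate the required secrets first: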
export API_KEY_SALT=$(openssl rand -base64 32)
export JWT_SECRET=$(openssl rand -base64 32)
export AGENT_BUILDER_ENCRYPTION_KEY=$(python3 -c \
  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
export INSIGHTS_ENCRYPTION_KEY=$(python3 -c \
  "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
export ADMIN_EMAIL="admin@example.com"
export ADMIN_PASSWORD="<strong-password>"

# GCS HMAC credentials (create in GCP Console: Storage > Settings > Interoperability)
export GCS_ACCESS_KEY="<your-hmac-access-key>"
export GCS_ACCESS_SECRET="<your-hmac-secret>"
helm repo add langchain https://langchain-ai.github.io/helm
helm repo update

helm upgrade --install langsmith langchain/langsmith \
  --namespace langsmith \
  --create-namespace \
  -f ../helm/values/values.yaml \
  -f ../helm/values/values-overrides.yaml \
  --set config.langsmithLicenseKey="<your-license-key>" \
  --set config.apiKeySalt="$API_KEY_SALT" \
  --set config.basicAuth.jwtSecret="$JWT_SECRET" \
  --set config.hostname="<your-langsmith-domain>" \
  --set config.basicAuth.initialOrgAdminEmail="$ADMIN_EMAIL" \
  --set config.basicAuth.initialOrgAdminPassword="$ADMIN_PASSWORD" \
  --set config.agentBuilder.encryptionKey="$AGENT_BUILDER_ENCRYPTION_KEY" \
  --set config.insights.encryptionKey="$INSIGHTS_ENCRYPTION_KEY" \
  --set config.blobStorage.bucketName="$(terraform output -raw storage_bucket_name)" \
  --set config.blobStorage.accessKey="$GCS_ACCESS_KEY" \
  --set config.blobStorage.accessKeySecret="$GCS_ACCESS_SECRET" \
  --set gateway.enabled=true \
  --set ingress.enabled=false \
  --wait --timeout 15m

Verify and configure DNS

kubectl get pods -n langsmith

EXTERNAL_IP=$(kubectl get svc -n envoy-gateway-system \
  -l gateway.envoyproxy.io/owning-gateway-name=langsmith-gateway \
  -o jsonpath='{.items[0].status.loadBalancer.ingress[0].ip}')

echo "Create A record: <your-langsmith-domain> -> $EXTERNAL_IP"

kubectl get certificate -n langsmith
cert-manager cannot issue the Let’s Encrypt certificate until the DNS A record resolves to the Gateway IP. Create the record at your DNS provider, wait for propagation, then re-check the certificate status.
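
Waiting for propagation can be scripted as a polling loop. The sketch below is an assumption-laden example: wait_for_a_record is a hypothetical helper, it relies on dig being installed, and the LOOKUP variable exists only so a different resolver command can be substituted.

```shell
# wait_for_a_record DOMAIN IP [ATTEMPTS] [PAUSE_SECONDS]
# Poll until DOMAIN has an A record equal to IP. Hypothetical helper.
# Defaults: 30 attempts, 10 seconds apart. Set LOOKUP to override the
# resolver command (default: dig +short A).
wait_for_a_record() {
  _domain=$1
  _ip=$2
  _attempts=${3:-30}
  _pause=${4:-10}
  _n=0
  while [ "$_n" -lt "$_attempts" ]; do
    # The expansion is unquoted on purpose so "dig +short A" splits into words.
    if ${LOOKUP:-dig +short A} "$_domain" | grep -qxF "$_ip"; then
      return 0
    fi
    _n=$((_n + 1))
    sleep "$_pause"
  done
  return 1
}

# Usage:
#   wait_for_a_record <your-langsmith-domain> "$EXTERNAL_IP" \
#     && kubectl get certificate -n langsmith
```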

Sizing profiles

Set sizing_profile in terraform.tfvars, then re-run make init-values && make deploy.
sizing_profile = "production"   # default | minimum | dev | production | production-large
| Profile | When to use |
| --- | --- |
| default | Chart defaults, no overlay applied |
| minimum | Absolute floor, fits e2-standard-4. Cost parking or CI smoke tests |
| dev | Single replica, minimal resources |
| production | Multi-replica with HPA. Recommended for real workloads |
| production-large | High memory, high CPU. 50+ users or 1000+ traces/sec |
Minimum profile with LangSmith Deployment? Run make patch-lgp after deploy to right-size LangGraph Platform CRs. The operator overwrites Deployment patches, so the CRs must be targeted directly.
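
A typo in sizing_profile is easy to catch before redeploying. The sketch below reads the value from a tfvars file and checks it against the profile names listed above. tfvar_get and valid_sizing_profile are hypothetical helpers, and the parser only handles simple `key = "value"` lines.

```shell
# tfvar_get KEY FILE: print the last quoted string assigned to KEY in FILE.
# Hypothetical helper; trailing comments are ignored, and heredocs or
# expressions are not supported.
tfvar_get() {
  sed -n 's/^[[:space:]]*'"$1"'[[:space:]]*=[[:space:]]*"\([^"]*\)".*$/\1/p' "$2" | tail -n 1
}

# valid_sizing_profile NAME: succeed for the profile names documented above.
valid_sizing_profile() {
  case "$1" in
    default|minimum|dev|production|production-large) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   valid_sizing_profile "$(tfvar_get sizing_profile infra/terraform.tfvars)" \
#     || echo "unknown sizing_profile" >&2
```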

Expected pods

langsmith-ace-backend-xxx          1/1  Running    0
langsmith-backend-xxx              1/1  Running    0
langsmith-backend-auth-bootstrap   0/1  Completed  0
langsmith-backend-migrations       0/1  Completed  0
langsmith-clickhouse-0             1/1  Running    0
langsmith-frontend-xxx             1/1  Running    0
langsmith-ingest-queue-xxx         1/1  Running    0
langsmith-platform-backend-xxx     1/1  Running    0
langsmith-playground-xxx           1/1  Running    0
langsmith-queue-xxx                1/1  Running    0
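
To compare a live cluster against this list, the pod table can be filtered for anything that is neither Completed nor fully ready. This is a sketch: unhealthy_pods is a hypothetical helper that expects `kubectl get pods -n langsmith --no-headers` output on stdin and assumes the default columns (NAME, READY, STATUS).

```shell
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print "NAME STATUS READY" for every pod that is not Completed and is not
# Running with all of its containers ready.
unhealthy_pods() {
  awk '
    $3 == "Completed" { next }
    {
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
    }
  '
}

# Usage (requires kubectl):
#   kubectl get pods -n langsmith --no-headers | unhealthy_pods
```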

Enable add-ons

Each add-on is gated by a flag in infra/terraform.tfvars. Set the flag, re-apply Terraform, then re-run make init-values && make deploy.

LangSmith Deployment

Adds host-backend, listener, and operator. Required before enabling Agent Builder or Insights. KEDA is installed automatically when enable_langsmith_deployment = true.
# infra/terraform.tfvars
enable_langsmith_deployment = true
cd modules/gcp

make apply        # push the enable_langsmith_deployment flag
make init-values  # pick up the change
make deploy       # roll out host-backend + listener + operator
Verify:
kubectl get pods -n langsmith | grep -E "host-backend|listener|operator"
kubectl get lgp -n langsmith
kubectl get crd | grep langchain
kubectl get pods -n keda
config.deployment.url must include https://. Without the protocol, operator-spawned agents stay stuck in DEPLOYING indefinitely.
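
A quick guard for this mistake is to check the scheme before deploying. The helper below is a hypothetical sketch. It only tests that the string starts with https:// and has something after it; it does not validate the rest of the URL.

```shell
# has_https_scheme URL: succeed only when URL starts with https:// and is
# followed by at least one more character. Hypothetical helper.
has_https_scheme() {
  case "$1" in
    https://?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   has_https_scheme "https://langsmith.example.com" || echo "add https://" >&2
```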

Agent Builder

Prerequisite: LangSmith Deployment healthy. Adds agent-builder-tool-server, agent-builder-trigger-server, and an agentBootstrap Job that registers the Polly agent URL.
# infra/terraform.tfvars
enable_agent_builder = true
make init-values
make deploy
Verify:
kubectl get pods -n langsmith | grep -E "tool-server|trigger-server|bootstrap"
Roll the frontend after agentBootstrap completes so it picks up the langsmith-polly-config ConfigMap:
kubectl rollout restart deployment langsmith-frontend -n langsmith
Skipping the frontend restart makes Polly show “Unable to connect to LangGraph server”.

Insights and Polly

Prerequisite: Agent Builder healthy. Insights enables ClickHouse-backed trace analytics. Polly is the AI eval and monitoring agent. Enable both together.
# infra/terraform.tfvars
enable_insights = true
enable_polly    = true
make init-values
make deploy
Verify:
kubectl get pods -n langsmith | grep -E "clio|polly"
kubectl get pods -n langsmith -w
insights_encryption_key and polly_encryption_key must never change after first enable. Rotating either permanently breaks existing encrypted data.
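
The add-ons build on each other: Agent Builder needs LangSmith Deployment, and Insights and Polly need Agent Builder. The sketch below checks that ordering in a tfvars file before you redeploy. Both functions are hypothetical helpers, and they assume the flags are written as simple `name = true` lines with the flag names used in this guide.

```shell
# flag_enabled NAME FILE: succeed when FILE contains a `NAME = true` line.
flag_enabled() {
  grep -Eq "^[[:space:]]*$1[[:space:]]*=[[:space:]]*true([[:space:]]|#|\$)" "$2"
}

# check_addon_order FILE: print each prerequisite violation and return
# non-zero if any were found. Hypothetical helper.
check_addon_order() {
  _rc=0
  if flag_enabled enable_agent_builder "$1" &&
     ! flag_enabled enable_langsmith_deployment "$1"; then
    echo "enable_agent_builder requires enable_langsmith_deployment"
    _rc=1
  fi
  for _flag in enable_insights enable_polly; do
    if flag_enabled "$_flag" "$1" && ! flag_enabled enable_agent_builder "$1"; then
      echo "$_flag requires enable_agent_builder"
      _rc=1
    fi
  done
  return "$_rc"
}

# Usage:
#   check_addon_order infra/terraform.tfvars && make init-values && make deploy
```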

Expected pods by add-on

  • LangSmith Deployment adds: langsmith-host-backend, langsmith-listener, langsmith-operator.
  • Agent Builder adds: langsmith-agent-builder-tool-server, langsmith-agent-builder-trigger-server, langsmith-agent-builder-bootstrap (Completed), agent-builder-<hash> (operator-spawned).
  • Insights and Polly add: clio-<hash> (Insights analytics), smith-polly-<hash> (Polly agent), lg-<hash>-0 (LangGraph StatefulSet).

Key watchouts

  • config.deployment.url must include https://. Without it, operator-spawned agents stay stuck in DEPLOYING.
  • config.deployment.enabled: true is required for LangSmith Deployment. Setting only the URL without enabled: true causes the chart to silently skip listener and operator.
  • Encryption keys must never change after first enable. Rotating insights_encryption_key or polly_encryption_key permanently breaks existing encrypted data.
  • Roll the frontend after first Polly enable. agentBootstrap creates the langsmith-polly-config ConfigMap after registering. Frontend pods started before bootstrap completes do not pick it up automatically.
  • Envoy Gateway IP changes on teardown. GCP releases the external IP when the Gateway is deleted. After a re-apply, a new IP is issued, so update your DNS A record.
  • langsmith-ksa annotation is not permanent. The operator creates langsmith-ksa at runtime; it does not survive namespace deletion. deploy.sh re-annotates it idempotently. Re-run make deploy if operator pods lose GCS access after a cluster rebuild.

Next steps