Multi-Tenant Routing Strategies

A single API gateway serving multiple customer environments must do two opposing things at once: share infrastructure efficiently and enforce hard boundaries between tenants. Get either side wrong and you face either runaway costs or a data-plane isolation failure that becomes a security incident. This page covers the full routing stack for multi-tenant deployments — from tenant identity extraction to per-tenant upstream pools, rate limiting and throttling, controlled rollouts, and observability — as a deep-dive within the Advanced Routing & API Versioning discipline.

Architectural Baseline

Before writing a single routing rule, establish three mental-model prerequisites.

Tenant resolution happens before routing evaluation. The gateway must know which tenant owns a request before it can select an upstream, apply a quota, or scope a policy. Resolution that occurs inside the upstream (post-routing) is already too late — quota overruns and cross-tenant data leaks both become possible.

Identity and tenancy are distinct concepts. A user (sub claim in a JWT) belongs to a tenant (org_id or tenant_id claim). The routing layer cares about the tenant, not the individual user. The authentication proxying and token validation layer handles user identity; multi-tenant routing consumes the tenant dimension that authentication exposes.

Isolation granularity is a spectrum. At one end, all tenants share a single upstream deployment and isolation is entirely logical (quota counters, policy scopes). At the other end, each tenant gets a dedicated upstream pod and isolation is physical. Most production systems land somewhere between those extremes: tiered isolation where enterprise tenants get dedicated pools and standard tenants share a pool with enforced quotas.
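The tiered model can be sketched as a small lookup from tenant tier to upstream pool. This is a minimal illustration, not a gateway API: the `Tenant` type, the tier names, and the pool naming scheme are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical name for the pool that standard tenants share.
SHARED_POOL = "upstream_shared"


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    tier: str  # assumed values: "enterprise" or "standard"


def select_pool(tenant: Tenant) -> str:
    """Return a dedicated pool for enterprise tenants, the shared pool otherwise."""
    if tenant.tier == "enterprise":
        # Assumed naming scheme: "tenant-alpha" -> "upstream_tenant_alpha"
        return "upstream_" + tenant.tenant_id.replace("-", "_")
    return SHARED_POOL
```

Keeping the tier decision in one function means that moving a tenant from shared to dedicated capacity is a data change (its tier) rather than a change to routing logic.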

Tenant Identity Extraction

Three extraction mechanisms are in common use. Each has different trust properties and operational trade-offs.

Subdomain routing

tenant-alpha.api.example.com maps to tenant alpha. The hostname arrives as the SNI value in the TLS ClientHello, before the handshake completes, which makes it usable for early routing decisions at the listener level, before the middleware chain has processed anything.

Trust level: High for identifying which tenant is being addressed, because DNS is controlled by the platform. The hostname does not prove that the caller belongs to that tenant, so it still needs to be paired with a credential check.
Drawback: Wildcard TLS certificates and DNS propagation add operational overhead. Difficult to use with API clients that cannot resolve dynamic subdomains.
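As a rough sketch of subdomain extraction, the function below pulls the tenant label out of a hostname. The base domain comes from the example above; the function name and the decision to reject nested subdomains are assumptions, and IPv6 literals are not handled.

```python
from typing import Optional

BASE_DOMAIN = "api.example.com"  # base domain from the example hostname


def tenant_from_host(host: str) -> Optional[str]:
    """Return the tenant label from '<tenant>.api.example.com', or None."""
    # Drop an optional port, normalise case, and strip a trailing dot.
    hostname = host.split(":", 1)[0].lower().rstrip(".")
    suffix = "." + BASE_DOMAIN
    if not hostname.endswith(suffix):
        return None
    label = hostname[: -len(suffix)]
    # Reject empty labels and nested subdomains such as 'a.b.api.example.com'.
    if not label or "." in label:
        return None
    return label
```

Returning None for anything that is not exactly one label under the base domain keeps the bare domain and unrelated hosts out of tenant routing.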

`X-Tenant-ID` header

Callers include an explicit header. The gateway reads X-Tenant-ID and uses it for routing.

Trust level: Low on its own — any caller can forge an arbitrary header value.
Required pairing: Always validate against a signed credential (JWT claim or mTLS certificate). The tenant value in the verified credential must match the header sent by the caller, or the request must be rejected.
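The pairing rule can be sketched as a single check that runs after token validation. The function name and the choice of `PermissionError` are assumptions for illustration; the sketch also assumes that a request without the header is acceptable and simply takes its tenant from the verified claim.

```python
from typing import Mapping, Optional


def resolve_tenant(header_value: Optional[str],
                   verified_claims: Mapping[str, object]) -> str:
    """Return the tenant from verified claims, rejecting a mismatched header."""
    claim = verified_claims.get("tenant_id")
    if not isinstance(claim, str) or not claim:
        raise PermissionError("token carries no tenant_id claim")
    # The header is only a hint: it may agree with the claim, never override it.
    if header_value is not None and header_value != claim:
        raise PermissionError("X-Tenant-ID does not match token tenant_id")
    return claim
```

The function always returns the claim, never the header, so the value used for routing and quota is the one bound to the signature.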

JWT claim extraction

The gateway validates the JWT signature (see implementing JWT validation in Kong plugins for a concrete walkthrough), then reads the tenant_id (or org_id) claim from the verified payload. Downstream services receive the tenant ID as an injected upstream header, never the raw JWT.

Trust level: High — the claim is cryptographically bound to the identity provider’s signature.
Recommended default for token-authenticated tenants. See routing by API key vs JWT claims for a side-by-side comparison of credential models.
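To make the verify-then-read order concrete, here is a minimal standard-library sketch for HS256 tokens. It is an illustration under stated assumptions, not how any particular gateway implements validation: the helper names are hypothetical, only the signature is checked (no exp or nbf handling), and a production system would normally use a maintained JWT library with asymmetric keys.

```python
import base64
import hashlib
import hmac
import json


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build a demo token; stands in for the identity provider."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def tenant_from_jwt(token: str, secret: bytes) -> str:
    """Verify the signature first, then read tenant_id (or org_id)."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    # Only now is the payload trusted enough to read.
    claims = json.loads(_b64url_decode(payload_b64))
    tenant = claims.get("tenant_id") or claims.get("org_id")
    if not isinstance(tenant, str) or not tenant:
        raise ValueError("no tenant claim")
    return tenant
```

The payload is parsed only after the signature comparison succeeds, which is the property that makes the claim trustworthy as a routing key.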

Primary Deep-Dive: Kong 3.x Tenant-Aware Routing

Kong 3.x supports tenant-aware dispatch natively through route headers predicates combined with plugin scoping. The declarative config below wires a JWT plugin (validation), a request-transformer (tenant header injection), and a rate-limiting plugin (per-tenant quota) onto a shared service.

# Kong 3.x declarative config (deck format)
_format_version: "3.0"

services:
  - name: platform-api
    url: http://upstream.internal:8080
    connect_timeout: 3000
    write_timeout: 10000
    read_timeout: 10000

routes:
  - name: tenant-alpha-route
    service: platform-api
    hosts: ["api.example.com"]
    headers:
      X-Tenant-ID: ["tenant-alpha"]
    paths: ["/v1"]
    plugins:
      - name: jwt
        config:
          key_claim_name: kid
          claims_to_verify: ["exp", "nbf"]
          secret_is_base64: false

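      - name: request-transformer
        config:
          # `add` alone leaves a caller-supplied header in place,
          # so pair it with `replace` to overwrite any existing value.
          replace:
            headers:
              - "X-Resolved-Tenant-ID:tenant-alpha"
              - "X-Tenant-Tier:enterprise"
          add:
            headers:
              - "X-Resolved-Tenant-ID:tenant-alpha"
              - "X-Tenant-Tier:enterprise"
          remove:
            headers:
              - "Authorization"  # strip before forwarding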

      - name: rate-limiting
        config:
          minute: 2000
          hour: 60000
          policy: redis
          limit_by: header
          header_name: X-Resolved-Tenant-ID
          redis_host: redis.internal
          redis_port: 6379

  - name: tenant-beta-route
    service: platform-api
    hosts: ["api.example.com"]
    headers:
      X-Tenant-ID: ["tenant-beta"]
    paths: ["/v1"]
    plugins:
      - name: jwt
        config:
          key_claim_name: kid
          claims_to_verify: ["exp", "nbf"]
          secret_is_base64: false

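      - name: request-transformer
        config:
          # `add` alone leaves a caller-supplied header in place,
          # so pair it with `replace` to overwrite any existing value.
          replace:
            headers:
              - "X-Resolved-Tenant-ID:tenant-beta"
              - "X-Tenant-Tier:standard"
          add:
            headers:
              - "X-Resolved-Tenant-ID:tenant-beta"
              - "X-Tenant-Tier:standard"
          remove:
            headers:
              - "Authorization"  # strip before forwarding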

      - name: rate-limiting
        config:
          minute: 200
          hour: 5000
          policy: redis
          limit_by: header
          header_name: X-Resolved-Tenant-ID
          redis_host: redis.internal
          redis_port: 6379

Key design decisions in this config:

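Route matching uses headers predicates, so only requests whose X-Tenant-ID names a known tenant reach a tenant route; anything else receives Kong's default 404 response for an unmatched route. The header is still caller-supplied at this point and must be cross-checked against the verified JWT claim.
Authorization is stripped before reaching the upstream. Upstreams receive X-Resolved-Tenant-ID (set by the gateway, not the caller) and X-Tenant-Tier for downstream business logic.
Rate-limiting uses policy: redis so quota counters survive gateway restarts and are shared across gateway replicas. The limit_by: header + header_name: X-Resolved-Tenant-ID pair is meant to key the counter on the gateway-controlled header rather than on anything the caller can manipulate. Kong runs plugins in priority order, not declaration order, so confirm that the header is already set when rate-limiting reads it; when the header is absent, the plugin falls back to limiting by client IP.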
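The per-tenant counter idea can be sketched as a fixed-window limiter keyed by tenant ID. This is an illustration of the keying scheme only: an in-memory dict stands in for Redis, the class and parameter names are hypothetical, and it does not reproduce Kong's actual counter implementation.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class TenantRateLimiter:
    """Fixed one-minute windows, one counter per (tenant, window)."""

    def __init__(self, limits_per_minute: Dict[str, int],
                 clock: Callable[[], float] = time.time) -> None:
        self._limits = limits_per_minute
        self._clock = clock
        # Stand-in for Redis; a real store would expire old window keys.
        self._counters: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, tenant_id: str) -> bool:
        limit = self._limits.get(tenant_id)
        if limit is None:
            return False  # unknown tenant: fail closed (an assumed policy)
        window = int(self._clock() // 60)
        key = (tenant_id, window)
        if self._counters[key] >= limit:
            return False
        self._counters[key] += 1
        return True
```

Because the key includes the tenant ID, one tenant exhausting its window has no effect on another tenant's counter, even when both share a route and an upstream.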

Secondary Deep-Dive: Envoy 1.32+ Virtual Host Routing

Envoy evaluates the routes inside a virtual host in declaration order and uses the first match. Tenant discrimination happens inside virtual_hosts using headers match conditions. Each tenant’s traffic is directed to a named cluster, Envoy’s abstraction for an upstream endpoint group, which can carry its own load-balancing policy, circuit breaker thresholds, and connection pool limits.

# Envoy 1.32+ static bootstrap (excerpt)
static_resources:
  listeners:
    - name: ingress_listener
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                route_config:
                  name: tenant_routing
                  virtual_hosts:
                    - name: platform_api
                      domains: ["api.example.com", "*.api.example.com"]
                      routes:
                        # Enterprise tenant — dedicated upstream cluster
                        - match:
                            prefix: "/v1"
                            headers:
                              - name: "x-tenant-id"
                                string_match: { exact: "tenant-alpha" }
                          route:
                            cluster: upstream_tenant_alpha
                            timeout: 10s
                            retry_policy:
                              retry_on: "5xx,reset,connect-failure"
                              num_retries: 3
                              per_try_timeout: 3s

                        # Standard tenant — shared upstream cluster
                        - match:
                            prefix: "/v1"
                            headers:
                              - name: "x-tenant-id"
                                string_match: { exact: "tenant-beta" }
                          route:
                            cluster: upstream_shared
                            timeout: 5s
                            retry_policy:
                              retry_on: "5xx,reset"
                              num_retries: 2
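                              per_try_timeout: 2s
                # The router filter must be present for route_config to take effect
                http_filters:
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router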

  clusters:
    - name: upstream_tenant_alpha
      connect_timeout: 2s
      type: STRICT_DNS
      load_assignment:
        cluster_name: upstream_tenant_alpha
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: alpha-upstream.internal, port_value: 8080 }
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 100
            max_pending_requests: 50
            max_requests: 200

    - name: upstream_shared
      connect_timeout: 2s
      type: STRICT_DNS
      load_assignment:
        cluster_name: upstream_shared
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: shared-upstream.internal, port_value: 8080 }
      circuit_breakers:
        thresholds:
          - priority: DEFAULT
            max_connections: 500
            max_pending_requests: 200
            max_requests: 1000

The circuit_breakers block on each upstream enforces connection pool limits per tenant tier. If the enterprise upstream is degraded, the circuit breaker trips and returns a fast failure — it does not spill traffic into the shared upstream and contaminate other tenants’ error rates.
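The fail-fast, no-spill behaviour can be sketched with one in-flight counter per cluster. This is a simplified model rather than Envoy's implementation: the class and function names are hypothetical, the limits are deliberately tiny, and the code is not thread-safe.

```python
from typing import Dict


class ClusterBreaker:
    """Caps in-flight requests for one upstream cluster."""

    def __init__(self, max_requests: int) -> None:
        self.max_requests = max_requests
        self.in_flight = 0

    def try_acquire(self) -> bool:
        if self.in_flight >= self.max_requests:
            return False  # breaker open: fail fast
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight = max(0, self.in_flight - 1)


# Illustrative limits only; real thresholds would be far higher.
BREAKERS: Dict[str, ClusterBreaker] = {
    "upstream_tenant_alpha": ClusterBreaker(max_requests=2),
    "upstream_shared": ClusterBreaker(max_requests=3),
}


def admit(cluster: str) -> int:
    """Return 200 when admitted, 503 when the cluster is saturated."""
    # A saturated cluster never borrows capacity from another cluster.
    return 200 if BREAKERS[cluster].try_acquire() else 503
```

Each cluster owns its counter, so saturating the enterprise cluster produces 503s for that tenant while the shared cluster keeps admitting requests.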

For deployments managing hundreds of tenants, hardcoding every tenant into a static bootstrap is impractical. Use Envoy’s xDS control plane to push dynamic RouteConfiguration (RDS), Cluster (CDS), and ClusterLoadAssignment (EDS) updates without restarting the proxy, so that tenant onboarding becomes an API call to the control plane rather than a config reload.

Controlled Rollouts for Tenant Routing Changes

Routing changes carry blast radius proportional to the tenant population they affect. Use weighted traffic splitting to validate new upstream versions or routing logic changes before full promotion.

Weighted routing in Envoy (two upstream versions):

# Envoy 1.32+ weighted cluster split for canary rollout
- match:
    prefix: "/v1"
    headers:
      - name: "x-tenant-id"
        string_match: { exact: "tenant-alpha" }
  route:
    weighted_clusters:
      # Weights sum to 100; the deprecated total_weight field is omitted.
      clusters:
        - name: upstream_tenant_alpha_v2   # canary (new version)
          weight: 10
        - name: upstream_tenant_alpha_v1   # stable (current version)
          weight: 90
    timeout: 10s

Staged rollout protocol:

1. Deploy the new upstream (upstream_tenant_alpha_v2) alongside the existing one.
2. Start at 5–10% weight on the new upstream.
3. Monitor 5xx error rate, p99 latency, and business-level metrics (e.g., payload validation errors) for at least one full traffic cycle (typically 24 hours for B2B APIs).
4. Increment weight by 10–20 percentage points per cycle, pausing if error rate exceeds a threshold.
5. Flip to 100% on the new upstream and decommission the old one only after two clean cycles at 90%+.
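The rollout steps above can be sketched as two small functions: one that splits traffic by weight and one that decides the next weight after a monitoring cycle. This is a simplified model under stated assumptions: Envoy selects weighted clusters randomly, whereas the sketch hashes a request ID to make the split reproducible, and the 1% error threshold and 10-point step are hypothetical values.

```python
import hashlib

CANARY = "upstream_tenant_alpha_v2"
STABLE = "upstream_tenant_alpha_v1"


def pick_cluster(request_id: str, canary_weight: int) -> str:
    """Send roughly canary_weight percent of requests to the canary."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # 0..99
    return CANARY if bucket < canary_weight else STABLE


def next_weight(current: int, error_rate: float,
                threshold: float = 0.01, step: int = 10) -> int:
    """Advance the canary weight after a clean cycle, hold it otherwise."""
    if error_rate > threshold:
        return current  # pause the rollout; do not advance
    return min(100, current + step)
```

Holding the weight rather than resetting it on a bad cycle matches the "pause" wording in the protocol; a stricter policy could roll back to zero instead.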

For geographic tenants requiring data-residency enforcement, combine weighted routing with DNS-based traffic steering: the geo-DNS layer resolves the nearest regional gateway, and the gateway’s routing rules ensure traffic never leaves the designated region regardless of where the originating client is located.
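A minimal sketch of the region lock, assuming a hypothetical residency map and region names: the gateway either returns a cluster in its own region or refuses the request, and it never forwards a pinned tenant across regions.

```python
from typing import Dict, Optional

# Hypothetical residency map: tenants listed here are pinned to one region.
TENANT_REGION: Dict[str, str] = {"tenant-alpha": "eu-west"}


def regional_cluster(tenant_id: str, gateway_region: str) -> Optional[str]:
    """Return the region-local cluster, or None when the request must be refused."""
    pinned = TENANT_REGION.get(tenant_id)
    if pinned is not None and pinned != gateway_region:
        return None  # pinned tenant reached the wrong region: refuse
    return "upstream_" + gateway_region.replace("-", "_")
```

Returning None instead of the pinned region's cluster keeps the refusal explicit, so a misrouted request fails visibly rather than silently crossing a region boundary.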

Comparative Implementation Table

Gateway	Tenant Extraction Mechanism	Config Approach	Key Trade-off
Kong 3.x	`headers` predicate on route; JWT plugin extracts claim	Declarative YAML (`deck`); plugins scoped per route	Fine-grained plugin scoping per route; scales with `deck sync` in CI/CD
Envoy 1.32+	`headers` match in `route_config`; `jwt_authn` HTTP filter for JWT validation	Static bootstrap or xDS dynamic config	Per-cluster circuit breakers; requires xDS control plane for large tenant counts
Tyk 5.x	`ApiDefinition` with `proxy.listen_path` + JWT middleware	API definition JSON per tenant; Tyk Dashboard API	Dashboard-driven; per-API rate limits; less YAML-native than Kong
NGINX Plus	`map` directive on `$http_x_tenant_id`; `auth_jwt` module	`nginx.conf` with upstream blocks per tenant	High throughput; tenant config requires reload or NGINX Plus API; `limit_req` rate limiting but no built-in long-window quota

Operational Gotchas

Forged tenant headers

A caller who discovers the X-Tenant-ID header name can set any value they like if the gateway does not validate it against a signed credential. Always cross-check the header value against the JWT tenant_id claim after token validation — reject requests where they disagree.

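# Kong Enterprise: request-validator pins X-Tenant-ID to the tenant this route serves.
# It does not compare the header with the JWT claim; that cross-check needs custom plugin logic.
- name: request-validator
  config:
    parameter_schema:
      - name: X-Tenant-ID
        in: header
        required: true
        style: simple
        explode: false
        schema: '{"type":"string","enum":["tenant-alpha"]}'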

Metrics cardinality explosion

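Labelling every Prometheus metric with tenant_id multiplies the series count by the number of tenants, which inflates memory use and slows queries once the tenant count reaches the hundreds. Fold low-volume tenants into a single "__other__" bucket with a recording rule so that dashboards and alerts query a bounded set of series. The rule writes the bucket into a separate tenant_bucket label, because rewriting tenant_id in place would produce duplicate label sets. A recording rule does not remove the raw per-tenant series; to cut ingestion itself, drop or rewrite the label at scrape time with metric_relabel_configs.

# Recording rule: fold named low-traffic tenants into one bucket
record: gateway:requests:by_tenant_normalized
expr: |
  sum by (tenant_bucket) (
    label_replace(
      label_replace(
        rate(gateway_http_requests_total[5m]),
        "tenant_bucket", "$1", "tenant_id", "(.+)"
      ),
      "tenant_bucket", "__other__", "tenant_id", "tenant-(gamma|delta|epsilon)"
    )
  )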

Noisy-neighbour quota bypass via connection reuse

If the gateway reuses upstream HTTP/2 connections across tenants sharing the same upstream_shared cluster, a burst from one tenant can exhaust the connection pool before the rate limiter fires (because the rate limiter operates at the request layer, not the socket layer). Mitigate by setting max_requests_per_connection: 1000 under common_http_protocol_options in the cluster's envoy.extensions.upstreams.http.v3.HttpProtocolOptions, and by tuning max_requests and max_pending_requests in circuit_breakers to the expected per-tenant concurrency.

JWT clock skew causing 401 bursts

Token expiry validation is strict. A 5-second clock drift between the identity provider and the gateway causes valid tokens to be rejected at the boundary. Synchronise all nodes with NTP and configure a leeway of 5–10 seconds in JWT plugins — enough to absorb typical skew without materially widening the expiry window.
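The leeway rule can be sketched as a time-claim check that widens both boundaries by a few seconds. The function name and the default of 5 seconds are assumptions for illustration; real JWT plugins expose this setting under their own names, if at all.

```python
from typing import Mapping


def time_claims_valid(claims: Mapping[str, float], now: float,
                      leeway: float = 5.0) -> bool:
    """Check exp and nbf, tolerating `leeway` seconds of clock skew."""
    exp = claims.get("exp")
    nbf = claims.get("nbf")
    # Expired only once we are past exp by at least the leeway.
    if exp is not None and now >= exp + leeway:
        return False
    # Not yet valid only if nbf is more than the leeway in the future.
    if nbf is not None and now < nbf - leeway:
        return False
    return True
```

With a leeway of 5 seconds, a token whose exp is 1000 is still accepted at time 1004 and rejected from 1005 onward, which bounds how far the expiry window is widened.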

Missing tenant context in async workflows

Webhooks and event-driven callbacks often arrive without the original JWT. Embed the tenant_id in the webhook payload or URL path during event emission so the gateway can resolve context without re-validating a stale token. Alternatively, use a short-lived webhook-specific token that carries only the tenant_id claim.

Production Configuration Checklist

Tenant ID is extracted from a cryptographically verified source (JWT claim or mTLS certificate CN), not from an unvalidated caller-supplied header.
Gateway rejects requests where the caller-supplied X-Tenant-ID header does not match the JWT tenant_id claim.
Rate-limiting counters key on the gateway-resolved X-Resolved-Tenant-ID header, not on IP address or route.
Redis rate-limiting backend has sentinel or cluster replication; a single-node Redis failure must not disable quota enforcement (fail-open vs fail-closed decision is documented).
Per-tenant upstream clusters have individual circuit-breaker thresholds — enterprise cluster thresholds are not shared with the standard cluster.
Weighted canary rollout procedure is documented and tested for at least one routing change before production.
tenant_id is propagated across gRPC/HTTP service boundaries as an OTel baggage entry (baggage: tenant_id=<value>) and copied onto spans as an attribute, so that it appears in all distributed traces.
Prometheus cardinality is bounded: low-volume tenants are aggregated into a recording rule bucket; cardinality is validated after each new tenant onboarding.
Authorization header is stripped before reaching upstream services; resolved tenant context is injected as X-Resolved-Tenant-ID.
JWT leeway is configured (5–10 s) and all gateway nodes are NTP-synchronised.
Data-residency tenants are mapped to region-locked upstream clusters; no cross-region routing is possible for those tenant IDs.
Webhook callbacks embed tenant_id in the payload or path, not in a re-used JWT.

FAQ

What is the safest way to extract tenant identity at the gateway?

Verify a signed JWT at the gateway edge and read the tenant claim from the validated payload. Never trust an unverified header like X-Tenant-ID that a caller could forge. Use mTLS client certificates as a second factor for machine-to-machine traffic.

How do I prevent one tenant’s traffic burst from affecting others?

Scope rate-limiting counters to the resolved tenant identifier, not to the global route. Use Redis-backed counters with per-tenant keys so each tenant exhausts only its own quota. Set upstream connection pool limits per cluster in Envoy as a second line of defence, so that back-pressure still applies at the connection level when a request-level limit is set too high or misconfigured.

When should I give a tenant a dedicated upstream pool versus a shared pool?

Use dedicated upstream pools when a tenant has regulatory data-residency requirements, contractual SLA guarantees, or traffic volumes large enough to warrant predictable resource allocation. Shared pools with per-tenant quotas and circuit breakers work well for standard and free-tier tenants.

Path & Header-Based Routing — foundational request-matching mechanics that underpin tenant header and subdomain dispatch
Tenant Isolation: Kong Workspaces vs Envoy Namespaces — comparing the two config-isolation models for per-tenant blast-radius containment
Routing by API Key vs JWT Claims — credential model trade-offs that directly affect how tenant identity reaches the gateway
Configuring CORS Policies for Multi-Tenant APIs — CORS origin allowlists that must be scoped per tenant in shared-domain deployments
Security Boundaries & Zero-Trust — mTLS, zero-trust segmentation, and the authentication layer that multi-tenant routing depends on
Kong vs Tyk vs Envoy for Microservices — gateway capability matrix to inform which tenant routing model fits your stack
