Production Deploy Checklist for TypeScript SDK
The following are recommended steps to take before deploying your Temporal application to production.
Production Temporal Cluster
Either use Temporal Cloud (join the waitlist) or deploy a self-hosted Temporal Cluster.
Linting and types
If you started your project with @temporalio/create, you already have our recommended TypeScript and ESLint configurations.
If you incrementally added Temporal to an existing app, we recommend setting up linting and types: they help catch bugs well before you ship to production and improve your development feedback loop. Take a look at our recommended .eslintrc file and tweak it to taste.
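If you are setting this up from scratch, the following is a hypothetical starting point for an .eslintrc.js using typescript-eslint; it is not the recommended file mentioned above, so prefer that one and adjust from there:

// .eslintrc.js: a hypothetical starting point, not Temporal's recommended config.
module.exports = {
  root: true,
  parser: '@typescript-eslint/parser',
  parserOptions: { project: './tsconfig.json' },
  plugins: ['@typescript-eslint'],
  extends: ['eslint:recommended', 'plugin:@typescript-eslint/recommended'],
  ignorePatterns: ['lib', 'node_modules'],
};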
Configure Connections and Namespaces
Temporal Clients and Workers connect with Temporal Clusters through gRPC.
While you were developing locally, all these connections were set to their default gRPC ports on localhost.
In production, you will need to configure address, Namespace, and encryption settings:
export interface Env {
  address: string;
  namespace: string;
  clientCertPath: string;
  clientKeyPath: string;
  taskQueue: string;
  serverNameOverride?: string;
  serverRootCACertificatePath?: string;
}

export function getEnv(): Env {
  return {
    // NOT web.foo.bar.tmprl.cloud
    address: 'foo.bar.tmprl.cloud',
    namespace: 'foo.bar',
    // in project root
    clientCertPath: 'foobar.pem',
    clientKeyPath: 'foobar.key',
    // just to ensure the task queue is the same on the Client and Worker; totally optional
    taskQueue: process.env.TEMPORAL_TASK_QUEUE || 'hello-world-mtls',
    // not usually needed:
    // serverNameOverride: process.env.TEMPORAL_SERVER_NAME_OVERRIDE,
    // serverRootCACertificatePath: process.env.TEMPORAL_SERVER_ROOT_CA_CERT_PATH,
  };
}
For more information, see Connecting to Temporal Cloud (with mTLS).
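As a minimal sketch of how these values might be used (assuming the @temporalio/client package; the './env' import path, the createClient name, and the lack of error handling are illustrative), a Client can be created over an mTLS connection like this:

import * as fs from 'fs';
import { Client, Connection } from '@temporalio/client';
import { getEnv } from './env'; // the helper shown above; the path is an assumption

export async function createClient(): Promise<Client> {
  const { address, namespace, clientCertPath, clientKeyPath } = getEnv();
  const connection = await Connection.connect({
    address,
    tls: {
      // mTLS: present the client certificate and key issued for your Namespace
      clientCertPair: {
        crt: fs.readFileSync(clientCertPath),
        key: fs.readFileSync(clientKeyPath),
      },
    },
  });
  return new Client({ connection, namespace });
}

Workers connect in the same way with NativeConnection.connect from @temporalio/worker, which accepts the same address and tls options.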
Pre-build code
This information has been moved to the Register Types section of the application developer guide.
Logging
Send logs and errors to a logging service, so that when things go wrong, you can see what happened.
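For example, here is a minimal sketch (assuming the Runtime and DefaultLogger exports of @temporalio/worker) that intercepts Worker and SDK logs so they can be forwarded to a logging service of your choice:

import { DefaultLogger, Runtime } from '@temporalio/worker';

// Install once per process, before creating any Workers.
Runtime.install({
  logger: new DefaultLogger('INFO', (entry) => {
    // Replace this console.log with a call to your logging provider's client;
    // the JSON line below is just a stand-in.
    console.log(JSON.stringify({ level: entry.level, message: entry.message, ...entry.meta }));
  }),
});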
For more information about sending logs, see Logging.
Metrics and tracing
Options
Workers can emit metrics and traces. There are a few telemetry options that can be provided to Runtime.install. The common options are:
- metrics: { otel: { url } }: the URL of a gRPC OpenTelemetry collector.
- metrics: { prometheus: { bindAddress } }: an address on the Worker host that will expose metrics for Prometheus to scrape.
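For example, here is a minimal sketch enabling the Prometheus option above (the bind address and the commented-out collector URL are illustrative, and the exact telemetryOptions shape can vary between SDK versions):

import { Runtime } from '@temporalio/worker';

// Install once per process, before creating any Workers.
Runtime.install({
  telemetryOptions: {
    metrics: {
      // Expose a scrape endpoint for Prometheus on the Worker host.
      prometheus: { bindAddress: '0.0.0.0:9464' },
      // ...or push to a gRPC OpenTelemetry collector instead:
      // otel: { url: 'grpc://localhost:4317' },
    },
  },
});

Runtime.install can only be called once per process, so in practice you would pass the logger shown in the Logging section and these telemetryOptions in the same call.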
To set up tracing of Workflows and Activities, use our opentelemetry-interceptors package.
Monitoring
Here is the full list of SDK metrics. Some of them are used in the Worker Tuning Guide to determine how to change your deployment configuration. The guide also assumes you track the host-level metrics that are important for measuring your application's load. For many applications this is just CPU, but some applications run into other bottlenecks, such as Activities that use a lot of memory or open a lot of sockets. How you track host-level metrics depends on where you deploy your Workers.
Performance tuning
If you are experiencing system performance issues, first check that the bottleneck is not your Temporal Cluster before turning to the performance of your Workers.
We endeavor to give you good defaults, so you don't have to worry about them, but there are a few key settings you may want to explore if you are pushing system limits:
- Worker Options, for example (see the first sketch after this list):
  - maxCachedWorkflows to limit the Workflow cache size and trade memory for CPU (the biggest lever for Worker performance)
  - maxConcurrentActivityTaskExecutions and other options for tuning concurrency
  - stickyQueueScheduleToStartTimeout to determine how quickly Temporal stops trying to send work to Workers that are no longer present, via Sticky Queues
  - See the Worker Tuning Guide
- Activity Timeouts and Retries (see the second sketch after this list): as you gain an understanding of Temporal and the services you rely on, you will likely want to adjust the Timeouts and Retry Policy to reflect your desired behavior.
- Note that there are separate Timeouts and a Retry Policy at the Workflow level, but we do not encourage their use unless you know what you are doing.
- To be completed as we get more user feedback.
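For example, here is a minimal sketch of passing these Worker Options to Worker.create (the values, the task queue name, and the './activities' and './workflows' module paths are illustrative assumptions, not recommendations):

import { Worker } from '@temporalio/worker';
import * as activities from './activities'; // assumed module, as in the samples

export async function runWorker(): Promise<void> {
  const worker = await Worker.create({
    workflowsPath: require.resolve('./workflows'), // assumed module, as in the samples
    activities,
    taskQueue: process.env.TEMPORAL_TASK_QUEUE || 'hello-world-mtls',
    // Biggest lever for Worker performance: trades memory for CPU.
    maxCachedWorkflows: 1000,
    // Tune concurrency to the resources available to this Worker.
    maxConcurrentActivityTaskExecutions: 100,
    // How quickly the Server falls back from a missing Worker's Sticky Queue
    // to the regular Task Queue.
    stickyQueueScheduleToStartTimeout: '10s',
  });
  await worker.run();
}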
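And here is a minimal sketch of adjusting Activity Timeouts and the Retry Policy from Workflow code with proxyActivities (the processOrder Activity, the error type, and all values are illustrative assumptions):

import { proxyActivities } from '@temporalio/workflow';
import type * as activities from './activities'; // assumed module, as in the samples

const { processOrder } = proxyActivities<typeof activities>({
  // Give each Activity attempt a timeout it can realistically meet.
  startToCloseTimeout: '1 minute',
  retry: {
    initialInterval: '1s',
    backoffCoefficient: 2,
    maximumInterval: '30s',
    maximumAttempts: 10,
    // Don't retry errors that will never succeed.
    nonRetryableErrorTypes: ['InvalidOrderError'],
  },
});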
Running in Docker
Workers based on the TypeScript SDK can be deployed and run as Docker containers.
At this time, we recommend Node.js 16 (note that there are known issues with Node.js 18). Both amd64 and arm64 platforms are supported. A glibc-based image is required; musl-based images are not supported (see below).
The easiest way to deploy a TypeScript SDK Worker on Docker is to start with the node:16-bullseye image. For example:
FROM node:16-bullseye
COPY . /app
WORKDIR /app
RUN npm install --only=production \
&& npm run build
CMD ["build/worker.js"]
For smaller images and/or more secure deployments, it is also possible to use -slim Docker image variants (like node:16-bullseye-slim) or distroless/nodejs Docker images (like gcr.io/distroless/nodejs:16), with the caveats below.
Using node:slim images
node:slim images do not contain some of the common packages found in the regular images, which results in significantly smaller images.
However, the TypeScript SDK requires root TLS certificates (the ca-certificates package), which are not included in slim images. The ca-certificates package is required even when connecting to a local Temporal Server, or when using a server connection config that doesn't explicitly use TLS.
For this reason, the ca-certificates package must be installed when building the Docker image. For example:
FROM node:16-bullseye-slim
RUN apt-get update \
&& apt-get install -y ca-certificates \
&& rm -rf /var/lib/apt/lists/*
# ... same as with regular image
Failure to install this dependency results in a [TransportError: transport error] runtime error, because the certificates cannot be verified.
Using distroless/nodejs images
distroless/nodejs images include only the files that are strictly required to execute node. This results in even smaller images (approximately half the size of node:slim images) and significantly reduces the attack surface that could be exploited in the resulting Docker images.
It is generally possible and safe to execute TypeScript SDK Workers using distroless/nodejs images (unless your code itself requires dependencies that are not included in distroless/nodejs).
Note, however, that some tools required for the build process (notably the npm command) are not included in the distroless/nodejs image. This can result in various errors during the Docker build.
The recommended solution is to use a multi-stage Dockerfile. For example:
# -- BUILD STEP --
FROM node:16-bullseye AS builder
COPY . /app
WORKDIR /app
RUN npm install --only=production \
&& npm run build
# -- RESULTING IMAGE --
FROM gcr.io/distroless/nodejs:16
COPY --from=builder /app /app
WORKDIR /app
CMD ["build/worker.js"]
Properly configure Node's memory in Docker
By default, node configures its maximum old-generation memory to 25% of the physical memory of the machine on which it is executing, with a maximum of 4 GB. This is very likely inappropriate when running node in a Docker container, and can result in either underuse of available memory (node only uses a fraction of the memory allocated to the container) or overuse (node tries to use more memory than is allocated to the container, which eventually leads to the process being killed by the operating system).
It is therefore recommended that you always explicitly set the --max-old-space-size node argument to approximately 80% of the maximum amount of memory (in megabytes) that you want to allocate to the node process. For example, for a container limited to 2 GB of memory, --max-old-space-size=1638 allocates roughly 80% of it. You might need some experimentation and adjustment to find the most appropriate value for your specific application.
In practice, it is generally easier to provide this argument through the NODE_OPTIONS environment variable.
Do not use Alpine
Alpine replaces glibc with musl, which is incompatible with the Rust core of the TypeScript SDK. If you see errors like the following, it's probably because you are using Alpine.
Error: Error loading shared library ld-linux-x86-64.so.2: No such file or directory (needed by /opt/app/node_modules/@temporalio/core-bridge/index.node)
Or like this:
Error: Error relocating /opt/app/node_modules/@temporalio/core-bridge/index.node: __register_atfork: symbol not found