Going with the Flow: Temporal

5 minutes · November 19, 2024 · Temporal, Architecture, Engineering

When building software, a key challenge is mapping a complex, multi-step business process onto a set of actions that software can handle. We need to manage state, retries, and error handling across various functions and handlers, updating state along the way. Each action might be straightforward, but understanding how it fits into the bigger picture can be difficult, especially when the logic is spread across many files. For us, Temporal.io has solved this problem by allowing us to build long-running workflows that encapsulate entire business processes. In this article, we’ll explore how it works and how it might benefit your project.

What is Temporal?

Temporal.io is an open-source platform for running durable workflows. You write code that reads as synchronous but runs asynchronously under the hood: it can run, pause, and resume while waiting for long-running tasks to complete or asynchronous events to occur. Once mastered, Temporal becomes a very powerful tool for orchestrating complex processes.

We’ve also found that it addresses several common issues, such as boilerplate code for retries, losing the state of a running process, and dealing with callbacks for long-running operations.

Getting Started

Temporal can be self-hosted or used as a SaaS. Before diving in, let’s define some key concepts:

  • Workflow: Defines the overall flow of the application.
  • Activity: A function that executes a single, well-defined action (e.g. calling a service, transcoding a media file, sending an email).
  • Signal: An asynchronous write request that causes changes in the running Workflow, without awaiting a response (or error).
  • Query: A read request to get the current state of the Workflow.

Workflows are written like regular functions and call the activities you have defined to execute asynchronous actions. Temporal automatically retries activities when they fail, and you can configure the retry policy, including which errors should not be retried. You can also write conditions and wait for external signals before resuming execution (e.g. a customer clicking a link or a payment being received).
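
To make the retry behaviour concrete, here is a minimal plain-Python sketch of what a retry policy with exponential backoff does. It is not the Temporal SDK: RetryPolicy, run_activity and flaky_send_email are hypothetical names for this illustration, and the sleep function is injected so the example runs instantly.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Type


@dataclass
class RetryPolicy:
    # Hypothetical names for this sketch, modelled on common retry-policy knobs.
    initial_interval: float = 1.0      # seconds to wait before the first retry
    backoff_coefficient: float = 2.0   # multiplier applied after each failure
    maximum_interval: float = 30.0     # cap on the wait between attempts
    maximum_attempts: int = 5          # total attempts, including the first
    non_retryable: Tuple[Type[BaseException], ...] = ()


def run_activity(activity: Callable[[], object], policy: RetryPolicy,
                 sleep: Callable[[float], None]) -> object:
    """Call `activity` until it succeeds or the policy gives up."""
    delay = policy.initial_interval
    for attempt in range(1, policy.maximum_attempts + 1):
        try:
            return activity()
        except policy.non_retryable:
            raise  # errors listed as non-retryable fail immediately
        except Exception:
            if attempt == policy.maximum_attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)
            delay = min(delay * policy.backoff_coefficient,
                        policy.maximum_interval)


# Usage: an activity that fails twice, then succeeds.
calls = {"n": 0}


def flaky_send_email() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("mail server unavailable")
    return "sent"


waits = []  # records the requested delays instead of actually sleeping
result = run_activity(flaky_send_email, RetryPolicy(), sleep=waits.append)
```

With a real orchestrator this loop lives in the platform rather than in your code, which is exactly the boilerplate it removes.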

While the workflow is running (or even after completion, under certain conditions), you can query its state. Temporal workflows are durable and can run for very long periods (subject to some limits), which makes them a great fit for long-term processes, like deleting inactive customers after two years.
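
The interplay between a waiting workflow, a signal, and a query can be illustrated with a toy in-memory model built on asyncio. This is only a sketch of the concept: the class and method names are made up for the example, and unlike Temporal nothing here is persisted or survives a restart.

```python
import asyncio


class EmailVerificationWorkflow:
    """Toy stand-in for a workflow; all names here are hypothetical."""

    def __init__(self) -> None:
        self.status = "waiting_for_click"
        self._clicked = asyncio.Event()

    async def run(self) -> str:
        # Reads top to bottom, but suspends here until the signal arrives.
        await self._clicked.wait()
        self.status = "verified"
        return self.status

    def signal_link_clicked(self) -> None:
        # Signal: a fire-and-forget write; the caller gets no result back.
        self._clicked.set()

    def query_status(self) -> str:
        # Query: a read-only view of the current state.
        return self.status


async def main() -> list:
    wf = EmailVerificationWorkflow()
    task = asyncio.create_task(wf.run())
    await asyncio.sleep(0)            # let the workflow start and block
    seen = [wf.query_status()]        # state while it is still waiting
    wf.signal_link_clicked()          # e.g. the customer clicked the link
    seen.append(await task)           # workflow resumes and completes
    seen.append(wf.query_status())    # state after completion
    return seen


observed = asyncio.run(main())
```

The workflow body stays a straight-line function even though it spends most of its life waiting on an external event.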

Additional features

You can also schedule a workflow to run later, run it on a recurring basis using the cron feature, and compose workflows by invoking one from within another.

For example, you might have a workflow that updates a customer's email address across all systems. Your customer support team should be able to change it at will because they already have a peer review system. Customers, however, should be limited in their number of attempts and must first verify the new address. So the API for your support team could invoke the update workflow directly, while the customer-facing API invokes a second workflow that enforces those checks and, once they pass, automatically invokes the update workflow.
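
The shape of that composition can be sketched with two plain Python functions standing in for the two workflows. Everything here is an assumption made for illustration: the function names, the Systems container, the limit of three attempts, and the verified flag, which stands in for waiting on a real verification signal.

```python
from dataclasses import dataclass, field
from typing import Dict

MAX_ATTEMPTS = 3  # assumed limit for this sketch


@dataclass
class Systems:
    """In-memory stand-in for 'all systems' that store the email address."""
    emails: Dict[str, str] = field(default_factory=dict)
    attempts: Dict[str, int] = field(default_factory=dict)


def update_email_workflow(systems: Systems, customer_id: str,
                          new_email: str) -> str:
    """Inner workflow: apply the change everywhere."""
    systems.emails[customer_id] = new_email
    return "updated"


def customer_change_email_workflow(systems: Systems, customer_id: str,
                                   new_email: str, verified: bool) -> str:
    """Outer workflow: limit attempts and verify, then delegate."""
    used = systems.attempts.get(customer_id, 0)
    if used >= MAX_ATTEMPTS:
        return "too_many_attempts"
    systems.attempts[customer_id] = used + 1
    if not verified:
        return "verification_required"
    # Checks passed: invoke the same workflow the support API calls directly.
    return update_email_workflow(systems, customer_id, new_email)


systems = Systems()
# Support path: calls the inner workflow directly.
support_result = update_email_workflow(systems, "c1", "fixed@example.com")
# Customer path: goes through the guarded outer workflow.
customer_result = customer_change_email_workflow(
    systems, "c2", "new@example.com", verified=True)
```

Both entry points end in the same update logic, so the rules for who may trigger it live in one place each.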

How we've used Temporal

We've used Temporal to orchestrate complex workflows spanning multiple microservices. Temporal managed, stored, and executed the workflow, which let us start the business process, follow its progress, and report status and errors. We applied it to various use cases, such as customer lifecycles and financial activities.

One of our regular patterns involved communicating changes across multiple unrelated microservices. The first thing we did was standardise the contract through which all those microservices expected to be notified of a change. Then, leveraging the power of activities, we put the list of endpoints to call into a configuration file that was loaded dynamically every time we needed to communicate a change. Any endpoint added to the file was automatically retried on failure and, if still unsuccessful after a certain point, reported back to the workflow so that we were alerted. This pattern allows new teams to subscribe to events without having to know how the workflow works or worry about missing an event while their service is briefly down.
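
A rough sketch of that fan-out pattern in plain Python follows. The endpoint URLs, the event shape, and the notify_subscribers and fake_send functions are all hypothetical; in a Temporal setup each delivery would be an activity and the retry loop would be handled by the platform.

```python
from typing import Callable, Dict, List

# Hypothetical subscriber list; in the pattern above it comes from a config file.
SUBSCRIBERS: List[str] = [
    "https://billing.internal.example/customer-changed",
    "https://crm.internal.example/customer-changed",
]


def notify_subscribers(event: Dict[str, str], endpoints: List[str],
                       send: Callable[[str, Dict[str, str]], None],
                       max_attempts: int = 3) -> List[str]:
    """Deliver `event` to every endpoint; return those that never succeeded."""
    failed = []
    for url in endpoints:
        for attempt in range(1, max_attempts + 1):
            try:
                send(url, event)  # same contract for every subscriber
                break
            except Exception:
                if attempt == max_attempts:
                    failed.append(url)  # reported back so we can alert
    return failed


# Usage with a fake transport: billing fails once, crm is down throughout.
attempts: Dict[str, int] = {}


def fake_send(url: str, event: Dict[str, str]) -> None:
    attempts[url] = attempts.get(url, 0) + 1
    if "crm" in url or attempts[url] == 1:
        raise ConnectionError(url)


failed = notify_subscribers(
    {"type": "email_changed", "customer_id": "c1"}, SUBSCRIBERS, fake_send)
```

Adding a subscriber is then a one-line configuration change rather than a code change.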

In one use case (my favourite), we combined Temporal.io with a state machine inside the workflow (people with prior knowledge of Temporal.io will recognise the inception here).

In this case, our Temporal workflow was started with a configuration that allowed us to create a state machine. Each action that was signalled moved the state machine forward, and clients could find out the next action to take by querying the workflow.

By using standardised contracts for our state machine, signals and queries, we built a powerful engine that allowed us to reconfigure our business process on the fly without needing to redeploy code, simply by updating configuration files.
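
As an illustration of the idea, here is a minimal configuration-driven state machine with a signal-like method to advance it and a query-like method to read the allowed next actions. The configuration format, state names, and class are assumptions for this sketch, not the engine described above.

```python
from typing import Dict, List

# Hypothetical configuration: state -> {action -> next state}.
ONBOARDING_CONFIG: Dict[str, Dict[str, str]] = {
    "created": {"submit_documents": "in_review"},
    "in_review": {"approve": "active", "reject": "created"},
    "active": {},
}


class ConfigurableWorkflow:
    """Toy workflow whose behaviour comes entirely from configuration."""

    def __init__(self, config: Dict[str, Dict[str, str]], initial: str) -> None:
        self._config = config
        self._state = initial

    def signal(self, action: str) -> None:
        """Advance the machine; ignore actions not allowed in this state."""
        next_state = self._config[self._state].get(action)
        if next_state is not None:
            self._state = next_state

    def query_state(self) -> str:
        return self._state

    def query_next_actions(self) -> List[str]:
        """Tell callers which actions are valid right now."""
        return sorted(self._config[self._state])


wf = ConfigurableWorkflow(ONBOARDING_CONFIG, "created")
first = wf.query_next_actions()   # only submit_documents is allowed
wf.signal("approve")              # not allowed yet, so it is ignored
wf.signal("submit_documents")
second = wf.query_next_actions()  # approve or reject
wf.signal("approve")
final_state = wf.query_state()
```

Changing the business process then means editing the configuration dictionary, while the workflow code stays the same.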

Learnings and Gotchas

  • Customer-facing Flows: Temporal is not ideal for synchronous or customer-facing flows due to the latency introduced by the signal/query dance (but there is hope with the upcoming update feature, which we’re very excited to try!).
  • Retries: Automatic activity retries can cause duplication in downstream services. Ensure services are idempotent or use idempotency keys to avoid processing requests twice.
  • Workflow Updates: Take special care when updating long-running workflows, as changes affect both existing and new executions. There’s a good amount of documentation on this, but it is not something to do on a whim when you are starting out.
  • API Abstraction: It can be useful to have an internal API that hides Temporal logic from your other services. This keeps the Temporal-specific code in one location and avoids having to set up the SDK in every service. It also means that other teams don’t necessarily need to know or understand Temporal.io themselves.
  • Minimising Payloads: Minimise the payload returned by activities, because everything is stored within the workflow. We had issues where we returned whole files from one activity to pass to another, which bloated the workflow’s context and eventually overloaded our Temporal.io server (since we created a lot of those workflows). We solved it by passing around an S3 link instead.
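
Two of these gotchas, idempotency keys and small payloads, can be sketched in a few lines of plain Python. PaymentService, BlobStore, and the key formats are hypothetical; the blob store is an in-memory stand-in for something like S3.

```python
from typing import Dict


class PaymentService:
    """Downstream service that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self._seen: Dict[str, str] = {}
        self.charges = 0

    def charge(self, idempotency_key: str, amount_cents: int) -> str:
        if idempotency_key in self._seen:
            # A retried request: return the stored result, charge nothing.
            return self._seen[idempotency_key]
        self.charges += 1
        receipt = f"receipt-{self.charges}"
        self._seen[idempotency_key] = receipt
        return receipt


class BlobStore:
    """In-memory stand-in for object storage."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> str:
        self._blobs[key] = data
        return key  # only this short reference travels through the workflow

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def render_report(store: BlobStore, workflow_id: str) -> str:
    """Activity 1: produce a large file, return a reference to it."""
    return store.put(f"reports/{workflow_id}.pdf", b"x" * 1_000_000)


def email_report(store: BlobStore, ref: str) -> int:
    """Activity 2: fetch the file by reference; return its size in bytes."""
    return len(store.get(ref))


payments = PaymentService()
key = "wf-42:charge"                    # stable across retries of one step
first = payments.charge(key, 1999)
second = payments.charge(key, 1999)     # simulated automatic retry

store = BlobStore()
ref = render_report(store, "wf-42")
size = email_report(store, ref)
```

The workflow only ever carries the key and the reference string, so its stored state stays small no matter how large the file is.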

Conclusion

We’ve found that Temporal offers significant value by simplifying the management of complex business processes, handling errors, and managing retries seamlessly. We've explored various use cases where Temporal excels, and highlighted scenarios where it may not be the best fit (it suits asynchronous processes better than synchronous ones). We've also discussed potential challenges to be mindful of when starting with Temporal to help you avoid common pitfalls.

We hope the insights and tips shared here will help you get the most out of this tool. Whilst Temporal is not a silver bullet, it's a very powerful addition to your toolkit. On our side, we’re eagerly awaiting the upcoming "Updates" feature to address some current limitations, and we’re very excited to try it out.

If you'd like more insights or assistance in implementing it, feel free to reach out!
