Asynchronous step functions

Waiting without the hassle thanks to Task Tokens

Fabian Feldberg
5 days ago | 4 min read

I've written before about Step Functions and why they are great. Being able to quickly create a clear, easily observable workflow is simply a very nice thing.
As you might have gathered from my previous post, or from personal experience, handling asynchronous tasks used to be something of a pain where Step Functions are concerned. Luckily for us, AWS added some new functionality to help with just those issues. Having recently refactored some of our Step Functions to use it, I want to talk about Task Tokens.

The old days

If you wrote a Step Function before June 2019, the longest you could painlessly wait for a task to complete was 15 minutes (the maximum runtime of a Lambda function). Anything that might take longer involved adding an extra step to repeatedly poll whether your previous task had completed, which added complexity and expense through additional Lambda runtime, Step Function transitions and development time.

Enter Task Tokens

All that can be a thing of the past. Whenever you have a task in your Step Function that requires you to wait for something to complete (e.g. waiting for a user to click a confirmation link), you can append .waitForTaskToken to the resource field in your task. This will generate a unique token for this task that can be used to make an API call to register success or failure for the task.
You can then pass that token into your task and use it as needed. Here is a quick example of what a step using task tokens might look like.

"RequestApproval": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
      "Parameters": {
        "FunctionName": "approvals-production-approvalRequest-send",
        "Payload": {
          "body": {
            "taskToken.$": "$$.Task.Token",
            "approvalRequest.$": "$.body"
          }
        }
      },
      "HeartbeatSeconds": 300000,
      "Next": "SaveFeedback"
    },

Making it work

Knowing Task Tokens exist is great, but let's look at how to actually use them. Whenever you define your resource, simply append .waitForTaskToken to declare that you'll be using one. The token itself will be placed in the execution's context object, where you can access it at $$.Task.Token.
It is important to note that the task's own result does not move the workflow forward in this case: the state keeps waiting even after the Lambda has returned. You will need to ensure that either SendTaskSuccess or SendTaskFailure is called with the Task Token. The most common ways of doing this are invoking a dedicated Lambda or configuring API Gateway endpoints, which is what we ended up doing. The process for using API Gateway is straightforward and nicely documented by AWS themselves.
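
To make the callback concrete, here is a minimal sketch of what a dedicated Lambda (or any other Node.js code) could do once the user has decided. It is not the setup we run ourselves, since we let API Gateway call Step Functions directly. The action names, the output payload and the ApprovalDenied error name are assumptions made up for this example, and sfn is assumed to be a client object exposing promise-returning sendTaskSuccess and sendTaskFailure methods, such as the aggregated SFN client from @aws-sdk/client-sfn.

```javascript
// Map a user decision to the Step Functions callback it should trigger.
// The action names and the output/error payloads are illustrative assumptions.
const buildCallback = (action, taskToken) => {
  if (action === 'approve') {
    return {
      method: 'sendTaskSuccess',
      // output must be a JSON string; it becomes the result of the waiting task
      params: { taskToken, output: JSON.stringify({ approved: true }) }
    }
  }
  if (action === 'deny') {
    return {
      method: 'sendTaskFailure',
      params: { taskToken, error: 'ApprovalDenied', cause: 'The user denied the request' }
    }
  }
  throw new Error(`Unknown action: ${action}`)
}

// `sfn` is assumed to expose promise-returning sendTaskSuccess and
// sendTaskFailure methods, e.g. `new SFN({})` from @aws-sdk/client-sfn.
const resolveTask = async (sfn, action, taskToken) => {
  const { method, params } = buildCallback(action, taskToken)
  return sfn[method](params)
}
```

Treating a denial as a task failure is a design choice: it lets a Catch on the state handle it. Reporting it through sendTaskSuccess with a different output and branching in a Choice state would work just as well.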

If we take the example of our RequestApproval step above, we could have that Lambda generate an email for our user with approve and deny buttons whose links include the taskToken. The Step Function will wait until the user visits either link, at which point execution continues, or until it times out at a configured time or at the maximum execution time (1 year).
A note here: in a production environment you will want to use a message queue instead of directly invoking the Lambda, or at the least implement a DLQ, as a failed Lambda execution will not necessarily fail the Step Function when using waitForTaskToken.
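
As a rough sketch of the queue variant, the step could hand the token to SQS instead of invoking the Lambda directly. The helper name and the queue URL are made up for this example. The Resource ARN is the SQS service integration with the same .waitForTaskToken suffix, and the DLQ itself would be configured on the queue rather than in the state machine.

```javascript
// Hypothetical helper: builds a Task state that enqueues the approval request
// (including the task token) on an SQS queue and then waits for the callback.
const buildQueuedApprovalState = queueUrl => ({
  Type: 'Task',
  Resource: 'arn:aws:states:::sqs:sendMessage.waitForTaskToken',
  Parameters: {
    QueueUrl: queueUrl,
    MessageBody: {
      // Same fields as the direct Lambda invocation above
      'taskToken.$': '$$.Task.Token',
      'approvalRequest.$': '$.body'
    }
  },
  HeartbeatSeconds: 300000,
  Next: 'SaveFeedback'
})

// Example with a placeholder queue URL
const requestApproval = buildQueuedApprovalState(
  'https://sqs.eu-west-1.amazonaws.com/123456789012/approval-requests'
)
```

A Lambda consuming the queue would then read taskToken and approvalRequest from the message body, and messages it repeatedly fails to process end up in the DLQ instead of silently disappearing.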

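// sendEmail is assumed to be defined elsewhere in this service.
const sendApprovalNotification = async event => {
  // The state above passes body as an object; only parse it if it arrives as a JSON string.
  const requestBody = typeof event.body === 'string' ? JSON.parse(event.body) : event.body
  const { taskToken, approvalRequest } = requestBody

  // Task tokens may contain characters that are not URL-safe, so encode them for the path.
  const token = encodeURIComponent(taskToken)
  const approveUrl = `${process.env.STEP_FUNCTION_TRIGGER_GATEWAY_URL}/approve/${token}`
  const denyUrl = `${process.env.STEP_FUNCTION_TRIGGER_GATEWAY_URL}/deny/${token}`

  return sendEmail(approveUrl, denyUrl, approvalRequest)
}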

As you can see, the taskToken is passed in like any other variable and is simple enough to use.

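You will probably want to configure a timeout for your task, as a task waiting on a Task Token will otherwise keep waiting until the execution reaches its one-year limit. We did so with the HeartbeatSeconds key, as in the example above: if neither a heartbeat nor a result arrives within that many seconds, the task fails with a timeout error. The standard TimeoutSeconds key caps the total time the task may take, regardless of heartbeats.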
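
Once the configured interval passes, the task fails with a timeout error, and a Catch on the state can route that to a cleanup step instead of failing the whole execution. Here is a small sketch of adding both to a state definition. The helper and the ApprovalExpired state name are hypothetical. The two error names come from the Amazon States Language, and catching both means the handler does not depend on which one is reported.

```javascript
// Returns a copy of a Task state with a heartbeat interval and a Catch that
// routes timeouts to a fallback state. `fallbackState` is a hypothetical state name.
const withExpiryHandling = (state, heartbeatSeconds, fallbackState) => {
  if (!Number.isInteger(heartbeatSeconds) || heartbeatSeconds <= 0) {
    throw new Error('heartbeatSeconds must be a positive integer')
  }
  return {
    ...state,
    HeartbeatSeconds: heartbeatSeconds,
    Catch: [
      // Keep any catchers the state already had
      ...(state.Catch || []),
      {
        ErrorEquals: ['States.HeartbeatTimeout', 'States.Timeout'],
        // Store the error details next to the original input instead of replacing it
        ResultPath: '$.expiry',
        Next: fallbackState
      }
    ]
  }
}

// Example: expire the approval request after one day
const baseState = {
  Type: 'Task',
  Resource: 'arn:aws:states:::lambda:invoke.waitForTaskToken',
  Next: 'SaveFeedback'
}
const expiringState = withExpiryHandling(baseState, 86400, 'ApprovalExpired')
```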

Making the switch

Looking at these benefits, it seems obvious that you should use Task Tokens for any new Step Functions that include asynchronous tasks. I would suggest that you also refactor any existing Step Functions to use Task Tokens if you have the time.
The new pattern lets you reduce the number of steps, which leads to both easier-to-read workflows and savings on state transitions. On top of that, the old pattern of having Lambdas poll against activities uses a lot of Lambda runtime that isn't necessary anymore.
