I’ve seen on a few occasions that there is some confusion about the differences between short lived and long lived processes, and about when it’s appropriate to use one process type over the other. In this post I’ll talk about how the two work, some of the differences between them, and when I choose one or the other. In the end, you, as the developer, will know your application and be in the best position to decide which is right for you. Hopefully, this post will arm you with enough knowledge to make an informed decision.
So what is the difference between the two? A generic description you’ve probably already heard (or inferred from the names) is that short lived processes are for doing things that will complete quickly (or short running processes) and long lived processes are for doing things that will take a long time to complete (or long running processes). However, there’s a lot more to it than just that.
Note: This post assumes the user has a license for Process Management. Without this, use of long lived processes is restricted by the EULA.
The first thing to talk about is how these two process types are usually invoked.
Short Lived: By far the most common way to invoke short lived processes is synchronously. So in this case the caller invokes the process and waits until a response or output is returned from the process. However, it is also possible to asynchronously invoke a short lived process. You may have seen this termed as fire and forget in developer tooling like Workbench. In this case the caller invokes the process and then moves on without waiting for a response.
Long Lived: Long lived processes have only one method of invocation: asynchronous. This is one differentiating factor: if you need a synchronous invocation of a process, then you must use a short lived process.
The discussion of how things look at thread level actually depends on the invocation mode being used (synchronous vs asynchronous) and not strictly on short vs long lived processes. However, since the vast majority of short lived processes are synchronously invoked and all long lived processes are asynchronously invoked you can almost draw a 1-1 relationship between thread behavior and process type.
In synchronously invoked processes, the execution of the process occurs completely within the same thread in which the invocation occurred. This leads to a couple of interesting notes:
- There is more than one thread pool involved. Since the thread where process execution occurs depends on the thread in which the invocation occurred, there are multiple pools the thread can come from. For example, if you invoke the process via a SOAP request, process execution uses a thread from the application server’s web thread pool. However, if you were to invoke the process using Workbench, the process execution would occur in a thread from the application server’s workmanager thread pool.
- For the entire lifecycle of the invocation, the thread being used is never released back into the pool. What does this mean? It means that everything done prior to the invocation, the invocation itself, the process execution, and everything that occurs after the invocation returns all happens in the same thread, and that thread is not available for anything else. If you follow the general guideline that short lived processes should only be used for processing that doesn’t take much time, this isn’t likely to cause you any problems. However, there is no technical limitation stopping you from using a short lived process for something that actually takes a long time to process. For example, you might have a short lived process that takes 4 hours to finish. In that case the thread you are using is unavailable for other work for that whole time. As you might imagine, if you throw enough load at the system doing this, it would not be very difficult to exhaust the entire thread pool being used, and effectively “bring down” a portion of the application server.
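To make the thread exhaustion scenario concrete, here is a minimal Python sketch. It is not LiveCycle code: the pool size, the function name, and the use of Python’s `ThreadPoolExecutor` as a stand-in for an application server thread pool are all assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 2                    # hypothetical pool size, tiny on purpose
release = threading.Event()      # set when the "long" work is allowed to finish

def slow_synchronous_process(started):
    """Stand-in for a short lived process that takes a long time."""
    started.set()                # this invocation now holds a pool thread
    release.wait()               # the thread is blocked for the whole execution
    return "done"

pool = ThreadPoolExecutor(max_workers=POOL_SIZE)
started_flags = [threading.Event() for _ in range(POOL_SIZE + 1)]
futures = [pool.submit(slow_synchronous_process, flag) for flag in started_flags]

# Wait until the first two invocations occupy both threads in the pool.
for flag in started_flags[:POOL_SIZE]:
    flag.wait(timeout=5)

# The third request has no thread to run on, so it sits in the queue.
third_is_waiting = not started_flags[POOL_SIZE].is_set()
print(third_is_waiting)  # True

# Only once the long-running work finishes can the queued request proceed.
release.set()
results = [future.result(timeout=5) for future in futures]
pool.shutdown()
```

With two threads and two slow invocations, every further request simply waits. Scale the numbers up and you get the “brought down” portion of the server described above.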
For asynchronous invocations the process execution does not occur within the thread where the invocation occurs. The process execution occurs inside another separate thread (one from the workmanager thread pool in ES2). On top of that any asynchronous steps that are hit during execution of the process will result in the thread being released back into the pool for use. This happens because the invocation of the asynchronous step’s operation is done asynchronously as well.
As you can see, while synchronous invocations have the advantage of not incurring any overhead of having to switch threads, asynchronous invocations provide a better capacity for sharing system resources among the entire load on the system and there are times when this would be preferable to just choosing the more efficient route.
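The difference in where the work runs can be sketched in a few lines of Python. This is only an analogy under stated assumptions: `process` is a hypothetical stand-in for a process body, and the single-worker `ThreadPoolExecutor` plays the role of the workmanager thread pool.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

def process():
    """Hypothetical process body: report which thread executed it."""
    return threading.get_ident()

caller_thread = threading.get_ident()

# Synchronous invocation: a plain call, so the work runs in the caller's thread.
sync_thread = process()

# Asynchronous invocation: the work is handed to a separate pool of threads.
# Calling result() here is only so the demo can read the value; a real
# fire and forget caller would move on without waiting.
with ThreadPoolExecutor(max_workers=1) as work_manager:
    async_thread = work_manager.submit(process).result()

print(sync_thread == caller_thread)   # True
print(async_thread == caller_thread)  # False
```

The synchronous call avoids the hand-off entirely, which is the efficiency advantage. The asynchronous call pays for the hand-off but leaves the caller’s thread free.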
Note: A common misconception is that each step in a long lived process runs in an asynchronous manner. This has not been the case since version 8, where synchronous branches became the default for long lived processes. This means that the execution of the process will occur in a synchronous fashion, within the same thread, until an asynchronous step is hit. After the asynchronous step is done, the process execution will again continue in a synchronous fashion until the next asynchronous step is hit.
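One way to picture this is to split a process into the runs of steps that execute back to back on one thread. The sketch below is a simplified model, not how the workflow engine is implemented: the step names are made up, and treating the asynchronous step as the end of a run is an assumption made for illustration.

```python
def synchronous_segments(steps):
    """Group (name, is_asynchronous) steps into runs that share one thread.

    Each run ends when an asynchronous step is hit, because that is the
    point where the thread is released back into the pool.
    """
    segments = []
    current = []
    for name, is_asynchronous in steps:
        current.append(name)
        if is_asynchronous:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments

# Hypothetical long lived process with two asynchronous steps.
steps = [
    ("read input", False),
    ("render form", False),
    ("assign task to user", True),
    ("merge submitted data", False),
    ("wait one day", True),
    ("archive result", False),
]

segments = synchronous_segments(steps)
for segment in segments:
    print(segment)
```

Six steps produce only three thread hand-offs here, rather than one per step.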
Transactional Behavior and Database Use
The next thing to look at is the transactional differences between short lived and long lived processes and also their use of the database.
Short lived process execution occurs entirely within one transaction. This means that if anything goes wrong during execution of the process the entire transaction and all the steps that occurred previously will be rolled back. Because of this, “retrying” a short lived process literally means invoking it again from scratch.
Note: There are further options for the transactional behavior of short lived processes that can be configured. For example, whether to allow the process execution to occur within a parent transaction, or whether to force the creation of a new transaction to encompass the process execution. I won’t go into these details, as they could be a topic for a post all on its own, but will instead focus on the general behavior.
There is no default use of the database by short lived processes. So unless there is a step in the process that explicitly writes to the database then nothing is written into it.
Long lived processes will have 1 transaction created and committed/rolled back per step in the process. Unlike with short lived processes, when something goes wrong in a long lived process it is only the current step that gets rolled back and not the entire process. When this occurs the step is then marked as stalled in the workflow engine, allowing you to go back and retry the step at a later date should you so desire.
This is made possible because a long lived process tracks the values of its process variables in the database at each step. This generally occurs within a database table named tb_pt_process_name, where process_name is the actual name of the process. This means that there is additional overhead associated with long lived processes, coming from their need to maintain their state inside the database.
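The two transactional models can be contrasted with a small Python sketch. Everything here is hypothetical: the function names, the step functions, and the use of dictionary copies in place of real transactions and database rows. It only illustrates the all-or-nothing behavior versus the per-step commit with a stalled step.

```python
import copy

def run_short_lived(steps, variables):
    """One 'transaction' around the whole process: all or nothing."""
    working = copy.deepcopy(variables)
    for step in steps:
        step(working)            # any exception abandons the working copy
    return working               # "commit" happens only if every step succeeded

def run_long_lived(steps, variables, start_at=0):
    """One 'transaction' per step; returns (committed state, stalled index)."""
    committed = copy.deepcopy(variables)
    for index in range(start_at, len(steps)):
        working = copy.deepcopy(committed)
        try:
            steps[index](working)
        except Exception:
            return committed, index   # only this step is rolled back
        committed = working           # per-step commit of the process variables
    return committed, None

def reserve(v):
    v["reserved"] = True

def make_flaky_charge():
    """Build a step that fails on its first call and succeeds afterwards."""
    calls = {"n": 0}
    def charge(v):
        calls["n"] += 1
        v["charged"] = True           # partial work done before the failure
        if calls["n"] == 1:
            raise RuntimeError("simulated failure")
    return charge

def confirm(v):
    v["confirmed"] = True

# Short lived: the failure loses everything; a retry starts from scratch.
short_steps = [reserve, make_flaky_charge(), confirm]
try:
    short_first = run_short_lived(short_steps, {})
except RuntimeError:
    short_first = {}
short_retry = run_short_lived(short_steps, {})

# Long lived: the first step stays committed and the failed step is stalled.
long_steps = [reserve, make_flaky_charge(), confirm]
stalled_state, stalled_at = run_long_lived(long_steps, {})
final_state, still_stalled = run_long_lived(long_steps, stalled_state, start_at=stalled_at)

print(short_first, stalled_state, stalled_at)
```

The per-step copy in `run_long_lived` is the analogue of the overhead mentioned above: keeping state at every step is what buys the ability to retry from the stalled step.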
Because of the transaction behavior of these two process types, there may be times when it is more desirable to use a long lived process even though it may not be strictly required. Imagine the situation where a process loops around a dataset that is passed in as input and performs some operations on each item in the set. Let’s say the operations performed on each item require 30 seconds of processing time, and the number of items in the dataset can be between 1 and 10000 (numbers may be exaggerated to illustrate the point :) ). In that case the running time of this process will be between 30 seconds and 83.33 hours. Normally the default transaction timeout value for an application server is around 5 minutes. You can change the timeout value for the short lived process specifically, but in this case setting it to 84 hours wouldn’t be something I’d recommend. Here it would be better to use a long lived process, where you don’t have to worry about transaction timeout values.
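The arithmetic behind those numbers, as a quick Python check. The 5 minute timeout is the approximate default mentioned above, not a value taken from any particular application server.

```python
SECONDS_PER_ITEM = 30
MAX_ITEMS = 10_000
DEFAULT_TIMEOUT_SECONDS = 5 * 60   # approximate default from the text

worst_case_seconds = SECONDS_PER_ITEM * MAX_ITEMS
worst_case_hours = worst_case_seconds / 3600

# How many items fit inside a single transaction at the default timeout?
items_within_default_timeout = DEFAULT_TIMEOUT_SECONDS // SECONDS_PER_ITEM

print(worst_case_seconds)              # 300000
print(round(worst_case_hours, 2))      # 83.33
print(items_within_default_timeout)    # 10
```

So a short lived version of this process would hit the default timeout at roughly 10 items, a tiny fraction of the possible input size.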
So which should you choose?
Should you choose to design your process as a short lived process or long lived process? In the end, it’s up to you, but when I’m faced with this decision here’s what I do:
I always design a process as short lived, unless I have to make it long lived.
Here are some examples of times when I’d decide I have to make it long lived:
- The process contains an asynchronous step (Wait and User services are the two that come to mind out of the box).
- When the process’s execution time is significant and you want to avoid using up threads in the thread pool used by the invoker.
- When the process’s execution time is completely variable and unpredictable and you want to avoid setting an extremely high transaction timeout value.
- When you want the granularity to stall and retry on a step by step basis in the process.
- When you want a data trail of the process variables to be kept in the database for tracking or reporting. (This brings up another question of whether to use the data we store for reporting or to design your own reporting mechanism, which is outside the scope of this post. In general, I would not recommend using our data for your reporting, as it likely isn’t structured in a way you’ll find suitable.)
Hopefully the information presented here will be helpful and provide you with the necessary background to make your decisions on process design and whether to use a short lived or long lived process.