RPA & API Management: The Great Convergence


The RPA market is reaching an interesting inflection point. The lines that traditionally separated APIs from bots are beginning to blur. What is driving this trend, and what does it mean for organizations looking to tackle digital transformation initiatives?


For a long time, there was a clear separation between RPA and API management tools. Traditionally, RPA vendors helped companies automate low-volume back-office tasks that span multiple applications. API management vendors helped manage the complexities of multiple vendor APIs, but could only add value when applications already exposed some kind of API interface. Both classes of technology have included some form of business process management capability, with mixed success.


The reason RPA solutions are not well integrated with API management tools is simple: RPA is slow, does not scale cost-effectively and is often unreliable when deployed in production. With enough money and hardware, organizations can integrate RPA and API management tools to support higher-volume transactional workflows, but the ROI for these projects often does not justify the spend. Companies need a better way of achieving the speed, scale and simplicity required of their RPA-backed initiatives.

Why is RPA slow?

There are a few reasons RPA is slow. Since RPA drives applications the same way a user would, every automation has a hard lower bound on latency set by the underlying application. However, a few things within our control allow us to reach that true lower bound. For one, you can minimize system overhead by running applications in headless mode whenever possible. You can also use smart session management and autoscaling rules to keep a pool of warm instances, eliminating the delays associated with cold starts. Even though a lower bound on latency remains, knowing you can reliably achieve it at scale allows architects to build more reliable solutions.
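As a concrete illustration of the headless-mode point, here is a minimal sketch using Selenium with headless Chrome; the URL and element locator are hypothetical placeholders, and keeping the driver alive between requests is what a warm-instance pool amounts to in practice.

```python
# Minimal sketch: drive a browser-based app in headless mode to cut per-bot overhead.
# Assumes Selenium 4+ with a local Chrome/chromedriver; URL and locator are placeholders.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless=new")          # no visible window, smaller footprint
options.add_argument("--window-size=1280,800")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://billing.example.internal/search")                # placeholder URL
    driver.find_element(By.ID, "account-search").send_keys("ACME-1234")  # placeholder locator
    # ... remaining automation steps ...
finally:
    # A warm pool would keep the driver (and its logged-in session) alive between
    # requests instead of quitting here, avoiding cold-start and login delays.
    driver.quit()
```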

Why does RPA not scale?

RPA workflows are often designed to span multiple applications. This is great when building an attended workflow or for low-volume batch processes, but it is not ideal for creating synthetic APIs on top of legacy applications. With the legacy RPA architecture, you have to deploy enough bots to support the highest latency application in the flow, making it expensive to scale.

Imagine if you could break out the component steps in a workflow based on the underlying applications you are automating. Instead of using a monolithic bot for the entire workflow, you use lightweight bots for each application type. This allows you to horizontally scale out the bots supporting high-latency applications while keeping a much smaller number of instances for the low-latency applications in the workflow. Here is an example automation that should help illustrate the value of this approach.


Let’s say you have four applications that make up a given process flow that you are looking to automate: 


  • Homegrown browser-based billing application which lacks APIs (latency: 2s)
  • Homegrown .NET SKU tracking application (latency: 10s)
  • Homegrown browser-based CRM application which lacks APIs (latency: 2s)
  • Salesforce API (latency: 1s)


Maybe your company wants a chatbot to respond with relevant product recommendations for a client based on their order history. This requires multiple systems, starting with a homegrown billing application. Data from the billing system is passed into a thick-client .NET application that returns relevant product promotions. The product details are tracked in another homegrown system. Finally, a log of the interaction is entered into Salesforce through its API as part of an ongoing Salesforce migration.

For the pilot, you just want this service to be able to handle 16 requests per minute. The workflow has a total latency of 15 seconds on average (2s + 10s + 2s + 1s). Let's assume that the 15-second latency is only achievable if each underlying application is open, logged in and waiting on the correct page, so even though a browser is used twice, every application needs its own open instance to hit the latency target.
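Spelling out the throughput math for the monolithic approach (a quick sketch using the figures above):

```python
import math

target_rpm = 16          # pilot throughput goal, requests per minute
workflow_latency_s = 15  # end-to-end latency of the monolithic workflow

requests_per_bot_per_min = 60 / workflow_latency_s              # 4 requests per minute per bot
bots_needed = math.ceil(target_rpm / requests_per_bot_per_min)  # -> 4 monolithic bots
print(bots_needed)
```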

Let’s assign some numbers to the resources required by the applications used in the process flow:


App             Memory (MB)   CPU (cores)
Chrome          300           0.5
.NET            100           0.2
REST client     10            0.1
Windows OS      2000          1
Alpine Linux    100           0.2

To achieve the desired throughput with a traditional RPA approach, you would need four monolithic bots: four instances of Windows and four instances of every component application, where each bot runs two Chrome instances (billing and CRM), the .NET application and a REST client.

Doing the math, this requires 9.2 CPU cores and 10840 MB of memory.
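The arithmetic behind those totals, sketched out (one plausible reading of the per-bot composition that is consistent with the table above):

```python
# Per-bot footprint for the traditional, monolithic approach:
# Windows OS + 2x Chrome (billing + CRM) + .NET app + REST client.
cpu_per_bot = 1 + 2 * 0.5 + 0.2 + 0.1      # 2.3 cores
mem_per_bot = 2000 + 2 * 300 + 100 + 10    # 2710 MB

bots = 4
print(round(bots * cpu_per_bot, 1), bots * mem_per_bot)   # -> 9.2 cores, 10840 MB
```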


Using AppBus’ architecture, the flow would look like the following:

There is no getting around the need for four instances of the highest-latency application in this process flow. However, the lower-latency applications can be broken out and run in lightweight containers. Doing the math on the resource requirements, this architecture requires 6.5 CPU cores and 9310 MB of memory, roughly a 29% reduction in CPU usage and a 14% reduction in memory usage. This is a conservative estimate, as we are not accounting for the fact that AppBus runs browser automations in headless mode, comparing the relative resource footprints of the bots themselves, or addressing the fact that scale-out behavior in the AppBus architecture is more granular. The bulk of the resource consumption comes from the Windows OS. If all of the component applications were compatible with Linux, the cost savings would be even greater.
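The decomposed math, sketched under the assumption that the .NET application still needs four Windows instances while each browser app and the REST client run as single Alpine-based containers (a split that reproduces the totals above):

```python
# Decomposed architecture: 4x (Windows + .NET) for the 10s application, plus one
# Alpine container per 2s browser app and one Alpine container for the REST client.
windows_dotnet_cpu, windows_dotnet_mem = 4 * (1 + 0.2), 4 * (2000 + 100)   # 4.8 cores, 8400 MB
chrome_pods_cpu,    chrome_pods_mem    = 2 * (0.2 + 0.5), 2 * (100 + 300)  # 1.4 cores,  800 MB
rest_pod_cpu,       rest_pod_mem       = (0.2 + 0.1), (100 + 10)           # 0.3 cores,  110 MB

total_cpu = windows_dotnet_cpu + chrome_pods_cpu + rest_pod_cpu   # 6.5 cores
total_mem = windows_dotnet_mem + chrome_pods_mem + rest_pod_mem   # 9310 MB
print(round(total_cpu, 1), total_mem)
print(f"{1 - total_cpu / 9.2:.0%} less CPU, {1 - total_mem / 10840:.0%} less memory")  # ~29%, ~14%
```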

A few related benefits fall out of this architecture as well. The lack of sufficient APIs on all three homegrown applications may be cause to consider rewriting them. With the benefit of AppBus' architecture, however, it is clear that the biggest win would come from investing in the .NET application, whether through direct development on the application itself or by thinning out the underlying Windows OS so that it runs only the services needed to support that specific .NET application.


The benefit of specialized bot runtimes seems pretty clear, so why isn't this architecture more widely adopted by the major RPA vendors? The simple answer is that most of the advancements in container orchestration that make the complexity of a more distributed architecture manageable are much newer than the core IP on which RPA vendors built their technology. Many RPA vendors cannot easily decouple from the Windows ecosystem. Ideally you should be able to treat your bots like serverless functions, with some high-demand or first-touch services always on and lower-volume batch services scaling to zero. While there are some exciting advancements in the area of Windows containers, a cost-effective scale-out strategy just isn't an option for most RPA today.

Why is RPA unreliable?

RPA is unreliable at a fundamental level. When you build services on top of a user interface with no explicit versioning or exhaustive documentation, there is always some amount of risk that technology can at best help mitigate, but never fully eliminate. How do you handle edge cases in the UI, such as a rarely used product code that requires additional form inputs you didn't discover in testing? How do you handle subtle changes to the user interface of applications still under active development? Addressing these challenges requires a multi-pronged approach like the one outlined below.

Design

During the design phase of an automation, your RPA tooling needs to make it easy for your automation engineers to follow best practices. You don't want subtle changes to an application's UI to break your automations, so the tooling should allow developers to target UI elements in a very expressive way. The tooling should also make it easy to write tests, to test locally and to deploy into a dev or staging environment.
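As one illustration of expressive, resilient targeting, here is a hedged sketch in Selenium that prefers stable attributes and falls back through alternative locators (all locator values are hypothetical):

```python
# Sketch: prefer stable, semantic locators and fall back through alternatives so
# small layout changes do not break the automation. All locators are placeholders.
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

def find_submit_button(driver):
    candidate_locators = [
        (By.CSS_SELECTOR, "[data-test-id='submit-order']"),  # stable test hook, if present
        (By.ID, "submitOrder"),                               # application-assigned id
        (By.XPATH, "//button[normalize-space()='Submit']"),   # visible label as last resort
    ]
    for by, value in candidate_locators:
        try:
            return driver.find_element(by, value)
        except NoSuchElementException:
            continue
    raise NoSuchElementException("Submit button not found with any known locator")
```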

Detect

When something does go wrong, your RPA tooling should provide ways to detect the breakage. This can be achieved through a mix of out-of-band testing and detailed error handling. The quicker breakages can be identified and isolated, the faster fixes can get into production, hopefully before users are impacted.
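One way to approach out-of-band detection, sketched with hypothetical stand-ins (run_workflow and alert_oncall represent whatever your tooling provides):

```python
# Sketch of an out-of-band canary: periodically run the automation against a known
# test record and alert when it fails or drifts past its expected latency.
import time

EXPECTED_LATENCY_S = 20  # assumed latency budget for this workflow

def canary_check(run_workflow, alert_oncall):
    start = time.monotonic()
    try:
        result = run_workflow(account_id="TEST-0001")   # synthetic, known-good input
    except Exception as exc:
        alert_oncall(f"canary failed: {exc!r}")
        return False
    elapsed = time.monotonic() - start
    if elapsed > EXPECTED_LATENCY_S:
        alert_oncall(f"canary slow: {elapsed:.1f}s (expected <= {EXPECTED_LATENCY_S}s)")
    return result is not None
```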

Delegate

As mentioned before, RPA will eventually break, and every organization needs a contingency plan for when that happens. The only practical solution is to gracefully fall back to the user-centric approach to performing the task. Your RPA tooling should be able to route work to humans and provide them with enough information to complete the workflow, or at least to escalate the issue with more detail to your automation engineers.
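A minimal sketch of that fallback pattern, with create_human_task standing in for whatever work-routing mechanism your tooling exposes:

```python
# Sketch: when the bot cannot complete a step, hand the work to a person with
# enough context to finish it. create_human_task() is a hypothetical stand-in.
def process_order(order, run_bot_workflow, create_human_task):
    try:
        return run_bot_workflow(order)
    except Exception as exc:
        create_human_task(
            queue="automation-exceptions",
            summary=f"Automation failed for order {order['id']}",
            context={
                "order": order,
                "error": repr(exc),
                "failed_step": getattr(exc, "step", "unknown"),
            },
        )
        return None
```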

Deploy

Once a fix is created, you need a quick and reliable way to roll it out to production. In addition to the cost savings that result from AppBus' architecture, it also makes breakages easier to identify and fix. Much like with microservices, instead of redeploying an entire complex workflow to fix one small part, you redeploy a much smaller logical component that is easier to test and get into production reliably.

Conclusion

RPA and API management will eventually converge once the RPA issues of speed, scale and reliability are addressed. Here at AppBus we think we have solved a number of these issues, and we are working every day to push the product forward. Legacy technology can sometimes feel like an anchor that stifles innovation. By allowing more legacy applications to participate in the API economy, we hope to unlock the innovative potential inside many mature companies.
