Financial Services Software App Challenges: Mean Time to Identify (MTTI) & Mean Time to Resolve (MTTR)

14th Jul 2021

As DevOps teams release and automate with increasing frequency, performance and availability problems have soared, leading to more time troubleshooting and less time developing amazing apps. 

This means reducing Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR) is more important than ever, especially in the financial services industry where massive disruption has become the norm. 
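
To make the two metrics concrete, here is a minimal sketch of how a team might compute them from its own incident records. The `Incident` record shape, its field names, and the sample timestamps are all hypothetical and purely illustrative. It also assumes both metrics are measured from the moment the problem started; some teams measure MTTR from detection instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record shape; the field names are illustrative assumptions.
    started: datetime     # when the problem began affecting users
    identified: datetime  # when the root cause was pinpointed
    resolved: datetime    # when the fix was live in production


def mean_delta(deltas):
    """Average a non-empty list of timedeltas."""
    return sum(deltas, timedelta()) / len(deltas)


def mtti(incidents):
    """Mean Time to Identify: average of (identified - started)."""
    return mean_delta([i.identified - i.started for i in incidents])


def mttr(incidents):
    """Mean Time to Resolve: average of (resolved - started)."""
    return mean_delta([i.resolved - i.started for i in incidents])


# Invented sample data, for illustration only.
incidents = [
    Incident(datetime(2021, 5, 3, 9, 0),
             datetime(2021, 5, 3, 9, 40),
             datetime(2021, 5, 3, 10, 30)),
    Incident(datetime(2021, 6, 11, 14, 0),
             datetime(2021, 6, 11, 14, 20),
             datetime(2021, 6, 11, 14, 50)),
]

print("MTTI:", mtti(incidents))
print("MTTR:", mttr(incidents))
```

Tracking both numbers separately shows whether time is being lost in finding the cause or in shipping the fix.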

Speed or Security? FinServ’s Technology “Tradeoff”

In today’s world of rapid digital transformation, FinServ companies need to be able to offer a totally seamless customer experience. When it comes to financial transactions, even the smallest lag or misstep can cause users to question the status and safety of their money, erode their trust, and irreparably damage your brand reputation and bottom line.

This puts financial institutions in a tough spot when it comes to software technology: they need to outpace the competition by delivering simple, secure, feature-rich digital experiences, but they need to do so without sacrificing reliability or security.

The problem is that as FinServ organizations accelerate the frequency and velocity of their product releases to stay competitive, they face increasing reliability issues with mission-critical applications, sometimes leading to catastrophic, headline-making outages.

This speed-stability challenge has been exacerbated by increasing scrutiny around financial industry regulation and compliance. A single unforeseen error can compromise the integrity of customer transactions and lead to massive fines and other regulatory consequences.

Nobody wants downtime, but in the financial world downtime is all too common and tends to come with severe consequences, sometimes with hundreds of millions of dollars lost in a matter of minutes.

The Microservices Blessing-Slash-Curse

Financial institutions have undoubtedly reaped the benefits of microservice-based applications. However, they’re also suffering the consequences of everything moving towards microservices and big distributed apps. 

Consider this: if a request depends on 10 microservices and each one is 99.9% available, the combined availability drops to roughly 99%, which works out to around 87 hours of downtime per year.
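
How quickly per-service reliability compounds depends on the assumptions you make. Here is a minimal sketch of the arithmetic, assuming that the 99.9% figure is the availability of each individual service, that every request needs all 10 services, and that the services fail independently. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours in a non-leap year


def combined_availability(per_service, n_services):
    """Availability of a chain in which every service must be up.

    Assumes the services fail independently of one another.
    """
    return per_service ** n_services


def downtime_hours_per_year(availability):
    """Expected hours per year during which the system is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


single = downtime_hours_per_year(0.999)
chain = downtime_hours_per_year(combined_availability(0.999, 10))

print(f"1 service at 99.9%:   {single:.1f} hours/year")
print(f"10 services in chain: {chain:.1f} hours/year")
```

Under these simplified assumptions, one service at 99.9% is down just under 9 hours a year, while a chain of 10 is down about 87 hours a year. Real systems with retries, deeper dependency chains, or lower per-service figures can look quite different.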

Thanks to microservices, FinServ companies can deploy code and build apps much more easily, but they also have many more places to look for problems. What tends to happen is a cascading effect, where one problem leads to a second, which leads to a third, until you have a symphony of issues. Good luck finding the instrument that started it all.

Bottom Line: Don’t Be Afraid to Invest in Testing

The trick, of course, is to catch problems before they reach production: before a customer can experience them, and while the cheapest possible fix will still work.

You always need to think about how far left you can shift the problem: was there a unit or migration test that could have caught it? Where should you have caught it? Every incident is a learning opportunity. You’ve already paid the cost. Now, can you reap the benefits?

Defects found in production tend to whipsaw not just the developer who wrote that particular module but also senior developers. Downtime can derail an entire engineering team and an entire sprint.

So, bottom line: don’t be afraid to spend time writing tests, because a little bit of testing can go a long way toward saving you, your team, and potentially your company, from damaging, maybe even catastrophic, downtime. 

Related Panel Discussion: 4 BIG Software App Challenges For FinServ

This past May, Bob Kemper, VP of Worldwide Engineering at OverOps, and Anders Wallgren, VP of Technology Strategy at CloudBees, took part in a panel discussion on the 4 BIG Software App Challenges For FinServ. In part 1, the two discuss the challenges of MTTI & MTTR.
