OverOps CTO Tal Weiss shares some thoughts on application reliability in a time of economic uncertainty and increased reliance on software.
These are tumultuous times we live in. As of this post, almost a third of the U.S., and over a billion people around the world, are in virtual lockdown. Virtual not just because most of us are limited to leaving our homes only for the essential activities of everyday life, but also virtual in the sense that much of life has now moved online.
For those of us who are in the business of making and operating software, things are becoming even stranger, and in some ways harder. The government-mandated economic shutdown needed to combat this pandemic will undoubtedly have far-reaching negative implications for the future of our economy. At the same time, it is already placing huge pressure on our hardware and software infrastructure – pressure that will not ease up anytime soon.
This tension creates a unique paradox: technology companies need to provide the same level of service quality and innovation in order to compete in today’s marketplace, while simultaneously dealing with declining revenues and share values (a primary means by which companies compensate employees and attract talent).
This is different from traditional industries, where times of economic slowdown allow production to be scaled back to meet lower demand. It is especially hard for companies in industries directly impacted by this storm, such as travel and hospitality, which must adjust to it in real time.
The new reality has also brought technologies such as video conferencing, mobile banking and online food delivery fully into the mainstream. What was once the social territory of techies, millennials and iGens has become the way senior citizens consult with their doctors and order their groceries while maintaining social distance.
And this new reality is not likely to change soon. For most technology companies this means a slew of new features and capabilities they will need to add to their software to accommodate changing user behaviors and support new use cases. Just keeping the lights on will not be a viable option. Factor in that the most likely winners in this equation will be “big tech”, leaving most players within the industry to fend for themselves.
With hiring freezes looming and quiet layoffs already underway, technology organizations will need to find a way to deal with this crunch. Industries that survive economic downturns have traditionally done so by focusing on automation.
From a software standpoint, as writing code remains today a very human-driven process (no AI has yet passed a coding “Turing test”), technology companies will need to find ways of automating how they test, deliver and operate their software to remain not only competitive, but viable.
On the software delivery side, this will most likely mean an investment in open source technologies focused on the automation and delivery of software (e.g. Jenkins, Kubernetes, etc.). On the production operations side, things will get even more interesting, as companies will have to deal with increasing load on their infrastructure and unforeseen edge cases that could risk the overall stability of their production environments – with fewer hands involved. DevOps groups will need to find ways of becoming more efficient.
One very likely outcome of this sudden change in environment is that capabilities such as machine learning and artificial intelligence, considered until now mostly a novelty in the world of DevOps, will be pushed into the operational foreground.
It might very well be that things we do today as part of our everyday DevOps lives, such as searching through our logs or setting up health checks on Servlets in our APMs (i.e. “Business Transactions”), will become a thing of the past, in the same way that we no longer copy bytecode artifacts onto our production nodes using Python scripts (hopefully!). As such, this crisis will create an opportunity (or more likely a necessity) for machine learning and AIOps to become production ready.
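To make the kind of manual work described above concrete, here is a minimal, hypothetical sketch (names and log format are illustrative, not from any real system) of the grep-and-tally log triage that engineers do by hand today – the exact task that AIOps tooling aims to automate:

```python
import re
from collections import Counter

# Crude signature for error lines: a severity keyword followed by a message.
ERROR_PATTERN = re.compile(r"(ERROR|FATAL)\s*:?\s*(?P<msg>.*)")

def triage_logs(lines):
    """Scan raw log lines and count occurrences of each error signature,
    so an engineer can see which failures dominate."""
    signatures = Counter()
    for line in lines:
        match = ERROR_PATTERN.search(line)
        if match:
            # Use the first few words of the message as a rough grouping key.
            signature = " ".join(match.group("msg").split()[:4])
            signatures[signature] += 1
    return signatures

sample = [
    "2020-03-25 10:01:02 INFO  request served in 12ms",
    "2020-03-25 10:01:03 ERROR NullPointerException in CheckoutService",
    "2020-03-25 10:01:07 ERROR NullPointerException in CheckoutService",
    "2020-03-25 10:01:09 FATAL out of memory",
]
print(triage_logs(sample).most_common())
```

The fragile part is the hand-written signature heuristic: at high scale, grouping and prioritizing noisy failures like this is precisely where machine-driven approaches can replace manual effort.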
Our belief at OverOps has always been that while developers are great at writing code, they are inherently limited in their ability to foresee where it will break down later. The task of detecting software issues and gathering information on them in production should therefore be automated, given the massive operational data volume and noise that high-scale environments produce. The 30% of time and resources traditionally allocated to manual identification, routing and reproduction of issues during the software delivery lifecycle will most likely become a thing of the past.
This is something the industry has been steadily moving towards, but given these recent changes it will now most likely become a hard necessity. While we normally welcome the shift towards new technologies and paradigms, we wish it had come under much better circumstances. Stay healthy, and let us know what you think the impact of the pandemic will be on our software environment in the comments section below!