Let’s start a series that takes you along on my journey into the world of cloud-native observability. I began this journey when I joined Chronosphere, a cloud-native observability platform, a little less than a month ago.
While I’ve been evolving the stories I tell for some time, moving from developer audiences to architecture audiences, one thing that has caught my eye is the complexity of cloud-native environments. The more complex the solution architecture, the greater the need for simple ways of sharing how successful organizations work at cloud-native scale.
Along with the journey into cloud-native architectures, a distinct issue has emerged that is playing out across cloud-native environments. I outlined that issue in a series about cloud data, and it’s about more than just the data storage of the early architecture days.
This look at cloud data uncovered a very interesting and somewhat hidden world of cloud-native observability, where the cost of the data generated while keeping tabs on your cloud-native architecture can often exceed what you spend on running production.
This series kicks off with the basics of the path from developer to cloud-native observability, introduces the players involved, and outlines the technical versus business story being sold to you around the tooling in cloud-native observability.
Let’s dive right in, shall we?
The basic introduction starts from the point that developers began in a world without clouds and have since had to make the transition to a cloud-native development world. What does this mean for them, and what are some of the challenges they have to embrace?
Old Developer Ways
Coming from the developer world of old, where I wrote code for services and applications before cloud native, it’s important to understand that monitoring my code as it worked its way towards production was often very limited.
Monitoring usually meant some sort of continuous integration and continuous deployment (CI/CD) toolchain that provided me with some insight into performance, test failures, and deployment success. Chasing down failures did not often require dashboards, other than the CI/CD one alerting me to any problems. That alert would put me back in my developer environment tooling to debug by deciphering logged errors, reading test failure results, and using a lot of breakpoints as I stepped through my code.
Once the code hit production, most monitoring became the purview of the operations department. They had their own tooling, with log parsing, dashboards, and monitoring favorites such as Nagios.
Then came the world of cloud-native development.
Developing With Cloud-Native O11y
Slowly there was a shift: as a developer, you were no longer working on your own machine or in your own data center-hosted environments. Everything was now in a cloud, or a cloud-like environment, which changed all business expectations.
Agile development shortened the road to production with automation, forcing us to move at the speed of our next code change. It also created a new landscape where operations shifted left, closer to the developer, and we all became DevOps teams.
New features were no longer released several times a year, but several times daily or even hourly. This brought a need for better tooling to deal with the vast array of components being created in our cloud-native world. Applications now make use of hundreds, if not thousands, of microservices, and it becomes very difficult to maintain observability across these architectures.
There it is, friends, the word we have landed upon in the cloud-native world to represent the monitoring of everything: the rise of cloud-native observability. Observability, or o11y for short, is far broader than anything that has happened in our developer world to date. Not only do you want to keep track of the availability of your applications and services, but you also want to detect early any trends that might lead to degradation or downtime in your customers’ experience.
At the start, there was much talk about the three pillars of monitoring to try and tackle the challenges of cloud-native o11y: metrics, tracing, and logs. The problem is that businesses are more interested in focusing on three phases: knowing about the problem at hand as fast as possible, quickly triaging and fixing the issue (remediation), and finally understanding fundamentally what happened in order to prevent future occurrences.
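To make the three pillars a little more concrete, here is a minimal sketch of one request handler emitting all three signals using only the Python standard library. Everything in it is a hypothetical illustration rather than any real o11y tool's API: the `handle_request` function, the `checkout` logger name, the metric names, and the in-memory `metrics` and `spans` stores are all assumptions made for this example.

```python
import logging
import time
import uuid
from collections import Counter
from contextlib import contextmanager

# Metrics: aggregated counts (hypothetical in-memory store).
metrics = Counter()

# Logs: individual event records (hypothetical logger name).
log = logging.getLogger("checkout")

# Traces: timed spans tied together by a shared trace ID
# (hypothetical in-memory store).
spans = []


@contextmanager
def span(name, trace_id):
    """Record how long the wrapped block took as a single span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.append({
            "trace_id": trace_id,
            "name": name,
            "seconds": time.perf_counter() - start,
        })


def handle_request(order_total):
    """Hypothetical handler that emits a metric, a log line, and a span."""
    trace_id = uuid.uuid4().hex
    with span("handle_request", trace_id):
        metrics["requests_total"] += 1
        if order_total < 0:
            metrics["errors_total"] += 1
            log.error("invalid order total %s trace_id=%s", order_total, trace_id)
            return False
        log.info("order accepted trace_id=%s", trace_id)
        return True
```

Even in this toy version, every request produces three separate pieces of data, which hints at how quickly observability data can pile up across hundreds or thousands of microservices.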
Next Up, Who’s on the Field
After a brief recap of the path developers and operations have taken from the old world to the new cloud-native world, this article touched on the difference between the technical approach (pillars) and the business approach (phases) to cloud-native o11y.
Keeping that in mind, the next article in this series looks at who the players are on this cloud-native o11y field.