Modern Observability: When “Small” Data Beats Big Data

Look at the history of big data and one thing becomes clear: our ability to collect data has always outpaced our capacity to make meaning of it. We’re now seeing this happen with observability, the practice of assessing and measuring a system’s current state of health based on the external data it generates. IT teams with observability practices in place are inundated with so much data that they can no longer harness and leverage it effectively. What was intended to be a blessing becomes something of a curse: observability is ultimately meant to help these teams find and fix the root cause of system performance (speed, availability) issues faster, yet the length of downtime associated with publicly reported outages is growing at a disturbing rate.

With the rise of the cloud, hybrid infrastructures and microservices, apps and systems are only going to generate even more data. For many organizations, big data is simply getting too big, and that is going to force them to modify their observability approaches significantly, shifting from big data back to small data. What does this mean in practice?

No More “Centralize and Analyze” – Observability architectures have traditionally been built on a “centralize then analyze” approach: data is aggregated in a monitoring platform before users can query or analyze it. The thinking is that the more data you can gather and correlate in one central location, the richer its context becomes. That may have worked well in an era when data volumes were comparatively small. But given the tsunami of data now being generated, the vast majority of which is never used, organizations can no longer afford to aggregate it all in expensive, “hot” storage tiers. Data still needs to be analyzed and correlated, but in smaller volumes and in different places: where it originates.
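
To make the shift concrete, here is a minimal sketch in Python of analyzing data where it originates and correlating only compact summaries centrally. The function names and summary fields are illustrative assumptions, not any particular product’s API.

# Each source analyzes its own raw data and emits a small summary;
# only the summaries, not the raw events, travel to the central step.
from collections import Counter

def summarize_locally(raw_lines):
    """Runs on the node that produced the data; returns a compact summary."""
    levels = Counter(line.split()[0] for line in raw_lines if line.strip())
    return {"events": len(raw_lines), "errors": levels.get("ERROR", 0)}

def correlate(summaries_by_source):
    """The central step now works on kilobytes of summaries, not raw volume."""
    return {src: s for src, s in summaries_by_source.items() if s["errors"] > 0}

summaries = {
    "checkout-svc": summarize_locally(["ERROR timeout calling payments", "INFO ok"]),
    "search-svc": summarize_locally(["INFO ok", "INFO ok"]),
}
print(correlate(summaries))  # only checkout-svc surfaces for follow-up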

Analyze Data at the Source – To keep the storage costs of a central repository down, many organizations have resorted to indiscriminately discarding data sets. While it’s true that the vast majority of data is never used, anomalies and problems can crop up anytime, anywhere, so if you’re randomly omitting data you’re leaving yourself with significant blind spots. By analyzing data in smaller chunks at the source, rather than in a central repository, you can keep an eye on all of your data, which brings tremendous peace of mind, while relegating lower-priority data to a lower-cost storage tier and saving significantly on expenses.
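
As a rough illustration of this triage, the sketch below inspects every line at the source, forwards only high-priority lines to the hot tier, and parks the rest in cheaper storage. The priority pattern and the sink objects are hypothetical placeholders for whatever an edge agent would actually use.

# Every line is seen at the source (no blind spots); only a fraction stays "hot".
import re

HIGH_PRIORITY = re.compile(r"\b(ERROR|FATAL|timeout)\b")

def triage(lines, hot_sink, cold_sink):
    for line in lines:
        if HIGH_PRIORITY.search(line):
            hot_sink.append(line)   # indexed and queryable immediately
        else:
            cold_sink.append(line)  # compressed/archived, still retrievable later

hot, cold = [], []
triage(["INFO request ok", "ERROR upstream timeout", "INFO request ok"], hot, cold)
print(len(hot), len(cold))  # 1 2: full coverage, small hot footprint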

Ease the Pressure on the Pipes and Downstream Systems – Another problem with the “centralize and analyze” approach is that it can clog data pipelines and overstuff central repositories, which then slow down significantly and take much longer to return query results. A further benefit of analyzing data in smaller increments at the source is that organizations become much more nimble at real-time analytics, identifying growing hotspots and their root causes faster, which is critical to reducing MTTR. Some organizations find they don’t need a central repository at all. For those that keep one, high-volume, noisy datasets can be converted into lightweight KPIs that are baselined over time, making it much easier to tell when something is abnormal or anomalous, which is a good sign that you want to index that data. In this way, organizations can slim down their central repositories and maintain control over what gets routed there.
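
Here is a small sketch of the KPI idea, assuming errors-per-minute as the KPI and a simple rolling mean and standard deviation as the baseline; a real deployment would tune the window and threshold.

# Collapse a noisy stream into one KPI, baseline it, and flag large deviations.
from statistics import mean, stdev

def is_anomalous(history, latest, z_threshold=3.0):
    """Flag the latest KPI value if it sits far outside the historical baseline."""
    if len(history) < 5:          # not enough history to form a baseline yet
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) / sigma > z_threshold

errors_per_minute = [2, 3, 2, 4, 3, 2, 3]    # the lightweight KPI, not raw log lines
print(is_anomalous(errors_per_minute, 2))    # False: nothing worth indexing
print(is_anomalous(errors_per_minute, 40))   # True: index the raw data behind it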

Make Your Data Accessible – There will be times when access to all of this data is needed, and it should be accessible, whether it sits in a streamlined central repository or in cold storage. Developers should have fast, easy access to all of their smaller datasets regardless of the storage tier they’re in, without having to go through the operations team members who often act as gatekeepers in the central-repository, big-data model.
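
One way to picture tier-agnostic access is a single read path over both tiers, as in the minimal sketch below; the store names and the dictionary-backed “tiers” are stand-ins, not a specific vendor’s interface.

# One lookup, two tiers: developers get the dataset back wherever it lives.
def fetch_dataset(name, hot_store, cold_store):
    if name in hot_store:
        return hot_store[name], "hot"    # fast path: indexed central repository
    if name in cold_store:
        return cold_store[name], "cold"  # slower path: archive, still self-service
    raise KeyError(f"unknown dataset: {name}")

hot = {"checkout-errors-today": ["ERROR upstream timeout"]}
cold = {"checkout-errors-2023-01": ["ERROR cert expired"]}
print(fetch_dataset("checkout-errors-2023-01", hot, cold))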

Conclusion

When it comes to managing increasingly complex and extensive IT infrastructures, big data remains, justifiably, a major focus of research and interest. But small data is still with us, and sometimes it beats big data, allowing organizations to reach the right conclusions faster, more reliably and at lower cost. Modern observability initiatives are a prime example, as surging data volumes drive many organizations toward an inflection point.

About the Author

Ozan Unlu is the CEO and Founder of Edge Delta, an edge observability platform. Previously he served as a Senior Solutions Architect at Sumo Logic; a Software Development Lead and Program Manager at Microsoft; and a Data Engineer at Boeing. Ozan holds a BS in nanotechnology from the University of Washington. 
