The answer to the most complex bugs in software often hides in the data: a transformation that breaks, an external source that changed its structure, or an unexpected event. Often logs, traces, and metrics show no sign of an issue, yet something is still broken, and users know it! As a developer, you often need to search through the data to understand what’s happening, yet using that data systematically is inconvenient and hard for many reasons.

We set ourselves the challenge of making application data effectively help developers troubleshoot their applications. We want to make data the fourth element of application observability, alongside logs, metrics, and traces.

First, why did we land on this? We’ve been working in the observability world since 2016, developing network traffic analysis tools. We built the systems and algorithms that derived traffic behavior patterns from metadata, specifically NetFlow. These were high-volume, real-time, complex data applications that continuously processed close to a million events per second.

When something in that pipeline failed, we needed to search through the data to figure out what went wrong, but getting the data, visualizing it, and understanding it was far from trivial, let alone going back in time to figure out what had happened in the past!

This brings us to what we’ve built, and how it works.

What is Neblic

Neblic provides application observability through data. Application data is monitored continuously at many points of the application, so you can understand how each component behaves and how that behavior changes over time.

Some of the core challenges of making data usable in a systematic way are: accessing the data, both past and future; understanding the data, including the business domain and its constraints; and knowing what to prioritize rather than getting overwhelmed by the sheer volume and complexity of the data and all the dimensions it can take.

To make the data practical and useful for application troubleshooting, we’ve built Neblic to analyze the data based on three principles: Value Statistics, Data Structure Analysis, and Business Logic and Data Validation.

Value Statistics

Raw data can be hard to make sense of when debugging an application. We’ve built a way to generate statistics about the data that can be tailored to what is relevant for each field in a specific application.

For instance, use Min-Max-Avg to understand numerical distributions over time, cardinality and length to understand distinct events, and null and zero-value counts to find data that is unexpectedly missing.
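To make these statistics concrete, here is a minimal Python sketch that computes per-field stats for one window of events. It assumes flat events with hashable scalar values and treats `0` or an empty string as a "zero value"; `FieldStats` and `compute_stats` are hypothetical names for illustration, not Neblic's API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class FieldStats:
    """Stats for one field over one time window (hypothetical shape, not Neblic's API)."""
    count: int = 0                 # events seen in the window
    nulls: int = 0                 # events where the field was missing or None
    zeros: int = 0                 # events where the field held 0 or ""
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    total: float = 0.0             # running sum of numeric values, for the average
    numeric: int = 0               # how many values were numeric
    distinct: set = field(default_factory=set)  # exact distinct values
    max_length: int = 0            # longest string value seen

    @property
    def avg(self) -> Optional[float]:
        return self.total / self.numeric if self.numeric else None

    @property
    def cardinality(self) -> int:
        return len(self.distinct)


def compute_stats(events: Iterable[Dict[str, Any]], fields: List[str]) -> Dict[str, FieldStats]:
    """Compute per-field stats for a window of flat events with scalar values."""
    stats = {name: FieldStats() for name in fields}
    for event in events:
        for name in fields:
            s = stats[name]
            s.count += 1
            value = event.get(name)
            if value is None:
                s.nulls += 1
                continue
            s.distinct.add(value)
            if value == 0 or value == "":
                s.zeros += 1
            # bool is a subclass of int in Python, so exclude it from numeric stats
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                s.numeric += 1
                s.total += value
                s.minimum = value if s.minimum is None else min(s.minimum, value)
                s.maximum = value if s.maximum is None else max(s.maximum, value)
            elif isinstance(value, str):
                s.max_length = max(s.max_length, len(value))
    return stats
```

Cardinality here is exact because it keeps a set of every distinct value; at high event rates an approximate estimator such as HyperLogLog would likely be a better fit.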

Tracking stats rather than visualizing raw data makes trends and patterns easier to understand, even for those lacking domain knowledge. It also makes it simpler to set alerts based on specific thresholds, and it keeps the analysis manageable at scale when behavior analysis methods are applied to highlight anomalies first.
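The two alerting styles mentioned here can be sketched in Python: a fixed threshold on a single stat, and a simple z-score that ranks stat series by how far their latest value strays from recent history. The z-score is a stand-in chosen for illustration, since the behavior analysis methods Neblic actually uses aren't described here; all function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple


def threshold_alert(value: float, limit: float) -> bool:
    """Fixed-threshold alert, e.g. 'null ratio above 5%' (hypothetical helper)."""
    return value > limit


def zscore(history: Sequence[float], value: float) -> float:
    """Number of standard deviations between `value` and the mean of `history`."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        # A perfectly flat history: any change at all is maximally surprising.
        return 0.0 if value == mean else math.copysign(math.inf, value - mean)
    return (value - mean) / stdev


def rank_anomalies(series: Dict[str, List[float]], window: int = 10) -> List[Tuple[str, float]]:
    """Score each stat series' latest point against up to `window` previous points.

    Returns (series name, |z-score|) pairs, most anomalous first."""
    scores = []
    for name, points in series.items():
        if len(points) < 2:
            continue  # not enough history to compare against
        history, latest = points[-window - 1:-1], points[-1]
        scores.append((name, abs(zscore(history, latest))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Ranking by score is what lets the most unusual stats surface first when there are thousands of fields to watch.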

Data Structure Analysis

We generate field-level structural digests that let you understand the schema, field types, and field presence for every period at every point of your application. This lets you visualize the schema actually seen, know the type of each field and whether it has changed over time, and monitor how often each field is present across events.
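A minimal Python sketch of such a digest, assuming events arrive as (possibly nested) dicts: it flattens each event into dotted field paths, counts the observed type names per field, computes each field's presence ratio for the period, and compares two periods to flag type changes. The function names and digest layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, Iterable, Iterator, Set, Tuple


def flatten(event: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, value) pairs for every leaf in a nested dict."""
    for key, value in event.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten(value, prefix=path + ".")
        else:
            yield path, value


def structure_digest(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Per-field observed types and presence ratio for one period (hypothetical layout)."""
    total = 0
    types: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        total += 1
        for path, value in flatten(event):
            types[path][type(value).__name__] += 1
    return {
        path: {
            "types": dict(counter),
            # fraction of events in the period that carried this field
            "presence": sum(counter.values()) / total,
        }
        for path, counter in types.items()
    }


def type_changes(previous: Dict[str, Dict[str, Any]],
                 current: Dict[str, Dict[str, Any]]) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Fields seen in both periods whose set of observed types differs."""
    changes = {}
    for path in previous.keys() & current.keys():
        before, after = set(previous[path]["types"]), set(current[path]["types"])
        if before != after:
            changes[path] = (before, after)
    return changes
```

Fields that appear or disappear between periods could be flagged the same way, by comparing the key sets of the two digests.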

Business Logic and Data Validation

Validating incoming data is an established control, but we believe it needs to go a step further, into business logic validation. For instance, validate that the timestamps you’re receiving are current to detect misconfigured source devices, or correlate two events from two different fields or samplers to make sure they’re working correctly.
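The business-logic checks described here might look like plain Python predicates. The five-minute skew, the 1% tolerance, and the order fields `status` and `shipped_at` are illustrative assumptions, not values from Neblic. Timestamps are assumed to be timezone-aware, since comparing aware and naive `datetime` objects raises `TypeError`.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def timestamp_is_current(ts: datetime, now: Optional[datetime] = None,
                         max_skew: timedelta = timedelta(minutes=5)) -> bool:
    """True if `ts` is within `max_skew` of `now`, in either direction.

    A source device with a misconfigured clock keeps failing this check.
    Both datetimes must be timezone-aware (assumption of this sketch)."""
    now = now or datetime.now(timezone.utc)
    return abs(now - ts) <= max_skew


def counts_correlate(upstream: int, downstream: int, tolerance: float = 0.01) -> bool:
    """True if two samplers saw roughly the same number of events in a period.

    e.g. orders sampled before and after a transformation step: a large gap
    suggests a component is dropping or duplicating events."""
    if upstream == 0:
        return downstream == 0
    return abs(upstream - downstream) / upstream <= tolerance


def fields_consistent(event: Dict[str, Any]) -> bool:
    """Cross-field rule (hypothetical): a shipped order must carry a shipping date."""
    return event.get("status") != "shipped" or event.get("shipped_at") is not None
```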

We sometimes refer to the combination of these three elements (Value Statistics, Structure Analysis, and Business Logic/Data Validation) as Telemetry, or data telemetry.

How does Neblic work

Neblic has two main components: samplers and a collector. Samplers sample data from each component in your application; the collector aggregates that data, detects events, and creates the telemetry. We built both components on top of OpenTelemetry to ensure vendor neutrality and support standardization across the observability stack.

Operationally, the architecture allows rules to be set dynamically, so both the sampling rules and the metrics generated can be controlled on the fly.
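A minimal in-process Python sketch of the sampler/collector split, with a sampling rule that can be swapped at runtime. In Neblic the components are built on OpenTelemetry and run as separate services; here the `Sampler` and `Collector` classes, their methods, and the predicate-style rule are hypothetical and only illustrate the control flow.

```python
import threading
from typing import Any, Callable, Dict, List

# A sampling rule is assumed here to be a predicate over one event.
Rule = Callable[[Dict[str, Any]], bool]


class Collector:
    """Hypothetical collector: groups sampled events per sampler for later telemetry."""

    def __init__(self) -> None:
        self.received: Dict[str, List[Dict[str, Any]]] = {}
        self._lock = threading.Lock()

    def receive(self, sampler: str, event: Dict[str, Any]) -> None:
        with self._lock:
            self.received.setdefault(sampler, []).append(event)


class Sampler:
    """Hypothetical sampler: forwards events matching its current rule to a collector."""

    def __init__(self, name: str, collector: Collector, rule: Rule = lambda e: True) -> None:
        self.name = name
        self._collector = collector
        self._rule = rule
        self._lock = threading.Lock()

    def set_rule(self, rule: Rule) -> None:
        """Swap the sampling rule at runtime; subsequent events use the new rule."""
        with self._lock:
            self._rule = rule

    def sample(self, event: Dict[str, Any]) -> None:
        with self._lock:
            rule = self._rule  # read the current rule under the lock
        if rule(event):
            self._collector.receive(self.name, event)
```

Keeping the rule behind a lock lets a control plane replace it from another thread without pausing the application code that calls `sample`.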

All of this also happens on-premises, which has two implications. First, a privacy-first design: the only thing that leaves your infrastructure is telemetry, never raw data. Second, you can use our open-source components to run everything on your own and connect it to your preferred stack!

How can you try it?

You can check out our documentation here. We’ll have big updates in the coming weeks, so sign up!

You can find our repo here as well:

Small plug

We’ve open-sourced all of these components and tutorials to support the community and share our vision for observability. We’re building a product and a company on top of these principles: a neat way to host and visualize the data created, which also uses ML/AI to build meaningful connections and understand complex behaviors, making troubleshooting even more proactive and intuitive. You can learn more about our product here:

Bringing application data to your troubleshooting journey.

© Copyright 2024 — All rights reserved.