Making sense of time-series analysis

Even if you haven't heard the term "time-series," you've probably seen examples out in the wild. As the name suggests, a time-series is a sequence of measurements of something taken over a period of time. It can track many different kinds of change: the highs and lows of your curling ice temperature over a year, the number of cars that drive across a bridge every day, or, more relevantly, your application's usage data, such as error rates over time or the growing number of activations per day. With enough data collected (that is, over a long enough time period), you can use time-series analysis to start forecasting future trends.

There are many different ways to project trends in your application's behavior. Some tools are designed to perform a handful of tasks, while others are much more flexible at the cost of longer initial setup and ongoing administration. Which tool you choose depends on the goals you're trying to achieve.

Turning events into numbers

Collecting data for time-series analysis can be as simple as incrementing a counter, but for a more sophisticated design, you'll want to use a time-series database, or TSDB. A good TSDB combines the simplicity of a key-value store with the power of SQL-like commands, so you can easily query, filter, and manipulate stored data. Many TSDBs can be thought of as specialized NoSQL databases, and there are many storage options to choose from, ranging from fully open and configurable designs to limited (but standardized) collection models.
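To make the "key-value store plus query language" idea concrete, here is a toy in-memory sketch. It is not how any particular TSDB is implemented: the class `MiniTSDB` and its `write`/`query` methods are hypothetical names chosen for illustration. Each series is keyed by a metric name plus its tags, points are kept sorted by timestamp, and a query filters by time range and tags before applying an aggregate.

```python
from bisect import insort
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(order=True)
class Point:
    timestamp: int  # seconds since the epoch
    value: float


class MiniTSDB:
    """Hypothetical toy store: a series is identified by metric name + tags."""

    def __init__(self):
        # series key -> list of Points kept sorted by timestamp
        self._series = defaultdict(list)

    @staticmethod
    def _key(metric, tags):
        # Sort tags so {"a": 1, "b": 2} and {"b": 2, "a": 1} map to one series.
        return (metric, tuple(sorted(tags.items())))

    def write(self, metric, tags, timestamp, value):
        insort(self._series[self._key(metric, tags)], Point(timestamp, value))

    def query(self, metric, start, end, where=None, agg=mean):
        """Aggregate values of `metric` in [start, end) over series whose tags match `where`."""
        where = where or {}
        values = []
        for (name, tag_items), points in self._series.items():
            tags = dict(tag_items)
            if name != metric or any(tags.get(k) != v for k, v in where.items()):
                continue
            values.extend(p.value for p in points if start <= p.timestamp < end)
        return agg(values) if values else None
```

For example, after writing error counts tagged with `host`, `db.query("errors", 0, 120, where={"host": "web-1"})` averages only that host's points in the first two minutes, and passing `agg=sum` or `agg=max` swaps the aggregate.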

InfluxDB is one extremely popular option. It's highly performant, replicable, and easy to set up. However, it doesn't enforce a schema for your data, which can lead to confusing organization if you don't plan a proper data architecture up front. To offset this, it offers an SQL-like query language to help you explore and understand your data.
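Because InfluxDB doesn't enforce a schema, each write simply declares its own measurement, tags, and fields. The sketch below formats a point in InfluxDB's text-based line protocol (`measurement,tag=value field=value timestamp`). It is a simplified illustration, not an official client: the helper names are hypothetical, and it covers only the common escaping rules (commas, spaces, and equals signs in names; quotes and backslashes in string values) and the `i` suffix for integer fields.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` (processed in the given order)."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value):
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integer fields carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Anything else is written as a double-quoted string field.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line-protocol record; timestamp is in nanoseconds if given."""
    line = _escape(measurement, ", ")
    # InfluxDB recommends sorting tags by key for write performance.
    for key in sorted(tags):
        line += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    line += " " + ",".join(
        f"{_escape(k, ',= ')}={format_field_value(v)}" for k, v in fields.items()
    )
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line
```

Nothing stops one writer from sending `host=web-1` and another from sending `hostname=web-1` for the same measurement, which is exactly the kind of drift a planned tag and field layout helps prevent.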

For a more structured approach, TimescaleDB, which is built as an extension to PostgreSQL, supports more data types and enforces a schema for that information. It's an open source project that's relatively young compared to other available TSDBs.

Graphite has been around for several years. While its feature set is somewhat limited compared to other TSDBs, it can still analyze and compare large volumes of data, and it can graph that data for you. This may make it ideal for smaller companies that want a quick way to start collecting and graphing data.
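Part of Graphite's low setup cost is its plaintext protocol: each metric is a single line of `<metric path> <value> <timestamp>`, sent over TCP to Carbon, conventionally on port 2003. The sketch below assumes a Carbon listener on `localhost:2003`; the function names are hypothetical and there is no retry or batching logic.

```python
import socket
import time


def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if " " in path:
        raise ValueError("metric paths cannot contain spaces")
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    return f"{path} {value} {int(timestamp)}\n"


def send_to_graphite(lines, host="localhost", port=2003):
    """Send pre-formatted lines to Carbon's plaintext listener (assumed address)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

A call such as `send_to_graphite([graphite_line("app.web1.errors", 3)])` is all it takes to start a new series; Graphite creates the metric the first time it sees the path.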

If you're interested in comparing more pros and cons, here's a list of TSDBs that compares their feature sets, so you can choose the best one for your use case.

Monitoring where you're headed

A TSDB is a crucial component for collecting and storing data. But when it comes to running an application, how quickly you can react to an incident or outage that affects your users is just as important as putting safeguards in place to prevent unexpected disasters in the first place.

Monitoring your application's current behavior is one practical use case for a time-series system, though it is by no means the only one. Many monitoring systems collect time-series data so they can alert you about upcoming problems, rather than only notifying you about the app's current state. This is the core use case for time-series analysis: having an intelligent system warn you of an issue before it occurs. For example, as disk space on your platform is consumed, your team can be notified of the estimated time when usage will reach critical levels. SREs might use disk space consumption as a service level indicator (SLI) and configure alerts to fire if the rate of consumption moves outside established service level objectives (SLOs).
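One simple way to estimate when disk usage will reach a critical level is to fit a straight line to recent samples and extrapolate it. The sketch below does that with ordinary least squares. It is an assumption for illustration, not how Sensu or any specific monitoring tool computes forecasts: real usage is often non-linear or seasonal, and `hours_until_full` is a hypothetical helper.

```python
def hours_until_full(samples, capacity_bytes):
    """Extrapolate (hour, used_bytes) samples with a least-squares line.

    Returns the estimated hours from the most recent sample until usage
    reaches capacity_bytes, or None if usage is flat or shrinking.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    s_tt = sum((t - mean_t) ** 2 for t, _ in samples)
    if s_tt == 0:
        raise ValueError("samples need at least two distinct timestamps")
    # Slope of the fitted line, in bytes per hour.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / s_tt
    if slope <= 0:
        return None  # not growing, so no projected fill time
    intercept = mean_u - slope * mean_t
    # Solve capacity_bytes = intercept + slope * t for t.
    t_full = (capacity_bytes - intercept) / slope
    latest = max(t for t, _ in samples)
    return max(0.0, t_full - latest)
```

With hourly samples of 50, 52, 54, and 56 GB on a 100 GB disk, the fitted growth is 2 GB per hour and the function projects the disk will fill about 22 hours after the last sample, which is the kind of number an alert could include.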

Some monitoring tools come bundled as an all-in-one package that supports just a single TSDB. While this can make them easier to set up, it can also make them harder to integrate into your existing infrastructure. Sensu, on the other hand, is designed with a "bring-your-own-stack" mentality. Through a variety of open source plugins, Sensu can integrate with several different TSDBs, allowing us to focus on building the best monitoring tools no matter how the data is stored. And because different TSDBs are suited to different types of data, a monitoring system that can send data to any of them is a more flexible approach.

Knowing the future is not enough

Of course, time-series analysis is only a prediction, and an over-reliance on just numbers may still leave you exposed to surprises.

Let's return to our example of a time-series graph that tracks the amount of disk space available on a server. Perhaps you want to notify your engineers in Slack when disk consumption reaches a threshold, say, 90% of available space. Over the course of a few weeks, your users may be using your platform in a predictable manner, and by all calculations it appears that you won't run out of space for several months. But all it takes is one bad actor suddenly flooding your system with data to use up all of your disk space long before then. How do you prevent something like that from happening?
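A static threshold like the 90% example can be expressed as a small check script. Sensu checks follow the Nagios convention of reporting status through the exit code (0 for OK, 1 for WARNING, 2 for CRITICAL), so the sketch below returns one of those codes. The 80% warning level and the function names are assumptions for illustration; only the 90% critical level comes from the example above.

```python
import shutil

# Nagios-style exit codes, which Sensu checks use to report status.
OK, WARNING, CRITICAL = 0, 1, 2


def disk_status(used, total, warn_pct=80.0, crit_pct=90.0):
    """Map disk usage to a check status; warn_pct is an assumed extra level."""
    percent = 100.0 * used / total
    if percent >= crit_pct:
        return CRITICAL, percent
    if percent >= warn_pct:
        return WARNING, percent
    return OK, percent


def main(path="/"):
    usage = shutil.disk_usage(path)
    status, percent = disk_status(usage.used, usage.total)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    print(f"{label}: {path} is {percent:.1f}% full")
    return status
```

Used as a check command, the script would end with `sys.exit(main())` so the monitoring agent sees the status code. On its own, though, a threshold like this only fires once the disk is already nearly full, which is why it helps to watch the rate of change as well.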

In addition to estimating when you'll need to expand your disks, you can also measure the rate of change as your system is used. In other words, if you know that, on average, about 100 MB of data is uploaded every hour, and that rate suddenly jumps to 1 GB an hour, you can send an alert based on the new rate of change to notify your team that some historically unprecedented behavior is occurring.
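The 100 MB versus 1 GB comparison can be sketched by turning cumulative usage samples into hourly growth rates and comparing the latest rate with a baseline from earlier hours. Here the baseline is the median of the earlier rates and the alert fires above five times that baseline; both choices, and the helper names, are assumptions for illustration rather than a prescribed method.

```python
from statistics import median

MB = 10**6  # decimal megabytes


def hourly_rates(samples):
    """Turn cumulative (hour, bytes_used) samples into growth rates in bytes per hour."""
    return [(b1 - b0) / (t1 - t0) for (t0, b0), (t1, b1) in zip(samples, samples[1:])]


def rate_of_change_alert(samples, factor=5.0):
    """Return (alert, latest_rate, baseline_rate).

    The baseline is the median of all but the latest rate; the alert fires
    when the latest rate exceeds `factor` times that baseline.
    """
    rates = hourly_rates(samples)
    if len(rates) < 2:
        raise ValueError("need at least three samples to compare against a baseline")
    baseline, latest = median(rates[:-1]), rates[-1]
    return latest > factor * baseline, latest, baseline
```

With usage of 0, 100, 200, and then 1,200 MB over four hourly samples, the baseline is 100 MB per hour, the latest rate is 1,000 MB per hour, and the alert fires even though the disk itself is far from full. Using the median keeps a single earlier spike from inflating the baseline.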

With Sensu, you can go beyond alerting by automating remediation steps that take corrective action based on the state of the service. In this disk space example, Sensu could provision additional disk space automatically to ensure continued operation. (For more on auto remediation, check out this blog post from Community Maintainer Ben Abrams.)

That's why it's important to have a monitoring system that can do more than collect data and extrapolate from it. Regardless of the TSDB used, some systems constrain the data model to contain only measurable values, which misses a lot of context about how, precisely, the data is changing. It's just as important to recognize that different TSDBs have different strengths, and it's in your application's best interest to send each metric to the database that can best represent it!

Doing more with the data

A proactive approach to anticipating bottlenecks in your application can help you and your team correct issues before they become problems. Time-series analysis is just one way of interpreting data. You could also rely on random sampling, which draws various subsets of your data to estimate how likely a trend is. Linear regression is a more methodical approach that models the relationship between variables, so you can estimate the effect one change has on another. This could be as simple as an A/B test on your users that identifies how changes in site design affect user behavior.
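To connect linear regression and A/B testing: if you encode each user's variant as 0 (control) or 1 (new design) and regress an outcome on that indicator, the fitted intercept is the control group's mean and the slope is the difference between the two groups' means. The sketch below shows this with plain least squares on made-up numbers. It deliberately omits significance testing, which a real analysis would need, and `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


# Hypothetical A/B data: minutes on site per user.
control = [4, 5, 6]      # variant 0: current design
treatment = [6, 7, 8]    # variant 1: new design
xs = [0] * len(control) + [1] * len(treatment)
ys = control + treatment

intercept, slope = fit_line(xs, ys)
# intercept == mean(control) == 5.0; slope == mean(treatment) - mean(control) == 2.0
```

The same `fit_line` works unchanged with a continuous predictor, such as page weight against load time, which is where regression offers more than a simple comparison of averages.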

Evaluating the different ways you collect and analyze data can help you settle on the strategies that are right for you and your team. For that reason alone, it's essential to have a collection and monitoring system flexible enough for whatever tactic you follow.

Ready to try Sensu? Download our sandbox, which comes pre-installed with InfluxDB and Grafana. 
