Time series data is a set of values organized in the order in which they occur and arrive for processing. Unlike transactional data in SQL Server, which is not time-based and may be updated often, time series data is typically written once and rarely, if ever, updated.

Some examples of time series data include stock prices, telemetry from equipment sensors on a manufacturing floor, and performance metrics from SQL Server, such as CPU, memory, I/O, and network utilization. Time series data is often used for historical comparisons, anomaly detection and alerting, predictive analysis, and reporting, where time is a meaningful axis for viewing or analyzing data.

Time series capabilities in SQL Server were first introduced in Azure SQL Edge, Microsoft's version of SQL Server for the Internet of Things (IoT), which combines capabilities such as data streaming and time series with built-in machine learning and graph features.
With SQL Server 2022 and Azure SQL, we've brought time series capabilities to the entire SQL Server family. Time series capabilities in SQL Server consist of enhancements to existing T-SQL functions for handling NULL values, plus two new functions that make working with temporal data easier than ever.

Create contiguous ranges with GENERATE_SERIES

When analyzing time series data, it's common to create a contiguous set of datetime values in evenly spaced intervals (for example, every second) within a specific range. One way to accomplish this is by creating a numbers table, also known as a tally table, which contains a set of consecutive numbers between a lower and upper bound. The numbers in the table can then be used in combination with the DATEADD function to create the range of datetime values.

Prior to SQL Server 2022, creating a numbers table usually involved some form of common table expression, CROSS JOIN of system objects, looping, or other creative T-SQL. These solutions are neither elegant nor efficient at scale, and they grow more complex when the step between interval values is larger than 1.

The GENERATE_SERIES relational operator in SQL Server 2022 makes creating a numbers table simple by returning a single-column table of numbers between a start and stop value, with an optional step parameter defining the increment (or decrement) between values in the series:

```sql
GENERATE_SERIES ( start, stop [, step ] )
```
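For comparison, one common pre-2022 pattern is a recursive common table expression; looping or a CROSS JOIN of system views were typical alternatives. A sketch generating the numbers 1 through 100:

```sql
-- Pre-SQL Server 2022 pattern: tally table built with a recursive CTE.
WITH Numbers AS
(
    SELECT 1 AS value
    UNION ALL
    SELECT value + 1 FROM Numbers WHERE value < 100
)
SELECT value
FROM Numbers;
-- The default MAXRECURSION limit (100) covers this range;
-- larger ranges need OPTION (MAXRECURSION n) appended to the query.
```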
This example creates a series of numbers between 1 and 100 in steps of 5:
```sql
SELECT value
FROM GENERATE_SERIES(1, 100, 5);
```
Taking this concept one step further, the next example shows how GENERATE_SERIES is used with DATEADD to create a set of values between 1:00 PM and 2:00 PM in 1-minute intervals:
```sql
SELECT DATEADD(minute, s.value, 'Dec 10, 2022 1:00 PM') AS [Interval]
FROM GENERATE_SERIES(0, 59, 1) AS s;
```
If the step argument is omitted, a default value of 1 is used when computing interval values. GENERATE_SERIES also works with decimal values, with the requirement that the start, stop, and step arguments all be the same data type. If start is greater than stop and the step is negative, the resulting series is a decrementing set of values. If start is greater than stop and the step is positive, an empty table is returned.
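A short sketch of the descending case, where start is greater than stop and the step is negative:

```sql
-- Descending series: start > stop with a negative step
SELECT value
FROM GENERATE_SERIES(10, 1, -2);
-- returns 10, 8, 6, 4, 2
```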
Finally, GENERATE_SERIES requires a database compatibility level of 160 or higher.

Group data in intervals with DATE_BUCKET

Time series data is often grouped into fixed intervals, or buckets, for analytical purposes. For example, sensor measurements taken every minute may be averaged over 15-minute or 1-hour intervals. While GENERATE_SERIES and DATEADD can be used to create the buckets, we need a way to determine which bucket/interval a measurement belongs to.

The DATE_BUCKET function returns the datetime value corresponding to the start of each datetime bucket for an arbitrary bucket size, with an optional parameter to define the origin from which to calculate each bucket. If no origin is provided, the default value of Jan 1, 1900, is used as the origin date:

```sql
DATE_BUCKET ( datepart, number, date [, origin ] )
```
The following example shows the buckets for Dec 10, 2022, for several date parts with a bucket size of 1 and an origin date of Jan 1, 2022:
```sql
DECLARE @date DATETIME = 'Dec 10, 2022 12:05 PM';
DECLARE @origin DATETIME = 'Jan 1, 2022 12:00 AM';

SELECT 'Now' AS [BucketName], @date AS [DateBucketValue]
UNION ALL
SELECT 'Year', DATE_BUCKET (YEAR, 1, @date, @origin)
UNION ALL
SELECT 'Quarter', DATE_BUCKET (QUARTER, 1, @date, @origin)
UNION ALL
SELECT 'Month', DATE_BUCKET (MONTH, 1, @date, @origin)
UNION ALL
SELECT 'Week', DATE_BUCKET (WEEK, 1, @date, @origin)
UNION ALL
SELECT 'Day', DATE_BUCKET (DAY, 1, @date, @origin)
UNION ALL
SELECT 'Hour', DATE_BUCKET (HOUR, 1, @date, @origin)
UNION ALL
SELECT 'Minutes', DATE_BUCKET (MINUTE, 1, @date, @origin)
UNION ALL
SELECT 'Seconds', DATE_BUCKET (SECOND, 1, @date, @origin);
```
Notice how the date bucket value for the Week date part is Dec 10, 2022, which is a Saturday. That's because the provided origin date (Jan 1, 2022) is also a Saturday. (The default origin date of Jan 1, 1900, is a Monday.) Therefore, when working with the Week date part, if you want your Week bucket to begin on a Sunday, be sure to use a known origin date that falls on a Sunday.
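To anchor weekly buckets on a Sunday, pick any known Sunday as the origin. A sketch using Jan 2, 2022 (a Sunday) as the origin:

```sql
-- Weekly buckets that start on Sunday, via a Sunday origin date
DECLARE @sundayOrigin DATETIME = 'Jan 2, 2022';  -- a known Sunday
SELECT DATE_BUCKET (WEEK, 1, CAST('Dec 10, 2022' AS DATETIME), @sundayOrigin);
-- returns Dec 4, 2022 (the Sunday starting the week that contains Dec 10)
```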
Where DATE_BUCKET becomes especially useful is with bucket sizes larger than 1, for example when grouping data into 5-minute or 15-minute buckets.
```sql
SELECT 'Now' AS [BucketName], GETDATE() AS [BucketDate]
UNION ALL
SELECT '5 Minute Buckets', DATE_BUCKET (MINUTE, 5, GETDATE())
UNION ALL
SELECT 'Quarter Hour', DATE_BUCKET (MINUTE, 15, GETDATE());
```
DATE_BUCKET provides an easy way to determine which time-based interval a timestamped measurement belongs to, using any arbitrary-sized interval.

Gap analysis with FIRST_VALUE and LAST_VALUE

FIRST_VALUE and LAST_VALUE are not new functions in SQL Server 2022; what is new is how NULL values are handled. In previous versions of SQL Server, NULL values are always preserved.

When working with time series data, it's possible to have gaps between measurements. Ideally, gaps are filled in with an imputed value. When using FIRST_VALUE and LAST_VALUE to compute the value corresponding to an interval, preserving NULL values isn't ideal.

In the following example, a series of sensor readings taken at 15-second intervals has some gaps. When the data is analyzed in 1-minute intervals (using DATE_BUCKET), the default value returned by FIRST_VALUE includes the NULL values, as the following query shows.
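The MachineTelemetry table used by these queries isn't defined in the article; a minimal hypothetical setup (with made-up readings) so the queries can be run might look like:

```sql
-- Hypothetical stand-in for the article's MachineTelemetry sample data:
-- 15-second sensor readings with NULL gaps.
CREATE TABLE MachineTelemetry
(
    [timestamp]   DATETIME,
    SensorReading DECIMAL(10, 2)
);

INSERT INTO MachineTelemetry ([timestamp], SensorReading)
VALUES
    ('2022-12-10 08:00:00', 64.82),
    ('2022-12-10 08:00:15', NULL),   -- gap
    ('2022-12-10 08:00:30', 65.20),
    ('2022-12-10 08:00:45', 65.45),
    ('2022-12-10 08:01:00', NULL),   -- gap
    ('2022-12-10 08:01:15', 66.01),
    ('2022-12-10 08:01:30', 66.30),
    ('2022-12-10 08:01:45', 66.12);
```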
```sql
SELECT [timestamp]
    , DATE_BUCKET(MINUTE, 1, [timestamp]) AS [timestamp_bucket]
    , SensorReading
    , FIRST_VALUE (SensorReading) OVER (
        PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp])
        ORDER BY [timestamp]
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [Default (RESPECT NULLS)]
FROM MachineTelemetry
ORDER BY [timestamp];
```
In SQL Server 2022, FIRST_VALUE and LAST_VALUE include new syntax (IGNORE NULLS or RESPECT NULLS) that lets you decide how NULL values should be handled:

```sql
FIRST_VALUE ( [ scalar_expression ] ) [ IGNORE NULLS | RESPECT NULLS ]
    OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )
```
RESPECT NULLS is the default behavior and includes NULL values in the result when computing the first or last value within a partition. Specifying IGNORE NULLS causes NULL values to be excluded when computing the first or last value over a partition.
```sql
SELECT [timestamp]
    , DATE_BUCKET(MINUTE, 1, [timestamp]) AS [timestamp_bucket]
    , SensorReading
    , FIRST_VALUE (SensorReading) IGNORE NULLS OVER (
        PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp])
        ORDER BY [timestamp]
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS [IGNORE NULLS]
FROM MachineTelemetry
ORDER BY [timestamp];
```
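A common gap-filling pattern, last observation carried forward, combines LAST_VALUE with IGNORE NULLS and a frame ending at the current row. A sketch against the same MachineTelemetry data:

```sql
-- Last observation carried forward: replace each NULL reading with the
-- most recent non-NULL value at or before the current row.
SELECT [timestamp]
    , SensorReading
    , LAST_VALUE (SensorReading) IGNORE NULLS OVER (
        ORDER BY [timestamp]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS [Imputed (LOCF)]
FROM MachineTelemetry
ORDER BY [timestamp];
```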
The new IGNORE NULLS and RESPECT NULLS options allow you to decide how NULL values should be treated when analyzing your data.