
Let's Learn About Time Series Databases

Posted on February 7, 2026

When designing large-scale architectures or building distributed systems, you may consider adopting a Time Series Database (TSDB). Today, let's explore what a TSDB is, what characteristics it has, and what problems it helps solve.

1. What is Time Series Data?

A TSDB is a database that stores and processes time series data. So what is time series data? Time series data is not simply data with timestamps attached. More precisely, it can be defined as "measurements or events that are tracked, monitored, downsampled, and aggregated over time", and the measurements or events mentioned here can include, for example:

- Server and infrastructure metrics (CPU, memory, network)
- Application performance monitoring data
- Sensor readings from IoT devices
- Events such as clicks, logins, or market trades

Time series data handling differs from general databases in the following ways:

- Writes dominate: data arrives as a continuous, append-heavy stream, and existing records are almost never updated
- Inserts arrive in (roughly) chronological order
- Queries typically scan time ranges rather than look up individual rows
- Recent data is queried at high precision, while older data loses value and can be aggregated or expired

Let’s explore what strategies TSDB employs to handle these characteristics.

2. Write Optimization

In a typical database, writing data requires finding the location to write, then modifying or inserting. The process would look like this:

[Seek to block 4752][Read][Modify][Write][Seek to block 9201] → ...

On the other hand, a TSDB doesn't need to modify existing data and only needs to ingest data in chronological order, so it simply adds data to the end in an Append-only manner. Operations become much simpler:

[Write to end][Write to end][Write to end] → ...
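The pattern is easy to picture in code. Below is a minimal Python sketch of an append-only write path, assuming newline-delimited JSON records (the file name and record format are illustrative, not any particular TSDB's on-disk format):

import json
import time

def append_point(segment_path: str, metric: str, value: float) -> None:
    # Every data point goes to the end of the current segment file;
    # nothing is sought, read back, or modified in place.
    record = {"ts": time.time(), "metric": metric, "value": value}
    with open(segment_path, "a") as segment:  # "a" = append only
        segment.write(json.dumps(record) + "\n")

append_point("segment-0001.log", "cpu.usage", 45.2)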

LSM (Log-Structured Merge) Tree

The LSM tree is the secret sauce behind the high write throughput of database systems like InfluxDB, Cassandra, and LevelDB. The core ideas of the LSM tree are:

- Buffer incoming writes in memory instead of updating files in place
- Flush that buffer to disk sequentially as immutable, sorted files
- Merge and clean up those files in the background

And the operation of LSM trees works as follows:

  1. Latest data is stored in the Memtable memory buffer

Data maintains sorted order by key. Write operations are very fast since they access RAM.

  2. Flush to SSTable (Sorted String Table) on disk

When the Memtable is full, it’s saved as an immutable, sorted SSTable file. Since the Memtable is already sorted, this flush operation can be completed sequentially, simply, and quickly. After flushing, the Memtable is cleared and reinitialized.

  3. Background compaction

In the background, smaller SSTables are merged into larger SSTables. Duplicate entries and tombstones (deletion markers) are also removed.

Thanks to this design, write operations don't interfere with read operations: the Memtable always absorbs the latest data while existing data is cleaned up in the background, so high write throughput can be sustained continuously.
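To make the moving parts concrete, here is a toy Python sketch of the Memtable/SSTable cycle described above. A dict stands in for the Memtable and sorted in-memory lists stand in for on-disk SSTables; the flush threshold and structure are illustrative only:

import bisect

MEMTABLE_LIMIT = 4  # tiny flush threshold, for demonstration only

class ToyLSM:
    def __init__(self):
        self.memtable = {}   # latest writes, held in RAM
        self.sstables = []   # each flush appends one immutable sorted run

    def put(self, key, value):
        self.memtable[key] = value          # fast in-memory write
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self._flush()

    def _flush(self):
        # The already-sorted Memtable drains sequentially into an
        # immutable "SSTable", then is cleared and reinitialized.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:            # newest data wins
            return self.memtable[key]
        for run in reversed(self.sstables): # check newer runs first
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Background job: merge all runs into one, keeping only the
        # newest value for each key (duplicates are dropped).
        merged = {}
        for run in self.sstables:           # oldest to newest
            merged.update(run)
        self.sstables = [sorted(merged.items())]

Note how get may have to probe several runs before finding a key; that is exactly the read-side cost discussed next.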

However, read performance can degrade when a query has to check multiple SSTables. Compaction also causes write amplification: the same data may be rewritten to disk several times as SSTables are merged.

Therefore, LSM trees are best used when write workloads are heavy and some read performance can be sacrificed.

3. Time-Based Indexing and Partitioning

Since time series data arrives already sorted by timestamp, no index is needed on the write path. If you partition by time range, there is also no question of which partition a write belongs in: it's always the newest one. Data retention becomes equally simple, handled by deleting old partitions outright.
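A sketch of what time-based partitioning buys you on the write and retention paths, assuming daily partitions (the granularity and the retention window below are illustrative):

from datetime import datetime, timedelta, timezone

partitions = {}  # "YYYY-MM-DD" -> list of (timestamp, value) points

def partition_key(ts: datetime) -> str:
    return ts.strftime("%Y-%m-%d")  # one partition per day

def write(ts: datetime, value: float) -> None:
    # Points arrive in time order, so the target partition is always
    # the current day's; no index lookup is needed on the write path.
    partitions.setdefault(partition_key(ts), []).append((ts, value))

def enforce_retention(days: int = 7) -> None:
    # Retention means dropping whole partitions, never row-by-row deletes.
    cutoff = partition_key(datetime.now(timezone.utc) - timedelta(days=days))
    for key in [k for k in partitions if k < cutoff]:
        del partitions[key]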

Block-Level Metadata

In a TSDB, data blocks are also sorted by timestamp. Therefore, blocks that don't fall within the queried time range can be skipped without being read, improving query performance. Each block records its min/max timestamps, and optionally min/max values. This metadata acts as a filter layer that, together with time-based partitioning, helps maintain query performance even as data volume increases.
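For instance, a range query can prune blocks with nothing more than a metadata comparison; here is a minimal sketch (the Block layout is illustrative):

from dataclasses import dataclass

@dataclass
class Block:
    min_ts: int    # metadata recorded when the block is written
    max_ts: int
    points: list   # (timestamp, value) pairs, sorted by timestamp

def query_range(blocks, start: int, end: int):
    results = []
    for block in blocks:
        # A block whose time range cannot overlap the query is skipped
        # outright; its points are never touched.
        if block.max_ts < start or block.min_ts > end:
            continue
        results.extend(p for p in block.points if start <= p[0] <= end)
    return results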

4. Downsampling and Compression

Due to its nature, time series data is well-suited to trimming less important detail and storing it in compressed form. In fact, given that a TSDB may continuously ingest millions of data points per second, downsampling is essentially mandatory. In a TSDB, high-precision recent data is aggregated and downsampled into long-term data. A typical downsampling and retention strategy looks something like this:

- Raw data: kept at full resolution for the most recent period (say, a few days)
- 1-minute aggregates: kept for a few weeks
- 1-hour aggregates: kept for months or years
- Anything older: deleted

These rules can be defined by users, and leveraging them can dramatically improve storage efficiency and read performance.
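A minimal sketch of the aggregation step, assuming a mean per fixed-size time bucket (real systems typically also keep min/max/count per bucket):

from collections import defaultdict

def downsample(points, bucket_seconds):
    # Group (timestamp, value) points into fixed-size time buckets and
    # reduce each bucket to its mean.
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % bucket_seconds].append(value)
    return {start: sum(vs) / len(vs) for start, vs in sorted(buckets.items())}

raw = [(1000, 45.2), (1010, 45.3), (1020, 45.1), (1060, 45.4)]
print(downsample(raw, 60))  # two 60-second buckets starting at 960 and 1020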

5. Compression Algorithms

As mentioned earlier, the core of a TSDB is absorbing huge numbers of writes in memory buffers and compressing data in the background. So how is the compression actually done? Let's walk through the main techniques.

Delta Encoding

Raw values:     [45.2] [45.3] [45.1] [45.4]
Delta encoded:  [45.2] [+0.1] [-0.2] [+0.3]

Delta encoding transforms the data so that only the change from the previous value is recorded. Since these deltas are much smaller numbers than the raw values, far fewer bits are needed to store them.
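A round-trip sketch in Python (note that with floats the deltas carry the usual rounding noise; production encoders work on integer timestamps or on the raw bit patterns):

def delta_encode(values):
    # Keep the first value, then store only successive differences.
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas

def delta_decode(deltas):
    # Rebuild the original series by accumulating the differences.
    values = [deltas[0]]
    for d in deltas[1:]:
        values.append(values[-1] + d)
    return values

print(delta_encode([45.2, 45.3, 45.1, 45.4]))  # ~[45.2, +0.1, -0.2, +0.3]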

Delta-of-Delta Encoding

Raw timestamps:   1000, 1010, 1020, 1030, 1040
Deltas:             10,   10,   10,   10
Delta-of-deltas:    10,    0,    0,    0

Delta-of-delta encoding goes a step further by recording the change in the change amounts. It works best when timestamp intervals are regular or nearly so: the second-order deltas then collapse to (mostly) zeros, reducing the number of bits even further.
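A sketch of the encoding side (the decoder simply accumulates twice):

def delta_of_delta_encode(timestamps):
    # Keep the first timestamp, then the first delta, then only the
    # change in delta at each step; regular intervals become zeros.
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0], deltas[0]] + dods

print(delta_of_delta_encode([1000, 1010, 1020, 1030, 1040]))
# [1000, 10, 0, 0, 0]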

XOR Compression

Value 1: 0 10000010 01101000101000111101011
Value 2: 0 10000010 01101000110000100000000
XOR:     0 00000000 00000000011000011101011
                    ^^^^^^^^^^ lots of leading zeros

XOR compression exploits the fact that consecutive floating-point values are often close to each other, so their bit patterns share long identical prefixes. As shown above, XORing a value with its predecessor turns the identical bit portions into zeros, significantly reducing the number of meaningful bits that must be stored. The Gorilla compression scheme developed at Meta (then Facebook) falls into this category.
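One step of a Gorilla-style encoder, heavily simplified: it XORs the 64-bit IEEE-754 bit patterns and measures the zero runs; a real encoder would then emit only the short run of meaningful bits between them. (The diagram above uses 32-bit floats, but the idea is identical.)

import struct

def float_bits(x):
    # Reinterpret a 64-bit IEEE-754 double as an unsigned integer.
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def xor_step(prev, cur):
    # Return the XOR of the two bit patterns plus the lengths of its
    # leading and trailing zero runs.
    xored = float_bits(prev) ^ float_bits(cur)
    if xored == 0:
        return 0, 64, 64  # identical value: Gorilla emits a single bit
    bits = f"{xored:064b}"
    leading = len(bits) - len(bits.lstrip("0"))
    trailing = len(bits) - len(bits.rstrip("0"))
    return xored, leading, trailing

xored, leading, trailing = xor_step(45.2, 45.3)
print(leading, trailing)  # many leading zeros -> few meaningful bits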

Run-Length Encoding

Run-length encoding stores a value together with a repeat count when the same measurement appears many times in a row, as is common in monitoring metrics. It's useful for data such as status codes or login and request counters that hold the same value across many consecutive samples.
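A minimal encoder:

def run_length_encode(values):
    # Collapse runs of repeated values into (value, count) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(r) for r in runs]

print(run_length_encode([200, 200, 200, 200, 404, 200]))
# [(200, 4), (404, 1), (200, 1)]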

6. TSDB Selection Guide

Finally, let’s summarize which TSDB to use in what situations.

InfluxDB

A general-purpose TSDB with high write throughput, its own query languages (InfluxQL/Flux), and built-in retention policies. A good default for metrics and IoT-style workloads.

Prometheus

A pull-based monitoring and alerting system with the PromQL query language. The de facto standard for Kubernetes and infrastructure monitoring, though it is designed for operational metrics rather than long-term general-purpose storage.

TimescaleDB

A PostgreSQL extension that adds time-based partitioning (hypertables) on top of full SQL. Choose it when you want joins with relational data and your existing Postgres tooling.

kdb+

A commercial column-oriented database with the q language, dominant in finance for tick data and high-frequency workloads where raw query speed matters most.

Graphite

An older, simple metrics store with fixed-size retention (Whisper files). Still common for straightforward dashboarding, often paired with Grafana.
