Let's Learn About Time Series Databases
Posted on February 7, 2026 • 6 minutes • 1237 words • Other languages: Korean, 日本語
Table of contents
When designing large-scale architectures or building distributed systems, you may consider adopting a Time Series Database (TSDB). Today, let’s explore what TSDB is, what characteristics it has, and what problems it helps solve.
1. What is Time Series Data?
TSDB is a database that stores and processes time series data. So what is time series data? Time series data is not simply data with timestamps attached. More precisely, it can be defined as “measurements or events that are tracked, monitored, downsampled, and aggregated over time”, and the measurements or events mentioned here can include:
- Server metrics
- Application performance monitoring
- Network data
- IoT sensor data
- Click events
- Market trading data
Time series data handling differs from general databases in the following ways:
- Write heavy: 95% writes, 5% reads
- Immutable: Once recorded, data rarely changes, so data is added in an append-only manner
- Ordered by time: Sorted in chronological order and can be considered indexed by time
- High cardinality: Generally has high cardinality
- Time-based range queries: Querying large numbers of records by time range
- Continuous data stream: Continuously collecting data points
Let’s explore what strategies TSDB employs to handle these characteristics.
2. Write Optimization
In a typical database, writing data requires finding the location to write, then modifying or inserting. The process would look like this:
[Seek to block 4752] → [Read] → [Modify] → [Write] → [Seek to block 9201] → ...
On the other hand, TSDB doesn’t need to modify existing data and only needs to load data in chronological order, so it simply adds data in an Append-only manner. Operations become much simpler:
[Write to end] → [Write to end] → [Write to end] → ...
LSM (Log-Structured Merge) Tree
The LSM tree is the secret sauce of database systems like InfluxDB, Cassandra, and LevelDB that boast high-performance write throughput. The core ideas of the LSM tree are:
- Expensive random writes → Cheap sequential writes
- Improving read efficiency by periodically reorganizing data in the background
And the operation of LSM trees works as follows:
- Latest data is stored in the Memtable memory buffer
Data maintains sorted order by key. Write operations are very fast since they access RAM.
- Flush to SSTable (Sorted String Table) on disk
When the Memtable is full, it’s saved as an immutable, sorted SSTable file. Since the Memtable is already sorted, this flush operation can be completed sequentially, simply, and quickly. After flushing, the Memtable is cleared and reinitialized.
- Background compaction
In the background, smaller SSTables are merged into larger SSTables. Duplicate entries and tombstones (deletion markers) are also removed.
Due to this LSM tree operation, write operations don’t interfere with read operations. By having the Memtable always handle the latest data while cleaning up existing data in the background, high write throughput can be maintained continuously.
However, read performance degradation can occur when multiple SSTables need to be checked for queries. Also, if write amplification occurs, data may be rewritten multiple times during the compaction process.
Therefore, LSM trees are best used when write workloads are heavy and some read performance can be sacrificed.
3. Time-Based Indexing and Partitioning
Since time series data is already sorted and inserted based on timestamps, indexing is not needed for write operations. And if you partition by time range, you don’t need to worry about which partition to store data in during write operations. Data retention can also be easily handled by simply deleting old partitions.
Block-Level Metadata
In TSDB, data blocks are also sorted based on timestamps. Therefore, blocks that don’t fall within the time range can be skipped without checking, improving query performance. Each block records min/max timestamps for time, and optionally min/max values. This metadata acts as a filter layer that, together with time-based partitioning, helps maintain query performance even as data volume increases.
4. Downsampling and Compression
Due to its nature, time series data is well-suited for trimming less important content and representing it in compressed form. In fact, given TSDB’s characteristic of continuously ingesting millions of data points per second, downsampling is essentially mandatory. In TSDB, high-precision recent data is aggregated and downsampled into long-term data. A typical downsampling and retention strategy looks like this:
- Raw data: Keep for 1 day
- 1-minute averages: Keep for 7 days
- 5-minute averages: Keep for 30 days
- 1-hour averages: Keep for 1 year
- Daily summaries: Keep permanently
These rules can be defined by users, and leveraging them can dramatically improve storage efficiency and read performance.
5. Compression Algorithms
As mentioned earlier, the core of TSDB is processing numerous write operations in memory buffers and compressing in the background. So how is compression done? Let’s explore these concepts.
Delta Encoding
Raw values: [45.2] [45.3] [45.1] [45.4]
Delta encoded: [45.2] [+0.1] [-0.2] [+0.3]
Delta encoding transforms data to record only the change amount from the original value. Since change values are smaller numbers than the original values, the number of bits needed to store values is also much smaller.
Delta-of-Delta Encoding
Raw timestamps: 1000, 1010, 1020, 1030, 1040
Deltas: 10 , 10 , 10 , 10 , 10 , ...
Delta-of-deltas: 10 , 0 , 0 , 0 , 0 , ...
Delta-of-delta encoding goes further by recording the change amount of change amounts. This method can be used when timestamp intervals are completely regular, and it can reduce the number of bits even more.
XOR Compression
Value 1: 0 10000010 01101000101000111101011
Value 2: 0 10000010 01101000110000100000000
XOR: 0 00000000 00000000011000011101011
^^^^^^^^^^ lots of leading zeros
XOR compression is a compression algorithm that takes advantage of the fact that consecutive float-based values often have similar values. As shown above, identical bit portions are all replaced with 0, significantly reducing the total number of bits. The Gorilla compression technique developed by Meta also falls into this category.
Run-Length Encoding
Run-length encoding stores only a single value with a count when measured values appear repeatedly, such as in monitoring metrics. It’s useful for recording login counts, request counts, and similar data.
6. TSDB Selection Guide
Finally, let’s summarize which TSDB to use in what situations.
InfluxDB
- Query language: InfluxQL, Flux
- Pros: Optimized for event collection
- Cons: Complex distributed system scaling
- Deployment: Self-hosted / Managed cloud
- Use cases: When workload is metrics-focused real-time data collection, when data retention period is short
Prometheus
- Query language: PromQL
- Pros: Optimized for Kubernetes environments, Pull model so no agent installation required
- Cons: Not designed for long-term storage
- Deployment: Installed by default in Kubernetes
- Use cases: Kubernetes-centric architecture, when alerting system is needed
TimescaleDB
- Query language: SQL (especially optimized for PostgreSQL)
- Pros: Optimized for relational workloads, PostgreSQL ecosystem
- Cons: Requires PostgreSQL-related dependency installation
- Deployment: Self-hosted / Tiger Data cloud
- Use cases: Business Intelligence (BI), IoT, financial systems, and other cases requiring relational analysis
Kdb
- Query language: q language
- Pros: Given the price point, you can receive support from the support team
- Cons: Learning curve, poor documentation, expensive commercial license
- Deployment: Not optimized for cloud environments, must be installed on VMs
- Use cases: When extreme performance is required
Graphite
- Query language: Plain text, Pickle, AMQP
- Pros: Excellent performance on any hardware
- Cons: Poor documentation, community not very active
- Deployment: Container images
- Use cases: When a simple and lightweight open-source TSDB is needed
References
- Alex Xu - System Design Interview Volume 2
- influxdata - Time series database explained
- System Design Academy - Why a Time Series Database?
- hello interview - Time Series Databases
- Last 9 - Comparing Popular Time Series Databases
- OctaByte - InfluxDB vs TimescaleDB