Load Testing OLAP-Backed Features in Ephemeral Environments with ClickHouse

Unknown
2026-02-23
10 min read

Tutorial: spin up ephemeral ClickHouse clusters, generate synthetic OLAP data, and replay production-like query patterns to validate analytics features.

Speed up OLAP validation: run disposable ClickHouse clusters and replay production-like queries

Pain point: you need to validate analytics features under realistic OLAP load, but production environments are expensive, tests are flaky, and CI cycles are slow. This tutorial shows how to create ephemeral ClickHouse clusters, generate synthetic OLAP workloads, and run repeatable load tests that mirror production query patterns.

Why this matters in 2026

ClickHouse adoption surged through 2024–2025 and into 2026, with significant commercial investment and wider managed offerings. Enterprises increasingly choose ClickHouse for latency-sensitive analytics and event-driven aggregation. That momentum means teams must validate analytics features at scale without spending weeks on long-running staging clusters.

Tip: ephemeral clusters let you run realistic benchmarks and tear everything down in minutes—minimizing cost and test-state drift.

What you'll build

  • A fast, disposable ClickHouse cluster (local Docker or a Kubernetes namespace)
  • Synthetic OLAP data generated from the ClickHouse numbers table and Python scripts
  • A workload replay that samples production queries from system.query_log and replays them with configurable concurrency
  • Load testing using built-in clickhouse-benchmark and an HTTP-based driver for mixed workloads

Prerequisites

  • kubectl and a Kubernetes cluster (kind, k3d, or any cloud k8s) OR Docker Compose (local rapid tests)
  • ClickHouse binary or Docker image (official clickhouse/clickhouse-server)
  • Python 3.9+ (for replay script) and pip packages: aiohttp, clickhouse-connect
  • CI runner with ephemeral resource capability (GitHub Actions, GitLab CI, or mytest.cloud ephemeral environments)

Step 1 — Quick local cluster: Docker Compose (ephemeral single-node)

For rapid iterations, use Docker Compose to stand up a disposable ClickHouse node. This is great for functional testing and low-concurrency benchmarks.

version: '3.7'
services:
  clickhouse-server:
    image: clickhouse/clickhouse-server:latest  # pin a specific tag for reproducible CI runs
    ports:
      - '9000:9000'   # native
      - '8123:8123'   # http
    volumes:
      - ./clickhouse_data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144
        hard: 262144

Start it with: docker-compose up -d. For CI, start and tear down this compose file inside your job.

Step 2 — Production-like cluster: Kubernetes with the ClickHouse Operator

For production-like behavior, deploy a small ClickHouse cluster on Kubernetes using the ClickHouse Operator. Use a short-lived namespace and ephemeral volumes or hostPath for speed.

Minimal ClickHouseInstallation manifest

apiVersion: clickhouse.altinity.com/v1
kind: ClickHouseInstallation
metadata:
  name: test-chi
  namespace: clickhouse-ephemeral
spec:
  configuration:
    zookeeper:
      nodes:
        - host: clickhouse-keeper-0.clickhouse-keeper
          port: 9181
    clusters:
      - name: test_cluster
        layout:
          shardsCount: 1
          replicasCount: 1
        templates:
          podTemplate: pod-1
  templates:
    podTemplates:
      - name: pod-1
        spec:
          containers:
            - name: clickhouse
              resources:
                requests:
                  cpu: "2"
                  memory: "4Gi"
                limits:
                  cpu: "4"
                  memory: "8Gi"

Workflow pattern:

  1. Create a namespace: kubectl create ns clickhouse-ephemeral
  2. Install ClickHouse Operator (helm chart or kubectl apply)
  3. Apply the CHI manifest, wait for pods to be Ready
  4. Run tests, then kubectl delete ns clickhouse-ephemeral

Step 3 — Schema and storage choices for OLAP tests

Use MergeTree engines for OLAP workloads. For a distributed setup, create a Distributed table that points to shard-level MergeTree tables. In ephemeral tests, enable replication only when you need to simulate replication behavior; otherwise skip it to avoid the extra storage overhead.

CREATE TABLE IF NOT EXISTS analytics.events_local
(
  event_date Date,
  user_id UInt64,
  event_type String,
  properties String,
  value Float64,
  ts DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (user_id, ts);

CREATE TABLE IF NOT EXISTS analytics.events AS analytics.events_local
ENGINE = Distributed('test_cluster', 'analytics', 'events_local', rand());

Design tips:

  • Partition by month/day depending on dataset size.
  • Order by high-cardinality keys used in WHERE/GROUP BY to benefit from primary-key pruning.
  • Use materialized views for pre-aggregations to test feature behavior that depends on near-real-time summaries.

Step 4 — Generate realistic synthetic OLAP data

Fast synthetic data can be generated inside ClickHouse using the numbers table. Combine it with functions to simulate event times, user distributions, and JSON-like properties.

INSERT INTO analytics.events_local
SELECT
  toDate(now() - (number % 90) * 86400) AS event_date,
  number % 1000000 AS user_id,
  arrayElement(['view','click','purchase','signup'], (number % 4) + 1) AS event_type,
  concat('{"item":', toString(number % 10000), '}') AS properties,
  randCanonical() * 100 AS value,
  now() - (number % 86400) AS ts
FROM numbers(5000000);

This inserts 5M rows quickly. For skewed user distributions (power law), use a Zipf-like function or pre-generate user_id with exponential distribution in Python and bulk insert via the native protocol.

Python generator (sample)

import datetime
import random

import clickhouse_connect

# clickhouse-connect exposes get_client(), not a Client class
client = clickhouse_connect.get_client(host='localhost', username='default', password='')

rows = []
for i in range(100000):
    # Pareto draw gives a power-law (skewed) user distribution
    user = int(random.paretovariate(1.3)) % 1000000
    rows.append((datetime.date.today(), user, 'purchase',
                 '{"item": %d}' % (i % 1000), random.random() * 100,
                 datetime.datetime.now()))

client.insert('analytics.events_local', rows,
              column_names=['event_date', 'user_id', 'event_type', 'properties', 'value', 'ts'])

Step 5 — Capture and sample production query patterns

Rather than inventing queries, extract representative patterns from production logs. In ClickHouse, system.query_log tracks executed queries if enabled.

SELECT query, type, query_start_time, query_duration_ms
FROM system.query_log
WHERE event_date >= today() - 7
  AND type = 'QueryFinish'
ORDER BY query_start_time DESC
LIMIT 1000;

Sampling strategy:

  • Group similar queries by normalized fingerprint — strip literal constants and replace with placeholders.
  • Weighted sampling: more frequent queries get higher weight.
  • Include edge-case heavy queries (long-running GROUP BY / JOIN / subqueries).
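The weighted-sampling step above can be sketched with the standard library: count each fingerprint's frequency in the captured log, then draw a replay mix with random.choices so common queries dominate the workload (function and variable names here are illustrative):

```python
import random
from collections import Counter

def build_workload(fingerprinted_queries, size, seed=None):
    """Sample a replay workload where each fingerprint's probability
    matches its frequency in the captured production log."""
    counts = Counter(fingerprinted_queries)
    fingerprints = list(counts)
    rng = random.Random(seed)  # seed for reproducible benchmark runs
    return rng.choices(fingerprints,
                       weights=[counts[f] for f in fingerprints],
                       k=size)

# Example: 'q_hot' appears most often in the log, so it dominates the sample
log = ['q_hot'] * 75 + ['q_warm'] * 20 + ['q_heavy_join'] * 5
workload = build_workload(log, size=1000, seed=42)
```

Passing a seed makes two benchmark runs draw the identical query sequence, which matters when comparing configurations A/B.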

Query normalization snippet (Python)

import re

import sqlparse

def fingerprint(query):
    # Replace string and numeric literals with placeholders so queries
    # that differ only in constants share a single fingerprint
    q = re.sub(r"'[^']*'", "?", query)
    q = re.sub(r"\b\d+\b", "?", q)
    return sqlparse.format(q, keyword_case='upper', strip_comments=True)

Step 6 — Replay queries with concurrency

Two ways to drive load:

  1. clickhouse-benchmark — good for highly concurrent, parameterized queries using the native binary.
  2. HTTP driver (aiohttp) — best for mixed query types with accurate per-query timings and realistic HTTP-layer behavior.

clickhouse-benchmark example

echo "SELECT count() FROM analytics.events WHERE event_type='purchase' AND ts > now() - INTERVAL 1 DAY" | clickhouse-benchmark -c 50 -t 60

Flags: -c sets concurrency, -t sets the time limit in seconds; queries are read from stdin by default.

Async HTTP replay (sample Python)

import asyncio
import time

import aiohttp

QUERIES = ["SELECT count() FROM analytics.events WHERE event_type='purchase' AND ts > now() - INTERVAL 1 DAY", ...]

async def worker(session, q):
    # Start the clock before the request is sent, not after the response arrives
    t0 = time.perf_counter()
    async with session.post('http://localhost:8123/', data=q) as r:
        await r.text()
        return time.perf_counter() - t0

async def run(concurrency, iterations):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for _ in range(iterations):
            for q in QUERIES:
                tasks.append(asyncio.create_task(worker(session, q)))
                if len(tasks) >= concurrency:
                    res = await asyncio.gather(*tasks)
                    print('batch latencies', res)
                    tasks = []
        # Drain any tasks left over from the final partial batch
        if tasks:
            print('batch latencies', await asyncio.gather(*tasks))

asyncio.run(run(40, 100))

Record per-query latency, success/failure, and result size. Save metrics to CSV or push to Prometheus for dashboarding.

Step 7 — Benchmark metrics and analysis

Track these metrics during runs:

  • Query latency percentiles (p50, p95, p99)
  • Throughput (queries/sec, rows/sec)
  • Resource usage (CPU, memory, disk I/O, network)
  • ClickHouse internals: background merges, parts counts, memory spikes (system.metrics, system.asynchronous_metrics)

Example query: get p95 latency from recorded run data or instrumented metrics.
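The p50/p95/p99 figures can be computed from the per-query latencies the replay script records, using the nearest-rank method with only the standard library (function names here are illustrative):

```python
def percentile(latencies, p):
    """Nearest-rank percentile: p in [0, 100], latencies in seconds."""
    if not latencies:
        raise ValueError("no samples")
    ordered = sorted(latencies)
    # Nearest-rank index: ceil(p/100 * n) - 1, clamped into bounds
    k = max(0, min(len(ordered) - 1, -(-p * len(ordered) // 100) - 1))
    return ordered[k]

def summarize(latencies):
    """The three percentiles tracked in this tutorial."""
    return {p: percentile(latencies, p) for p in (50, 95, 99)}
```

Nearest-rank is the simplest defensible choice for benchmark reports; if you need interpolated percentiles, Python 3.8+ also offers statistics.quantiles.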

Step 8 — Scale and configuration experiments

Use ephemeral clusters to test scaling strategies quickly. Two axes of scaling:

  • Vertical: increase CPU and memory of nodes. Useful for CPU-bound aggregations and wide sorts.
  • Horizontal: add shards and replicas, and use the Distributed table to route queries. Effective for IO-bound workloads and multi-tenancy.

Key settings to experiment with:

  • max_threads — controls per-query parallelism
  • max_memory_usage and max_memory_usage_for_user — prevent OOM; test behavior when queries hit limits
  • mark_cache_size, uncompressed_cache_size — tune for repeated scans
  • load_balancing (for the Distributed engine) — test round_robin vs in_order

Use ephemeral runs to answer: Does adding a shard cut query p95 in half? Is join performance dominated by network or CPU? Run controlled A/B benchmarks and compare.
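The "does adding a shard cut p95 in half?" question reduces to comparing percentile summaries from two otherwise-identical runs. A hedged sketch of such a gate (names and the 50% threshold are illustrative):

```python
def compare_runs(baseline_p95, candidate_p95, threshold=0.5):
    """Return True if the candidate run cut p95 latency by at least
    `threshold` (0.5 means 'in half'), relative to the baseline run."""
    if baseline_p95 <= 0:
        raise ValueError("baseline p95 must be positive")
    improvement = 1 - candidate_p95 / baseline_p95
    return improvement >= threshold

# Example: 800 ms -> 380 ms clears the 'cut in half' bar (52.5% faster)
assert compare_runs(0.800, 0.380)
```

Keep the data, workload, and seed fixed between the two runs so the only variable is the topology or setting under test.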

Step 9 — Test CI integration and automated teardown

Implement ephemeral provisioning in CI so each run starts from a clean cluster. Example GitHub Actions pattern:

  1. Start cluster (kind/k3d or Docker Compose)
  2. Run schema migration + synthetic data generation
  3. Run replay and benchmarks
  4. Collect artifacts and metrics
  5. Tear down cluster

A minimal GitHub Actions job following this pattern:

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Start kind cluster
        run: kind create cluster --name clickhouse-test
      - name: Deploy ClickHouse
        run: kubectl apply -f chi.yaml
      - name: Load data
        run: python gen_data.py
      - name: Run replay
        run: python replay.py
      - name: Tear down
        if: always()
        run: kind delete cluster --name clickhouse-test

Advanced: Replay fidelity and edge cases

To get production-like fidelity:

  • Match client IP patterns and session pools to emulate how the application connects.
  • Reproduce query time-of-day patterns (ramp-up / batch windows).
  • Inject failures (node restart, network partition) to validate feature resiliency and retry behavior.
  • Include schema changes and data backfills in test scenarios to validate migrations.

Common pitfalls and how to avoid them

  • Not normalizing queries before replay — leads to unrealistic parameter distributions. Normalize and substitute realistic parameters.
  • Using tiny synthetic datasets — use scale factors (1x, 10x, 100x) to find inflection points.
  • Ignoring background merges — run tests long enough for merges to occur and observe their impact on latency.
  • Not recording resource metrics — correlate ClickHouse metrics with OS metrics for root cause analysis.
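The first pitfall, replaying normalized fingerprints with realistic parameters, means substituting each placeholder with a value drawn from production-observed distributions. A sketch, assuming fingerprints use ? placeholders and that you have collected a pool of observed values per placeholder position (both assumptions):

```python
import random

def substitute(fingerprint, value_pools, seed=None):
    """Replace each '?' placeholder, left to right, with a random value
    drawn from the corresponding pool of production-observed values."""
    rng = random.Random(seed)
    parts = fingerprint.split('?')
    if len(parts) - 1 != len(value_pools):
        raise ValueError("placeholder count does not match value pools")
    out = [parts[0]]
    for pool, tail in zip(value_pools, parts[1:]):
        out.append(str(rng.choice(pool)))
        out.append(tail)
    return ''.join(out)

q = substitute("SELECT count() FROM analytics.events WHERE event_type=? AND user_id=?",
               [["'purchase'", "'view'"], [1, 42, 1000]], seed=1)
```

Because values are drawn per replayed query, the workload exercises the same spread of partitions and primary-key ranges that production does, instead of hammering one constant.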

By early 2026, the analytics landscape shifted: ClickHouse's rapid commercial growth and ecosystem investment made it a default choice for low-latency OLAP. Managed ClickHouse offerings reduced operational friction, and tooling around cloud-native deployments matured. That combination makes ephemeral testing a best practice—teams can emulate large-scale behavior cheaply and reproducibly, iterate on schema and query plans, and continuously validate analytics features as part of CI/CD.

Industry note: In late 2025 and early 2026, ClickHouse ecosystem growth accelerated, making production-like testing both more necessary and more accessible.

Actionable takeaways

  • Always start with a normalized query sample from production and build weighted workloads.
  • Use numbers and server-side SQL to generate large synthetic datasets quickly for MergeTree tables.
  • Prefer Kubernetes ephemeral namespaces for near-production behavior; use Docker Compose for rapid unit tests.
  • Automate provisioning and teardown in CI to avoid drift and control cost.
  • Measure p50 / p95 / p99 and resource metrics; iterate on both schema and cluster topology.

Checklist before you ship an analytics feature

  1. Reproduce the expected query patterns in an ephemeral cluster.
  2. Run 1x, 10x, and a stress-level 100x scale test to see where performance breaks.
  3. Test schema changes and backfills under load.
  4. Validate cost by measuring CPU and storage used per test and estimate production spend.
  5. Automate results collection and rollback triggers in CI if p95 exceeds SLA.
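Item 5 of the checklist, an automated rollback trigger, can be a small gate in CI that fails the job when the measured p95 exceeds the SLA; a minimal sketch (metric names and the error-rate threshold are placeholders for your own):

```python
def sla_gate(metrics, sla_p95_ms, max_error_rate=0.01):
    """Return (passed, reasons); a CI wrapper fails the job when passed is False."""
    reasons = []
    if metrics["p95_ms"] > sla_p95_ms:
        reasons.append(f"p95 {metrics['p95_ms']}ms exceeds SLA {sla_p95_ms}ms")
    if metrics["error_rate"] > max_error_rate:
        reasons.append(f"error rate {metrics['error_rate']:.2%} exceeds {max_error_rate:.2%}")
    return (not reasons, reasons)
```

Print the reasons into the job log and exit nonzero when the gate fails, so a regressing PR is blocked with an actionable message rather than a silent red build.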

Further reading and tools

  • Built-in clickhouse-benchmark for high-concurrency native testing
  • ClickHouse Operator for Kubernetes deployment templates
  • ClickHouse system.query_log for capturing real workloads
  • Prometheus + Grafana for metrics, or push metrics to your existing observability stack

Final notes — reproducibility and trust

Ephemeral clusters let teams run the same test script against many configurations and commit reproducible artifacts. Treat load tests like first-class tests: run them in PRs for major analytics changes, record baselines, and compare new runs to baselines automatically. That discipline reduces late-stage surprises and helps control cloud cost.

Get started now

Spin up an ephemeral ClickHouse node using the Docker Compose example, load the synthetic data with the provided INSERT pattern, and run a small replay of your top 10 production queries. Record p95 latency and iterate on max_threads and mark_cache_size. Once you have stable results, test a 3-node ephemeral k8s cluster with a Distributed table to validate sharding behavior.

Call to action: Try this tutorial in your CI pipeline this week. If you want a reproducible template (k8s manifests, GitHub Actions workflow, and a ready-to-run replay script) tailored to your production query mix, request the template from mytest.cloud and get a pre-configured ephemeral environment that tears down automatically—so you can validate analytics features without long waits or surprise bills.


Related Topics

#ClickHouse #performance #ephemeral-environments