Cassandra Guide

Data Modeling Rules

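Design tables around your queries, not around entity relationships
One table per query pattern (denormalization, i.e. duplicating data across tables, is expected)
The partition key is hashed to a token that determines which nodes store the row
Clustering columns sort rows within a partition
Keep partitions small (rule of thumb: under ~100 MB and ~100,000 rows)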
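The partition-key rule can be illustrated with a toy token ring. This is a minimal sketch under stated assumptions, not Cassandra's implementation. It uses MD5 as the hash, which is similar in spirit to the legacy RandomPartitioner. The default Murmur3Partitioner uses a different hash and a signed 64-bit token range. The four-node ring and the names `RING`, `token`, and `owner` are hypothetical.

```python
import hashlib
from bisect import bisect_left

# Hypothetical 4-node ring with evenly spaced tokens over [0, 2**128).
# Each node owns the range (previous token, its own token].
RING = [(i * 2**126, f"node-{i}") for i in range(1, 5)]
RING_TOKENS = [tok for tok, _ in RING]


def token(partition_key: str) -> int:
    """Hash a partition key to a 128-bit token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def owner(partition_key: str) -> str:
    """Return the node whose token range contains the key's token."""
    idx = bisect_left(RING_TOKENS, token(partition_key))
    return RING[idx % len(RING)][1]  # wrap around past the last token


# Same partition key -> same token -> same node, every time.
print(owner("user-42"), owner("user-42"))
```

The token depends only on the partition key. Every row for `user-42` therefore lands on the same node, and this is also why a single oversized partition cannot be spread across the cluster.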

CQL Examples

-- Create keyspace
CREATE KEYSPACE my_app
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};
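With `'datacenter1': 3`, each partition is stored on three nodes in that datacenter. The sketch below shows a simplified version of replica selection. It starts at the node that owns the token and walks the ring clockwise until three distinct nodes are collected. It assumes a single datacenter and ignores the rack awareness that NetworkTopologyStrategy also applies. `replicas` and `DEMO_RING` are hypothetical.

```python
from bisect import bisect_left


def replicas(token_value: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Owner of token_value plus the next distinct nodes clockwise, up to rf."""
    ring = sorted(ring)
    tokens = [tok for tok, _ in ring]
    # Cannot place more replicas than there are distinct nodes.
    wanted = min(rf, len({node for _, node in ring}))
    chosen: list[str] = []
    i = bisect_left(tokens, token_value)  # first range end >= token: the owner
    while len(chosen) < wanted:
        node = ring[i % len(ring)][1]
        if node not in chosen:  # a node may appear several times (vnodes)
            chosen.append(node)
        i += 1
    return chosen


# Hypothetical ring: four nodes, one token each.
DEMO_RING = [(10, "node-a"), (20, "node-b"), (30, "node-c"), (40, "node-d")]
print(replicas(25, DEMO_RING, 3))  # ['node-c', 'node-d', 'node-a']
```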

-- Create table (query: get a user's recent posts, newest first)
CREATE TABLE posts_by_user (
    user_id    UUID,
    created_at TIMESTAMP,
    post_id    UUID,
    title      TEXT,
    content    TEXT,
    tags       SET<TEXT>,
    metadata   MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id), created_at, post_id)  -- ((partition key), clustering columns...)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);
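One rough way to check the posts_by_user design against the partition-size rule is to multiply the expected rows per partition by an average row size. This sketch uses made-up workload numbers. It ignores per-cell storage overhead and compression, so treat the result as an order-of-magnitude check only. `estimate_partition` and `partition_ok` are hypothetical helpers.

```python
# Rule-of-thumb limits from the data modeling rules (MB here = 2**20 bytes).
MAX_PARTITION_ROWS = 100_000
MAX_PARTITION_MB = 100


def estimate_partition(rows_per_day: float, retention_days: int,
                       avg_row_bytes: int) -> tuple[float, float]:
    """Return (rows, size in MB) for one partition at steady state."""
    rows = rows_per_day * retention_days
    size_mb = rows * avg_row_bytes / (1024 * 1024)
    return rows, size_mb


def partition_ok(rows_per_day: float, retention_days: int, avg_row_bytes: int) -> bool:
    """True if the estimate stays within both rule-of-thumb limits."""
    rows, size_mb = estimate_partition(rows_per_day, retention_days, avg_row_bytes)
    return rows <= MAX_PARTITION_ROWS and size_mb <= MAX_PARTITION_MB


# Hypothetical workload: 50 posts/day per user, ~2 KB per row.
print(estimate_partition(50, 30, 2048))       # (1500, 2.9296875)
print(estimate_partition(50, 5 * 365, 2048))  # (91250, 178.22265625)
```

With a 30-day TTL the partition stays near 3 MB. Keeping five years of posts exceeds the size limit even though the row count is still under 100,000. A common remedy is to add a time bucket, such as a month, to the partition key.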

-- Insert (user_id is bound by the application, e.g. through a prepared statement)
INSERT INTO posts_by_user (user_id, created_at, post_id, title)
VALUES (?, toTimestamp(now()), uuid(), 'Hello World')
USING TTL 2592000;  -- 30 days (30 * 86400 seconds)
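The TTL value is the retention period converted to seconds. The sketch below shows that arithmetic and the resulting expiry time. The helper names are hypothetical. The 630,720,000-second (20-year) ceiling reflects Cassandra's documented maximum TTL.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 24 * 60 * 60
MAX_TTL_SECONDS = 630_720_000  # Cassandra's documented maximum TTL (20 years)


def ttl_for_days(days: int) -> int:
    """Convert a retention period in days to a TTL in seconds."""
    ttl = days * SECONDS_PER_DAY
    if not 0 < ttl <= MAX_TTL_SECONDS:
        raise ValueError(f"TTL must be between 1 and {MAX_TTL_SECONDS} seconds, got {ttl}")
    return ttl


def expires_at(written_at: datetime, ttl_seconds: int) -> datetime:
    """Moment after which the written cells are no longer returned by reads."""
    return written_at + timedelta(seconds=ttl_seconds)


ttl = ttl_for_days(30)
print(ttl)  # 2592000
print(expires_at(datetime(2024, 1, 1, tzinfo=timezone.utc), ttl))  # 2024-01-31 00:00:00+00:00
```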

-- Query (must specify the full partition key; range conditions go on clustering columns)
SELECT * FROM posts_by_user
WHERE user_id = ? AND created_at > '2024-01-01'
LIMIT 20;
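The query above reads one partition and returns a slice of it in clustering order. The toy in-memory model below mirrors that behaviour for posts_by_user. Rows are grouped by `user_id`, writes are upserts on the full primary key, and reads come back in `(created_at DESC, post_id ASC)` order. This is an illustration only, and the class and method names are hypothetical. A real node seeks directly to the requested slice instead of filtering in a loop.

```python
from collections import defaultdict
from datetime import datetime


class PostsByUser:
    """Toy model of posts_by_user: one dict of rows per partition (user_id)."""

    def __init__(self) -> None:
        # user_id -> {(created_at, post_id): title}
        self._partitions = defaultdict(dict)

    def insert(self, user_id: str, created_at: datetime, post_id: str, title: str) -> None:
        # Like Cassandra, writing an existing primary key overwrites the row.
        self._partitions[user_id][(created_at, post_id)] = title

    def recent_posts(self, user_id: str, after: datetime, limit: int) -> list:
        # Single-partition lookup: only this user's rows are examined.
        rows = self._partitions.get(user_id, {})
        ascending = sorted(rows.items())  # by (created_at, post_id)
        # created_at DESC; the sort is stable, so ties keep post_id ASC.
        ordered = sorted(ascending, key=lambda item: item[0][0], reverse=True)
        # Equivalent of: WHERE created_at > ? LIMIT n
        return [(c, p, t) for (c, p), t in ordered if c > after][:limit]


posts = PostsByUser()
posts.insert("user-1", datetime(2024, 1, 5), "post-a", "First")
posts.insert("user-1", datetime(2024, 2, 1), "post-b", "Second")
print(posts.recent_posts("user-1", datetime(2024, 1, 1), limit=20))
```

`recent_posts` cannot be called without a `user_id`. This mirrors why the CQL query must specify the partition key.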

Cassandra vs MongoDB vs DynamoDB

Feature	Cassandra	MongoDB	DynamoDB
Best for	High-write time series	Flexible documents	Serverless workloads on AWS
Query flexibility	Low (partition key required)	High	Medium
Write throughput	Excellent	Good	Excellent (managed)
Transactions	Lightweight transactions (single-partition compare-and-set, not full ACID)	Multi-document ACID transactions	Multi-item ACID transactions (TransactWriteItems)
