Cassandra Guide

Data Modeling Rules

  1. Design tables around your queries, not around entity relationships
  2. One table per query pattern (denormalization is OK)
  3. The partition key determines which nodes store a row; pick one that spreads data evenly
  4. Clustering keys sort rows within a partition
  5. Avoid large partitions (rule of thumb: stay under about 100 MB or 100k rows)
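
Rule 5 is usually handled by adding a time bucket to the partition key, so one busy user cannot grow a single partition without bound. A minimal sketch, assuming a hypothetical table name (posts_by_user_month) and a hypothetical bucket column (year_month) that the application fills in on every write:

-- Sketch: one partition per user per month instead of one per user
CREATE TABLE posts_by_user_month (
    user_id    UUID,
    year_month TEXT,       -- bucket computed by the application, e.g. '2024-01'
    created_at TIMESTAMP,
    post_id    UUID,
    title      TEXT,
    PRIMARY KEY ((user_id, year_month), created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);

-- Reads must now name the bucket as well as the user
SELECT * FROM posts_by_user_month
WHERE user_id = ? AND year_month = '2024-01'
LIMIT 20;

The trade-off is that a query spanning several months becomes several queries, one per bucket, so the bucket size should match the typical read window.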

CQL Examples

-- Create keyspace (the datacenter name must match one defined in your cluster)
CREATE KEYSPACE my_app
WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};

-- Create table (query: get a user's most recent posts, newest first)
CREATE TABLE posts_by_user (
    user_id    UUID,
    created_at TIMESTAMP,
    post_id    UUID,
    title      TEXT,
    content    TEXT,
    tags       SET<TEXT>,
    metadata   MAP<TEXT, TEXT>,
    PRIMARY KEY ((user_id), created_at, post_id)  -- (partition, clustering...)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);  -- list every clustering column
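
-- Sketch: a second table for a second query pattern (query: recent posts for a tag).
-- posts_by_tag is a hypothetical name, not part of the schema above. The application
-- would write one row here per tag on each post, duplicating title and user_id.
CREATE TABLE posts_by_tag (
    tag        TEXT,
    created_at TIMESTAMP,
    post_id    UUID,
    user_id    UUID,
    title      TEXT,
    PRIMARY KEY ((tag), created_at, post_id)
) WITH CLUSTERING ORDER BY (created_at DESC, post_id ASC);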

-- Insert
INSERT INTO posts_by_user (user_id, created_at, post_id, title)
VALUES (uuid(), toTimestamp(now()), uuid(), 'Hello World')
USING TTL 2592000;  -- 30 days TTL
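
-- Sketch: check the remaining TTL (in seconds) and the write time of a column.
-- TTL() and WRITETIME() work on regular columns, not on primary key columns.
SELECT title, TTL(title), WRITETIME(title)
FROM posts_by_user
WHERE user_id = ?
LIMIT 5;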

-- Query (must include full partition key)
SELECT * FROM posts_by_user
WHERE user_id = ? AND created_at > '2024-01-01'
LIMIT 20;
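
-- Sketch: bounded range on the first clustering column (posts from January 2024)
SELECT * FROM posts_by_user
WHERE user_id = ?
  AND created_at >= '2024-01-01' AND created_at < '2024-02-01';

-- Counter-example: no partition key, so Cassandra rejects this unless
-- ALLOW FILTERING is added, which scans the whole table and should be avoided.
-- SELECT * FROM posts_by_user WHERE title = 'Hello World';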

Cassandra vs MongoDB vs DynamoDB

                   Cassandra                     MongoDB                 DynamoDB
Best for           High-write time series        Flexible documents      AWS serverless
Query flexibility  Low (partition key required)  High                    Medium
Write throughput   Excellent                     Good                    Excellent (managed)
ACID               Lightweight transactions      Multi-doc transactions  Single-item ACID; multi-item via transaction APIs
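
For reference, "lightweight transactions" in Cassandra means conditional writes scoped to a single partition. A minimal sketch against the posts_by_user table from the CQL examples; the bind markers stand for values the application supplies:

-- Insert only if no row with this primary key exists yet
INSERT INTO posts_by_user (user_id, created_at, post_id, title)
VALUES (?, ?, ?, ?)
IF NOT EXISTS;

-- Compare-and-set: change the title only if it still has the expected value
UPDATE posts_by_user
SET title = ?
WHERE user_id = ? AND created_at = ? AND post_id = ?
IF title = ?;

Each statement returns an [applied] column showing whether the condition held. Conditional writes need extra coordination round trips between replicas, so they are best kept for the few writes that really need them.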