Mind Map: Cassandra

1. data model

1.1. relationships

1.1.1. a cluster is a container for keyspaces

1.1.2. a keyspace is a container for column families, like a database in an RDBMS

1.1.3. a column family is a container for ordered rows, like a table

1.1.4. each row contains ordered columns

1.2. clusters

1.2.1. def: the outermost structure, arranged as a ring

1.2.2. a node holds a replica for different ranges of data

1.2.3. replication factor

1.3. keyspaces

1.3.1. def: the outermost container for data

1.3.2. consists of a name and keyspace-wide attributes: replication factor, replica placement strategy, column families

1.4. column family

1.4.1. like a four-dimensional hash: [keyspace][column family][row key][column]

1.4.2. like, but not the same as, a relational table

1.4.2.1. Cassandra is schema-free: column families are defined, but columns are not, so any column can be added to any column family at any time

1.4.2.2. a column family has two attributes: name and comparator

1.4.2.3. storage: each column family is stored in separate files on disk, whereas in an RDBMS how tables are stored on disk is transparent to the user

1.4.2.4. writing data to a column family: you specify values for one or more columns

1.5. column

1.5.1. a triplet of a name, a value and a timestamp

1.5.2. not a column from the relational world; rows vary in size: wide rows (many columns) and skinny rows (few columns)

1.5.3. super columns: a special kind of column whose value is a map of subcolumns; the super column idea goes only one level deep
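
The "four-dimensional hash" and the name/value/timestamp triplet described above can be pictured as nested maps. The following is only an illustrative sketch, not Cassandra's storage format: the `Column` class, the `cluster` dictionary and the `insert` helper are hypothetical names, and the newest-timestamp-wins rule is modelled in the simplest possible way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """A column is a (name, value, timestamp) triplet."""
    name: str
    value: str
    timestamp: int  # client-supplied; a larger value means a newer write


# keyspace -> column family -> row key -> column name -> Column
cluster = {}


def insert(keyspace, column_family, row_key, column):
    """Store a column unless the row already holds a newer version of it."""
    row = (cluster.setdefault(keyspace, {})
                  .setdefault(column_family, {})
                  .setdefault(row_key, {}))
    current = row.get(column.name)
    if current is None or column.timestamp >= current.timestamp:
        row[column.name] = column


insert("shop", "users", "alice", Column("email", "a@example.com", 1))
insert("shop", "users", "alice", Column("email", "alice@example.com", 2))
insert("shop", "users", "alice", Column("email", "stale@example.com", 1))  # older, ignored
insert("shop", "users", "bob", Column("phone", "555-0100", 1))  # rows need not share columns
```

Here `alice` and `bob` live in the same column family but carry different columns, which is the schema-free behaviour noted in 1.4.2.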

2. RDBMS vs Cassandra

2.1. how to switch from RDBMS to Cassandra

2.1.1. start with queries then model data

2.1.2. supply a timestamp with each query

2.2. Design differences between RDBMS and Cassandra

2.2.1. No query language in the original Thrift API (CQL was added in later versions)

2.2.2. how secondary indexes are handled

2.2.3. Sorting is a design decision: an RDBMS sorts at query time with ORDER BY; in Cassandra the sort order is fixed by the column family's CompareWith element

2.2.4. denormalization: Cassandra performs best when the data model is denormalized

2.2.5. No referential integrity: in an RDBMS, a table can specify foreign keys that reference the primary key of a record in another table; in Cassandra, operations such as cascading deletes are not available

3. architecture

3.1. features

3.1.1. high availability

3.1.2. no single point of failure

3.1.3. inspired by Amazon's Dynamo

3.1.4. developed at Facebook, written in Java

3.2. architecture

3.2.1. ring and partitioner

3.2.1.1. ring: in a Cassandra cluster, data is assigned to nodes as if they form a ring of tokens

3.2.1.2. partitioner: a hashing algorithm that determines how data is distributed across the cluster; by default the Murmur3 hashing algorithm is used
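
To make the ring-of-tokens idea concrete, here is a minimal sketch of token lookup. It rests on assumptions: the `Ring` class and its methods are hypothetical, and MD5 from the standard library stands in for Murmur3 (which is not in the Python standard library), so the tokens it produces are not the ones a real cluster would compute.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a row key to a ring position (MD5 here as a stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node_name: token}; a node owns the range that ends at its token
        self._sorted = sorted((t, n) for n, t in node_tokens.items())
        self._tokens = [t for t, _ in self._sorted]

    def owner_for_token(self, t: int) -> str:
        """First node whose token is >= t, wrapping around past the last node."""
        i = bisect.bisect_left(self._tokens, t)
        return self._sorted[i % len(self._sorted)][1]

    def owner(self, key: str) -> str:
        return self.owner_for_token(token(key))


ring = Ring({"node-a": 100, "node-b": 200})
```

With this toy ring, token 150 falls to `node-b`, while token 250 lies past the last node and wraps around to `node-a`.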

3.2.2. peer-to-peer

3.2.2.1. master-slave, by contrast: optimized for reading data, but replication is one-way, from master to slave; capacity depends on the master, and even a backup master might fail

3.2.2.2. in Cassandra, any given node is structurally identical to any other node

3.2.3. gossip and failure detection: the gossip protocol, run by the gossiper, is used for failure detection; accrual failure detection rests on the idea that failure detection should be flexible, and is driven by heartbeats
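
The idea of accrual failure detection can be sketched as a suspicion level that grows with the time since the last heartbeat, instead of a fixed up/down flag. The code below is a simplified model under stated assumptions: the class name and window size are hypothetical, and heartbeat gaps are assumed to be exponentially distributed, so it should not be read as Cassandra's actual detector.

```python
import math
from collections import deque


class AccrualFailureDetector:
    def __init__(self, window=100):
        self.intervals = deque(maxlen=window)  # recent gaps between heartbeats
        self.last_heartbeat = None

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat that arrived at time `now`."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now: float) -> float:
        """Suspicion level; it rises the longer the node stays silent."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        elapsed = now - self.last_heartbeat
        # Exponential model: P(silence > elapsed) = exp(-elapsed / mean), phi = -log10(P)
        return elapsed / (mean * math.log(10))


detector = AccrualFailureDetector()
for t in (0.0, 1.0, 2.0, 3.0):
    detector.heartbeat(t)
```

A caller would compare `detector.phi(now)` against a threshold of its own choosing; a higher threshold tolerates longer silences before a node is marked down.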

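3.2.4. tunable consistency

3.2.4.1. CAP theorem: consistency, availability and partition tolerance; all three can appear to hold together only while the network behaves well (low latency, no partitions)

3.2.4.2. consistency def: whether a read always returns the most recently written value

3.2.4.3. in other systems the consistency level is defined by the protocol; in Cassandra the client chooses it per operation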
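
One common way to reason about tunable consistency is the overlap rule: if the number of replicas read (R) plus the number written (W) exceeds the replication factor (N), every read set shares at least one replica with every write set. The helpers below are a hypothetical sketch of that arithmetic covering only the ONE, QUORUM and ALL levels.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level}")


def read_sees_latest_write(read_level: str, write_level: str,
                           replication_factor: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, QUORUM reads combined with QUORUM writes satisfy the rule (2 + 2 > 3), whereas ONE with ONE does not (1 + 1 is not greater than 3).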

3.3. internal data storage

3.3.1. overview

3.3.2. commit log

3.3.2.1. a crash-recovery mechanism that supports Cassandra's durability goals

3.3.2.2. a write does not count as successful until it is written to the commit log

3.3.2.3. after the write reaches the commit log, the value is written to a memory-resident data structure called the memtable

3.3.2.4. when the number of objects stored in the memtable reaches a threshold, its contents are flushed to disk in a file called an SSTable, and a new memtable is then created

3.3.2.5. each SSTable has an associated Bloom filter

3.3.3. memtable: an in-memory store that speeds up operations; a value is added to the memtable after it has been written to the commit log

3.3.4. SSTable: the contents of the memtable are written to an SSTable once the memtable is full; an SSTable is immutable, so changes are appended rather than applied in place; writes to disk are sequential
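
The write path described in 3.3.2 to 3.3.4 (commit log first, then memtable, then a flush to an immutable SSTable) can be sketched in a few lines. This is a toy in-memory model: `ColumnFamilyStore`, `memtable_limit` and the list standing in for the on-disk commit log are all hypothetical, and Bloom filters and commit log recycling are left out.

```python
class ColumnFamilyStore:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # in-memory writes
        self.sstables = []     # immutable, sorted tuples; newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. durability first
        self.memtable[key] = value            # 2. then the in-memory store
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # 3. write a sorted, immutable SSTable and start a fresh memtable
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


store = ColumnFamilyStore(memtable_limit=2)
store.write("a", 1)
store.write("b", 2)   # reaches the limit, so the memtable is flushed
store.write("a", 3)   # newer value lives in the new memtable
```

After these writes, `store.read("a")` returns 3 from the memtable while the older value 1 remains untouched in the first SSTable, which illustrates why changes are appended instead of rewritten.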

3.3.5. compaction: merges SSTables; the merged data is kept sorted; reduces the number of seeks
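
Compaction can be illustrated as merging several sorted SSTables into one, keeping only the newest value for each key. The `compact` function below is a hypothetical sketch that assumes the input list is ordered oldest first; real compaction also deals with timestamps and deletions, which are omitted here.

```python
def compact(sstables):
    """Merge SSTables (oldest first) into one sorted SSTable with the newest value per key."""
    merged = {}
    for sstable in sstables:        # later tables overwrite earlier ones
        for key, value in sstable:
            merged[key] = value
    return tuple(sorted(merged.items()))


old = (("a", 1), ("c", 3))
new = (("a", 9), ("b", 2))
compacted = compact([old, new])
```

Here `compacted` is `(("a", 9), ("b", 2), ("c", 3))`: a single sorted table, so a later read of `a` touches one file instead of two.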

3.3.6. read/write operations inside Cassandra: a client can contact any node, and that node becomes the coordinator for the request; reads and writes may happen within one data center or across data centers

3.3.7. data replication
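
As a rough illustration of how a replication factor maps onto the ring, the sketch below picks the node that owns a token and then walks clockwise to the next nodes. It is a simplification in the spirit of a rack-unaware placement strategy; the `replicas_for` function and the example tokens are hypothetical.

```python
import bisect


def replicas_for(key_token, ring, replication_factor):
    """ring: list of (token, node) sorted by token. Returns the nodes holding replicas."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    count = min(replication_factor, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


ring = [(100, "node-a"), (200, "node-b"), (300, "node-c"), (400, "node-d")]
```

For example, `replicas_for(250, ring, 3)` yields `node-c`, `node-d` and `node-a`: the owner of token 250 followed by the next two nodes around the ring.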

3.4. Anti-Entropy and read repair

3.4.1. anti-entropy: a replica synchronization mechanism, as used in Amazon's Dynamo, built on Merkle trees; in Cassandra each column family has its own Merkle tree; after each update the anti-entropy algorithm kicks in and performs a checksum against the database and its peers; if the checksums differ, the data is exchanged
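
The checksum comparison behind anti-entropy can be sketched with a small Merkle tree: each replica hashes its rows pairwise up to a single root, and two replicas only need to exchange data when their roots differ. This is an illustrative model, with hypothetical function names and SHA-256 chosen for convenience; it does not reproduce Cassandra's tree layout.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(rows):
    """rows: list of (key, value) pairs in a fixed order. Returns the root hash."""
    level = [_h(f"{k}={v}".encode()) for k, v in rows]
    if not level:
        return _h(b"")
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd-sized levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def needs_repair(local_rows, peer_rows) -> bool:
    """Replicas are out of sync when their Merkle roots differ."""
    return merkle_root(local_rows) != merkle_root(peer_rows)


local = [("a", "1"), ("b", "2"), ("c", "3")]
peer = [("a", "1"), ("b", "2"), ("c", "stale")]
```

Comparing only the roots keeps the exchange small; a fuller version would descend the tree to find which ranges actually differ.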

3.4.2. read repair

3.4.2.1. to read, a client connects to any node in the cluster

3.4.2.2. based on the consistency level specified, a number of nodes are read; the read operation blocks until the client-specified consistency level is met

3.4.2.3. if some of the nodes are found to have responded with an out-of-date value, they are updated with the most recent value (the read repair)

3.4.2.4. performance improvement: the client does not block until all nodes are read

3.4.2.5. with lots of clients, it is important to read from a quorum of nodes to ensure that at least one will have the most recent value
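
The read repair flow can be sketched as: read as many replicas as the consistency level requires, return the value with the newest timestamp, and overwrite any contacted replica that returned something older. The function below is a hypothetical, synchronous model; in a real cluster the repair happens in the background so the client is not kept waiting.

```python
def read_with_repair(replicas, key, required):
    """replicas: list of dicts mapping key -> (value, timestamp).

    Reads the first `required` replicas, returns the newest value,
    and rewrites stale copies on the replicas that were contacted.
    """
    contacted = replicas[:required]
    responses = [r[key] for r in contacted if key in r]
    if not responses:
        return None
    newest = max(responses, key=lambda vt: vt[1])  # highest timestamp wins
    for replica in contacted:
        if replica.get(key) != newest:
            replica[key] = newest  # the repair step
    return newest[0]


r1 = {"k": ("old", 1)}
r2 = {"k": ("new", 2)}
r3 = {"k": ("old", 1)}
result = read_with_repair([r1, r2, r3], "k", required=2)
```

After this call `result` is `"new"` and `r1` has been brought up to date, while `r3`, which was not contacted, still holds the old value until a later read or an anti-entropy pass reaches it.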

3.5. System Keyspace

3.5.1. used to store metadata about the cluster to aid in operations

3.5.2. stores metadata for the local node as well as hinted handoff information

4. use case

4.1. general data storage

4.1.1. use a Cassandra cluster as your primary data persistence layer

4.2. time-series data storage

4.2.1. data is sorted and written sequentially to disk

4.2.2. well suited to retrieving data filtered by a range

4.2.3. fast access because few disk seeks are needed
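
Why sorted storage suits range queries can be shown with a single wide row whose column names are timestamps: because the columns are kept in order, a range filter becomes two binary searches and one contiguous slice. The `TimeSeriesRow` class is a hypothetical in-memory sketch, not a client API.

```python
import bisect


class TimeSeriesRow:
    """One wide row whose column names are timestamps kept in sorted order."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def append(self, timestamp, value):
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def range_slice(self, start, end):
        """Return (timestamp, value) pairs with start <= timestamp <= end."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_right(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))


row = TimeSeriesRow()
for ts, reading in [(10, "a"), (30, "c"), (20, "b"), (40, "d")]:
    row.append(ts, reading)
```

Here `row.range_slice(15, 30)` returns `[(20, "b"), (30, "c")]` without scanning the readings outside the requested window.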

4.3. ttl data storage

4.3.1. some data can be discarded after some time

4.3.2. with Cassandra's TTL on data, this is easy to implement
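
The behaviour a TTL provides can be sketched as a store that records an expiry time next to each value and treats expired entries as deleted on read. `TtlStore` and its injectable `clock` parameter are hypothetical and exist only for illustration; in Cassandra the TTL is set on the data when it is written and expiry is handled by the database itself.

```python
import time


class TtlStore:
    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be demonstrated without sleeping
        self._data = {}       # key -> (value, expires_at or None)

    def put(self, key, value, ttl_seconds=None):
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]   # expired: treat as deleted
            return None
        return value


now = [0.0]
sessions = TtlStore(clock=lambda: now[0])
sessions.put("session:1", "alice", ttl_seconds=60)
sessions.put("profile:1", "alice")   # no TTL, never expires
```

Advancing the fake clock past 60 seconds (`now[0] = 61.0`) makes `sessions.get("session:1")` return `None`, while the entry written without a TTL is still readable.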