Data Model
ElasticInbox uses 5 column families (tables):
- Accounts
- MessageMetadata
- MessageBlob
- IndexLabels
- Counters
The complete schema can be found here: https://github.com/elasticinbox/elasticinbox/blob/master/config/elasticinbox.cml
An RFC 5322 compatible email address is used as the unique account identifier in all CFs.
The Accounts CF contains account information such as labels and custom attributes.
Schema syntax:
CREATE COLUMN FAMILY Accounts WITH
key_validation_class = UTF8Type AND
caching = all AND
comment = 'Basic information about accounts';
Sample contents:
"Accounts" {
"[email protected]" {
"label:0" : "all",
"label:1" : "inbox",
"label:2" : "drafts",
...
"label:1234" : "MyLabel",
"lattr:1234:color" : "Green",
"lattr:1234:MyAttribute" : "My Text Value",
...
}
}
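The column naming convention in the sample above ("label:ID" for label names, "lattr:ID:name" for label attributes) can be illustrated with a minimal sketch. It works on a plain Python dict rather than a live Cassandra row, and the function name `parse_account_row` is hypothetical, not part of ElasticInbox.

```python
from collections import defaultdict


def parse_account_row(columns):
    """Split an Accounts row into label names and per-label attributes.

    `columns` is a plain dict standing in for one Cassandra row (assumption).
    """
    labels = {}
    attributes = defaultdict(dict)
    for name, value in columns.items():
        kind, _, rest = name.partition(":")
        if kind == "label":
            # "label:<id>" -> label name
            labels[int(rest)] = value
        elif kind == "lattr":
            # "lattr:<id>:<attribute name>" -> attribute value
            label_id, _, attr_name = rest.partition(":")
            attributes[int(label_id)][attr_name] = value
    return labels, dict(attributes)


row = {
    "label:0": "all",
    "label:1": "inbox",
    "label:1234": "MyLabel",
    "lattr:1234:color": "Green",
    "lattr:1234:MyAttribute": "My Text Value",
}
labels, attributes = parse_account_row(row)
```

Keeping labels and their attributes as separate columns in one row means a single row read is enough to load everything known about an account's labels.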
MessageMetadata is a super column family. Each row contains all messages for a particular account, identified by email address. This keeps all of an account's messages on the same Cassandra node and speeds up read operations.
Each super column contains information about a particular message, identified and ordered by message UUID. The message UUID is generated from the message time.
Schema syntax:
CREATE COLUMN FAMILY MessageMetadata WITH
column_type = Super AND
key_validation_class = UTF8Type AND
comparator = TimeUUIDType AND
subcomparator = BytesType AND
caching = keys_only AND
comment='Message metadata including headers, labels, markers, physical location, etc.';
Sample contents:
"MessageMetadata" {
"[email protected]" {
"550e8400-e29b-41d4-a716-446655440000" {
"from" : "[["EI Test","test@elasticinbox.com"]]", # JSON encoded data
"to" : "[["Me","[email protected]"],[...]]",
"subject" : "Hello world!",
"date" : "12 March 2011",
"location" : "blob://fs-local/container/[email protected]:753eef70-d5fb-14ce-abd4-040cced3bd7a",
"l:1" : true, # Label ID
"m:1" : true, # Marker ID
...
}
}
}
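To show how a message UUID can carry the message time, here is a rough sketch that builds a version 1 (time-based) UUID from a timestamp and reads the time back. The helper names `time_uuid` and `uuid_to_datetime` are hypothetical, and sorting by the embedded timestamp only approximates how a TimeUUID comparator orders columns; this is not ElasticInbox's own generator.

```python
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Offset between the UUID epoch (1582-10-15) and the Unix epoch, in 100 ns ticks.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid(moment, clock_seq=0, node=0):
    """Build a version 1 UUID whose timestamp field encodes `moment`."""
    delta = moment - UNIX_EPOCH
    ticks = ((delta.days * 86400 + delta.seconds) * 10_000_000
             + delta.microseconds * 10 + UUID_EPOCH_OFFSET)
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    clock_seq_hi_variant = ((clock_seq >> 8) & 0x3F) | 0x80  # RFC 4122 variant
    clock_seq_low = clock_seq & 0xFF
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version,
                             clock_seq_hi_variant, clock_seq_low, node))


def uuid_to_datetime(u):
    """Recover the message time from a version 1 UUID."""
    return UNIX_EPOCH + timedelta(microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)


older = time_uuid(datetime(2011, 3, 12, 10, 0, tzinfo=timezone.utc))
newer = time_uuid(datetime(2011, 3, 12, 11, 30, tzinfo=timezone.utc))
# Order messages by the time embedded in their UUIDs.
ordered = sorted([newer, older], key=lambda u: u.time)
```

Because the time is part of the column name, a slice over the super columns of one row returns messages in chronological order without a separate index.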
When enabled, the MessageBlob column family is used to store chunks of message blobs. Cassandra can be used as blob storage, storing messages in 128K chunks. Each key consists of the message UUID and the block (chunk) ID. In turn, columns represent the sub-block ID and block data. For more see ...
Schema syntax:
CREATE COLUMN FAMILY MessageBlob WITH
key_validation_class = 'CompositeType(TimeUUIDType, Int32Type)' AND
comparator = Int32Type AND
caching = keys_only AND
comment='Chunked message blobs';
Sample contents:
"MessageBlob" {
"550e8400-e29b-41d4-a716-446655440000:0" {
"0" : "binary.message.content.for.block.0.subblock.0"
...
},
"550e8400-e29b-41d4-a716-446655440000:1" {
"0" : "binary.message.content.for.block.1.subblock.0"
...
},
...
}
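The chunking scheme can be sketched as follows. This is an in-memory illustration only: row keys are modelled as `(message UUID, block ID)` tuples, each block is assumed to hold a single sub-block with ID 0, and the function names are hypothetical.

```python
import uuid

BLOCK_SIZE = 128 * 1024  # 128K chunk size mentioned above


def split_into_blocks(message_id, payload, block_size=BLOCK_SIZE):
    """Map a message blob to rows keyed by (message UUID, block ID)."""
    rows = {}
    for block_id, offset in enumerate(range(0, len(payload), block_size)):
        # Assumption: one sub-block (ID 0) per block.
        rows[(message_id, block_id)] = {0: payload[offset:offset + block_size]}
    return rows


def join_blocks(message_id, rows):
    """Reassemble a blob by reading blocks 0, 1, 2, ... until one is missing."""
    parts = []
    block_id = 0
    while (message_id, block_id) in rows:
        columns = rows[(message_id, block_id)]
        parts.extend(columns[sub_id] for sub_id in sorted(columns))
        block_id += 1
    return b"".join(parts)


message_id = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
payload = b"x" * (300 * 1024)  # 300K message: two full blocks and one partial
rows = split_into_blocks(message_id, payload)
```

Putting the block ID in the row key spreads the chunks of a large message across rows, so no single row has to hold the whole blob.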
IndexLabels is a reverse index for labels. Each row is uniquely identified by a composite key of email address and label ID. Each label index contains the UUIDs of the messages that belong to the label, sorted as TimeUUID.
Schema syntax:
CREATE COLUMN FAMILY IndexLabels WITH
key_validation_class = UTF8Type AND
comparator = TimeUUIDType AND
caching = all AND
comment = 'Message ID indexes grouped by labels and ordered by time';
Sample contents:
"IndexLabels" {
"[email protected]:1" {
"550e8400-e29b-41d4-a716-446655440000" : null,
"892e8300-e29b-41d4-a716-446655440000" : null,
"a0232400-e29b-41d4-a716-446655440000" : null,
...
}
}
In addition to the normal label indexes, there is a special purge index in the IndexLabels CF. The purge index keeps track of deleted messages.
Each time a message is deleted, ElasticInbox removes it from all label indexes and adds an entry to the purge index. The purge index's column name is the timestamp of the delete event (as a TimeUUID) and the column value is the message UUID.
Sample contents:
"IndexLabels" {
"[email protected]:purge" {
"550e8400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
"892e8300-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
"a0232400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
...
}
}
NOTE: The delete message operation does not remove the message from MessageMetadata or the blob store. This is done in order to 1) speed up the delete operation and 2) provide a restore mechanism in case of accidental deletes. Deleted messages should be purged periodically using an API call.
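The delete flow described above can be sketched against an in-memory stand-in for the IndexLabels CF. The function name `delete_message`, the dict layout, and the address `user@example.com` are assumptions for illustration; the actual ElasticInbox implementation may differ.

```python
import uuid


def delete_message(index_labels, account, message_id, deleted_at):
    """Drop a message from every label index and record it in the purge index.

    `index_labels` maps row keys such as "<email>:<label id>" to dicts of
    {column name: column value}; `deleted_at` is the TimeUUID of the delete event.
    """
    prefix = account + ":"
    purge_key = prefix + "purge"
    for row_key, columns in index_labels.items():
        if row_key.startswith(prefix) and row_key != purge_key:
            columns.pop(message_id, None)
    # Purge index: column name = delete event TimeUUID, value = message UUID.
    index_labels.setdefault(purge_key, {})[deleted_at] = message_id


account = "user@example.com"  # hypothetical account
message_id = uuid.UUID("550e8400-e29b-41d4-a716-446655440000")
index_labels = {
    account + ":0": {message_id: None},
    account + ":1": {message_id: None},
}
delete_event = uuid.uuid1()
delete_message(index_labels, account, message_id, delete_event)
```

Only index rows are touched here, which matches the note above: metadata and blobs stay in place until a later purge, so a restore only needs to re-add the index entries.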
Counters is a column family which keeps track of mailbox stats (it could potentially also be used for IMAP serial ID generation).
The following stats are currently stored for each label:
- Size in bytes (only available for the ALL_MAILS label)
- Total message count
- New message count
For total mailbox stats, query the ALL_MAILS label (ID=0).
Schema syntax:
CREATE COLUMN FAMILY Counters WITH
comparator = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)' AND
key_validation_class = UTF8Type AND
default_validation_class = CounterColumnType AND
replicate_on_write = true AND
caching = all AND
comment = 'All counters for an account';
Sample contents:
"Counters" {
"[email protected]" {
"l:0:b" : 18239090, # bytes, composite type, label ID, or other counter identifier
"l:0:m" : 394, # messages
"l:0:u" : 12, # unread
...
}
}
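As a rough illustration of reading these counters, the sketch below models each composite column name as a tuple of three strings, mirroring `CompositeType(UTF8Type,UTF8Type,UTF8Type)`. The function name `label_stats` and the dict-based row are assumptions, not the ElasticInbox API.

```python
ALL_MAILS = 0  # label ID used for total mailbox stats


def label_stats(counters, label_id):
    """Collect the byte, message, and unread counters for one label.

    `counters` stands in for one Counters row: {(type, label id, counter): value}.
    Missing counters are treated as zero (assumption).
    """
    def read(counter):
        return counters.get(("l", str(label_id), counter), 0)

    return {"bytes": read("b"), "messages": read("m"), "unread": read("u")}


counters = {
    ("l", "0", "b"): 18239090,
    ("l", "0", "m"): 394,
    ("l", "0", "u"): 12,
    ("l", "1", "m"): 120,
}
stats = label_stats(counters, ALL_MAILS)
inbox_stats = label_stats(counters, 1)
```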