This repository has been archived by the owner on Jun 7, 2023. It is now read-only.

Data Model

Rustam Aliyev edited this page Sep 19, 2013 · 8 revisions

ElasticInbox uses 5 column families (tables):

  • Accounts
  • MessageMetadata
  • MessageBlob
  • IndexLabels
  • Counters

The complete schema can be found here: https://github.com/elasticinbox/elasticinbox/blob/master/config/elasticinbox.cml

An RFC 5322-compatible email address is used as the unique account identifier in all CFs.

Accounts CF

This CF contains account information such as labels and custom attributes.

Schema syntax:

CREATE COLUMN FAMILY Accounts WITH 
	key_validation_class = UTF8Type AND
	caching = all AND
	comment = 'Basic information about accounts';

Sample contents:

"Accounts" {
    "[email protected]" {
        "label:0"  : "all",
        "label:1"  : "inbox",
        "label:2"  : "drafts",
        ...
        "label:1234" : "MyLabel",
        "lattr:1234:color" : "Green",
        "lattr:1234:MyAttribute" : "My Text Value",
        ...
    }
}
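
The column naming convention in the sample above uses `label:<id>` for a label name and `lattr:<id>:<attribute>` for a label attribute. It can be decoded with a few lines of code. The sketch below is a minimal illustration that assumes a row has already been read into a Python dict of column name to value. `parse_account_row` is a hypothetical helper, not part of ElasticInbox.

```python
from collections import defaultdict

def parse_account_row(columns):
    """Group 'label:<id>' and 'lattr:<id>:<name>' columns by label ID.

    Hypothetical helper: assumes column names and values are already decoded to str.
    """
    labels = {}
    attributes = defaultdict(dict)
    for name, value in columns.items():
        parts = name.split(":", 2)
        if parts[0] == "label" and len(parts) == 2:
            labels[int(parts[1])] = value
        elif parts[0] == "lattr" and len(parts) == 3:
            attributes[int(parts[1])][parts[2]] = value
        # Any other columns (e.g. custom account attributes) are ignored here.
    return labels, dict(attributes)

row = {
    "label:0": "all",
    "label:1": "inbox",
    "label:2": "drafts",
    "label:1234": "MyLabel",
    "lattr:1234:color": "Green",
    "lattr:1234:MyAttribute": "My Text Value",
}
labels, attrs = parse_account_row(row)
```

Splitting with a limit of 2 keeps an attribute name intact even if it contains a colon.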

MessageMetadata SCF

MessageMetadata is a super column family. Each row contains the metadata of all messages for a particular account, identified by its email address. Keeping all of an account's messages in one row stores them on the same Cassandra node and speeds up read operations.

Each super column holds information about a particular message and is identified and ordered by the message UUID. The message UUID is a time-based UUID (TimeUUID) generated from the message time.
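
Because super columns are ordered by a time-based UUID, a message UUID can be derived from the message date. The sketch below builds an RFC 4122 version 1 UUID for a given timestamp using only the Python standard library. It assumes a random clock sequence and a random, multicast-flagged node ID; how ElasticInbox itself fills those fields is not described on this page. The fields are assembled by hand because Python's `uuid.uuid1()` always uses the current time.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def time_uuid_from_datetime(dt):
    """Build a version 1 (time-based) UUID whose timestamp is `dt` (timezone-aware)."""
    delta = dt - UNIX_EPOCH
    # Integer arithmetic avoids float rounding: 1 microsecond = 10 ticks of 100 ns.
    ticks = (delta.days * 86_400 + delta.seconds) * 10_000_000 + delta.microseconds * 10
    ts = ticks + UUID_EPOCH_OFFSET
    clock_seq = int.from_bytes(os.urandom(2), "big") & 0x3FFF  # assumption: random
    node = int.from_bytes(os.urandom(6), "big") | (1 << 40)      # assumption: random, multicast bit set
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi with version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi with RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        node,
    ))

def datetime_from_time_uuid(u):
    """Recover the embedded timestamp (microsecond precision) from a version 1 UUID."""
    return UNIX_EPOCH + timedelta(microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)
```

Sorting such UUIDs with `key=lambda u: u.time` orders them by their embedded timestamp, which is the primary ordering TimeUUIDType applies.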

Schema syntax:

CREATE COLUMN FAMILY MessageMetadata WITH 
	column_type = Super AND
	key_validation_class = UTF8Type AND
	comparator = TimeUUIDType AND 
	subcomparator = BytesType AND
	caching = keys_only AND
	comment='Message metadata including headers, labels, markers, physical location, etc.';

Sample contents:

"MessageMetadata" {
    "[email protected]" {
        "550e8400-e29b-41d4-a716-446655440000" {
            "from"     : "[["EI Test","test@elasticinbox.com"]]", # JSON encoded data
            "to"       : "[["Me","[email protected]"],[...]]",
            "subject"  : "Hello world!",
            "date"     : "12 March 2011",
            "location" : "blob://fs-local/container/[email protected]:753eef70-d5fb-14ce-abd4-040cced3bd7a",
            "l:1"      : true,   # Label ID
            "m:1"      : true,   # Marker ID
            ...
        }
    }
}
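
A client reading one super column from the sample above has to decode the JSON-encoded address lists and collect the `l:` (label) and `m:` (marker) flag columns. The sketch below assumes the super column has already been fetched into a dict of decoded strings. Only `from` and `to` are treated as JSON because those are the address columns shown above. `decode_message_metadata` is a hypothetical helper.

```python
import json

def decode_message_metadata(columns):
    """Turn one MessageMetadata super column into a plain dict (hypothetical helper)."""
    message = {"labels": set(), "markers": set()}
    for name, value in columns.items():
        if name.startswith("l:"):
            message["labels"].add(int(name[2:]))   # label ID flag
        elif name.startswith("m:"):
            message["markers"].add(int(name[2:]))  # marker ID flag
        elif name in ("from", "to"):
            # Address lists are JSON arrays of [display name, address] pairs.
            message[name] = [tuple(pair) for pair in json.loads(value)]
        else:
            message[name] = value  # subject, date, location, ... kept as-is
    return message

meta = decode_message_metadata({
    "from": '[["EI Test","test@elasticinbox.com"]]',
    "subject": "Hello world!",
    "date": "12 March 2011",
    "l:1": True,
    "m:1": True,
})
```

The `location` column does not match the `l:` prefix check, so it is kept as a plain value.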

MessageBlob CF

When enabled, the MessageBlob column family stores message blobs in chunks: Cassandra then acts as blob storage and keeps messages in 128K chunks. Each row key consists of the message UUID and the block (chunk) ID. In turn, column names are sub-block IDs and column values hold the corresponding sub-block data. For more see ...

Schema syntax:

CREATE COLUMN FAMILY MessageBlob WITH
    key_validation_class = 'CompositeType(TimeUUIDType, Int32Type)' AND
    comparator = Int32Type AND
    caching = keys_only AND
    comment='Chunked message blobs';

Sample contents:

"MessageBlob" {
    "550e8400-e29b-41d4-a716-446655440000:0" {
        "0" : "binary.message.content.for.block.0.subblock.0"
        ...
    },
    "550e8400-e29b-41d4-a716-446655440000:1" {
        "0" : "binary.message.content.for.block.1.subblock.0"
        ...
    },
    ...
}
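
The chunking scheme can be sketched with an in-memory dict whose keys are `(message UUID, block ID)` tuples, mirroring `CompositeType(TimeUUIDType, Int32Type)`. This is a minimal illustration, not ElasticInbox's code. It assumes "128K" means 128 KiB, and `SUBBLOCK_SIZE` is a hypothetical value because the sub-block size is not given on this page.

```python
BLOCK_SIZE = 128 * 1024      # 128K per block (KiB assumed)
SUBBLOCK_SIZE = 32 * 1024    # hypothetical sub-block size, not specified on this page

def split_blob(message_id, data, block_size=BLOCK_SIZE, subblock_size=SUBBLOCK_SIZE):
    """Map (message UUID, block ID) row keys to {sub-block ID: bytes} columns."""
    rows = {}
    for block_id, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        rows[(message_id, block_id)] = {
            sub_id: block[off:off + subblock_size]
            for sub_id, off in enumerate(range(0, len(block), subblock_size))
        }
    return rows

def join_blob(message_id, rows):
    """Reassemble a blob by reading blocks 0, 1, 2, ... and their sub-blocks in ID order."""
    out = bytearray()
    block_id = 0
    while (message_id, block_id) in rows:
        columns = rows[(message_id, block_id)]
        for sub_id in sorted(columns):
            out += columns[sub_id]
        block_id += 1
    return bytes(out)

mid = "550e8400-e29b-41d4-a716-446655440000"
data = bytes(range(256)) * 1200          # 307,200 bytes
rows = split_blob(mid, data)
```

Reading stops at the first missing block ID, so a reader only needs consecutive block IDs starting at 0.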

IndexLabels CF

IndexLabels is a reverse index for labels. Each row is uniquely identified by a key composed of the email address and the label ID, separated by a colon. Each label index contains the UUIDs of the messages that belong to that label, sorted as TimeUUIDs.

Schema syntax:

CREATE COLUMN FAMILY IndexLabels WITH
	key_validation_class = UTF8Type AND
	comparator = TimeUUIDType AND 
	caching = all AND
	comment = 'Message ID indexes grouped by labels and ordered by time';

Sample contents:

"IndexLabels" {
    "[email protected]:1" {
        "550e8400-e29b-41d4-a716-446655440000" : null,
        "892e8300-e29b-41d4-a716-446655440000" : null,
        "a0232400-e29b-41d4-a716-446655440000" : null,
        ...
    }
}
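
Listing a label, for example the newest messages in the inbox, reads a single index row and takes the most recent column names. The sketch below models the column family as a Python dict of row key to `{message UUID: None}`. A real client would request a reversed column slice from Cassandra instead of sorting in memory. All function names are hypothetical.

```python
import uuid

def index_key(account, label_id):
    """Row key of a label index: email address and label ID joined by ':'."""
    return f"{account}:{label_id}"

def add_to_labels(index, account, message_id, label_ids):
    """Insert a message UUID into each label's index row (the column value is unused)."""
    for label_id in label_ids:
        index.setdefault(index_key(account, label_id), {})[message_id] = None

def newest_messages(index, account, label_id, limit):
    """Return up to `limit` message UUIDs of a label, newest first.

    Sorting on UUID.time follows the timestamp ordering of TimeUUIDType;
    ties on the timestamp are not broken here.
    """
    row = index.get(index_key(account, label_id), {})
    return sorted(row, key=lambda u: u.time, reverse=True)[:limit]
```

Because a message is written to one row per label, listing any label is a single-row read.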

Purge Indexes

In addition to the normal label indexes, the IndexLabels CF holds a special purge index, which keeps track of deleted messages.

Each time a message is deleted, ElasticInbox removes it from all label indexes and adds an entry to the purge index. In the purge index, the column name is the timestamp of the delete event (as a TimeUUID) and the column value is the message UUID.

Sample contents:

"IndexLabels" {
    "[email protected]:purge" {
        "550e8400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        "892e8300-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        "a0232400-e29b-41d4-a716-446655440000" : "892e8300-e29b-41d4-a716-446655440000",
        ...
    }
}
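
The delete step described above can be sketched against the same kind of in-memory model: one dict per index row, keyed as in the samples. The sketch assumes the caller already knows the message's label IDs, for example from its `l:` metadata columns. `delete_message` is a hypothetical helper. `uuid.uuid1()` supplies a time-based UUID for the current time, used as the delete-event timestamp.

```python
import uuid

def delete_message(index, account, message_id, label_ids):
    """Remove a message from its label indexes and record it in the purge index.

    Purge index column name: TimeUUID of the delete event; column value: message UUID.
    Metadata and blob data are left untouched.
    """
    for label_id in label_ids:
        index.get(f"{account}:{label_id}", {}).pop(message_id, None)
    delete_event = uuid.uuid1()  # time-based UUID for "now"
    index.setdefault(f"{account}:purge", {})[delete_event] = message_id
    return delete_event
```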

NOTE: the delete message operation does not remove the message from MessageMetadata or the blob store. This is done in order to 1) speed up the delete operation and 2) provide a restore mechanism in case of accidental deletes. Deleted messages should be purged periodically via an API call.
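
A periodic purge job first has to find the delete events that are old enough to act on. Because purge index column names are TimeUUIDs, the delete time can be read straight from each column name. The sketch below is a hypothetical illustration and does not reproduce the ElasticInbox purge API. It assumes a purge row has been read into a dict of delete-event UUID to message UUID, and that the retention cutoff is chosen by the caller.

```python
import uuid
from datetime import datetime, timedelta, timezone

UUID_EPOCH_OFFSET = 0x01B21DD213814000  # 100 ns ticks from 1582-10-15 to 1970-01-01

def event_time(u):
    """Timestamp embedded in a version 1 UUID, as an aware UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
        microseconds=(u.time - UUID_EPOCH_OFFSET) // 10)

def collect_purgeable(purge_row, older_than):
    """Select purge entries whose delete event happened before `older_than`.

    Returns (message UUIDs to purge, delete-event column names to remove),
    oldest first. Removing metadata and blobs is left to the caller.
    """
    events = sorted((e for e in purge_row if event_time(e) < older_than),
                    key=lambda e: e.time)
    return [purge_row[e] for e in events], events
```

Keeping recent entries in the purge index is what makes the restore window possible: anything newer than the cutoff can still be undeleted.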

Counters CF

Counters is a column family that keeps track of mailbox statistics (it could potentially also be used for IMAP serial ID generation).

The following stats are currently stored for each label:

  • Size in bytes (only available for the ALL_MAILS label)
  • Total message count
  • New message count

For total mailbox stats, query the ALL_MAILS label (ID=0).

Schema syntax:

CREATE COLUMN FAMILY Counters WITH
	comparator = 'CompositeType(UTF8Type,UTF8Type,UTF8Type)' AND
	key_validation_class = UTF8Type AND
	default_validation_class = CounterColumnType AND
	replicate_on_write = true AND
	caching = all AND
	comment = 'All counters for an account';

Sample contents:

"Counters" {
    "[email protected]" {
        "l:0:b" : 18239090,   # bytes; composite name = (type, label ID or other counter identifier, counter)
        "l:0:m" : 394,        # messages
        "l:0:u" : 12,         # unread
        ...
    }
}
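
When a message is stored, the matching counters are incremented. The sketch below computes those increments for one new message. It models each composite column name (`CompositeType(UTF8Type,UTF8Type,UTF8Type)`, written `l:0:b` in the sample) as a tuple of three strings. It assumes every message is also counted under ALL_MAILS (ID 0), and, following the list above, tracks bytes only for that label. `counter_deltas` is a hypothetical helper. Cassandra counter columns are updated by adding deltas like these.

```python
from collections import Counter

ALL_MAILS = 0  # label ID that holds mailbox-wide totals

def counter_deltas(label_ids, size_bytes, unread):
    """Counter increments for one newly stored message (hypothetical helper).

    Keys: ("l", label ID, counter) where counter is "b" (bytes),
    "m" (messages) or "u" (unread).
    """
    deltas = Counter()
    for label_id in set(label_ids) | {ALL_MAILS}:  # assumption: ALL_MAILS always counted
        deltas[("l", str(label_id), "m")] += 1
        if unread:
            deltas[("l", str(label_id), "u")] += 1
    deltas[("l", str(ALL_MAILS), "b")] += size_bytes  # size only for ALL_MAILS
    return dict(deltas)
```

Deleting or reading a message would apply the corresponding negative deltas.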