Skip to content

Indexes Redevelopment

anatoliychakkaev edited this page Dec 20, 2012 · 2 revisions

Schema

Example Content schema definition:

var Content = db.define('Content', {
        type: String,
        title: String,
        text: String,
        excerpt: String,
        previewImage: String,
        attachment: db.JSON,
        poll: db.JSON,
        location: db.JSON,
        comments: { type: [], fulltext: true },
        repliesTotal: Number,
        likes: [],
        dislikes: [],
        likesTotal: Number,
        views: Number,
        score: Number,
        groupId: { type: Number, index: true },
        replyToId: { type: Number, index: true },
        tags: [],
        tagString: String,
        authorId: { type: Number, index: true },
        privacy: { type: String, index: 'unsorted' },
        createdAt: { type: Date, defaultSort: 'desc' },
        updatedAt: Date,
        timestamp: Number,
        priority: Number,
        url: { type: String, index: 'unique' },
        fulltext: String,
        importData: db.JSON
    });
db.adapter.defineTagsIndex('Content', 'tags');

There are a few changes to the existing schema definition:

defaultSort

The defaultSort property defines that this property is used to define the default sort order and whether it is ascending or descending. If this is not defined for a model we should default to ‘id desc’ - which is in most cases the same as ‘createdAt desc’. The defaultSort property defines the value that is used for the sorted indexes as the score value.

index: "unique"

A unique index should only ever have one value. Therefore we can use a string to define the value rather than a set or sorted set.

index: "unsorted"

Some indexes do not need to be sorted. These can be indexes that are only used in conjunction with another index. E.g. privacy is only ever used alongside a groupId index or a tag index.

Tags

Tags can be used to define arbitrary indexes which are able to define their own sort property. An example set of tags for a content item might be as follows:

post.id = 1
post.title = 'hello, world';
post.groupId = 1;
post.privacy = 'public';
post.url = 'localhost:3000/hello-world';
post.createdAt = '01 jan 2001';
post.score = 5;
post.numberOfComments = 3;
post.tags = {
        { id: 1, index: 'group1-1', name: 'blog' },
        { id: 2, index: 'group1-2', name: 'popular', sort: 'score desc' },
        { id: 5, index: 'group1-5', name: 'most comments', sort: 'numberOfComments desc' }
};
post.save();

When the content item post is save to the database, the standard indexes are first processed. Then the tags are iterated through and their index value added. In this example, the Redis commands would be:

#standard indexes
ZADD z:Content:groupId:1 01-jan-2001 1
SADD i:Content:privacy:public 1
SET u:Content:url:localhost:3000/hello-world 1
#tag indexes
ZADD z:Content:tags:group1-1 01-jan-2001 1
ZADD z:Content:tags:group1-2 5 1
ZADD z:Content:tags:group1-5 3 1

Querying

Querying via JugglingDB is as per usual but the Redis commands that are issued to the database may be different depending on index used. Here are a few examples: Complexity is calculated assuming that each tag or field index contains 1,000,000 records.

Get latest 10 posts for groupId=1

Note that because the defaultSort is descending, we use ZREVRANGE rather than ZRANGE

db.model(‘Content’).all({ where: { groupId; 1 }}, function(err, posts) {  }});
ZREVRANGE z:Content:groupId:1 0 10

Complexity = log(1000000) + 10 = 6 + 10 = 16

Get the most popular 10 posts for groupId=1

This will return the top 10 posts in order of score

var tagId = group.getTagId(‘popular’);
db.model(‘Content’).all({ where: { tags: tagId }}, function(err, posts) {  }});
ZREVRANGE z:Content:tags:group1-2 0 10

Complexity = log(1000000) + 10 = 6 + 10 = 16

Get the posts with the most comments which are also public for groupId=1

This will return the top 50 public posts with the most comments

var tagId = group.getTagId(‘most comments’);
db.model(‘Content’).all({ where: { tags: tagId, privacy: ‘public’ }}, function(err, posts) {  }});
ZINTERSTORE temp123 z:Content:tags:group1-2 i:Content:privacy:public
EXPIRE temp123 7
ZREVRANGE temp123 0 50

Complexity = (1,000,000 * 2) + 333,333 * log(333,333) = 3,840,000

The last example is clearly inefficient and should be replaced with a tag called ‘most popular public’ which would reduce the complexity to a maximum of 16 ops.