
Indexes

Flavian Alexandru edited this page Sep 17, 2016 · 17 revisions

phantom uses a specific set of traits to enforce more advanced Cassandra limitations and schema rules at compile time. Instead of waiting for Cassandra to tell you you've done bad things, phantom won't let you compile them, saving you a lot of time.

The error messages you would get at runtime are now available at compile time, with full domain awareness. That means phantom "knows" what the rules are in Cassandra, so it will automatically prevent you from doing a lot of "bad" things at compile time. One example is using a non-indexed column in a where clause.

Modelling indexes and queries

This is the full list of options available with respect to Cassandra features, followed by a guide that shows you how to create every single one of them in phantom.

  • Partition keys
  • Compound keys
  • Composite keys
  • Secondary indexes
  • Indexed collections
  • SASI indexes
  • Materialised views
  • How phantom prevents errors at compile time
  • Tips and tricks

Partition keys.

How phantom prevents errors at compile time

import com.websudos.phantom.dsl._

case class Student(
  id: UUID,
  name: String
)

class Students extends CassandraTable[Students, Student] {
  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object name extends StringColumn(this)

  def fromRow(row: Row): Student = Student(id(row), name(row))
}

object Students extends Students with Connector {

  /**
   * The below code will result in a compilation error phantom produces by design.
   * This behaviour is not only correct with respect to CQL but also intended by the implementation.
   *
   * The reason why it won't compile is because the "name" column is not an index in the "Students" table, which means using "name" in a "where" clause is
   * invalid CQL. Phantom prevents you from running most invalid queries by simply giving you a compile time error instead.
   */
  def getByName(name: String): Future[Option[Student]] = {
    // BOOM, this is a problem. "name" is not a primary key and therefore this query is invalid.
    select.where(_.name eqs name).one()
  }
}

The compilation error message for the above looks something like this, and what it's telling us is that the eqs operator for equality is not available on the name column, and that's because there is no index defined on the name column in the schema DSL.

 value eqs is not a member of object x$9.name

The way it works might seem overly mysterious to start with, but the logic is simple. There is no implicit conversion in scope to convert your non-indexed column to a QueryColumn. If you don't have an index, you can't query.
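The mechanism can be illustrated without phantom at all. The following is a minimal sketch, not phantom's actual source: Indexed, Column, QueryColumn, Clause and indexedToQueryColumn are hypothetical names invented for this illustration. It shows how an implicit conversion that demands a marker trait makes eqs available only on columns that carry the marker.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for illustration; these are NOT phantom's real types.
trait Indexed // marker, playing the role of PartitionKey / PrimaryKey / Index
class Column(val name: String)
final case class Clause(cql: String)

// The only place where `eqs` is defined.
class QueryColumn(col: Column) {
  def eqs(value: String): Clause = Clause(s"${col.name} = '$value'")
}

object Implicits {
  // The conversion requires the Indexed marker, so plain columns never gain `eqs`.
  implicit def indexedToQueryColumn(col: Column with Indexed): QueryColumn =
    new QueryColumn(col)
}

object Demo {
  import Implicits._

  object id extends Column("id") with Indexed
  object name extends Column("name")

  // Compiles: `id` carries the marker, so the conversion applies.
  val ok: Clause = id eqs "42"

  // Would not compile if uncommented, because no conversion applies to `name`:
  // val bad: Clause = name eqs "test"
}
```

Uncommenting the last line in this sketch produces the same kind of "value eqs is not a member of" error shown above, because the compiler finds no conversion from a column without the marker to something that defines eqs.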

  // now we are using `id` in the where clause, which is a valid index so this will compile
  Students.update.where(_.id eqs someId).onlyIf(_.name is "test")

PartitionKey[T] marks the default partitioning key of the table, telling Cassandra how to divide data into partitions and store them accordingly. You must define at least one partition key for a table. Phantom will gently remind you of this with a fatal error.

If you use a single partition key, the PartitionKey will always be the first PrimaryKey in the schema. Phantom distinguishes between the two types of keys using separate traits for PartitionKey and PrimaryKey, as well as a separate ClusteringKey when you want to define ordering.

Let's take for example the following CQL table, describing culinary recipes:

CREATE TABLE IF NOT EXISTS somekeyspace.recipes(
  url text,
  description text,
  ingredients list<text>,
  servings int,
  lastcheckedat timestamp,
  props map<text, text>,
  uid uuid,
  PRIMARY KEY (url)
);

To model it in phantom, we basically need a single partition key defined on the schema, and the entire CQL PRIMARY KEY of the table will be composed from a single column. So in this example we chose to index by the url field, just like in the CQL above.

class Recipes extends CassandraTable[ConcreteRecipes, Recipe] {

  // notice we explicitly mix in PartitionKey here.
  object url extends StringColumn(this) with PartitionKey[String]

  object description extends OptionalStringColumn(this)

  object ingredients extends ListColumn[String](this)

  object servings extends OptionalIntColumn(this)

  object lastcheckedat extends DateTimeColumn(this)

  object props extends MapColumn[String, String](this)

  object uid extends UUIDColumn(this)


  override def fromRow(r: Row): Recipe = {
    Recipe(
      url(r),
      description(r),
      ingredients(r),
      servings(r),
      lastcheckedat(r),
      props(r),
      uid(r)
    )
  }
}

Using more than one PartitionKey[T] in your schema definition will output a Composite Key in Cassandra.
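As a sketch of what such a schema could look like, the table below mixes PartitionKey into two columns. The Reading model, the Readings table and all column names are made up for this example; it assumes only the traits and column types already used elsewhere on this page, and that UUID, DateTime and Row come from the dsl import as in the other examples.

```scala
import com.websudos.phantom.dsl._

// Hypothetical model, invented for this example.
case class Reading(sensorId: UUID, day: String, takenAt: DateTime, measurement: Long)

class Readings extends CassandraTable[Readings, Reading] {
  // Two PartitionKey columns: together they form the composite partition key.
  object sensor_id extends UUIDColumn(this) with PartitionKey[UUID]
  object day extends StringColumn(this) with PartitionKey[String]

  // A PrimaryKey column that follows the partition keys.
  object taken_at extends DateTimeColumn(this) with PrimaryKey[DateTime]

  // A regular, non-indexed column.
  object measurement extends LongColumn(this)

  def fromRow(row: Row): Reading =
    Reading(sensor_id(row), day(row), taken_at(row), measurement(row))
}
```

Under these assumptions, the two partition key columns would be grouped together in the generated primary key, with taken_at following them.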

The CQL equivalent of a schema with several partition keys looks like this:

PRIMARY KEY (
  // First the PartitionKeys
  (your_partition_key_1, your_partition_key_2),

  // and then the primary keys or the clustering columns
  primary_key_1, primary_key_2)

As its name says, mixing in PrimaryKey[T] marks a column as part of the primary key. Using multiple values will result in a compound key. The first key in the primary key is always the one used to partition data, so phantom forces you to define a PartitionKey explicitly and you never lose track of how your data is partitioned.

In essence, a compound key means your table has exactly one partition key and at least one primary key. This mix is called a compound key.

A compound key in C* looks like this:

PRIMARY KEY (
     partition_key,
     primary_key_1,
     primary_key_2
)
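The key shapes described so far differ only in how the PRIMARY KEY clause is parenthesised. The helper below is a hypothetical illustration of that rule in plain Scala; PrimaryKeyClause and render are invented names and this is not how phantom generates its schema internally.

```scala
// Hypothetical helper for illustration only; not part of phantom.
object PrimaryKeyClause {

  /** Renders the PRIMARY KEY clause for the given partition and clustering columns. */
  def render(partitionKeys: Seq[String], clusteringKeys: Seq[String] = Nil): String = {
    require(partitionKeys.nonEmpty, "at least one partition key is required")

    // A single partition key is written bare;
    // several partition keys are wrapped in their own parentheses.
    val partition = partitionKeys match {
      case Seq(single) => single
      case many        => many.mkString("(", ", ", ")")
    }

    (partition +: clusteringKeys).mkString("PRIMARY KEY (", ", ", ")")
  }
}
```

For example, render(Seq("url")) gives the single-key form used by the recipes table, render(Seq("partition_key"), Seq("primary_key_1", "primary_key_2")) gives the compound form, and passing two partition keys gives the composite form with the inner parentheses.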

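Before you add too many of these, remember that a where clause has to respect them in order.
You can only query with the full partition key, optionally followed by primary keys in the order they are declared, even if the key is compound. You cannot skip a key: primary_key_2 can only be restricted once primary_key_1 is. phantom can't yet give you a compile time error for this, but Cassandra will give you a runtime one.

Because of how Cassandra works, equality where clauses on the above table are limited to the following, with your_table standing in for the table name:

```sql
SELECT * FROM your_table WHERE partition_key = 'some value';
SELECT * FROM your_table WHERE partition_key = 'some value' AND primary_key_1 = 'some_other_1';
SELECT * FROM your_table WHERE partition_key = 'some value' AND primary_key_1 = 'some_other_1' AND primary_key_2 = 'some_other_2';
```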

If you want any other kind of where clause match, you will need alternative modelling approaches to obtain it. You can of course use any valid where operator in the clauses above, so you are not limited to eqs.

Index[T] defines a secondary index in Cassandra. It is a quick way to enable querying on a column, but it's not exactly high performance. It's generally best to avoid it; we implemented it for the sake of being feature complete.

When you mix in Index[T] on a column, phantom will let you use it in a where clause.

When you want to use a column in a where clause, you need an index on it. Cassandra data modelling is a more convoluted topic, but phantom offers com.websudos.phantom.keys.Index to enable querying.

The CQL 3 schema for secondary indexes can also be auto-generated with ExampleRecord4.create() and is directly taken care of by table auto-generation. Phantom is capable of analysing your schema DSL and creating indexes at the correct time in the application, namely only after the tables have been created, otherwise the index creation will fail since it will refer to a non-existing table.

SELECT is the only query you can perform with an Index column. This is a Cassandra limitation. The relevant tests are found here.

import com.websudos.phantom.dsl._

sealed class ExampleRecord4 extends CassandraTable[ExampleRecord4, ExampleModel] {

  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object order_id extends LongColumn(this) with ClusteringOrder[Long] with Descending
  object timestamp extends DateTimeColumn(this) with Index[DateTime]
  object name extends StringColumn(this) with Index[String]
  object props extends MapColumn[ExampleRecord4, ExampleModel, String, String](this)
  object test extends OptionalIntColumn(this)

  override def fromRow(row: Row): ExampleModel = {
    ExampleModel(id(row), name(row), props(row), timestamp(row), test(row))
  }
}

ClusteringOrder[T] can be used with, for example, java.util.Date or org.joda.time.DateTime columns. It tells Cassandra to store records in a certain order based on this field.

An example might be: object timestamp extends DateTimeColumn(this) with ClusteringOrder[DateTime] with Ascending.

To fully define a clustering column, you MUST also mix in either Ascending or Descending to indicate the sorting order.


Phantom also supports using compound keys out of the box. The schema can once again be auto-generated.

A table can have multiple PartitionKey keys and several PrimaryKey definitions. Phantom will use these keys to build a compound value. Example scenario, with the compound key: (id, timestamp, name)

import com.websudos.phantom.dsl._

sealed class ExampleRecord3 extends CassandraTable[ExampleRecord3, ExampleModel] {

  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object order_id extends LongColumn(this) with ClusteringOrder[Long] with Descending
  object timestamp extends DateTimeColumn(this) with PrimaryKey[DateTime]
  object name extends StringColumn(this) with PrimaryKey[String]
  object props extends MapColumn[ExampleRecord3, ExampleModel, String, String](this)
  object test extends OptionalIntColumn(this)

  override def fromRow(row: Row): ExampleModel = {
    ExampleModel(id(row), name(row), props(row), timestamp(row), test(row))
  }
}


import scala.concurrent.Await
import scala.concurrent.duration._
import com.websudos.phantom.dsl._

sealed class ExampleRecord2 extends CassandraTable[ExampleRecord2, ExampleModel] {

  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object order_id extends LongColumn(this) with ClusteringOrder[Long] with Descending
  object timestamp extends DateTimeColumn(this)
  object name extends StringColumn(this)
  object props extends MapColumn[ExampleRecord2, ExampleModel, String, String](this)
  object test extends OptionalIntColumn(this)

  override def fromRow(row: Row): ExampleModel = {
    ExampleModel(id(row), name(row), props(row), timestamp(row), test(row))
  }
}


val orderedResult = Await.result(Articles.select.where(_.id gtToken one.get.id).fetch, 5000.millis)


| Operator name | Description |
| --- | --- |
| eqsToken | The "equals" operator. Will match if the record's token is equal to the argument's. |
| gtToken | The "greater than" operator. Will match if the record's token is greater than the argument's. |
| gteToken | The "greater than or equals" operator. Will match if the record's token is greater than or equal to the argument's. |
| ltToken | The "lower than" operator. Will match if the record's token is less than the argument's. |
| lteToken | The "lower than or equals" operator. Will match if the record's token is less than or equal to the argument's. |

For more details on how to use Cassandra partition tokens, see SkipRecordsByToken.scala