
Indexes

Flavian Alexandru edited this page Sep 7, 2016 · 17 revisions


phantom uses a specific set of traits to enforce more advanced Cassandra limitations and schema rules at compile time. Instead of waiting for Cassandra to tell you you've done bad things, phantom won't let you compile them, saving you a lot of time.

The error messages you get when your model violates Cassandra rules are not particularly helpful, and we are working on a better builder that produces clearer ones. Until then, here is the kind of code that triggers them:

import com.websudos.phantom.dsl._

case class Student(id: UUID, name: String)

class Students extends CassandraTable[Students, Student] {
  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object name extends StringColumn(this)

  def fromRow(row: Row): Student = {
    Student(id(row), name(row))
  }
}

object Students extends Students with Connector {

  /**
   * The code below results in a compilation error, which phantom produces by design.
   * This behaviour is not only correct with respect to CQL but also intended by the implementation.
   *
   * It won't compile because the "name" column is not an index in the "Students" table, so using "name" in a "where" clause is
   * invalid CQL. Phantom prevents you from running most invalid queries by giving you a compile time error instead.
   */
  def getByName(name: String): Future[Option[Student]] = {
    select.where(_.name eqs name).one()
  }
}

The compilation error message for the above looks something like this:

 value eqs is not a member of object x$9.name

This might seem mysterious at first, but the logic is simple. There is no implicit conversion in scope that turns your non-indexed column into a QueryColumn, which is the type that provides eqs. If a column has no index, you can't query by it.
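To make the mechanism concrete, here is a heavily simplified, self-contained sketch of the same idea in plain Scala. Column, Indexed, QueryColumn and Clause are hypothetical stand-ins, not phantom's real classes. The point is only that eqs lives on QueryColumn and the single implicit conversion to QueryColumn accepts Indexed columns, so a plain column simply has no eqs member.

```scala
import scala.language.implicitConversions

// Hypothetical, simplified model of the approach; these are NOT phantom's real types.
object IndexGatingSketch {
  // Every column knows its CQL name.
  trait Column[T] { def name: String }

  // Marker for columns that may appear in a where clause (keys or secondary indexes).
  trait Indexed[T] extends Column[T]

  // A rendered where-clause condition.
  final case class Clause(cql: String)

  // The only type that offers the eqs operator.
  final class QueryColumn[T](col: Indexed[T]) {
    def eqs(value: T): Clause = Clause(s"${col.name} = $value")
  }

  // Implicit view that exists only for Indexed columns.
  implicit def toQueryColumn[T](col: Indexed[T]): QueryColumn[T] = new QueryColumn(col)

  object id extends Indexed[Int] { val name = "id" }
  object studentName extends Column[String] { val name = "name" }

  // Compiles: id is Indexed, so it is converted to a QueryColumn.
  val byId: Clause = id eqs 42

  // Would NOT compile: there is no conversion for a plain Column, so
  // studentName eqs "Alice" fails with an error like
  // "value eqs is not a member of object studentName".
}
```

The compile error you see in phantom comes from the same place: the compiler looks for a conversion that provides eqs, finds none for a non-indexed column, and reports the missing member.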

  Students.update.where(_.id eqs someId).onlyIf(_.name is "test")

PartitionKey[T] marks the partition key of the table, which tells Cassandra how to divide data into partitions and store them accordingly. You must define at least one partition key for a table; phantom will gently remind you of this with a fatal error.

If you use a single partition key, the PartitionKey will always be the first PrimaryKey in the schema. Phantom distinguishes between the two types of keys using separate traits for PartitionKey and PrimaryKey, as well as a separate ClusteringOrder trait when you want to define ordering.

Let's take for example the following CQL table, describing culinary recipes:

CREATE TABLE IF NOT EXISTS somekeyspace.recipes(
  url text,
  description text,
  ingredients list<text>,
  servings int,
  lastcheckedat timestamp,
  props map<text, text>,
  uid uuid,
  PRIMARY KEY (url)
);

To model it in phantom, we need a single partition key in the schema, so the entire CQL PRIMARY KEY of the table consists of one column. In this example we key the table by the url field, just like in the CQL above.

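import com.websudos.phantom.dsl._

// The record type each row of the table is mapped to.
case class Recipe(
  url: String,
  description: Option[String],
  ingredients: List[String],
  servings: Option[Int],
  lastcheckedat: DateTime,
  props: Map[String, String],
  uid: UUID
)

class Recipes extends CassandraTable[Recipes, Recipe] {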

  // notice we explicitly mix in PartitionKey here.
  object url extends StringColumn(this) with PartitionKey[String]

  object description extends OptionalStringColumn(this)

  object ingredients extends ListColumn[String](this)

  object servings extends OptionalIntColumn(this)

  object lastcheckedat extends DateTimeColumn(this)

  object props extends MapColumn[String, String](this)

  object uid extends UUIDColumn(this)


  override def fromRow(r: Row): Recipe = {
    Recipe(
      url(r),
      description(r),
      ingredients(r),
      servings(r),
      lastcheckedat(r),
      props(r),
      uid(r)
    )
  }
}

Using more than one PartitionKey[T] in your schema definition produces a composite partition key in Cassandra: PRIMARY KEY ((partition_key_1, partition_key_2), primary_key_1, primary_key_2).

PrimaryKey[T], as its name says, marks a column as part of the primary key. Using several of them results in a compound key. The first column of a CQL primary key is the one used to partition data. phantom forces you to always define a PartitionKey, so you don't forget how your data is partitioned. We also use this DSL restriction because we hope to do more clever things with it in the future.

A compound key in Cassandra (C*) looks like this: PRIMARY KEY (partition_key, primary_key_1, primary_key_2).
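The difference between the composite and the compound layout comes down to how the PRIMARY KEY clause is rendered. The sketch below is a hypothetical helper, not phantom's schema generator. It takes the partition key columns and the remaining primary key columns in declaration order, and wraps the partition keys in an extra pair of parentheses only when there is more than one of them.

```scala
// Hypothetical helper, NOT phantom's schema generator.
object PrimaryKeyClause {
  // partitionKeys: columns declared with PartitionKey[T]
  // otherKeys: remaining primary key columns (PrimaryKey[T] / clustering columns), in order
  def render(partitionKeys: List[String], otherKeys: List[String]): String = {
    // Mirrors the rule that a table needs at least one partition key.
    require(partitionKeys.nonEmpty, "a table needs at least one partition key")

    // A single partition key is written bare; several form a composite key in parentheses.
    val partition =
      if (partitionKeys.size == 1) partitionKeys.head
      else partitionKeys.mkString("(", ", ", ")")

    (partition :: otherKeys).mkString("PRIMARY KEY (", ", ", ")")
  }
}
```

For example, render(List("url"), Nil) yields PRIMARY KEY (url) as in the recipes table, while render(List("pk1", "pk2"), List("ck1")) yields PRIMARY KEY ((pk1, pk2), ck1).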

Keep the where clause rules in mind before you add too many of these: every partition key column must be restricted, and the remaining primary key columns can only be restricted in the order they are declared, without skipping any. phantom can't yet give you a compile time error for this, but Cassandra will give you a runtime one.
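The rule Cassandra enforces at runtime can be approximated by a small check. This is a hypothetical sketch assuming standard CQL semantics: every partition key column must be restricted, clustering columns may only be restricted as an unbroken prefix in declaration order, and no other columns may appear. It deliberately ignores secondary indexes, ALLOW FILTERING and the extra rules for range operators.

```scala
// Hypothetical approximation of the CQL where-clause rule; not part of phantom.
object WhereClauseCheck {
  def isValid(
      partitionKeys: List[String],
      clusteringKeys: List[String],
      restricted: Set[String]
  ): Boolean = {
    val keyColumns = (partitionKeys ++ clusteringKeys).toSet

    // 1. Every partition key column must be restricted.
    val allPartitionKeys = partitionKeys.forall(restricted.contains)

    // 2. Clustering columns: once one is skipped, no later one may be restricted.
    val flags = clusteringKeys.map(restricted.contains)
    val prefixOnly = !flags.dropWhile(identity).contains(true)

    // 3. Only primary key columns may appear (no secondary indexes in this sketch).
    allPartitionKeys && prefixOnly && restricted.subsetOf(keyColumns)
  }
}
```

With a compound key (id, timestamp, name), restricting id alone or id and timestamp passes, while restricting id and name fails because timestamp was skipped.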

Index[T] creates a secondary index in Cassandra. It lets you query by a column outside the primary key with very little effort, but secondary index reads are not exactly high performance. It's generally best to avoid them; we implemented support mostly to show off what good guys we are.

When you mix in Index[T] on a column, phantom will let you use it in a where clause. Some such queries, for example ones that restrict several indexed columns at once, also need allowFiltering, otherwise C* will give you an error.

When you want to use a column in a where clause, you need an index on it. Cassandra data modeling is out of the scope of this writing, but phantom offers com.websudos.phantom.keys.Index to enable querying.

The CQL 3 schema for secondary indexes can also be auto-generated with ExampleRecord4.create().

SELECT is the only query you can perform with an Index column. This is a Cassandra limitation. The relevant tests are found here.

import com.websudos.phantom.dsl._

sealed class ExampleRecord4 extends CassandraTable[ExampleRecord4, ExampleModel] {

  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object order_id extends LongColumn(this) with ClusteringOrder[Long] with Descending
  object timestamp extends DateTimeColumn(this) with Index[DateTime]
  object name extends StringColumn(this) with Index[String]
  // The owner and record type parameters must match this table: ExampleRecord4 and ExampleModel.
  object props extends MapColumn[ExampleRecord4, ExampleModel, String, String](this)
  object test extends OptionalIntColumn(this)

  override def fromRow(row: Row): ExampleModel = {
    ExampleModel(id(row), name(row), props(row), timestamp(row), test(row))
  }
}
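For a table like ExampleRecord4, the auto-generated schema needs one CQL CREATE INDEX statement per Index[T] column, in addition to the CREATE TABLE statement. The helper below is a hypothetical sketch of that idea; ColumnDef and createIndexes are illustrative names, and the exact statements phantom generates may differ.

```scala
// Hypothetical sketch of secondary index DDL generation; not phantom's implementation.
object SecondaryIndexDdl {
  // A column name plus whether it was declared with Index[T].
  final case class ColumnDef(name: String, indexed: Boolean)

  // Emits one CREATE INDEX statement (unnamed index) for each indexed column, in declaration order.
  def createIndexes(keyspace: String, table: String, columns: List[ColumnDef]): List[String] =
    columns.collect { case ColumnDef(name, true) =>
      s"CREATE INDEX IF NOT EXISTS ON $keyspace.$table ($name);"
    }
}
```

Applied to the columns of ExampleRecord4, only timestamp and name would produce statements, since they are the two columns marked with Index.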

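ClusteringOrder[T] tells Cassandra to store the records within a partition in a certain order based on this field. It is commonly used with time columns of type java.util.Date or org.joda.time.DateTime, although the examples on this page also apply it to a Long column.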

An example might be:

  object timestamp extends DateTimeColumn(this) with ClusteringOrder[DateTime] with Ascending

To fully define a clustering column, you MUST also mix in either Ascending or Descending to indicate the sorting order.
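To illustrate what the sort order means, the sketch below models it in memory: records are grouped by partition key and, within each partition, sorted by a clustering column in descending order, much like order_id with Descending in the examples on this page. Record and layout are hypothetical names, and this is only a mental model, not how Cassandra lays data out on disk.

```scala
// Hypothetical in-memory model of clustering order; not Cassandra's storage engine.
object ClusteringOrderSketch {
  // id plays the partition key, orderId the clustering column.
  final case class Record(id: String, orderId: Long)

  // Groups records into partitions, then sorts each partition by orderId, descending.
  def layout(records: List[Record]): Map[String, List[Long]] =
    records.groupBy(_.id).map { case (id, rs) =>
      id -> rs.map(_.orderId).sorted(Ordering[Long].reverse)
    }
}
```

Order between partitions is not defined by the clustering column at all; it only determines the order of rows inside each partition.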


Phantom also supports compound keys out of the box, and the schema can once again be auto-generated.

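In a compound key, the table has a single PartitionKey and several PrimaryKey definitions, which phantom combines into a compound value. Example scenario, with id as the partition key and timestamp and name as further primary key columns (order_id, being a clustering column, is part of the primary key as well):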

import com.websudos.phantom.dsl._

sealed class ExampleRecord3 extends CassandraTable[ExampleRecord3, ExampleModel] {

  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object order_id extends LongColumn(this) with ClusteringOrder[Long] with Descending
  object timestamp extends DateTimeColumn(this) with PrimaryKey[DateTime]
  object name extends StringColumn(this) with PrimaryKey[String]
  // The owner and record type parameters must match this table: ExampleRecord3 and ExampleModel.
  object props extends MapColumn[ExampleRecord3, ExampleModel, String, String](this)
  object test extends OptionalIntColumn(this)

  override def fromRow(row: Row): ExampleModel = {
    ExampleModel(id(row), name(row), props(row), timestamp(row), test(row))
  }
}

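Queries can also filter on the partition token of a record, using the token operators listed below. Take the following table: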

import scala.concurrent.Await
import scala.concurrent.duration._
import com.websudos.phantom.dsl._

sealed class ExampleRecord2 extends CassandraTable[ExampleRecord2, ExampleModel] {

  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object order_id extends LongColumn(this) with ClusteringOrder[Long] with Descending
  object timestamp extends DateTimeColumn(this)
  object name extends StringColumn(this)
  // The owner and record type parameters must match this table: ExampleRecord2 and ExampleModel.
  object props extends MapColumn[ExampleRecord2, ExampleModel, String, String](this)
  object test extends OptionalIntColumn(this)

  override def fromRow(row: Row): ExampleModel = {
    ExampleModel(id(row), name(row), props(row), timestamp(row), test(row))
  }
}


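// Assumes Articles is a connected table object with an id partition key, and one is an
// Option holding a record fetched by a previous query.
// Fetches the records whose partition key token is greater than the token of one.get.id.
val orderedResult = Await.result(Articles.select.where(_.id gtToken one.get.id).fetch, 5.seconds)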
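One common use of token queries is paging through a whole table in token order: remember the last record you saw and ask for everything with a greater token. The sketch below shows that loop over an in-memory list. The token function is a hypothetical stand-in based on String.hashCode, not Cassandra's Murmur3 partitioner, and page only mimics a gtToken query followed by a limit.

```scala
// Hypothetical in-memory sketch of token-based paging; not phantom or Cassandra code.
object TokenPagingSketch {
  // Stand-in for the partitioner token. Distinct ids are assumed to have distinct tokens;
  // ids sharing a token with the last one seen would be skipped.
  def token(id: String): Int = id.hashCode

  // Mimics "where token(id) > token(after)" plus a limit; None means "start from the beginning".
  def page(ids: List[String], after: Option[String], limit: Int): List[String] = {
    val ordered = ids.sortBy(token)
    val remaining = after match {
      case Some(last) => ordered.filter(id => token(id) > token(last))
      case None       => ordered
    }
    remaining.take(limit)
  }

  // Walks the table page by page, using the last id of each page as the next starting point.
  def all(ids: List[String], limit: Int): List[String] = {
    @annotation.tailrec
    def loop(after: Option[String], acc: List[String]): List[String] = {
      val p = page(ids, after, limit)
      if (p.isEmpty) acc else loop(Some(p.last), acc ++ p)
    }
    loop(None, Nil)
  }
}
```

Because each page starts strictly after the last token already seen, no record is returned twice, and the loop stops as soon as a page comes back empty.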


| Operator name | Description |
|---|---|
| eqsToken | The "equals" operator. Matches records whose partition key token equals the token of the argument |
| gtToken | The "greater than" operator. Matches records whose partition key token is greater than the token of the argument |
| gteToken | The "greater than or equals" operator. Matches records whose partition key token is greater than or equal to the token of the argument |
| ltToken | The "lower than" operator. Matches records whose partition key token is lower than the token of the argument |
| lteToken | The "lower than or equals" operator. Matches records whose partition key token is lower than or equal to the token of the argument |

For more details on how to use Cassandra partition tokens, see SkipRecordsByToken.scala