This package contains a HDFS compatable layer (CFS) and a CassandraJobConf which can be used to run MR jobs without HDFS or dedicated job/task trackers.
It also includes a hive-driver for accessing data in cassandra as well as a hive meta-store implementation.
Hadoop jobs and Hive are setup to work with MR cluster.
For detailed docs please see: http://www.datastax.com/docs/0.8/brisk/index
You can also discuss Brisk on freenode #datastax-brisk
On linux systems, you need to run the following as root
echo 1 > /proc/sys/vm/overcommit_memory
This is to avoid OOM errors when tasks are spawned.
To try it out run:
-
compile and download all dependencies
ant
-
start cassandra with built in job/task trackers
./bin/brisk cassandra -t
-
view jobtracker
http://localhost:50030
-
examine CassandraFS
./bin/brisk hadoop fs -lsr cfs:///
-
start hive shell or webUI
./bin/brisk hive
or
./bin/brisk hive --service hwi
open web browser to http://localhost:9999/hwi