Troubleshooting, testing and live coding

Quantisan edited this page Mar 14, 2013 · 4 revisions


Catching data errors with traps

You can use Cascading Traps with Cascalog to capture tuples whose processing fails. To store those tuples into a sink tap (for example a local file or hfs-textline), use the :trap keyword with an error sink:

(def errors (lfs-textline "file:///tmp/people.bad_records" :sinkmode :replace))
;; or (stdout), or (hfs-textline "hdfs:///tmp/...") if running on Hadoop

(<- [?name ?age]
    (people ?name ?age)
    (:trap errors)
    (< ?age 40))


Testing

You can use the functions and macros from the cascalog.testing namespace together with clojure.test to test your queries. See Cascalog's own tests for examples.

For example, fact?- executes a query and compares its output with the expected tuples. Alternatively, you can write something like (facts query => (produces [[3 10] [1 5] [5 11]])), where (def query (<- ...)). Read Sam Ritchie's blog post Cascalog Testing 2.0 for more details and examples of midje-cascalog 0.4.0.
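
As a rough illustration of the clojure.test route, here is a minimal sketch. It assumes a Cascalog 1.x project in which cascalog.testing provides the test?<- macro taking the expected tuples followed by an ordinary query body. The namespace name, the test name and the sample data are hypothetical.

```clojure
(ns example.query-test            ; hypothetical namespace name
  (:use clojure.test
        cascalog.api
        cascalog.testing))

;; An in-memory collection stands in for a real source tap.
(def people [["ben" 21] ["jim" 42]])

(deftest people-under-40
  ;; Assumption: test?<- runs the query and asserts that its output
  ;; tuples match the expected ones given as the first argument.
  (test?<- [["ben" 21]]
           [?name ?age]
           (people ?name ?age)
           (< ?age 40)))
```

Running (run-tests) in that namespace, or lein test, should then report the query test along with any other clojure.test tests.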

Live coding

There are certain features that support live, interactive coding:

  • Use simple Clojure collections as data sources ((def people [["ben" 21] ["jim" 42]]))
  • During development you can easily turn parts of your Cascalog code into standard Clojure functions and call them from the REPL, for example by changing a custom operator's (defaggregateop to (defn.
  • Queries can, of course, be executed from the REPL.
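
The defaggregateop-to-defn swap mentioned above can be sketched in plain Clojure, with no Hadoop or Cascalog needed at the REPL. The three arities follow the shape defaggregateop expects (initial state, accumulate, finish). The names sum-ages and run-aggregator are hypothetical, and run-aggregator only roughly mimics how an aggregator is driven over one group of values.

```clojure
;; Sample data: a plain Clojure collection used as the data source.
(def people [["ben" 21] ["jim" 42]])

;; During development this is an ordinary function. Changing defn back
;; to defaggregateop would turn it into a Cascalog aggregator again.
(defn sum-ages
  ([] 0)                        ; initial state
  ([state age] (+ state age))   ; fold one value into the state
  ([state] [state]))            ; final result, returned as a tuple

;; Hypothetical helper: drives the three arities over a seq of values.
(defn run-aggregator [agg values]
  (agg (reduce agg (agg) values)))

(run-aggregator sum-ages (map second people))
;; => [63]
```

Because the operator is an ordinary function at this stage, you can re-evaluate it and check edge cases (an empty group, for instance) interactively before wiring it back into a query.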