Troubleshooting, testing and live coding
Quantisan edited this page Mar 14, 2013
You can use Cascading Traps with Cascalog to capture tuples whose processing fails. To store those tuples in a sink tap (for example a local file via lfs-textline, or HDFS via hfs-textline), add a :trap clause with an error sink to the query:
(use 'cascalog.api) ;; provides <-, lfs-textline, hfs-textline and stdout
(def errors (lfs-textline "file:///tmp/people.bad_records" :sinkmode :replace))
;; or (stdout) or (hfs-textline "hdfs:///tmp/...") if running on Hadoop
(<- [?name ?age]
(people ?name ?age)
(:trap errors)
(< ?age 40))
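To see a trap actually capture something, the query needs an operation that can fail on some inputs. The sketch below assumes Cascalog 1.x with cascalog.api on the classpath, running in local mode; raw-people, parse-age, age-errors and the trap path are hypothetical names chosen for illustration. The malformed age "forty-two" makes Long/parseLong throw a NumberFormatException, so that tuple should be routed to the trap instead of failing the whole job.

```clojure
(use 'cascalog.api)

;; Hypothetical in-memory input: ages arrive as strings, one is malformed.
(def raw-people [["ben" "21"] ["jim" "forty-two"] ["ann" "35"]])

;; Hypothetical helper: throws NumberFormatException on non-numeric input.
(defn parse-age [s]
  (Long/parseLong s))

;; Tuples whose processing throws are written here instead of killing the job.
(def age-errors (lfs-textline "/tmp/people.bad_records" :sinkmode :replace))

(def young-people
  (<- [?name ?age]
      (raw-people ?name ?age-str)
      (parse-age ?age-str :> ?age)
      (:trap age-errors)
      (< ?age 40)))

;; Run from the REPL: surviving tuples go to stdout, the bad one to the trap.
(?- (stdout) young-people)
```

After the run, /tmp/people.bad_records should hold the "jim" tuple, while only the parseable rows under 40 reach the output.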
You can use the functions and macros from the cascalog.testing namespace together with clojure.test to test your queries. See Cascalog's own tests for examples.
Those tests use, for example, fact?- to execute a query and compare its output with the expected tuples, or a form like (facts query => (produces [[3 10] [1 5] [5 11]])), where query is defined with (def query (<- ...)). Read Sam Ritchie's blog post Cascalog Testing 2.0 for more details and examples of midje-cascalog 0.4.0.
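A plain clojure.test version, without Midje, might look like the sketch below. It assumes Cascalog 1.x, whose cascalog.testing namespace provides test?-, which runs a query and checks its output against a list of expected tuples; the namespace name, the people fixture and the young-people query are hypothetical.

```clojure
(ns people.query-test
  (:use clojure.test
        cascalog.api
        cascalog.testing))

;; Hypothetical fixture data: an in-memory generator of [name age] tuples.
(def people [["ben" 21] ["jim" 42] ["ann" 35]])

;; Hypothetical query under test: names and ages of people under 40.
(def young-people
  (<- [?name ?age]
      (people ?name ?age)
      (< ?age 40)))

(deftest young-people-test
  ;; test?- executes the query and asserts its output matches these tuples.
  (test?- [["ben" 21] ["ann" 35]] young-people))
```

Such tests run with the usual tooling, for example (run-tests 'people.query-test) from the REPL.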
There are certain features that support live, interactive coding:
- Use simple Clojure collections as data sources, for example (def people [["ben" 21] ["jim" 42]]).
- During development you can easily turn parts of your Cascalog code into standard Clojure functions and call them from the REPL; for example, a custom operator becomes a plain function when you replace defaggregateop with defn.
- Queries can, of course, be executed from the REPL.
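Put together, a short REPL session might look like the sketch below. It assumes Cascalog 1.x with cascalog.api loaded; double-age is a hypothetical helper, not part of Cascalog. The same plain function is first called directly and then used as an operation inside a query run with ??<-, which returns the result tuples to the REPL instead of writing them to a tap.

```clojure
(use 'cascalog.api)

;; A plain Clojure collection works as a data source.
(def people [["ben" 21] ["jim" 42]])

;; Hypothetical helper: an ordinary function you can try directly in the REPL...
(defn double-age [age]
  (* 2 age))

(double-age 21) ;; => 42

;; ...and then reuse unchanged as an operation inside a query.
;; ??<- runs the query and returns the result tuples as a seq.
(??<- [?name ?doubled]
      (people ?name ?age)
      (double-age ?age :> ?doubled))
```

Because nothing here touches HDFS, the whole loop of editing a function, re-evaluating it and re-running the query stays inside one REPL session.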