Cascalog

Cascalog is a fully-featured data processing and querying library for Clojure. The main use cases for Cascalog are processing "Big Data" on top of Hadoop or doing analysis on your local computer from the Clojure REPL. Cascalog is a replacement for tools like Pig, Hive, and Cascading.

Cascalog operates at a significantly higher level of abstraction than a tool like SQL. More importantly, its tight integration with Clojure gives you the power to use abstraction and composition techniques with your data processing code just like you would with any other code. It's this latter point that sets Cascalog far above any other tool in terms of expressive power.

Follow the getting started steps, check out the tutorials, and you'll be running Cascalog queries on your local computer within 5 minutes.

Cascalog is hosted on GitHub at [[http://github.com/nathanmarz/cascalog]].

Getting help

Cascalog Google Group
#cascalog room on Freenode
API Documentation

Documentation

[[Getting Started]]
[[JCascalog]]
[[Defining and executing queries]]
[[How Cascalog executes a query]]
[[Guide to custom operations]]
[[Predicate operators]]
[[Methods for handling wide sources]]
[[Joins in Cascalog]]
[[Built-in operations]]
[[Predicate macros]]
[[Cascalog and Hadoop Security]]

Articles around the web

Introductory Tutorial, Part 1
Introductory Tutorial, Part 2
Developing and deploying a Cascalog query on a Hadoop cluster
Why Yieldbot chose Cascalog over Pig for Hadoop processing
Summarizing next-gen sequencing variation statistics with Hadoop using Cascalog
Which operation def macro should I use in Cascalog?
Predicate macros
Cascalog made easier
Generator as filter / negations
Catching errors with traps
Testing Cascalog with Midje
Testing Cascalog with Midje, Part 2

Documentation Todo

[[Option predicates]]
[[Building queries dynamically]]
[[Query builders]]
[[Global sorting]]
[[Negations and generators-as-sets]]
[[Tuple Serialization]]

Other

[[Who's using Cascalog]]
