Working with CPF

MarkLogic's Content Processing Framework (CPF) lets you define a state-transition graph in which a series of pipelines run against documents. CPF tracks documents as they move through the pipelines so that your application doesn't have to. This page describes how to use Roxy to set up and deploy CPF; see MarkLogic's CPF documentation for full details.

Triggers Database

You must configure a triggers database to deploy CPF. Note that the "clean triggers" command removes ALL triggers from the specified triggers database, so set up a database specifically for your application's triggers rather than using the Triggers database that ships with MarkLogic.

Specify the triggers database with the triggers-db property in build.properties.

Initialization

Roxy's CPF configuration depends on one file: deploy/pipeline-config.xml. To create this file, run this command:

$ ml initcpf

This command will create the basic config file for you to modify.

Configuration

Configuring Pipelines

System Pipelines

MarkLogic comes with a set of built-in pipelines. When configuring CPF in Roxy, you can refer to these pipelines by name. For instance, here's how to include the Status Change Handling pipeline, which belongs in nearly every CPF configuration:

  <system-pipelines>
    <system-pipeline>Status Change Handling</system-pipeline>
  </system-pipelines>

To find the correct names for other system-provided pipelines, go to the source. The pipeline config files live in the MarkLogic installation directory, under MarkLogic/Installer/; look for files named *-pipeline.xml. Near the top of each file you'll find an element holding the pipeline's name. Copy that name in full, exactly as written.

Application Pipelines

For pipelines specific to your application, you provide an XML document describing the pipeline, plus the XQuery modules that implement it.

The XQuery modules must be under your project's source code directory (src) so that they get deployed. For clarity, it's good practice to keep the XML pipeline description files in the same place, so create a src/pipelines directory to hold both sets of files. The XML pipeline files should follow the same format that the system pipelines use. See MarkLogic's CPF documentation for the external variables (such as the document URI and the transition) that the XQuery modules receive.
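
For reference, here is a minimal sketch of what src/pipelines/sample.xml might look like, assuming MarkLogic's standard pipeline format (the http://marklogic.com/cpf/pipelines namespace) and the standard success and failure action modules that ship with MarkLogic. The pipeline name, the single state transition, and the /pipelines/sample.xqy module path are hypothetical. The sketch starts from the initial state because Status Change Handling moves newly created documents into that state; it then runs sample.xqy and moves each document to done on success or error on failure.

  <!-- src/pipelines/sample.xml: hypothetical example pipeline -->
  <pipeline xmlns="http://marklogic.com/cpf/pipelines">
    <pipeline-name>Sample</pipeline-name>
    <pipeline-description>Runs /pipelines/sample.xqy on documents in the initial state</pipeline-description>
    <!-- Standard actions shipped with MarkLogic for recording success and failure -->
    <success-action>
      <module>/MarkLogic/cpf/actions/success-action.xqy</module>
    </success-action>
    <failure-action>
      <module>/MarkLogic/cpf/actions/failure-action.xqy</module>
    </failure-action>
    <!-- One transition: initial to done on success, to error on failure -->
    <state-transition>
      <annotation>Process newly created documents</annotation>
      <state>http://marklogic.com/states/initial</state>
      <on-success>http://marklogic.com/states/done</on-success>
      <on-failure>http://marklogic.com/states/error</on-failure>
      <default-action>
        <!-- Hypothetical module path, resolved in the domain's modules database -->
        <module>/pipelines/sample.xqy</module>
      </default-action>
    </state-transition>
  </pipeline>

A real pipeline will usually chain several transitions; check the *-pipeline.xml files that ship with MarkLogic for more complete examples of the format.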

To configure your application pipelines, specify the path to each XML file. Suppose we're building a pipeline called "sample", with sample.xml and sample.xqy in the src/pipelines/ directory. The configuration is then:

  <pipelines>
    <!-- one <pipeline> for each cpf pipeline to install in this domain -->
    <pipeline>/pipelines/sample.xml</pipeline>
  </pipelines>

Configuring Scope

A CPF domain targets the documents identified by its scope. There are three types of scope: directory, collection, or document. Choose how your domain will be scoped and copy that section outside of the XML comment.

<!--
  3 types of scopes exist: Make sure you use one

  <scope>
    <type>directory</type>
    <uri>/</uri>
    <depth>infinity</depth>
  </scope>
  <scope>
    <type>collection</type>
    <uri>MyCollection</uri>
    <depth/>
  </scope>
  <scope>
    <type>document</type>
    <uri>/stuff.xml</uri>
    <depth/>
  </scope>
-->
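
As an illustration, here is one way the pieces from this page might fit together in a single domain, with the directory scope copied out of the comment. The /sample/ directory is hypothetical, and the enclosing domain element is an assumption about the generated file's layout: keep whatever surrounding structure and other elements `ml initcpf` created.

  <domain>
    <!-- Assumed wrapper: keep the other elements ml initcpf generated here -->
    <pipelines>
      <pipeline>/pipelines/sample.xml</pipeline>
    </pipelines>
    <system-pipelines>
      <system-pipeline>Status Change Handling</system-pipeline>
    </system-pipelines>
    <!-- Exactly one active scope; /sample/ is a hypothetical directory -->
    <scope>
      <type>directory</type>
      <uri>/sample/</uri>
      <depth>infinity</depth>
    </scope>
  </domain>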

Modules database

CPF requires the code used by your pipelines to be in a modules database, not on the filesystem. CPF main modules can still import library modules in the normal way, so you can put complex processing in a library module and write unit tests for it.

Deploying

Because the CPF configuration depends on your pipeline modules, deploy the modules first if this is your first time deploying a CPF configuration:

$ ml local deploy modules

To deploy your CPF configuration, run the "deploy cpf" command:

$ ml local deploy cpf

Cleaning

To remove your CPF configuration, run the "clean cpf" command. This removes the CPF configuration from the target server but leaves your configuration file (deploy/pipeline-config.xml) unchanged.

$ ml local clean cpf
