This repository has been archived by the owner on Sep 26, 2019. It is now read-only.
nonword edited this page Feb 2, 2016 · 1 revision

In addition to the data submitted by users, some projects have existing data from other sources that might benefit from community refinement. Example scenarios include:

  • You'd like to incorporate data from an existing database, whose data you don't entirely trust.
  • You've run your materials through an OCR engine like Tesseract and the result looks good enough, but you'd like to feed it into Scribe for correction/amendment.
  • You've run layout analysis on your materials (e.g. using OCRopus) and believe you can positively identify certain regions of images as being of a certain type.

If, when beginning a project, you have existing data of this sort, you may benefit from feeding it into Scribe so that it can be corrected alongside user-contributed data.

You have a couple of options for doing this. One is to insert the data directly into Scribe's Mongo database by hand. A second option, which you may find simpler and cleaner, is to use Scribe's existing HTTP interface to generate classifications as a "bot" user.

Rationale

Scribe's HTTP endpoints accept anonymous classifications, so nothing strictly prevents you from scripting classification creation without a Bot token. Without a token, however, your programmatically generated classifications are indistinguishable from other anonymous contributions. Using a Bot token ensures that the classifications are associated with your Bot account. That should make reasoning about contributor activity much easier later, because you can tell your own Bot contributions apart from anonymous user contributions.

Creation

Scribe bot accounts can be created using this simple rake interface:

rake bot:create[NAME] 
  • NAME: String Bot's name (Default 'ScribeBot')

This will create a bot with the given name and print the Bot's automatically assigned auth token. For example:

  $ rake bot:create[OcrBot]
  ...
  Created OcrBot. Use HTTP header to authenticate:
    HTTP_BOT_AUTH=56b0dfc670617553d2000000:iFjYc_p7UrKthrgSEr7z

In the example above, I've named the bot "OcrBot" to distinguish it as the bot account I use when creating classifications from OCR data.

The last line of the output gives the HTTP header to use in your script. See project/emigrant/bot-example.rb:97 for example Ruby code using such a header.
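For orientation, here is a minimal sketch of attaching the token to a request using only Ruby's standard library. It is not taken from bot-example.rb. The header name sent on the wire is an assumption: Rack-based servers expose a request header named Bot-Auth to the application as HTTP_BOT_AUTH, so the sketch sends Bot-Auth. The helper names, the /classifications path, and the payload shape used below are hypothetical as well; check the example script for what Scribe actually expects.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Assumption: the server reads HTTP_BOT_AUTH from the Rack environment,
# which corresponds to a "Bot-Auth" header on the wire. Verify this
# against project/emigrant/bot-example.rb before relying on it.
BOT_AUTH_HEADER = 'Bot-Auth'

# Build (but do not send) a JSON POST request that carries the bot token.
# base_url, path, and payload are supplied by the caller; none of them
# are defined by Scribe in this sketch.
def build_bot_request(base_url, path, token, payload)
  uri = URI.join(base_url, path)
  request = Net::HTTP::Post.new(uri)
  request['Content-Type'] = 'application/json'
  request[BOT_AUTH_HEADER] = token
  request.body = JSON.generate(payload)
  request
end

# Send a request built by build_bot_request and return the response object.
def send_bot_request(request)
  uri = request.uri
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

A call might look like send_bot_request(build_bot_request('http://localhost:3000', '/classifications', token, payload)), where token is the string printed by rake bot:create and payload is whatever classification hash your Scribe instance accepts.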

Resetting Token

The following command can be used to generate a new auth token for a given bot account. This should be used if you misplace a token or believe the token may have been compromised.

rake bot:reset[NAME] 
  • NAME: String Bot's name (Default 'ScribeBot')
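Because a reset replaces the token, a script that hardcodes the old value has to be edited after every reset. One way to avoid that, sketched here under stated assumptions, is to read the token from an environment variable at startup. The variable name SCRIBE_BOT_TOKEN and the helper bot_token_from_env are hypothetical, and the "<id>:<secret>" shape check is inferred only from the sample output of rake bot:create, not from any documented format.

```ruby
# Hypothetical variable name; Scribe itself does not define it.
TOKEN_ENV_VAR = 'SCRIBE_BOT_TOKEN'

# Return the bot token from the given environment (ENV by default).
# Raises ArgumentError if the variable is missing or does not look
# like "<id>:<secret>" (an assumed format, based on sample output).
def bot_token_from_env(env = ENV)
  token = env[TOKEN_ENV_VAR].to_s.strip
  raise ArgumentError, "#{TOKEN_ENV_VAR} is not set" if token.empty?

  id, secret = token.split(':', 2)
  if id.to_s.empty? || secret.to_s.empty?
    raise ArgumentError, "#{TOKEN_ENV_VAR} does not look like '<id>:<secret>'"
  end

  token
end
```

After running rake bot:reset, only the environment variable needs updating; the script itself stays unchanged.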

Delete Bot

The following command can be used to delete a bot account.

rake bot:delete[NAME] 
  • NAME: String Bot's name (Default 'ScribeBot')

This may be used to completely invalidate a bot token. Note that this is not reversible. If you create a bot "A", delete it, and then create a new bot "A", the second "A" will be a different account from the first. In general, you should not need to delete Bot accounts; this hook is provided mainly for completeness and for cases where you create a bot by mistake.

Deleting a bot does not delete its classifications, but it will make it a little harder to identify who created those classifications later. If you only want to invalidate a bot, consider resetting its token with bot:reset instead; if no one knows the bot's token, it's as good as disabled.

Example Usage

An example Ruby script demonstrating how to create Mark and Transcribe classifications programmatically using a Bot auth token can be found in project/emigrant/bot-example.rb.