
python-reducers-for-caesar

This is a collection of external reducers written for Caesar and for offline use.

Offline use

To install (python 3 only):

git clone https://github.com/zooniverse/aggregation-for-caesar.git
cd aggregation-for-caesar
pip install .

Download your data from the project builder

You will need two files for offline use:

  • The classification dump: use the Request new classification export or Request new workflow classification export button on the lab's Data Export tab
  • The workflow dump: use the Request new workflow export button on the lab's Data Export tab

Extracting data

Note: this only works for question tasks and the drawing tool's point data at the moment

Use the command line tool to extract your data into one flat csv file for each extractor used:

usage: extract_panoptes_csv.py [-h] [-v VERSION] [-H] [-o OUTPUT]
                               classification_csv workflow_csv workflow_id

extract data from panoptes classifications based on the workflow

positional arguments:
  classification_csv    the classification csv file containing the panoptes
                        data dump
  workflow_csv          the csv file containing the workflow data
  workflow_id           the workflow ID you would like to extract

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        the workflow version to extract
  -H, --human           switch to make the data column labels use the task and
                        question labels instead of generic labels
  -o OUTPUT, --output OUTPUT
                        the base name for output csv file to store the
                        annotation extractions (one file will be created for
                        each extractor used)

example usage:

extract_panoptes_csv.py mark-galaxy-centers-and-foreground-stars-classifications.csv galaxy-zoo-3d-workflows.csv 3513 -v 1 -o galaxy_center_and_star_mpl5.csv

This will extract the user drawn data points from workflow 3513 with a major version of 1 and place them in a csv file named point_extractor_galaxy_center_and_star_mpl5.csv.
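As a sketch, the extracted file can then be inspected with pandas. The column names below (subject_id and the data.* point columns) are assumptions standing in for whatever the extractor actually writes, and the inline csv is a stand-in for the real export:

```python
import io
import pandas

# a tiny stand-in for the extractor output; real column names may differ
csv_text = """subject_id,data.frame0.T0_tool0_x,data.frame0.T0_tool0_y
101,"[12.0, 15.5]","[30.1, 28.4]"
101,"[13.1]","[29.0]"
102,"[7.2]","[9.9]"
"""

points = pandas.read_csv(io.StringIO(csv_text))

# count how many extracted classifications each subject received
per_subject = points.groupby('subject_id').size()
print(per_subject.to_dict())  # → {101: 2, 102: 1}
```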

Reducing data

Note: this only works for question tasks and the drawing tool's point data at the moment

usage: reduce_panoptes_csv.py [-h] [-F {first,last,all}] [-k KEYWORDS]
                              [-o OUTPUT]
                              extracted_csv

reduce data from panoptes classifications based on the extracted data (see
extract_panoptes_csv)

positional arguments:
  extracted_csv         the extracted csv file output from
                        extract_panoptes_csv

optional arguments:
  -h, --help            show this help message and exit
  -F {first,last,all}, --filter {first,last,all}
                        how to filter a user making multiple classifications
                        for one subject
  -k KEYWORDS, --keywords KEYWORDS
                        keywords to be passed into the reducer in the form of
                        a json string, e.g. '{"eps": 5.5, "min_samples": 3}'
                        (note: double quotes must be used inside the brackets)
  -o OUTPUT, --output OUTPUT
                        the base name for output csv file to store the
                        reductions

example usage:

reduce_panoptes_csv.py point_extractor_galaxy_center_and_star_mpl5.csv -F first -k '{"eps": 5, "min_samples": 3}' -o 'galaxy_and_star_mpl5.csv'

This will produce a reduced csv file named point_reducer_galaxy_and_star_mpl5.csv. If a user classified an image more than once, only the first classification is kept.
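The keywords string must be valid JSON, which is why the help text insists on double quotes inside the braces. A quick check of both cases:

```python
import json

# double quotes inside the braces parse as valid JSON
keywords = json.loads('{"eps": 5, "min_samples": 3}')
print(keywords['eps'], keywords['min_samples'])  # → 5 3

# single quotes do not -- json.loads raises a ValueError
try:
    json.loads("{'eps': 5}")
except ValueError:
    print('single quotes are not valid JSON')
```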

Reading csv files in Python

The resulting csv files typically contain arrays as values, and most csv readers load those arrays as strings. To make it easier to read these files in a "science ready" way, a utility function for use with pandas.read_csv is provided in panoptes_aggregation.csv_utils:

import pandas
from panoptes_aggregation.csv_utils import unjson_dataframe

# the `data.*` columns are read in as strings instead of arrays
data = pandas.read_csv('point_reducer_galaxy_and_star_mpl5.csv')

# use unjson_dataframe to convert them to lists
# all values are updated in place leaving null values untouched
unjson_dataframe(data)
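For intuition, the conversion described above amounts to applying json.loads to each non-null string cell. The sketch below reproduces that behavior on a made-up DataFrame without requiring the package installed; the column names are hypothetical:

```python
import json
import pandas

data = pandas.DataFrame({
    'subject_id': [101, 102],
    'data.x': ['[1.0, 2.5]', None],  # arrays stored as JSON strings
})

# convert the JSON strings to lists in place, leaving null values untouched
data['data.x'] = data['data.x'].apply(
    lambda v: json.loads(v) if isinstance(v, str) else v
)
print(data['data.x'].iloc[0])  # → [1.0, 2.5]
```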

Caesar

Build/run the app in docker

To run a local version use:

docker-compose build
docker-compose up

and the app will listen on localhost:5000. The documentation is generated automatically and served at the '/docs' route.

Run tests

To run the tests use:

docker-compose run aggregation /bin/bash -lc "nosetests"

Contributing

  1. Follow PEP 8 style
  2. Documentation is generated automatically with Sphinx, so add docstrings to any new files and functions you write
  3. A guide for writing extractors
  4. A guide for writing reducers
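For points 1 and 2, new functions should carry docstrings Sphinx can pick up. A minimal sketch (the function, its parameters, and the numpydoc-style sections are illustrative, not an actual reducer from this repository):

```python
def count_votes(extracts):
    """Tally question-task answers from a list of extracts.

    Parameters
    ----------
    extracts : list of dict
        Each dict maps an answer label to a count.

    Returns
    -------
    dict
        Total count for each answer label across all extracts.
    """
    totals = {}
    for extract in extracts:
        for answer, count in extract.items():
            totals[answer] = totals.get(answer, 0) + count
    return totals
```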
