This is a collection of external extractors and reducers written for Caesar and offline use.
To install (Python 3 only):

```bash
git clone https://github.com/zooniverse/aggregation-for-caesar.git
cd aggregation-for-caesar
pip install .
```
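Assuming the install placed the command line scripts on your `PATH`, a quick sanity check is to ask each tool described below for its help text:

```bash
# both tools should print their usage messages if the install worked
extract_panoptes_csv.py -h
reduce_panoptes_csv.py -h
```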
You will need two files for offline use:

- The classification dump: the `Request new classification export` or `Request new workflow classification export` button from the lab's `Data Export` tab
- The workflow dump: the `Request new workflow export` button from the lab's `Data Export` tab
Note: this only works for question tasks and the drawing tool's point data at the moment
Use the command line tool to extract your data into one flat csv file for each extractor used:
```
usage: extract_panoptes_csv.py [-h] [-v VERSION] [-H] [-o OUTPUT]
                               classification_csv workflow_csv workflow_id

extract data from panoptes classifications based on the workflow

positional arguments:
  classification_csv    the classification csv file containing the panoptes
                        data dump
  workflow_csv          the csv file containing the workflow data
  workflow_id           the workflow ID you would like to extract

optional arguments:
  -h, --help            show this help message and exit
  -v VERSION, --version VERSION
                        the workflow version to extract
  -H, --human           switch to make the data column labels use the task
                        and question labels instead of generic labels
  -o OUTPUT, --output OUTPUT
                        the base name for output csv file to store the
                        annotation extractions (one file will be created for
                        each extractor used)
```
example usage:

```bash
extract_panoptes_csv.py mark-galaxy-centers-and-foreground-stars-classifications.csv galaxy-zoo-3d-workflows.csv 3513 -v 1 -o galaxy_center_and_star_mpl5.csv
```

This will extract the user-drawn data points from workflow 3513 with a major version of 1 and place them in a csv file named `point_extractor_galaxy_center_and_star_mpl5.csv`.
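Before reducing, it can help to peek at the extracted file with pandas (the file name here matches the example above; no particular column names are assumed):

```python
import pandas

# load the extracted csv produced by the example above
extracted = pandas.read_csv('point_extractor_galaxy_center_and_star_mpl5.csv')

# list the columns and preview a few rows
print(extracted.columns.tolist())
print(extracted.head())
```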
Next, use the command line tool to reduce the extracted csv file into one flat csv file for each reducer used:
```
usage: reduce_panoptes_csv.py [-h] [-F {first,last,all}] [-k KEYWORDS]
                              [-o OUTPUT]
                              extracted_csv

reduce data from panoptes classifications based on the extracted data (see
extract_panoptes_csv)

positional arguments:
  extracted_csv         the extracted csv file output from
                        extract_panoptes_csv

optional arguments:
  -h, --help            show this help message and exit
  -F {first,last,all}, --filter {first,last,all}
                        how to filter a user making multiple classifications
                        for one subject
  -k KEYWORDS, --keywords KEYWORDS
                        keywords to be passed into the reducer in the form of
                        a json string, e.g. '{"eps": 5.5, "min_samples": 3}'
                        (note: double quotes must be used inside the brackets)
  -o OUTPUT, --output OUTPUT
                        the base name for output csv file to store the
                        reductions
```
example usage:

```bash
reduce_panoptes_csv.py point_extractor_galaxy_center_and_star_mpl5.csv -F first -k '{"eps": 5, "min_samples": 3}' -o 'galaxy_and_star_mpl5.csv'
```

This will produce a reduced csv file named `point_reducer_galaxy_and_star_mpl5.csv`. If a user classified an image more than once, only the first classification is kept.
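The `eps` and `min_samples` keywords are the standard DBSCAN clustering parameters. Assuming the point reducer clusters marks the way scikit-learn's `DBSCAN` does, here is a minimal sketch of their effect (the points below are hypothetical volunteer marks, not from this project):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# hypothetical points marked by several volunteers on one subject
points = np.array([[10, 12], [11, 13], [10, 11], [50, 52], [51, 50]])

# the same eps/min_samples values passed via the -k flag above
labels = DBSCAN(eps=5, min_samples=3).fit(points).labels_
print(labels)  # e.g. [0, 0, 0, -1, -1]; -1 marks unclustered noise
```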
The resulting csv files typically contain arrays as values, and most csv readers load these arrays as strings. To make it easier to read these files in a "science ready" way, a utility function for `pandas.read_csv` is provided in `panoptes_aggregation.csv_utils`:
```python
import pandas
from panoptes_aggregation.csv_utils import unjson_dataframe

# the `data.*` columns are read in as strings instead of arrays
data = pandas.read_csv('point_reducer_galaxy_and_star_mpl5.csv')

# use unjson_dataframe to convert them to lists;
# all values are updated in place, leaving null values untouched
unjson_dataframe(data)
```
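To confirm the conversion worked, you can inspect the cell types afterwards (no particular column names are assumed here):

```python
# after unjson_dataframe, array-valued cells are real Python lists
for name, value in data.iloc[0].items():
    print(name, type(value))
```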
To run a local version use:

```bash
docker-compose build
docker-compose up
```

and connect to `localhost:5000`. The documentation will be generated automatically and served at the `/docs` route.
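Once the containers are up, a quick check that the service is responding (the port and route come from the text above):

```bash
# the auto-generated documentation should be served at the /docs route
curl http://localhost:5000/docs
```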
To run the tests use:

```bash
docker-compose run aggregation /bin/bash -lc "nosetests"
```
- Use PEP8 style
- Documentation is built automatically with Sphinx, so add docstrings to any new files and functions (see the docstring sketch after this list)
- A guide for writing extractors
- A guide for writing reducers
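A minimal sketch of a Sphinx-friendly, numpydoc-style docstring; the function, its parameters, and its logic here are hypothetical, not part of this package:

```python
def count_votes(extracted_rows):
    '''Tally question-task votes from extracted rows.

    Parameters
    ----------
    extracted_rows : list of dict
        One dictionary per extracted classification, mapping
        answer labels to 0/1 values.

    Returns
    -------
    dict
        A mapping from answer label to total vote count.
    '''
    # hypothetical helper: sum the votes for each answer label
    votes = {}
    for row in extracted_rows:
        for answer, value in row.items():
            votes[answer] = votes.get(answer, 0) + value
    return votes
```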