Updating extract_workflows_data.sh to be a python script #6

CKrawczyk · 2019-09-12T13:50:05Z

All of the aggregation code's command line API can be called from within a Python script. This can make interfacing with convert_to_ibcc easier in the long run. The Python version of this script would look like:

```python
from panoptes_aggregation.scripts.config_workflow_panoptes import config_workflow
from panoptes_aggregation.scripts.extract_panoptes_csv import extract_csv
from io import StringIO
import pandas
import os

WORKFLOW_FILE = ''
DATA_OUT_DIR = ''
DATA_IN_DIR = ''

config_dir = os.path.join(DATA_OUT_DIR, 'config')
classification_csv_file = os.path.join(DATA_IN_DIR, 'classifications.csv')
subject_csv_file = os.path.join(DATA_IN_DIR, 'subjects.csv')

# The workflows export can list a workflow on several rows (one per version),
# so keep each workflow id only once.
workflows = pandas.read_csv(WORKFLOW_FILE)
workflow_ids = workflows.workflow_id.unique()

for workflow_id in workflow_ids:
    # Build the extractor config, reducer configs, and task labels for this workflow.
    extractor_config, _, task_label_config = config_workflow(
        WORKFLOW_FILE,
        workflow_id
    )
    print('Exporting data from workflow: {0}'.format(workflow_id))
    # extract_csv expects a filename or file-like object, so wrap the config in StringIO.
    extract_filenames = extract_csv(
        classification_csv_file,
        StringIO(str(extractor_config)),
        output_dir=DATA_OUT_DIR,
        order=True,
        output_name='workflow_{0}_classifications'.format(workflow_id)
    )
    if len(extract_filenames) > 0:
        point_extracts_filename = [filename for filename in extract_filenames if 'point_extractor_by_frame' in filename][0]
        question_extracts_filename = [filename for filename in extract_filenames if 'question_extractor' in filename][0]
        # Call convert_to_ibcc here
```
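The two `[...][0]` lookups above raise an `IndexError` when no filename in `extract_filenames` contains the searched substring (for example, a workflow without a point-marking task). A minimal sketch of a safer lookup, using a hypothetical helper `first_matching` and made-up filenames (not the exact names extract_csv writes), could be:

```python
def first_matching(filenames, substring):
    """Return the first filename containing `substring`, or None if none does."""
    return next((name for name in filenames if substring in name), None)


# Hypothetical output of extract_csv for one workflow.
extract_filenames = [
    'out/point_extractor_by_frame_workflow_1_classifications.csv',
    'out/question_extractor_workflow_1_classifications.csv',
]

point_extracts_filename = first_matching(extract_filenames, 'point_extractor_by_frame')
survey_extracts_filename = first_matching(extract_filenames, 'survey_extractor')
print(point_extracts_filename)   # out/point_extractor_by_frame_workflow_1_classifications.csv
print(survey_extracts_filename)  # None
```

The loop can then skip the convert_to_ibcc call (or log a warning) when either lookup returns `None`.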

The three empty strings at the top can be read in via the command line (e.g. with sys.argv or argparse).
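A minimal argparse sketch for those three values might look like the following. The positional argument names and their order are assumptions, not an existing interface:

```python
import argparse


def parse_args(argv=None):
    """Parse the three paths that are hard-coded as empty strings in the script."""
    parser = argparse.ArgumentParser(
        description='Run the aggregation extractors for every workflow in a workflows export.'
    )
    # Hypothetical argument names; order matches how they would be typed on the command line.
    parser.add_argument('workflow_file', help='Path to the workflows CSV export')
    parser.add_argument('data_in_dir', help='Directory containing classifications.csv and subjects.csv')
    parser.add_argument('data_out_dir', help='Directory where config and extraction files are written')
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

In the script, `args = parse_args()` would then replace the empty strings with `args.workflow_file`, `args.data_in_dir`, and `args.data_out_dir`, so it could be run as e.g. `python extract_workflows_data.py workflows.csv data_in data_out`.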

The config_workflow function returns the extractor config as a dict, the reducer configs as a list of dicts (there is currently a bug where only the last reducer config is returned instead of all of them), and the task labels as a dict.

The extract_csv function returns a list of file paths, one for each extraction file written to disk. Its second argument expects a filename (or any file-like object), so I use StringIO to wrap the string form of the extractor config dict in a file-like object. You could also pass the path to the config file that was written to disk instead.
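The "filename or file-like object" contract can be illustrated with a small stand-alone sketch. `read_config_text` is a hypothetical helper, not extract_csv's actual implementation; it only shows why a StringIO and a path on disk are interchangeable inputs:

```python
import os
import tempfile
from io import StringIO


def read_config_text(config):
    """Return the text of `config`, which may be a path or an open file-like object."""
    if hasattr(config, 'read'):
        # Already file-like (e.g. a StringIO wrapping an in-memory string).
        return config.read()
    # Otherwise treat it as a path on disk.
    with open(config) as config_file:
        return config_file.read()


config_text = str({'workflow_id': 1})

# Option 1: wrap the in-memory string, as the script does.
from_memory = read_config_text(StringIO(config_text))

# Option 2: point at a config file written to disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = os.path.join(tmp_dir, 'extractor_config.yaml')
    with open(config_path, 'w') as config_file:
        config_file.write(config_text)
    from_disk = read_config_text(config_path)

print(from_memory == from_disk)  # True
```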

The call to the convert_to_ibcc code belongs at the end of this script. It might be best to convert the argparse part of convert_to_ibcc into a Python function that can be imported and called directly.
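One way to do that split is sketched below: the real work lives in an importable function and the command line wrapper only parses arguments and delegates. The function name, its three parameters, and the placeholder body are all hypothetical, since convert_to_ibcc's actual arguments are not shown here:

```python
import argparse


def convert_to_ibcc(point_extracts_filename, question_extracts_filename, output_dir):
    """Hypothetical importable entry point; the real conversion logic would go here."""
    # Placeholder: echo the inputs so callers can see what was passed.
    return {
        'point_extracts': point_extracts_filename,
        'question_extracts': question_extracts_filename,
        'output_dir': output_dir,
    }


def main(argv=None):
    """Thin command line wrapper: parse arguments, then call the function."""
    parser = argparse.ArgumentParser(description='Convert extraction CSVs for IBCC.')
    parser.add_argument('point_extracts_filename')
    parser.add_argument('question_extracts_filename')
    parser.add_argument('output_dir')
    args = parser.parse_args(argv)
    return convert_to_ibcc(
        args.point_extracts_filename,
        args.question_extracts_filename,
        args.output_dir,
    )
```

The module would end with the usual `if __name__ == '__main__': main()` guard so it still works from the shell, while the extraction loop could import the function and call `convert_to_ibcc(point_extracts_filename, question_extracts_filename, DATA_OUT_DIR)` directly.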
