Adding New Subjects and Local Databases

Will Granger edited this page Dec 4, 2019 · 4 revisions

There are several steps to follow when adding new subjects to the table.

Creating a Local Database for the Subjects

  1. Download a subject export .csv for the Galaxy Zoo project. This can be done either through the Python client or on the project builder page under data exports, "Request new subject export".

  2. Extract your subject set from the export. Pull the subject set(s) you want to put on the table out of the export from step 1 and save them in a form the table's database can use. There's a script that does just this in the data digging repo. Don't forget to change two items in the script:

    • Use the correct .csv file name for your data export on the .read_csv() line
    • Insert the ID(s) of the subject set(s) you wish to extract into the subject_set_ids array

     If python3 isn't executing on the command line, you may need to run the script with python instead. The script should create a new .csv that is ready for importing into a local db.
  3. Create a local DB using the newly created CSV. Use any SQL db creator (there is a good one linked in the README) to import the CSV you created into a SQL table. Important: The table must be named Subjects and follow the schema in this repo's README. Don't forget to put the created .db file into the "documents" folder of your computer with the correct name:

    • Staging: GZ_Staging_Subjects.db
    • Production: GZ_Production_Subjects.db
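
The extraction in step 2 can be sketched as follows. This is not the script from the data digging repo (which uses .read_csv()); it is a minimal standard-library illustration. It assumes the export has a subject_set_id column, and the function name and file names are hypothetical.

```python
import csv


def extract_subject_sets(export_path, output_path, subject_set_ids):
    """Copy rows whose subject_set_id is in subject_set_ids to a new CSV.

    Assumes the export has a 'subject_set_id' column (an assumption about
    the export layout). Returns the number of rows written.
    """
    # CSV values are read as strings, so compare against string IDs.
    wanted = {str(set_id) for set_id in subject_set_ids}
    kept = 0
    with open(export_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()  # keep the header row for the later DB import
        for row in reader:
            if row["subject_set_id"] in wanted:
                writer.writerow(row)
                kept += 1
    return kept


# Hypothetical usage (file names and set ID are placeholders):
# extract_subject_sets("subjects-export.csv", "table-subjects.csv", [12345])
```

Keeping the header row in the output matters because the database import relies on it for column names.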

When adding new subjects to an existing DB, it is safe to delete the old Subjects table on your system and import the updated CSV to create a new Subjects table. If the subject CSV from Panoptes is recent, it should contain up-to-date classification_counts. Caesar will also update answer tallies and classification counts when subjects are refreshed.
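
If you prefer a script to a GUI tool, replacing the Subjects table can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: it stores every CSV column as TEXT, whereas the real column names and types should follow the schema in the README, and the function name is hypothetical.

```python
import csv
import sqlite3


def rebuild_subjects_table(db_path, csv_path):
    """Drop any existing Subjects table and recreate it from csv_path.

    Returns the number of rows imported.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)  # first row holds the column names
        rows = list(reader)

    # Assumption: every column is TEXT. Adjust to match the README schema.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("DROP TABLE IF EXISTS Subjects")
            conn.execute(f"CREATE TABLE Subjects ({columns})")
            conn.executemany(
                f"INSERT INTO Subjects VALUES ({placeholders})", rows
            )
    finally:
        conn.close()
    return len(rows)


# Hypothetical usage:
# rebuild_subjects_table("GZ_Staging_Subjects.db", "table-subjects.csv")
```

Declaring the columns explicitly, rather than letting an import tool guess them, keeps subject_id from being typed as INTEGER.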

Your local database should be ready to go! If the classification counts in the database accidentally get wiped, Caesar should restore the counts for both total answers and individual answers. Classification counts and answer tallies for each subject are queried from Caesar when moving to a new cutout on the table, and the local database is updated whenever those counts differ from the local values.
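
The "update only if different" behavior described above can be illustrated with a short sketch. This is not the app's code. The column name classifications_count and the function name are hypothetical, and the Caesar query itself is not shown: the remote count is simply passed in.

```python
import sqlite3


def sync_classification_count(conn, subject_id, remote_count,
                              column="classifications_count"):
    """Write remote_count to the local row only when it differs.

    'column' is a hypothetical column name; use the one from the README
    schema. Returns True if the local row was updated.
    """
    row = conn.execute(
        f'SELECT "{column}" FROM Subjects WHERE subject_id = ?',
        (str(subject_id),),  # subject_id is stored as TEXT
    ).fetchone()
    if row is None:
        return False  # subject is not in the local DB
    # Compare as strings so '5' (TEXT) and 5 (INTEGER) count as equal.
    if row[0] is not None and str(row[0]) == str(remote_count):
        return False  # already up to date, nothing to write
    conn.execute(
        f'UPDATE Subjects SET "{column}" = ? WHERE subject_id = ?',
        (remote_count, str(subject_id)),
    )
    conn.commit()
    return True
```

Because a wiped (NULL) count never matches the remote value, the next refresh restores it.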

Gotchas!

  • When importing a .csv into a database, make sure the option to mark "first row as database columns" (or something similar) is selected.
  • Make sure subject_id is identified as TEXT in the database! Sometimes, importing from a .csv will automatically mark the column as INTEGER, which will crash the app.
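
A quick way to check the second gotcha is to ask SQLite for the declared column type. The sketch below uses the standard PRAGMA table_info statement. The function name is hypothetical.

```python
import sqlite3


def subject_id_type(db_path):
    """Return the declared type of Subjects.subject_id, or None if absent.

    Note: sqlite3.connect() creates an empty file if db_path does not exist.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, _notnull, _default, _pk in conn.execute(
            "PRAGMA table_info(Subjects)"
        ):
            if name == "subject_id":
                return col_type.upper()
    finally:
        conn.close()
    return None


# Hypothetical usage:
# if subject_id_type("GZ_Staging_Subjects.db") != "TEXT":
#     print("subject_id is not TEXT; re-import with the column typed as TEXT")
```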

Adding subjects locally

There is some information in the README about adding subjects locally. Local subjects are not necessary for the app to run, but adding them will improve performance. Local subjects were added to reduce the dependency on Panoptes to fetch subject images. However, if a subject is missing locally, the app will fall back on fetching the subject from Panoptes.
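
The local-first lookup with a Panoptes fallback follows a simple pattern, sketched below. This illustrates the logic only and is not the app's implementation. The file naming scheme (subject ID plus .jpg), the function name, and the fetch_remote callable are all hypothetical.

```python
from pathlib import Path


def load_subject_image(subject_id, local_dir, fetch_remote):
    """Return image bytes from local_dir if present, else call fetch_remote.

    fetch_remote is a hypothetical callable that takes a subject ID and
    returns the image bytes (for example, by downloading from Panoptes).
    """
    # Assumption: local images are named "<subject_id>.jpg".
    local_path = Path(local_dir) / f"{subject_id}.jpg"
    if local_path.is_file():
        return local_path.read_bytes()  # fast path, no network needed
    return fetch_remote(subject_id)  # subject missing locally
```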