Add annotation.reference information to the CSV exports #1628

Open
mkdir-washington-edu opened this issue Dec 20, 2024 · 3 comments

@mkdir-washington-edu

The problem
CSVs are much more approachable than JSON files for the average user, and instructors using annotation exports for various kinds of analysis want to see the relationship between annotations.

Example ticket: https://app.hubspot.com/contacts/6291320/record/0-5/18074066011

The solution
In the "export" option in the client, include the "reference" information so people using the exports for analysis can easily relate or reconstruct the annotation threads.

Example:
Current CSV export:

| Created at | Author | Page | URL | Group | Type | Quote | Comment | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-12-20 10:09 | mdiroberts | | https://example.com/ | abc internal testing? | Reply | | reply | |
| 2024-10-30 14:02 | mdiroberts | | https://example.com/ | abc internal testing? | Annotation | documents | anno | question |

Proposed CSV export:

| Created at | Author | Page | URL | Group | Type | ID | Reference | Quote | Comment | Tags |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-12-20 10:09 | mdiroberts | | https://example.com/ | abc internal testing? | Reply | "X72iLr7kEe-8vIsqCNlnHw" | "F5YqwJbpEe-kJWcL3BHQxQ" | | reply | |
| 2024-10-30 14:02 | mdiroberts | | https://example.com/ | abc internal testing? | Annotation | "F5YqwJbpEe-kJWcL3BHQxQ" | NULL | documents | anno | question |
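
To illustrate how the proposed columns could be used for analysis, here is a minimal Python sketch that groups replies under the annotation they reference. It is only an illustration under stated assumptions: the export would contain `ID` and `Reference` columns as proposed above, `Reference` would hold a single annotation ID, and `NULL` would mark a top-level annotation. The sample data is trimmed to the columns needed, and `group_replies` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Sample rows in the proposed layout, reduced to the relevant columns.
SAMPLE = """\
Type,ID,Reference
Reply,X72iLr7kEe-8vIsqCNlnHw,F5YqwJbpEe-kJWcL3BHQxQ
Annotation,F5YqwJbpEe-kJWcL3BHQxQ,NULL
"""


def group_replies(csv_text):
    """Map each referenced annotation ID to the reply rows that point at it."""
    replies_by_parent = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        reference = row["Reference"]
        # Assumption: an empty cell or the literal NULL means "not a reply".
        if reference and reference != "NULL":
            replies_by_parent[reference].append(row)
    return dict(replies_by_parent)


threads = group_replies(SAMPLE)
for parent_id, replies in threads.items():
    print(parent_id, [reply["ID"] for reply in replies])
```

The same lookup could be done with a spreadsheet filter or a join in a statistics package; the point is only that one extra ID column pair is enough to relate a reply to its annotation.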

@robertknight
Member

The references field in the API is an array containing every ancestor of the annotation in the thread. Some ancestors may have been deleted, so you need the full list to be sure of being able to associate a reply with its top-level annotation. In JSON this is straightforward to encode as an array. In CSV we'd need to choose an encoding. The simplest solution is a comma-separated list, making sure that the field is properly escaped when exported.
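
As a rough sketch of the comma-separated encoding described above, the following uses Python's standard `csv` module (not the client's actual export code). The column names, the helper names `encode_references` and `decode_references`, and the placeholder ID `example-nested-reply` are hypothetical, and the sketch assumes annotation IDs never contain a comma. The `csv` writer quotes any field that contains the delimiter, so the joined list stays in one cell.

```python
import csv
import io


def encode_references(references):
    """Join ancestor IDs into one string; an empty list becomes an empty cell."""
    return ",".join(references)


def decode_references(cell):
    """Split a cell back into the list of ancestor IDs."""
    return cell.split(",") if cell else []


rows = [
    {"ID": "F5YqwJbpEe-kJWcL3BHQxQ", "References": []},
    {"ID": "X72iLr7kEe-8vIsqCNlnHw", "References": ["F5YqwJbpEe-kJWcL3BHQxQ"]},
    # Placeholder ID for a reply to a reply, with two ancestors.
    {
        "ID": "example-nested-reply",
        "References": ["F5YqwJbpEe-kJWcL3BHQxQ", "X72iLr7kEe-8vIsqCNlnHw"],
    },
]


def write_csv(records):
    buffer = io.StringIO()
    # The default dialect quotes only fields that need it, such as ones with commas.
    writer = csv.writer(buffer)
    writer.writerow(["ID", "References"])
    for record in records:
        writer.writerow([record["ID"], encode_references(record["References"])])
    return buffer.getvalue()


def read_csv(text):
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"ID": r["ID"], "References": decode_references(r["References"])}
        for r in reader
    ]


csv_text = write_csv(rows)
round_tripped = read_csv(csv_text)
```

Under these assumptions the first entry of a decoded list is the top-level annotation of the thread, which remains usable even when an intermediate ancestor has been deleted.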

@mkdir-washington-edu
Author

For encoding: currently, the list of tags on an annotation is handled correctly by Google Sheets when importing the CSV, though I've seen issues with Excel decoding it properly. Excel assumes that each tag in the list is a new column value, steadily displacing all of the data in subsequent rows.

Some additional context from an instructor (to help with prioritization):

JSON files are not practical. I need the text of annotation and replies in a text format to use in a text-based program for qualitative research. It is important to know which reply attaches to which “original” annotation for the purpose of data analysis, since I will need to treat replied-to annotations differently from original annotations. Also, for an in-depth content analysis, I need to know which reply matches which annotation. The replies are almost useless unless I know what they are replying to.

@robertknight
Member

I'd forgotten we'd already had to solve encoding lists for handling the tags field. We should treat references in the same way. The request makes sense and is likely quite straightforward to implement.

JSON files are not practical. I need the text of annotation and replies in a text format to use in a text-based program for qualitative research.

For what it's worth, an interim solution may be to use AI to help with this:

  1. Go to ChatGPT
  2. Start a new chat and attach an exported JSON file
  3. Enter a prompt like: "Convert the records in this JSON file to CSV. Include only these fields: ID, username, text, tags, references."

This worked for me for a small file of 10-20 annotations. Not sure if it will work with a much larger one.
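
For larger files, a short local script may be a more predictable interim option. The following Python sketch is not part of the Hypothesis tooling and rests on assumptions that should be checked against a real export: that the JSON file has a top-level `annotations` array, and that each item carries `id`, `user`, `text`, `tags` and `references` keys. The function names and file paths are hypothetical.

```python
import csv
import json

# Output columns, matching the fields named in the prompt above.
FIELDS = ["ID", "username", "text", "tags", "references"]


def annotation_to_row(annotation):
    """Flatten one annotation record; list fields become comma-joined strings."""
    return {
        "ID": annotation.get("id", ""),
        "username": annotation.get("user", ""),
        "text": annotation.get("text", ""),
        "tags": ",".join(annotation.get("tags") or []),
        "references": ",".join(annotation.get("references") or []),
    }


def convert(json_path, csv_path):
    """Read an exported JSON file and write one CSV row per annotation."""
    with open(json_path, encoding="utf-8") as f:
        export = json.load(f)
    # Assumption: annotations live under a top-level "annotations" key.
    annotations = export.get("annotations", [])
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for annotation in annotations:
            writer.writerow(annotation_to_row(annotation))
    return len(annotations)
```

Calling `convert("export.json", "export.csv")` (hypothetical file names) would write the CSV next to the export. Tags that themselves contain commas would be ambiguous after joining, so a different separator may be safer for the `tags` column.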
