Merge pull request #719 from pupil-labs/event-automation-alpha-lab

Event Automation Alpha Lab
pupil-labs · Nov 15, 2024 · 6a926e4
2 parents 992d5e5 + d1471b0
commit 6a926e4
Showing 4 changed files with 109 additions and 2 deletions.
diff --git a/alpha-lab/.vitepress/config.mts b/alpha-lab/.vitepress/config.mts
@@ -51,6 +51,7 @@ let theme_config_additions = {
       items: [
         { text: "Undistort Video and Gaze Data", link: "/undistort/" },
         { text: "Use Neon with Pupil Capture", link: "/neon-with-capture/" },
+        { text: "Automate Event Annotations with Pupil Cloud and GPT", link: "/event-automation-gpt/" }
       ],
     },
   ],

diff --git a/alpha-lab/cards.json b/alpha-lab/cards.json
@@ -57,7 +57,7 @@
       "href": "/alpha-lab/scanpath-rim/#generate-static-and-dynamic-scanpaths"
     },
     "image": "/alpha-lab/scanpath_image_nad.webp",
-    "category": "Gaze Mapping"
+    "category": "Reference Image Mapper"
   },
   {
     "title": "Uncover Gaze Behaviour on Phones",
@@ -158,5 +158,15 @@
     },
     "image": "/alpha-lab/imu-transformations.webp",
     "category": "Coordinate Systems"
-  }
+  },
+  {
+    "title": "Automate Event Annotations with Pupil Cloud and GPT",
+    "details": "Automatically annotate important activities and events in your Pupil Cloud recordings with GPT-4o.",
+    "link": {
+      "text": "View",
+      "href": "/alpha-lab/event-automation-gpt/"
+    },
+    "image": "/alpha-lab/event-annotation.webp",
+    "category": "Other"
+  }
 ]
diff --git a/alpha-lab/event-automation-gpt/index.md b/alpha-lab/event-automation-gpt/index.md
@@ -0,0 +1,96 @@
+---
+title: 
+description: ""
+permalink: /alpha-lab/event-automation-gpt
+meta:
+  - name: twitter:card
+    content: player
+  - name: twitter:image
+    content: "https://i.ytimg.com/vi/cuvWqVOAc5M/maxresdefault.jpg"
+  - name: twitter:player
+    content: "https://www.youtube.com/embed/cuvWqVOAc5M"
+  - name: twitter:width
+    content: "1280"
+  - name: twitter:height
+    content: "720"
+  - property: og:image
+    content: "https://i.ytimg.com/vi/cuvWqVOAc5M/maxresdefault.jpg"
+tags: [Neon, Cloud]
+---
+
+<script setup>
+import TagLinks from '@components/TagLinks.vue'
+</script>
+
+# Automate Event Annotations with Pupil Cloud and GPT
+
+<TagLinks :tags="$frontmatter.tags" />
+
+<Youtube src="IFV5hG3HQW8"/>
+
+::: tip
+Tired of endless manual frame-by-frame coding? What if you could automatically capture when people focused on specific 
+objects or what activities they were engaged in, all without sifting through hours of eye-tracking footage? Here, we 
+demonstrate how to build your own GPT-based personal annotation assistant!
+:::
+
+## Scaling Eye Tracking Analysis with Automation: A Proof of Concept
+
+In eye tracking research, analyzing recordings and identifying key moments—such as when users interact with specific 
+objects—has typically required a tedious, frame-by-frame review. This manual process is time-consuming and limits scalability.
+
+In this article, we explore how automation can overcome these challenges. Using a Large Multimodal Model (OpenAI's GPT-4o), we 
+experiment with prompts to detect specific actions, such as reaching for an object, or what features of the environment 
+were being gazed at, and automatically add the respective annotations to [Pupil Cloud](https://pupil-labs.com/products/cloud) 
+recordings via the Pupil Cloud API. While still in its early stages, this approach shows promise in making the annotation 
+process faster and more scalable.
+
+## What This Tool Brings to the Table
+
+This tool comes at a time when the need for more efficient eye tracking analysis workflows is growing. For example, our 
+latest eye tracker, [Neon](https://pupil-labs.com/products/neon), can record (accurately and robustly) for over four hours 
+continuously. This makes larger-scale data collections feasible.
+
+By combining the GPT-4o model with customizable prompts, we test how users can automate the identification process, 
+offering a potential solution for streamlining what can be a labor-intensive process.
+
+## Getting Started
+
+With this tool, getting started is simple. You'll need to upload recordings to Pupil Cloud, obtain a developer token 
+(click [**here**](https://cloud.pupil-labs.com/settings/developer) to obtain yours), and have an OpenAI key. Then, 
+follow the setup guide linked in [**our GitHub repository**](https://github.com/pupil-labs/automate_custom_events), 
+which provides all necessary instructions. 
+
+The tool's user-friendly GUI will prompt you to select recording details and enter your desired prompts.
+
+## Using The Right Prompt
+
+When defining the prompts, clarity and specificity are essential to optimize results with GPT-4o. Users can follow the 
+recommendations listed below to improve detection accuracy:
+
+- Be clear and specific: Instead of _"the driver is looking around"_ use _"the driver is looking at the rearview mirror"_.
+- Use present tense: Frame prompts in the present tense to align with the video's real-time context, such as _"the driver is adjusting the mirror"_.
+- Include relevant context: Add details when necessary, like _"the driver is checking the rearview mirror while merging into traffic"_ to give the model more information to work with.
+- Avoid abstract or subjective terms: Stick to concrete, observable actions that can be visually confirmed in the video. Avoid using emotions or intentions as part of the prompt. For example, consider _"the driver is looking at their phone"_ instead of _"the driver is distracted"_.
+- Use specific objects or locations: Mention key objects or areas in the frame to guide the model's attention. For instance, _"the person is pointing at the map on the wall"_ is better than _"the person is pointing"_.
+- Limit prompts to a single action per frame: For complex scenes with multiple activities, split them into individual prompts to improve detection.
+
+## Event Annotations In Pupil Cloud: Powered By Your Prompts
+
+After processing, event annotations are automatically added to your recording in Pupil Cloud, aligned with frames where 
+the specified activities were detected.
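As a rough illustration of what aligning events with frames could look like, the sketch below turns per-frame detection flags into event timestamps by marking the first frame of each detected run. The function name `detections_to_events`, the input layout, and the onset-only rule are assumptions made for this example; they are not taken from the tool's actual implementation.

```python
def detections_to_events(frame_times_s, detected, label):
    """Return (label, time) pairs at the first frame of each detected run.

    frame_times_s: frame timestamps in seconds (hypothetical input layout).
    detected: one boolean per frame, True where the prompted activity was found.
    """
    if len(frame_times_s) != len(detected):
        raise ValueError("frame_times_s and detected must have the same length")

    events = []
    previous = False
    for time_s, flag in zip(frame_times_s, detected):
        # Emit an event only on a False -> True transition (activity onset).
        if flag and not previous:
            events.append((label, time_s))
        previous = bool(flag)
    return events


# Illustrative data only: six frames sampled every 0.5 s.
frame_times_s = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
detected = [False, True, True, False, False, True]
events = detections_to_events(frame_times_s, detected, "reach_for_cup")
print(events)
```

Marking only the onset keeps one event per occurrence of the activity rather than one per frame. A variant could also emit an end event when the run stops.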
+
+How you then use these events is up to you. In the example video, we chose to run the 
+[Reference Image Mapper](https://docs.pupil-labs.com/neon/pupil-cloud/enrichments/reference-image-mapper/) enrichment 
+between events that corresponded to the beginning and end of a section of interest. But you could also download the 
+events and use them for offline analysis, such as computing blink rate for the same section of the recording.
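The offline analysis mentioned above can be sketched as follows: given event timestamps and blink onset times, count the blinks that fall between two named events and convert the count to blinks per minute. The event names, the in-memory data layout, and the helper `blink_rate_between` are hypothetical. In practice you would first load the downloaded event and blink data into this shape.

```python
def blink_rate_between(events, blink_starts_s, start_name, end_name):
    """Blinks per minute between the first occurrences of two named events.

    events: (name, time in seconds) pairs (hypothetical layout).
    blink_starts_s: blink onset times in seconds.
    """
    first_time = {}
    for name, time_s in events:
        # Keep only the first occurrence of each event name.
        first_time.setdefault(name, time_s)

    start_s = first_time[start_name]
    end_s = first_time[end_name]
    if end_s <= start_s:
        raise ValueError("end event must occur after start event")

    # Count blinks whose onset lies in the half-open interval [start, end).
    n_blinks = sum(1 for b in blink_starts_s if start_s <= b < end_s)
    duration_min = (end_s - start_s) / 60.0
    return n_blinks / duration_min


# Illustrative data only: a 30-second section containing three blinks.
events = [("section.begin", 10.0), ("section.end", 40.0)]
blink_starts_s = [5.0, 12.0, 20.5, 33.0, 41.0]
rate = blink_rate_between(events, blink_starts_s, "section.begin", "section.end")
print(rate)  # 3 blinks in 0.5 minutes -> 6.0
```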
+
+Our initial tests with GPT-4o have shown promising potential in detecting gazed-at objects and recognizing prompted 
+activities. Achieving the best results relies on the clarity and specificity of the provided prompts. 
+
+We think this tool could mark the beginnings of making eye tracking analysis more dynamic and efficient. Be sure to 
+experiment and post feedback on our [Discord server](https://pupil-labs.com/chat)!
+
+::: tip
+Need assistance automating event annotation via the Cloud API? Reach out to us via email at [[email protected]](mailto:[email protected]), on our [Discord server](https://pupil-labs.com/chat/), or visit our [Support Page](https://pupil-labs.com/products/support/) for formal support options.
+:::
diff --git a/alpha-lab/public/event-annotation.webp b/alpha-lab/public/event-annotation.webp