Customize Transcription Output #161
Comments
Hey there! First, since this is pretty much standard reformatting, I think a custom transcription output is the way to go. I'm going to try to see how to best approach this since - as you also pointed out - a lot of studios / post houses actually have unique preferences when it comes to this, and coding it universally for everybody wouldn't make too much sense. But allowing anyone to make up their own custom export templates might make more sense... To address some of the issues that you also mentioned:
If you're referring to AVID DS exports, I think this is due to the standard format. Apart from adding the speaker in front of every line, there's not much we can change I think.
You can OPTION/ALT + click on the transcript or right-click -> Edit and rename the detected speakers. You can also use CMD/CTRL+F to find and replace speaker names in bulk. Or, is there something not working correctly on your end?
The Assistant should be able to do this, especially when using GPT-4, but since this is solvable by a simple formatting algorithm, I think using AI for the task is overkill (and costly, depending on the number of transcripts you deal with)...
Just to make sure: if you have an OpenAI key, it needs to be entered in Preferences -> Assistant -> OpenAI API Key. Cheers!
Thanks, Octimot! I'll look for a solution to get it quickly reformatted in the meantime. GPT definitely can do it, but the character limit impedes me. I'll re-enter my API key to see if that corrects the issue. I'm beginning with the transcription features first, as that can cure a lot of pain. We are primarily an Avid house, although we do have some Resolve and one show that uses Premiere. I look forward to seeing how else we can benefit from your tool.
I'll push an update on GitHub that allows the creation of custom transcription exports sometime next week or maybe sooner... For the particular use case that you mentioned, the template you'd need to create would probably look like this:

    name: Custom Export Template
    extension: txt
    segment_template: |
      [{transcription_name}]
      {segment_start_tc}
      {segment_speaker_name}: {segment_text}
    segment_separator: "\n\n"

Once this is saved in a .yaml file in templates/transcription_export you'll be able to export exactly in the format you need. Question: are you using the git version of the tool, or the standalone?
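For reference, a template like this essentially gets filled in once per segment, with the results joined by the separator. The sketch below shows that idea in plain Python; it is not StoryToolkitAI's own export code, and the segment data layout is only assumed from the placeholder names above.

    # Illustrative only: rendering a segment template like the one above.
    # The dictionary keys mirror the template placeholders; the data layout
    # is an assumption, not StoryToolkitAI's real internal structure.

    segment_template = (
        "[{transcription_name}]\n"
        "{segment_start_tc}\n"
        "{segment_speaker_name}: {segment_text}"
    )
    segment_separator = "\n\n"

    segments = [
        {"segment_start_tc": "13:51:51", "segment_speaker_name": "WOMAN 2",
         "segment_text": "It's fine."},
        {"segment_start_tc": "13:51:55", "segment_speaker_name": "INTERVIEWER",
         "segment_text": "Pick a roll, baby girl."},
    ]

    rendered = segment_separator.join(
        segment_template.format(transcription_name="011824_JOE INTV", **segment)
        for segment in segments
    )
    print(rendered)

With the example values from this issue, that prints exactly the [clip name] / timecode / SPEAKER: text blocks described in the original request.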
Thanks so much for that! I am using the standalone on a Windows 10 workstation. The version that plugs into Resolve may be beneficial down the line. I'll answer that question once I start working with my AEs on this to see what they think. I want to get it functional for us first, though. I'll try your template!
Both the git and the standalone versions should connect to Resolve Studio. It would be great if you could attempt a git installation, because you'll be able to access the update I mentioned faster (as soon as I push it to GitHub)! I'll come back to this issue when it's up and ready. Cheers
I'm working on the git install, but I'm having some issues. I guess I should be on Python 3.10 rather than 3.12?
Yes, some of the packages the tool uses are either untested with or incompatible with anything newer than 3.10.
OK, I have the StoryToolkitAI git version launchable.
I just pushed version 0.24.1, which includes custom transcription export templates (commit bb2011a). Just update the tool and try to add the custom template that I recommended above. Full instructions for how to work with custom export templates here. Please let me know if the templates work on your end. Cheers!
Will do, thanks!
Thanks so much for your attention, Octimot! I have it working pretty well now. It's still struggling with speaker detection, so I'm playing with models and settings to try to fine-tune that. An issue I've come across with this process: if I generate a transcription and run Detect Speakers via the Ingest, I get a valid transcription from my custom template YAML. But if I try to change some settings and run Detect Speakers on the JSON file that was already built, I lose the content when exporting a new transcription. It just gives me the header. I'll attach those here, but please let me know if this should be an entirely new issue or if this is a good thread for it. I do have the workaround of it working the first time it's generated, although it does occasionally fail the speaker ID for some reason.
As far as I can tell, you should remove the conditional with the speaker; see below:
What that does is tell the export function to only export segments that have "Speaker 1" as the speaker name. Cheers!
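To illustrate the effect described above (this is not StoryToolkitAI's template syntax, just the logic it boils down to): a hard-coded speaker condition behaves like a filter over the segments, so once the detected speakers are renamed or re-detected under different labels, nothing matches and only the header is left in the export.

    # Assumed sketch of the filtering behaviour, not the tool's actual code:
    # a condition tied to "Speaker 1" drops every segment whose speaker name
    # no longer matches, which leaves an export with only the header.

    segments = [
        {"speaker": "WOMAN 2", "text": "It's fine."},
        {"speaker": "INTERVIEWER", "text": "Pick a roll, baby girl."},
    ]

    exported = [s for s in segments if s["speaker"] == "Speaker 1"]
    print(exported)  # [] -> nothing below the header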
I've created a WAV in Avid with timecode. I imported it into Avid and Resolve to check that the timecode exists, which it does. However, the timecode doesn't seem to carry across to StoryToolkitAI? H264s are fine.
Is your feature request related to a problem? Please describe.
We submit transcriptions of interviews and clips to studio producers. They have a very specific format that they want. I can potentially achieve this by taking the SRT file and having another AI reformat it for me, but that's a lot of tedious extra steps. Is it possible to reconfigure the transcription output provided by StoryToolkit?
Describe the solution you'd like
I would like a config file or settings within the app. Here's an example of what I mean:
In app, the transcription looks like this:
Speaker 2
It's fine.
Speaker 3
Pick a roll, baby girl.
I need to be able to output that to something like this:
[011824_JOE INTV]
13:51:51
WOMAN 2: It's fine.
[011824_JOE INTV]
13:51:55
INTERVIEWER: Pick a roll, baby girl.
Every segment has the name of the file, the timecode, and the transcript.
If I output to text for Avid, it doesn't include timecodes or anything, just the transcript with line breaks, not even speakers. Also, I would like to be able to rename identified speakers for use in this output.
Describe alternatives you've considered
The workaround would be to feed the file to an AI text tool and have it reformat to what I need.
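That reformatting is also simple enough to script directly, which may be quicker than an AI round-trip. The sketch below is only a rough illustration: it assumes an SRT whose text lines are prefixed with "Speaker N: ...", and the clip name and speaker mapping are placeholders that would need to be filled in by hand.

    import re

    # Rough illustration of scripting the reformat instead of using an AI tool.
    # Assumptions: SRT text lines start with "Speaker N: ...", the clip name and
    # speaker mapping below are placeholders, and the timecode is the subtitle
    # start time with milliseconds dropped.

    CLIP_NAME = "011824_JOE INTV"
    SPEAKER_MAP = {
        "Speaker 2": "WOMAN 2",
        "Speaker 3": "INTERVIEWER",
    }

    def reformat_srt(srt_text: str) -> str:
        out = []
        for block in re.split(r"\n\s*\n", srt_text.strip()):
            lines = block.splitlines()
            if len(lines) < 3:
                continue  # skip malformed blocks
            start_tc = lines[1].split(" --> ")[0].split(",")[0]  # HH:MM:SS
            text = " ".join(lines[2:])
            speaker, sep, rest = text.partition(": ")
            if sep:
                speaker, text = SPEAKER_MAP.get(speaker, speaker), rest
            else:
                speaker = "UNKNOWN"
            out.append(f"[{CLIP_NAME}]\n{start_tc}\n{speaker}: {text}")
        return "\n\n".join(out)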
Additional context
Sorry if this is addressed elsewhere already. Also, maybe the Assistant could do this, but I can't get it to work; it just says it is having trouble connecting. I do have an API key.
Thanks!