Questions surrounding speech_to_text_generation_service (e.g., do we need a speech_to_text_request_service REST API?) #5

Closed
4 of 6 tasks
Tracked by #1
jmartin-sul opened this issue Sep 12, 2024 · 1 comment
jmartin-sul commented Sep 12, 2024

to unblock this ticket, some progress probably needs to be made on #2 (so on one or both of its subquests, #3 and #4). otherwise, these questions might not have enough grounding in specific implementation goals.

  • How do we trigger speech_to_text_generation_service to run on some content? A REST call to a speech-to-text API that sits in front of speech_to_text_generation_service, however the latter is implemented? Or something that watches the input S3 bucket for new content and triggers speech_to_text_generation_service, a la what we had abbyy_watcher do for detecting content to OCR?
    • One advantage of having speech_to_text_generation_service sit behind an API of our definition is that, whether the speech-to-text generation is done by Whisper in a Docker container we define or by a managed service like SageMaker (or some third option), the API could remain stable from the perspective of e.g. our workflow service. This suggests that, should we stand up our own REST API, we should look at the SageMaker API and similar APIs to see if we can craft one that is agnostic to those possible implementations/adapters. Another question if we run our own REST API is whether it lives in our on prem infrastructure, or on the cloud side where the speech-to-text tool runs.
    • what might the REST API be called if we implement one? speech_to_text_request_service? workflow service talks to speech_to_text_request_service talks to speech_to_text_generation_service (which probably calls back to workflow service so that the workflow steps can proceed)?
    • If we have a REST API, does that service run on prem, or does it run in the cloud with the service that actually does the text extraction?
  • Can we use something like localstack for local development to simulate things like S3 buckets? A Docker compose file could point at a localstack bucket for testing if we go the route of implementing speech-to-text generation via a Docker container of our definition? And if we go the route of a managed service like SageMaker, we could still use localstack to simulate the puts/gets to/from S3 buckets, which is likely how the input/output files would be transferred regardless of the underlying speech-to-text implementation?
  • For either the custom container or managed service approach, do we need to spin up a queue or message service on the cloud side, e.g. SNS or SQS if using AWS?
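To make the "stable API, swappable implementation" idea from the questions above more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and only for illustration: the names `SpeechToTextJob`, `SpeechToTextBackend`, `WhisperContainerBackend`, `ManagedServiceBackend` and `request_speech_to_text` do not come from any existing code, and the backends return placeholder strings instead of doing real work.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class SpeechToTextJob:
    """Hypothetical request shape: which media to process and where output goes."""
    job_id: str
    input_key: str      # S3 key of the media file to transcribe (assumed)
    output_prefix: str  # S3 prefix for the generated transcript files (assumed)


class SpeechToTextBackend(Protocol):
    """What the caller (e.g. the workflow service) relies on, and nothing more."""

    def submit(self, job: SpeechToTextJob) -> str:
        """Start the job and return an implementation-specific reference."""
        ...


class WhisperContainerBackend:
    """Placeholder for the 'Whisper in a Docker container we define' route."""

    def submit(self, job: SpeechToTextJob) -> str:
        # A real adapter would hand the job to the container here.
        return f"whisper-container:{job.job_id}"


class ManagedServiceBackend:
    """Placeholder for the managed service route (e.g. SageMaker)."""

    def submit(self, job: SpeechToTextJob) -> str:
        # A real adapter would call the managed service's API here.
        return f"managed-service:{job.job_id}"


def request_speech_to_text(backend: SpeechToTextBackend, job: SpeechToTextJob) -> str:
    """The caller's view stays the same whichever backend is plugged in."""
    return backend.submit(job)
```

The point of the sketch is only that the caller depends on `submit` and the job shape; whether a REST layer wraps this, and where it runs, is left open by the questions above.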
@jmartin-sul jmartin-sul changed the title Questions surrounding transcript_generation_service itself (e.g., do we need a transcript_request_service REST API?) Questions surrounding transcript_generation_service (e.g., do we need a transcript_request_service REST API?) Sep 12, 2024
@jmartin-sul jmartin-sul changed the title Questions surrounding transcript_generation_service (e.g., do we need a transcript_request_service REST API?) [blocked] Questions surrounding transcript_generation_service (e.g., do we need a transcript_request_service REST API?) Sep 12, 2024
@jmartin-sul jmartin-sul changed the title [blocked] Questions surrounding transcript_generation_service (e.g., do we need a transcript_request_service REST API?) Questions surrounding transcript_generation_service (e.g., do we need a transcript_request_service REST API?) Sep 12, 2024
@jmartin-sul jmartin-sul changed the title Questions surrounding transcript_generation_service (e.g., do we need a transcript_request_service REST API?) Questions surrounding speech_to_text_generation_service (e.g., do we need a speech_to_text_request_service REST API?) Sep 13, 2024
@jmartin-sul
Member Author

we're using SQS both to notify our STT worker container that there is content to process, and to notify SDR that content has been processed. no custom REST API needed. see e.g. sul-dlss/common-accessioning#1356 and sul-dlss/common-accessioning#1358
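For readers who want a feel for the two-queue shape described here, a rough sketch follows. It is not the code from the linked PRs: it uses Python's in-memory `queue.Queue` as a stand-in for the two SQS queues, and the message fields (`id`, `media`, `output`) and function names are assumptions made up for illustration.

```python
import json
from queue import Queue


def enqueue_job(todo: Queue, job_id: str, media_keys: list) -> None:
    """SDR side: announce that there is content to process."""
    todo.put(json.dumps({"id": job_id, "media": media_keys}))


def run_worker_once(todo: Queue, done: Queue, transcribe) -> None:
    """Worker side: take one job, process each media file, report completion."""
    job = json.loads(todo.get_nowait())
    outputs = [transcribe(key) for key in job["media"]]
    done.put(json.dumps({"id": job["id"], "output": outputs}))


def read_completion(done: Queue) -> dict:
    """SDR side: learn that content has been processed."""
    return json.loads(done.get_nowait())
```

With real SQS the `put` and `get_nowait` calls would be replaced by sending to and receiving from the respective queues, and `transcribe` would be whatever the STT worker container actually runs.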

localstack investigation ticketed as sul-dlss/common-accessioning#1364
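As a possible starting point for that investigation, one common pattern is to switch the AWS client's endpoint based on the environment. The sketch below only builds keyword arguments, so it runs without any AWS library installed. The environment variable names `USE_LOCALSTACK` and `LOCALSTACK_ENDPOINT` and the function `aws_client_kwargs` are hypothetical; the port 4566 is LocalStack's default edge port, and the dummy credentials are placeholders that LocalStack does not validate.

```python
import os

# LocalStack's default edge endpoint when run locally.
LOCALSTACK_DEFAULT_ENDPOINT = "http://localhost:4566"


def aws_client_kwargs(environ=None) -> dict:
    """Return extra client kwargs: LocalStack settings if enabled, else none."""
    env = os.environ if environ is None else environ
    if env.get("USE_LOCALSTACK") == "1":  # hypothetical opt-in flag
        return {
            "endpoint_url": env.get("LOCALSTACK_ENDPOINT", LOCALSTACK_DEFAULT_ENDPOINT),
            "region_name": env.get("AWS_REGION", "us-east-1"),
            # Placeholder credentials for local use only.
            "aws_access_key_id": "test",
            "aws_secret_access_key": "test",
        }
    # Empty dict: fall back to the normal AWS configuration.
    return {}
```

Assuming boto3 is the client library, usage would look like `boto3.client("s3", **aws_client_kwargs())` or `boto3.client("sqs", **aws_client_kwargs())`, so the same code path could talk to either real AWS or a local simulation.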

closing.
