add assemblyai subcommand
Fix #16
swayanshupanda authored Oct 20, 2024
1 parent 5fce743 commit 8219c2e
Showing 8 changed files with 256 additions and 27 deletions.
Binary file added .DS_Store
Binary file not shown.
3 changes: 3 additions & 0 deletions .gitignore
@@ -22,3 +22,6 @@ go.work
go.work.sum
podscript
scratch.go

# VS Code settings
.vscode
44 changes: 37 additions & 7 deletions README.md
@@ -1,12 +1,16 @@
# podscript

podscript is a tool to generate transcripts for podcasts (and other similar audio files), using LLMs and other Speech-to-Text (STT) APIs. Currently, [ChatGPT](https://platform.openai.com/docs/overview), [Anthropic](https://docs.anthropic.com/en/api/getting-started), [Deepgram](https://playground.deepgram.com/?endpoint=listen&smart_format=true&language=en&model=nova-2), [Groq](https://console.groq.com/playground) and [AssemblyAI](https://www.assemblyai.com/) are supported.

## Prerequisites

You need an API key for at least one of the following services to use podscript:
* [ChatGPT API Key](https://platform.openai.com/api-keys) or [Anthropic API Key](https://console.anthropic.com/settings/keys), to clean up and transcribe YouTube autogenerated captions using either OpenAI's ChatGPT model or Anthropic's Claude model.
* [Deepgram API Key](https://developers.deepgram.com/docs/make-your-first-api-request#create-a-deepgram-api-key) to transcribe any podcast audio file. Deepgram has some excellent and cheap STT models and offers free signup and $200 in credit to get started.
* [Groq API Key](https://console.groq.com/keys) to clean up and transcribe YouTube autogenerated captions, or use Groq's `whisper-v3-large` model to transcribe an audio file.
*_(more APIs, for e.g. OpenAI Whisper will be supported in the future. Contributions are welcome)_.

- [ChatGPT API Key](https://platform.openai.com/api-keys) or [Anthropic API Key](https://console.anthropic.com/settings/keys), to clean up and transcribe YouTube autogenerated captions using either OpenAI's ChatGPT model or Anthropic's Claude model.
- [Deepgram API Key](https://developers.deepgram.com/docs/make-your-first-api-request#create-a-deepgram-api-key) to transcribe any podcast audio file. Deepgram has some excellent and cheap STT models and offers free signup and $200 in credit to get started.
- [Groq API Key](https://console.groq.com/keys) to clean up and transcribe YouTube autogenerated captions, or use Groq's `whisper-v3-large` model to transcribe an audio file.
- [AssemblyAI API Key](https://www.assemblyai.com/) to transcribe an audio file using AssemblyAI's `best` model.
- _(More APIs, e.g. OpenAI Whisper, will be supported in the future. Contributions are welcome.)_

## Install

@@ -17,15 +21,19 @@ You need an API key for at least one of the following services to use podscript:
```

## Configure
This command displays prompts to enter API keys for supported services, and write them to `$HOME/.podscript.toml`.

This command prompts you to enter API keys for the supported services and writes them to `$HOME/.podscript.toml`.

```shell
> podscript configure
```

Alternatively, you can set keys in environment variables prefixed with `PODSCRIPT_`, e.g. `PODSCRIPT_OPENAI_API_KEY` and `PODSCRIPT_DEEPGRAM_API_KEY`.

## Usage

### Transcript from YouTube autogenerated captions

For podcasts on YouTube with autogenerated captions (e.g. [Andrew Huberman](https://www.youtube.com/watch?v=WFcYF_pxLgA) and [Cal Newport](https://www.youtube.com/watch?v=OvlfCW3Ec1g)), use the `ytt` subcommand to download the captions from the YouTube video and feed them to an LLM to generate a clean transcript. You can customise the model used for transcription using the `--model` flag, which can be one of `gpt-4o-mini` (default if omitted), `gpt-4o`, `claude-3-5-sonnet-20240620` or `llama-3.1-70b-versatile`.

```shell
@@ -39,6 +47,7 @@ To customise the path and add a recognizable suffix to the transcripts, use the
```

Sample Output:

```text
wrote raw autogenerated captions to /Users/deepak/Downloads/raw_transcript_2024-07-05-170548_short.txt
transcribed part 1/1…
@@ -48,6 +57,7 @@ wrote cleaned up transcripts to /Users/deepak/Downloads/cleaned_transcript_2024-
You can also customise the model used for transcription using the `--model` flag, which can be one of `gpt-4o-mini` (default if omitted), `gpt-4o`, `claude-3-5-sonnet-20240620` or `llama-3.1-70b-versatile`.

### Transcript from Deepgram API

Use the `deepgram` subcommand to generate transcripts that are of a higher quality than YouTube autogenerated captions. Deepgram provides a [great API](https://playground.deepgram.com/?endpoint=listen&smart_format=true&language=en&model=nova-2) (with $200 free signup credit!) and excellent, fast models for transcribing audio files.

Locate the audio file link for any podcast on [ListenNotes](https://www.listennotes.com/) and use the `--from-url` option
@@ -57,6 +67,7 @@ Locate the audio file link for any podcast on [ListenNotes](https://www.listenno
```

Sample Output:

```text
podscript deepgram --from-url https://audio.listennotes.com/e/p/d6cc86364eb540c1a30a1cac2b77b82c/
wrote raw JSON API response to deepgram_api_response_2024-07-05-173538.json
@@ -67,22 +78,41 @@ Alternatively, you can pass a local audio file to the command by setting `--from

> [!TIP]
> You can find the audio download link for a podcast on ListenNotes under the More menu.
>
>
> <img width="252" alt="image" src="https://github.com/deepakjois/podscript/assets/5342/1f400964-e575-4f59-9de0-ee75f386b27d">

### Transcript from Groq Whisper API

Use the `groq` subcommand to generate transcripts using the `whisper-v3-large` model from [Groq's API endpoint](https://console.groq.com/docs/speech-text) (which as of Jul 2024 is in beta and free to use within your rate limits).

```shell
> podscript groq huberman.mp3
```

Sample Output:

```text
wrote raw JSON API response to groq_whisper_api_response_2024-07-11-145154.json
wrote transcript to groq_whisper_api_transcript_2024-07-11-145154.txt
```

Use the `--verbose` flag to dump timestamps for audio segments in the raw JSON response.
Use the `--verbose` flag to dump timestamps for audio segments in the raw JSON response.

### Transcript from AssemblyAI API

Use the `assemblyai` subcommand to generate transcripts using the `best` model from [AssemblyAI's API](https://www.assemblyai.com/docs) (which, as of Oct 2024, is free to use within your credit limits; AssemblyAI provides $50 in free credits on signup).

```shell
> podscript assemblyai --from-url https://audio.listennotes.com/e/p/d6cc86364eb540c1a30a1cac2b77b82c/
```

Sample Output:

```text
Wrote transcript to assemblyai_api_transcript_2024-10-04-191551.txt
```

Alternatively, you can pass a local audio file to the command by setting the `--from-file` flag instead of `--from-url`. You can also customise the output path and add a recognizable suffix to the filename with the `--path` and `--suffix` options.

## Feedback

142 changes: 142 additions & 0 deletions cmd/assemblyai/assemblyai.go
@@ -0,0 +1,142 @@
package assemblyai

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"os"
	"path/filepath"
	"time"

	"github.com/spf13/cobra"
	"github.com/spf13/viper"

	aai "github.com/AssemblyAI/assemblyai-go-sdk"
)

const (
	maxLocalFileSize int64 = 2200 * 1024 * 1024 // approximately 2.2GB in bytes
)

func init() {
	Command.Flags().StringP("path", "p", "", "save transcripts to path")
	Command.Flags().StringP("suffix", "s", "", "append suffix to filenames for easier recognition")
	Command.Flags().BoolP("verbose", "v", false, "print the full transcript response as JSON")
	Command.Flags().StringP("from-url", "u", "", "URL of the audio file to transcribe")
	Command.Flags().StringP("from-file", "f", "", "local path to the audio file to transcribe")
}

var Command = &cobra.Command{
	Use:   "assemblyai",
	Short: "Generate transcript of an audio file using AssemblyAI's API.",
	RunE: func(cmd *cobra.Command, args []string) error {
		apiKey := viper.GetString("assemblyai_api_key")
		if apiKey == "" {
			return errors.New("AssemblyAI API key not found. Please run 'podscript configure' or set the PODSCRIPT_ASSEMBLYAI_API_KEY environment variable")
		}

		folder, _ := cmd.Flags().GetString("path")
		suffix, _ := cmd.Flags().GetString("suffix")
		audioURL, _ := cmd.Flags().GetString("from-url")
		audioFilePath, _ := cmd.Flags().GetString("from-file")
		verbose, _ := cmd.Flags().GetBool("verbose")

		// Require exactly one input source.
		if (audioURL == "") == (audioFilePath == "") {
			return errors.New("please provide exactly one of --from-url or --from-file")
		}

		if folder == "" {
			folder = "." // default to the current directory
		}
		folder = filepath.Clean(folder)
		if fi, err := os.Stat(folder); err != nil || !fi.IsDir() {
			return fmt.Errorf("path not found: %s", folder)
		}

		timestamp := time.Now().Format("2006-01-02-150405")
		filenameSuffix := timestamp
		if suffix != "" {
			filenameSuffix = fmt.Sprintf("%s_%s", timestamp, suffix)
		}

		client := aai.NewClient(apiKey)
		ctx := context.Background()

		// Use the same options for URL and file input. Speaker labels are
		// needed for transcript.Utterances to be populated.
		params := &aai.TranscriptOptionalParams{
			SpeakerLabels: aai.Bool(true),
			Punctuate:     aai.Bool(true),
			FormatText:    aai.Bool(true),
		}

		var transcript aai.Transcript
		if audioURL != "" {
			parsedURL, err := url.ParseRequestURI(audioURL)
			if err != nil || (parsedURL.Scheme != "http" && parsedURL.Scheme != "https") {
				return fmt.Errorf("invalid URL: %s", audioURL)
			}
			transcript, err = client.Transcripts.TranscribeFromURL(ctx, audioURL, params)
			if err != nil {
				return fmt.Errorf("failed to transcribe from URL: %w", err)
			}
			fmt.Printf("Generated transcript from URL %s\n", audioURL)
		} else {
			audioFilePath = filepath.Clean(audioFilePath)
			fi, err := os.Stat(audioFilePath)
			if err != nil || fi.IsDir() {
				return fmt.Errorf("invalid audio file: %s", audioFilePath)
			}
			if fi.Size() > maxLocalFileSize {
				return errors.New("file size exceeds 2.2GB limit")
			}

			audioFile, err := os.Open(audioFilePath)
			if err != nil {
				return fmt.Errorf("error opening file: %w", err)
			}
			defer audioFile.Close()

			transcript, err = client.Transcripts.TranscribeFromReader(ctx, audioFile, params)
			if err != nil {
				return fmt.Errorf("failed to transcribe from file: %w", err)
			}
		}

		if transcript.Text == nil {
			return errors.New("transcription failed: received empty transcript from AssemblyAI API")
		}

		// filepath.Join (not path.Join) uses the OS separator and cleans the result.
		transcriptFilename := filepath.Join(folder, fmt.Sprintf("assemblyai_transcript_%s.txt", filenameSuffix))
		file, err := os.Create(transcriptFilename)
		if err != nil {
			return fmt.Errorf("failed to create transcript file: %w", err)
		}
		defer file.Close()

		if len(transcript.Utterances) == 0 {
			// No speaker turns returned: fall back to the plain transcript text.
			if _, err := fmt.Fprintln(file, aai.ToString(transcript.Text)); err != nil {
				return fmt.Errorf("failed to write transcript to file: %w", err)
			}
		}
		for _, utterance := range transcript.Utterances {
			if _, err := fmt.Fprintf(file, "Speaker %s: %s\n\n",
				aai.ToString(utterance.Speaker),
				aai.ToString(utterance.Text),
			); err != nil {
				return fmt.Errorf("failed to write utterance to file: %w", err)
			}
		}
		fmt.Printf("Wrote transcript to %s\n", transcriptFilename)

		if verbose {
			data, err := json.MarshalIndent(transcript, "", "  ")
			if err != nil {
				return fmt.Errorf("failed to encode transcript metadata: %w", err)
			}
			fmt.Printf("Transcript metadata:\n%s\n", data)
		}

		return nil
	},
}
5 changes: 5 additions & 0 deletions cmd/configure/configure.go
@@ -56,6 +56,11 @@ var Command = &cobra.Command{
return err
}

// Assembly AI
if err := setViperKeyFromPrompt("AssemblyAI API key", "assemblyai_api_key"); err != nil {
return err
}

err := viper.WriteConfigAs(viper.ConfigFileUsed())
if err != nil {
return fmt.Errorf("error writing config: %v", err)
3 changes: 3 additions & 0 deletions cmd/root.go
@@ -10,6 +10,7 @@ import (
"github.com/deepakjois/podscript/cmd/deepgram"
"github.com/deepakjois/podscript/cmd/groq"
"github.com/deepakjois/podscript/cmd/ytt"
"github.com/deepakjois/podscript/cmd/assemblyai"
"github.com/spf13/cobra"
"github.com/spf13/viper"
)
@@ -25,6 +26,7 @@ var supportedLLMKeys = []string{
"openai_api_key",
"anthropic_api_key",
"groq_api_key",
"assemblyai_api_key",
}

func init() {
@@ -34,6 +36,7 @@ func init() {
rootCmd.AddCommand(ytt.Command)
rootCmd.AddCommand(deepgram.Command)
rootCmd.AddCommand(groq.Command)
rootCmd.AddCommand(assemblyai.Command)
rootCmd.CompletionOptions.DisableDefaultCmd = true
rootCmd.SilenceUsage = true
}
44 changes: 24 additions & 20 deletions go.mod
@@ -3,6 +3,7 @@ module github.com/deepakjois/podscript
go 1.22.4

require (
github.com/AssemblyAI/assemblyai-go-sdk v1.8.1
github.com/charmbracelet/huh v0.4.2
github.com/deepakjois/ytt v0.0.0-20240922124700-664221d83d24
github.com/deepgram/deepgram-go-sdk v1.3.6
@@ -12,38 +13,31 @@ require (
)

require (
github.com/atotto/clipboard v0.1.4 // indirect
github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect

// indirect dependencies
github.com/catppuccin/go v0.2.0 // indirect
github.com/cenkalti/backoff v2.2.1+incompatible // indirect
github.com/charmbracelet/bubbles v0.18.0 // indirect
github.com/charmbracelet/bubbletea v0.26.6 // indirect
github.com/charmbracelet/lipgloss v0.11.0 // indirect
github.com/charmbracelet/x/exp/strings v0.0.0-20240524151031-ff83003bf67a // indirect
github.com/dlclark/regexp2 v1.11.0 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/gorilla/schema v1.3.0 // indirect
github.com/pkoukk/tiktoken-go v0.1.6 // indirect
github.com/rogpeppe/go-internal v1.12.0 // indirect
gitlab.com/golang-commonmark/html v0.0.0-20191124015941-a22733972181 // indirect
gitlab.com/golang-commonmark/linkify v0.0.0-20191026162114-a0c2df6c8f82 // indirect
gitlab.com/golang-commonmark/markdown v0.0.0-20211110145824-bf3e522c626a // indirect
gitlab.com/golang-commonmark/mdurl v0.0.0-20191124015652-932350d1cb84 // indirect
gitlab.com/golang-commonmark/puny v0.0.0-20191124015043-9f83538fa04f // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
k8s.io/klog/v2 v2.110.1 // indirect
)

require (
github.com/atotto/clipboard v0.1.4 // indirect
github.com/aymanbagabas/go-osc52/v2 v2.0.1 // indirect
github.com/charmbracelet/x/ansi v0.1.2 // indirect
github.com/charmbracelet/x/exp/strings v0.0.0-20240524151031-ff83003bf67a // indirect
github.com/charmbracelet/x/input v0.1.1 // indirect
github.com/charmbracelet/x/term v0.1.1 // indirect
github.com/charmbracelet/x/windows v0.1.2 // indirect
github.com/dlclark/regexp2 v1.11.0 // indirect
github.com/dustin/go-humanize v1.0.1 // indirect
github.com/erikgeiser/coninput v0.0.0-20211004153227-1c3628e74d0f // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/google/go-querystring v1.1.0 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/gorilla/schema v1.4.1 // indirect
github.com/hashicorp/hcl v1.0.0 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/klauspost/compress v1.17.6 // indirect
github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
github.com/magiconair/properties v1.8.7 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
@@ -54,7 +48,9 @@ require (
github.com/muesli/cancelreader v0.2.2 // indirect
github.com/muesli/termenv v0.15.2 // indirect
github.com/pelletier/go-toml/v2 v2.2.2 // indirect
github.com/pkoukk/tiktoken-go v0.1.6 // indirect
github.com/rivo/uniseg v0.4.7 // indirect
github.com/rogpeppe/go-internal v1.12.0 // indirect
github.com/sagikazarmark/locafero v0.4.0 // indirect
github.com/sagikazarmark/slog-shim v0.1.0 // indirect
github.com/sourcegraph/conc v0.3.0 // indirect
@@ -63,11 +59,19 @@ require (
github.com/spf13/pflag v1.0.5 // indirect
github.com/subosito/gotenv v1.6.0 // indirect
github.com/xo/terminfo v0.0.0-20220910002029-abceb7e1c41e // indirect
gitlab.com/golang-commonmark/html v0.0.0-20191124015941-a22733972181 // indirect
gitlab.com/golang-commonmark/linkify v0.0.0-20191026162114-a0c2df6c8f82 // indirect
gitlab.com/golang-commonmark/markdown v0.0.0-20211110145824-bf3e522c626a // indirect
gitlab.com/golang-commonmark/mdurl v0.0.0-20191124015652-932350d1cb84 // indirect
gitlab.com/golang-commonmark/puny v0.0.0-20191124015043-9f83538fa04f // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/exp v0.0.0-20240222234643-814bf88cf225 // indirect
golang.org/x/sync v0.7.0 // indirect
golang.org/x/sys v0.21.0 // indirect
golang.org/x/text v0.15.0 // indirect
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
gopkg.in/ini.v1 v1.67.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/klog/v2 v2.110.1 // indirect
nhooyr.io/websocket v1.8.7 // indirect
)

0 comments on commit 8219c2e
