Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to expose parseTranscript method to resolve CORS error from Obsidian youtube-summarizer plugin #34

Open
rshmhrj opened this issue Jul 25, 2024 · 0 comments

Comments

@rshmhrj
Copy link

rshmhrj commented Jul 25, 2024

Hi all, I have helped make a few minor updates in an Obsidian plugin named youtube-summarizer. It has been broken for a while due to #19, and I have finally gotten a chance to look into it.

I started by updating to the latest version of youtube-transcript (from 1.0.6 to 1.2.1), and the TypeError issue went away, but it is replaced with some CORS errors, and I have no idea why.

image

Looking into the src, it seems like youtube-transcript is using fetch. Within the plugin, we are using the request module from the obsidian package and those requests go through without any CORS issues:

image

Instead of enabling proxy support like #25, I'm proposing a tiny refactor to modularize the fetching and parsing.

Instead of calling fetchTranscript(url) and get back the transcriptList like this:

const transcriptList = await YoutubeTranscript.fetchTranscript(url);
const transcript = transcriptList.map((transcript) => transcript.text).join(" ");

I'd like to do the fetch myself and call something like YoutubeTranscript.parseTranscript(transcriptText):

import { request } from "obsidian";
...
let response = await request({ url: url, method: "GET" });
const transcriptList = await YoutubeTranscript.parseTranscript(response);
const transcript = transcriptList.map((transcript) => transcript.text).join(" ");

Code-wise, YoutubeTranscript would have two public functions:

  public static async fetchTranscript( videoId: string, config?: TranscriptConfig ): Promise<TranscriptResponse[]>
  public static async parseTranscript( transcriptText: string ): Promise<TranscriptResponse[]>

fetchTranscript could be refactored/shortened to:

  public static async fetchTranscript( 
    videoId: string,
    config?: TranscriptConfig
  ): Promise<TranscriptResponse[]> {
    const identifier = this.retrieveVideoId(videoId);
    const videoPageResponse = await fetch(
      `https://www.youtube.com/watch?v=${identifier}`,
      {
        headers: {
          ...(config?.lang && { 'Accept-Language': config.lang }),
          'User-Agent': USER_AGENT,
        },
      }
    );
    const videoPageBody = await videoPageResponse.text();
    return this.parseTranscript(videoPageBody);

and parseTranscript would contain the rest of what is currently in fetchTranscript

Because the fetchTranscript signature and return type are the same, it should continue to work as is for existing users, and I think the modularity can help. Maybe not all CORS issues would be able to be resolved with this change, so a future proxy configuration option might still be needed. Idk.

I can draft a PR with these changes and test them out to at least validate that these minor changes would work for us and enable the youtube-summarizer plugin to finally start working again 😄

Please let me know your thoughts: @Kakulukian @mefengl and others

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant