YouTube Comments #60

Closed
EtorixDev opened this issue Nov 8, 2023 · 6 comments · Fixed by #66

Comments

@EtorixDev

Hello, I noticed that #17 states that getting comments is not part of the InnerTube API. I'm not sure whether things have changed or whether I'm misunderstanding what counts as part of the InnerTube API, but I have managed to get the comments by doing the following:

  1. Send a next request to https://www.youtube.com/youtubei/v1/next?key={key} with the specified video ID in the data.
  2. Extract the continuation token. There's a default, a "Top" sort, and a "New" sort. I've only tried the default.
  3. Send a second next request that omits the video ID and instead specifies the continuation in the data block.
  4. This should return the first 20 or so comments in a very ugly nested way.
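The steps above can be sketched as two request bodies. This is a rough illustration only: the `context` layout and the helper name `build_next_payload` are assumptions that are not confirmed anywhere in this thread, and the client version and video ID are placeholders.

```python
import json

# Assumed client context; the version string is only a placeholder.
CLIENT_CONTEXT = {
    "client": {"clientName": "WEB", "clientVersion": "2.20240105.01.00"}
}


def build_next_payload(video_id=None, continuation=None):
    """Build a JSON body for a /youtubei/v1/next request (hypothetical helper)."""
    # Exactly one of the two arguments must be supplied.
    if (video_id is None) == (continuation is None):
        raise ValueError("pass exactly one of video_id or continuation")

    payload = {"context": CLIENT_CONTEXT}
    if video_id is not None:
        payload["videoId"] = video_id
    else:
        payload["continuation"] = continuation
    return payload


# Step 1: the first request carries the video ID.
first_body = build_next_payload(video_id="VIDEO_ID")

# Step 3: the second request carries only the continuation token
# extracted from the first response in step 2.
second_body = build_next_payload(continuation="TOKEN_FROM_FIRST_RESPONSE")

print(json.dumps(first_body, indent=2))
print(json.dumps(second_body, indent=2))
```

Keeping the two bodies mutually exclusive mirrors the description: the second request does not repeat the video ID.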

Something I've yet to figure out is how to get a highlighted comment to appear at the top of the json list. If you click on a YouTube comment's date, it will open a link with a "&lc=" param that has the comment's ID. And in the comments it will appear at the top as "Highlighted".
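For reference, pulling the comment ID out of such a link only needs the standard library. A minimal sketch, assuming the link has the usual watch URL shape; the function name and the comment ID below are made up for illustration:

```python
from urllib.parse import parse_qs, urlparse


def extract_highlighted_comment_id(url):
    """Return the value of the "lc" query parameter, or None if it is absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("lc")
    return values[0] if values else None


# Hypothetical link; the comment ID is not a real one.
link = "https://www.youtube.com/watch?v=BV1O7RR-VoA&lc=EXAMPLE_COMMENT_ID"
print(extract_highlighted_comment_id(link))
```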

If I use the continuation token for the second request from the dev tools inspector when loading the highlighted comment link in the browser then the second next request properly returns the highlighted comment at the top of the json list.

However, if I programmatically use the continuation retrieved from the first next request, it always returns the comments without the highlighted comment at the top. So it seems the highlighted comment is tied to the continuation token, which appears to be generated outside the scope of the next endpoint, unless I've simply not found the correct way yet.

@tombulled
Owner

Hi, apologies for the late reply. I'll take a look at this now.

@tombulled
Owner

tombulled commented Jan 6, 2024

I've been able to reproduce the ability to list the first n comments (either "top" or "newest").

Here's the (admittedly lashed together) script I used:

from innertube import InnerTube

ENGAGEMENT_SECTION_COMMENTS = "engagement-panel-comments-section"
COMMENTS_TOP = "Top comments"
COMMENTS_NEWEST = "Newest first"


def parse_text(text):
    # Join the text of every run in a "runs"-style text object
    return "".join(run["text"] for run in text["runs"])


def extract_engagement_panels(next_data):
    # Index the engagement panels by their target ID
    engagement_panels = {}
    raw_engagement_panels = next_data.get("engagementPanels", [])

    for raw_engagement_panel in raw_engagement_panels:
        engagement_panel = raw_engagement_panel.get(
            "engagementPanelSectionListRenderer", {}
        )
        target_id = engagement_panel.get("targetId")

        engagement_panels[target_id] = engagement_panel

    return engagement_panels


def parse_sort_filter_sub_menu(menu):
    # Index the sort menu items ("Top comments", "Newest first") by title
    menu_items = menu["sortFilterSubMenuRenderer"]["subMenuItems"]

    return {menu_item["title"]: menu_item for menu_item in menu_items}


def extract_comments(next_continuation_data):
    # The last continuation item is skipped as it is not a comment thread
    return [
        continuation_item["commentThreadRenderer"]
        for continuation_item in next_continuation_data["onResponseReceivedEndpoints"][
            1
        ]["reloadContinuationItemsCommand"]["continuationItems"][:-1]
    ]


# YouTube Web Client
client = InnerTube("WEB", "2.20240105.01.00")

# ShortCircuit - Dell just DESTROYED the Surface Pro! - Dell XPS 13 2-in-1
video = client.next("BV1O7RR-VoA")

engagement_panels = extract_engagement_panels(video)
comments = engagement_panels[ENGAGEMENT_SECTION_COMMENTS]
comments_header = comments["header"]["engagementPanelTitleHeaderRenderer"]
comments_title = parse_text(comments_header["title"])
comments_context = parse_text(comments_header["contextualInfo"])
comments_menu_items = parse_sort_filter_sub_menu(comments_header["menu"])
comments_top = comments_menu_items[COMMENTS_TOP]
comments_top_continuation = comments_top["serviceEndpoint"]["continuationCommand"][
    "token"
]

print(f"{comments_title} ({comments_context})...")
print()

comments_continuation = client.next(continuation=comments_top_continuation)

comments = extract_comments(comments_continuation)

for comment in comments:
    comment_renderer = comment["comment"]["commentRenderer"]

    comment_author = comment_renderer["authorText"]["simpleText"]
    comment_content = parse_text(comment_renderer["contentText"])

    print(f"[{comment_author}]")
    print(comment_content)
    print()

$ python app.py
Comments (1.7K)...

[@ViXoZuDo]
I would 100% prefer the headphone jack over that camera...

[@ouilsen2]
As a Surface Pro user I have one observation...

...

(I'll add this to the examples/ directory in case it helps anyone else)

I'll have a fiddle with highlighting a comment now in case I can figure out what's going on there

@tombulled
Owner

It looks like highlighting a comment sends off a request to the /next endpoint with some params and the videoId. I'll see if I can whip up a quick PoC for this now

@tombulled
Owner

I think I've figured out why highlighting a comment wasn't working. The continuation tokens for "top" and "newest" that you can extract from engagementPanels aren't influenced by the params passed to the /next endpoint. However, the continuation token for the comment-item-section does change.

The below example ignores the engagementPanels entirely and instead uses the continuation token for the comments item section:

from innertube import InnerTube

# YouTube Web Client
CLIENT = InnerTube("WEB", "2.20240105.01.00")


def parse_text(text):
    return "".join(run["text"] for run in text["runs"])


def flatten(items):
    flat_items = {}

    for item in items:
        key = next(iter(item))
        val = item[key]

        flat_items.setdefault(key, []).append(val)

    return flat_items


def flatten_item_sections(item_sections):
    return {
        item_section["sectionIdentifier"]: item_section
        for item_section in item_sections
    }


def extract_comments(next_continuation_data):
    return [
        continuation_item["commentThreadRenderer"]
        for continuation_item in next_continuation_data["onResponseReceivedEndpoints"][
            1
        ]["reloadContinuationItemsCommand"]["continuationItems"][:-1]
    ]


def extract_comments_continuation_token(next_data):
    contents = flatten(
        next_data["contents"]["twoColumnWatchNextResults"]["results"]["results"][
            "contents"
        ]
    )
    item_sections = flatten_item_sections(contents["itemSectionRenderer"])
    comment_item_section_content = item_sections["comment-item-section"]["contents"][0]
    comments_continuation_token = comment_item_section_content[
        "continuationItemRenderer"
    ]["continuationEndpoint"]["continuationCommand"]["token"]

    return comments_continuation_token


def get_comments(video_id, params=None):
    video = CLIENT.next(video_id, params=params)

    continuation_token = extract_comments_continuation_token(video)

    comments_continuation = CLIENT.next(continuation=continuation_token)

    return extract_comments(comments_continuation)


def print_comment(comment):
    comment_renderer = comment["comment"]["commentRenderer"]

    comment_author = comment_renderer["authorText"]["simpleText"]
    comment_content = parse_text(comment_renderer["contentText"])

    print(f"[{comment_author}]")
    print(comment_content)
    print()


video_id = "BV1O7RR-VoA"

# Get comments for a given video
comments = get_comments(video_id)

# Select a comment to highlight (in this case the 3rd one)
comment = comments[2]

# Print the comment we're going to highlight
print("### Highlighting Comment: ###")
print()
print_comment(comment)
print("---------------------")
print()

# Extract the 'params' to highlight this comment
params = comment["comment"]["commentRenderer"]["publishedTimeText"]["runs"][0][
    "navigationEndpoint"
]["watchEndpoint"]["params"]

# Get comments, but highlighting the selected comment
highlighted_comments = get_comments(video_id, params=params)

print("### Comments: ###")
print()

for comment in highlighted_comments:
    print_comment(comment)

$ python app.py
### Highlighting Comment: ###

[@alphacompton]
The built in mic on the 2-1 is exceptional and the camera is excellent from your video sample. Look like a better buy especially if it's cheaper than the Surface pro.

---------------------

### Comments: ###

[@alphacompton]
The built in mic on the 2-1 is exceptional and the camera is excellent from your video sample. Look like a better buy especially if it's cheaper than the Surface pro.

[@ouilsen2]
As a Surface Pro user I have one observation....

...

Hope that helps!

Please let me know if you have any further questions, or if this answers your query

Best, Tom

@EtorixDev
Author

Hi, thanks for the detailed reply.

The idea behind the highlighting was to store a reference to the comment (such as the comment ID) in a database and come back to it later. One such use case would be a system that checks monthly for the existence of a membership badge on a user's message. That's why it would have been ideal to have a way to programmatically jump straight to the comment in one request like in the browser (on the initial lookup, not just subsequent ones).

Unfortunately from your response it seems "highlighting" a comment internally is done with the comment's watchEndpoint params, so the initial request for the comment will require scraping them all until the target comment is found by checking for the comment ID, and then storing the params instead of the comment ID for future immediate lookup.
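The scrape-then-store approach described above could look roughly like the sketch below. It assumes each comment renderer exposes its ID under a `commentId` key, which is not confirmed in this thread; the path to `params` is the one used in the example script earlier in the issue. The function name and the sample data are made up.

```python
def find_highlight_params(comment_threads, target_comment_id):
    """Return the watchEndpoint params for the comment with the given ID.

    Returns None when no comment in this batch matches.
    """
    for thread in comment_threads:
        renderer = thread["comment"]["commentRenderer"]

        # "commentId" is an assumed field name
        if renderer.get("commentId") != target_comment_id:
            continue

        runs = renderer["publishedTimeText"]["runs"]
        return runs[0]["navigationEndpoint"]["watchEndpoint"]["params"]

    return None


def make_fake_thread(comment_id, params):
    # Builds a fabricated commentThreadRenderer-shaped dict for illustration
    return {
        "comment": {
            "commentRenderer": {
                "commentId": comment_id,
                "publishedTimeText": {
                    "runs": [
                        {"navigationEndpoint": {"watchEndpoint": {"params": params}}}
                    ]
                },
            }
        }
    }


fake_threads = [
    make_fake_thread("comment-a", "params-a"),
    make_fake_thread("comment-b", "params-b"),
]

print(find_highlight_params(fake_threads, "comment-b"))
```

In practice the function would be called once per page of comments until it returns something other than None, and the returned params stored in place of the comment ID.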

Would this work, or do you suspect the params of comments change often?

Thanks again.

@tombulled
Owner

Hi @EtorixDev, apologies for the late turnaround on a reply to your last comment. I believe the params field contains base64-encoded protobuf data (potentially also URL-encoded). You should be able to decode the contents of params using a tool such as https://protobuf-decoder.netlify.app/. It is possible that the protobuf structure contains the comment ID, and that all other fields are static. If this is the case, you should be able to generate the correct params value knowing only the comment ID.

Unfortunately I went to test this using the examples/list-video-comments-highlighted.py example script I wrote a while back and it seems YouTube has changed their comments API around again. If I get some spare time I'll give the API another poke, however I hope this comment has at least given you a bit of a steer 🙂
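The decoding step can also be tried locally instead of with the online tool. The sketch below is a generic reader for the protobuf wire format and says nothing about what a real params value contains: it URL-decodes and base64-decodes a string, then lists the top-level fields. The function names are made up, and the sample value is constructed in the script itself rather than taken from YouTube.

```python
import base64
from urllib.parse import quote, unquote


def decode_params(params):
    """URL-decode, then base64-decode (URL-safe alphabet) into raw bytes."""
    raw = unquote(params)
    padded = raw + "=" * (-len(raw) % 4)  # restore any stripped padding
    return base64.urlsafe_b64decode(padded)


def read_varint(data, pos):
    """Read one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(data):
    """List (field_number, wire_type, value) for each top-level field."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited (bytes, string, sub-message)
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Fabricated message: field 1 = b"abc" (length-delimited), field 2 = 150 (varint)
sample_bytes = b"\x0a\x03abc\x10\x96\x01"
sample_params = quote(base64.urlsafe_b64encode(sample_bytes).decode("ascii"))

for field in read_fields(decode_params(sample_params)):
    print(field)
```

Length-delimited values may themselves be nested messages, so a field that looks like opaque bytes can be passed through `read_fields` again to see whether it parses.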


2 participants