[YouTube] Add more parameters to InnerTube requests, use the iOS client for livestreams and fix extraction of embeddable age-restricted videos and contents with a warning before playback #780

Merged
18 commits merged on Apr 9, 2022


AudricV
Member

@AudricV AudricV commented Jan 15, 2022

This PR adds more parameters to InnerTube requests:

  • the playbackContext JSON object sent in player requests by the desktop web client, only for the web client (it seems to be needed to avoid some throttling);

  • a parameter that not all clients send in player request bodies (I saw it on the web client, but I cannot see it right now): cpn, aka contentPlaybackNonce. It seems to be a sort of client authenticity token sent by all official clients in videoplayback URLs: it is a 16-character string generated from strong random values. This PR adds it in both places (player request bodies and videoplayback URLs) for all clients;

  • the video ID as a URL parameter of InnerTube requests, and a t parameter (a 12-character string that is also unique to each player request) which needs more research; both are only sent for mobile clients (as only the YouTube apps do so);

  • for player and next requests: racyCheckOk, for age-restricted contents (it doesn't seem to do anything when used anonymously), and contentCheckOk, to allow playback of contents that show a warning before playback because of the sensitive topics they contain;

  • a new query parameter, prettyPrint, set to false. YouTube used to return pretty-printed responses, but that's no longer the case on its own clients because they now send this parameter, which greatly reduces response sizes (but not really transfer sizes); take a look at the following screenshot:

    [Screenshot: transfer size with and without the prettyPrint parameter set to false]

    (In the screenshot, "Transfert" means Transfer and "Taille" means Size, in French.)
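As a rough illustration of the cpn described above, the following Java sketch draws 16 characters from a 64-character URL-safe alphabet with SecureRandom and appends the result to a videoplayback URL. The alphabet, the class name ContentPlaybackNonce and its methods are assumptions made for illustration, not the extractor's actual code.

```java
import java.security.SecureRandom;

/**
 * Hypothetical helper generating a content playback nonce (cpn).
 * The 64-character alphabet below is an assumption.
 */
final class ContentPlaybackNonce {
    static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
    static final int LENGTH = 16;

    // Cryptographically strong random values, as the official clients are said to use
    private static final SecureRandom RANDOM = new SecureRandom();

    private ContentPlaybackNonce() {
    }

    /** Returns a new 16-character nonce made only of ALPHABET characters. */
    static String generate() {
        final StringBuilder cpn = new StringBuilder(LENGTH);
        for (int i = 0; i < LENGTH; i++) {
            cpn.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return cpn.toString();
    }

    /** Appends the nonce to a videoplayback URL that already has a query string. */
    static String appendTo(final String videoPlaybackUrl, final String cpn) {
        return videoPlaybackUrl + "&cpn=" + cpn;
    }
}
```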


This PR also supersedes #732 and fixes most of its problems:

  • the fetch of the iOS player is only enabled for livestreams;
  • the fetch of the Android player is disabled only for livestreams;
  • extractor clients can force the fetch of the iOS and Android clients, for every stream type, by using two static methods in YoutubeStreamExtractor;
  • available streams are no longer limited to 30fps streams (the way to get 60fps streams was discovered very recently, see [Findings] YouTube and HLS manifests on Apple clients (iOS and MacOS) #680);
  • the HLS manifest is only used if no HLS manifest can be found from the web and Android clients.
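The default rules in the list above boil down to two boolean decisions per stream. The following sketch expresses them in Java; the class ClientSelection and its method names are hypothetical stand-ins for the two static methods added to YoutubeStreamExtractor, whose real names are not shown here.

```java
/**
 * Hypothetical sketch of the default client-selection rules.
 * Names are illustrative, not the extractor's actual API.
 */
final class ClientSelection {
    // Set by extractor clients to fetch a player for every stream type
    private static boolean forceFetchIos = false;
    private static boolean forceFetchAndroid = false;

    private ClientSelection() {
    }

    static void setForceFetchIosClient(final boolean force) {
        forceFetchIos = force;
    }

    static void setForceFetchAndroidClient(final boolean force) {
        forceFetchAndroid = force;
    }

    /** iOS player: fetched by default only for livestreams, or when forced. */
    static boolean shouldFetchIosPlayer(final boolean isLivestream) {
        return isLivestream || forceFetchIos;
    }

    /** Android player: fetched by default for everything except livestreams, or when forced. */
    static boolean shouldFetchAndroidPlayer(final boolean isLivestream) {
        return !isLivestream || forceFetchAndroid;
    }
}
```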

As said in #732, fetching the iOS client allows getting an HLS manifest for regular videos and an HLS manifest with separate video and audio for livestreams. This requires a deviceModel field in the JSON payload that matches a recent Apple device model (see https://gist.github.com/adamawolf/3048717), to get 60fps streams, and an iOS user agent, to get a single HLS manifest with a regular streamingData JSON object instead of an hlsFormats object for livestreams.
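A minimal sketch of the client part of such an iOS request body, built by hand so that it has no JSON library dependency. The device model comes from the IOS_DEVICE_MODEL constant discussed later in this thread; the class name and the assumption that only these three fields matter are illustrative, and a real request probably needs more fields. The iOS user agent is sent as an HTTP header and is not shown.

```java
/**
 * Hypothetical builder for the "context" part of an iOS InnerTube request body.
 * Only the fields discussed in this PR are included.
 */
final class IosClientContext {
    // Device model from this PR's IOS_DEVICE_MODEL constant (a recent iPhone model)
    static final String IOS_DEVICE_MODEL = "iPhone14,5";

    private IosClientContext() {
    }

    /** Builds the context JSON; clientVersion is assumed to need no escaping. */
    static String build(final String clientVersion) {
        return "{\"context\":{\"client\":{"
                + "\"clientName\":\"IOS\","
                + "\"clientVersion\":\"" + clientVersion + "\","
                + "\"deviceModel\":\"" + IOS_DEVICE_MODEL + "\""
                + "}}}";
    }
}
```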


This PR fixes some bugs in the extraction of the client version and the key, and uses a much lighter way than the current one (which is still used as a fallback) to find the client version and key of YouTube and YouTube Music: the ones used by their service workers, obtained by fetching https://www.youtube.com/sw.js and https://music.youtube.com/sw.js respectively. A new method has been added to the Utils class to reduce code duplication and make my changes more readable.
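The sw.js approach amounts to running a few regular expressions over the fetched script and keeping the first match. The sketch below assumes the script embeds INNERTUBE_API_KEY and an INNERTUBE_*CLIENT_VERSION entry as JSON-like key/value pairs; those key names, the regexes and the class name are assumptions, and fetching the script itself is left to the downloader.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extraction of the client version and key from sw.js content. */
final class ServiceWorkerConfig {
    // Assumed patterns, tried in order
    private static final String[] CLIENT_VERSION_REGEXES = {
        "\"INNERTUBE_CONTEXT_CLIENT_VERSION\":\"([0-9\\.]+?)\"",
        "\"INNERTUBE_CLIENT_VERSION\":\"([0-9\\.]+?)\""
    };
    private static final String[] API_KEY_REGEXES = {
        "\"INNERTUBE_API_KEY\":\"([0-9a-zA-Z_-]+?)\""
    };

    private ServiceWorkerConfig() {
    }

    /** Tries each regex in order and returns the given group of the first match, or null. */
    static String matchFirst(final String[] regexes, final String content, final int group) {
        for (final String regex : regexes) {
            final Matcher matcher = Pattern.compile(regex).matcher(content);
            if (matcher.find()) {
                return matcher.group(group);
            }
        }
        return null; // caller falls back to the heavier extraction
    }

    static String extractClientVersion(final String swJs) {
        return matchFirst(CLIENT_VERSION_REGEXES, swJs, 1);
    }

    static String extractApiKey(final String swJs) {
        return matchFirst(API_KEY_REGEXES, swJs, 1);
    }
}
```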

Hardcoded client versions and mocks have also been updated to more recent versions.


Finally, this PR fixes the extraction of embeddable age-restricted contents (the ones that were available before) by using the newly discovered way to get their streams, as described in TeamNewPipe/NewPipe#8102.

NewPipe Debug APK to test the changes (source code): app-debug.zip


Fixes TeamNewPipe/NewPipe#8102, fixes TeamNewPipe/NewPipe#8103, closes #680 (for the extractor implementation).

@AudricV AudricV added bug Issue is related to a bug enhancement New feature or request youtube service, https://www.youtube.com/ labels Jan 15, 2022
@Stypox
Member

Stypox commented Jan 24, 2022

Note that this cpn parameter blocks (or at least makes harder) mocking the tests, since it is random.

You could just have something that seeds the random number generator used by that parameter before running the tests.

@Stypox
Member

Stypox commented Jan 27, 2022

In Utils.java add this code:

// at the top of the file
import java.security.SecureRandom;

// at the top of the class
private static final SecureRandom random = new SecureRandom();

// ...

// then add these methods
/**
 * Generates a random string using the secure random device {@link #random}.
 * {@link #setRandomSeed(long)} might be useful when mocking tests.
 * @param alphabet which characters to use
 * @param length the length of the returned string
 * @return a random string of the requested length made of only characters from the provided alphabet
 */
public static String randomStringFromAlphabet(final String alphabet, final int length) {
    final StringBuilder stringBuilder = new StringBuilder(length);
    for (int i = 0; i < length; ++i) {
        stringBuilder.append(alphabet.charAt(random.nextInt(alphabet.length())));
    }
    return stringBuilder.toString();
}

/**
 * Seeds the random device used for {@link #randomStringFromAlphabet(String, int)}. Use this in tests so that they can be mocked as the same random numbers are always generated. This is not intended to be used outside of tests.
 * @param seed the seed to pass to {@link SecureRandom#setSeed(long)}
 */
public static void setRandomSeed(final long seed) {
    random.setSeed(seed);
}

@AudricV AudricV force-pushed the yt-more-params-innertube-requests branch from 1f32c10 to a44467a on February 6, 2022 18:49
@AudricV AudricV requested a review from Stypox February 6, 2022 18:50
@AudricV
Member Author

AudricV commented Feb 6, 2022

Tests run fine with the mock downloader on my computer (I updated the mocks) but not in the CI. I applied what Stypox suggested, but it does not seem to really work. Does anyone have an idea why?

@XiangRongLin
Collaborator

@TiA4f8R My guess is that setSeed() does not have the same effect as passing the seed through the constructor. See https://docs.oracle.com/javase/8/docs/api/java/security/SecureRandom.html#setSeed-byte:A-

Reseeds this random object. The given seed supplements, rather than replaces, the existing seed. Thus, repeated calls are guaranteed never to reduce randomness.

You could try out locally whether it always returns the same values.

@AudricV AudricV force-pushed the yt-more-params-innertube-requests branch from c00f9e6 to 6938a4e on February 8, 2022 08:35
@Stypox
Member

Stypox commented Feb 16, 2022

Oh, I assumed setSeed just set the seed. Anyway, setNumberGenerator also looks good :-)
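A minimal sketch of the setNumberGenerator approach: instead of reseeding a SecureRandom (whose setSeed only supplements the existing seed, per the Javadoc quoted above), tests swap in a java.util.Random built from a fixed seed, which always yields the same sequence. The names numberGenerator, setNumberGenerator and randomStringFromAlphabet come from this thread; the enclosing class RandomStrings is hypothetical.

```java
import java.security.SecureRandom;
import java.util.Random;

/** Hypothetical holder for the random string helpers discussed in this thread. */
final class RandomStrings {
    // Production default: a cryptographically strong generator
    private static Random numberGenerator = new SecureRandom();

    private RandomStrings() {
    }

    /** Replaces the generator entirely; tests can pass new Random(seed). */
    static void setNumberGenerator(final Random random) {
        numberGenerator = random;
    }

    static String randomStringFromAlphabet(final String alphabet, final int length) {
        final StringBuilder stringBuilder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            stringBuilder.append(alphabet.charAt(numberGenerator.nextInt(alphabet.length())));
        }
        return stringBuilder.toString();
    }
}
```

Because a seeded java.util.Random is deterministic, recorded mocks keep matching the generated cpn values across runs.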

Member

@Stypox Stypox left a comment

Code looks good to me (except for the comments, but you probably have a reason for writing them like that, so don't change them; I just wanted to open some other discussions) :-)

@AudricV AudricV marked this pull request as draft March 8, 2022 18:42
@AudricV AudricV force-pushed the yt-more-params-innertube-requests branch 2 times, most recently from c472896 to f6d6d0e on March 15, 2022 19:37
@AudricV AudricV marked this pull request as ready for review March 15, 2022 20:17
@Stypox
Member

Stypox commented Mar 16, 2022

@litetex should we proceed with merging this? It is ready in my opinion, and related tests succeed.

Member

@litetex litetex left a comment

Minor changes

return JsonObject.builder()
.object("context")
.object("client")
.value("clientName", "IOS")
Member

Shouldn't this value be in a constant?

Member Author

Probably not right now, but we should refactor how clients are managed in the extractor later, to deduplicate the code.

@AudricV AudricV force-pushed the yt-more-params-innertube-requests branch from f6d6d0e to 5f12ac2 on March 27, 2022 18:43
…VersionAndKey method

The boolean keyAndVersionExtracted in YoutubeParsingHelper was not set to false when resetting the client version and the key, which made the extractor use null the next time the client version or the key was requested, if they had been extracted before.
Also update client versions.
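The bug above is a classic stale cache flag: the cached values are cleared but the "already extracted" flag is not, so the getters return null instead of extracting again. The sketch below reproduces the fixed behavior; apart from keyAndVersionExtracted, every name is hypothetical, and the extraction itself is a stand-in for fetching sw.js or the website.

```java
/** Hypothetical sketch of the client version/key cache and its reset. */
final class ClientVersionCache {
    private static String clientVersion;
    private static String key;
    private static boolean keyAndVersionExtracted = false;
    private static int extractionCount = 0; // for demonstration only

    private ClientVersionCache() {
    }

    static String getClientVersion() {
        extractIfNeeded();
        return clientVersion;
    }

    static String getKey() {
        extractIfNeeded();
        return key;
    }

    private static void extractIfNeeded() {
        if (!keyAndVersionExtracted) {
            // Stand-in for the real network extraction
            extractionCount++;
            clientVersion = "version-" + extractionCount;
            key = "key-" + extractionCount;
            keyAndVersionExtracted = true;
        }
    }

    static void resetClientVersionAndKey() {
        clientVersion = null;
        key = null;
        // The fix: without this line the getters would keep returning null
        keyAndVersionExtracted = false;
    }

    static int getExtractionCount() {
        return extractionCount;
    }
}
```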
…ter the Android client

The cpn param, aka the content playback nonce param, is a parameter sent by the YouTube web client in videoplayback requests and, for some of them, in the player request body. This PR adds it everywhere.

For the desktop/WEB client, some params were missing from the playbackContext object, which may (or may not) have made YouTube throttle streams extracted from the WEB client. This PR adds them.

Fingerprinting the WEB client based on the client version used is no longer possible, because the latest client version is now extracted on the first YouTube request of a session. This requires the extractor to fetch the website again (which may unfortunately bring back the reCAPTCHA issues, but there seems to be no other way to get it).

For the Android client, the video ID is now also sent as a query parameter, along with a 12-character string in the t query parameter, to better spoof this client. Research is still needed on this parameter, which is unique to each request, and on how clients generate it.

This commit also fixes a small bug with the Android User-Agent string.

Some code improvements have also been made.
…and key from YouTube and YouTube Music

This is done by fetching https://www.youtube.com/sw.js for YouTube and https://music.youtube.com/sw.js for YouTube Music.

Two new methods have been added to the Utils class: they try to match each regular expression of a String array, or each Pattern of a Pattern array, against some content, and return the group at a given index (or 0).
Some code in this class has also been refactored.
…bled the Android client for livestreams

By default, the iOS client is now only enabled for livestreams and the Android client only for videos.

Two new static methods in YoutubeStreamExtractor have been added to force (or not) the fetch of both clients.
…arameter with the false value

InnerTube returns pretty-printed responses, which needlessly increases their size.

By using the prettyPrint parameter in requests and setting its value to false, responses are no longer pretty-printed, which reduces their size, and thus data transfer and processing times.
YouTube has recently started doing this on its websites.
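Adding the parameter only requires choosing the right separator, depending on whether the request URL already has a query string (for instance a key parameter). The helper below is a hypothetical sketch that assumes URLs without a fragment.

```java
/** Hypothetical helper for InnerTube request URLs. */
final class InnertubeUrls {
    private InnertubeUrls() {
    }

    /** Appends prettyPrint=false, using '?' or '&' as appropriate (no '#' fragment assumed). */
    static String withPrettyPrintDisabled(final String url) {
        final char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + "prettyPrint=false";
    }
}
```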
…er agents

Also provide the ability to get the mobile user agents used for mobile InnerTube requests, and deduplicate related code.
…ns and key

Also move the iPhone device machine ID to a constant, explain how it is used, move the license into the file header, and fix missing imports in YoutubeStreamExtractor (due to a rebase issue).
Member

@litetex litetex left a comment

Not finished yet, but no time left for today.

Here is my current review:

Member

@FireMasterK FireMasterK left a comment

I had a quick peek :)

*/
private static final String IOS_DEVICE_MODEL = "iPhone14,5";

private static Random numberGenerator = new SecureRandom();
Member

Why not just use a regular Random instance? I don't think we're doing anything of cryptographic importance.

Member Author

If you remember, I said in several places that the JavaScript clients use window.crypto.getRandomValues, so I used SecureRandom to mimic the official clients as closely as possible.

From some basic reverse engineering, it seems (though I am not sure) that the Android app also uses it.

innertubeClientName,
innertubeClientVersion
};
youtubeMusicKey = new String[] {musicKey, musicClientName, musicClientVersion};
Member

Could we potentially introduce an object/class for this instead of a String[] array? (To remove the need to guess what each index is.)

Member Author

This should be done when we refactor the clients (along with building request bodies, see #780 (comment)).
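A sketch of the kind of value class the review suggests, replacing the {key, clientName, clientVersion} String[] triple so callers no longer rely on array indices. The class name InnertubeClientInfo and its accessors are hypothetical; the field order mirrors the youtubeMusicKey array in the snippet above.

```java
/** Hypothetical replacement for the String[] {key, clientName, clientVersion} triple. */
final class InnertubeClientInfo {
    private final String key;
    private final String clientName;
    private final String clientVersion;

    InnertubeClientInfo(final String key, final String clientName, final String clientVersion) {
        this.key = key;
        this.clientName = clientName;
        this.clientVersion = clientVersion;
    }

    String getKey() {
        return key;
    }

    String getClientName() {
        return clientName;
    }

    String getClientVersion() {
        return clientVersion;
    }
}
```

With it, the assignment would read `youtubeMusicKey = new InnertubeClientInfo(musicKey, musicClientName, musicClientVersion);` and later accesses would use named getters instead of indices.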

Also revert indentation in Utils.mixedNumberWordToLong.
…raction of contents with warnings and more

Use the TV embedded client technique to get streams of embeddable age-restricted videos.

This client doesn't provide the playerMicroFormatRenderer object in the player response, but the WEB player response still returns it, even for unavailable (but non-private) contents. As we are now replacing the WEB client's player response with the TV embedded one, we need to store this object separately.
Otherwise, some metadata, such as the unlisted property, the category, and the uploadDate and publishDate properties, could not be extracted.
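The storage step described above can be sketched as follows: the microformat renderer is captured from the WEB response before that response is replaced by the TV embedded one. Plain maps stand in for the extractor's JSON objects, and the class name, method names and the microformat/playerMicroformatRenderer path are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical sketch: keep the WEB microformat when swapping player responses. */
final class PlayerResponses {
    private Map<String, Object> playerResponse;
    private Object playerMicroformatRenderer;

    /** Stores the WEB response and captures its microformat renderer. */
    void setWebResponse(final Map<String, Object> webResponse) {
        playerResponse = webResponse;
        final Object microformat = webResponse.get("microformat");
        if (microformat instanceof Map) {
            playerMicroformatRenderer = ((Map<?, ?>) microformat).get("playerMicroformatRenderer");
        }
    }

    /** Uses the TV embedded response for streams; the stored microformat is left untouched. */
    void replaceWithTvEmbeddedResponse(final Map<String, Object> tvEmbeddedResponse) {
        playerResponse = tvEmbeddedResponse;
    }

    Map<String, Object> getPlayerResponse() {
        return playerResponse;
    }

    Object getPlayerMicroformatRenderer() {
        return playerMicroformatRenderer;
    }
}
```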

The outdated code for these contents has been removed.

Add the racyCheckOk and contentCheckOk parameters to player and next requests to the InnerTube API.
The first doesn't seem to make any difference when used anonymously, but the second one is needed to get streams of contents with a warning before they can be played.
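A minimal sketch of where these two flags would sit in a player request body, next to the video ID and the cpn. The top-level placement of the fields, the class name and the omission of the context object are assumptions for illustration; values are assumed to need no JSON escaping.

```java
/** Hypothetical builder for the non-context part of a player request body. */
final class PlayerRequestBody {
    private PlayerRequestBody() {
    }

    /** The "context" object is omitted here for brevity. */
    static String build(final String videoId, final String cpn) {
        return "{"
                + "\"videoId\":\"" + videoId + "\","
                + "\"cpn\":\"" + cpn + "\","
                + "\"contentCheckOk\":true,"   // needed for contents with a warning
                + "\"racyCheckOk\":true"       // age-restricted contents
                + "}";
    }
}
```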

Also apply some requested changes, fixes and improvements in YoutubeParsingHelper and YoutubeStreamExtractor.
…MixTest

YouTube no longer seems to return mixes if a PENDING consent cookie value is used.
As the mocks need to be updated, the test always fails because of this change.
@AudricV AudricV changed the title [YouTube] Add more parameters to InnerTube requests and videoplayback URLs and use the iOS client for livestreams [YouTube] Add more parameters to InnerTube requests, use the iOS client for livestreams and fix extraction of embeddable age-restricted videos and contents with a warning before playback Apr 4, 2022
@AudricV AudricV requested a review from litetex April 4, 2022 17:52
Member

@litetex litetex left a comment

Code LGTM now

  • Please create followup issues/PRs

APK seems to work fine :)

  • Age-restricted videos work again (for now)
  • You can rewind YT live streams now, nice 👍

Good work

PS: I think I have to rebase my invidious PR after this 😆

…ctorRelatedMixTest.testRelatedItems test disabled