-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple images support #889
Conversation
...java/org/schabi/newpipe/extractor/services/bandcamp/extractors/BandcampChannelExtractor.java
Outdated
Show resolved
Hide resolved
97504b5
to
b76ceae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiservice:
original images are not returned, for performance and size purposes
I don’t understand why. We provide all resolutions so that the client can choose, so why remove original? If it’s too large the client can decide not to use it.
PeerTube: resolution of thumbnails is not known. Should an issue be opened in the PeerTube repository to request the addition of it?
I think yes
Contrary to a previous attempt (#268), the real size is so returned instead of a quality level. In my opinion, it is not to the extractor to decide if images are high or small, but to clients, as each client has its own needs.
But how will the client decide if both width and height quality are unknown?
Taking example with PeerTube, previewThumbnail is high quality, thumbnailPath is low quality, we return both as unknown, how is the client supposed to know which one to choose?
While if we have LOW, MEDIUM, HIGH (maybe LOWEST, HIGHEST too), we can set previewPath to HIGH and thumbnailPath to low and the client will know even if there is both HEIGHT_UNKNOWN and WIDTH_UNKNOWN
IMO we should both return actual width/height and estimated quality by extractor.
So that client can choose with size if they want, but when it’s unknown they can use what extractor tells (or even always).
Maybe we should also sort by Lists by image quality, either from lowest to highest or the other way.
...org/schabi/newpipe/extractor/services/media_ccc/extractors/MediaCCCRecentKioskExtractor.java
Outdated
Show resolved
Hide resolved
...wpipe/extractor/services/media_ccc/extractors/infoItems/MediaCCCStreamInfoItemExtractor.java
Outdated
Show resolved
Hide resolved
.../java/org/schabi/newpipe/extractor/services/peertube/extractors/PeertubeStreamExtractor.java
Outdated
Show resolved
Hide resolved
...abi/newpipe/extractor/services/soundcloud/extractors/SoundcloudChannelInfoItemExtractor.java
Show resolved
Hide resolved
I asked about this in the IRC channel and here is the conversation I had with Stypox (Matrix link): Me:
Stypox:
Me:
Stypox:
|
b76ceae
to
8cac434
Compare
8cac434
to
ead2208
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Full review is going to come in the next days. The code I have read so far, looks good 👍
I agree with the comments made by @B0pol
Just a note: please annotate the new methods which are going to be accessible for library users. I did not highlight every usage, just a few.
extractor/src/main/java/org/schabi/newpipe/extractor/InfoItem.java
Outdated
Show resolved
Hide resolved
extractor/src/main/java/org/schabi/newpipe/extractor/comments/CommentsInfoItem.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code structure looks well done, thank you! I reviewed the code but I did not check whether the URLs you extract actually point somewhere, but I trust you and (in part) the tests for that. I also didn't check if there are more thumbnails that can be extracted, I also trust you here (though even if there were, it wouldn't be a big problem). I have just a few comments + B0pol&TobiGr's comments need to be solved, too.
.../java/org/schabi/newpipe/extractor/services/bandcamp/extractors/BandcampExtractorHelper.java
Outdated
Show resolved
Hide resolved
...tor/services/bandcamp/extractors/streaminfoitem/BandcampPlaylistStreamInfoItemExtractor.java
Show resolved
Hide resolved
} | ||
if (streamInfo.getAudioStreams() == null) { | ||
streamInfo.setAudioStreams(Collections.emptyList()); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you removing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am removing this because the default values are now set in the corresponding attributes.
is the developement still active on this one ? |
Yes, it is. |
477082a
to
1e3458d
Compare
This comment was marked as spam.
This comment was marked as spam.
What's stopping this from being merged? If for any reason there is some unwanted drawback to implementing this, any kind of response can help clear the water. |
…s and image URLs These new public and static methods, added in SoundcloudParsingHelper, getAllImagesFromArtworkOrAvatarUrl(String) and getAllImagesFromVisualUrl(String) (which call a common private method, getAllImagesFromImageUrlReturned(String, List<ImageSuffix>, List<Image>)), return an unmodifiable list of JPEG images containing almost every image resolution provided by SoundCloud except the original size and the tiny resolution (for artworks and avatars, as the image size is 20x20 for artworks and 18x18 for avatars, so very close to or equal to the t20x20 resolution): - for artworks and avatars: - mini: 16x16; - t20x20: 20x20; - small: 32x32; - badge: 47x47; - t50x50: 50x50; - t60x60: 60x60; - t67x67: 67x67; - large: 100x100; - t120x120: 120x120; - t200x200: 200x200; - t240x240: 240x240; - t250x250: 250x250; - t300x300: 300x300; - t500x500: 500x500. - for visuals/user banners: - t1240x260: 1240x260; - t2480x520: 2480x520. Duplicated code in two methods of SoundcloudParsingHelper (getUsersFromApi(ChannelInfoItemsCollector, String) and getStreamsFromApi(StreamInfoItemsCollector, String, boolean)) has been merged into one common private method, getNextPageUrlFromResponseObject(JsonObject).
…r avatars as uploader avatars in SoundcloudStreamInfoItemExtractor
… and channels Four new static methods have been added in PeertubeParsingHelper to do so: - two public methods to get the corresponding image type: getAvatarsFromOwnerAccountOrVideoChannelObject(String, JsonObject) and getBannersFromAccountOrVideoChannelObject(String, JsonObject); - two private methods as helper methods: getImagesFromAvatarsOrBanners(String, JsonObject, String, String) and getImagesFromAvatarOrBannerArray(String, JsonArray).
This method, getThumbnailsFromPlaylistOrVideoItem, has been added in PeertubeParsingHelper and returns the two image variants for playlists and videos.
Also lower the visibility of attributes of channels and playlists InfoItems to private.
…vatar picture The default avatar picture was used when no profile picture was found, but it was removed and split in multiple images. Thumbnails' size is not known, as this data is not provided by the API.
Bandcamp images work with image IDs, which provide different resolutions. Images on Bandcamp are not always squares, and some IDs respect aspect ratios where some others not. The extractor will only use the ones which preserve aspect ratio and will not provide original images, for performance and size purposes. Because of this aspect ratio preservation constraint, only one dimension will be known at a time. The image IDs with their respective dimension used are: - 10: 1200w; - 101: 90h; - 170: 422h; - 171: 646h; - 20: 1024w; - 200: 420h; - 201: 280h; - 202: 140h; - 204: 360h; - 205: 240h; - 206: 180h; - 207: 120h; - 43: 100h; - 44: 200h. (Where w represents the width of the image and h the height of the image) Note that these dimensions are theoretical because if the image size is less than the dimensions of the image ID, it will be not upscaled but kept to its original size. All these resolutions are stored in a private static list of ThumbnailSuffixes in BandcampExtractorHelper, in which the methods to get mutliple images have been added: - getImagesFromImageUrl(String): public method to get images from an image URL; - getImagesFromImageId(long, boolean): public method to get images from an image ID; - getImagesFromImageBaseUrl(String): private utility method to get images from the static list of ThumbnailSuffixes from a given image base URL, containing the path to the image, a "a" letter if it comes from an album, its ID and an underscore. Some existing methods have been also edited: - the documentation of getImageUrl(long, boolean) has been changed to reflect the Bandcamp images findings; - getThumbnailUrlFromSearchResult has been renamed to getImagesFromSearchResult, and a documentation has been added to this method. The method replaceHttpWithHttps of the Utils class has been also used in BandcampExtractorHelper instead of doing manually what the method does.
…os and streams These three new methods, added in MediaCCCParsingHelper, getImageListFromImageUrl(String), getThumbnailsFromStreamItem(JsonObject) and getThumbnailsFromLiveStreamItem(JsonObject) (the last two are based on a common method, getThumbnailsFromObject(JsonObject, String, String)), return an empty list if the case no image URL could be extracted. Images returned have their height and width unknown and a resolution level depending on the image key of the JSON API response.
Also remove usage of the conference logo as the banner of a conference, as it is a logo and not a banner.
Also suppress unused warnings in BaseStreamExtractorTest, like it is done on other BaseExtractorTests interfaces.
This new method, defaultTestImageList(List<Image), will check that the image list is not null. For each image, it will test that its URL is secure and its height and width are more than or equal to their relevant unknown constants in the Image class (HEIGHT_UNKNOWN and WIDTH_UNKNOWN).
… is empty and to test image collections Two new methods have been added in ExtractorAsserts to check if a collection is empty: - assertNotEmpty(String, Collection<?>), checking: - the non nullity of the collection; - its non emptiness (if that's not case, an exception will be thrown using the provided message). - assertNotEmpty(Collection<?>), calling assertNotEmpty(String, Collection<?>) with null as the value of the string argument. A new one has been added to this assertion class to check the contrary: assertEmpty(Collection<?>), checking emptiness of the collection only if it is not null. Three new methods have been added in ExtractorAsserts as utility test methods for image collections: - assertContainsImageUrlInImageCollection(String, Collection<Image>), checking that: - the provided URL and image collection are not null; - the image collection contains at least one image which has the provided string value as its URL (which is a string) property. - assertContainsOnlyEquivalentImages(Collection<Image>, Collection<Image>), checking that: - both collections are not null; - they have the same size; - each image of the first collection has its equivalent in the second one. This means that the properties of an image in the first collection must be equal in an image of the second one. - assertNotOnlyContainsEquivalentImages(Collection<Image>, Collection<Image>), checking that: - both collections are not null; - one of the following conditions is met: - they have different sizes; - an image of the first collection has not its equivalent in the second one. This means that the properties of an image in the first collection must be not equal in an image of the second one. These methods will be used by services extractors tests (and default ones) to test image collections.
…in DefaultStreamExtractorTest
This method, testImages(Collection<Image>), will use first the default image collection test in DefaultTests and then will check that each image URL contains the string yt. The JavaDoc of the class has been also updated to reflect the changes made in it (it is now more general).
Also remove some public test methods modifiers, add missing Test annotations on old Junit 4 tests (and update them if needed), and use final in some places where it was possible.
Also remove some public test methods modifiers, add missing Test annotations on old Junit 4 tests (and update them if needed), and improve some code.
This method, testImages(Collection<Image>), will use first the default image collection test in DefaultTests and then will check that each image URL contains f4.bcbits.com/img and ends with .jpg or .png. To do so, a new non-instantiable final class has been added: BandcampTestUtils.
Also remove some public test methods modifiers, add missing Test annotations on old Junit 4 tests (and update them if needed), and use final in some places where it was possible. BandcampChannelExtractorTest.testLength has been removed as the test is always true.
Also remove some public test methods modifiers.
1e6c8b7
to
e8bfd20
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
replaceHttpWithHttps(url), HEIGHT_UNKNOWN, WIDTH_UNKNOWN, | ||
ResolutionLevel.UNKNOWN)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The size of the banner seems to be stored in the img tag: <img src="https://f4.bcbits.com/img/0032808625_100.png" width="975" height="180">
. Also, as you can see the url contains an id followed by 100. Changing the 100 to other values yields other images. So I guess in a separate PR the bandcamp image support could be improved.
No, because it requires to do changes in it which I don't know how to do, at least in the best way.
No, because I don't know how to do the app support, at least properly, such as implementing options to select an image which fits the resolution of the UI element on which it will be shown (or a highest one if it is already in cache), selecting maximum or minimum resolutions.
This pull request introduces support of multiple images instead of a single image URL.
This will allow control over the image quality used, and also improve it in several places, especially on YouTube.
These changes of course will benefit to all extractor clients, and not only to NewPipe itself: Piped is a good example on which low quality of images can be easily seen.
Click here to hide or show how images of Creative Commons channel on Piped looks and how they would look if best resolutions are used instead
Concept implementation and code changes details
Data of images are handled by a specific new class,
Image
, containing the URL to an image, its height and width, if they are known; otherwise, the relevant constants are returned (HEIGHT_UNKNOWN
andWIDTH_UNKNOWN
).Contrary to a previous attempt (#268), the real size is so returned instead of a quality level. In my opinion, it is not to the extractor to decide if images are high or small, but to clients, as each client has its own needs.
Images are returned as unmodifiable lists, because you should be not able to modify data extracted by extractors in its objects, i.e. insert a new stream in a stream list, clients who want to do so should use copies of the lists or their elements (note: that's the case in some places and this should be fixed, but that's to be discussed in a separated issue).
The images methods (and attributes) of extractor classes have been updated to reflect the changes:
Click here to hide or show image methods (and attributes) changes of extractor classes
(getters return type:
String
; setters return type:void
)(getters return type:
List<Image>
; setters return type:void
)InfoItem
getThumbnailUrl()
-
setThumbnailUrl(String)
getThumbnails()
-
setThumbnails(List<Image>)
CommentsInfoItem
getUploaderAvatarUrl()
-
setUploaderAvatarUrl(String)
getUploaderAvatars()
-
setUploaderAvatars(List<Image>)
StreamInfoItem
getUploaderAvatarUrl()
-
setUploaderAvatarUrl(String)
getUploaderAvatars()
-
setUploaderAvatars(List<Image>)
InfoItemExtractor
getThumbnailUrl()
getThumbnails()
CommentsInfoItemExtractor
getUploaderAvatarUrl()
getUploaderAvatars()
StreamInfoItemExtractor
getUploaderAvatarUrl()
getUploaderAvatars()
ChannelExtractor
getAvatarUrl()
-
getBannerUrl()
-
getParentChannelAvatarUrl()
getAvatars()
-
getBanners()
-
getParentChannelAvatars()
PlaylistExtractor
getUploaderAvatarUrl()
-
getThumbnailUrl()
-
getBannerUrl()
-
getSubChannelAvatarUrl()
getUploaderAvatars()
-
getThumbnails()
-
getBanners()
-
getSubChannelAvatars()
StreamExtractor
getThumbnailUrl()
-
getUploaderAvatarUrl()
-
getSubChannelAvatarUrl()
getThumbnails()
-
getUploaderAvatars()
-
getSubChannelAvatars()
ChannelInfo
getParentChannelAvatarUrl()
-
setParentChannelAvatarUrl(String)
-
getAvatarUrl()
-
setAvatarUrl(String)
-
getBannerUrl()
-
setBannerUrl(String)
getParentChannelAvatars()
-
setParentChannelAvatars(List<Image)
-
getAvatars()
-
setAvatars(List<Image>)
-
getBanners()
-
setBanners(List<Image>)
PlaylistInfo
getThumbnailUrl()
-
setThumbnailUrl(String)
-
getBannerUrl()
-
setBannerUrl(String)
-
getUploaderAvatarUrl()
-
setUploaderAvatarUrl(String)
-
getSubChannelAvatarUrl()
-
setSubChannelAvatarUrl(String)
getThumbnails()
-
setThumbnails(List<Image>)
-
getBanners()
-
setBanners(List<Image)
-
getUploaderAvatars()
-
setUploaderAvatars(List<Image>)
-
getSubChannelAvatars()
-
setSubChannelAvatars(List<Image>)
StreamInfo
getThumbnailUrl()
-
setThumbnailUrl(String)
-
getUploaderAvatarUrl()
-
setUploaderAvatarUrl(String)
-
getSubChannelAvatarUrl()
-
setSubChannelAvatarUrl(String)
getThumbnails()
-
setThumbnails(List<Image>)
-
getUploaderAvatars()
-
setUploaderAvatars(List<Image>)
-
getSubChannelAvatars()
-
setSubChannelAvatars(List<Image>)
Services implementations
Several methods have been added, changed and/or removed, depending of the services. These changes can be found in the commits of this PR.
The implementation of multiple images support differs between each service.
YouTube
: images are got fromthumbnails
arrays returned by YouTube, except for mixes when they are extracted (and so when they do not come from a related item or a search result item), for which image and their resolutions are hardcoded:default.jpg
: 120x90px;mqdefault.jpg
: 320x180px;hqdefault.jpg
: 480x360px.SoundCloud
: resolutions are always hardcoded:Click here to hide or show image resolutions used on SoundCloud
mini
: 16x16px;t20x20
: 20x20px;small
: 32x32px;badge
: 47x47px;t50x50
: 50x50px;t60x60
: 60x60px;t67x67
: 67x67px;large
: 100x100px (the resolution provided by SoundCloud in its internal (and public?) API);t120x120
: 120x120px;t200x200
: 200x200px;t240x240
: 240x240px;t250x250
: 250x250px;t300x300
: 300x300px;t500x500
: 500x500px.original
):t1240x260
: 1240x260px;t2480x520
: 2480x520px.PeerTube
: resolution of images are only provided, and so known, for banners and avatars. The implementation uses firstavatars
andbanners
JSON arrays, as they contain multiple images;avatar
andbanner
objects are also used as a fallback if these arrays are not present or empty (so compatibility with old instances is kept);Bandcamp
: as images may be not squares, only image resolutions which preserve aspect ratios are selected (and hardcoded). As a matter of fact, only one dimension of an image is known per image ID:Click here to hide or show Bandcamp image IDs and their dimension known
10
: 1200px wide;101
: 90px high;170
: 422px high;171
: 646px high;20
: 1024px wide;200
: 420px high;201
: 280px high;202
: 140px high;204
: 360px high;205
: 240px high;206
: 180px high;207
: 120px high;43
: 100px high;44
: 200px high.This does not apply on banners, for which only the banner image URL extracted from the website is used, and so its dimensions are not known. See why this has been chosen in the
Known limitations/issues
section above.MediaCCC
: There is only one image for contents. A switch to the higher image resolution has been also made in this PR.To help with implementations and harcoded image resolutions, which are always suffixes to image URLs and/or paths, a class has been added to manage them:
ImageSuffix
. An object of this class stores an image suffix string to add and the height and width corresponding to the image itself.Known limitations/issues
Multi-service
:Frameset
s and onStreamSegment
s: changes required to implement multi-image support on these classes should be made in a separate PR.YouTube
:SoundCloud
: if default prefixes are changed by SoundCloud (the ones returned in the image URL provided by the internal API), image extraction would be broken;PeerTube
: resolution of thumbnails is not known. Should an issue be opened in the PeerTube repository to request the addition of it?Bandcamp
: among the image IDs used for avatars and covers, some of them does not seem to match the same dimension property known on banners, according to the following test made on the banner URL returned for the artist used inBandcampChannelExtractorTest
:Click here to hide or show results
10
: maximum size provided (975x180px);101
: height respected (90px);170
: height not respected and maximum size not provided (750x138px);171
: maximum size provided (975x180px);20
: maximum size provided (975x180px);200
: maximum size provided (975x180px);201
: height not respected and maximum size not provided (750x138px);202
: height not respected and maximum size not provided (375x69px);204
: height not respected and maximum size not provided (960x177px);205
: height not respected and maximum size not provided (640x118px);206
: height not respected and different resolution provided (480x89px);207
: height not respected and different resolution provided (320x59px);43
: height respected (542x100px);44
: maximum size provided (975x180px).This should be investigated later and not in this PR.
MediaCCC
: image sizes are not provided and so not known. Should an issue be opened in the MediaCCC repository to request the addition of it?Changes on tests
Changes were also required on the tests structure, and the ones made are the following ones:
Click here to hide or show changes on the tests structure
(return type:
void
)(return type:
void
)BaseChannelExtractorTest
testAvatarUrl()
-
testBannerUrl()
testAvatars()
-
testBanners()
BasePlaylistExtractorTest
testThumbnailUrl()
-
testBannerUrl()
-
testUploaderAvatarUrl()
testThumbnails()
-
testBanners()
-
testUploaderAvatars()
BaseStreamExtractorTest
testUploaderAvatarUrl()
-
testSubChannelAvatarUrl()
-
testThumbnailUrl()
testUploaderAvatars()
-
testSubChannelAvatars()
-
testThumbnails()
Image test methods which used URL in their name have been renamed to reflect the multiple images support changes.
A default test on image collections has been added and ensures that:
Each test is responsible to call this default method or use custom ones like it is made on YouTube (using the default test and asserting that each image URL contains the string
yt
) and Bandcamp (using the default test and asserting that each image URL contains the stringf4.bcbits.com/img
and ends with.jpg
or.png
).Other changes
A few other changes have been also made in this PR:
YoutubeMusicSearchExtractor
'sInfoItem
s have been moved to external classes, which do not depend of the YouTube ones anymore (it was useless to do so, as the objects of YouTube are not ordered in the same way/are not the same than on YouTube Music) in order to increase readability and perform code changes easier;public
modifiers have been removed from edited test classes;@Test
annotations have been added on old Junit 4 tests on modified test classes and the corresponding tests have been fixed if needed;Closes #649, fixes #763.