jfr.im git - yt-dlp.git/commitdiff
pull changes from remote master (#190)
authorAakash Gajjar <redacted>
Tue, 25 Aug 2020 14:53:34 +0000 (20:23 +0530)
committerGitHub <redacted>
Tue, 25 Aug 2020 14:53:34 +0000 (20:23 +0530)
* [scrippsnetworks] Add new extractor(closes #19857)(closes #22981)

* [teachable] Improve locked lessons detection (#23528)

* [teachable] Fail with error message if no video URL found

* [extractors] add missing import for ScrippsNetworksIE

* [brightcove] cache brightcove player policy keys

* [prosiebensat1] improve geo restriction handling(closes #23571)

* [soundcloud] automatically update client id on failing requests

* [spankbang] Fix extraction (closes #23307, closes #23423, closes #23444)

* [spankbang] Improve removed video detection (#23423)

* [brightcove] update policy key on failing requests

* [pornhub] Fix extraction and add support for m3u8 formats (closes #22749, closes #23082)

* [pornhub] Improve locked videos detection (closes #22449, closes #22780)

* [brightcove] invalidate policy key cache on failing requests
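
The brightcove and soundcloud entries above share one pattern: cache a credential (a policy key or client id) and, when a request fails with an auth error, invalidate the cache, fetch a fresh credential, and retry once. A minimal sketch of that pattern (the names `AuthError`, `CachedKeyClient`, `fetch_key` and `request` are illustrative, not youtube-dl's actual API):

```python
class AuthError(Exception):
    pass


class CachedKeyClient:
    """Cache a token; on an auth failure, refresh it once and retry."""

    def __init__(self, fetch_key, request):
        self._fetch_key = fetch_key  # callable returning a fresh key
        self._request = request      # callable(url, key) -> response, raises AuthError
        self._key = None

    def get(self, url):
        if self._key is None:
            self._key = self._fetch_key()
        try:
            return self._request(url, self._key)
        except AuthError:
            # Invalidate the cached key and retry exactly once with a fresh one
            self._key = self._fetch_key()
            return self._request(url, self._key)
```

A cold cache triggers one fetch; a key rejected mid-session triggers exactly one refresh-and-retry, so a genuinely broken endpoint still surfaces its error instead of looping.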

* [soundcloud] fix client id extraction for non fatal requests

* [ChangeLog] Actualize
[ci skip]

* [devscripts/create-github-release] Switch to using PAT for authentication

Basic authentication will be deprecated soon

* release 2020.01.01

* [redtube] Detect private videos (#23518)

* [vice] improve extraction(closes #23631)

* [devscripts/create-github-release] Remove unused import

* [wistia] improve format extraction and extract subtitles(closes #22590)

* [nrktv:seriebase] Fix extraction (closes #23625) (#23537)

* [discovery] fix anonymous token extraction(closes #23650)

* [scrippsnetworks] add support for www.discovery.com videos

* [scrippsnetworks] correct test case URL

* [dctp] fix format extraction(closes #23656)

* [pandatv] Remove extractor (#23630)

* [naver] improve extraction

- improve geo-restriction handling
- extract automatic captions
- extract uploader metadata
- extract VLive HLS formats

* [naver] improve metadata extraction

* [cloudflarestream] improve extraction

- add support for bytehighway.net domain
- add support for signed URLs
- extract thumbnail

* [cloudflarestream] improve embed URL extraction

* [lego] fix extraction and extract subtitle(closes #23687)

* [safari] Fix kaltura session extraction (closes #23679) (#23670)

* [orf:fm4] Fix extraction (#23599)

* [orf:radio] Clean description and improve extraction

* [twitter] add support for promo_video_website cards(closes #23711)

* [vodplatform] add support for embed.kwikmotion.com domain

* [ndr:base:embed] Improve thumbnails extraction (closes #23731)

* [canvas] Add support for new API endpoint and update tests (closes #17680, closes #18629)

* [travis] Add flake8 job (#23720)

* [yourporn] Fix extraction (closes #21645, closes #22255, closes #23459)

* [ChangeLog] Actualize
[ci skip]

* release 2020.01.15

* [soundcloud] Restore previews extraction (closes #23739)

* [orf:tvthek] Improve geo restricted videos detection (closes #23741)

* [zype] improve extraction

- extract subtitles(closes #21258)
- support URLs with alternative keys/tokens(#21258)
- extract more metadata

* [americastestkitchen] fix extraction

* [nbc] add support for nbc multi network URLs(closes #23049)

* [ard] improve extraction(closes #23761)

- simplify extraction
- extract age limit and series
- bypass geo-restriction

* [ivi:compilation] Fix entries extraction (closes #23770)

* [24video] Add support for 24video.vip (closes #23753)

* [businessinsider] Fix jwplatform id extraction (closes #22929) (#22954)

* [ard] add a missing condition

* [azmedien] fix extraction(closes #23783)

* [voicerepublic] fix extraction

* [stretchinternet] fix extraction(closes #4319)

* [youtube] Fix sigfunc name extraction (closes #23819)

* [ChangeLog] Actualize
[ci skip]

* release 2020.01.24

* [soundcloud] improve private playlist/set tracks extraction

https://github.com/ytdl-org/youtube-dl/issues/3707#issuecomment-577873539

* [svt] fix article extraction(closes #22897)(closes #22919)

* [svt] fix series extraction(closes #22297)

* [viewlift] improve extraction

- fix extraction(closes #23851)
- add support for authentication
- add support for more domains

* [vimeo] fix album extraction(closes #23864)

* [tva] Relax _VALID_URL (closes #23903)

* [tv5mondeplus] Fix extraction (closes #23907, closes #23911)

* [twitch:stream] Lowercase channel id for stream request (closes #23917)

* [sportdeutschland] Update to new sportdeutschland API

They switched to SSL, but under a different host AND path...
Remove the old test cases because these videos have become unavailable.

* [popcorntimes] Add extractor (closes #23949)

* [thisoldhouse] fix extraction(closes #23951)

* [toggle] Add support for mewatch.sg (closes #23895) (#23930)

* [compat] Introduce compat_realpath (refs #23991)

* [update] Fix updating via symlinks (closes #23991)
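
The symlink fix above comes down to resolving the executable's real path before overwriting it, so the update replaces the target file rather than clobbering the symlink. A hedged sketch (the actual updater does considerably more; `resolve_update_target` is an illustrative name):

```python
import os


def resolve_update_target(exe_path):
    # When youtube-dl is invoked via a symlink, os.path.realpath
    # follows the link so the new binary is written to the real file.
    return os.path.realpath(exe_path)
```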

* [nytimes] improve format sorting(closes #24010)

* [abc:iview] Support 720p (#22907) (#22921)

* [nova:embed] Fix extraction (closes #23672)

* [nova:embed] Improve (closes #23690)

* [nova] Improve extraction (refs #23690)

* [jpopsuki] Remove extractor (closes #23858)

* [YoutubeDL] Fix playlist entry indexing with --playlist-items (closes #10591, closes #10622)

* [test_YoutubeDL] Fix get_ids

* [test_YoutubeDL] Add tests for #10591 (closes #23873)

* [24video] Add support for porn.24video.net (closes #23779, closes #23784)

* [npr] Add support for streams (closes #24042)

* [ChangeLog] Actualize
[ci skip]

* release 2020.02.16

* [tv2dk:bornholm:play] Fix extraction (#24076)

* [imdb] Fix extraction (closes #23443)

* [wistia] Add support for multiple generic embeds (closes #8347, closes #11385)

* [teachable] Add support for multiple videos per lecture (closes #24101)

* [pornhd] Fix extraction (closes #24128)

* [options] Remove duplicate short option -v for --version (#24162)

* [extractor/common] Convert ISM manifest to unicode before processing on python 2 (#24152)

* [YoutubeDL] Force redirect URL to unicode on python 2

* Remove no longer needed compat_str around geturl

* [youjizz] Fix extraction (closes #24181)

* [test_subtitles] Remove obsolete test

* [zdf:channel] Fix tests

* [zapiks] Fix test

* [xtube] Fix metadata extraction (closes #21073, closes #22455)

* [xtube:user] Fix test

* [telecinco] Fix extraction (refs #24195)

* [telecinco] Add support for article opening videos

* [franceculture] Fix extraction (closes #24204)

* [xhamster] Fix extraction (closes #24205)

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.01

* [vimeo] Fix subtitles URLs (#24209)

* [servus] Add support for new URL schema (closes #23475, closes #23583, closes #24142)

* [youtube:playlist] Fix tests (closes #23872) (#23885)

* [peertube] Improve extraction

* [peertube] Fix issues and improve extraction (closes #23657)

* [pornhub] Improve title extraction (closes #24184)

* [vimeo] fix showcase password protected video extraction(closes #24224)

* [youtube] Fix age-gated videos support without login (closes #24248)

* [youtube] Fix tests

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.06

* [nhk] update API version(closes #24270)

* [youtube] Improve extraction in 429 error conditions (closes #24283)

* [youtube] Improve age-gated videos extraction in 429 error conditions (refs #24283)

* [youtube] Remove outdated code

Additional get_video_info requests don't seem to provide any extra itags any longer

* [README.md] Clarify 429 error

* [pornhub] Add support for pornhubpremium.com (#24288)

* [utils] Add support for cookies with spaces used instead of tabs

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.08

* Revert "[utils] Add support for cookies with spaces used instead of tabs"

According to [1] TABs must be used as separators between fields.
Files produced by some tools with spaces as separators are considered
malformed.

1. https://curl.haxx.se/docs/http-cookies.html

This reverts commit cff99c91d150df2a4e21962a3ca8d4ae94533b8c.
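
The curl documentation cited above specifies that fields in a Netscape-format cookie file are TAB-separated; a line whose fields are separated by spaces is malformed. A minimal validity check along those lines (illustrative only, not youtube-dl's actual implementation):

```python
def is_valid_cookie_line(line):
    """Accept comments and blank lines; otherwise require the seven
    TAB-separated Netscape cookie fields (domain, include-subdomains
    flag, path, secure flag, expiry, name, value)."""
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return True
    return len(line.split('\t')) == 7
```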

* [utils] Add reference to cookie file format

* Revert "[vimeo] fix showcase password protected video extraction(closes #24224)"

This reverts commit 12ee431676bb655f04c7dd416a73c1f142ed368d.

* [nhk] Relax _VALID_URL (#24329)

* [nhk] Remove obsolete rtmp formats (closes #24329)

* [nhk] Update m3u8 URL and use native hls (#24329)

* [ndr] Fix extraction (closes #24326)

* [xtube] Fix formats extraction (closes #24348)

* [xtube] Fix typo

* [hellporno] Fix extraction (closes #24399)

* [cbc:watch] Add support for authentication

* [cbc:watch] Fix authenticated device token caching (closes #19160)

* [soundcloud] fix download url extraction(closes #24394)

* [limelight] remove disabled API requests(closes #24255)

* [bilibili] Add support for new URL schema with BV ids (closes #24439, closes #24442)

* [bilibili] Add support for player.bilibili.com (closes #24402)

* [teachable] Extract chapter metadata (closes #24421)

* [generic] Look for teachable embeds before wistia

* [teachable] Update upskillcourses domain

New version does not use teachable platform any longer

* [teachable] Update gns3 domain

* [teachable] Update test

* [ChangeLog] Actualize
[ci skip]

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.24

* [spankwire] Fix extraction (closes #18924, closes #20648)

* [spankwire] Add support for generic embeds (refs #24633)

* [youporn] Add support for generic embeds

* [mofosex] Add support for generic embeds (closes #24633)

* [tele5] Fix extraction (closes #24553)

* [extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats (#24667)

* [tv4] Fix ISM formats extraction (closes #24667)

* [twitch:clips] Extend _VALID_URL (closes #24290) (#24642)

* [motherless] Fix extraction (closes #24699)

* [nova:embed] Fix extraction (closes #24700)

* [youtube] Skip broken multifeed videos (closes #24711)

* [soundcloud] Extract AAC format

* [soundcloud] Improve AAC format extraction (closes #19173, closes #24708)

* [thisoldhouse] Fix video id extraction (closes #24548)

Added support for:
with or without "www."
and either ".chorus.build" or ".com"

It now validates correctly on older URLs
```
<iframe src="https://thisoldhouse.chorus.build/videos/zype/5e33baec27d2e50001d5f52f
```
and newer ones
```
<iframe src="https://www.thisoldhouse.com/videos/zype/5e2b70e95216cc0001615120
```
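
A single pattern covering both iframe variants above could look like the following sketch (the 24-hex-digit id and the exact pattern are assumptions; the extractor's real _VALID_URL differs):

```python
import re

# Optional "www." plus either the .com or the .chorus.build host
VIDEO_URL_RE = re.compile(
    r'https?://(?:www\.)?thisoldhouse\.(?:com|chorus\.build)'
    r'/videos/zype/([0-9a-f]{24})')


def extract_video_id(url):
    m = VIDEO_URL_RE.match(url)
    return m.group(1) if m else None
```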

* [thisoldhouse] Improve video id extraction (closes #24549)

* [youtube] Fix DRM videos detection (refs #24736)

* [options] Clarify doc on --exec command (closes #19087) (#24883)

* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)

* [prosiebensat1] Extract series metadata

* [tenplay] Relax _VALID_URL (closes #25001)

* [tvplay] fix Viafree extraction(closes #15189)(closes #24473)(closes #24789)

* [yahoo] fix GYAO Player extraction and relax title URL regex(closes #24178)(closes #24778)

* [youtube] Use redirected video id if any (closes #25063)

* [youtube] Improve player id extraction and add tests

* [extractor/common] Extract multiple JSON-LD entries

* [crunchyroll] Fix and improve extraction (closes #25096, closes #25060)

* [ChangeLog] Actualize
[ci skip]

* release 2020.05.03

* [puhutv] Remove no longer available HTTP formats (closes #25124)

* [utils] Improve cookie files support

+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry len, invalid expires at)

* [dailymotion] Fix typo

* [compat] Introduce compat_cookiejar_Cookie

* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie (closes #23256, closes #24776)

To always ensure cookie name and value are bytestrings on python 2.

* [orf] Add support for more radio stations (closes #24938) (#24968)

* [uol] fix extraction(closes #22007)

* [downloader/http] Finish downloading once received data length matches expected

Always do this if possible, i.e. if Content-Length or expected length is known, not only in test.
This saves an unnecessary final loop iteration that tries to read 0 bytes.

* [downloader/http] Request last data block of exact remaining size

Always request the last data block with the exact size remaining to download if possible, not the current block size.
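
Taken together, the two downloader changes amount to: clamp each read to the bytes still remaining, and let the loop end as soon as the received count matches the expected length. A simplified sketch with assumed names (the real downloader also handles resuming, retries and rate limiting):

```python
def download(read_chunk, expected_len, block_size=1024):
    """read_chunk(n) returns up to n bytes (b'' on EOF)."""
    received = b''
    while len(received) < expected_len:
        # Request the last block with the exact remaining size,
        # not the full block size.
        want = min(block_size, expected_len - len(received))
        chunk = read_chunk(want)
        if not chunk:
            break  # connection closed early
        received += chunk
    # The loop condition finishes the download once the received
    # length matches the expected length, skipping a 0-byte read.
    return received
```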

* [iprima] Improve extraction (closes #25138)

* [youtube] Improve signature cipher extraction (closes #25188)

* [ChangeLog] Actualize
[ci skip]

* release 2020.05.08

* [spike] fix Bellator mgid extraction(closes #25195)

* [bbccouk] PEP8

* [mailru] Fix extraction (closes #24530) (#25239)

* [README.md] flake8 HTTPS URL (#25230)

* [youtube] Add support for yewtu.be (#25226)

* [soundcloud] reduce API playlist page limit(closes #25274)

* [vimeo] improve format extraction and sorting(closes #25285)

* [redtube] Improve title extraction (#25208)

* [indavideo] Switch to HTTPS for API request (#25191)

* [utils] Fix file permissions in write_json_file (closes #12471) (#25122)

* [redtube] Improve formats extraction and extract m3u8 formats (closes #25311, closes #25321)

* [ard] Improve _VALID_URL (closes #25134) (#25198)

* [giantbomb] Extend _VALID_URL (#25222)

* [postprocessor/ffmpeg] Embed series metadata with --add-metadata

* [youtube] Add support for more invidious instances (#25417)

* [ard:beta] Extend _VALID_URL (closes #25405)

* [ChangeLog] Actualize
[ci skip]

* release 2020.05.29

* [jwplatform] Improve embeds extraction (closes #25467)

* [periscope] Fix untitled broadcasts (#25482)

* [twitter:broadcast] Add untitled periscope broadcast test

* [malltv] Add support for sk.mall.tv (#25445)

* [brightcove] Fix subtitles extraction (closes #25540)

* [brightcove] Sort imports

* [twitch] Pass v5 accept header and fix thumbnails extraction (closes #25531)

* [twitch:stream] Fix extraction (closes #25528)

* [twitch:stream] Expect 400 and 410 HTTP errors from API

* [tele5] Prefer jwplatform over nexx (closes #25533)

* [jwplatform] Add support for bypass geo restriction

* [tele5] Bypass geo restriction

* [ChangeLog] Actualize
[ci skip]

* release 2020.06.06

* [kaltura] Add support for multiple embeds on a webpage (closes #25523)

* [youtube] Extract chapters from JSON (closes #24819)

* [facebook] Support single-video ID links

I stumbled upon this at https://www.facebook.com/bwfbadminton/posts/10157127020046316 . No idea how prevalent it is yet.

* [youtube] Fix playlist and feed extraction (closes #25675)

* [youtube] Fix thumbnails extraction and remove uploader id extraction warning (closes #25676)

* [youtube] Fix upload date extraction

* [youtube] Improve view count extraction

* [youtube] Fix uploader id and uploader URL extraction

* [ChangeLog] Actualize
[ci skip]

* release 2020.06.16

* [youtube] Fix categories and improve tags extraction

* [youtube] Force old layout (closes #25682, closes #25683, closes #25680, closes #25686)

* [ChangeLog] Actualize
[ci skip]

* release 2020.06.16.1

* [brightcove] Improve embed detection (closes #25674)

* [bellmedia] add support for cp24.com clip URLs(closes #25764)

* [youtube:playlists] Extend _VALID_URL (closes #25810)

* [youtube] Prevent excess HTTP 301 (#25786)

* [wistia] Restrict embed regex (closes #25969)

* [youtube] Improve description extraction (closes #25937) (#25980)

* [youtube] Fix sigfunc name extraction (closes #26134, closes #26135, closes #26136, closes #26137)

* [ChangeLog] Actualize
[ci skip]

* release 2020.07.28

* [xhamster] Extend _VALID_URL (closes #25789) (#25804)

* [xhamster] Fix extraction (closes #26157) (#26254)

* [xhamster] Extend _VALID_URL (closes #25927)

Co-authored-by: Remita Amine <redacted>
Co-authored-by: Sergey M․ <redacted>
Co-authored-by: nmeum <redacted>
Co-authored-by: Roxedus <redacted>
Co-authored-by: Singwai Chan <redacted>
Co-authored-by: cdarlint <redacted>
Co-authored-by: Johannes N <redacted>
Co-authored-by: jnozsc <redacted>
Co-authored-by: Moritz Patelscheck <redacted>
Co-authored-by: PB <redacted>
Co-authored-by: Philipp Hagemeister <redacted>
Co-authored-by: Xaver Hellauer <redacted>
Co-authored-by: d2au <redacted>
Co-authored-by: Jan 'Yenda' Trmal <redacted>
Co-authored-by: jxu <redacted>
Co-authored-by: Martin Ström <redacted>
Co-authored-by: The Hatsune Daishi <redacted>
Co-authored-by: tsia <redacted>
Co-authored-by: 3risian <redacted>
Co-authored-by: Tristan Waddington <redacted>
Co-authored-by: Devon Meunier <redacted>
Co-authored-by: Felix Stupp <redacted>
Co-authored-by: tom <redacted>
Co-authored-by: AndrewMBL <redacted>
Co-authored-by: willbeaufoy <redacted>
Co-authored-by: Philipp Stehle <redacted>
Co-authored-by: hh0rva1h <redacted>
Co-authored-by: comsomisha <redacted>
Co-authored-by: TotalCaesar659 <redacted>
Co-authored-by: Juan Francisco Cantero Hurtado <redacted>
Co-authored-by: Dave Loyall <redacted>
Co-authored-by: tlsssl <redacted>
Co-authored-by: Rob <redacted>
Co-authored-by: Michael Klein <redacted>
Co-authored-by: JordanWeatherby <redacted>
Co-authored-by: striker.sh <redacted>
Co-authored-by: Matej Dujava <redacted>
Co-authored-by: Glenn Slayden <redacted>
Co-authored-by: MRWITEK <redacted>
Co-authored-by: JChris246 <redacted>
Co-authored-by: TheRealDude2 <redacted>
134 files changed:
.github/ISSUE_TEMPLATE/1_broken_site.md
.github/ISSUE_TEMPLATE/2_site_support_request.md
.github/ISSUE_TEMPLATE/3_site_feature_request.md
.github/ISSUE_TEMPLATE/4_bug_report.md
.github/ISSUE_TEMPLATE/5_feature_request.md
.travis.yml
CONTRIBUTING.md
ChangeLog
README.md
devscripts/create-github-release.py
docs/supportedsites.md
test/test_YoutubeDL.py
test/test_YoutubeDLCookieJar.py
test/test_subtitles.py
test/test_youtube_chapters.py
test/test_youtube_signature.py
test/testdata/cookies/malformed_cookies.txt [new file with mode: 0644]
youtube_dl/YoutubeDL.py
youtube_dl/compat.py
youtube_dl/downloader/http.py
youtube_dl/extractor/abc.py
youtube_dl/extractor/americastestkitchen.py
youtube_dl/extractor/ard.py
youtube_dl/extractor/azmedien.py
youtube_dl/extractor/bbc.py
youtube_dl/extractor/bellmedia.py
youtube_dl/extractor/bilibili.py
youtube_dl/extractor/brightcove.py
youtube_dl/extractor/businessinsider.py
youtube_dl/extractor/canvas.py
youtube_dl/extractor/cbc.py
youtube_dl/extractor/cloudflarestream.py
youtube_dl/extractor/common.py
youtube_dl/extractor/crunchyroll.py
youtube_dl/extractor/dailymotion.py
youtube_dl/extractor/dctp.py
youtube_dl/extractor/discovery.py
youtube_dl/extractor/eporner.py
youtube_dl/extractor/extractors.py
youtube_dl/extractor/facebook.py
youtube_dl/extractor/franceculture.py
youtube_dl/extractor/generic.py
youtube_dl/extractor/giantbomb.py
youtube_dl/extractor/hellporno.py
youtube_dl/extractor/imdb.py
youtube_dl/extractor/indavideo.py
youtube_dl/extractor/iprima.py
youtube_dl/extractor/ivi.py
youtube_dl/extractor/jpopsukitv.py [deleted file]
youtube_dl/extractor/jwplatform.py
youtube_dl/extractor/kaltura.py
youtube_dl/extractor/lecturio.py
youtube_dl/extractor/lego.py
youtube_dl/extractor/limelight.py
youtube_dl/extractor/linuxacademy.py
youtube_dl/extractor/mailru.py
youtube_dl/extractor/malltv.py
youtube_dl/extractor/mediaset.py
youtube_dl/extractor/mediasite.py
youtube_dl/extractor/mitele.py
youtube_dl/extractor/mofosex.py
youtube_dl/extractor/motherless.py
youtube_dl/extractor/naver.py
youtube_dl/extractor/nbc.py
youtube_dl/extractor/ndr.py
youtube_dl/extractor/nhk.py
youtube_dl/extractor/nova.py
youtube_dl/extractor/npr.py
youtube_dl/extractor/nrk.py
youtube_dl/extractor/nytimes.py
youtube_dl/extractor/orf.py
youtube_dl/extractor/pandatv.py [deleted file]
youtube_dl/extractor/peertube.py
youtube_dl/extractor/periscope.py
youtube_dl/extractor/platzi.py
youtube_dl/extractor/pokemon.py
youtube_dl/extractor/popcorntimes.py [new file with mode: 0644]
youtube_dl/extractor/pornhd.py
youtube_dl/extractor/pornhub.py
youtube_dl/extractor/prosiebensat1.py
youtube_dl/extractor/puhutv.py
youtube_dl/extractor/redtube.py
youtube_dl/extractor/safari.py
youtube_dl/extractor/scrippsnetworks.py
youtube_dl/extractor/servus.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/spankbang.py
youtube_dl/extractor/spankwire.py
youtube_dl/extractor/spike.py
youtube_dl/extractor/sportdeutschland.py
youtube_dl/extractor/srmediathek.py
youtube_dl/extractor/stretchinternet.py
youtube_dl/extractor/svt.py
youtube_dl/extractor/teachable.py
youtube_dl/extractor/tele5.py
youtube_dl/extractor/telecinco.py
youtube_dl/extractor/telequebec.py
youtube_dl/extractor/tenplay.py
youtube_dl/extractor/tfo.py
youtube_dl/extractor/thisoldhouse.py
youtube_dl/extractor/toggle.py
youtube_dl/extractor/trunews.py
youtube_dl/extractor/tumblr.py
youtube_dl/extractor/tv2dk.py
youtube_dl/extractor/tv4.py
youtube_dl/extractor/tv5mondeplus.py
youtube_dl/extractor/tva.py
youtube_dl/extractor/tvplay.py
youtube_dl/extractor/twentyfourvideo.py
youtube_dl/extractor/twitch.py
youtube_dl/extractor/twitter.py
youtube_dl/extractor/uol.py
youtube_dl/extractor/vice.py
youtube_dl/extractor/viewlift.py
youtube_dl/extractor/vimeo.py
youtube_dl/extractor/vlive.py
youtube_dl/extractor/vodplatform.py
youtube_dl/extractor/voicerepublic.py
youtube_dl/extractor/wistia.py
youtube_dl/extractor/xhamster.py
youtube_dl/extractor/xtube.py
youtube_dl/extractor/yahoo.py
youtube_dl/extractor/youjizz.py
youtube_dl/extractor/youporn.py
youtube_dl/extractor/yourporn.py
youtube_dl/extractor/youtube.py
youtube_dl/extractor/zapiks.py
youtube_dl/extractor/zdf.py
youtube_dl/extractor/zype.py
youtube_dl/options.py
youtube_dl/postprocessor/ffmpeg.py
youtube_dl/update.py
youtube_dl/utils.py
youtube_dl/version.py

index 3a94bd621ff4c68eed4af3b1d7eb3764d6e85c35..f2260db465ea4798b90e08a2a22759dddcc8ee9c 100644 (file)
@@ -18,7 +18,7 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ ## Checklist
 -->
 
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ ## Verbose log
  [debug] User config: []
  [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
  [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2019.11.28
+ [debug] youtube-dl version 2020.07.28
  [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
  [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
  [debug] Proxy map: {}
index 72bee12aa2cde7b5c434dfaab92175259beebec8..8bc05c4ba736a8439c70c7f643aac79ba9366419 100644 (file)
@@ -19,7 +19,7 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ ## Checklist
 -->
 
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones
index ddf67e95183c09de5b25c8ee826d760118a2c2a0..98348e0cd69deacf4a35d012f468d4b705bddee4 100644 (file)
@@ -18,13 +18,13 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones
 
 
index 7122e2714dd92fe4c9655c9022c0c76c123ce4d4..86706f5289dad4f48409616ee3e6521c5b5a0e29 100644 (file)
@@ -18,7 +18,7 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ ## Checklist
 -->
 
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ ## Verbose log
  [debug] User config: []
  [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
  [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2019.11.28
+ [debug] youtube-dl version 2020.07.28
  [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
  [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
  [debug] Proxy map: {}
index a93882b39dace5577bb9bb58effe1086319e625c..52c2709f94346e08a8f304cb6e64e7b05ae13b1d 100644 (file)
@@ -19,13 +19,13 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
 
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones
 
 
index 14d95fa84c105e3c310b2e0ecf984a7688bfb172..51afd469afe569df116d0dd5c200c426f36546b6 100644 (file)
@@ -13,7 +13,7 @@ dist: trusty
 env:
   - YTDL_TEST_SET=core
   - YTDL_TEST_SET=download
-matrix:
+jobs:
   include:
     - python: 3.7
       dist: xenial
@@ -35,6 +35,11 @@ matrix:
       env: YTDL_TEST_SET=download
     - env: JYTHON=true; YTDL_TEST_SET=core
     - env: JYTHON=true; YTDL_TEST_SET=download
+    - name: flake8
+      python: 3.8
+      dist: xenial
+      install: pip install flake8
+      script: flake8 .
   fast_finish: true
   allow_failures:
     - env: YTDL_TEST_SET=download
index ac759ddc4ee356adc2eb5081d8bdd4325a9d14ff..58ab3a4b8947d5dbadf5f8be1e4bb0868868afec 100644 (file)
@@ -153,7 +153,7 @@ ### Adding support for a new site
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
 
         $ flake8 youtube_dl/extractor/yourextractor.py
 
 
index d2f17ee067c9215cdea2f6d1df5c30cee8a4b5ed..bf515f784b2cfefdcd29820c5a5e22e8057cfa5e 100644 (file)
--- a/ChangeLog
+++ b/ChangeLog
-version <unreleased>
+version 2020.07.28
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
+* [youtube] Improve description extraction (#25937, #25980)
+* [wistia] Restrict embed regular expression (#25969)
+* [youtube] Prevent excess HTTP 301 (#25786)
++ [youtube:playlists] Extend URL regular expression (#25810)
++ [bellmedia] Add support for cp24.com clip URLs (#25764)
+* [brightcove] Improve embed detection (#25674)
+
+
+version 2020.06.16.1
+
+Extractors
+* [youtube] Force old layout (#25682, #25683, #25680, #25686)
+* [youtube] Fix categories and improve tags extraction
+
+
+version 2020.06.16
+
+Extractors
+* [youtube] Fix uploader id and uploader URL extraction
+* [youtube] Improve view count extraction
+* [youtube] Fix upload date extraction (#25677)
+* [youtube] Fix thumbnails extraction (#25676)
+* [youtube] Fix playlist and feed extraction (#25675)
++ [facebook] Add support for single-video ID links
++ [youtube] Extract chapters from JSON (#24819)
++ [kaltura] Add support for multiple embeds on a webpage (#25523)
+
+
+version 2020.06.06
+
+Extractors
+* [tele5] Bypass geo restriction
++ [jwplatform] Add support for bypass geo restriction
+* [tele5] Prefer jwplatform over nexx (#25533)
+* [twitch:stream] Expect 400 and 410 HTTP errors from API
+* [twitch:stream] Fix extraction (#25528)
+* [twitch] Fix thumbnails extraction (#25531)
++ [twitch] Pass v5 Accept HTTP header (#25531)
+* [brightcove] Fix subtitles extraction (#25540)
++ [malltv] Add support for sk.mall.tv (#25445)
+* [periscope] Fix untitled broadcasts (#25482)
+* [jwplatform] Improve embeds extraction (#25467)
+
+
+version 2020.05.29
+
+Core
+* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
+* [utils] Fix file permissions in write_json_file (#12471, #25122)
+
+Extractors
+* [ard:beta] Extend URL regular expression (#25405)
++ [youtube] Add support for more invidious instances (#25417)
+* [giantbomb] Extend URL regular expression (#25222)
+* [ard] Improve URL regular expression (#25134, #25198)
+* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
+  #25321)
+* [indavideo] Switch to HTTPS for API request (#25191)
+* [redtube] Improve title extraction (#25208)
+* [vimeo] Improve format extraction and sorting (#25285)
+* [soundcloud] Reduce API playlist page limit (#25274)
++ [youtube] Add support for yewtu.be (#25226)
+* [mailru] Fix extraction (#24530, #25239)
+* [bellator] Fix mgid extraction (#25195)
+
+
+version 2020.05.08
+
+Core
+* [downloader/http] Request last data block of exact remaining size
+* [downloader/http] Finish downloading once received data length matches
+  expected
+* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
+  ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
++ [compat] Introduce compat_cookiejar_Cookie
+* [utils] Improve cookie files support
+    + Add support for UTF-8 in cookie files
+    * Skip malformed cookie file entries instead of crashing (invalid entry
+      length, invalid expires at)
+
+Extractors
+* [youtube] Improve signature cipher extraction (#25187, #25188)
+* [iprima] Improve extraction (#25138)
+* [uol] Fix extraction (#22007)
++ [orf] Add support for more radio stations (#24938, #24968)
+* [dailymotion] Fix typo
+- [puhutv] Remove no longer available HTTP formats (#25124)
+
+
+version 2020.05.03
+
+Core
++ [extractor/common] Extract multiple JSON-LD entries
+* [options] Clarify doc on --exec command (#19087, #24883)
+* [extractor/common] Skip malformed ISM manifest XMLs while extracting
+  ISM formats (#24667)
+
+Extractors
+* [crunchyroll] Fix and improve extraction (#25096, #25060)
+* [youtube] Improve player id extraction
+* [youtube] Use redirected video id if any (#25063)
+* [yahoo] Fix GYAO Player extraction and relax URL regular expression
+  (#24178, #24778)
+* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
+* [tenplay] Relax URL regular expression (#25001)
++ [prosiebensat1] Extract series metadata
+* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
+- [prosiebensat1] Remove 7tv.de support (#24948)
+* [youtube] Fix DRM videos detection (#24736)
+* [thisoldhouse] Fix video id extraction (#24548, #24549)
++ [soundcloud] Extract AAC format (#19173, #24708)
+* [youtube] Skip broken multifeed videos (#24711)
+* [nova:embed] Fix extraction (#24700)
+* [motherless] Fix extraction (#24699)
+* [twitch:clips] Extend URL regular expression (#24290, #24642)
+* [tv4] Fix ISM formats extraction (#24667)
+* [tele5] Fix extraction (#24553)
++ [mofosex] Add support for generic embeds (#24633)
++ [youporn] Add support for generic embeds
++ [spankwire] Add support for generic embeds (#24633)
+* [spankwire] Fix extraction (#18924, #20648)
+
+
+version 2020.03.24
+
+Core
+- [utils] Revert support for cookie files with spaces used instead of tabs
+
+Extractors
+* [teachable] Update upskillcourses and gns3 domains
+* [generic] Look for teachable embeds before wistia
++ [teachable] Extract chapter metadata (#24421)
++ [bilibili] Add support for player.bilibili.com (#24402)
++ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
+* [limelight] Remove disabled API requests (#24255)
+* [soundcloud] Fix download URL extraction (#24394)
++ [cbc:watch] Add support for authentication (#19160)
+* [hellporno] Fix extraction (#24399)
+* [xtube] Fix formats extraction (#24348)
+* [ndr] Fix extraction (#24326)
+* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
+- [nhk] Remove obsolete rtmp formats (#24329)
+* [nhk] Relax URL regular expression (#24329)
+- [vimeo] Revert fix showcase password protected video extraction (#24224)
+
+
+version 2020.03.08
+
+Core
++ [utils] Add support for cookie files with spaces used instead of tabs
+
+Extractors
++ [pornhub] Add support for pornhubpremium.com (#24288)
+- [youtube] Remove outdated code and unnecessary requests
+* [youtube] Improve extraction in 429 HTTP error conditions (#24283)
+* [nhk] Update API version (#24270)
+
+
+version 2020.03.06
+
+Extractors
+* [youtube] Fix age-gated videos support without login (#24248)
+* [vimeo] Fix showcase password protected video extraction (#24224)
+* [pornhub] Improve title extraction (#24184)
+* [peertube] Improve extraction (#23657)
++ [servus] Add support for new URL schema (#23475, #23583, #24142)
+* [vimeo] Fix subtitles URLs (#24209)
+
+
+version 2020.03.01
+
+Core
+* [YoutubeDL] Force redirect URL to unicode on python 2
+- [options] Remove duplicate short option -v for --version (#24162)
+
+Extractors
+* [xhamster] Fix extraction (#24205)
+* [franceculture] Fix extraction (#24204)
++ [telecinco] Add support for article opening videos
+* [telecinco] Fix extraction (#24195)
+* [xtube] Fix metadata extraction (#21073, #22455)
+* [youjizz] Fix extraction (#24181)
+- Remove no longer needed compat_str around geturl
+* [pornhd] Fix extraction (#24128)
++ [teachable] Add support for multiple videos per lecture (#24101)
++ [wistia] Add support for multiple generic embeds (#8347, #11385)
+* [imdb] Fix extraction (#23443)
+* [tv2dk:bornholm:play] Fix extraction (#24076)
+
+
+version 2020.02.16
+
+Core
+* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
+  #10622)
+* [update] Fix updating via symlinks (#23991)
++ [compat] Introduce compat_realpath (#23991)
+
+Extractors
++ [npr] Add support for streams (#24042)
++ [24video] Add support for porn.24video.net (#23779, #23784)
+- [jpopsuki] Remove extractor (#23858)
+* [nova] Improve extraction (#23690)
+* [nova:embed] Improve (#23690)
+* [nova:embed] Fix extraction (#23672)
++ [abc:iview] Add support for 720p (#22907, #22921)
+* [nytimes] Improve format sorting (#24010)
++ [toggle] Add support for mewatch.sg (#23895, #23930)
+* [thisoldhouse] Fix extraction (#23951)
++ [popcorntimes] Add support for popcorntimes.tv (#23949)
+* [sportdeutschland] Update to new API
+* [twitch:stream] Lowercase channel id for stream request (#23917)
+* [tv5mondeplus] Fix extraction (#23907, #23911)
+* [tva] Relax URL regular expression (#23903)
+* [vimeo] Fix album extraction (#23864)
+* [viewlift] Improve extraction
+    * Fix extraction (#23851)
+    + Add support for authentication
+    + Add support for more domains
+* [svt] Fix series extraction (#22297)
+* [svt] Fix article extraction (#22897, #22919)
+* [soundcloud] Improve private playlist/set tracks extraction (#3707)
+
+
+version 2020.01.24
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#23819)
+* [stretchinternet] Fix extraction (#4319)
+* [voicerepublic] Fix extraction
+* [azmedien] Fix extraction (#23783)
+* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
++ [24video] Add support for 24video.vip (#23753)
+* [ivi:compilation] Fix entries extraction (#23770)
+* [ard] Improve extraction (#23761)
+    * Simplify extraction
+    + Extract age limit and series
+    * Bypass geo-restriction
++ [nbc] Add support for nbc multi network URLs (#23049)
+* [americastestkitchen] Fix extraction
+* [zype] Improve extraction
+    + Extract subtitles (#21258)
+    + Support URLs with alternative keys/tokens (#21258)
+    + Extract more metadata
+* [orf:tvthek] Improve geo restricted videos detection (#23741)
+* [soundcloud] Restore previews extraction (#23739)
+
+
+version 2020.01.15
+
+Extractors
+* [yourporn] Fix extraction (#21645, #22255, #23459)
++ [canvas] Add support for new API endpoint (#17680, #18629)
+* [ndr:base:embed] Improve thumbnails extraction (#23731)
++ [vodplatform] Add support for embed.kwikmotion.com domain
++ [twitter] Add support for promo_video_website cards (#23711)
+* [orf:radio] Clean description and improve extraction
+* [orf:fm4] Fix extraction (#23599)
+* [safari] Fix kaltura session extraction (#23679, #23670)
+* [lego] Fix extraction and extract subtitle (#23687)
+* [cloudflarestream] Improve extraction
+    + Add support for bytehighway.net domain
+    + Add support for signed URLs
+    + Extract thumbnail
+* [naver] Improve extraction
+    * Improve geo-restriction handling
+    + Extract automatic captions
+    + Extract uploader metadata
+    + Extract VLive HLS formats
+    * Improve metadata extraction
+- [pandatv] Remove extractor (#23630)
+* [dctp] Fix format extraction (#23656)
++ [scrippsnetworks] Add support for www.discovery.com videos
+* [discovery] Fix anonymous token extraction (#23650)
+* [nrktv:seriebase] Fix extraction (#23625, #23537)
+* [wistia] Improve format extraction and extract subtitles (#22590)
+* [vice] Improve extraction (#23631)
+* [redtube] Detect private videos (#23518)
+
+
+version 2020.01.01
+
+Extractors
+* [brightcove] Invalidate policy key cache on failing requests
+* [pornhub] Improve locked videos detection (#22449, #22780)
++ [pornhub] Add support for m3u8 formats
+* [pornhub] Fix extraction (#22749, #23082)
+* [brightcove] Update policy key on failing requests
+* [spankbang] Improve removed video detection (#23423)
+* [spankbang] Fix extraction (#23307, #23423, #23444)
+* [soundcloud] Automatically update client id on failing requests
+* [prosiebensat1] Improve geo restriction handling (#23571)
+* [brightcove] Cache brightcove player policy keys
+* [teachable] Fail with error message if no video URL found
+* [teachable] Improve locked lessons detection (#23528)
++ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
+* [mitele] Fix extraction (#21354, #23456)
+* [soundcloud] Update client id (#23516)
+* [mailru] Relax URL regular expressions (#23509)
+
+
+version 2019.12.25
 
 Core
 * [utils] Improve str_to_int
 + [downloader/hls] Add ability to override AES decryption key URL (#17521)
 
 Extractors
+* [mediaset] Fix parse formats (#23508)
 + [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
 + [slideslive] Add support for url and vimeo service names (#23414)
 * [slideslive] Fix extraction (#23413)
index 01f975958c8370016a39c9f3fb872241c977c62b..45326c69ec5bf3fa6665cca18e808a06546ce8ea 100644 (file)
--- a/README.md
+++ b/README.md
@@ -434,9 +434,9 @@ ## Post-processing Options:
                                      either the path to the binary or its
                                      containing directory.
     --exec CMD                       Execute a command on the file after
-                                     downloading, similar to find's -exec
-                                     syntax. Example: --exec 'adb push {}
-                                     /sdcard/Music/ && rm {}'
+                                     downloading and post-processing, similar to
+                                     find's -exec syntax. Example: --exec 'adb
+                                     push {} /sdcard/Music/ && rm {}'
     --convert-subs FORMAT            Convert the subtitles to other format
                                      (currently supported: srt|ass|vtt|lrc)
 
@@ -835,7 +835,9 @@ ### ExtractorError: Could not find JS function u'OF'
 
 ### HTTP Error 429: Too Many Requests or 402: Payment Required
 
 
-These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
+These two error codes indicate that the service is blocking your IP address because of overuse. Usually this is a soft block meaning that you can gain access again after solving CAPTCHA. Just open a browser and solve a CAPTCHA the service suggests you and after that [pass cookies](#how-do-i-pass-cookies-to-youtube-dl) to youtube-dl. Note that if your machine has multiple external IPs then you should also pass exactly the same IP you've used for solving CAPTCHA with [`--source-address`](#network-options). Also you may need to pass a `User-Agent` HTTP header of your browser with [`--user-agent`](#workarounds).
+
+If this is not the case (no CAPTCHA suggested to solve by the service) then you can contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
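The cookie and source-address workaround described above can also be kept in a youtube-dl configuration file so it applies to every run — a sketch with placeholder values (the paths, IP address and User-Agent string are assumptions, not values from this document):

```
# ~/.config/youtube-dl/config (sketch; all values are placeholders)
--cookies /path/to/cookies.txt
--source-address 203.0.113.5
--user-agent "Mozilla/5.0 (X11; Linux x86_64)"
```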
 
 ### SyntaxError: Non-ASCII character
 
 
@@ -1030,7 +1032,7 @@ ### Adding support for a new site
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
 
         $ flake8 youtube_dl/extractor/yourextractor.py
 
 
index 428111b3f0e893d9ae53da648844833e87dd72b3..2ddfa109698864f642b97f8bcb9846f84631e4d9 100644 (file)
@@ -1,7 +1,6 @@
 #!/usr/bin/env python
 from __future__ import unicode_literals
 
-import base64
 import io
 import json
 import mimetypes
@@ -15,7 +14,6 @@
 
 from youtube_dl.compat import (
     compat_basestring,
 
-    compat_input,
     compat_getpass,
     compat_print,
     compat_urllib_request,
@@ -40,28 +38,20 @@ def _init_github_account(self):
         try:
             info = netrc.netrc().authenticators(self._NETRC_MACHINE)
             if info is not None:
-                self._username = info[0]
-                self._password = info[2]
+                self._token = info[2]
                 compat_print('Using GitHub credentials found in .netrc...')
                 return
             else:
                 compat_print('No GitHub credentials found in .netrc')
         except (IOError, netrc.NetrcParseError):
             compat_print('Unable to parse .netrc')
-        self._username = compat_input(
-            'Type your GitHub username or email address and press [Return]: ')
-        self._password = compat_getpass(
-            'Type your GitHub password and press [Return]: ')
+        self._token = compat_getpass(
+            'Type your GitHub PAT (personal access token) and press [Return]: ')
 
     def _call(self, req):
         if isinstance(req, compat_basestring):
             req = sanitized_Request(req)
 
-        # Authorizing manually since GitHub does not response with 401 with
-        # WWW-Authenticate header set (see
-        # https://developer.github.com/v3/#basic-authentication)
-        b64 = base64.b64encode(
-            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
-        req.add_header('Authorization', 'Basic %s' % b64)
+        req.add_header('Authorization', 'token %s' % self._token)
         response = self._opener.open(req).read().decode('utf-8')
         return json.loads(response)
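The switch from Basic authentication to a PAT boils down to the request pattern below — a sketch using Python 3's urllib for brevity (the actual script stays Python 2 compatible via the compat layer); the URL and token are placeholders:

```python
import json
import urllib.request


def github_request(url, token):
    # A personal access token goes in the Authorization header,
    # replacing the deprecated Basic (username:password) scheme.
    req = urllib.request.Request(url)
    req.add_header('Authorization', 'token %s' % token)
    return req


def github_call(req):
    # Send the request and decode the JSON response (needs network access).
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))
```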
 
index 2744dfca846d59520e8944231d569e0d1744ad2c..35c1050e5499238917243f09b5c142328ed37969 100644 (file)
@@ -28,10 +28,11 @@ # Supported sites
  - **acast:channel**
  - **ADN**: Anime Digital Network
  - **AdobeConnect**
- - **AdobeTV**
- - **AdobeTVChannel**
- - **AdobeTVShow**
- - **AdobeTVVideo**
+ - **adobetv**
+ - **adobetv:channel**
+ - **adobetv:embed**
+ - **adobetv:show**
+ - **adobetv:video**
  - **AdultSwim**
  - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
  - **afreecatv**: afreecatv.com
@@ -97,6 +98,7 @@ # Supported sites
  - **BiliBili**
  - **BilibiliAudio**
  - **BilibiliAudioAlbum**
+ - **BiliBiliPlayer**
  - **BioBioChileTV**
  - **BIQLE**
  - **BitChute**
@@ -388,7 +390,6 @@ # Supported sites
  - **JeuxVideo**
  - **Joj**
  - **Jove**
- - **jpopsuki.tv**
  - **JWPlatform**
  - **Kakao**
  - **Kaltura**
@@ -396,6 +397,7 @@ # Supported sites
  - **Kankan**
  - **Karaoketv**
  - **KarriereVideos**
+ - **Katsomo**
  - **KeezMovies**
  - **Ketnet**
  - **KhanAcademy**
@@ -403,7 +405,6 @@ # Supported sites
  - **KinjaEmbed**
  - **KinoPoisk**
  - **KonserthusetPlay**
- - **kontrtube**: KontrTube.ru - Труба зовёт
  - **KrasView**: Красвью
  - **Ku6**
  - **KUSI**
@@ -496,6 +497,7 @@ # Supported sites
  - **MNetTV**
  - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
  - **Mofosex**
+ - **MofosexEmbed**
  - **Mojvideo**
  - **Morningstar**: morningstar.com
  - **Motherless**
@@ -513,7 +515,6 @@ # Supported sites
  - **mtvjapan**
  - **mtvservices:embedded**
  - **MuenchenTV**: münchen.tv
- - **MusicPlayOn**
  - **mva**: Microsoft Virtual Academy videos
  - **mva:course**: Microsoft Virtual Academy courses
  - **Mwave**
@@ -619,16 +620,25 @@ # Supported sites
  - **Ooyala**
  - **OoyalaExternal**
  - **OraTV**
+ - **orf:burgenland**: Radio Burgenland
  - **orf:fm4**: radio FM4
  - **orf:fm4:story**: fm4.orf.at stories
  - **orf:iptv**: iptv.ORF.at
+ - **orf:kaernten**: Radio Kärnten
+ - **orf:noe**: Radio Niederösterreich
+ - **orf:oberoesterreich**: Radio Oberösterreich
  - **orf:oe1**: Radio Österreich 1
+ - **orf:oe3**: Radio Österreich 3
+ - **orf:salzburg**: Radio Salzburg
+ - **orf:steiermark**: Radio Steiermark
+ - **orf:tirol**: Radio Tirol
  - **orf:tvthek**: ORF TVthek
+ - **orf:vorarlberg**: Radio Vorarlberg
+ - **orf:wien**: Radio Wien
  - **OsnatelTV**
  - **OutsideTV**
  - **PacktPub**
  - **PacktPubCourse**
- - **PandaTV**: 熊猫TV
  - **pandora.tv**: 판도라TV
  - **ParamountNetwork**
  - **parliamentlive.tv**: UK parliament videos
@@ -664,6 +674,7 @@ # Supported sites
  - **Pokemon**
  - **PolskieRadio**
  - **PolskieRadioCategory**
+ - **Popcorntimes**
  - **PopcornTV**
  - **PornCom**
  - **PornerBros**
@@ -761,6 +772,7 @@ # Supported sites
  - **screen.yahoo:search**: Yahoo screen search
  - **Screencast**
  - **ScreencastOMatic**
+ - **ScrippsNetworks**
  - **scrippsnetworks:watch**
  - **SCTE**
  - **SCTECourse**
@@ -913,6 +925,7 @@ # Supported sites
  - **tv2.hu**
  - **TV2Article**
  - **TV2DK**
+ - **TV2DKBornholmPlay**
  - **TV4**: tv4.se and tv4play.se
  - **TV5MondePlus**: TV5MONDE+
  - **TVA**
@@ -954,6 +967,7 @@ # Supported sites
  - **udemy**
  - **udemy:course**
  - **UDNEmbed**: 聯合影音
+ - **UFCArabia**
  - **UFCTV**
  - **UKTVPlay**
  - **umg:de**: Universal Music Deutschland
@@ -993,7 +1007,6 @@ # Supported sites
  - **videomore**
  - **videomore:season**
  - **videomore:video**
- - **VideoPremium**
  - **VideoPress**
  - **Vidio**
  - **VidLii**
@@ -1003,8 +1016,8 @@ # Supported sites
  - **Vidzi**
  - **vier**: vier.be and vijf.be
  - **vier:videos**
- - **ViewLift**
- - **ViewLiftEmbed**
+ - **viewlift**
+ - **viewlift:embed**
  - **Viidea**
  - **viki**
  - **viki:channel**
index ce96661716c42ae0bf9c6a8ccb9ddf48c715e0a2..1e204e551b499edead22ce65e371787ca22ffc35 100644 (file)
@@ -816,11 +816,15 @@ def test_playlist_items_selection(self):
             'webpage_url': 'http://example.com',
         }
 
-        def get_ids(params):
+        def get_downloaded_info_dicts(params):
             ydl = YDL(params)
-            # make a copy because the dictionary can be modified
-            ydl.process_ie_result(playlist.copy())
-            return [int(v['id']) for v in ydl.downloaded_info_dicts]
+            # make a deep copy because the dictionary and nested entries
+            # can be modified
+            ydl.process_ie_result(copy.deepcopy(playlist))
+            return ydl.downloaded_info_dicts
+
+        def get_ids(params):
+            return [int(v['id']) for v in get_downloaded_info_dicts(params)]
 
         result = get_ids({})
         self.assertEqual(result, [1, 2, 3, 4])
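The switch from `playlist.copy()` to `copy.deepcopy(playlist)` above matters because `process_ie_result` mutates the nested entry dicts, which a shallow copy still shares with the original. A minimal sketch of the difference (hypothetical dict, not the test's playlist fixture):

```python
import copy

# A shallow copy shares the nested entry dicts with the original,
# so mutating an entry through the copy also changes the original.
playlist = {'id': 'pl', 'entries': [{'id': '1'}, {'id': '2'}]}

shallow = playlist.copy()
shallow['entries'][0]['id'] = 'mutated'
assert playlist['entries'][0]['id'] == 'mutated'  # original changed too

playlist['entries'][0]['id'] = '1'  # reset
deep = copy.deepcopy(playlist)
deep['entries'][0]['id'] = 'mutated'
assert playlist['entries'][0]['id'] == '1'  # original is untouched
```

This is why rerunning the selection tests against the same `playlist` dict requires a deep copy per run.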
@@ -852,6 +856,22 @@ def get_ids(params):
         result = get_ids({'playlist_items': '2-4,3-4,3'})
         self.assertEqual(result, [2, 3, 4])
 
+        # Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
+        # @{
+        result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
+        self.assertEqual(result[0]['playlist_index'], 2)
+        self.assertEqual(result[1]['playlist_index'], 3)
+        self.assertEqual(result[2]['playlist_index'], 4)
+
+        result = get_downloaded_info_dicts({'playlist_items': '4,2'})
+        self.assertEqual(result[0]['playlist_index'], 4)
+        self.assertEqual(result[1]['playlist_index'], 2)
+        # @}
+
     def test_urlopen_no_file_protocol(self):
         # see https://github.com/ytdl-org/youtube-dl/issues/8227
         ydl = YDL()
index f959798deb595165fddac6a8e555570c0420e454..05f48bd7417e478ab1ae9ae7eddaca4cbc7039c3 100644 (file)
@@ -39,6 +39,13 @@ def assert_cookie_has_value(key):
         assert_cookie_has_value('HTTPONLY_COOKIE')
         assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')
 
+    def test_malformed_cookies(self):
+        cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
+        cookiejar.load(ignore_discard=True, ignore_expires=True)
+        # Cookies should be empty since all malformed cookie file entries
+        # will be ignored
+        self.assertFalse(cookiejar._cookies)
+
 
 if __name__ == '__main__':
     unittest.main()
index 7d57a628e5ef79c5e12d13ccd0a2b515548ffa60..17aaaf20d9a002336af43d1afa2f7d49a186ac9a 100644 (file)
@@ -26,7 +26,6 @@
     ThePlatformIE,
     ThePlatformFeedIE,
     RTVEALaCartaIE,
-    FunnyOrDieIE,
     DemocracynowIE,
 )
 
@@ -322,18 +321,6 @@ def test_allsubtitles(self):
         self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')
 
 
-class TestFunnyOrDieSubtitles(BaseTestSubtitles):
-    url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
-    IE = FunnyOrDieIE
-
-    def test_allsubtitles(self):
-        self.DL.params['writesubtitles'] = True
-        self.DL.params['allsubtitles'] = True
-        subtitles = self.getSubtitles()
-        self.assertEqual(set(subtitles.keys()), set(['en']))
-        self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
-
-
 class TestDemocracynowSubtitles(BaseTestSubtitles):
     url = 'http://www.democracynow.org/shows/2015/7/3'
     IE = DemocracynowIE
index 324ca852578531757d9964f2c90cf6f8e1c4d3b1..e69c57377e617e2864a80e2e736bb72c87c4122e 100644 (file)
@@ -267,7 +267,7 @@ def test_youtube_chapters(self):
         for description, duration, expected_chapters in self._TEST_CASES:
             ie = YoutubeIE()
             expect_value(
-                self, ie._extract_chapters(description, duration),
+                self, ie._extract_chapters_from_description(description, duration),
                 expected_chapters, None)
 
 
index f0c370eeedc8942abc0b8cd8c10e57b4361d00c2..69df30edaa75efbeb51a1cad650c8e29b2596b66 100644 (file)
 ]
 
 
+class TestPlayerInfo(unittest.TestCase):
+    def test_youtube_extract_player_info(self):
+        PLAYER_URLS = (
+            ('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
+            # obsolete
+            ('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
+            ('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
+            ('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
+            ('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
+            ('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
+            ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
+            ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
+            ('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
+            ('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
+        )
+        for player_url, expected_player_id in PLAYER_URLS:
+            expected_player_type = player_url.split('.')[-1]
+            player_type, player_id = YoutubeIE._extract_player_info(player_url)
+            self.assertEqual(player_type, expected_player_type)
+            self.assertEqual(player_id, expected_player_id)
+
+
 class TestSignature(unittest.TestCase):
     def setUp(self):
         TEST_DIR = os.path.dirname(os.path.abspath(__file__))
diff --git a/test/testdata/cookies/malformed_cookies.txt b/test/testdata/cookies/malformed_cookies.txt
new file mode 100644 (file)
index 0000000..17bc403
--- /dev/null
@@ -0,0 +1,9 @@
+# Netscape HTTP Cookie File
+# http://curl.haxx.se/rfc/cookie_spec.html
+# This is a generated file!  Do not edit.
+
+# Cookie file entry with invalid number of fields - 6 instead of 7
+www.foobar.foobar      FALSE   /       FALSE   0       COOKIE
+
+# Cookie file entry with invalid expires at
+www.foobar.foobar      FALSE   /       FALSE   1.7976931348623157e+308 COOKIE  VALUE
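The fixture above exercises the Netscape cookie-file format, in which each non-comment line must carry exactly 7 tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value) with an integer expiry. A rough sketch of the per-line validation the loader is expected to perform (hypothetical helper, not yt-dlp's actual `YoutubeDLCookieJar` code):

```python
def is_valid_netscape_cookie_line(line):
    # Comment and blank lines are legal in the file but carry no cookie.
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return False
    fields = line.split('\t')
    if len(fields) != 7:            # e.g. the 6-field entry in the fixture
        return False
    try:
        int(fields[4])              # expiry must be an integer timestamp
    except ValueError:              # e.g. 1.7976931348623157e+308
        return False
    return True

# Both malformed fixture entries are rejected; a well-formed line passes.
assert not is_valid_netscape_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t0\tCOOKIE')
assert not is_valid_netscape_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t1.7976931348623157e+308\tCOOKIE\tVALUE')
assert is_valid_netscape_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t0\tNAME\tVALUE')
```

With every fixture entry rejected, the jar stays empty, which is exactly what `test_malformed_cookies` asserts.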
index f5cb46308198e4c65316fba10ccf30d2f3e14b6a..19370f62b0d3ddb91c74ae4b6cf6c569341fbdc5 100755 (executable)
@@ -92,6 +92,7 @@
     YoutubeDLCookieJar,
     YoutubeDLCookieProcessor,
     YoutubeDLHandler,
+    YoutubeDLRedirectHandler,
 )
 from .cache import Cache
 from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@@ -990,7 +991,7 @@ def report_download(num_entries):
                     'playlist_title': ie_result.get('title'),
                     'playlist_uploader': ie_result.get('uploader'),
                     'playlist_uploader_id': ie_result.get('uploader_id'),
-                    'playlist_index': i + playliststart,
+                    'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
                     'extractor': ie_result['extractor'],
                     'webpage_url': ie_result['webpage_url'],
                     'webpage_url_basename': url_basename(ie_result['webpage_url']),
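The one-line change in this hunk makes `playlist_index` report an entry's position in the original playlist whenever `--playlist-items` is given, instead of a running counter. A sketch of just that selection logic (simplified, not YoutubeDL's full entry loop):

```python
def playlist_index(i, playlistitems, playliststart=0):
    # i: 1-based position among the entries actually being downloaded.
    # With --playlist-items the reported index is the entry's position in
    # the original playlist; otherwise it falls back to the old counter
    # (playliststart here stands in for YoutubeDL's adjusted start offset).
    return playlistitems[i - 1] if playlistitems else i + playliststart

# --playlist-items 4,2: two downloads, reported as original indices 4 and 2,
# matching the new assertions in test_playlist_items_selection.
assert [playlist_index(i, [4, 2]) for i in (1, 2)] == [4, 2]
```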
@@ -2343,6 +2344,7 @@ def _setup_opener(self):
         debuglevel = 1 if self.params.get('debug_printtraffic') else 0
         https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
         ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
+        redirect_handler = YoutubeDLRedirectHandler()
         data_handler = compat_urllib_request_DataHandler()
 
         # When passing our own FileHandler instance, build_opener won't add the
@@ -2356,7 +2358,7 @@ def file_open(*args, **kwargs):
         file_handler.file_open = file_open
 
         opener = compat_urllib_request.build_opener(
-            proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
+            proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
 
         # Delete the default user-agent header, which would otherwise apply in
         # cases where our custom HTTP handler doesn't come into play
index c75ab131b9955cec1367ec42aa41d8dadde423da..0ee9bc76020377dc811c0d06736eba8082a53dbd 100644 (file)
 except ImportError:  # Python 2
     import cookielib as compat_cookiejar
 
+if sys.version_info[0] == 2:
+    class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
+        def __init__(self, version, name, value, *args, **kwargs):
+            if isinstance(name, compat_str):
+                name = name.encode()
+            if isinstance(value, compat_str):
+                value = value.encode()
+            compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
+else:
+    compat_cookiejar_Cookie = compat_cookiejar.Cookie
+
 try:
     import http.cookies as compat_cookies
 except ImportError:  # Python 2
@@ -2754,6 +2765,17 @@ def compat_expanduser(path):
         compat_expanduser = os.path.expanduser
 
 
+if compat_os_name == 'nt' and sys.version_info < (3, 8):
+    # os.path.realpath on Windows does not follow symbolic links
+    # prior to Python 3.8 (see https://bugs.python.org/issue9949)
+    def compat_realpath(path):
+        while os.path.islink(path):
+            path = os.path.abspath(os.readlink(path))
+        return path
+else:
+    compat_realpath = os.path.realpath
+
+
 if sys.version_info < (3, 0):
     def compat_print(s):
         from .utils import preferredencoding
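The `compat_realpath` fallback added above resolves links manually for pre-3.8 Windows; the same loop can be exercised anywhere symlinks exist. A minimal sketch (assumes a POSIX-style filesystem with symlink support, and an absolute link target):

```python
import os
import tempfile

def compat_realpath(path):
    # Same loop as the fallback above: follow links until a real path remains.
    while os.path.islink(path):
        path = os.path.abspath(os.readlink(path))
    return path

tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, 'target.txt')
open(target, 'w').close()
link = os.path.join(tmpdir, 'link.txt')
os.symlink(target, link)

assert compat_realpath(link) == target
assert compat_realpath(target) == target  # non-links pass through unchanged
```

On interpreters where `os.path.realpath` already follows links, the alias branch makes the wrapper a plain pass-through.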
@@ -2976,6 +2998,7 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
     'compat_basestring',
     'compat_chr',
     'compat_cookiejar',
+    'compat_cookiejar_Cookie',
     'compat_cookies',
     'compat_ctypes_WINFUNCTYPE',
     'compat_etree_Element',
@@ -2998,6 +3021,7 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
     'compat_os_name',
     'compat_parse_qs',
     'compat_print',
+    'compat_realpath',
     'compat_setenv',
     'compat_shlex_quote',
     'compat_shlex_split',
index 3c72ea18b2304befd5221960503ff5b6141304c3..5046878dfcd874013e737e85d32764a95737406e 100644 (file)
@@ -227,7 +227,7 @@ def retry(e):
             while True:
                 try:
                     # Download and write
-                    data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
+                    data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
                 # socket.timeout is a subclass of socket.error but may not have
                 # errno set
                 except socket.timeout as e:
@@ -299,7 +299,7 @@ def retry(e):
                     'elapsed': now - ctx.start_time,
                 })
 
-                if is_test and byte_counter == data_len:
+                if data_len is not None and byte_counter == data_len:
                     break
 
             if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
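The two hunks above replace the `is_test`-only special case with a general bound: reads are capped whenever the total content length is known. The fixed loop behaves like this sketch (with `io.BytesIO` standing in for `ctx.data`):

```python
import io

def read_stream(data, block_size, data_len=None):
    # Mirrors the fixed loop: cap each read only when total length is known.
    byte_counter = 0
    while True:
        chunk = data.read(block_size if data_len is None
                          else min(block_size, data_len - byte_counter))
        if not chunk:
            break
        byte_counter += len(chunk)
        if data_len is not None and byte_counter == data_len:
            break
    return byte_counter

assert read_stream(io.BytesIO(b'x' * 10), 4) == 10             # unknown length
assert read_stream(io.BytesIO(b'x' * 10), 4, data_len=6) == 6  # bounded read
```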
index 4ac323bf6de6d17016c2425c133aad460072cadd..6637f4f3537591b46870cb7ec3f35d1167cb3cb7 100644 (file)
@@ -110,17 +110,17 @@ class ABCIViewIE(InfoExtractor):
 
     # ABC iview programs are normally available for 14 days only.
     _TESTS = [{
-        'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
-        'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
+        'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
+        'md5': '67715ce3c78426b11ba167d875ac6abf',
         'info_dict': {
-            'id': 'ZX9371A050S00',
+            'id': 'LE1927H001S00',
             'ext': 'mp4',
-            'title': "Gaston's Birthday",
-            'series': "Ben And Holly's Little Kingdom",
-            'description': 'md5:f9de914d02f226968f598ac76f105bcf',
-            'upload_date': '20180604',
-            'uploader_id': 'abc4kids',
-            'timestamp': 1528140219,
+            'title': "Series 11 Ep 1",
+            'series': "Gruen",
+            'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
+            'upload_date': '20190925',
+            'uploader_id': 'abc1',
+            'timestamp': 1569445289,
         },
         'params': {
             'skip_download': True,
@@ -148,7 +148,7 @@ def tokenize_url(url, token):
                 'hdnea': token,
             })
 
-        for sd in ('sd', 'sd-low'):
+        for sd in ('720', 'sd', 'sd-low'):
             sd_url = try_get(
                 stream, lambda x: x['streams']['hls'][sd], compat_str)
             if not sd_url:
index 8b32aa886e9696e9334f73a777a70264f28c9433..9c9d77ae107e0b822b46368d89445f21e9e830a6 100644 (file)
@@ -5,6 +5,7 @@
 from ..utils import (
     clean_html,
     int_or_none,
+    js_to_json,
     try_get,
     unified_strdate,
 )
 class AmericasTestKitchenIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
     _TESTS = [{
-        'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
+        'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
         'md5': 'b861c3e365ac38ad319cfd509c30577f',
         'info_dict': {
-            'id': '1_5g5zua6e',
-            'title': 'Summer Dinner Party',
+            'id': '5b400b9ee338f922cb06450c',
+            'title': 'Weeknight Japanese Suppers',
             'ext': 'mp4',
-            'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
-            'thumbnail': r're:^https?://.*\.jpg',
-            'timestamp': 1497285541,
-            'upload_date': '20170612',
-            'uploader_id': 'roger.metcalf@americastestkitchen.com',
-            'release_date': '20170617',
+            'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
+            'thumbnail': r're:^https?://',
+            'timestamp': 1523664000,
+            'upload_date': '20180414',
+            'release_date': '20180414',
             'series': "America's Test Kitchen",
-            'season_number': 17,
-            'episode': 'Summer Dinner Party',
-            'episode_number': 24,
+            'season_number': 18,
+            'episode': 'Weeknight Japanese Suppers',
+            'episode_number': 15,
         },
         'params': {
             'skip_download': True,
@@ -47,7 +47,7 @@ def _real_extract(self, url):
             self._search_regex(
                 r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
                 webpage, 'initial context'),
-            video_id)
+            video_id, js_to_json)
 
         ep_data = try_get(
             video_data,
@@ -55,17 +55,7 @@ def _real_extract(self, url):
              lambda x: x['videoDetail']['content']['data']), dict)
         ep_meta = ep_data.get('full_video', {})
 
-        zype_id = ep_meta.get('zype_id')
-        if zype_id:
-            embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
-            ie_key = 'Zype'
-        else:
-            partner_id = self._search_regex(
-                r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
-                webpage, 'kaltura partner id')
-            external_id = ep_data.get('external_id') or ep_meta['external_id']
-            embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
-            ie_key = 'Kaltura'
+        zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
 
         title = ep_data.get('title') or ep_meta.get('title')
         description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -79,8 +69,8 @@ def _real_extract(self, url):
 
         return {
             '_type': 'url_transparent',
-            'url': embed_url,
-            'ie_key': ie_key,
+            'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
+            'ie_key': 'Zype',
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
index 8adae46449232fe487ec6796b5b60345f144d310..5b7b2dd6d2ee2427d0df3920f982b3eece6d67fc 100644 (file)
@@ -1,6 +1,7 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import json
 import re
 
 from .common import InfoExtractor
 from ..compat import compat_etree_fromstring
 
 
-class ARDMediathekIE(InfoExtractor):
-    IE_NAME = 'ARD:mediathek'
-    _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
-
-    _TESTS = [{
-        # available till 26.07.2022
-        'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
-        'info_dict': {
-            'id': '44726822',
-            'ext': 'mp4',
-            'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
-            'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
-            'duration': 1740,
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
-    }, {
-        'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
-        'only_matching': True,
-    }, {
-        # audio
-        'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
-        'only_matching': True,
-    }, {
-        'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
-        'only_matching': True,
-    }, {
-        # audio
-        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
-        'only_matching': True,
-    }, {
-        'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
-        'only_matching': True,
-    }]
-
-    @classmethod
-    def suitable(cls, url):
-        return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+class ARDMediathekBaseIE(InfoExtractor):
+    _GEO_COUNTRIES = ['DE']
 
     def _extract_media_info(self, media_info_url, webpage, video_id):
         media_info = self._download_json(
             media_info_url, video_id, 'Downloading media JSON')
 
+        return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
 
 
+    def _parse_media_info(self, media_info, video_id, fsk):
         formats = self._extract_formats(media_info, video_id)
 
         if not formats:
-            if '"fsk"' in webpage:
+            if fsk:
                 raise ExtractorError(
                     'This video is only available after 20:00', expected=True)
             elif media_info.get('_geoblocked'):
-                raise ExtractorError('This video is not available due to geo restriction', expected=True)
+                self.raise_geo_restricted(
+                    'This video is not available due to geoblocking',
+                    countries=self._GEO_COUNTRIES)
 
         self._sort_formats(formats)
 
-        duration = int_or_none(media_info.get('_duration'))
-        thumbnail = media_info.get('_previewImage')
-        is_live = media_info.get('_isLive') is True
-
         subtitles = {}
         subtitle_url = media_info.get('_subtitleUrl')
         if subtitle_url:
@@ -92,9 +55,9 @@ def _extract_media_info(self, media_info_url, webpage, video_id):
 
         return {
             'id': video_id,
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'is_live': is_live,
+            'duration': int_or_none(media_info.get('_duration')),
+            'thumbnail': media_info.get('_previewImage'),
+            'is_live': media_info.get('_isLive') is True,
             'formats': formats,
             'subtitles': subtitles,
         }
@@ -123,11 +86,11 @@ def _extract_formats(self, media_info, video_id):
                             update_url_query(stream_url, {
                                 'hdcore': '3.1.1',
                                 'plugin': 'aasp-3.1.1.69.124'
-                            }),
-                            video_id, f4m_id='hds', fatal=False))
+                            }), video_id, f4m_id='hds', fatal=False))
                     elif ext == 'm3u8':
                         formats.extend(self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                            stream_url, video_id, 'mp4', 'm3u8_native',
+                            m3u8_id='hls', fatal=False))
                     else:
                         if server and server.startswith('rtmp'):
                             f = {
@@ -140,7 +103,9 @@ def _extract_formats(self, media_info, video_id):
                                 'url': stream_url,
                                 'format_id': 'a%s-%s-%s' % (num, ext, quality)
                             }
-                        m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
+                        m = re.search(
+                            r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
+                            stream_url)
                         if m:
                             f.update({
                                 'width': int(m.group('width')),
@@ -151,6 +116,48 @@ def _extract_formats(self, media_info, video_id):
                         formats.append(f)
         return formats
 
+
+class ARDMediathekIE(ARDMediathekBaseIE):
+    IE_NAME = 'ARD:mediathek'
+    _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+
+    _TESTS = [{
+        # available till 26.07.2022
+        'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
+        'info_dict': {
+            'id': '44726822',
+            'ext': 'mp4',
+            'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
+            'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
+            'duration': 1740,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
+        'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
+        'only_matching': True,
+    }, {
+        'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
+        'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
+        'only_matching': True,
+    }, {
+        'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+
     def _real_extract(self, url):
         # determine video id from url
         m = re.match(self._VALID_URL, url)
@@ -242,7 +249,7 @@ def _real_extract(self, url):
 
 
 class ARDIE(InfoExtractor):
-    _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
+    _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
     _TESTS = [{
         # available till 14.02.2019
         'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
     _TESTS = [{
         # available till 14.02.2019
         'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
@@ -256,6 +263,9 @@ class ARDIE(InfoExtractor):
             'upload_date': '20180214',
             'thumbnail': r're:^https?://.*\.jpg$',
         },
             'upload_date': '20180214',
             'thumbnail': r're:^https?://.*\.jpg$',
         },
+    }, {
+        'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
+        'only_matching': True,
     }, {
         'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
         'only_matching': True,
@@ -302,21 +312,31 @@ def _real_extract(self, url):
         }
 
 
-class ARDBetaMediathekIE(InfoExtractor):
-    _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
+class ARDBetaMediathekIE(ARDMediathekBaseIE):
+    _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
     _TESTS = [{
-        'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
-        'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
+        'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
+        'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
         'info_dict': {
             'display_id': 'die-robuste-roswita',
-            'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
-            'title': 'Tatort: Die robuste Roswita',
+            'id': '70153354',
+            'title': 'Die robuste Roswita',
             'description': r're:^Der Mord.*trüber ist als die Ilm.',
             'duration': 5316,
-            'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
-            'upload_date': '20180826',
+            'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
+            'timestamp': 1577047500,
+            'upload_date': '20191222',
             'ext': 'mp4',
         },
+    }, {
+        'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
+        'only_matching': True,
+    }, {
+        'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
+        'only_matching': True,
     }, {
         'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
         'only_matching': True,
@@ -328,73 +348,75 @@ class ARDBetaMediathekIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('video_id')
-        display_id = mobj.group('display_id') or video_id
-
-        webpage = self._download_webpage(url, display_id)
-        data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
-        data = self._parse_json(data_json, display_id)
-
-        res = {
-            'id': video_id,
-            'display_id': display_id,
+        display_id = mobj.group('display_id')
+        if display_id:
+            display_id = display_id.rstrip('/')
+        if not display_id:
+            display_id = video_id
+
+        player_page = self._download_json(
+            'https://api.ardmediathek.de/public-gateway',
+            display_id, data=json.dumps({
+                'query': '''{
+  playerPage(client:"%s", clipId: "%s") {
+    blockedByFsk
+    broadcastedOn
+    maturityContentRating
+    mediaCollection {
+      _duration
+      _geoblocked
+      _isLive
+      _mediaArray {
+        _mediaStreamArray {
+          _quality
+          _server
+          _stream
         }
-        formats = []
-        subtitles = {}
-        geoblocked = False
-        for widget in data.values():
-            if widget.get('_geoblocked') is True:
-                geoblocked = True
-            if '_duration' in widget:
-                res['duration'] = int_or_none(widget['_duration'])
-            if 'clipTitle' in widget:
-                res['title'] = widget['clipTitle']
-            if '_previewImage' in widget:
-                res['thumbnail'] = widget['_previewImage']
-            if 'broadcastedOn' in widget:
-                res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
-            if 'synopsis' in widget:
-                res['description'] = widget['synopsis']
-            subtitle_url = url_or_none(widget.get('_subtitleUrl'))
-            if subtitle_url:
-                subtitles.setdefault('de', []).append({
-                    'ext': 'ttml',
-                    'url': subtitle_url,
-                })
-            if '_quality' in widget:
-                format_url = url_or_none(try_get(
-                    widget, lambda x: x['_stream']['json'][0]))
-                if not format_url:
-                    continue
-                ext = determine_ext(format_url)
-                if ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        format_url + '?hdcore=3.11.0',
-                        video_id, f4m_id='hds', fatal=False))
-                elif ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, video_id, 'mp4', m3u8_id='hls',
-                        fatal=False))
-                else:
-                    # HTTP formats are not available when geoblocked is True,
-                    # other formats are fine though
-                    if geoblocked:
-                        continue
-                    quality = str_or_none(widget.get('_quality'))
-                    formats.append({
-                        'format_id': ('http-' + quality) if quality else 'http',
-                        'url': format_url,
-                        'preference': 10,  # Plain HTTP, that's nice
-                    })
-
-        if not formats and geoblocked:
-            self.raise_geo_restricted(
-                msg='This video is not available due to geoblocking',
-                countries=['DE'])
-
-        self._sort_formats(formats)
-        res.update({
-            'subtitles': subtitles,
-            'formats': formats,
+      }
+      _previewImage
+      _subtitleUrl
+      _type
+    }
+    show {
+      title
+    }
+    synopsis
+    title
+    tracking {
+      atiCustomVars {
+        contentId
+      }
+    }
+  }
+}''' % (mobj.group('client'), video_id),
+            }).encode(), headers={
+                'Content-Type': 'application/json'
+            })['data']['playerPage']
+        title = player_page['title']
+        content_id = str_or_none(try_get(
+            player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
+        media_collection = player_page.get('mediaCollection') or {}
+        if not media_collection and content_id:
+            media_collection = self._download_json(
+                'https://www.ardmediathek.de/play/media/' + content_id,
+                content_id, fatal=False) or {}
+        info = self._parse_media_info(
+            media_collection, content_id or video_id,
+            player_page.get('blockedByFsk'))
+        age_limit = None
+        description = player_page.get('synopsis')
+        maturity_content_rating = player_page.get('maturityContentRating')
+        if maturity_content_rating:
+            age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
+        if not age_limit and description:
+            age_limit = int_or_none(self._search_regex(
+                r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
+        info.update({
+            'age_limit': age_limit,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
+            'series': try_get(player_page, lambda x: x['show']['title']),
         })
-
-        return res
+        return info
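The rewritten ARDBetaMediathekIE above POSTs a hand-built GraphQL query to `https://api.ardmediathek.de/public-gateway`. A minimal, self-contained sketch of how that request body is assembled (field list trimmed for illustration; the diff requests many more fields, and no network call is made here):

```python
import json

def build_playerpage_query(client, clip_id):
    """Build the JSON body POSTed to the ARD public-gateway endpoint.

    The query text mirrors the structure used in the diff above, with
    only a few representative fields kept.
    """
    query = '''{
  playerPage(client:"%s", clipId: "%s") {
    title
    synopsis
    broadcastedOn
  }
}''' % (client, clip_id)
    # The extractor sends this with a Content-Type: application/json header
    return json.dumps({'query': query}).encode()
```

The `client` value comes from the URL path (e.g. `ard`), matching the new `(?P<client>[^/]+)` group in `_VALID_URL`.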
index fcbdc71b98d98076852e0f88559f4a2ed428d7af..b1e20def5343e6b1a077ff3ba0b36f6a96c4f2c4 100644 (file)
@@ -47,39 +47,19 @@ class AZMedienIE(InfoExtractor):
         'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
         'only_matching': True
     }]
-
+    _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
     _PARTNER_ID = '1719221'
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        host = mobj.group('host')
-        video_id = mobj.group('id')
-        entry_id = mobj.group('kaltura_id')
+        host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
 
         if not entry_id:
-            api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
-            payload = {
-                'query': '''query VideoContext($articleId: ID!) {
-                    article: node(id: $articleId) {
-                      ... on Article {
-                        mainAssetRelation {
-                          asset {
-                            ... on VideoAsset {
-                              kalturaId
-                            }
-                          }
-                        }
-                      }
-                    }
-                  }''',
-                'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
-            }
-            json_data = self._download_json(
-                api_url, video_id, headers={
-                    'Content-Type': 'application/json',
-                },
-                data=json.dumps(payload).encode())
-            entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
+            entry_id = self._download_json(
+                self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
+                    'variables': json.dumps({
+                        'contextId': 'NewsArticle:' + article_id,
+                    }),
+                })['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
 
         return self.url_result(
             'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
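The AZMedien change above replaces an inline GraphQL POST with a GET against a persisted query (the hash in `_API_TEMPL` identifies the stored query server-side), so only the `variables` travel in the URL. A sketch of how that request URL is built, under the assumption that only URL construction (no download) is shown:

```python
import json
from urllib.parse import urlencode

# Template copied from the diff above; the trailing hash names the
# persisted NewsArticleTeaser query on the server.
API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'

def build_kaltura_id_request(host, article_id):
    """Return the GET URL the new code fetches to resolve the Kaltura id."""
    url = API_TEMPL % (host, host.split('.')[0])
    variables = json.dumps({'contextId': 'NewsArticle:' + article_id})
    return url + '?' + urlencode({'variables': variables})
```

Compared to the removed inline query, this shifts the query text out of the client entirely; only the article id varies per request.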
index 901c5a54fb6f9d3320fbd0827a222c2c9e9676f0..002c39c394bf5d8bd3b74600d92a594e1a8ca2b2 100644 (file)
@@ -528,7 +528,7 @@ def _extract_from_legacy_playlist(self, playlist, playlist_id):
 
             def get_programme_id(item):
                 def get_from_attributes(item):
-                    for p in('identifier', 'group'):
+                    for p in ('identifier', 'group'):
                         value = item.get(p)
                         if value and re.match(r'^[pb][\da-z]{7}$', value):
                             return value
index 485173774d9f9c2534f9b18f1668a8d5fb204dc9..9f9de96c61332ac405b33bfc1f5758f2c8fd6456 100644 (file)
@@ -25,8 +25,8 @@ class BellMediaIE(InfoExtractor):
                 etalk|
                 marilyn
             )\.ca|
-            much\.com
-        )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
+            (?:much|cp24)\.com
+        )/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
     _TESTS = [{
         'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
         'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@@ -62,6 +62,9 @@ class BellMediaIE(InfoExtractor):
     }, {
         'url': 'http://www.etalk.ca/video?videoid=663455',
         'only_matching': True,
+    }, {
+        'url': 'https://www.cp24.com/video?clipId=1982548',
+        'only_matching': True,
     }]
     _DOMAINS = {
         'thecomedynetwork': 'comedy',
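The BellMedia hunk above extends `_VALID_URL` to accept cp24.com and the `clipId` query parameter. A trimmed sketch reproducing just that new tail of the pattern (the full domain alternation is elided in the diff, so only the visible `much`/`cp24` branch is used here):

```python
import re

# Tail of the extended BellMedia pattern from the diff above; only the
# much.com / cp24.com branch is reproduced for illustration.
BELL_ID_RE = r'(?:much|cp24)\.com/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'

def bell_media_id(url):
    """Return the 6+ digit media id, or None when nothing matches."""
    m = re.search(BELL_ID_RE, url)
    return m.group('id') if m else None
```

This is exactly the shape the new `only_matching` test exercises: `clipId=1982548` now yields the id where the old pattern would not match at all.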
index 80bd696e21f3a4af3c996e9899ce439116e13d19..4dc597e160bcea4f0821719dc82e2908f20d1972 100644 (file)
 
 
 class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:(?:www|bangumi)\.)?
+                        bilibili\.(?:tv|com)/
+                        (?:
+                            (?:
+                                video/[aA][vV]|
+                                anime/(?P<anime_id>\d+)/play\#
+                            )(?P<id_bv>\d+)|
+                            video/[bB][vV](?P<id>[^/?#&]+)
+                        )
+                    '''
 
     _TESTS = [{
         'url': 'http://www.bilibili.tv/video/av1074402/',
@@ -92,6 +103,10 @@ class BiliBiliIE(InfoExtractor):
                 'skip_download': True,  # Test metadata only
             },
         }]
+    }, {
+        # new BV video id format
+        'url': 'https://www.bilibili.com/video/BV1JE411F741',
+        'only_matching': True,
     }]
 
     _APP_KEY = 'iVGUTjsxvpLeuDCf'
@@ -109,7 +124,7 @@ def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
 
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = mobj.group('id') or mobj.group('id_bv')
         anime_id = mobj.group('anime_id')
         webpage = self._download_webpage(url, video_id)
 
@@ -419,3 +434,17 @@ def _real_extract(self, url):
                     entries, am_id, album_title, album_data.get('intro'))
 
         return self.playlist_result(entries, am_id)
+
+
+class BiliBiliPlayerIE(InfoExtractor):
+    _VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
+        'only_matching': True,
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'http://www.bilibili.tv/video/av%s/' % video_id,
+            ie=BiliBiliIE.ie_key(), video_id=video_id)
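The BiliBili change above teaches `_VALID_URL` the new alphanumeric BV id format alongside the legacy numeric av ids, and `_real_extract` picks whichever group matched. A self-contained sketch of that matching logic (note the group naming is as in the diff: the numeric av id lands in `id_bv`, the BV id in `id`):

```python
import re

# The updated pattern, copied from the diff above.
BILIBILI_VALID_URL = r'''(?x)
                    https?://
                        (?:(?:www|bangumi)\.)?
                        bilibili\.(?:tv|com)/
                        (?:
                            (?:
                                video/[aA][vV]|
                                anime/(?P<anime_id>\d+)/play\#
                            )(?P<id_bv>\d+)|
                            video/[bB][vV](?P<id>[^/?#&]+)
                        )
                    '''

def extract_video_id(url):
    """Mirror the new `mobj.group('id') or mobj.group('id_bv')` logic."""
    mobj = re.match(BILIBILI_VALID_URL, url)
    if not mobj:
        return None
    return mobj.group('id') or mobj.group('id_bv')
```

Only one alternative can match, so exactly one of the two groups is ever non-None.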
index 8e2f7217ab85a81a58d1bb902af02b6e62ec2ab6..2aa9f4782e0dfdb2b78225c2d1fe83a8568effe3 100644 (file)
@@ -5,32 +5,34 @@
 import re
 import struct
 
-from .common import InfoExtractor
 from .adobepass import AdobePassIE
+from .common import InfoExtractor
 from ..compat import (
     compat_etree_fromstring,
+    compat_HTTPError,
     compat_parse_qs,
     compat_urllib_parse_urlparse,
     compat_urlparse,
     compat_xml_parse_error,
-    compat_HTTPError,
 )
 from ..utils import (
-    ExtractorError,
+    clean_html,
     extract_attributes,
+    ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,
     float_or_none,
-    js_to_json,
     int_or_none,
+    js_to_json,
+    mimetype2ext,
     parse_iso8601,
     smuggle_url,
+    str_or_none,
     unescapeHTML,
     unsmuggle_url,
-    update_url_query,
-    clean_html,
-    mimetype2ext,
     UnsupportedError,
+    update_url_query,
+    url_or_none,
 )
 
 
@@ -424,7 +426,7 @@ def _extract_urls(ie, webpage):
         # [2] looks like:
         for video, script_tag, account_id, player_id, embed in re.findall(
                 r'''(?isx)
-                    (<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
+                    (<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
                     (?:.*?
                         (<script[^>]+
                             src=["\'](?:https?:)?//players\.brightcove\.net/
@@ -553,10 +555,16 @@ def build_format_id(kind):
 
         subtitles = {}
         for text_track in json_data.get('text_tracks', []):
-            if text_track.get('src'):
-                subtitles.setdefault(text_track.get('srclang'), []).append({
-                    'url': text_track['src'],
-                })
+            if text_track.get('kind') != 'captions':
+                continue
+            text_track_url = url_or_none(text_track.get('src'))
+            if not text_track_url:
+                continue
+            lang = (str_or_none(text_track.get('srclang'))
+                    or str_or_none(text_track.get('label')) or 'en').lower()
+            subtitles.setdefault(lang, []).append({
+                'url': text_track_url,
+            })
 
         is_live = False
         duration = float_or_none(json_data.get('duration'), 1000)
@@ -586,45 +594,63 @@ def _real_extract(self, url):
 
         account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
 
-        webpage = self._download_webpage(
-            'http://players.brightcove.net/%s/%s_%s/index.min.js'
-            % (account_id, player_id, embed), video_id)
+        policy_key_id = '%s_%s' % (account_id, player_id)
+        policy_key = self._downloader.cache.load('brightcove', policy_key_id)
+        policy_key_extracted = False
+        store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
 
-        policy_key = None
+        def extract_policy_key():
+            webpage = self._download_webpage(
+                'http://players.brightcove.net/%s/%s_%s/index.min.js'
+                % (account_id, player_id, embed), video_id)
 
-        catalog = self._search_regex(
-            r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
-        if catalog:
-            catalog = self._parse_json(
-                js_to_json(catalog), video_id, fatal=False)
+            policy_key = None
+
+            catalog = self._search_regex(
+                r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
             if catalog:
-                policy_key = catalog.get('policyKey')
+                catalog = self._parse_json(
+                    js_to_json(catalog), video_id, fatal=False)
+                if catalog:
+                    policy_key = catalog.get('policyKey')
+
+            if not policy_key:
+                policy_key = self._search_regex(
+                    r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+                    webpage, 'policy key', group='pk')
 
-        if not policy_key:
-            policy_key = self._search_regex(
-                r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
-                webpage, 'policy key', group='pk')
+            store_pk(policy_key)
+            return policy_key
 
         api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
-        headers = {
-            'Accept': 'application/json;pk=%s' % policy_key,
-        }
+        headers = {}
         referrer = smuggled_data.get('referrer')
         if referrer:
             headers.update({
                 'Referer': referrer,
                 'Origin': re.search(r'https?://[^/]+', referrer).group(0),
             })
-        try:
-            json_data = self._download_json(api_url, video_id, headers=headers)
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
-                json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
-                message = json_data.get('message') or json_data['error_code']
-                if json_data.get('error_subcode') == 'CLIENT_GEO':
-                    self.raise_geo_restricted(msg=message)
-                raise ExtractorError(message, expected=True)
-            raise
+
+        for _ in range(2):
+            if not policy_key:
+                policy_key = extract_policy_key()
+                policy_key_extracted = True
+            headers['Accept'] = 'application/json;pk=%s' % policy_key
+            try:
+                json_data = self._download_json(api_url, video_id, headers=headers)
+                break
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
+                    json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
+                    message = json_data.get('message') or json_data['error_code']
+                    if json_data.get('error_subcode') == 'CLIENT_GEO':
+                        self.raise_geo_restricted(msg=message)
+                    elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
+                        policy_key = None
+                        store_pk(None)
+                        continue
+                    raise ExtractorError(message, expected=True)
+                raise
 
         errors = json_data.get('errors')
         if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
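The Brightcove hunk above introduces a cache-then-retry pattern for the policy key: use the cached key first, and only re-extract it (at most once) when the API rejects it. A minimal sketch of that control flow, with a plain dict standing in for the downloader cache and injected callables standing in for the webpage scrape and JSON download (none of these are the extractor's real APIs):

```python
class PolicyKeyClient:
    def __init__(self, cache, fetch_key, call_api):
        # cache: dict-like store; fetch_key/call_api are hypothetical
        # stand-ins for the extractor's key extraction and API request.
        self._cache = cache
        self._fetch_key = fetch_key
        self._call_api = call_api

    def get_json(self, key_id):
        policy_key = self._cache.get(key_id)
        key_extracted = False
        for _ in range(2):
            if not policy_key:
                policy_key = self._fetch_key()
                self._cache[key_id] = policy_key
                key_extracted = True
            try:
                return self._call_api(policy_key)
            except PermissionError:  # stands in for HTTP 401/403
                if not key_extracted:
                    # cached key went stale: invalidate and retry once
                    policy_key = None
                    self._cache[key_id] = None
                    continue
                raise
```

The second loop iteration only runs when the first attempt used a stale cached key; a freshly extracted key that still fails propagates the error, matching the diff's `policy_key_extracted` guard.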
index dfcf9bc6b50b9274d2e45ff7e0b6d1af9920cab0..73a57b1e4db835ab09ac308704bd105d796628ac 100644 (file)
@@ -9,21 +9,26 @@ class BusinessInsiderIE(InfoExtractor):
     _VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
-        'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
+        'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
         'info_dict': {
-            'id': 'hZRllCfw',
+            'id': 'cjGDb0X9',
             'ext': 'mp4',
-            'title': "Here's how much radiation you're exposed to in everyday life",
-            'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
-            'upload_date': '20170709',
-            'timestamp': 1499606400,
-        },
-        'params': {
-            'skip_download': True,
+            'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
+            'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
+            'upload_date': '20160611',
+            'timestamp': 1465675620,
         },
     }, {
         'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
-        'only_matching': True,
+        'md5': '43f438dbc6da0b89f5ac42f68529d84a',
+        'info_dict': {
+            'id': '5zJwd4FK',
+            'ext': 'mp4',
+            'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
+            'description': 'md5:2af8975825d38a4fed24717bbe51db49',
+            'upload_date': '20170705',
+            'timestamp': 1499270528,
+        },
     }, {
         'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
         'only_matching': True,
@@ -35,7 +40,8 @@ def _real_extract(self, url):
         jwplatform_id = self._search_regex(
             (r'data-media-id=["\']([a-zA-Z0-9]{8})',
              r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
-             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
+             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
+             r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
             webpage, 'jwplatform id')
         return self.url_result(
             'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
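The BusinessInsider change above appends a fourth fallback pattern for the JW Platform id. A sketch of the first-match-wins behaviour of `_search_regex` over that tuple (simplified to plain `re.search`, without the error handling the real helper adds):

```python
import re

# Pattern tuple from the diff above, including the new
# jwplatform.com/players/ fallback.
JWPLATFORM_ID_RES = (
    r'data-media-id=["\']([a-zA-Z0-9]{8})',
    r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
    r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
    r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})',
)

def find_jwplatform_id(webpage):
    """Try each pattern in order; return the first captured id."""
    for pattern in JWPLATFORM_ID_RES:
        m = re.search(pattern, webpage)
        if m:
            return m.group(1)
    return None
```

The new last pattern catches pages that only reference the player script URL, which the earlier attribute-based patterns miss.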
index c506bc5dd2402a95752bdf3223fe4a24cf9d06ae..8667a0d0457cccfc145cc52bc1eb1c7816aa04b8 100644 (file)
@@ -13,6 +13,8 @@
     int_or_none,
     merge_dicts,
     parse_iso8601,
+    str_or_none,
+    url_or_none,
 )
 
 
@@ -20,15 +22,15 @@ class CanvasIE(InfoExtractor):
     _VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
-        'md5': '90139b746a0a9bd7bb631283f6e2a64e',
+        'md5': '68993eda72ef62386a15ea2cf3c93107',
         'info_dict': {
             'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
             'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Nachtwacht: De Greystook',
-            'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
+            'description': 'Nachtwacht: De Greystook',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 1468.03,
+            'duration': 1468.04,
         },
         'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
     }, {
@@ -39,23 +41,45 @@ class CanvasIE(InfoExtractor):
         'HLS': 'm3u8_native',
         'HLS_AES': 'm3u8',
     }
+    _REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         site_id, video_id = mobj.group('site_id'), mobj.group('id')
 
+        # Old API endpoint, serves more formats but may fail for some videos
         data = self._download_json(
             'https://mediazone.vrt.be/api/v1/%s/assets/%s'
-            % (site_id, video_id), video_id)
+            % (site_id, video_id), video_id, 'Downloading asset JSON',
+            'Unable to download asset JSON', fatal=False)
+
+        # New API endpoint
+        if not data:
+            token = self._download_json(
+                '%s/tokens' % self._REST_API_BASE, video_id,
+                'Downloading token', data=b'',
+                headers={'Content-Type': 'application/json'})['vrtPlayerToken']
+            data = self._download_json(
+                '%s/videos/%s' % (self._REST_API_BASE, video_id),
+                video_id, 'Downloading video JSON', fatal=False, query={
+                    'vrtPlayerToken': token,
+                    'client': '%s@PROD' % site_id,
+                }, expected_status=400)
+            message = data.get('message')
+            if message and not data.get('title'):
+                if data.get('code') == 'AUTHENTICATION_REQUIRED':
+                    self.raise_login_required(message)
+                raise ExtractorError(message, expected=True)
 
         title = data['title']
         description = data.get('description')
 
         formats = []
         for target in data['targetUrls']:
-            format_url, format_type = target.get('url'), target.get('type')
+            format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
             if not format_url or not format_type:
                 continue
+            format_type = format_type.upper()
             if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
                 formats.extend(self._extract_m3u8_formats(
                     format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
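The old-then-new endpoint fallback added in this hunk can be sketched standalone as follows. `download_json` is a stand-in for the extractor's `_download_json` helper (assumed to return `None` on failure when `fatal=False`); the URLs mirror the patch itself:

```python
def fetch_asset(site_id, video_id, download_json):
    # Legacy mediazone endpoint first: serves more formats but may fail
    data = download_json(
        'https://mediazone.vrt.be/api/v1/%s/assets/%s' % (site_id, video_id),
        fatal=False)
    if data:
        return data
    # New aggregator API: request a short-lived player token, then the video JSON
    rest_api_base = ('https://media-services-public.vrt.be/'
                     'vualto-video-aggregator-web/rest/external/v1')
    token = download_json(
        '%s/tokens' % rest_api_base, data=b'',
        headers={'Content-Type': 'application/json'})['vrtPlayerToken']
    return download_json(
        '%s/videos/%s' % (rest_api_base, video_id),
        query={'vrtPlayerToken': token, 'client': '%s@PROD' % site_id})
```

The legacy endpoint is tried first because it serves more formats; only when it fails is the token-protected aggregator API consulted.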
@@ -134,20 +158,20 @@ class CanvasEenIE(InfoExtractor):
         },
         'skip': 'Pagina niet gevonden',
     }, {
-        'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+        'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
         'info_dict': {
-            'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
-            'display_id': 'herbekijk-sorry-voor-alles',
+            'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
+            'display_id': 'emma-pakt-thilly-aan',
             'ext': 'mp4',
-            'title': 'Herbekijk Sorry voor alles',
-            'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+            'title': 'Emma pakt Thilly aan',
+            'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 3788.06,
+            'duration': 118.24,
         },
         'params': {
             'skip_download': True,
         },
-        'skip': 'Episode no longer available',
+        'expected_warnings': ['is not a supported codec'],
     }, {
         'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
         'only_matching': True,
@@ -183,19 +207,44 @@ class VrtNUIE(GigyaBaseIE):
     IE_DESC = 'VrtNU.be'
     _VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
+        # Available via old API endpoint
         'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
         'info_dict': {
             'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'De zwarte weduwe',
-            'description': 'md5:d90c21dced7db869a85db89a623998d4',
+            'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
             'duration': 1457.04,
             'thumbnail': r're:^https?://.*\.jpg$',
-            'season': '1',
+            'season': 'Season 1',
             'season_number': 1,
             'episode_number': 1,
         },
-        'skip': 'This video is only available for registered users'
+        'skip': 'This video is only available for registered users',
+        'params': {
+            'username': '<snip>',
+            'password': '<snip>',
+        },
+        'expected_warnings': ['is not a supported codec'],
+    }, {
+        # Only available via new API endpoint
+        'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
+        'info_dict': {
+            'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
+            'ext': 'mp4',
+            'title': 'Aflevering 5',
+            'description': 'Wie valt door de mand tijdens een missie?',
+            'duration': 2967.06,
+            'season': 'Season 1',
+            'season_number': 1,
+            'episode_number': 5,
+        },
+        'skip': 'This video is only available for registered users',
+        'params': {
+            'username': '<snip>',
+            'password': '<snip>',
+        },
+        'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
     }]
     _NETRC_MACHINE = 'vrtnu'
     _APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
index 751a3a8f26c94ecb19c130503593515312bac6c7..fd5ec6033b80513012cf2615fc56e80c7e82cadc 100644 (file)
@@ -1,8 +1,10 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import hashlib
 import json
 import re
+from xml.sax.saxutils import escape
 
 from .common import InfoExtractor
 from ..compat import (
@@ -216,6 +218,29 @@ class CBCWatchBaseIE(InfoExtractor):
         'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
     }
     _GEO_COUNTRIES = ['CA']
+    _LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
+    _TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
+    _API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
+    _NETRC_MACHINE = 'cbcwatch'
+
+    def _signature(self, email, password):
+        data = json.dumps({
+            'email': email,
+            'password': password,
+        }).encode()
+        headers = {'content-type': 'application/json'}
+        query = {'apikey': self._API_KEY}
+        resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
+        access_token = resp['access_token']
+
+        # exchange the access token for a JWT signature
+        query = {
+            'access_token': access_token,
+            'apikey': self._API_KEY,
+            'jwtapp': 'jwt',
+        }
+        resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
+        return resp['signature']
 
     def _call_api(self, path, video_id):
         url = path if path.startswith('http') else self._API_BASE_URL + path
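The two-step LoginRadius flow introduced by `_signature` can be sketched outside the extractor like this; `download_json` again stands in for `_download_json`, and the URLs and query parameters mirror the hunk above:

```python
import json

LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'

def loginradius_signature(email, password, api_key, download_json):
    # Step 1: credentials -> access_token
    resp = download_json(
        LOGIN_URL,
        data=json.dumps({'email': email, 'password': password}).encode(),
        headers={'content-type': 'application/json'},
        query={'apikey': api_key})
    # Step 2: access_token -> JWT signature used for the device login
    resp = download_json(
        TOKEN_URL,
        headers={'content-type': 'application/json'},
        query={
            'access_token': resp['access_token'],
            'apikey': api_key,
            'jwtapp': 'jwt',
        })
    return resp['signature']
```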
@@ -239,7 +264,8 @@ def _call_api(self, path, video_id):
     def _real_initialize(self):
         if self._valid_device_token():
             return
-        device = self._downloader.cache.load('cbcwatch', 'device') or {}
+        device = self._downloader.cache.load(
+            'cbcwatch', self._cache_device_key()) or {}
         self._device_id, self._device_token = device.get('id'), device.get('token')
         if self._valid_device_token():
             return
@@ -248,16 +274,30 @@ def _real_initialize(self):
     def _valid_device_token(self):
         return self._device_id and self._device_token
 
+    def _cache_device_key(self):
+        email, _ = self._get_login_info()
+        return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
+
     def _register_device(self):
-        self._device_id = self._device_token = None
         result = self._download_xml(
             self._API_BASE_URL + 'device/register',
             None, 'Acquiring device token',
             data=b'<device><type>web</type></device>')
         self._device_id = xpath_text(result, 'deviceId', fatal=True)
-        self._device_token = xpath_text(result, 'deviceToken', fatal=True)
+        email, password = self._get_login_info()
+        if email and password:
+            signature = self._signature(email, password)
+            data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
+                escape(signature), escape(self._device_id)).encode()
+            url = self._API_BASE_URL + 'device/login'
+            result = self._download_xml(
+                url, None, data=data,
+                headers={'content-type': 'application/xml'})
+            self._device_token = xpath_text(result, 'token', fatal=True)
+        else:
+            self._device_token = xpath_text(result, 'deviceToken', fatal=True)
         self._downloader.cache.store(
-            'cbcwatch', 'device', {
+            'cbcwatch', self._cache_device_key(), {
                 'id': self._device_id,
                 'token': self._device_token,
             })
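The per-account cache key from `_cache_device_key` is small enough to test in isolation; the behaviour below mirrors the hunk (anonymous sessions keep the plain `'device'` key, logged-in sessions get an e-mail-derived key so two accounts never clobber each other's cached token):

```python
import hashlib

def cache_device_key(email):
    # No login info: fall back to the shared anonymous key
    if not email:
        return 'device'
    # Hash the e-mail so the cache key is stable but not the address itself
    return '%s_device' % hashlib.sha256(email.encode()).hexdigest()
```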
index 8ff2c6531570ee3a06210832cf967c7033d6ab9f..2fdcfbb3af1fbffb9e66abff56b86e31762ad449 100644 (file)
@@ -1,20 +1,24 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
 import re
 
 from .common import InfoExtractor
 
 
 class CloudflareStreamIE(InfoExtractor):
+    _DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
+    _EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
+    _ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
-                            (?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
-                            embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
+                            (?:watch\.)?%s/|
+                            %s
                         )
-                        (?P<id>[\da-f]+)
-                    '''
+                        (?P<id>%s)
+                    ''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
     _TESTS = [{
         'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
         'info_dict': {
@@ -41,23 +45,28 @@ def _extract_urls(webpage):
         return [
             mobj.group('url')
             for mobj in re.finditer(
-                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
+                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
                 webpage)]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+        domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
+        base_url = 'https://%s/%s/' % (domain, video_id)
+        if '.' in video_id:
+            video_id = self._parse_json(base64.urlsafe_b64decode(
+                video_id.split('.')[1]), video_id)['sub']
+        manifest_base_url = base_url + 'manifest/video.'
 
         formats = self._extract_m3u8_formats(
-            'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
-            video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
-            fatal=False)
+            manifest_base_url + 'm3u8', video_id, 'mp4',
+            'm3u8_native', m3u8_id='hls', fatal=False)
         formats.extend(self._extract_mpd_formats(
-            'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
-            video_id, mpd_id='dash', fatal=False))
+            manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
         self._sort_formats(formats)
 
         return {
             'id': video_id,
             'title': video_id,
+            'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
             'formats': formats,
         }
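Signed Cloudflare Stream ids (the new `[\w-]+\.[\w-]+\.[\w-]+` branch of `_ID_RE`) are JWTs whose payload carries the raw video id in the `sub` claim, which is what the `base64.urlsafe_b64decode(video_id.split('.')[1])` step above extracts. A standalone sketch; padding is restored here defensively, since JWT segments drop it:

```python
import base64
import json

def video_id_from_token(token):
    # JWT layout: header.payload.signature; only the payload matters here
    payload = token.split('.')[1]
    # JWT base64url segments are unpadded; restore padding before decoding
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))['sub']
```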
index eaae5e484f99311ccf018301f773c87f9c8cf544..a61753b17cd35835474c347c2b438e5f32949d73 100644 (file)
@@ -15,7 +15,7 @@
 import math
 
 from ..compat import (
-    compat_cookiejar,
+    compat_cookiejar_Cookie,
     compat_cookies,
     compat_etree_Element,
     compat_etree_fromstring,
@@ -1182,16 +1182,33 @@ def _twitter_search_player(self, html):
                                       'twitter card player')
 
     def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
-        json_ld = self._search_regex(
-            JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
+        json_ld_list = list(re.finditer(JSON_LD_RE, html))
         default = kwargs.get('default', NO_DEFAULT)
-        if not json_ld:
-            return default if default is not NO_DEFAULT else {}
         # JSON-LD may be malformed and thus `fatal` should be respected.
         # At the same time `default` may be passed that assumes `fatal=False`
         # for _search_regex. Let's simulate the same behavior here as well.
         fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
-        return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+        json_ld = []
+        for mobj in json_ld_list:
+            json_ld_item = self._parse_json(
+                mobj.group('json_ld'), video_id, fatal=fatal)
+            if not json_ld_item:
+                continue
+            if isinstance(json_ld_item, dict):
+                json_ld.append(json_ld_item)
+            elif isinstance(json_ld_item, (list, tuple)):
+                json_ld.extend(json_ld_item)
+        if json_ld:
+            json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+        if json_ld:
+            return json_ld
+        if default is not NO_DEFAULT:
+            return default
+        elif fatal:
+            raise RegexNotFoundError('Unable to extract JSON-LD')
+        else:
+            self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
+            return {}
 
     def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
         if isinstance(json_ld, compat_str):
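The rewritten `_search_json_ld` now collects every JSON-LD block on the page instead of only the first, flattening arrays and skipping malformed blocks. A simplified standalone sketch (the regex here is a reduced stand-in for the real `JSON_LD_RE`):

```python
import json
import re

# Simplified stand-in for youtube-dl's JSON_LD_RE
JSON_LD_RE = (r'(?is)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>'
              r'(?P<json_ld>.+?)</script>')

def collect_json_ld(html):
    items = []
    for mobj in re.finditer(JSON_LD_RE, html):
        try:
            parsed = json.loads(mobj.group('json_ld'))
        except ValueError:
            # Malformed block: skip it rather than abort
            continue
        if isinstance(parsed, dict):
            items.append(parsed)
        elif isinstance(parsed, (list, tuple)):
            # A block may hold an array of objects; flatten it
            items.extend(parsed)
    return items
```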
@@ -1256,10 +1273,10 @@ def extract_video_object(e):
             extract_interaction_statistic(e)
 
         for e in json_ld:
-            if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
+            if '@context' in e:
                 item_type = e.get('@type')
                 if expected_type is not None and expected_type != item_type:
-                    return info
+                    continue
                 if item_type in ('TVEpisode', 'Episode'):
                     episode_name = unescapeHTML(e.get('name'))
                     info.update({
@@ -1293,11 +1310,17 @@ def extract_video_object(e):
                     })
                 elif item_type == 'VideoObject':
                     extract_video_object(e)
-                    continue
+                    if expected_type is None:
+                        continue
+                    else:
+                        break
                 video = e.get('video')
                 if isinstance(video, dict) and video.get('@type') == 'VideoObject':
                     extract_video_object(video)
-                break
+                if expected_type is None:
+                    continue
+                else:
+                    break
         return dict((k, v) for k, v in info.items() if v is not None)
 
     @staticmethod
@@ -2340,6 +2363,8 @@ def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnot
         if res is False:
             return []
         ism_doc, urlh = res
+        if ism_doc is None:
+            return []
 
         return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
 
@@ -2818,7 +2843,7 @@ def _float(self, v, name, fatal=False, **kwargs):
 
     def _set_cookie(self, domain, name, value, expire_time=None, port=None,
                     path='/', secure=False, discard=False, rest={}, **kwargs):
-        cookie = compat_cookiejar.Cookie(
+        cookie = compat_cookiejar_Cookie(
             0, name, value, port, port is not None, domain, True,
             domain.startswith('.'), path, True, secure, expire_time,
             discard, None, None, rest)
index 85a9a577f645395d6edde83e7b72a1f4001561f8..bc2d1fa8b041e3ec1bbc4d6d1b5f055ac31ee140 100644 (file)
@@ -13,6 +13,7 @@
     compat_b64decode,
     compat_etree_Element,
     compat_etree_fromstring,
+    compat_str,
     compat_urllib_parse_urlencode,
     compat_urllib_request,
     compat_urlparse,
@@ -25,9 +26,9 @@
     intlist_to_bytes,
     int_or_none,
     lowercase_escape,
+    merge_dicts,
     remove_end,
     sanitized_Request,
-    unified_strdate,
     urlencode_postdata,
     xpath_text,
 )
@@ -136,6 +137,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
             # rtmp
             'skip_download': True,
         },
+        'skip': 'Video gone',
     }, {
         'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
         'info_dict': {
@@ -157,11 +159,12 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
         'info_dict': {
             'id': '702409',
             'ext': 'mp4',
-            'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant',
-            'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
+            'title': compat_str,
+            'description': compat_str,
             'thumbnail': r're:^https?://.*\.jpg$',
-            'uploader': 'TV TOKYO',
-            'upload_date': '20160508',
+            'uploader': 'Re:Zero Partners',
+            'timestamp': 1462098900,
+            'upload_date': '20160501',
         },
         'params': {
             # m3u8 download
@@ -172,12 +175,13 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
         'info_dict': {
             'id': '727589',
             'ext': 'mp4',
-            'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 – Give Me Deliverance From This Judicial Injustice!",
-            'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
+            'title': compat_str,
+            'description': compat_str,
             'thumbnail': r're:^https?://.*\.jpg$',
             'uploader': 'Kadokawa Pictures Inc.',
-            'upload_date': '20170118',
-            'series': "KONOSUBA -God's blessing on this wonderful world!",
+            'timestamp': 1484130900,
+            'upload_date': '20170111',
+            'series': compat_str,
             'season': "KONOSUBA -God's blessing on this wonderful world! 2",
             'season_number': 2,
             'episode': 'Give Me Deliverance From This Judicial Injustice!',
@@ -200,10 +204,11 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
         'info_dict': {
             'id': '535080',
             'ext': 'mp4',
-            'title': '11eyes Episode 1 – Red Night ~ Piros éjszaka',
-            'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
+            'title': compat_str,
+            'description': compat_str,
             'uploader': 'Marvelous AQL Inc.',
-            'upload_date': '20091021',
+            'timestamp': 1255512600,
+            'upload_date': '20091014',
         },
         'params': {
             # Just test metadata extraction
@@ -224,15 +229,17 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
             # just test metadata extraction
             'skip_download': True,
         },
+        'skip': 'Video gone',
     }, {
         # A video with a vastly different season name compared to the series name
         'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
         'info_dict': {
             'id': '590532',
             'ext': 'mp4',
-            'title': 'Haiyoru! Nyaruani (ONA) Episode 1 – Test',
-            'description': 'Mahiro and Nyaruko talk about official certification.',
+            'title': compat_str,
+            'description': compat_str,
             'uploader': 'TV TOKYO',
+            'timestamp': 1330956000,
             'upload_date': '20120305',
             'series': 'Nyarko-san: Another Crawling Chaos',
             'season': 'Haiyoru! Nyaruani (ONA)',
@@ -442,23 +449,21 @@ def _real_extract(self, url):
             webpage, 'language', default=None, group='lang')
 
         video_title = self._html_search_regex(
-            r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
-            webpage, 'video_title')
+            (r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
+             r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
+            webpage, 'video_title', default=None)
+        if not video_title:
+            video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
         video_title = re.sub(r' {2,}', ' ', video_title)
         video_description = (self._parse_json(self._html_search_regex(
             r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
             webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
         if video_description:
             video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
-        video_upload_date = self._html_search_regex(
-            [r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
-            webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
-        if video_upload_date:
-            video_upload_date = unified_strdate(video_upload_date)
         video_uploader = self._html_search_regex(
             # try looking for both an uploader that's a link and one that's not
             [r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
-            webpage, 'video_uploader', fatal=False)
+            webpage, 'video_uploader', default=False)
 
         formats = []
         for stream in media.get('streams', []):
@@ -611,14 +616,15 @@ def _real_extract(self, url):
             r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
             webpage, 'season number', default=None))
 
-        return {
+        info = self._search_json_ld(webpage, video_id, default={})
+
+        return merge_dicts({
             'id': video_id,
             'title': video_title,
             'description': video_description,
             'duration': duration,
             'thumbnail': thumbnail,
             'uploader': video_uploader,
-            'upload_date': video_upload_date,
             'series': series,
             'season': season,
             'season_number': season_number,
@@ -626,7 +632,7 @@ def _real_extract(self, url):
             'episode_number': episode_number,
             'subtitles': subtitles,
             'formats': formats,
-        }
+        }, info)
 
 
 class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
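The switch to `merge_dicts` above means the scraped fields take precedence and the JSON-LD info only fills gaps. A sketch of the helper's semantics, consistent with youtube-dl's `utils.merge_dicts` (reproduced here in simplified form, with `str` in place of `compat_str`):

```python
def merge_dicts(*dicts):
    merged = {}
    for a_dict in dicts:
        for k, v in a_dict.items():
            if v is None:
                continue
            # Earlier dicts win, except an empty string is overridden
            # by a later non-empty one
            if (k not in merged
                    or (isinstance(v, str) and v
                        and isinstance(merged[k], str)
                        and not merged[k])):
                merged[k] = v
    return merged
```

In the Crunchyroll return value this lets JSON-LD supply `timestamp`/`upload_date` without clobbering the title and description scraped from the page.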
index 327fdb04a71215b8de170314d7080093ced3bbba..b8529050c45c8ee1479e2624ba26aac3f002b9fe 100644 (file)
@@ -32,7 +32,7 @@ def _get_dailymotion_cookies(self):
 
     @staticmethod
     def _get_cookie_value(cookies, name):
-        cookie = cookies.get('name')
+        cookie = cookies.get(name)
         if cookie:
             return cookie.value
 
index 04ff214f727826a60bbdde5ec17bb48ba004a91e..e700f8d86531415da0f1db0f2ccdaef6ea10ac53 100644 (file)
@@ -16,10 +16,11 @@ class DctpTvIE(InfoExtractor):
     _TESTS = [{
         # 4x3
         'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
+        'md5': '3ffbd1556c3fe210724d7088fad723e3',
         'info_dict': {
             'id': '95eaa4f33dad413aa17b4ee613cccc6c',
             'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
-            'ext': 'flv',
+            'ext': 'm4v',
             'title': 'Videoinstallation für eine Kaufhausfassade',
             'description': 'Kurzfilm',
             'thumbnail': r're:^https?://.*\.jpg$',
@@ -27,10 +28,6 @@ class DctpTvIE(InfoExtractor):
             'timestamp': 1302172322,
             'upload_date': '20110407',
         },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
     }, {
         # 16x9
         'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@@ -59,33 +56,26 @@ def _real_extract(self, url):
 
         uuid = media['uuid']
         title = media['title']
-        ratio = '16x9' if media.get('is_wide') else '4x3'
-        play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
-
-        servers = self._download_json(
-            'http://www.dctp.tv/streaming_servers/', display_id,
-            note='Downloading server list JSON', fatal=False)
-
-        if servers:
-            endpoint = next(
-                server['endpoint']
-                for server in servers
-                if url_or_none(server.get('endpoint'))
-                and 'cloudfront' in server['endpoint'])
-        else:
-            endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
-
-        app = self._search_regex(
-            r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
-
-        formats = [{
-            'url': endpoint,
-            'app': app,
-            'play_path': play_path,
-            'page_url': url,
-            'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
-            'ext': 'flv',
-        }]
+        is_wide = media.get('is_wide')
+        formats = []
+
+        def add_formats(suffix):
+            templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
+            formats.extend([{
+                'format_id': 'hls-' + suffix,
+                'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
+                'protocol': 'm3u8_native',
+            }, {
+                'format_id': 's3-' + suffix,
+                'url': templ % 'completed-media.s3.amazonaws.com',
+            }, {
+                'format_id': 'http-' + suffix,
+                'url': templ % 'cdn-media.dctp.tv',
+            }])
+
+        add_formats('0500_' + ('16x9' if is_wide else '4x3'))
+        if is_wide:
+            add_formats('720p')
 
         thumbnails = []
         images = media.get('images')
index 6a2712cc50429b7297a9d4fe9e1ec2d80177986e..e0139cc862d74bc3c20a9d6567747b96b642c730 100644 (file)
@@ -13,8 +13,8 @@
 class DiscoveryIE(DiscoveryGoBaseIE):
     _VALID_URL = r'''(?x)https?://
         (?P<site>
-            (?:(?:www|go)\.)?discovery|
-            (?:www\.)?
+            go\.discovery|
+            www\.
                 (?:
                     investigationdiscovery|
                     discoverylife|
@@ -22,8 +22,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
                     ahctv|
                     destinationamerica|
                     sciencechannel|
                     ahctv|
                     destinationamerica|
                     sciencechannel|
-                    tlc|
-                    velocity
+                    tlc
                 )|
             watch\.
                 (?:
@@ -83,7 +82,7 @@ def _real_extract(self, url):
                     'authRel': 'authorization',
                     'client_id': '3020a40c2356a645b4b4',
                     'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
-                    'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
+                    'redirectUri': 'https://www.discovery.com/',
                 })['access_token']
 
         headers = self.geo_verification_headers()
index c050bf9df3fb7ececed5b3e03a70aea9b2c37417..fe42821c731c711e8f0974fd4ce48f5c9aee8e8f 100644 (file)
@@ -4,7 +4,6 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     encode_base_n,
     ExtractorError,
@@ -55,7 +54,7 @@ def _real_extract(self, url):
 
         webpage, urlh = self._download_webpage_handle(url, display_id)
 
-        video_id = self._match_id(compat_str(urlh.geturl()))
+        video_id = self._match_id(urlh.geturl())
 
         hash = self._search_regex(
             r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')
index 50f69f0b6c153d2be19250b829bf2a28af25cec0..3d3dae7a4d90d00ad2a8de19d09383166db97877 100644 (file)
     BiliBiliBangumiIE,
     BilibiliAudioIE,
     BilibiliAudioAlbumIE,
+    BiliBiliPlayerIE,
 )
 from .biobiochiletv import BioBioChileTVIE
 from .bitchute import (
 from .jove import JoveIE
 from .joj import JojIE
 from .jwplatform import JWPlatformIE
-from .jpopsukitv import JpopsukiIE
 from .kakao import KakaoIE
 from .kaltura import KalturaIE
 from .kanalplay import KanalPlayIE
 from .mlb import MLBIE
 from .mnet import MnetIE
 from .moevideo import MoeVideoIE
-from .mofosex import MofosexIE
+from .mofosex import (
+    MofosexIE,
+    MofosexEmbedIE,
+)
 from .mojvideo import MojvideoIE
 from .morningstar import MorningstarIE
 from .motherless import (
     ORFFM4IE,
     ORFFM4StoryIE,
     ORFOE1IE,
+    ORFOE3IE,
+    ORFNOEIE,
+    ORFWIEIE,
+    ORFBGLIE,
+    ORFOOEIE,
+    ORFSTMIE,
+    ORFKTNIE,
+    ORFSBGIE,
+    ORFTIRIE,
+    ORFVBGIE,
     ORFIPTVIE,
 )
 from .outsidetv import OutsideTVIE
     PacktPubIE,
     PacktPubCourseIE,
 )
-from .pandatv import PandaTVIE
 from .pandoratv import PandoraTVIE
 from .parliamentliveuk import ParliamentLiveUKIE
 from .patreon import PatreonIE
     PolskieRadioIE,
     PolskieRadioCategoryIE,
 )
+from .popcorntimes import PopcorntimesIE
 from .popcorntv import PopcornTVIE
 from .porn91 import Porn91IE
 from .porncom import PornComIE
 from .sbs import SBSIE
 from .screencast import ScreencastIE
 from .screencastomatic import ScreencastOMaticIE
-from .scrippsnetworks import ScrippsNetworksWatchIE
+from .scrippsnetworks import (
+    ScrippsNetworksWatchIE,
+    ScrippsNetworksIE,
+)
 from .scte import (
     SCTEIE,
     SCTECourseIE,
index ce64e26831fdafceb97b6d8ae919c00a78f0f90f..610d6674592384922f9df7af4da5958592ce56bd 100644 (file)
@@ -466,15 +466,18 @@ def _real_extract(self, url):
             return info_dict
 
         if '/posts/' in url:
-            entries = [
-                self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
-                for vid in self._parse_json(
-                    self._search_regex(
-                        r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
-                        webpage, 'video ids', group='ids'),
-                    video_id)]
-
-            return self.playlist_result(entries, video_id)
+            video_id_json = self._search_regex(
+                r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
+                default='')
+            if video_id_json:
+                entries = [
+                    self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
+                    for vid in self._parse_json(video_id_json, video_id)]
+                return self.playlist_result(entries, video_id)
+
+            # Single Video?
+            video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
+            return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
         else:
             _, info_dict = self._extract_from_url(
                 self._VIDEO_PAGE_TEMPLATE % video_id,
index b8fa175880f47d6050e4fa3908994bd9777131ac..306b45fc99a4c3495a233d8fb3c649032641d87a 100644 (file)
@@ -31,7 +31,13 @@ def _real_extract(self, url):
         webpage = self._download_webpage(url, display_id)
 
         video_data = extract_attributes(self._search_regex(
-            r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
+            r'''(?sx)
+                (?:
+                    </h1>|
+                    <div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
+                ).*?
+                (<button[^>]+data-asset-source="[^"]+"[^>]+>)
+            ''',
             webpage, 'video data'))
 
         video_url = video_data['data-asset-source']
index 743ef47dbe2b6a534a65be67dee28e78aa8fe215..355067a509fc197c10a2085bad058d4459729bc9 100644 (file)
@@ -60,6 +60,9 @@
 from .drtuber import DrTuberIE
 from .redtube import RedTubeIE
 from .tube8 import Tube8IE
+from .mofosex import MofosexEmbedIE
+from .spankwire import SpankwireIE
+from .youporn import YouPornIE
 from .vimeo import VimeoIE
 from .dailymotion import DailymotionIE
 from .dailymail import DailyMailIE
@@ -1705,6 +1708,15 @@ class GenericIE(InfoExtractor):
             },
             'add_ie': ['Kaltura'],
         },
+        {
+            # multiple kaltura embeds, nsfw
+            'url': 'https://www.quartier-rouge.be/prive/femmes/kamila-avec-video-jaime-sadomie.html',
+            'info_dict': {
+                'id': 'kamila-avec-video-jaime-sadomie',
+                'title': "Kamila avec vídeo “J'aime sadomie”",
+            },
+            'playlist_count': 8,
+        },
         {
             # Non-standard Vimeo embed
             'url': 'https://openclassrooms.com/courses/understanding-the-web',
@@ -2098,6 +2110,9 @@ class GenericIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Smoky Barbecue Favorites',
                 'thumbnail': r're:^https?://.*\.jpe?g',
+                'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+                'upload_date': '20170909',
+                'timestamp': 1504915200,
             },
             'add_ie': [ZypeIE.ie_key()],
             'params': {
@@ -2284,7 +2299,7 @@ def _real_extract(self, url):
 
         if head_response is not False:
             # Check for redirect
-            new_url = compat_str(head_response.geturl())
+            new_url = head_response.geturl()
             if url != new_url:
                 self.report_following_redirect(new_url)
                 if force_videoid:
@@ -2384,12 +2399,12 @@ def _real_extract(self, url):
                 return self.playlist_result(
                     self._parse_xspf(
                         doc, video_id, xspf_url=url,
-                        xspf_base_url=compat_str(full_response.geturl())),
+                        xspf_base_url=full_response.geturl()),
                     video_id)
             elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
                 info_dict['formats'] = self._parse_mpd_formats(
                     doc,
-                    mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
+                    mpd_base_url=full_response.geturl().rpartition('/')[0],
                     mpd_url=url)
                 self._sort_formats(info_dict['formats'])
                 return info_dict
@@ -2533,15 +2548,21 @@ def _real_extract(self, url):
             return self.playlist_from_matches(
                 dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
 
+        # Look for Teachable embeds, must be before Wistia
+        teachable_url = TeachableIE._extract_url(webpage, url)
+        if teachable_url:
+            return self.url_result(teachable_url)
+
         # Look for embedded Wistia player
-        wistia_url = WistiaIE._extract_url(webpage)
-        if wistia_url:
-            return {
-                '_type': 'url_transparent',
-                'url': self._proto_relative_url(wistia_url),
-                'ie_key': WistiaIE.ie_key(),
-                'uploader': video_uploader,
-            }
+        wistia_urls = WistiaIE._extract_urls(webpage)
+        if wistia_urls:
+            playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
+            for entry in playlist['entries']:
+                entry.update({
+                    '_type': 'url_transparent',
+                    'uploader': video_uploader,
+                })
+            return playlist
 
         # Look for SVT player
         svt_url = SVTIE._extract_url(webpage)
@@ -2706,6 +2727,21 @@ def _real_extract(self, url):
         if tube8_urls:
             return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
 
+        # Look for embedded Mofosex player
+        mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
+        if mofosex_urls:
+            return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
+
+        # Look for embedded Spankwire player
+        spankwire_urls = SpankwireIE._extract_urls(webpage)
+        if spankwire_urls:
+            return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
+
+        # Look for embedded YouPorn player
+        youporn_urls = YouPornIE._extract_urls(webpage)
+        if youporn_urls:
+            return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
+
         # Look for embedded Tvigle player
         mobj = re.search(
             r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -2817,9 +2853,12 @@ def _real_extract(self, url):
             return self.url_result(mobj.group('url'), 'Zapiks')
 
         # Look for Kaltura embeds
-        kaltura_url = KalturaIE._extract_url(webpage)
-        if kaltura_url:
-            return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
+        kaltura_urls = KalturaIE._extract_urls(webpage)
+        if kaltura_urls:
+            return self.playlist_from_matches(
+                kaltura_urls, video_id, video_title,
+                getter=lambda x: smuggle_url(x, {'source_url': url}),
+                ie=KalturaIE.ie_key())
 
         # Look for EaglePlatform embeds
         eagleplatform_url = EaglePlatformIE._extract_url(webpage)
@@ -2960,7 +2999,7 @@ def _real_extract(self, url):
 
         # Look for VODPlatform embeds
         mobj = re.search(
-            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
             webpage)
         if mobj is not None:
             return self.url_result(
@@ -3137,10 +3176,6 @@ def _real_extract(self, url):
             return self.playlist_from_matches(
                 peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
 
-        teachable_url = TeachableIE._extract_url(webpage, url)
-        if teachable_url:
-            return self.url_result(teachable_url)
-
         indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
         if indavideo_urls:
             return self.playlist_from_matches(
index 6a1b1e96ebf4dc59f7f5e13dc18e3ee08ef1110c..c6477958d2766704ade1ba25bc2ed68676655889 100644 (file)
 
 
 class GiantBombIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
+    _TESTS = [{
         'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
-        'md5': 'c8ea694254a59246a42831155dec57ac',
+        'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
         'info_dict': {
             'id': '2300-9782',
             'display_id': 'quick-look-destiny-the-dark-below',
@@ -26,7 +26,10 @@ class GiantBombIE(InfoExtractor):
             'duration': 2399,
             'thumbnail': r're:^https?://.*\.jpg$',
         }
-    }
+    }, {
+        'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
index 0ee8ea712c72e618a4d7544f26c376e94fcaf70d..fae4251034da29dd252557f86528531b8fcb9235 100644 (file)
@@ -1,12 +1,11 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..utils import (
-    js_to_json,
+    int_or_none,
+    merge_dicts,
     remove_end,
-    determine_ext,
+    unified_timestamp,
 )
 
 
@@ -14,15 +13,21 @@ class HellPornoIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
-        'md5': '1fee339c610d2049699ef2aa699439f1',
+        'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
         'info_dict': {
             'id': '149116',
             'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
             'ext': 'mp4',
             'title': 'Dixie is posing with naked ass very erotic',
+            'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
+            'categories': list,
             'thumbnail': r're:https?://.*\.jpg$',
+            'duration': 240,
+            'timestamp': 1398762720,
+            'upload_date': '20140429',
+            'view_count': int,
             'age_limit': 18,
-        }
+        },
     }, {
         'url': 'http://hellporno.net/v/186271/',
         'only_matching': True,
@@ -36,40 +41,36 @@ def _real_extract(self, url):
         title = remove_end(self._html_search_regex(
             r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
 
-        flashvars = self._parse_json(self._search_regex(
-            r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
-            display_id, transform_source=js_to_json)
-
-        video_id = flashvars.get('video_id')
-        thumbnail = flashvars.get('preview_url')
-        ext = determine_ext(flashvars.get('postfix'), 'mp4')
-
-        formats = []
-        for video_url_key in ['video_url', 'video_alt_url']:
-            video_url = flashvars.get(video_url_key)
-            if not video_url:
-                continue
-            video_text = flashvars.get('%s_text' % video_url_key)
-            fmt = {
-                'url': video_url,
-                'ext': ext,
-                'format_id': video_text,
-            }
-            m = re.search(r'^(?P<height>\d+)[pP]', video_text)
-            if m:
-                fmt['height'] = int(m.group('height'))
-            formats.append(fmt)
-        self._sort_formats(formats)
+        info = self._parse_html5_media_entries(url, webpage, display_id)[0]
+        self._sort_formats(info['formats'])
 
-        categories = self._html_search_meta(
-            'keywords', webpage, 'categories', default='').split(',')
+        video_id = self._search_regex(
+            (r'chs_object\s*=\s*["\'](\d+)',
+             r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
+            default=display_id)
+        description = self._search_regex(
+            r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
+            'description', fatal=False)
+        categories = [
+            c.strip()
+            for c in self._html_search_meta(
+                'keywords', webpage, 'categories', default='').split(',')
+            if c.strip()]
+        duration = int_or_none(self._og_search_property(
+            'video:duration', webpage, fatal=False))
+        timestamp = unified_timestamp(self._og_search_property(
+            'video:release_date', webpage, fatal=False))
+        view_count = int_or_none(self._search_regex(
+            r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
 
-        return {
+        return merge_dicts(info, {
             'id': video_id,
             'display_id': display_id,
             'title': title,
-            'thumbnail': thumbnail,
+            'description': description,
             'categories': categories,
+            'duration': duration,
+            'timestamp': timestamp,
+            'view_count': view_count,
             'age_limit': 18,
-            'formats': formats,
-        }
+        })
index 436759da5480da347a578aba6cb388cb01e97448..a31301985b0c7d212886a2e6e495c7d705714041 100644 (file)
@@ -1,5 +1,7 @@
 from __future__ import unicode_literals
 
+import base64
+import json
 import re
 
 from .common import InfoExtractor
@@ -8,6 +10,7 @@
     mimetype2ext,
     parse_duration,
     qualities,
+    try_get,
     url_or_none,
 )
 
 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
 
     _TESTS = [{
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
         'info_dict': {
             'id': '2524815897',
             'ext': 'mp4',
-            'title': 'No. 2 from Ice Age: Continental Drift (2012)',
+            'title': 'No. 2',
             'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
+            'duration': 152,
         }
     }, {
         'url': 'http://www.imdb.com/video/_/vi2524815897',
@@ -47,21 +51,23 @@ class ImdbIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(
-            'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
-        video_metadata = self._parse_json(self._search_regex(
-            r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
-            'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
-        title = self._html_search_meta(
-            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
-            r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
+
+        data = self._download_json(
+            'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
+            query={
+                'key': base64.b64encode(json.dumps({
+                    'type': 'VIDEO_PLAYER',
+                    'subType': 'FORCE_LEGACY',
+                    'id': 'vi%s' % video_id,
+                }).encode()).decode(),
+            })[0]
 
         quality = qualities(('SD', '480p', '720p', '1080p'))
         formats = []
-        for encoding in video_metadata.get('encodings', []):
+        for encoding in data['videoLegacyEncodings']:
             if not encoding or not isinstance(encoding, dict):
                 continue
-            video_url = url_or_none(encoding.get('videoUrl'))
+            video_url = url_or_none(encoding.get('url'))
             if not video_url:
                 continue
             ext = mimetype2ext(encoding.get(
@@ -69,7 +75,7 @@ def _real_extract(self, url):
             if ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
                     video_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
+                    preference=1, m3u8_id='hls', fatal=False))
                 continue
             format_id = encoding.get('definition')
             formats.append({
@@ -80,13 +86,33 @@ def _real_extract(self, url):
             })
         self._sort_formats(formats)
 
+        webpage = self._download_webpage(
+            'https://www.imdb.com/video/vi' + video_id, video_id)
+        video_metadata = self._parse_json(self._search_regex(
+            r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
+            'video metadata'), video_id)
+
+        video_info = video_metadata.get('VIDEO_INFO')
+        if video_info and isinstance(video_info, dict):
+            info = try_get(
+                video_info, lambda x: x[list(video_info.keys())[0]][0], dict)
+        else:
+            info = {}
+
+        title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
+            r'<title>(.+?)</title>', webpage, 'title',
+            default=None) or info['videoTitle']
+
         return {
             'id': video_id,
             'title': title,
+            'alt_title': info.get('videoSubTitle'),
             'formats': formats,
-            'description': video_metadata.get('description'),
-            'thumbnail': video_metadata.get('slate', {}).get('url'),
-            'duration': parse_duration(video_metadata.get('duration')),
+            'description': info.get('videoDescription'),
+            'thumbnail': url_or_none(try_get(
+                video_metadata, lambda x: x['videoSlate']['source'])),
+            'duration': parse_duration(info.get('videoRuntime')),
         }
 
 
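The IMDb change above leans on two helpers from `youtube_dl/utils.py`, `try_get` and `url_or_none`, to probe nested, possibly-missing JSON without raising. A minimal sketch of that pattern, using simplified stand-ins for the real helpers (the actual `try_get` also accepts a tuple of getters; the sample `video_metadata` dict is invented for illustration):

```python
# Simplified stand-ins for youtube_dl.utils.try_get / url_or_none,
# illustrating the defensive-access pattern used in the IMDb change.

def try_get(src, getter, expected_type=None):
    # Apply getter, swallowing the errors raised by missing keys/indices.
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    return v if expected_type is None or isinstance(v, expected_type) else None

def url_or_none(url):
    # Keep only values that look like usable http(s) URLs.
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    return url if url.startswith(('http://', 'https://', '//')) else None

# Hypothetical payload shaped like the args.push(...) JSON the extractor parses.
video_metadata = {
    'VIDEO_INFO': {'vi12345': [{'videoTitle': 'Example', 'videoRuntime': '1:23'}]},
    'videoSlate': {'source': 'https://example.com/slate.jpg'},
}
video_info = video_metadata.get('VIDEO_INFO')
info = try_get(video_info, lambda x: x[list(video_info.keys())[0]][0], dict) or {}
thumbnail = url_or_none(try_get(video_metadata, lambda x: x['videoSlate']['source']))
```

Every lookup degrades to `None` (or `{}`) instead of crashing when the site reshuffles its JSON, which is why the diff prefers `info.get(...)` over direct indexing.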
index 2b5b2b5b0b303aa4c1b6bdb4a6e1226dea11e218..4c16243ec1976676391a5e07a3b35e1140a5ec7a 100644 (file)
@@ -58,7 +58,7 @@ def _real_extract(self, url):
         video_id = self._match_id(url)
 
         video = self._download_json(
-            'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
+            'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
             video_id)['data']
 
         title = video['title']
index 11bbeb5922a9d85e05977196c822a076c8b45ec3..53a550c11e4407813deb12f646a0c714436862b5 100644 (file)
@@ -16,12 +16,22 @@ class IPrimaIE(InfoExtractor):
     _GEO_BYPASS = False
 
     _TESTS = [{
-        'url': 'http://play.iprima.cz/gondici-s-r-o-33',
+        'url': 'https://prima.iprima.cz/particka/92-epizoda',
         'info_dict': {
-            'id': 'p136534',
+            'id': 'p51388',
             'ext': 'mp4',
-            'title': 'Gondíci s. r. o. (34)',
-            'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
+            'title': 'Partička (92)',
+            'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
+        },
+        'params': {
+            'skip_download': True,  # m3u8 download
+        },
+    }, {
+        'url': 'https://cnn.iprima.cz/videa/70-epizoda',
+        'info_dict': {
+            'id': 'p681554',
+            'ext': 'mp4',
+            'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
         },
         'params': {
             'skip_download': True,  # m3u8 download
         },
@@ -68,9 +78,15 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
+        title = self._og_search_title(
+            webpage, default=None) or self._search_regex(
+            r'<h1>([^<]+)', webpage, 'title')
+
         video_id = self._search_regex(
             (r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
-             r'data-product="([^"]+)">'),
+             r'data-product="([^"]+)">',
+             r'id=["\']player-(p\d+)"',
+             r'playerId\s*:\s*["\']player-(p\d+)'),
             webpage, 'real id')
 
         playerpage = self._download_webpage(
@@ -125,8 +141,8 @@ def extract_formats(format_url, format_key=None, lang=None):
 
         return {
             'id': video_id,
-            'title': self._og_search_title(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
             'formats': formats,
-            'description': self._og_search_description(webpage),
+            'description': self._og_search_description(webpage, default=None),
         }
index a502e88066850b14de284e6e3ec7d47ea9397f3d..b5a740a01e1d360ca667869edef88f85e3559842 100644 (file)
@@ -239,7 +239,7 @@ def _extract_entries(self, html, compilation_id):
             self.url_result(
                 'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
             for serie in re.findall(
-                r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
+                r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/jpopsukitv.py b/youtube_dl/extractor/jpopsukitv.py
deleted file mode 100644 (file)
index 4b5f346..0000000
+++ /dev/null
@@ -1,68 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    unified_strdate,
-)
-
-
-class JpopsukiIE(InfoExtractor):
-    IE_NAME = 'jpopsuki.tv'
-    _VALID_URL = r'https?://(?:www\.)?jpopsuki\.tv/(?:category/)?video/[^/]+/(?P<id>\S+)'
-
-    _TEST = {
-        'url': 'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
-        'md5': '88018c0c1a9b1387940e90ec9e7e198e',
-        'info_dict': {
-            'id': '00be659d23b0b40508169cdee4545771',
-            'ext': 'mp4',
-            'title': 'ayumi hamasaki - evolution',
-            'description': 'Release date: 2001.01.31\r\n浜崎あゆみ - evolution',
-            'thumbnail': 'http://www.jpopsuki.tv/cache/89722c74d2a2ebe58bcac65321c115b2.jpg',
-            'uploader': 'plama_chan',
-            'uploader_id': '404',
-            'upload_date': '20121101'
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = 'http://www.jpopsuki.tv' + self._html_search_regex(
-            r'<source src="(.*?)" type', webpage, 'video url')
-
-        video_title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-        uploader = self._html_search_regex(
-            r'<li>from: <a href="/user/view/user/(.*?)/uid/',
-            webpage, 'video uploader', fatal=False)
-        uploader_id = self._html_search_regex(
-            r'<li>from: <a href="/user/view/user/\S*?/uid/(\d*)',
-            webpage, 'video uploader_id', fatal=False)
-        upload_date = unified_strdate(self._html_search_regex(
-            r'<li>uploaded: (.*?)</li>', webpage, 'video upload_date',
-            fatal=False))
-        view_count_str = self._html_search_regex(
-            r'<li>Hits: ([0-9]+?)</li>', webpage, 'video view_count',
-            fatal=False)
-        comment_count_str = self._html_search_regex(
-            r'<h2>([0-9]+?) comments</h2>', webpage, 'video comment_count',
-            fatal=False)
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': video_title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'upload_date': upload_date,
-            'view_count': int_or_none(view_count_str),
-            'comment_count': int_or_none(comment_count_str),
-        }
index 2aabd98b5bbaf36efecdc76415bf52854705bd1e..c34b5f5e6bd9e7d38e762f5d82f3669ac2c438a2 100644 (file)
@@ -4,6 +4,7 @@
 import re
 
 from .common import InfoExtractor
+from ..utils import unsmuggle_url
 
 
 class JWPlatformIE(InfoExtractor):
@@ -32,10 +33,14 @@ def _extract_url(webpage):
     @staticmethod
     def _extract_urls(webpage):
         return re.findall(
-            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
+            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//(?:content\.jwplatform|cdn\.jwplayer)\.com/players/[a-zA-Z0-9]{8})',
             webpage)
 
     def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+        self._initialize_geo_bypass({
+            'countries': smuggled_data.get('geo_countries'),
+        })
         video_id = self._match_id(url)
         json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
         return self._parse_jwplayer_data(json_data, video_id)
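The `unsmuggle_url` call added to `JWPlatformIE._real_extract` is the receiving end of youtube-dl's URL-smuggling convention: a calling extractor packs extra hints (here `geo_countries`) into the URL before handing it off, and the target extractor unpacks them. A simplified sketch of the round trip, assuming the same `#__youtubedl_smuggle=` fragment marker the real helpers in `youtube_dl/utils.py` use:

```python
import json
from urllib.parse import quote, unquote

# Simplified sketch of youtube_dl.utils.smuggle_url / unsmuggle_url: extra data
# rides along in a URL fragment so one extractor can pass hints to another.

def smuggle_url(url, data):
    return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))

def unsmuggle_url(smug_url, default=None):
    if '#__youtubedl_smuggle' not in smug_url:
        return smug_url, default
    url, _, sdata = smug_url.rpartition('#')
    jsond = sdata.partition('=')[2]
    return url, json.loads(unquote(jsond))

# Round trip: a parent page extractor smuggles its geo hint to JWPlatform.
url = smuggle_url('https://cdn.jwplayer.com/players/abcd1234', {'geo_countries': ['NL']})
clean, data = unsmuggle_url(url, {})
```

The fragment never reaches the server, so the smuggled payload is invisible to the HTTP request while surviving the hand-off between extractors.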
index 2d38b758b72a852c6d9718f0537c62e7c215e903..49d13460df7f0edd4d2a08f97deaf831ba9d6a46 100644 (file)
@@ -113,9 +113,14 @@ class KalturaIE(InfoExtractor):
 
     @staticmethod
     def _extract_url(webpage):
+        urls = KalturaIE._extract_urls(webpage)
+        return urls[0] if urls else None
+
+    @staticmethod
+    def _extract_urls(webpage):
         # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
-        mobj = (
-            re.search(
+        finditer = (
+            re.finditer(
                 r"""(?xs)
                     kWidget\.(?:thumb)?[Ee]mbed\(
                     \{.*?
@@ -124,7 +129,7 @@ def _extract_url(webpage):
                         (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
                         (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
                 """, webpage)
-            or re.search(
+            or re.finditer(
                 r'''(?xs)
                     (?P<q1>["'])
                         (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@@ -138,7 +143,7 @@ def _extract_url(webpage):
                     )
                     (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
                 ''', webpage)
-            or re.search(
+            or re.finditer(
                 r'''(?xs)
                     <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
                       (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
@@ -148,7 +153,8 @@ def _extract_url(webpage):
                     (?P=q1)
                 ''', webpage)
         )
-        if mobj:
+        urls = []
+        for mobj in finditer:
             embed_info = mobj.groupdict()
             for k, v in embed_info.items():
                 if v:
@@ -160,7 +166,8 @@ def _extract_url(webpage):
                 webpage)
             if service_mobj:
                 url = smuggle_url(url, {'service_url': service_mobj.group('id')})
-            return url
+            urls.append(url)
+        return urls
 
     def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
         params = actions[0]
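A subtlety worth noting in the Kaltura `re.search` → `re.finditer` refactor above: unlike `re.search`, which returns `None` on no match and so works naturally in an `or` chain, `re.finditer` returns an iterator object that is truthy even when it yields nothing, so an `or` chain of bare `finditer` calls only ever consults the first pattern unless each call is wrapped in `list(...)`. A small illustration against a hypothetical page snippet:

```python
import re

page = '<iframe src="https://example.com/players/a1b2c3d4"></iframe>'

# re.search returns None on no match, so `or` falls through to the next pattern.
mobj = re.search(r'no-such-pattern', page) or re.search(r'players/(\w{8})', page)
first_id = mobj.group(1)

# re.finditer returns an iterator object that is truthy even when empty,
# so the second pattern below is never consulted.
empty_iter = re.finditer(r'no-such-pattern', page)
chained = empty_iter or re.finditer(r'players/(\w{8})', page)
matches = list(chained)  # empty: the first (empty) iterator "won" the `or`

# Wrapping each call in list() restores the fallback behaviour.
hits = (list(re.finditer(r'no-such-pattern', page))
        or list(re.finditer(r'players/(\w{8})', page)))
ids = [m.group(1) for m in hits]
```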
index 6ed7da4abaa7a2a45f924b4bf9f919261a40bec9..1b2dcef46621237fd7c7ce376165a6bc5c674606 100644 (file)
@@ -4,7 +4,6 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     clean_html,
     determine_ext,
@@ -36,7 +35,7 @@ def _login(self):
             self._LOGIN_URL, None, 'Downloading login popup')
 
         def is_logged(url_handle):
-            return self._LOGIN_URL not in compat_str(url_handle.geturl())
+            return self._LOGIN_URL not in url_handle.geturl()
 
         # Already logged in
         if is_logged(urlh):
index b312e77f1abd5d4a05f43c763b7c9f56aefbd0e5..1e3c19dfd65b442ee7b4f3abf1803022c5975d8e 100644 (file)
@@ -2,23 +2,24 @@
 from __future__ import unicode_literals
 
 import re
+import uuid
 
 from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import compat_HTTPError
 from ..utils import (
-    unescapeHTML,
-    parse_duration,
-    get_element_by_class,
+    ExtractorError,
+    int_or_none,
+    qualities,
 )
 
 
 class LEGOIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
+    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
     _TESTS = [{
         'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
         'md5': 'f34468f176cfd76488767fc162c405fa',
         'info_dict': {
-            'id': '55492d823b1b4d5e985787fa8c2973b1',
+            'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
             'ext': 'mp4',
             'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
             'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
@@ -26,103 +27,123 @@ class LEGOIE(InfoExtractor):
     }, {
         # geo-restricted but the contentUrl contain a valid url
         'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
-        'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
+        'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
         'info_dict': {
-            'id': '13bdc2299ab24d9685701a915b3d71e7',
+            'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
             'ext': 'mp4',
-            'title': 'Aflevering 20 - Helden van het koninkrijk',
+            'title': 'Aflevering 20 Helden van het koninkrijk',
             'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
+            'age_limit': 5,
         },
     }, {
-        # special characters in title
-        'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
+        # with subtitle
+        'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
         'info_dict': {
-            'id': '9685ee9d12e84ff38e84b4e3d0db533d',
+            'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
             'ext': 'mp4',
-            'title': 'Force Surprise – LEGO® Star Wars™ Microfighters',
-            'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
+            'title': 'De kleine puppy',
+            'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
+            'age_limit': 1,
+            'subtitles': {
+                'nl': [{
+                    'ext': 'srt',
+                    'url': r're:^https://.+\.srt$',
+                }],
+            },
         },
         'params': {
             'skip_download': True,
         },
     }]
-    _BITRATES = [256, 512, 1024, 1536, 2560]
+    _QUALITIES = {
+        'Lowest': (64, 180, 320),
+        'Low': (64, 270, 480),
+        'Medium': (96, 360, 640),
+        'High': (128, 540, 960),
+        'Highest': (128, 720, 1280),
+    }
 
     def _real_extract(self, url):
         locale, video_id = re.match(self._VALID_URL, url).groups()
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_class('video-header', webpage).strip()
-        progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
-        streaming_base = 'http://legoprod-f.akamaihd.net/'
-        content_url = self._html_search_meta('contentUrl', webpage)
-        path = self._search_regex(
-            r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
-            content_url, 'video path', default=None)
-        if not path:
-            player_url = self._proto_relative_url(self._search_regex(
-                r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
-                webpage, 'player url', default=None))
-            if not player_url:
-                base_url = self._proto_relative_url(self._search_regex(
-                    r'data-baseurl="([^"]+)"', webpage, 'base url',
-                    default='http://www.lego.com/%s/mediaplayer/video/' % locale))
-                player_url = base_url + video_id
-            player_webpage = self._download_webpage(player_url, video_id)
-            video_data = self._parse_json(unescapeHTML(self._search_regex(
-                r"video='([^']+)'", player_webpage, 'video data')), video_id)
-            progressive_base = self._search_regex(
-                r'data-video-progressive-url="([^"]+)"',
-                player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
-            streaming_base = self._search_regex(
-                r'data-video-streaming-url="([^"]+)"',
-                player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
-            item_id = video_data['ItemId']
+        countries = [locale.split('-')[1].upper()]
+        self._initialize_geo_bypass({
+            'countries': countries,
+        })
 
-            net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
-            base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
-            path = '/'.join([net_storage_path, base_path])
-        streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
+        try:
+            item = self._download_json(
+                # https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
+                'https://services.slingshot.lego.com/mediaplayer/v2',
+                video_id, query={
+                    'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
+                }, headers=self.geo_verification_headers())
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
+                self.raise_geo_restricted(countries=countries)
+            raise
 
-        formats = self._extract_akamai_formats(
-            '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
-        m3u8_formats = list(filter(
-            lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
-            formats))
-        if len(m3u8_formats) == len(self._BITRATES):
-            self._sort_formats(m3u8_formats)
-            for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
-                progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
-                mp4_f = m3u8_format.copy()
-                mp4_f.update({
-                    'url': progressive_base_url + 'mp4',
-                    'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
-                    'protocol': 'http',
-                })
-                web_f = {
-                    'url': progressive_base_url + 'webm',
-                    'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
-                    'width': m3u8_format['width'],
-                    'height': m3u8_format['height'],
-                    'tbr': m3u8_format.get('tbr'),
-                    'ext': 'webm',
+        video = item['Video']
+        video_id = video['Id']
+        title = video['Title']
+
+        q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
+        formats = []
+        for video_source in item.get('VideoFormats', []):
+            video_source_url = video_source.get('Url')
+            if not video_source_url:
+                continue
+            video_source_format = video_source.get('Format')
+            if video_source_format == 'F4M':
+                formats.extend(self._extract_f4m_formats(
+                    video_source_url, video_id,
+                    f4m_id=video_source_format, fatal=False))
+            elif video_source_format == 'M3U8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_source_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=video_source_format, fatal=False))
+            else:
+                video_source_quality = video_source.get('Quality')
+                format_id = []
+                for v in (video_source_format, video_source_quality):
+                    if v:
+                        format_id.append(v)
+                f = {
+                    'format_id': '-'.join(format_id),
+                    'quality': q(video_source_quality),
+                    'url': video_source_url,
                 }
-                formats.extend([web_f, mp4_f])
-        else:
-            for bitrate in self._BITRATES:
-                for ext in ('web', 'mp4'):
-                    formats.append({
-                        'format_id': '%s-%s' % (ext, bitrate),
-                        'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
-                        'tbr': bitrate,
-                        'ext': ext,
-                    })
+                quality = self._QUALITIES.get(video_source_quality)
+                if quality:
+                    f.update({
+                        'abr': quality[0],
+                        'height': quality[1],
+                        'width': quality[2],
+                    }),
+                formats.append(f)
         self._sort_formats(formats)
 
+        subtitles = {}
+        sub_file_id = video.get('SubFileId')
+        if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
+            net_storage_path = video.get('NetstoragePath')
+            invariant_id = video.get('InvariantId')
+            video_file_id = video.get('VideoFileId')
+            video_version = video.get('VideoVersion')
+            if net_storage_path and invariant_id and video_file_id and video_version:
+                subtitles.setdefault(locale[:2], []).append({
+                    'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
+                })
+
         return {
             'id': video_id,
             'title': title,
-            'description': self._html_search_meta('description', webpage),
-            'thumbnail': self._html_search_meta('thumbnail', webpage),
-            'duration': parse_duration(self._html_search_meta('duration', webpage)),
+            'description': video.get('Description'),
+            'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
+            'duration': int_or_none(video.get('Length')),
             'formats': formats,
+            'subtitles': subtitles,
+            'age_limit': int_or_none(video.get('AgeFrom')),
+            'season': video.get('SeasonTitle'),
+            'season_number': int_or_none(video.get('Season')) or None,
+            'episode_number': int_or_none(video.get('Episode')) or None,
         }
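The `q = qualities([...])` call in the new LEGO code ranks the service's named `Quality` strings so that `_sort_formats` can order formats worst-to-best. The helper in `youtube_dl/utils.py` is essentially an index lookup over the preference list; a minimal reimplementation of that idea (the sample format list is invented for illustration):

```python
# Minimal reimplementation of the qualities() preference helper used by the
# LEGO extractor: maps a quality name to its index in the preference list,
# with unknown names sorting below everything else.

def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
formats = [{'format_id': fid, 'quality': q(fid)}
           for fid in ('High', 'Lowest', 'Unknown', 'Highest')]
formats.sort(key=lambda f: f['quality'])  # worst first, as _sort_formats expects
order = [f['format_id'] for f in formats]
```

Because unknown names map to -1, formats the preference list does not cover sink to the bottom instead of raising, which suits scraped metadata.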
index 729d8de50fab70cd69bab41fae9db0cba4d7da9b..39f74d2822bc7296df8a5c16e5edfce3298e82ab 100644 (file)
@@ -18,7 +18,6 @@
 
 class LimelightBaseIE(InfoExtractor):
     _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
-    _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
 
     @classmethod
     def _extract_urls(cls, webpage, source_url):
@@ -70,7 +69,8 @@ def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
         try:
             return self._download_json(
                 self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
-                item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
+                item_id, 'Downloading PlaylistService %s JSON' % method,
+                fatal=fatal, headers=headers)
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
                 error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
@@ -79,22 +79,22 @@ def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
                 raise ExtractorError(error, expected=True)
             raise
 
-    def _call_api(self, organization_id, item_id, method):
-        return self._download_json(
-            self._API_URL % (organization_id, self._API_PATH, item_id, method),
-            item_id, 'Downloading API %s JSON' % method)
-
-    def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
+    def _extract(self, item_id, pc_method, mobile_method, referer=None):
         pc = self._call_playlist_service(item_id, pc_method, referer=referer)
-        metadata = self._call_api(pc['orgId'], item_id, meta_method)
-        mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
-        return pc, mobile, metadata
+        mobile = self._call_playlist_service(
+            item_id, mobile_method, fatal=False, referer=referer)
+        return pc, mobile
+
+    def _extract_info(self, pc, mobile, i, referer):
+        get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
+        pc_item = get_item(pc, 'playlistItems')
+        mobile_item = get_item(mobile, 'mediaList')
+        video_id = pc_item.get('mediaId') or mobile_item['mediaId']
+        title = pc_item.get('title') or mobile_item['title']
 
-    def _extract_info(self, streams, mobile_urls, properties):
-        video_id = properties['media_id']
         formats = []
         urls = []
-        for stream in streams:
+        for stream in pc_item.get('streams', []):
             stream_url = stream.get('url')
             if not stream_url or stream.get('drmProtected') or stream_url in urls:
                 continue
@@ -155,7 +155,7 @@ def _extract_info(self, streams, mobile_urls, properties):
                     })
                 formats.append(fmt)
 
-        for mobile_url in mobile_urls:
+        for mobile_url in mobile_item.get('mobileUrls', []):
             media_url = mobile_url.get('mobileUrl')
             format_id = mobile_url.get('targetMediaPlatform')
             if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
@@ -179,54 +179,34 @@ def _extract_info(self, streams, mobile_urls, properties):
 
         self._sort_formats(formats)
 
-        title = properties['title']
-        description = properties.get('description')
-        timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
-        duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
-        filesize = int_or_none(properties.get('total_storage_in_bytes'))
-        categories = [properties.get('category')]
-        tags = properties.get('tags', [])
-        thumbnails = [{
-            'url': thumbnail['url'],
-            'width': int_or_none(thumbnail.get('width')),
-            'height': int_or_none(thumbnail.get('height')),
-        } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
-
         subtitles = {}
-        for caption in properties.get('captions', []):
-            lang = caption.get('language_code')
-            subtitles_url = caption.get('url')
-            if lang and subtitles_url:
-                subtitles.setdefault(lang, []).append({
-                    'url': subtitles_url,
-                })
-        closed_captions_url = properties.get('closed_captions_url')
-        if closed_captions_url:
-            subtitles.setdefault('en', []).append({
-                'url': closed_captions_url,
-                'ext': 'ttml',
-            })
+        for flag in mobile_item.get('flags') or []:
+            if flag == 'ClosedCaptions':
+                closed_captions = self._call_playlist_service(
+                    video_id, 'getClosedCaptionsDetailsByMediaId',
+                    False, referer) or []
+                for cc in closed_captions:
+                    cc_url = cc.get('webvttFileUrl')
+                    if not cc_url:
+                        continue
+                    lang = cc.get('languageCode') or self._search_regex(r'/([a-z]{2})\.vtt', cc_url, 'lang', default='en')
+                    subtitles.setdefault(lang, []).append({
+                        'url': cc_url,
+                    })
+                break
+
+        get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
 
         return {
             'id': video_id,
             'title': title,
-            'description': description,
+            'description': get_meta('description'),
             'formats': formats,
-            'timestamp': timestamp,
-            'duration': duration,
-            'filesize': filesize,
-            'categories': categories,
-            'tags': tags,
-            'thumbnails': thumbnails,
+            'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
+            'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
             'subtitles': subtitles,
         }
 
-    def _extract_info_helper(self, pc, mobile, i, metadata):
-        return self._extract_info(
-            try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
-            try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
-            metadata)
-
 
 class LimelightMediaIE(LimelightBaseIE):
     IE_NAME = 'limelight'
@@ -251,8 +231,6 @@ class LimelightMediaIE(LimelightBaseIE):
             'description': 'md5:8005b944181778e313d95c1237ddb640',
             'thumbnail': r're:^https?://.*\.jpeg$',
             'duration': 144.23,
-            'timestamp': 1244136834,
-            'upload_date': '20090604',
         },
         'params': {
             # m3u8 download
@@ -268,30 +246,29 @@ class LimelightMediaIE(LimelightBaseIE):
             'title': '3Play Media Overview Video',
             'thumbnail': r're:^https?://.*\.jpeg$',
             'duration': 78.101,
-            'timestamp': 1338929955,
-            'upload_date': '20120605',
-            'subtitles': 'mincount:9',
+            # TODO: extract all languages that were accessible via API
+            # 'subtitles': 'mincount:9',
+            'subtitles': 'mincount:1',
         },
     }, {
         'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
         'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'media'
-    _API_PATH = 'media'
 
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
         video_id = self._match_id(url)
+        source_url = smuggled_data.get('source_url')
         self._initialize_geo_bypass({
             'countries': smuggled_data.get('geo_countries'),
         })
 
-        pc, mobile, metadata = self._extract(
+        pc, mobile = self._extract(
             video_id, 'getPlaylistByMediaId',
-            'getMobilePlaylistByMediaId', 'properties',
-            smuggled_data.get('source_url'))
+            'getMobilePlaylistByMediaId', source_url)
 
-        return self._extract_info_helper(pc, mobile, 0, metadata)
+        return self._extract_info(pc, mobile, 0, source_url)
 
 
 class LimelightChannelIE(LimelightBaseIE):
@@ -313,6 +290,7 @@ class LimelightChannelIE(LimelightBaseIE):
         'info_dict': {
             'id': 'ab6a524c379342f9b23642917020c082',
             'title': 'Javascript Sample Code',
+            'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
         },
         'playlist_mincount': 3,
     }, {
@@ -320,22 +298,23 @@ class LimelightChannelIE(LimelightBaseIE):
         'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'channel'
-    _API_PATH = 'channels'
 
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
         channel_id = self._match_id(url)
+        source_url = smuggled_data.get('source_url')
 
-        pc, mobile, medias = self._extract(
+        pc, mobile = self._extract(
             channel_id, 'getPlaylistByChannelId',
             'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
-            'media', smuggled_data.get('source_url'))
+            source_url)
 
         entries = [
-            self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
-            for i in range(len(medias['media_list']))]
+            self._extract_info(pc, mobile, i, source_url)
+            for i in range(len(pc['playlistItems']))]
 
-        return self.playlist_result(entries, channel_id, pc['title'])
+        return self.playlist_result(
+            entries, channel_id, pc.get('title'), mobile.get('description'))
 
 
 class LimelightChannelListIE(LimelightBaseIE):
@@ -368,10 +347,12 @@ class LimelightChannelListIE(LimelightBaseIE):
     def _real_extract(self, url):
         channel_list_id = self._match_id(url)
 
-        channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById')
+        channel_list = self._call_playlist_service(
+            channel_list_id, 'getMobileChannelListById')
 
         entries = [
             self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
             for channel in channel_list['channelList']]
 
-        return self.playlist_result(entries, channel_list_id, channel_list['title'])
+        return self.playlist_result(
+            entries, channel_list_id, channel_list['title'])
index a78c6556e105220a09b66dac94d8fc27780e3dc5..23ca965d977b1ec682101f048684f20f1b70834c 100644 (file)
@@ -8,7 +8,6 @@
 from ..compat import (
     compat_b64decode,
     compat_HTTPError,
-    compat_str,
 )
 from ..utils import (
     ExtractorError,
@@ -99,7 +98,7 @@ def random_string():
             'sso': 'true',
         })
 
-        login_state_url = compat_str(urlh.geturl())
+        login_state_url = urlh.geturl()
 
         try:
             login_page = self._download_webpage(
@@ -129,7 +128,7 @@ def random_string():
             })
 
         access_token = self._search_regex(
-            r'access_token=([^=&]+)', compat_str(urlh.geturl()),
+            r'access_token=([^=&]+)', urlh.geturl(),
             'access token')
 
         self._download_webpage(
index 6b0e64b7f1032159262220dcf77c6ffaa358d014..65cc474db0dbc84e323a1610ab7692f2aca075f0 100644 (file)
@@ -20,10 +20,10 @@ class MailRuIE(InfoExtractor):
     IE_DESC = 'Видео@Mail.Ru'
     _VALID_URL = r'''(?x)
                     https?://
-                        (?:(?:www|m)\.)?my\.mail\.ru/
+                        (?:(?:www|m)\.)?my\.mail\.ru/+
                         (?:
                             video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
-                            (?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
+                            (?:(?P<idv2prefix>(?:[^/]+/+){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
                             (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
                         )
                     '''
@@ -85,6 +85,14 @@ class MailRuIE(InfoExtractor):
         {
             'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
             'only_matching': True,
+        },
+        {
+            'url': 'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
+            'only_matching': True,
+        },
+        {
+            'url': 'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
+            'only_matching': True,
         }
     ]
 
@@ -120,6 +128,12 @@ def _real_extract(self, url):
                 'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
                 video_id, 'Downloading video JSON')
 
+        headers = {}
+
+        video_key = self._get_cookies('https://my.mail.ru').get('video_key')
+        if video_key:
+            headers['Cookie'] = 'video_key=%s' % video_key.value
+
         formats = []
         for f in video_data['videos']:
             video_url = f.get('url')
@@ -132,6 +146,7 @@ def _real_extract(self, url):
                 'url': video_url,
                 'format_id': format_id,
                 'height': height,
+                'http_headers': headers,
             })
         self._sort_formats(formats)
 
@@ -237,7 +252,7 @@ def _extract_track(t, fatal=True):
 class MailRuMusicIE(MailRuMusicSearchBaseIE):
     IE_NAME = 'mailru:music'
     IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/+music/+songs/+[^/?#&]+-(?P<id>[\da-f]+)'
     _TESTS = [{
         'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
         'md5': '0f8c22ef8c5d665b13ac709e63025610',
@@ -273,7 +288,7 @@ def _real_extract(self, url):
 class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
     IE_NAME = 'mailru:music:search'
     IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://my.mail.ru/music/search/black%20shadow',
         'info_dict': {
index e13c2e11a5baf301d32a34d8343776ab249ea821..6f4fd927fa3c5a607cb7caee632f2d7aed2471d5 100644 (file)
@@ -8,7 +8,7 @@
 
 
 class MallTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:(?:www|sk)\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.mall.tv/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
         'md5': '1c4a37f080e1f3023103a7b43458e518',
@@ -26,6 +26,9 @@ class MallTVIE(InfoExtractor):
     }, {
         'url': 'https://www.mall.tv/kdo-to-plati/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
         'only_matching': True,
+    }, {
+        'url': 'https://sk.mall.tv/gejmhaus/reklamacia-nehreje-vyrobnik-tepla-alebo-spekacka',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index f976506f416b9f5461dbfb85f988e2816c43b334..933df14952d5cc16857485e306be07f2d32384d3 100644 (file)
@@ -6,7 +6,6 @@
 from .theplatform import ThePlatformBaseIE
 from ..compat import (
     compat_parse_qs,
-    compat_str,
     compat_urllib_parse_urlparse,
 )
 from ..utils import (
@@ -114,7 +113,7 @@ def _program_guid(qs):
                 continue
             urlh = ie._request_webpage(
                 embed_url, video_id, note='Following embed URL redirect')
-            embed_url = compat_str(urlh.geturl())
+            embed_url = urlh.geturl()
             program_guid = _program_guid(_qs(embed_url))
             if program_guid:
                 entries.append(embed_url)
@@ -123,7 +122,7 @@ def _program_guid(qs):
     def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
         for video in smil.findall(self._xpath_ns('.//video', namespace)):
             video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
-        return super()._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
+        return super(MediasetIE, self)._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
 
     def _real_extract(self, url):
         guid = self._match_id(url)
index 694a264d672288b47c2700b9265bfc0635158ff2..d6eb1574065dece67e28a4b36fa43478dd48dfa3 100644 (file)
@@ -129,7 +129,7 @@ def _real_extract(self, url):
         query = mobj.group('query')
 
         webpage, urlh = self._download_webpage_handle(url, resource_id)  # XXX: add UrlReferrer?
-        redirect_url = compat_str(urlh.geturl())
+        redirect_url = urlh.geturl()
 
         # XXX: might have also extracted UrlReferrer and QueryString from the html
         service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(
index 40f214a873a7dcbb5087726aa270b74a5dab3e76..ad9da96125b92d66d4e6367e23b7ed9a1a633aaf 100644 (file)
@@ -4,8 +4,8 @@
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
+    parse_iso8601,
     smuggle_url,
-    parse_duration,
 )
 
 
@@ -18,16 +18,18 @@ class MiTeleIE(InfoExtractor):
         'info_dict': {
             'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
             'ext': 'mp4',
-            'title': 'Tor, la web invisible',
-            'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'title': 'Diario de La redacción Programa 144',
+            'description': 'md5:07c35a7b11abb05876a6a79185b58d27',
             'series': 'Diario de',
-            'season': 'La redacción',
+            'season': 'Season 14',
             'season_number': 14,
-            'season_id': 'diario_de_t14_11981',
-            'episode': 'Programa 144',
+            'episode': 'Tor, la web invisible',
             'episode_number': 3,
             'thumbnail': r're:(?i)^https?://.*\.jpg$',
             'duration': 2913,
+            'age_limit': 16,
+            'timestamp': 1471209401,
+            'upload_date': '20160814',
         },
         'add_ie': ['Ooyala'],
     }, {
@@ -39,13 +41,15 @@ class MiTeleIE(InfoExtractor):
             'title': 'Cuarto Milenio Temporada 6 Programa 226',
             'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
             'series': 'Cuarto Milenio',
-            'season': 'Temporada 6',
+            'season': 'Season 6',
             'season_number': 6,
-            'season_id': 'cuarto_milenio_t06_12715',
-            'episode': 'Programa 226',
+            'episode': 'Episode 24',
             'episode_number': 24,
             'thumbnail': r're:(?i)^https?://.*\.jpg$',
             'duration': 7313,
+            'age_limit': 12,
+            'timestamp': 1471209021,
+            'upload_date': '20160814',
         },
         'params': {
             'skip_download': True,
@@ -54,67 +58,36 @@ class MiTeleIE(InfoExtractor):
     }, {
         'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
         'only_matching': True,
+    }, {
+        'url': 'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        paths = self._download_json(
-            'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
-            video_id, 'Downloading paths JSON')
-
-        ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
-        base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
-        full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
-        source = self._download_json(
-            '%s://%s%s%s/docs/%s' % (
-                ooyala_s.get('protocol', 'https'), base_url, full_path,
-                ooyala_s.get('provider_id', '104951'), video_id),
-            video_id, 'Downloading data JSON', query={
-                'include_titles': 'Series,Season',
-                'product_name': ooyala_s.get('product_name', 'test'),
-                'format': 'full',
-            })['hits']['hits'][0]['_source']
-
-        embedCode = source['offers'][0]['embed_codes'][0]
-        titles = source['localizable_titles'][0]
-
-        title = titles.get('title_medium') or titles['title_long']
-
-        description = titles.get('summary_long') or titles.get('summary_medium')
-
-        def get(key1, key2):
-            value1 = source.get(key1)
-            if not value1 or not isinstance(value1, list):
-                return
-            if not isinstance(value1[0], dict):
-                return
-            return value1[0].get(key2)
-
-        series = get('localizable_titles_series', 'title_medium')
-
-        season = get('localizable_titles_season', 'title_medium')
-        season_number = int_or_none(source.get('season_number'))
-        season_id = source.get('season_id')
-
-        episode = titles.get('title_sort_name')
-        episode_number = int_or_none(source.get('episode_number'))
-
-        duration = parse_duration(get('videos', 'duration'))
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        pre_player = self._parse_json(self._search_regex(
+            r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
+            webpage, 'Pre Player'), display_id)['prePlayer']
+        title = pre_player['title']
+        video = pre_player['video']
+        video_id = video['dataMediaId']
+        content = pre_player.get('content') or {}
+        info = content.get('info') or {}
 
         return {
             '_type': 'url_transparent',
             # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
+            'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
             'id': video_id,
             'title': title,
-            'description': description,
-            'series': series,
-            'season': season,
-            'season_number': season_number,
-            'season_id': season_id,
-            'episode': episode,
-            'episode_number': episode_number,
-            'duration': duration,
-            'thumbnail': get('images', 'url'),
+            'description': info.get('synopsis'),
+            'series': content.get('title'),
+            'season_number': int_or_none(info.get('season_number')),
+            'episode': content.get('subtitle'),
+            'episode_number': int_or_none(info.get('episode_number')),
+            'duration': int_or_none(info.get('duration')),
+            'thumbnail': video.get('dataPoster'),
+            'age_limit': int_or_none(info.get('rating')),
+            'timestamp': parse_iso8601(pre_player.get('publishedTime')),
         }
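The rewritten mitele extractor stops calling the Ooyala search API and instead reads metadata from a JSON blob the page assigns to `window.$REACTBASE_STATE.prePlayer_mtweb`. A standalone sketch of that extraction with a made-up page snippet (the extractor itself uses `_search_regex`/`_parse_json`):

```python
import json
import re

# Hypothetical page fragment shaped like the diff expects.
webpage = ('<script>window.$REACTBASE_STATE.prePlayer_mtweb = '
           '{"prePlayer": {"title": "Programa 144", '
           '"video": {"dataMediaId": "FhYW1iNTE6"}}};</script>')

# Same pattern as the diff; the greedy {.+} grabs the whole JSON object.
raw = re.search(
    r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
    webpage).group(1)
pre_player = json.loads(raw)['prePlayer']
title = pre_player['title']
video_id = pre_player['video']['dataMediaId']
```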
index 1c652813adb96b994c3ba517db994805e8ea8eb3..5234cac02632d9cddde53ebb10e2a0d91c4ec508 100644 (file)
@@ -1,5 +1,8 @@
 from __future__ import unicode_literals
 
+import re
+
+from .common import InfoExtractor
 from ..utils import (
     int_or_none,
     str_to_int,
@@ -54,3 +57,23 @@ def _real_extract(self, url):
         })
 
         return info
+
+
+class MofosexEmbedIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.mofosex.com/embed/?videoid=318131&referrer=KM',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
+            webpage)
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id),
+            ie=MofosexIE.ie_key(), video_id=video_id)
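The new `MofosexEmbedIE._extract_urls` lets the generic extractor discover embed iframes in arbitrary pages. Running the same regex over a made-up page shows what it collects:

```python
import re

# Hypothetical host page embedding a player iframe.
webpage = ('<iframe src="https://www.mofosex.com/embed/'
           '?videoid=318131&referrer=KM"></iframe>')

# Regex copied from the diff; the lazy .*? lets other query
# parameters precede videoid.
urls = re.findall(
    r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
    webpage)
```

Note the captured group stops at the end of the `videoid` digits, so trailing parameters like `referrer` are dropped from the returned URL.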
index 43fd70f112005a893377c8e5cf489291fb8cc812..b1615b4d8e4bce8b580942f717477e6ed57ee92e 100644 (file)
@@ -26,7 +26,7 @@ class MotherlessIE(InfoExtractor):
             'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
             'upload_date': '20100913',
             'uploader_id': 'famouslyfuckedup',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
             'age_limit': 18,
         }
     }, {
@@ -40,7 +40,7 @@ class MotherlessIE(InfoExtractor):
                            'game', 'hairy'],
             'upload_date': '20140622',
             'uploader_id': 'Sulivana7x',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
             'age_limit': 18,
         },
         'skip': '404',
@@ -54,7 +54,7 @@ class MotherlessIE(InfoExtractor):
             'categories': ['superheroine heroine  superher'],
             'upload_date': '20140827',
             'uploader_id': 'shade0230',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
             'age_limit': 18,
         }
     }, {
@@ -76,7 +76,8 @@ def _real_extract(self, url):
             raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
 
         title = self._html_search_regex(
-            r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
+            (r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
+             r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
         video_url = (self._html_search_regex(
             (r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
              r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
@@ -84,14 +85,15 @@ def _real_extract(self, url):
             or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
         age_limit = self._rta_search(webpage)
         view_count = str_to_int(self._html_search_regex(
-            r'<strong>Views</strong>\s+([^<]+)<',
+            (r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
             webpage, 'view count', fatal=False))
         like_count = str_to_int(self._html_search_regex(
-            r'<strong>Favorited</strong>\s+([^<]+)<',
+            (r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
             webpage, 'like count', fatal=False))
 
         upload_date = self._html_search_regex(
-            r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date')
+            (r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
+             r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
         if 'Ago' in upload_date:
             days = int(re.search(r'([0-9]+)', upload_date).group(1))
             upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')
index bb3d944133d6a1e2685779b86a7565ad9b0985f0..61fc59126f61ef3599f69ac7ad1d920e9bc87817 100644 (file)
@@ -1,68 +1,33 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..utils import (
+    clean_html,
+    dict_get,
     ExtractorError,
     int_or_none,
+    parse_duration,
+    try_get,
     update_url_query,
 )
 
 
-class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
+class NaverBaseIE(InfoExtractor):
+    _CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
 
-    _TESTS = [{
-        'url': 'http://tv.naver.com/v/81652',
-        'info_dict': {
-            'id': '81652',
-            'ext': 'mp4',
-            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
-            'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
-            'upload_date': '20130903',
-        },
-    }, {
-        'url': 'http://tv.naver.com/v/395837',
-        'md5': '638ed4c12012c458fefcddfd01f173cd',
-        'info_dict': {
-            'id': '395837',
-            'ext': 'mp4',
-            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
-            'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
-            'upload_date': '20150519',
-        },
-        'skip': 'Georestricted',
-    }, {
-        'url': 'http://tvcast.naver.com/v/81652',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        vid = self._search_regex(
-            r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'video id', fatal=None, group='value')
-        in_key = self._search_regex(
-            r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'key', default=None, group='value')
-
-        if not vid or not in_key:
-            error = self._html_search_regex(
-                r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
-                webpage, 'error', default=None)
-            if error:
-                raise ExtractorError(error, expected=True)
-            raise ExtractorError('couldn\'t extract vid and key')
+    def _extract_video_info(self, video_id, vid, key):
         video_data = self._download_json(
             'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
             video_id, query={
-                'key': in_key,
+                'key': key,
             })
         meta = video_data['meta']
         title = meta['subject']
         formats = []
+        get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []
 
         def extract_formats(streams, stream_type, query={}):
             for stream in streams:
@@ -73,7 +38,7 @@ def extract_formats(streams, stream_type, query={}):
                 encoding_option = stream.get('encodingOption', {})
                 bitrate = stream.get('bitrate', {})
                 formats.append({
-                    'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
+                    'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
                     'url': stream_url,
                     'width': int_or_none(encoding_option.get('width')),
                     'height': int_or_none(encoding_option.get('height')),
@@ -83,7 +48,7 @@ def extract_formats(streams, stream_type, query={}):
                     'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
                 })
 
-        extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
+        extract_formats(get_list('video'), 'H264')
         for stream_set in video_data.get('streams', []):
             query = {}
             for param in stream_set.get('keys', []):
@@ -101,28 +66,101 @@ def extract_formats(streams, stream_type, query={}):
                     'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
         self._sort_formats(formats)
 
+        replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
+
+        def get_subs(caption_url):
+            if re.search(self._CAPTION_EXT_RE, caption_url):
+                return [{
+                    'url': replace_ext(caption_url, 'ttml'),
+                }, {
+                    'url': replace_ext(caption_url, 'vtt'),
+                }]
+            else:
+                return [{'url': caption_url}]
+
+        automatic_captions = {}
         subtitles = {}
-        for caption in video_data.get('captions', {}).get('list', []):
+        for caption in get_list('caption'):
             caption_url = caption.get('source')
             if not caption_url:
                 continue
-            subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
-                'url': caption_url,
-            })
+            sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
+            sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
 
-        upload_date = self._search_regex(
-            r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
-            webpage, 'upload date', fatal=False)
-        if upload_date:
-            upload_date = upload_date.replace('.', '')
+        user = meta.get('user', {})
 
         return {
             'id': video_id,
             'title': title,
             'formats': formats,
             'subtitles': subtitles,
-            'description': self._og_search_description(webpage),
-            'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
+            'automatic_captions': automatic_captions,
+            'thumbnail': try_get(meta, lambda x: x['cover']['source']),
             'view_count': int_or_none(meta.get('count')),
-            'upload_date': upload_date,
+            'uploader_id': user.get('id'),
+            'uploader': user.get('name'),
+            'uploader_url': user.get('url'),
         }
+
+
+class NaverIE(NaverBaseIE):
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
+    _GEO_BYPASS = False
+    _TESTS = [{
+        'url': 'http://tv.naver.com/v/81652',
+        'info_dict': {
+            'id': '81652',
+            'ext': 'mp4',
+            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
+            'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
+            'timestamp': 1378200754,
+            'upload_date': '20130903',
+            'uploader': '메가스터디, 합격불변의 법칙',
+            'uploader_id': 'megastudy',
+        },
+    }, {
+        'url': 'http://tv.naver.com/v/395837',
+        'md5': '8a38e35354d26a17f73f4e90094febd3',
+        'info_dict': {
+            'id': '395837',
+            'ext': 'mp4',
+            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
+            'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
+            'timestamp': 1432030253,
+            'upload_date': '20150519',
+            'uploader': '4가지쇼 시즌2',
+            'uploader_id': 'wrappinguser29',
+        },
+        'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        content = self._download_json(
+            'https://tv.naver.com/api/json/v/' + video_id,
+            video_id, headers=self.geo_verification_headers())
+        player_info_json = content.get('playerInfoJson') or {}
+        current_clip = player_info_json.get('currentClip') or {}
+
+        vid = current_clip.get('videoId')
+        in_key = current_clip.get('inKey')
+
+        if not vid or not in_key:
+            player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
+            if player_auth == 'notCountry':
+                self.raise_geo_restricted(countries=['KR'])
+            elif player_auth == 'notLogin':
+                self.raise_login_required()
+            raise ExtractorError('couldn\'t extract vid and key')
+        info = self._extract_video_info(video_id, vid, in_key)
+        info.update({
+            'description': clean_html(current_clip.get('description')),
+            'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
+            'duration': parse_duration(current_clip.get('displayPlayTime')),
+            'like_count': int_or_none(current_clip.get('recommendPoint')),
+            'age_limit': 19 if current_clip.get('adult') else None,
+        })
+        return info
index 5bc39d00242c78e7dca8d479dad38b8df8728a0e..6f3cb30034da7f5fcebb99fc6dec05f1ff3cd8e4 100644 (file)
@@ -87,11 +87,25 @@ class NBCIE(AdobePassIE):
     def _real_extract(self, url):
         permalink, video_id = re.match(self._VALID_URL, url).groups()
         permalink = 'http' + compat_urllib_parse_unquote(permalink)
-        response = self._download_json(
+        video_data = self._download_json(
             'https://friendship.nbc.co/v2/graphql', video_id, query={
-                'query': '''{
-  page(name: "%s", platform: web, type: VIDEO, userId: "0") {
-    data {
+                'query': '''query bonanzaPage(
+  $app: NBCUBrands! = nbc
+  $name: String!
+  $oneApp: Boolean
+  $platform: SupportedPlatforms! = web
+  $type: EntityPageType! = VIDEO
+  $userId: String!
+) {
+  bonanzaPage(
+    app: $app
+    name: $name
+    oneApp: $oneApp
+    platform: $platform
+    type: $type
+    userId: $userId
+  ) {
+    metadata {
       ... on VideoPageData {
         description
         episodeNumber
@@ -100,15 +114,20 @@ def _real_extract(self, url):
         mpxAccountId
         mpxGuid
         rating
+        resourceId
         seasonNumber
         secondaryTitle
         seriesShortTitle
       }
     }
   }
-}''' % permalink,
-            })
-        video_data = response['data']['page']['data']
+}''',
+                'variables': json.dumps({
+                    'name': permalink,
+                    'oneApp': True,
+                    'userId': '0',
+                }),
+            })['data']['bonanzaPage']['metadata']
         query = {
             'mbr': 'true',
             'manifest': 'm3u',
@@ -117,8 +136,8 @@ def _real_extract(self, url):
         title = video_data['secondaryTitle']
         if video_data.get('locked'):
             resource = self._get_mvpd_resource(
-                'nbcentertainment', title, video_id,
-                video_data.get('rating'))
+                video_data.get('resourceId') or 'nbcentertainment',
+                title, video_id, video_data.get('rating'))
             query['auth'] = self._extract_mvpd_auth(
                 url, video_id, 'nbcentertainment', resource)
         theplatform_url = smuggle_url(update_url_query(
index aec2ea1331f3c909957e50d4166e7657618fa1a6..2447c812e021e73991082aefab4bd98e6dd000a1 100644 (file)
@@ -7,8 +7,11 @@
 from ..utils import (
     determine_ext,
     int_or_none,
+    merge_dicts,
     parse_iso8601,
     qualities,
+    try_get,
+    urljoin,
 )
 
 
@@ -85,21 +88,25 @@ class NDRIE(NDRBaseIE):
 
     def _extract_embed(self, webpage, display_id):
         embed_url = self._html_search_meta(
-            'embedURL', webpage, 'embed URL', fatal=True)
+            'embedURL', webpage, 'embed URL',
+            default=None) or self._search_regex(
+            r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'embed URL', group='url')
         description = self._search_regex(
             r'<p[^>]+itemprop="description">([^<]+)</p>',
             webpage, 'description', default=None) or self._og_search_description(webpage)
         timestamp = parse_iso8601(
             self._search_regex(
                 r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
-                webpage, 'upload date', fatal=False))
-        return {
+                webpage, 'upload date', default=None))
+        info = self._search_json_ld(webpage, display_id, default={})
+        return merge_dicts({
             '_type': 'url_transparent',
             'url': embed_url,
             'display_id': display_id,
             'description': description,
             'timestamp': timestamp,
-        }
+        }, info)
 
 
 class NJoyIE(NDRBaseIE):
@@ -220,11 +227,17 @@ def _real_extract(self, url):
         upload_date = ppjson.get('config', {}).get('publicationDate')
         duration = int_or_none(config.get('duration'))
 
-        thumbnails = [{
-            'id': thumbnail.get('quality') or thumbnail_id,
-            'url': thumbnail['src'],
-            'preference': quality_key(thumbnail.get('quality')),
-        } for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]
+        thumbnails = []
+        poster = try_get(config, lambda x: x['poster'], dict) or {}
+        for thumbnail_id, thumbnail in poster.items():
+            thumbnail_url = urljoin(url, thumbnail.get('src'))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'id': thumbnail.get('quality') or thumbnail_id,
+                'url': thumbnail_url,
+                'preference': quality_key(thumbnail.get('quality')),
+            })
 
         return {
             'id': video_id,
index 6a2c6cb7bb6d039c56fcf7325de422846c437ab5..de6a707c4265c4fc61a57db117a432a95468ab54 100644 (file)
@@ -6,7 +6,7 @@
 
 
 class NhkVodIE(InfoExtractor):
-    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
+    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'
     # Content available only for a limited period of time. Visit
     # https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
     _TESTS = [{
@@ -30,8 +30,11 @@ class NhkVodIE(InfoExtractor):
     }, {
         'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
         'only_matching': True,
+    }, {
+        'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
+        'only_matching': True,
     }]
-    _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json'
+    _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
 
     def _real_extract(self, url):
         lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
@@ -82,15 +85,9 @@ def get_clean_field(key):
             audio = episode['audio']
             audio_path = audio['audio']
             info['formats'] = self._extract_m3u8_formats(
-                'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
-                episode_id, 'm4a', m3u8_id='hls', fatal=False)
-            for proto in ('rtmpt', 'rtmp'):
-                info['formats'].append({
-                    'ext': 'flv',
-                    'format_id': proto,
-                    'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
-                    'vcodec': 'none',
-                })
+                'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
+                episode_id, 'm4a', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False)
             for f in info['formats']:
                 f['language'] = lang
         return info
index 901f44b54f40c2c02e120c636a80b0b5bfb4ea2e..47b9748f0202d76ddef8ee5f96257bebe88e4169 100644 (file)
@@ -6,6 +6,7 @@
 from .common import InfoExtractor
 from ..utils import (
     clean_html,
+    determine_ext,
     int_or_none,
     js_to_json,
     qualities,
@@ -18,7 +19,7 @@ class NovaEmbedIE(InfoExtractor):
     _VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
-        'md5': 'b3834f6de5401baabf31ed57456463f7',
+        'md5': 'ee009bafcc794541570edd44b71cbea3',
         'info_dict': {
             'id': '8o0n0r',
             'ext': 'mp4',
@@ -33,36 +34,76 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        bitrates = self._parse_json(
+        duration = None
+        formats = []
+
+        player = self._parse_json(
             self._search_regex(
-                r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
-            video_id, transform_source=js_to_json)
+                r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
+                webpage, 'player', default='{}'), video_id, fatal=False)
+        if player:
+            for format_id, format_list in player['tracks'].items():
+                if not isinstance(format_list, list):
+                    format_list = [format_list]
+                for format_dict in format_list:
+                    if not isinstance(format_dict, dict):
+                        continue
+                    format_url = url_or_none(format_dict.get('src'))
+                    format_type = format_dict.get('type')
+                    ext = determine_ext(format_url)
+                    if (format_type == 'application/x-mpegURL'
+                            or format_id == 'HLS' or ext == 'm3u8'):
+                        formats.extend(self._extract_m3u8_formats(
+                            format_url, video_id, 'mp4',
+                            entry_protocol='m3u8_native', m3u8_id='hls',
+                            fatal=False))
+                    elif (format_type == 'application/dash+xml'
+                          or format_id == 'DASH' or ext == 'mpd'):
+                        formats.extend(self._extract_mpd_formats(
+                            format_url, video_id, mpd_id='dash', fatal=False))
+                    else:
+                        formats.append({
+                            'url': format_url,
+                        })
+            duration = int_or_none(player.get('duration'))
+        else:
+            # Old path, not actual as of 08.04.2020
+            bitrates = self._parse_json(
+                self._search_regex(
+                    r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
+                video_id, transform_source=js_to_json)
 
 
-        QUALITIES = ('lq', 'mq', 'hq', 'hd')
-        quality_key = qualities(QUALITIES)
+            QUALITIES = ('lq', 'mq', 'hq', 'hd')
+            quality_key = qualities(QUALITIES)
+
+            for format_id, format_list in bitrates.items():
+                if not isinstance(format_list, list):
+                    format_list = [format_list]
+                for format_url in format_list:
+                    format_url = url_or_none(format_url)
+                    if not format_url:
+                        continue
+                    if format_id == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            format_url, video_id, ext='mp4',
+                            entry_protocol='m3u8_native', m3u8_id='hls',
+                            fatal=False))
+                        continue
+                    f = {
+                        'url': format_url,
+                    }
+                    f_id = format_id
+                    for quality in QUALITIES:
+                        if '%s.mp4' % quality in format_url:
+                            f_id += '-%s' % quality
+                            f.update({
+                                'quality': quality_key(quality),
+                                'format_note': quality.upper(),
+                            })
+                            break
+                    f['format_id'] = f_id
+                    formats.append(f)
 
-        formats = []
-        for format_id, format_list in bitrates.items():
-            if not isinstance(format_list, list):
-                continue
-            for format_url in format_list:
-                format_url = url_or_none(format_url)
-                if not format_url:
-                    continue
-                f = {
-                    'url': format_url,
-                }
-                f_id = format_id
-                for quality in QUALITIES:
-                    if '%s.mp4' % quality in format_url:
-                        f_id += '-%s' % quality
-                        f.update({
-                            'quality': quality_key(quality),
-                            'format_note': quality.upper(),
-                        })
-                        break
-                f['format_id'] = f_id
-                formats.append(f)
         self._sort_formats(formats)
 
         title = self._og_search_title(
@@ -75,7 +116,8 @@ def _real_extract(self, url):
             r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
             'thumbnail', fatal=False, group='value')
         duration = int_or_none(self._search_regex(
-            r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
+            r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
+            default=duration))
 
         return {
             'id': video_id,
@@ -91,7 +133,7 @@ class NovaIE(InfoExtractor):
     _VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
     _TESTS = [{
         'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
-        'md5': '1dd7b9d5ea27bc361f110cd855a19bd3',
+        'md5': '249baab7d0104e186e78b0899c7d5f28',
         'info_dict': {
             'id': '1757139',
             'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci',
@@ -113,7 +155,8 @@ class NovaIE(InfoExtractor):
         'params': {
             # rtmp download
             'skip_download': True,
-        }
+        },
+        'skip': 'gone',
     }, {
         # media.cms.nova.cz embed
         'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
@@ -128,6 +171,7 @@ class NovaIE(InfoExtractor):
             'skip_download': True,
         },
         'add_ie': [NovaEmbedIE.ie_key()],
+        'skip': 'CHYBA 404: STRÁNKA NENALEZENA',
     }, {
         'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
         'only_matching': True,
@@ -152,14 +196,29 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, display_id)
 
+        description = clean_html(self._og_search_description(webpage, default=None))
+        if site == 'novaplus':
+            upload_date = unified_strdate(self._search_regex(
+                r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
+        elif site == 'fanda':
+            upload_date = unified_strdate(self._search_regex(
+                r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
+        else:
+            upload_date = None
+
         # novaplus
         embed_id = self._search_regex(
             r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
             webpage, 'embed url', default=None)
         if embed_id:
-            return self.url_result(
-                'https://media.cms.nova.cz/embed/%s' % embed_id,
-                ie=NovaEmbedIE.ie_key(), video_id=embed_id)
+            return {
+                '_type': 'url_transparent',
+                'url': 'https://media.cms.nova.cz/embed/%s' % embed_id,
+                'ie_key': NovaEmbedIE.ie_key(),
+                'id': embed_id,
+                'description': description,
+                'upload_date': upload_date
+            }
 
         video_id = self._search_regex(
             [r"(?:media|video_id)\s*:\s*'(\d+)'",
 
         self._sort_formats(formats)
 
         title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage)
-        description = clean_html(self._og_search_description(webpage, default=None))
         thumbnail = config.get('poster')
 
-        if site == 'novaplus':
-            upload_date = unified_strdate(self._search_regex(
-                r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
-        elif site == 'fanda':
-            upload_date = unified_strdate(self._search_regex(
-                r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
-        else:
-            upload_date = None
-
         return {
             'id': video_id,
             'display_id': display_id,
index a5e8baa7e2542f4e8d6a8c83dea7bddecf82413d..53acc6e574c0743a223d552d95d8806dc071ed41 100644 (file)
@@ -4,6 +4,7 @@
 from ..utils import (
     int_or_none,
     qualities,
+    url_or_none,
 )
 
 
@@ -48,6 +49,10 @@ class NprIE(InfoExtractor):
             },
         }],
         'expected_warnings': ['Failed to download m3u8 information'],
+    }, {
+        # multimedia, no formats, stream
+        'url': 'https://www.npr.org/2020/02/14/805476846/laura-stevenson-tiny-desk-concert',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -95,6 +100,17 @@ def _real_extract(self, url):
                             'format_id': format_id,
                             'quality': quality(format_id),
                         })
+            for stream_id, stream_entry in media.get('stream', {}).items():
+                if not isinstance(stream_entry, dict):
+                    continue
+                if stream_id != 'hlsUrl':
+                    continue
+                stream_url = url_or_none(stream_entry.get('$text'))
+                if not stream_url:
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    stream_url, stream_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
             self._sort_formats(formats)
 
             entries.append({
index 60933f069c4ca4d38cfb19cad742fd2cb1d1b537..94115534b72ac19f3aaea3e35ef06fa3eaef3d7f 100644 (file)
@@ -12,6 +12,7 @@
     ExtractorError,
     int_or_none,
     JSON_LD_RE,
+    js_to_json,
     NO_DEFAULT,
     parse_age_limit,
     parse_duration,
@@ -105,6 +106,7 @@ def video_id_and_title(idx):
             MESSAGES = {
                 'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
                 'ProgramRightsHasExpired': 'Programmet har gått ut',
+                'NoProgramRights': 'Ikke tilgjengelig',
                 'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
             }
             message_type = data.get('messageType', '')
@@ -255,6 +257,17 @@ class NRKTVIE(NRKBaseIE):
                     ''' % _EPISODE_RE
     _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
     _TESTS = [{
+        'url': 'https://tv.nrk.no/program/MDDP12000117',
+        'md5': '8270824df46ec629b66aeaa5796b36fb',
+        'info_dict': {
+            'id': 'MDDP12000117AA',
+            'ext': 'mp4',
+            'title': 'Alarm Trolltunga',
+            'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
+            'duration': 2223,
+            'age_limit': 6,
+        },
+    }, {
         'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
         'md5': '9a167e54d04671eb6317a37b7bc8a280',
         'info_dict': {
@@ -266,6 +279,7 @@ class NRKTVIE(NRKBaseIE):
             'series': '20 spørsmål',
             'episode': '23.05.2014',
         },
+        'skip': 'NoProgramRights',
     }, {
         'url': 'https://tv.nrk.no/program/mdfp15000514',
         'info_dict': {
@@ -370,7 +384,24 @@ class NRKTVIE(NRKBaseIE):
 
 class NRKTVEpisodeIE(InfoExtractor):
     _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
-    _TEST = {
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
+        'info_dict': {
+            'id': 'MUHH36005220BA',
+            'ext': 'mp4',
+            'title': 'Kro, krig og kjærlighet 2:6',
+            'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
+            'duration': 1563,
+            'series': 'Hellums kro',
+            'season_number': 1,
+            'episode_number': 2,
+            'episode': '2:6',
+            'age_limit': 6,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
         'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
         'info_dict': {
             'id': 'MSUI14000816AA',
@@ -386,7 +417,8 @@ class NRKTVEpisodeIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
-    }
+        'skip': 'ProgramRightsHasExpired',
+    }]
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -409,7 +441,7 @@ def _extract_series(self, webpage, display_id, fatal=True):
                 (r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
                  r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
                 webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
-            display_id, fatal=False)
+            display_id, fatal=False, transform_source=js_to_json)
         if not config:
             return
         return try_get(
@@ -479,6 +511,14 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
     _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
     _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
     _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/blank',
+        'info_dict': {
+            'id': 'blank',
+            'title': 'Blank',
+            'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
+        },
+        'playlist_mincount': 30,
+    }, {
         # new layout, seasons
         'url': 'https://tv.nrk.no/serie/backstage',
         'info_dict': {
@@ -648,7 +688,7 @@ class NRKSkoleIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
-        'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
+        'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
         'info_dict': {
             'id': '6021',
             'ext': 'mp4',
index 2bb77ab249239163d8318a57e8fd0fdb57d2e32a..fc78ca56c90d37b00c1f396aee7c896d54fb91c9 100644 (file)
@@ -69,10 +69,10 @@ def get_file_size(file_size):
                     'width': int_or_none(video.get('width')),
                     'height': int_or_none(video.get('height')),
                     'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
-                    'tbr': int_or_none(video.get('bitrate'), 1000),
+                    'tbr': int_or_none(video.get('bitrate'), 1000) or None,
                     'ext': ext,
                 })
-        self._sort_formats(formats)
+        self._sort_formats(formats, ('height', 'width', 'filesize', 'tbr', 'fps', 'format_id'))
 
         thumbnails = []
         for image in video_data.get('images', []):
index 3425f76024c04cdb937502361480a81e0b958c16..700ce448c4b8faa925aa1dae179f030acaa0b4f6 100644 (file)
@@ -6,12 +6,14 @@
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
+    clean_html,
     determine_ext,
     float_or_none,
     HEADRequest,
     int_or_none,
     orderedSet,
     remove_end,
+    str_or_none,
     strip_jsonp,
     unescapeHTML,
     unified_strdate,
@@ -88,8 +90,11 @@ def _real_extract(self, url):
                 format_id = '-'.join(format_id_list)
                 ext = determine_ext(src)
                 if ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        src, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+                    m3u8_formats = self._extract_m3u8_formats(
+                        src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
+                    if any('/geoprotection' in f['url'] for f in m3u8_formats):
+                        self.raise_geo_restricted()
+                    formats.extend(m3u8_formats)
                 elif ext == 'f4m':
                     formats.extend(self._extract_f4m_formats(
                         src, video_id, f4m_id=format_id, fatal=False))
@@ -157,48 +162,53 @@ def _real_extract(self, url):
 class ORFRadioIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        station = mobj.group('station')
         show_date = mobj.group('date')
         show_id = mobj.group('show')
 
-        if station == 'fm4':
-            show_id = '4%s' % show_id
-
         data = self._download_json(
-            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date),
-            show_id
-        )
-
-        def extract_entry_dict(info, title, subtitle):
-            return {
-                'id': info['loopStreamId'].replace('.mp3', ''),
-                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']),
+            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
+            % (self._API_STATION, show_id, show_date), show_id)
+
+        entries = []
+        for info in data['streams']:
+            loop_stream_id = str_or_none(info.get('loopStreamId'))
+            if not loop_stream_id:
+                continue
+            title = str_or_none(data.get('title'))
+            if not title:
+                continue
+            start = int_or_none(info.get('start'), scale=1000)
+            end = int_or_none(info.get('end'), scale=1000)
+            duration = end - start if end and start else None
+            entries.append({
+                'id': loop_stream_id.replace('.mp3', ''),
+                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
                 'title': title,
-                'description': subtitle,
-                'duration': (info['end'] - info['start']) / 1000,
-                'timestamp': info['start'] / 1000,
+                'description': clean_html(data.get('subtitle')),
+                'duration': duration,
+                'timestamp': start,
                 'ext': 'mp3',
-                'series': data.get('programTitle')
-            }
-
-        entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]
+                'series': data.get('programTitle'),
+            })
 
         return {
             '_type': 'playlist',
             'id': show_id,
-            'title': data['title'],
-            'description': data['subtitle'],
-            'entries': entries
+            'title': data.get('title'),
+            'description': clean_html(data.get('subtitle')),
+            'entries': entries,
         }
 
 
 class ORFFM4IE(ORFRadioIE):
     IE_NAME = 'orf:fm4'
     IE_DESC = 'radio FM4'
-    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
+    _API_STATION = 'fm4'
+    _LOOP_STATION = 'fm4'
 
     _TEST = {
-        'url': 'http://fm4.orf.at/player/20170107/CC',
+        'url': 'http://fm4.orf.at/player/20170107/4CC',
         'md5': '2b0be47375432a7ef104453432a19212',
         'info_dict': {
             'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
@@ -209,7 +219,138 @@ class ORFFM4IE(ORFRadioIE):
             'timestamp': 1483819257,
             'upload_date': '20170107',
         },
-        'skip': 'Shows from ORF radios are only available for 7 days.'
+        'skip': 'Shows from ORF radios are only available for 7 days.',
+        'only_matching': True,
+    }
+
+
+class ORFNOEIE(ORFRadioIE):
+    IE_NAME = 'orf:noe'
+    IE_DESC = 'Radio Niederösterreich'
+    _VALID_URL = r'https?://(?P<station>noe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'noe'
+    _LOOP_STATION = 'oe2n'
+
+    _TEST = {
+        'url': 'https://noe.orf.at/player/20200423/NGM',
+        'only_matching': True,
+    }
+
+
+class ORFWIEIE(ORFRadioIE):
+    IE_NAME = 'orf:wien'
+    IE_DESC = 'Radio Wien'
+    _VALID_URL = r'https?://(?P<station>wien)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'wie'
+    _LOOP_STATION = 'oe2w'
+
+    _TEST = {
+        'url': 'https://wien.orf.at/player/20200423/WGUM',
+        'only_matching': True,
+    }
+
+
+class ORFBGLIE(ORFRadioIE):
+    IE_NAME = 'orf:burgenland'
+    IE_DESC = 'Radio Burgenland'
+    _VALID_URL = r'https?://(?P<station>burgenland)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'bgl'
+    _LOOP_STATION = 'oe2b'
+
+    _TEST = {
+        'url': 'https://burgenland.orf.at/player/20200423/BGM',
+        'only_matching': True,
+    }
+
+
+class ORFOOEIE(ORFRadioIE):
+    IE_NAME = 'orf:oberoesterreich'
+    IE_DESC = 'Radio Oberösterreich'
+    _VALID_URL = r'https?://(?P<station>ooe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'ooe'
+    _LOOP_STATION = 'oe2o'
+
+    _TEST = {
+        'url': 'https://ooe.orf.at/player/20200423/OGMO',
+        'only_matching': True,
+    }
+
+
+class ORFSTMIE(ORFRadioIE):
+    IE_NAME = 'orf:steiermark'
+    IE_DESC = 'Radio Steiermark'
+    _VALID_URL = r'https?://(?P<station>steiermark)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'stm'
+    _LOOP_STATION = 'oe2st'
+
+    _TEST = {
+        'url': 'https://steiermark.orf.at/player/20200423/STGMS',
+        'only_matching': True,
+    }
+
+
+class ORFKTNIE(ORFRadioIE):
+    IE_NAME = 'orf:kaernten'
+    IE_DESC = 'Radio Kärnten'
+    _VALID_URL = r'https?://(?P<station>kaernten)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'ktn'
+    _LOOP_STATION = 'oe2k'
+
+    _TEST = {
+        'url': 'https://kaernten.orf.at/player/20200423/KGUMO',
+        'only_matching': True,
+    }
+
+
+class ORFSBGIE(ORFRadioIE):
+    IE_NAME = 'orf:salzburg'
+    IE_DESC = 'Radio Salzburg'
+    _VALID_URL = r'https?://(?P<station>salzburg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'sbg'
+    _LOOP_STATION = 'oe2s'
+
+    _TEST = {
+        'url': 'https://salzburg.orf.at/player/20200423/SGUM',
+        'only_matching': True,
+    }
+
+
+class ORFTIRIE(ORFRadioIE):
+    IE_NAME = 'orf:tirol'
+    IE_DESC = 'Radio Tirol'
+    _VALID_URL = r'https?://(?P<station>tirol)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'tir'
+    _LOOP_STATION = 'oe2t'
+
+    _TEST = {
+        'url': 'https://tirol.orf.at/player/20200423/TGUMO',
+        'only_matching': True,
+    }
+
+
+class ORFVBGIE(ORFRadioIE):
+    IE_NAME = 'orf:vorarlberg'
+    IE_DESC = 'Radio Vorarlberg'
+    _VALID_URL = r'https?://(?P<station>vorarlberg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'vbg'
+    _LOOP_STATION = 'oe2v'
+
+    _TEST = {
+        'url': 'https://vorarlberg.orf.at/player/20200423/VGUM',
+        'only_matching': True,
+    }
+
+
+class ORFOE3IE(ORFRadioIE):
+    IE_NAME = 'orf:oe3'
+    IE_DESC = 'Radio Österreich 3'
+    _VALID_URL = r'https?://(?P<station>oe3)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'oe3'
+    _LOOP_STATION = 'oe3'
+
+    _TEST = {
+        'url': 'https://oe3.orf.at/player/20200424/3WEK',
+        'only_matching': True,
     }
 
 
@@ -217,6 +358,8 @@ class ORFOE1IE(ORFRadioIE):
     IE_NAME = 'orf:oe1'
     IE_DESC = 'Radio Österreich 1'
     _VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'oe1'
+    _LOOP_STATION = 'oe1'
 
     _TEST = {
         'url': 'http://oe1.orf.at/player/20170108/456544',
diff --git a/youtube_dl/extractor/pandatv.py b/youtube_dl/extractor/pandatv.py
deleted file mode 100644 (file)
index 4219802..0000000
+++ /dev/null
@@ -1,99 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    qualities,
-)
-
-
-class PandaTVIE(InfoExtractor):
-    IE_DESC = '熊猫TV'
-    _VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
-    _TESTS = [{
-        'url': 'http://www.panda.tv/66666',
-        'info_dict': {
-            'id': '66666',
-            'title': 're:.+',
-            'uploader': '刘杀鸡',
-            'ext': 'flv',
-            'is_live': True,
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'skip': 'Live stream is offline',
-    }, {
-        'url': 'https://www.panda.tv/66666',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        config = self._download_json(
-            'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
-
-        error_code = config.get('errno', 0)
-        if error_code != 0:
-            raise ExtractorError(
-                '%s returned error %s: %s'
-                % (self.IE_NAME, error_code, config['errmsg']),
-                expected=True)
-
-        data = config['data']
-        video_info = data['videoinfo']
-
-        # 2 = live, 3 = offline
-        if video_info.get('status') != '2':
-            raise ExtractorError(
-                'Live stream is offline', expected=True)
-
-        title = data['roominfo']['name']
-        uploader = data.get('hostinfo', {}).get('name')
-        room_key = video_info['room_key']
-        stream_addr = video_info.get(
-            'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
-
-        # Reverse engineered from web player swf
-        # (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
-        # writing).
-        plflag0, plflag1 = video_info['plflag'].split('_')
-        plflag0 = int(plflag0) - 1
-        if plflag1 == '21':
-            plflag0 = 10
-            plflag1 = '4'
-        live_panda = 'live_panda' if plflag0 < 1 else ''
-
-        plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
-        sign = plflag_auth['auth']['sign']
-        ts = plflag_auth['auth']['time']
-        rid = plflag_auth['auth']['rid']
-
-        quality_key = qualities(['OD', 'HD', 'SD'])
-        suffix = ['_small', '_mid', '']
-        formats = []
-        for k, v in stream_addr.items():
-            if v != '1':
-                continue
-            quality = quality_key(k)
-            if quality <= 0:
-                continue
-            for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
-                formats.append({
-                    'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
-                    % (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
-                    'format_id': '%s-%s' % (k, ext),
-                    'quality': quality,
-                    'source_preference': pref,
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': self._live_title(title),
-            'uploader': uploader,
-            'formats': formats,
-            'is_live': True,
-        }
index d3a83ea2bb5215e34e72ad85bf99867697e2e1b2..48fb9541693c35878317f22ed9dd6e2da4412ced 100644 (file)
@@ -8,6 +8,7 @@
 from ..utils import (
     int_or_none,
     parse_resolution,
+    str_or_none,
     try_get,
     unified_timestamp,
     url_or_none,
@@ -415,6 +416,7 @@ class PeerTubeIE(InfoExtractor):
                             peertube\.cpy\.re
                         )'''
     _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
+    _API_BASE = 'https://%s/api/v1/videos/%s/%s'
     _VALID_URL = r'''(?x)
                     (?:
                         peertube:(?P<host>[^:]+):|
@@ -423,26 +425,30 @@ class PeerTubeIE(InfoExtractor):
                     (?P<id>%s)
                     ''' % (_INSTANCES_RE, _UUID_RE)
     _TESTS = [{
-        'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
-        'md5': '80f24ff364cc9d333529506a263e7feb',
+        'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
+        'md5': '9bed8c0137913e17b86334e5885aacff',
         'info_dict': {
-            'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
+            'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
             'ext': 'mp4',
-            'title': 'wow',
-            'description': 'wow such video, so gif',
+            'title': 'What is PeerTube?',
+            'description': 'md5:3fefb8dde2b189186ce0719fda6f7b10',
             'thumbnail': r're:https?://.*\.(?:jpg|png)',
-            'timestamp': 1519297480,
-            'upload_date': '20180222',
-            'uploader': 'Luclu7',
-            'uploader_id': '7fc42640-efdb-4505-a45d-a15b1a5496f1',
-            'uploder_url': 'https://peertube.nsa.ovh/accounts/luclu7',
-            'license': 'Unknown',
-            'duration': 3,
+            'timestamp': 1538391166,
+            'upload_date': '20181001',
+            'uploader': 'Framasoft',
+            'uploader_id': '3',
+            'uploader_url': 'https://framatube.org/accounts/framasoft',
+            'channel': 'Les vidéos de Framasoft',
+            'channel_id': '2',
+            'channel_url': 'https://framatube.org/video-channels/bf54d359-cfad-4935-9d45-9d6be93f63e8',
+            'language': 'en',
+            'license': 'Attribution - Share Alike',
+            'duration': 113,
             'view_count': int,
             'like_count': int,
             'dislike_count': int,
-            'tags': list,
-            'categories': list,
+            'tags': ['framasoft', 'peertube'],
+            'categories': ['Science & Technology'],
         }
     }, {
         'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
@@ -484,13 +490,38 @@ def _extract_urls(webpage, source_url):
                 entries = [peertube_url]
         return entries
 
+    def _call_api(self, host, video_id, path, note=None, errnote=None, fatal=True):
+        return self._download_json(
+            self._API_BASE % (host, video_id, path), video_id,
+            note=note, errnote=errnote, fatal=fatal)
+
+    def _get_subtitles(self, host, video_id):
+        captions = self._call_api(
+            host, video_id, 'captions', note='Downloading captions JSON',
+            fatal=False)
+        if not isinstance(captions, dict):
+            return
+        data = captions.get('data')
+        if not isinstance(data, list):
+            return
+        subtitles = {}
+        for e in data:
+            language_id = try_get(e, lambda x: x['language']['id'], compat_str)
+            caption_url = urljoin('https://%s' % host, e.get('captionPath'))
+            if not caption_url:
+                continue
+            subtitles.setdefault(language_id or 'en', []).append({
+                'url': caption_url,
+            })
+        return subtitles
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         host = mobj.group('host') or mobj.group('host_2')
         video_id = mobj.group('id')
 
-        video = self._download_json(
-            'https://%s/api/v1/videos/%s' % (host, video_id), video_id)
+        video = self._call_api(
+            host, video_id, '', note='Downloading video JSON')
 
         title = video['name']
 
 
         title = video['name']
 
@@ -513,10 +544,28 @@ def _real_extract(self, url):
             formats.append(f)
         self._sort_formats(formats)
 
-        def account_data(field):
-            return try_get(video, lambda x: x['account'][field], compat_str)
+        full_description = self._call_api(
+            host, video_id, 'description', note='Downloading description JSON',
+            fatal=False)
+
+        description = None
+        if isinstance(full_description, dict):
+            description = str_or_none(full_description.get('description'))
+        if not description:
+            description = video.get('description')
+
+        subtitles = self.extract_subtitles(host, video_id)
+
+        def data(section, field, type_):
+            return try_get(video, lambda x: x[section][field], type_)
+
+        def account_data(field, type_):
+            return data('account', field, type_)
+
+        def channel_data(field, type_):
+            return data('channel', field, type_)
 
-        category = try_get(video, lambda x: x['category']['label'], compat_str)
+        category = data('category', 'label', compat_str)
         categories = [category] if category else None
 
         nsfw = video.get('nsfw')
@@ -528,14 +577,17 @@ def account_data(field):
         return {
             'id': video_id,
             'title': title,
-            'description': video.get('description'),
+            'description': description,
             'thumbnail': urljoin(url, video.get('thumbnailPath')),
             'timestamp': unified_timestamp(video.get('publishedAt')),
-            'uploader': account_data('displayName'),
-            'uploader_id': account_data('uuid'),
-            'uploder_url': account_data('url'),
-            'license': try_get(
-                video, lambda x: x['licence']['label'], compat_str),
+            'uploader': account_data('displayName', compat_str),
+            'uploader_id': str_or_none(account_data('id', int)),
+            'uploader_url': url_or_none(account_data('url', compat_str)),
+            'channel': channel_data('displayName', compat_str),
+            'channel_id': str_or_none(channel_data('id', int)),
+            'channel_url': url_or_none(channel_data('url', compat_str)),
+            'language': data('language', 'id', compat_str),
+            'license': data('licence', 'label', compat_str),
             'duration': int_or_none(video.get('duration')),
             'view_count': int_or_none(video.get('views')),
             'like_count': int_or_none(video.get('likes')),
@@ -544,4 +596,5 @@ def account_data(field):
             'tags': try_get(video, lambda x: x['tags'], list),
             'categories': categories,
             'formats': formats,
+            'subtitles': subtitles
         }
index c02e34abac8720361f94b7085e5f7fb3814df312..b15906390d07715494b5653dce5499ca0ad72141 100644 (file)
@@ -18,7 +18,7 @@ def _call_api(self, method, query, item_id):
             item_id, query=query)
 
     def _parse_broadcast_data(self, broadcast, video_id):
-        title = broadcast['status']
+        title = broadcast.get('status') or 'Periscope Broadcast'
         uploader = broadcast.get('user_display_name') or broadcast.get('username')
         title = '%s - %s' % (uploader, title) if uploader else title
         is_live = broadcast.get('state').lower() == 'running'
index 602207bebdd6a01d7f33dbf08302ab5a75ccf207..23c8256b59dab4a92ae79ef48dc8e3b0adf0ff68 100644 (file)
@@ -46,7 +46,7 @@ def _login(self):
             headers={'Referer': self._LOGIN_URL})
 
         # login succeeded
-        if 'platzi.com/login' not in compat_str(urlh.geturl()):
+        if 'platzi.com/login' not in urlh.geturl():
             return
 
         login_error = self._webpage_read_content(
index dd5f17f1192c3543636f6ff24624b0c9cc9a0bd6..80222d42831b8e58116449f96a615ea250b2985f 100644 (file)
@@ -20,20 +20,16 @@ class PokemonIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The Ol’ Raise and Switch!',
             'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
-            'timestamp': 1511824728,
-            'upload_date': '20171127',
         },
         'add_id': ['LimelightMedia'],
     }, {
         # no data-video-title
-        'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008',
+        'url': 'https://www.pokemon.com/fr/episodes-pokemon/films-pokemon/pokemon-lascension-de-darkrai-2008',
         'info_dict': {
-            'id': '99f3bae270bf4e5097274817239ce9c8',
+            'id': 'dfbaf830d7e54e179837c50c0c6cc0e1',
             'ext': 'mp4',
-            'title': 'Pokémon: The Rise of Darkrai',
-            'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d',
-            'timestamp': 1417778347,
-            'upload_date': '20141205',
+            'title': "Pokémon : L'ascension de Darkrai",
+            'description': 'md5:d1dbc9e206070c3e14a06ff557659fb5',
         },
         'add_id': ['LimelightMedia'],
         'params': {
diff --git a/youtube_dl/extractor/popcorntimes.py b/youtube_dl/extractor/popcorntimes.py
new file mode 100644 (file)
index 0000000..7bf7f98
--- /dev/null
@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_b64decode,
+    compat_chr,
+)
+from ..utils import int_or_none
+
+
+class PopcorntimesIE(InfoExtractor):
+    _VALID_URL = r'https?://popcorntimes\.tv/[^/]+/m/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://popcorntimes.tv/de/m/A1XCFvz/haensel-und-gretel-opera-fantasy',
+        'md5': '93f210991ad94ba8c3485950a2453257',
+        'info_dict': {
+            'id': 'A1XCFvz',
+            'display_id': 'haensel-und-gretel-opera-fantasy',
+            'ext': 'mp4',
+            'title': 'Hänsel und Gretel',
+            'description': 'md5:1b8146791726342e7b22ce8125cf6945',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'creator': 'John Paul',
+            'release_date': '19541009',
+            'duration': 4260,
+            'tbr': 5380,
+            'width': 720,
+            'height': 540,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id, display_id = mobj.group('id', 'display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._search_regex(
+            r'<h1>([^<]+)', webpage, 'title',
+            default=None) or self._html_search_meta(
+            'ya:ovs:original_name', webpage, 'title', fatal=True)
+
+        loc = self._search_regex(
+            r'PCTMLOC\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, 'loc',
+            group='value')
+
+        loc_b64 = ''
+        for c in loc:
+            c_ord = ord(c)
+            if ord('a') <= c_ord <= ord('z') or ord('A') <= c_ord <= ord('Z'):
+                upper = ord('Z') if c_ord <= ord('Z') else ord('z')
+                c_ord += 13
+                if upper < c_ord:
+                    c_ord -= 26
+            loc_b64 += compat_chr(c_ord)
+
+        video_url = compat_b64decode(loc_b64).decode('utf-8')
+
+        description = self._html_search_regex(
+            r'(?s)<div[^>]+class=["\']pt-movie-desc[^>]+>(.+?)</div>', webpage,
+            'description', fatal=False)
+
+        thumbnail = self._search_regex(
+            r'<img[^>]+class=["\']video-preview[^>]+\bsrc=(["\'])(?P<value>(?:(?!\1).)+)\1',
+            webpage, 'thumbnail', default=None,
+            group='value') or self._og_search_thumbnail(webpage)
+
+        creator = self._html_search_meta(
+            'video:director', webpage, 'creator', default=None)
+
+        release_date = self._html_search_meta(
+            'video:release_date', webpage, default=None)
+        if release_date:
+            release_date = release_date.replace('-', '')
+
+        def int_meta(name):
+            return int_or_none(self._html_search_meta(
+                name, webpage, default=None))
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'creator': creator,
+            'release_date': release_date,
+            'duration': int_meta('video:duration'),
+            'tbr': int_meta('ya:ovs:bitrate'),
+            'width': int_meta('og:video:width'),
+            'height': int_meta('og:video:height'),
+            'http_headers': {
+                'Referer': url,
+            },
+        }
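The PCTMLOC deobfuscation implemented character-by-character in the new extractor above is plain ROT13 followed by base64 decoding. A minimal standalone sketch of the same transform (the helper name and sample value are illustrative, not from the site):

```python
import base64
import codecs

def decode_pctmloc(loc):
    # The extractor's loop shifts letters by 13 positions, which is
    # exactly ROT13; Python's codecs module provides the same
    # transform. The result is a standard base64 string that decodes
    # to the media URL.
    return base64.b64decode(codecs.decode(loc, 'rot13')).decode('utf-8')
```

Round-tripping a sample URL through base64-then-ROT13 and back recovers the original string, which is all the page-side obfuscation amounts to.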
index 27d65d4b9cdcf1e068b0d6971502eaa0caccf895..c6052ac9f966f332d0cfb1f7acfe68b0a143d2b7 100644 (file)
@@ -8,6 +8,7 @@
     ExtractorError,
     int_or_none,
     js_to_json,
+    merge_dicts,
     urljoin,
 )
 
@@ -27,23 +28,22 @@ class PornHdIE(InfoExtractor):
             'view_count': int,
             'like_count': int,
             'age_limit': 18,
-        }
+        },
+        'skip': 'HTTP Error 404: Not Found',
     }, {
-        # removed video
         'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
-        'md5': '956b8ca569f7f4d8ec563e2c41598441',
+        'md5': '1b7b3a40b9d65a8e5b25f7ab9ee6d6de',
         'info_dict': {
             'id': '1962',
             'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
             'ext': 'mp4',
-            'title': 'Sierra loves doing laundry',
+            'title': 'md5:98c6f8b2d9c229d0f0fde47f61a1a759',
             'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
             'thumbnail': r're:^https?://.*\.jpg',
             'view_count': int,
             'like_count': int,
             'age_limit': 18,
         },
-        'skip': 'Not available anymore',
     }]
 
     def _real_extract(self, url):
@@ -61,7 +61,13 @@ def _real_extract(self, url):
             r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
             webpage, 'sources', default='{}')), video_id)
 
             r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
             webpage, 'sources', default='{}')), video_id)
 
+        info = {}
         if not sources:
+            entries = self._parse_html5_media_entries(url, webpage, video_id)
+            if entries:
+                info = entries[0]
+
+        if not sources and not info:
             message = self._html_search_regex(
                 r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
                 webpage, 'error message', group='value')
@@ -80,23 +86,29 @@ def _real_extract(self, url):
                 'format_id': format_id,
                 'height': height,
             })
-        self._sort_formats(formats)
+        if formats:
+            info['formats'] = formats
+        self._sort_formats(info['formats'])
 
         description = self._html_search_regex(
-            r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
-            webpage, 'description', fatal=False, group='value')
+            (r'(?s)<section[^>]+class=["\']video-description[^>]+>(?P<value>.+?)</section>',
+             r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1'),
+            webpage, 'description', fatal=False,
+            group='value') or self._html_search_meta(
+            'description', webpage, default=None) or self._og_search_description(webpage)
         view_count = int_or_none(self._html_search_regex(
             r'(\d+) views\s*<', webpage, 'view count', fatal=False))
         thumbnail = self._search_regex(
             r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage,
-            'thumbnail', fatal=False, group='url')
+            'thumbnail', default=None, group='url')
 
         like_count = int_or_none(self._search_regex(
-            (r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
+            (r'(\d+)</span>\s*likes',
+             r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
              r'class=["\']save-count["\'][^>]*>\s*(\d+)'),
             webpage, 'like count', fatal=False))
 
-        return {
+        return merge_dicts(info, {
             'id': video_id,
             'display_id': display_id,
             'title': title,
@@ -106,4 +118,4 @@ def _real_extract(self, url):
             'like_count': like_count,
             'formats': formats,
             'age_limit': 18,
-        }
+        })
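The switch to `merge_dicts` above lets the HTML5-media entry back-fill fields the regex pass missed: for each key, the first non-empty value in argument order wins. A simplified model of that behavior (not the exact `youtube_dl.utils` implementation, which has extra string-emptiness rules):

```python
def merge_dicts_simple(*dicts):
    # Simplified model of youtube_dl.utils.merge_dicts: walk the
    # dicts left to right, skip None values, and only overwrite an
    # existing value when it was an empty string.
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v is None:
                continue
            if k not in merged or (merged[k] == '' and v != ''):
                merged[k] = v
    return merged
```

So `merge_dicts(info, {...})` keeps everything already extracted into `info` and fills only the gaps from the literal dict.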
index ba0ad7da29d188f5e920376805bf7532a1613bee..3567a32839eef2f75123a3f1b939038cf3eaf678 100644 (file)
@@ -17,6 +17,7 @@
     determine_ext,
     ExtractorError,
     int_or_none,
+    NO_DEFAULT,
     orderedSet,
     remove_quotes,
     str_to_int,
@@ -51,7 +52,7 @@ class PornHubIE(PornHubBaseIE):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
-                            (?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
+                            (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
                             (?:www\.)?thumbzilla\.com/video/
                         )
                         (?P<id>[\da-z]+)
@@ -148,6 +149,9 @@ class PornHubIE(PornHubBaseIE):
     }, {
         'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
         'only_matching': True,
+    }, {
+        'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
+        'only_matching': True,
     }]
 
     @staticmethod
@@ -165,6 +169,13 @@ def _real_extract(self, url):
         host = mobj.group('host') or 'pornhub.com'
         video_id = mobj.group('id')
 
+        if 'premium' in host:
+            if not self._downloader.params.get('cookiefile'):
+                raise ExtractorError(
+                    'PornHub Premium requires authentication.'
+                    ' You may want to use --cookies.',
+                    expected=True)
+
         self._set_cookie(host, 'age_verified', '1')
 
         def dl_webpage(platform):
@@ -188,10 +199,10 @@ def dl_webpage(platform):
         # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
         # on that anymore.
         title = self._html_search_meta(
-            'twitter:title', webpage, default=None) or self._search_regex(
-            (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
-             r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
-             r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
+            'twitter:title', webpage, default=None) or self._html_search_regex(
+            (r'(?s)<h1[^>]+class=["\']title["\'][^>]*>(?P<title>.+?)</h1>',
+             r'<div[^>]+data-video-title=(["\'])(?P<title>(?:(?!\1).)+)\1',
+             r'shareTitle["\']\s*[=:]\s*(["\'])(?P<title>(?:(?!\1).)+)\1'),
             webpage, 'title', group='title')
 
         video_urls = []
@@ -227,12 +238,13 @@ def dl_webpage(platform):
         else:
             thumbnail, duration = [None] * 2
 
-        if not video_urls:
-            tv_webpage = dl_webpage('tv')
-
+        def extract_js_vars(webpage, pattern, default=NO_DEFAULT):
             assignments = self._search_regex(
-                r'(var.+?mediastring.+?)</script>', tv_webpage,
-                'encoded url').split(';')
+                pattern, webpage, 'encoded url', default=default)
+            if not assignments:
+                return {}
+
+            assignments = assignments.split(';')
 
             js_vars = {}
 
@@ -254,11 +266,35 @@ def parse_js_value(inp):
                 assn = re.sub(r'var\s+', '', assn)
                 vname, value = assn.split('=', 1)
                 js_vars[vname] = parse_js_value(value)
+            return js_vars
 
-            video_url = js_vars['mediastring']
-            if video_url not in video_urls_set:
-                video_urls.append((video_url, None))
-                video_urls_set.add(video_url)
+        def add_video_url(video_url):
+            v_url = url_or_none(video_url)
+            if not v_url:
+                return
+            if v_url in video_urls_set:
+                return
+            video_urls.append((v_url, None))
+            video_urls_set.add(v_url)
+
+        if not video_urls:
+            FORMAT_PREFIXES = ('media', 'quality')
+            js_vars = extract_js_vars(
+                webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
+                default=None)
+            if js_vars:
+                for key, format_url in js_vars.items():
+                    if any(key.startswith(p) for p in FORMAT_PREFIXES):
+                        add_video_url(format_url)
+            if not video_urls and re.search(
+                    r'<[^>]+\bid=["\']lockedPlayer', webpage):
+                raise ExtractorError(
+                    'Video %s is locked' % video_id, expected=True)
+
+        if not video_urls:
+            js_vars = extract_js_vars(
+                dl_webpage('tv'), r'(var.+?mediastring.+?)</script>')
+            add_video_url(js_vars['mediastring'])
 
         for mobj in re.finditer(
                 r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1',
@@ -276,10 +312,16 @@ def parse_js_value(inp):
                     r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
                 if upload_date:
                     upload_date = upload_date.replace('/', '')
-            if determine_ext(video_url) == 'mpd':
+            ext = determine_ext(video_url)
+            if ext == 'mpd':
                 formats.extend(self._extract_mpd_formats(
                     video_url, video_id, mpd_id='dash', fatal=False))
                 continue
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+                continue
             tbr = None
             mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
             if mobj:
@@ -373,7 +415,7 @@ def _real_extract(self, url):
 
 
 class PornHubUserIE(PornHubPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph',
         'playlist_mincount': 118,
@@ -441,7 +483,7 @@ def _real_extract(self, url):
 
 
 class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
+    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph/videos',
         'only_matching': True,
@@ -556,7 +598,7 @@ def suitable(cls, url):
 
 
 class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
         'info_dict': {
index e19a470a5eee5efda48fe88a694c7e3a62010963..e470882922ffe8f22fad735f73899e0104a4a561 100644 (file)
     determine_ext,
     float_or_none,
     int_or_none,
+    merge_dicts,
     unified_strdate,
 )
 
 
 class ProSiebenSat1BaseIE(InfoExtractor):
-    _GEO_COUNTRIES = ['DE']
+    _GEO_BYPASS = False
     _ACCESS_ID = None
     _SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
     _V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
@@ -39,14 +40,18 @@ def _extract_video_info(self, url, clip_id):
         formats = []
         if self._ACCESS_ID:
             raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
-            server_token = (self._download_json(
+            protocols = self._download_json(
                 self._V4_BASE_URL + 'protocols', clip_id,
                 'Downloading protocols JSON',
                 headers=self.geo_verification_headers(), query={
                     'access_id': self._ACCESS_ID,
                     'client_token': sha1((raw_ct).encode()).hexdigest(),
                     'video_id': clip_id,
-                }, fatal=False) or {}).get('server_token')
+                }, fatal=False, expected_status=(403,)) or {}
+            error = protocols.get('error') or {}
+            if error.get('title') == 'Geo check failed':
+                self.raise_geo_restricted(countries=['AT', 'CH', 'DE'])
+            server_token = protocols.get('server_token')
             if server_token:
                 urls = (self._download_json(
                     self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
@@ -171,7 +176,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                         (?:
                             (?:beta\.)?
                             (?:
-                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
+                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|advopedia
                             )\.(?:de|at|ch)|
                             ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
                         )
@@ -189,10 +194,14 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'info_dict': {
                 'id': '2104602',
                 'ext': 'mp4',
-                'title': 'Episode 18 - Staffel 2',
+                'title': 'CIRCUS HALLIGALLI - Episode 18 - Staffel 2',
                 'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
                 'upload_date': '20131231',
                 'duration': 5845.04,
+                'series': 'CIRCUS HALLIGALLI',
+                'season_number': 2,
+                'episode': 'Episode 18 - Staffel 2',
+                'episode_number': 18,
             },
         },
         {
@@ -296,8 +305,9 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'info_dict': {
                 'id': '2572814',
                 'ext': 'mp4',
-                'title': 'Andreas Kümmert: Rocket Man',
+                'title': 'The Voice of Germany - Andreas Kümmert: Rocket Man',
                 'description': 'md5:6ddb02b0781c6adf778afea606652e38',
+                'timestamp': 1382041620,
                 'upload_date': '20131017',
                 'duration': 469.88,
             },
@@ -306,7 +316,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             },
         },
         {
-            'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
+            'url': 'http://www.fem.com/videos/beauty-lifestyle/kurztrips-zum-valentinstag',
             'info_dict': {
                 'id': '2156342',
                 'ext': 'mp4',
@@ -328,19 +338,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'playlist_count': 2,
             'skip': 'This video is unavailable',
         },
-        {
-            'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
-            'info_dict': {
-                'id': '4187506',
-                'ext': 'mp4',
-                'title': 'Best of Circus HalliGalli',
-                'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
-                'upload_date': '20151229',
-            },
-            'params': {
-                'skip_download': True,
-            },
-        },
         {
             # title in <h2 class="subtitle">
             'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
@@ -417,7 +414,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
         r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
     ]
     _UPLOAD_DATE_REGEXES = [
-        r'<meta property="og:published_time" content="(.+?)">',
         r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
         r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
         r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
@@ -447,17 +443,21 @@ def _extract_clip(self, url, webpage):
         if description is None:
             description = self._og_search_description(webpage)
         thumbnail = self._og_search_thumbnail(webpage)
-        upload_date = unified_strdate(self._html_search_regex(
-            self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
+        upload_date = unified_strdate(
+            self._html_search_meta('og:published_time', webpage,
+                                   'upload date', default=None)
+            or self._html_search_regex(self._UPLOAD_DATE_REGEXES,
+                                       webpage, 'upload date', default=None))
+
+        json_ld = self._search_json_ld(webpage, clip_id, default={})
 
-        info.update({
+        return merge_dicts(info, {
             'id': clip_id,
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
             'upload_date': upload_date,
-        })
-        return info
+        }, json_ld)
 
     def _extract_playlist(self, url, webpage):
         playlist_id = self._html_search_regex(
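The geo-restriction handling added to `ProSiebenSat1BaseIE` above downloads the protocols JSON while tolerating an HTTP 403 (`expected_status=(403,)`), then inspects the error payload before falling back to the server token. The same pattern in isolation (class and function names are illustrative stand-ins, not youtube-dl API):

```python
class GeoRestrictedError(Exception):
    # Stand-in for the error raised by raise_geo_restricted().
    def __init__(self, countries):
        super().__init__('geo restricted to %s' % ', '.join(countries))
        self.countries = countries

def server_token_or_geo_error(protocols):
    # Mirror of the added logic: a 403 response body is parsed rather
    # than treated as fatal, and a 'Geo check failed' error title is
    # turned into an explicit geo-restriction error for AT/CH/DE.
    protocols = protocols or {}
    error = protocols.get('error') or {}
    if error.get('title') == 'Geo check failed':
        raise GeoRestrictedError(['AT', 'CH', 'DE'])
    return protocols.get('server_token')
```

Reporting the restriction explicitly (instead of the old blanket `_GEO_COUNTRIES = ['DE']` bypass) gives the user an actionable error rather than a silent download failure.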
index fb704a3c4390b9da5b6fa3a5dad027c6e812b7eb..ca71665e0fabf958738192b497130ee12d6ad1f6 100644 (file)
@@ -82,17 +82,6 @@ def _real_extract(self, url):
         urls = []
         formats = []
 
-        def add_http_from_hls(m3u8_f):
-            http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
-            if http_url != m3u8_f['url']:
-                f = m3u8_f.copy()
-                f.update({
-                    'format_id': f['format_id'].replace('hls', 'http'),
-                    'protocol': 'http',
-                    'url': http_url,
-                })
-                formats.append(f)
-
         for video in videos['data']['videos']:
             media_url = url_or_none(video.get('url'))
             if not media_url or media_url in urls:
@@ -101,12 +90,9 @@ def add_http_from_hls(m3u8_f):
 
             playlist = video.get('is_playlist')
             if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
-                m3u8_formats = self._extract_m3u8_formats(
+                formats.extend(self._extract_m3u8_formats(
                     media_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False)
-                for m3u8_f in m3u8_formats:
-                    formats.append(m3u8_f)
-                    add_http_from_hls(m3u8_f)
+                    m3u8_id='hls', fatal=False))
                 continue
 
             quality = int_or_none(video.get('quality'))
@@ -128,8 +114,6 @@ def add_http_from_hls(m3u8_f):
                 format_id += '-%sp' % quality
             f['format_id'] = format_id
             formats.append(f)
-            if is_hls:
-                add_http_from_hls(f)
         self._sort_formats(formats)
 
         creator = try_get(
index 5c84028ef97e8220d494633f5ada42804fa2ae7f..2d2f6a98c97dba8605cb9f640c7c73d860caa1d0 100644 (file)
@@ -4,6 +4,7 @@
 
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     ExtractorError,
     int_or_none,
     merge_dicts,
@@ -43,14 +44,21 @@ def _real_extract(self, url):
         webpage = self._download_webpage(
             'http://www.redtube.com/%s' % video_id, video_id)
 
-        if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
-            raise ExtractorError('Video %s has been removed' % video_id, expected=True)
+        ERRORS = (
+            (('video-deleted-info', '>This video has been removed'), 'has been removed'),
+            (('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
+        )
+
+        for patterns, message in ERRORS:
+            if any(p in webpage for p in patterns):
+                raise ExtractorError(
+                    'Video %s %s' % (video_id, message), expected=True)
 
         info = self._search_json_ld(webpage, video_id, default={})
 
         if not info.get('title'):
             info['title'] = self._html_search_regex(
-                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
+                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle|video_title)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
                  r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
                 webpage, 'title', group='title',
                 default=None) or self._og_search_title(webpage)
@@ -70,7 +78,7 @@ def _real_extract(self, url):
                     })
         medias = self._parse_json(
             self._search_regex(
-                r'mediaDefinition\s*:\s*(\[.+?\])', webpage,
+                r'mediaDefinition["\']?\s*:\s*(\[.+?}\s*\])', webpage,
                 'media definitions', default='{}'),
             video_id, fatal=False)
         if medias and isinstance(medias, list):
@@ -78,6 +86,12 @@ def _real_extract(self, url):
                 format_url = url_or_none(media.get('videoUrl'))
                 if not format_url:
                     continue
+                if media.get('format') == 'hls' or determine_ext(format_url) == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native', m3u8_id='hls',
+                        fatal=False))
+                    continue
                 format_id = media.get('quality')
                 formats.append({
                     'url': format_url,
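The redtube hunk replaces a single removed-video check with a table-driven `ERRORS` tuple. A standalone sketch of that pattern; `ExtractorError` is swapped for a plain `ValueError` so the snippet runs on its own:

```python
ERRORS = (
    (('video-deleted-info', '>This video has been removed'), 'has been removed'),
    (('private_video_text', '>This video is private'), 'is private'),
)

def check_availability(webpage, video_id):
    # Each entry pairs page markers with the message suffix to raise,
    # as the extractor does with ExtractorError(..., expected=True).
    for patterns, message in ERRORS:
        if any(p in webpage for p in patterns):
            raise ValueError('Video %s %s' % (video_id, message))
```

Adding a new error case is then a one-line change to the table rather than another `if` block.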
index bd9ee1647d47d47bfc8d8341139c2bf953ecf158..2cc66512241dbc6f65589e52d842cf70b7250ccb 100644 (file)
@@ -8,7 +8,6 @@
 
 from ..compat import (
     compat_parse_qs,
-    compat_str,
     compat_urlparse,
 )
 from ..utils import (
@@ -39,13 +38,13 @@ def _login(self):
             'Downloading login page')
 
         def is_logged(urlh):
-            return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())
+            return 'learning.oreilly.com/home/' in urlh.geturl()
 
         if is_logged(urlh):
             self.LOGGED_IN = True
             return
 
-        redirect_url = compat_str(urlh.geturl())
+        redirect_url = urlh.geturl()
         parsed_url = compat_urlparse.urlparse(redirect_url)
         qs = compat_parse_qs(parsed_url.query)
         next_uri = compat_urlparse.urljoin(
@@ -165,7 +164,8 @@ def _real_extract(self, url):
             kaltura_session = self._download_json(
                 '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
                 video_id, 'Downloading kaltura session JSON',
-                'Unable to download kaltura session JSON', fatal=False)
+                'Unable to download kaltura session JSON', fatal=False,
+                headers={'Accept': 'application/json'})
             if kaltura_session:
                 session = kaltura_session.get('session')
                 if session:
index 8b3275735b1638b98c9c64f8360a6498c525a8f7..b40b4c4afded1b6f9541d60b3b2d3fb5fe0c5973 100644 (file)
@@ -7,6 +7,7 @@
 
 from .aws import AWSIE
 from .anvato import AnvatoIE
+from .common import InfoExtractor
 from ..utils import (
     smuggle_url,
     urlencode_postdata,
@@ -102,3 +103,50 @@ def get(key):
                 'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
                 {'geo_countries': ['US']}),
             AnvatoIE.ie_key(), video_id=mcp_id)
+
+
+class ScrippsNetworksIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
+        'info_dict': {
+            'id': '0260338',
+            'ext': 'mp4',
+            'title': 'The Best of the Best',
+            'description': 'Catch a new episode of MasterChef Canada Tuedsay at 9/8c.',
+            'timestamp': 1475678834,
+            'upload_date': '20161005',
+            'uploader': 'SCNI-SCND',
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'https://www.diynetwork.com/videos/diy-barnwood-tablet-stand-0265790',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.hgtv.com/videos/cookie-decorating-101-0301929',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
+        'only_matching': True,
+    }]
+    _ACCOUNT_MAP = {
+        'cookingchanneltv': 2433005105,
+        'discovery': 2706091867,
+        'diynetwork': 2433004575,
+        'foodnetwork': 2433005105,
+        'hgtv': 2433004575,
+        'travelchannel': 2433005739,
+    }
+    _TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'
+
+    def _real_extract(self, url):
+        site, guid = re.match(self._VALID_URL, url).groups()
+        return self.url_result(smuggle_url(
+            self._TP_TEMPL % (self._ACCOUNT_MAP[site], guid),
+            {'force_smil_url': True}), 'ThePlatform', guid)
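The new `ScrippsNetworksIE` maps the `(?P<site>...)` group of `_VALID_URL` to a ThePlatform account id and formats the trailing digits into a media URL. A standalone sketch of that mapping, using the regex and tables from the class above:

```python
import re

_VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
_ACCOUNT_MAP = {
    'cookingchanneltv': 2433005105, 'discovery': 2706091867,
    'diynetwork': 2433004575, 'foodnetwork': 2433005105,
    'hgtv': 2433004575, 'travelchannel': 2433005739,
}
_TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'

def theplatform_url(url):
    # The site group picks the account id; the digits after the last
    # hyphen become the media GUID, exactly as in _real_extract above.
    site, guid = re.match(_VALID_URL, url).groups()
    return _TP_TEMPL % (_ACCOUNT_MAP[site], guid)
```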
index e579d42cf525b56500b98c84272b67b279e36baa..9401bf2cf7fcdad2eb218f5a3d072399932fea9a 100644 (file)
@@ -7,9 +7,18 @@
 
 
 class ServusIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)/(?P<id>[aA]{2}-\w+|\d+-\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
+                            servustv\.com/videos
+                        )
+                        /(?P<id>[aA]{2}-\w+|\d+-\d+)
+                    '''
     _TESTS = [{
-        'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
+        # new URL schema
+        'url': 'https://www.servustv.com/videos/aa-1t6vbu5pw1w12/',
         'md5': '3e1dd16775aa8d5cbef23628cfffc1f4',
         'info_dict': {
             'id': 'AA-1T6VBU5PW1W12',
@@ -18,6 +27,10 @@ class ServusIE(InfoExtractor):
             'description': 'md5:1247204d85783afe3682644398ff2ec4',
             'thumbnail': r're:^https?://.*\.jpg',
         }
+    }, {
+        # old URL schema
+        'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
+        'only_matching': True,
     }, {
         'url': 'https://www.servus.com/at/p/Wie-das-Leben-beginnt/1309984137314-381415152/',
         'only_matching': True,
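The servus hunk widens `_VALID_URL` into a verbose `(?x)` pattern covering both the old servus.com and new servustv.com schemas. A quick standalone check of that regex (helper name is illustrative):

```python
import re

# The widened ServusIE pattern from the hunk above.
_VALID_URL = r'''(?x)
                https?://
                    (?:www\.)?
                    (?:
                        servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
                        servustv\.com/videos
                    )
                    /(?P<id>[aA]{2}-\w+|\d+-\d+)
                '''

def extract_id(url):
    # Return the video id if the URL matches either schema, else None.
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None
```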
index c2ee54457e90909fea418f38320d9e5480ccc379..d37c52543f1469652d69bea603e7d4f296d2bb50 100644 (file)
@@ -9,10 +9,13 @@
     SearchInfoExtractor
 )
 from ..compat import (
+    compat_HTTPError,
+    compat_kwargs,
     compat_str,
     compat_urlparse,
 )
 from ..utils import (
+    error_to_compat_str,
     ExtractorError,
     float_or_none,
     HEADRequest,
@@ -24,6 +27,7 @@
     unified_timestamp,
     update_url_query,
     url_or_none,
+    urlhandle_detect_ext,
 )
 
 
@@ -93,7 +97,7 @@ class SoundcloudIE(InfoExtractor):
                 'repost_count': int,
             }
         },
-        # not streamable song
+        # geo-restricted
         {
             'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
             'info_dict': {
@@ -105,18 +109,13 @@ class SoundcloudIE(InfoExtractor):
                 'uploader_id': '9615865',
                 'timestamp': 1337635207,
                 'upload_date': '20120521',
-                'duration': 30,
+                'duration': 227.155,
                 'license': 'all-rights-reserved',
                 'view_count': int,
                 'like_count': int,
                 'comment_count': int,
                 'repost_count': int,
             },
-            'params': {
-                # rtmp
-                'skip_download': True,
-            },
-            'skip': 'Preview',
         },
         # private link
         {
@@ -227,7 +226,6 @@ class SoundcloudIE(InfoExtractor):
                 'skip_download': True,
             },
         },
-        # not available via api.soundcloud.com/i1/tracks/id/streams
         {
             'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
             'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@@ -236,7 +234,7 @@ class SoundcloudIE(InfoExtractor):
                 'ext': 'mp3',
                 'title': 'Mezzo Valzer',
                 'description': 'md5:4138d582f81866a530317bae316e8b61',
-                'uploader': 'Giovanni Sarani',
+                'uploader': 'Micronie',
                 'uploader_id': '3352531',
                 'timestamp': 1551394171,
                 'upload_date': '20190228',
@@ -248,14 +246,16 @@ class SoundcloudIE(InfoExtractor):
                 'comment_count': int,
                 'repost_count': int,
             },
-            'expected_warnings': ['Unable to download JSON metadata'],
-        }
+        },
+        {
+            # with AAC HQ format available via OAuth token
+            'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
+            'only_matching': True,
+        },
     ]
 
-    _API_BASE = 'https://api.soundcloud.com/'
     _API_V2_BASE = 'https://api-v2.soundcloud.com/'
     _BASE_URL = 'https://soundcloud.com/'
-    _CLIENT_ID = 'UW9ajvMgVdMMW3cdeBi8lPfN6dvOVGji'
     _IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
 
     _ARTWORK_MAP = {
@@ -271,14 +271,53 @@ class SoundcloudIE(InfoExtractor):
         'original': 0,
     }
 
+    def _store_client_id(self, client_id):
+        self._downloader.cache.store('soundcloud', 'client_id', client_id)
+
+    def _update_client_id(self):
+        webpage = self._download_webpage('https://soundcloud.com/', None)
+        for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
+            script = self._download_webpage(src, None, fatal=False)
+            if script:
+                client_id = self._search_regex(
+                    r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
+                    script, 'client id', default=None)
+                if client_id:
+                    self._CLIENT_ID = client_id
+                    self._store_client_id(client_id)
+                    return
+        raise ExtractorError('Unable to extract client id')
+
+    def _download_json(self, *args, **kwargs):
+        non_fatal = kwargs.get('fatal') is False
+        if non_fatal:
+            del kwargs['fatal']
+        query = kwargs.get('query', {}).copy()
+        for _ in range(2):
+            query['client_id'] = self._CLIENT_ID
+            kwargs['query'] = query
+            try:
+                return super(SoundcloudIE, self)._download_json(*args, **compat_kwargs(kwargs))
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                    self._store_client_id(None)
+                    self._update_client_id()
+                    continue
+                elif non_fatal:
+                    self._downloader.report_warning(error_to_compat_str(e))
+                    return False
+                raise
+
+    def _real_initialize(self):
+        self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
+
     @classmethod
     def _resolv_url(cls, url):
-        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url + '&client_id=' + cls._CLIENT_ID
+        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
 
 
-    def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2):
+    def _extract_info_dict(self, info, full_title=None, secret_token=None):
         track_id = compat_str(info['id'])
         title = info['title']
-        track_base_url = self._API_BASE + 'tracks/%s' % track_id
 
         format_urls = set()
         formats = []
@@ -287,26 +326,27 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2
             query['secret_token'] = secret_token
 
         if info.get('downloadable') and info.get('has_downloads_left'):
-            format_url = update_url_query(
-                info.get('download_url') or track_base_url + '/download', query)
-            format_urls.add(format_url)
-            if version == 2:
-                v1_info = self._download_json(
-                    track_base_url, track_id, query=query, fatal=False) or {}
-            else:
-                v1_info = info
-            formats.append({
-                'format_id': 'download',
-                'ext': v1_info.get('original_format') or 'mp3',
-                'filesize': int_or_none(v1_info.get('original_content_size')),
-                'url': format_url,
-                'preference': 10,
-            })
+            download_url = update_url_query(
+                self._API_V2_BASE + 'tracks/' + track_id + '/download', query)
+            redirect_url = (self._download_json(download_url, track_id, fatal=False) or {}).get('redirectUri')
+            if redirect_url:
+                urlh = self._request_webpage(
+                    HEADRequest(redirect_url), track_id, fatal=False)
+                if urlh:
+                    format_url = urlh.geturl()
+                    format_urls.add(format_url)
+                    formats.append({
+                        'format_id': 'download',
+                        'ext': urlhandle_detect_ext(urlh) or 'mp3',
+                        'filesize': int_or_none(urlh.headers.get('Content-Length')),
+                        'url': format_url,
+                        'preference': 10,
+                    })
 
         def invalid_url(url):
-            return not url or url in format_urls or re.search(r'/(?:preview|playlist)/0/30/', url)
+            return not url or url in format_urls
 
 
-        def add_format(f, protocol):
+        def add_format(f, protocol, is_preview=False):
             mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
             if mobj:
                 for k, v in mobj.groupdict().items():
@@ -315,16 +355,27 @@ def add_format(f, protocol):
             format_id_list = []
             if protocol:
                 format_id_list.append(protocol)
+            ext = f.get('ext')
+            if ext == 'aac':
+                f['abr'] = '256'
             for k in ('ext', 'abr'):
                 v = f.get(k)
                 if v:
                     format_id_list.append(v)
+            preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
+            if preview:
+                format_id_list.append('preview')
             abr = f.get('abr')
             if abr:
                 f['abr'] = int(abr)
+            if protocol == 'hls':
+                protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
+            else:
+                protocol = 'http'
             f.update({
                 'format_id': '_'.join(format_id_list),
-                'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
+                'protocol': protocol,
+                'preference': -10 if preview else None,
             })
             formats.append(f)
 
@@ -335,7 +386,7 @@ def add_format(f, protocol):
             if not isinstance(t, dict):
                 continue
             format_url = url_or_none(t.get('url'))
-            if not format_url or t.get('snipped') or '/preview/' in format_url:
+            if not format_url:
                 continue
             stream = self._download_json(
                 format_url, track_id, query=query, fatal=False)
@@ -358,44 +409,14 @@ def add_format(f, protocol):
             add_format({
                 'url': stream_url,
                 'ext': ext,
-            }, 'http' if protocol == 'progressive' else protocol)
-
-        if not formats:
-            # Old API, does not work for some tracks (e.g.
-            # https://soundcloud.com/giovannisarani/mezzo-valzer)
-            # and might serve preview URLs (e.g.
-            # http://www.soundcloud.com/snbrn/ele)
-            format_dict = self._download_json(
-                track_base_url + '/streams', track_id,
-                'Downloading track url', query=query, fatal=False) or {}
-
-            for key, stream_url in format_dict.items():
-                if invalid_url(stream_url):
-                    continue
-                format_urls.add(stream_url)
-                mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
-                if mobj:
-                    protocol, ext, abr = mobj.groups()
-                    add_format({
-                        'abr': abr,
-                        'ext': ext,
-                        'url': stream_url,
-                    }, protocol)
-
-        if not formats:
-            # We fallback to the stream_url in the original info, this
-            # cannot be always used, sometimes it can give an HTTP 404 error
-            urlh = self._request_webpage(
-                HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
-                track_id, query=query, fatal=False)
-            if urlh:
-                stream_url = urlh.geturl()
-                if not invalid_url(stream_url):
-                    add_format({'url': stream_url}, 'http')
+            }, 'http' if protocol == 'progressive' else protocol,
+                t.get('snipped') or '/preview/' in format_url)
 
         for f in formats:
             f['vcodec'] = 'none'
 
+        if not formats and info.get('policy') == 'BLOCK':
+            self.raise_geo_restricted()
         self._sort_formats(formats)
 
         user = info.get('user') or {}
@@ -451,9 +472,7 @@ def _real_extract(self, url):
 
         track_id = mobj.group('track_id')
 
-        query = {
-            'client_id': self._CLIENT_ID,
-        }
+        query = {}
         if track_id:
             info_json_url = self._API_V2_BASE + 'tracks/' + track_id
             full_title = track_id
@@ -467,20 +486,24 @@ def _real_extract(self, url):
                 resolve_title += '/%s' % token
             info_json_url = self._resolv_url(self._BASE_URL + resolve_title)
 
-        version = 2
         info = self._download_json(
-            info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False)
-        if not info:
-            info = self._download_json(
-                info_json_url.replace(self._API_V2_BASE, self._API_BASE),
-                full_title, 'Downloading info JSON', query=query)
-            version = 1
+            info_json_url, full_title, 'Downloading info JSON', query=query)
 
 
-        return self._extract_info_dict(info, full_title, token, version)
+        return self._extract_info_dict(info, full_title, token)
 
 
 class SoundcloudPlaylistBaseIE(SoundcloudIE):
-    def _extract_track_entries(self, tracks, token=None):
+    def _extract_set(self, playlist, token=None):
+        playlist_id = compat_str(playlist['id'])
+        tracks = playlist.get('tracks') or []
+        if not all([t.get('permalink_url') for t in tracks]) and token:
+            tracks = self._download_json(
+                self._API_V2_BASE + 'tracks', playlist_id,
+                'Downloading tracks', query={
+                    'ids': ','.join([compat_str(t['id']) for t in tracks]),
+                    'playlistId': playlist_id,
+                    'playlistSecretToken': token,
+                })
         entries = []
         for track in tracks:
             track_id = str_or_none(track.get('id'))
@@ -493,7 +516,10 @@ def _extract_track_entries(self, tracks, token=None):
                     url += '?secret_token=' + token
             entries.append(self.url_result(
                 url, SoundcloudIE.ie_key(), track_id))
-        return entries
+        return self.playlist_result(
+            entries, playlist_id,
+            playlist.get('title'),
+            playlist.get('description'))
 
 
 class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@@ -504,6 +530,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
         'info_dict': {
             'id': '2284613',
             'title': 'The Royal Concept EP',
+            'description': 'md5:71d07087c7a449e8941a70a29e34671e',
         },
         'playlist_mincount': 5,
     }, {
@@ -526,17 +553,13 @@ def _real_extract(self, url):
             msgs = (compat_str(err['error_message']) for err in info['errors'])
             raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
 
-        entries = self._extract_track_entries(info['tracks'], token)
-
-        return self.playlist_result(
-            entries, str_or_none(info.get('id')), info.get('title'))
+        return self._extract_set(info, token)
 
 
 
 
-class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
+class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
     def _extract_playlist(self, base_url, playlist_id, playlist_title):
         COMMON_QUERY = {
-            'limit': 2000000000,
-            'client_id': self._CLIENT_ID,
+            'limit': 80000,
             'linked_partitioning': '1',
         }
 
@@ -722,9 +745,7 @@ def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         playlist_id = mobj.group('id')
 
-        query = {
-            'client_id': self._CLIENT_ID,
-        }
+        query = {}
         token = mobj.group('token')
         if token:
             query['secret_token'] = token
@@ -733,10 +754,7 @@ def _real_extract(self, url):
             self._API_V2_BASE + 'playlists/' + playlist_id,
             playlist_id, 'Downloading playlist', query=query)
 
-        entries = self._extract_track_entries(data['tracks'], token)
-
-        return self.playlist_result(
-            entries, playlist_id, data.get('title'), data.get('description'))
+        return self._extract_set(data, token)
 
 
 class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
@@ -761,7 +779,6 @@ def _get_collection(self, endpoint, collection_id, **query):
             self._MAX_RESULTS_PER_PAGE)
         query.update({
             'limit': limit,
-            'client_id': self._CLIENT_ID,
             'linked_partitioning': 1,
             'offset': 0,
         })
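The soundcloud changes above wrap `_download_json` so that a 401 invalidates the cached `client_id`, scrapes a fresh one, and retries the request once. A standalone sketch of that retry shape, with the HTTP layer abstracted into callables (all names here are illustrative, not yt-dlp API):

```python
class Unauthorized(Exception):
    pass

def fetch_with_client_id(fetch, refresh_client_id, client_id):
    # One retry, as in the overridden SoundcloudIE._download_json:
    # on the first Unauthorized, refresh the client_id and re-send;
    # on the second, give up and propagate the error.
    for attempt in range(2):
        try:
            return fetch(client_id)
        except Unauthorized:
            if attempt:
                raise
            client_id = refresh_client_id()
```

The real code additionally persists the fresh id through the downloader cache (`_store_client_id`) so later runs skip the scrape.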
index e040ada29b24542582f72f08f31b843d928af251..61ca902ce286e6274c7d5776bd10c265a023643f 100644 (file)
@@ -4,6 +4,7 @@
 
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     ExtractorError,
     merge_dicts,
     orderedSet,
@@ -64,7 +65,7 @@ def _real_extract(self, url):
             url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
             video_id, headers={'Cookie': 'country=US'})
 
-        if re.search(r'<[^>]+\bid=["\']video_removed', webpage):
+        if re.search(r'<[^>]+\b(?:id|class)=["\']video_removed', webpage):
             raise ExtractorError(
                 'Video %s is not available' % video_id, expected=True)
 
@@ -75,11 +76,20 @@ def extract_format(format_id, format_url):
             if not f_url:
                 return
             f = parse_resolution(format_id)
-            f.update({
-                'url': f_url,
-                'format_id': format_id,
-            })
-            formats.append(f)
+            ext = determine_ext(f_url)
+            if format_id.startswith('m3u8') or ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif format_id.startswith('mpd') or ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    f_url, video_id, mpd_id='dash', fatal=False))
+            elif ext == 'mp4' or f.get('width') or f.get('height'):
+                f.update({
+                    'url': f_url,
+                    'format_id': format_id,
+                })
+                formats.append(f)
 
         STREAM_URL_PREFIX = 'stream_url_'
 
@@ -93,28 +103,22 @@ def extract_format(format_id, format_url):
                 r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
                 webpage, 'stream key', group='value')
 
-            sb_csrf_session = self._get_cookies(
-                'https://spankbang.com')['sb_csrf_session'].value
-
             stream = self._download_json(
                 'https://spankbang.com/api/videos/stream', video_id,
                 'Downloading stream JSON', data=urlencode_postdata({
                     'id': stream_key,
                     'data': 0,
-                    'sb_csrf_session': sb_csrf_session,
                 }), headers={
                     'Referer': url,
-                    'X-CSRFToken': sb_csrf_session,
+                    'X-Requested-With': 'XMLHttpRequest',
                 })
 
             for format_id, format_url in stream.items():
-                if format_id.startswith(STREAM_URL_PREFIX):
-                    if format_url and isinstance(format_url, list):
-                        format_url = format_url[0]
-                    extract_format(
-                        format_id[len(STREAM_URL_PREFIX):], format_url)
+                if format_url and isinstance(format_url, list):
+                    format_url = format_url[0]
+                extract_format(format_id, format_url)
 
 
-        self._sort_formats(formats)
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
 
         info = self._search_json_ld(webpage, video_id, default={})
 
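The reworked `extract_format` above dispatches on the `format_id` prefix and the URL's file extension before falling back to a progressive-download entry. A minimal standalone sketch of that dispatch, using hypothetical helper names rather than the extractor's own methods:

```python
from os.path import splitext
from urllib.parse import urlparse

def classify_stream(format_id, url):
    # Mirror of the hunk above: route a stream to HLS, DASH, or
    # progressive handling based on format_id prefix or URL extension.
    ext = splitext(urlparse(url).path)[1][1:].lower()
    if format_id.startswith('m3u8') or ext == 'm3u8':
        return 'hls'
    if format_id.startswith('mpd') or ext == 'mpd':
        return 'dash'
    return 'progressive'
```

In the extractor the `hls` and `dash` branches hand off to `_extract_m3u8_formats` and `_extract_mpd_formats` with `fatal=False`, so a broken manifest degrades to a warning instead of aborting extraction.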
index 44d8fa52f3071ca00971624db81ce4ad6b2141e3..35ab9ec375989d412d94f7d40d2babadf93c44c4 100644 (file)
@@ -3,34 +3,47 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_unquote,
-    compat_urllib_parse_urlparse,
-)
 from ..utils import (
-    sanitized_Request,
+    float_or_none,
+    int_or_none,
+    merge_dicts,
+    str_or_none,
     str_to_int,
-    unified_strdate,
+    url_or_none,
 )
-from ..aes import aes_decrypt_text
 
 
 class SpankwireIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?spankwire\.com/
+                        (?:
+                            [^/]+/video|
+                            EmbedPlayer\.aspx/?\?.*?\bArticleId=
+                        )
+                        (?P<id>\d+)
+                    '''
     _TESTS = [{
         # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
         'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
-        'md5': '8bbfde12b101204b39e4b9fe7eb67095',
+        'md5': '5aa0e4feef20aad82cbcae3aed7ab7cd',
         'info_dict': {
             'id': '103545',
             'ext': 'mp4',
             'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
             'description': 'Crazy Bitch X rated music video.',
+            'duration': 222,
             'uploader': 'oreusz',
             'uploader_id': '124697',
-            'upload_date': '20070507',
+            'timestamp': 1178587885,
+            'upload_date': '20070508',
+            'average_rating': float,
+            'view_count': int,
+            'comment_count': int,
             'age_limit': 18,
-        }
+            'categories': list,
+            'tags': list,
+        },
     }, {
         # download URL pattern: */mp4_<format_id>_<video_id>.mp4
         'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
@@ -45,83 +58,125 @@ class SpankwireIE(InfoExtractor):
             'upload_date': '20150822',
             'age_limit': 18,
         },
+        'params': {
+            'proxy': '127.0.0.1:8118'
+        },
+        'skip': 'removed',
+    }, {
+        'url': 'https://www.spankwire.com/EmbedPlayer.aspx/?ArticleId=156156&autostart=true',
+        'only_matching': True,
     }]
 
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
+            webpage)
+
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        req = sanitized_Request('http://www.' + mobj.group('url'))
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
-
-        title = self._html_search_regex(
-            r'<h1>([^<]+)', webpage, 'title')
-        description = self._html_search_regex(
-            r'(?s)<div\s+id="descriptionContent">(.+?)</div>',
-            webpage, 'description', fatal=False)
-        thumbnail = self._html_search_regex(
-            r'playerData\.screenShot\s*=\s*["\']([^"\']+)["\']',
-            webpage, 'thumbnail', fatal=False)
-
-        uploader = self._html_search_regex(
-            r'by:\s*<a [^>]*>(.+?)</a>',
-            webpage, 'uploader', fatal=False)
-        uploader_id = self._html_search_regex(
-            r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
-            webpage, 'uploader id', fatal=False)
-        upload_date = unified_strdate(self._html_search_regex(
-            r'</a> on (.+?) at \d+:\d+',
-            webpage, 'upload date', fatal=False))
-
-        view_count = str_to_int(self._html_search_regex(
-            r'<div id="viewsCounter"><span>([\d,\.]+)</span> views</div>',
-            webpage, 'view count', fatal=False))
-        comment_count = str_to_int(self._html_search_regex(
-            r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
-            webpage, 'comment count', fatal=False))
-
-        videos = re.findall(
-            r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
-        heights = [int(video[0]) for video in videos]
-        video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
-        if webpage.find(r'flashvars\.encrypted = "true"') != -1:
-            password = self._search_regex(
-                r'flashvars\.video_title = "([^"]+)',
-                webpage, 'password').replace('+', ' ')
-            video_urls = list(map(
-                lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'),
-                video_urls))
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://www.spankwire.com/api/video/%s.json' % video_id, video_id)
+
+        title = video['title']
 
         formats = []
-        for height, video_url in zip(heights, video_urls):
-            path = compat_urllib_parse_urlparse(video_url).path
-            m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
-            if m:
-                tbr = int(m.group('tbr'))
-                height = int(m.group('height'))
-            else:
-                tbr = None
-            formats.append({
-                'url': video_url,
-                'format_id': '%dp' % height,
-                'height': height,
-                'tbr': tbr,
+        videos = video.get('videos')
+        if isinstance(videos, dict):
+            for format_id, format_url in videos.items():
+                video_url = url_or_none(format_url)
+                if not video_url:
+                    continue
+                height = int_or_none(self._search_regex(
+                    r'(\d+)[pP]', format_id, 'height', default=None))
+                m = re.search(
+                    r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
+                if m:
+                    tbr = int(m.group('tbr'))
+                    height = height or int(m.group('height'))
+                else:
+                    tbr = None
+                formats.append({
+                    'url': video_url,
+                    'format_id': '%dp' % height if height else format_id,
+                    'height': height,
+                    'tbr': tbr,
+                })
+        m3u8_url = url_or_none(video.get('HLS'))
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+        self._sort_formats(formats, ('height', 'tbr', 'width', 'format_id'))
+
+        view_count = str_to_int(video.get('viewed'))
+
+        thumbnails = []
+        for preference, t in enumerate(('', '2x'), start=0):
+            thumbnail_url = url_or_none(video.get('poster%s' % t))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'preference': preference,
             })
-        self._sort_formats(formats)
 
 
-        age_limit = self._rta_search(webpage)
+        def extract_names(key):
+            entries_list = video.get(key)
+            if not isinstance(entries_list, list):
+                return
+            entries = []
+            for entry in entries_list:
+                name = str_or_none(entry.get('name'))
+                if name:
+                    entries.append(name)
+            return entries
+
+        categories = extract_names('categories')
+        tags = extract_names('tags')
 
 
-        return {
+        uploader = None
+        info = {}
+
+        webpage = self._download_webpage(
+            'https://www.spankwire.com/_/video%s/' % video_id, video_id,
+            fatal=False)
+        if webpage:
+            info = self._search_json_ld(webpage, video_id, default={})
+            thumbnail_url = None
+            if 'thumbnail' in info:
+                thumbnail_url = url_or_none(info['thumbnail'])
+                del info['thumbnail']
+            if not thumbnail_url:
+                thumbnail_url = self._og_search_thumbnail(webpage)
+            if thumbnail_url:
+                thumbnails.append({
+                    'url': thumbnail_url,
+                    'preference': 10,
+                })
+            uploader = self._html_search_regex(
+                r'(?s)by\s*<a[^>]+\bclass=["\']uploaded__by[^>]*>(.+?)</a>',
+                webpage, 'uploader', fatal=False)
+            if not view_count:
+                view_count = str_to_int(self._search_regex(
+                    r'data-views=["\']([\d,.]+)', webpage, 'view count',
+                    fatal=False))
+
+        return merge_dicts({
             'id': video_id,
             'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
+            'description': video.get('description'),
+            'duration': int_or_none(video.get('duration')),
+            'thumbnails': thumbnails,
             'uploader': uploader,
             'uploader': uploader,
-            'uploader_id': uploader_id,
-            'upload_date': upload_date,
+            'uploader_id': str_or_none(video.get('userId')),
+            'timestamp': int_or_none(video.get('time_approved_on')),
+            'average_rating': float_or_none(video.get('rating')),
             'view_count': view_count,
-            'comment_count': comment_count,
+            'comment_count': int_or_none(video.get('comments')),
+            'age_limit': 18,
+            'categories': categories,
+            'tags': tags,
             'formats': formats,
             'formats': formats,
-            'age_limit': age_limit,
-        }
+        }, info)
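The `extract_names` helper in the hunk above reads list-of-dict metadata defensively, returning `None` when the field is missing or malformed instead of raising mid-extraction. A standalone variant of the same pattern (the closure in the diff reads `video` from its enclosing scope; here it is passed explicitly):

```python
def extract_names(video, key):
    # Return the 'name' of each dict under video[key]; None signals
    # that the field was absent or not the expected list shape.
    entries_list = video.get(key)
    if not isinstance(entries_list, list):
        return None
    names = []
    for entry in entries_list:
        if isinstance(entry, dict) and entry.get('name'):
            names.append(str(entry['name']))
    return names
```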
index 7c11ea7aaf9306181fee00adab6080d5c40ac1d7..aabff7a3ce78d76b9aec3772a4e88a5bf802d192 100644 (file)
@@ -8,15 +8,10 @@ class BellatorIE(MTVServicesInfoExtractor):
     _TESTS = [{
         'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
         'info_dict': {
-            'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4',
-            'ext': 'mp4',
-            'title': 'Douglas Lima vs. Paul Daley - Round 1',
-            'description': 'md5:805a8dd29310fd611d32baba2f767885',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+            'title': 'Michael Page vs. Evangelista Cyborg',
+            'description': 'md5:0d917fc00ffd72dd92814963fc6cbb05',
         },
+        'playlist_count': 3,
     }, {
         'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
         'only_matching': True,
@@ -25,6 +20,9 @@ class BellatorIE(MTVServicesInfoExtractor):
     _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
     _GEO_COUNTRIES = ['US']
 
+    def _extract_mgid(self, webpage):
+        return self._extract_triforce_mgid(webpage)
+
 
 class ParamountNetworkIE(MTVServicesInfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
 
 class SportDeutschlandIE(InfoExtractor):
     _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
     _TESTS = [{
 class SportDeutschlandIE(InfoExtractor):
     _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
     _TESTS = [{
-        'url': 'http://sportdeutschland.tv/badminton/live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
+        'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
         'info_dict': {
         'info_dict': {
-            'id': 'live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
+            'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
             'ext': 'mp4',
             'ext': 'mp4',
-            'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
-            'categories': ['Badminton'],
+            'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
+            'categories': ['Badminton-Deutschland'],
             'view_count': int,
             'view_count': int,
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
             'timestamp': int,
             'timestamp': int,
-            'upload_date': 're:^201408[23][0-9]$',
+            'upload_date': '20200201',
+            'description': 're:.*',  # meaningless description for THIS video
         },
         },
-        'params': {
-            'skip_download': 'Live stream',
-        },
-    }, {
-        'url': 'http://sportdeutschland.tv/li-ning-badminton-wm-2014/lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
-        'info_dict': {
-            'id': 'lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
-            'ext': 'mp4',
-            'upload_date': '20140825',
-            'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
-            'timestamp': 1408976060,
-            'duration': 2732,
-            'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'view_count': int,
-            'categories': ['Li-Ning Badminton WM 2014'],
-
-        }
     }]
 
     def _real_extract(self, url):
     }]
 
     def _real_extract(self, url):
@@ -50,7 +32,7 @@ def _real_extract(self, url):
         video_id = mobj.group('id')
         sport_id = mobj.group('sport')
 
         video_id = mobj.group('id')
         sport_id = mobj.group('sport')
 
-        api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
+        api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
             sport_id, video_id)
         req = sanitized_Request(api_url, headers={
             'Accept': 'application/vnd.vidibus.v2.html+json',
             sport_id, video_id)
         req = sanitized_Request(api_url, headers={
             'Accept': 'application/vnd.vidibus.v2.html+json',
index 28baf901c9f021c15544f099f78dd5d5a6b9165c..359dadaa3cce4540f5abb2fb58a43596379e0c56 100644 (file)
@@ -1,14 +1,14 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .ard import ARDMediathekIE
+from .ard import ARDMediathekBaseIE
 from ..utils import (
     ExtractorError,
     get_element_by_attribute,
 )
 
 
 from ..utils import (
     ExtractorError,
     get_element_by_attribute,
 )
 
 
-class SRMediathekIE(ARDMediathekIE):
+class SRMediathekIE(ARDMediathekBaseIE):
     IE_NAME = 'sr:mediathek'
     IE_DESC = 'Saarländischer Rundfunk'
     _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
index ae2ac1b42fe0021c8b904721221d441368bf50ca..4dbead2ba428abdbfc678aea31403b998a1eb06c 100644 (file)
@@ -5,44 +5,28 @@
 
 
 class StretchInternetIE(InfoExtractor):
-    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
+    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
     _TEST = {
-        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
+        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
         'info_dict': {
-            'id': '313900',
+            'id': '573272',
             'ext': 'mp4',
-            'title': 'Augustana (S.D.) Baseball vs University of Mary',
-            'description': 'md5:7578478614aae3bdd4a90f578f787438',
-            'timestamp': 1490468400,
-            'upload_date': '20170325',
+            'title': 'University of Mary Wrestling vs. Upper Iowa',
+            'timestamp': 1575668361,
+            'upload_date': '20191206',
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        stream = self._download_json(
-            'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
-            % video_id, video_id)
-
-        video_url = 'https://%s' % stream['source']
-
         event = self._download_json(
         event = self._download_json(
-            'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
-            video_id, query={
-                'clientID': 99997,
-                'eventID': video_id,
-                'token': 'asdf',
-            })['event']
-
-        title = event.get('title') or event['mobileTitle']
-        description = event.get('customText')
-        timestamp = int_or_none(event.get('longtime'))
+            'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
+            video_id)[0]
 
         return {
             'id': video_id,
-            'title': title,
-            'description': description,
-            'timestamp': timestamp,
-            'url': video_url,
+            'title': event['title'],
+            'timestamp': int_or_none(event.get('dateCreated'), 1000),
+            'url': 'https://' + event['media'][0]['url'],
         }
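`int_or_none(event.get('dateCreated'), 1000)` above relies on the utility's `scale` argument to turn the API's millisecond timestamp into Unix seconds. A simplified sketch of that behaviour (the real helper in `..utils` takes additional parameters such as `default` handling and an inverse scale):

```python
def int_or_none(v, scale=1, default=None):
    # Simplified sketch of the yt-dlp utility: coerce to int, dividing
    # by 'scale' (e.g. 1000 converts milliseconds to seconds), and fall
    # back to 'default' on missing or non-numeric input.
    if v is None:
        return default
    try:
        return int(v) // scale
    except (TypeError, ValueError):
        return default
```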
index 0901c3163e6cab4723d451b30df8574e359ba899..e12389cad80a83612e10d052b7f36bceba0f1fbf 100644 (file)
@@ -4,19 +4,14 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
 from ..utils import (
     determine_ext,
     dict_get,
     int_or_none,
-    orderedSet,
+    str_or_none,
     strip_or_none,
     try_get,
-    urljoin,
-    compat_str,
 )
 
 
@@ -237,23 +232,23 @@ def _real_extract(self, url):
 
 
 class SVTSeriesIE(SVTPlayBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)'
+    _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
     _TESTS = [{
         'url': 'https://www.svtplay.se/rederiet',
         'info_dict': {
-            'id': 'rederiet',
+            'id': '14445680',
             'title': 'Rederiet',
-            'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+            'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
         },
         'playlist_mincount': 318,
     }, {
-        'url': 'https://www.svtplay.se/rederiet?tab=sasong2',
+        'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680',
         'info_dict': {
-            'id': 'rederiet-sasong2',
+            'id': 'season-2-14445680',
             'title': 'Rederiet - Säsong 2',
-            'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+            'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
         },
-        'playlist_count': 12,
+        'playlist_mincount': 12,
     }]
 
     @classmethod
@@ -261,83 +256,87 @@ def suitable(cls, url):
         return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super(SVTSeriesIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        series_id = self._match_id(url)
-
-        qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
-        season_slug = qs.get('tab', [None])[0]
-
-        if season_slug:
-            series_id += '-%s' % season_slug
-
-        webpage = self._download_webpage(
-            url, series_id, 'Downloading series page')
-
-        root = self._parse_json(
-            self._search_regex(
-                self._SVTPLAY_RE, webpage, 'content', group='json'),
-            series_id)
+        series_slug, season_id = re.match(self._VALID_URL, url).groups()
+
+        series = self._download_json(
+            'https://api.svt.se/contento/graphql', series_slug,
+            'Downloading series page', query={
+                'query': '''{
+  listablesBySlug(slugs: ["%s"]) {
+    associatedContent(include: [productionPeriod, season]) {
+      items {
+        item {
+          ... on Episode {
+            videoSvtId
+          }
+        }
+      }
+      id
+      name
+    }
+    id
+    longDescription
+    name
+    shortDescription
+  }
+}''' % series_slug,
+            })['data']['listablesBySlug'][0]
 
         season_name = None
 
         entries = []
-        for season in root['relatedVideoContent']['relatedVideosAccordion']:
+        for season in series['associatedContent']:
             if not isinstance(season, dict):
                 continue
-            if season_slug:
-                if season.get('slug') != season_slug:
+            if season_id:
+                if season.get('id') != season_id:
                     continue
                 season_name = season.get('name')
-            videos = season.get('videos')
-            if not isinstance(videos, list):
+            items = season.get('items')
+            if not isinstance(items, list):
                 continue
-            for video in videos:
-                content_url = video.get('contentUrl')
-                if not content_url or not isinstance(content_url, compat_str):
+            for item in items:
+                video = item.get('item') or {}
+                content_id = video.get('videoSvtId')
+                if not content_id or not isinstance(content_id, compat_str):
                     continue
-                entries.append(
-                    self.url_result(
-                        urljoin(url, content_url),
-                        ie=SVTPlayIE.ie_key(),
-                        video_title=video.get('title')
-                    ))
-
-        metadata = root.get('metaData')
-        if not isinstance(metadata, dict):
-            metadata = {}
+                entries.append(self.url_result(
+                    'svt:' + content_id, SVTPlayIE.ie_key(), content_id))
 
 
-        title = metadata.get('title')
-        season_name = season_name or season_slug
+        title = series.get('name')
+        season_name = season_name or season_id
 
         if title and season_name:
             title = '%s - %s' % (title, season_name)
-        elif season_slug:
-            title = season_slug
+        elif season_id:
+            title = season_id
 
         return self.playlist_result(
-            entries, series_id, title, metadata.get('description'))
+            entries, season_id or series.get('id'), title,
+            dict_get(series, ('longDescription', 'shortDescription')))
 
 
 class SVTPageIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/]+/)*(?P<id>[^/?&#]+)'
+    _VALID_URL = r'https?://(?:www\.)?svt\.se/(?P<path>(?:[^/]+/)*(?P<id>[^/?&#]+))'
     _TESTS = [{
-        'url': 'https://www.svt.se/sport/oseedat/guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
+        'url': 'https://www.svt.se/sport/ishockey/bakom-masken-lehners-kamp-mot-mental-ohalsa',
         'info_dict': {
-            'id': 'guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
-            'title': 'GUIDE: Sommarträning du kan göra var och när du vill',
+            'id': '25298267',
+            'title': 'Bakom masken – Lehners kamp mot mental ohälsa',
         },
-        'playlist_count': 7,
+        'playlist_count': 4,
     }, {
-        'url': 'https://www.svt.se/nyheter/inrikes/ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
+        'url': 'https://www.svt.se/nyheter/utrikes/svenska-andrea-ar-en-mil-fran-branderna-i-kalifornien',
         'info_dict': {
-            'id': 'ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
-            'title': 'Ebba Busch Thor har bara delvis rätt om ”no-go-zoner”',
+            'id': '24243746',
+            'title': 'Svenska Andrea redo att fly sitt hem i Kalifornien',
         },
-        'playlist_count': 1,
+        'playlist_count': 2,
     }, {
         # only programTitle
         'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
         'info_dict': {
-            'id': '2900353',
+            'id': '8439V2K',
             'ext': 'mp4',
             'title': 'Stjärnorna skojar till det - under SVT-intervjun',
             'duration': 27,
@@ -356,16 +355,26 @@ def suitable(cls, url):
         return False if SVTIE.suitable(url) else super(SVTPageIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        playlist_id = self._match_id(url)
+        path, display_id = re.match(self._VALID_URL, url).groups()
 
 
-        webpage = self._download_webpage(url, playlist_id)
+        article = self._download_json(
+            'https://api.svt.se/nss-api/page/' + path, display_id,
+            query={'q': 'articles'})['articles']['content'][0]
 
-        entries = [
-            self.url_result(
-                'svt:%s' % video_id, ie=SVTPlayIE.ie_key(), video_id=video_id)
-            for video_id in orderedSet(re.findall(
-                r'data-video-id=["\'](\d+)', webpage))]
+        entries = []
 
-        title = strip_or_none(self._og_search_title(webpage, default=None))
+        def _process_content(content):
+            if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
+                video_id = compat_str(content['image']['svtId'])
+                entries.append(self.url_result(
+                    'svt:' + video_id, SVTPlayIE.ie_key(), video_id))
 
-        return self.playlist_result(entries, playlist_id, title)
+        for media in article.get('media', []):
+            _process_content(media)
+
+        for obj in article.get('structuredBody', []):
+            _process_content(obj.get('content') or {})
+
+        return self.playlist_result(
+            entries, str_or_none(article.get('id')),
+            strip_or_none(article.get('title')))
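The rewritten SVT `_real_extract` above stops scraping `data-video-id` attributes and instead walks the article JSON. A minimal sketch of that traversal, with a hypothetical `article` dict standing in for the nss-api response:

```python
def collect_video_ids(article):
    """Mirror the media/structuredBody walk from the diff above,
    returning SVT video ids as strings."""
    entries = []

    def _process_content(content):
        # Only clip and episode blocks carry a playable svtId.
        if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
            entries.append(str(content['image']['svtId']))

    for media in article.get('media', []):
        _process_content(media)
    for obj in article.get('structuredBody', []):
        _process_content(obj.get('content') or {})
    return entries


# Hypothetical article payload for illustration only.
article = {
    'media': [{'_type': 'VIDEOCLIP', 'image': {'svtId': 24243746}}],
    'structuredBody': [
        {'content': {'_type': 'TEXT'}},
        {'content': {'_type': 'VIDEOEPISODE', 'image': {'svtId': 25298267}}},
    ],
}
```

Scanning both `media` and `structuredBody` matters because clips can appear in either place depending on the article layout.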
index 7d2e34b3bc4204d3ec999968c1ac76db7687a0c4..a75369dbe8a3582595ae339d58887eaefd220536 100644 (file)
@@ -4,11 +4,12 @@
 
 from .common import InfoExtractor
 from .wistia import WistiaIE
-from ..compat import compat_str
 from ..utils import (
     clean_html,
     ExtractorError,
+    int_or_none,
     get_element_by_class,
+    strip_or_none,
     urlencode_postdata,
     urljoin,
 )
@@ -20,8 +21,8 @@ class TeachableBaseIE(InfoExtractor):
 
     _SITES = {
         # Only notable ones here
-        'upskillcourses.com': 'upskill',
-        'academy.gns3.com': 'gns3',
+        'v1.upskillcourses.com': 'upskill',
+        'gns3.teachable.com': 'gns3',
         'academyhacker.com': 'academyhacker',
         'stackskills.com': 'stackskills',
         'market.saleshacker.com': 'saleshacker',
@@ -58,7 +59,7 @@ def is_logged(webpage):
             self._logged_in = True
             return
 
-        login_url = compat_str(urlh.geturl())
+        login_url = urlh.geturl()
 
         login_form = self._hidden_inputs(login_page)
 
@@ -110,27 +111,29 @@ class TeachableIE(TeachableBaseIE):
                     ''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
 
     _TESTS = [{
-        'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+        'url': 'https://gns3.teachable.com/courses/gns3-certified-associate/lectures/6842364',
         'info_dict': {
-            'id': 'uzw6zw58or',
-            'ext': 'mp4',
-            'title': 'Welcome to the Course!',
-            'description': 'md5:65edb0affa582974de4625b9cdea1107',
-            'duration': 138.763,
-            'timestamp': 1479846621,
-            'upload_date': '20161122',
+            'id': 'untlgzk1v7',
+            'ext': 'bin',
+            'title': 'Overview',
+            'description': 'md5:071463ff08b86c208811130ea1c2464c',
+            'duration': 736.4,
+            'timestamp': 1542315762,
+            'upload_date': '20181115',
+            'chapter': 'Welcome',
+            'chapter_number': 1,
         },
         'params': {
             'skip_download': True,
         },
     }, {
-        'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
+        'url': 'http://v1.upskillcourses.com/courses/119763/lectures/1747100',
         'only_matching': True,
     }, {
-        'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
+        'url': 'https://gns3.teachable.com/courses/423415/lectures/6885939',
         'only_matching': True,
     }, {
-        'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+        'url': 'teachable:https://v1.upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
         'only_matching': True,
     }]
 
@@ -160,22 +163,51 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        wistia_url = WistiaIE._extract_url(webpage)
-        if not wistia_url:
+        wistia_urls = WistiaIE._extract_urls(webpage)
+        if not wistia_urls:
             if any(re.search(p, webpage) for p in (
                     r'class=["\']lecture-contents-locked',
                     r'>\s*Lecture contents locked',
-                    r'id=["\']lecture-locked')):
+                    r'id=["\']lecture-locked',
+                    # https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
+                    r'class=["\'](?:inner-)?lesson-locked',
+                    r'>LESSON LOCKED<')):
                 self.raise_login_required('Lecture contents locked')
+            raise ExtractorError('Unable to find video URL')
 
         title = self._og_search_title(webpage, default=None)
 
-        return {
+        chapter = None
+        chapter_number = None
+        section_item = self._search_regex(
+            r'(?s)(?P<li><li[^>]+\bdata-lecture-id=["\']%s[^>]+>.+?</li>)' % video_id,
+            webpage, 'section item', default=None, group='li')
+        if section_item:
+            chapter_number = int_or_none(self._search_regex(
+                r'data-ss-position=["\'](\d+)', section_item, 'section id',
+                default=None))
+            if chapter_number is not None:
+                sections = []
+                for s in re.findall(
+                        r'(?s)<div[^>]+\bclass=["\']section-title[^>]+>(.+?)</div>', webpage):
+                    section = strip_or_none(clean_html(s))
+                    if not section:
+                        sections = []
+                        break
+                    sections.append(section)
+                if chapter_number <= len(sections):
+                    chapter = sections[chapter_number - 1]
+
+        entries = [{
             '_type': 'url_transparent',
             'url': wistia_url,
             'ie_key': WistiaIE.ie_key(),
             'title': title,
-        }
+            'chapter': chapter,
+            'chapter_number': chapter_number,
+        } for wistia_url in wistia_urls]
+
+        return self.playlist_result(entries, video_id, title)
 
 
 class TeachableCourseIE(TeachableBaseIE):
@@ -187,20 +219,20 @@ class TeachableCourseIE(TeachableBaseIE):
                         /(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
                     ''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
     _TESTS = [{
-        'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
+        'url': 'http://v1.upskillcourses.com/courses/essential-web-developer-course/',
         'info_dict': {
             'id': 'essential-web-developer-course',
             'title': 'The Essential Web Developer Course (Free)',
         },
         'playlist_count': 192,
     }, {
-        'url': 'http://upskillcourses.com/courses/119763/',
+        'url': 'http://v1.upskillcourses.com/courses/119763/',
         'only_matching': True,
     }, {
-        'url': 'http://upskillcourses.com/courses/enrolled/119763',
+        'url': 'http://v1.upskillcourses.com/courses/enrolled/119763',
         'only_matching': True,
     }, {
-        'url': 'https://academy.gns3.com/courses/enrolled/423415',
+        'url': 'https://gns3.teachable.com/courses/enrolled/423415',
         'only_matching': True,
     }, {
         'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',
index 33a72083bff96a74e04020257ba3305d1cfee2ca..3e1a7a9e609a9eb80732348b86e5976e12093a0b 100644 (file)
@@ -1,13 +1,21 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
 from .nexx import NexxIE
 from ..compat import compat_urlparse
+from ..utils import (
+    NO_DEFAULT,
+    smuggle_url,
+)
 
 
 class Tele5IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _GEO_COUNTRIES = ['DE']
     _TESTS = [{
         'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
         'info_dict': {
@@ -20,6 +28,21 @@ class Tele5IE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+    }, {
+        # jwplatform, nexx unavailable
+        'url': 'https://www.tele5.de/filme/ghoul-das-geheimnis-des-friedhofmonsters/',
+        'info_dict': {
+            'id': 'WJuiOlUp',
+            'ext': 'mp4',
+            'upload_date': '20200603',
+            'timestamp': 1591214400,
+            'title': 'Ghoul - Das Geheimnis des Friedhofmonsters',
+            'description': 'md5:42002af1d887ff3d5b2b3ca1f8137d97',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [JWPlatformIE.ie_key()],
     }, {
         'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
         'only_matching': True,
@@ -44,14 +67,42 @@ def _real_extract(self, url):
         qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
         video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
 
-        if not video_id:
+        NEXX_ID_RE = r'\d{6,}'
+        JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'
+
+        def nexx_result(nexx_id):
+            return self.url_result(
+                'https://api.nexx.cloud/v3/759/videos/byid/%s' % nexx_id,
+                ie=NexxIE.ie_key(), video_id=nexx_id)
+
+        nexx_id = jwplatform_id = None
+
+        if video_id:
+            if re.match(NEXX_ID_RE, video_id):
+                return nexx_result(video_id)
+            elif re.match(JWPLATFORM_ID_RE, video_id):
+                jwplatform_id = video_id
+
+        if not nexx_id:
             display_id = self._match_id(url)
             webpage = self._download_webpage(url, display_id)
-            video_id = self._html_search_regex(
-                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
-                 r'\s+id\s*=\s*["\']player_(\d{6,})',
-                 r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
+
+            def extract_id(pattern, name, default=NO_DEFAULT):
+                return self._html_search_regex(
+                    (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](%s)' % pattern,
+                     r'\s+id\s*=\s*["\']player_(%s)' % pattern,
+                     r'\bdata-id\s*=\s*["\'](%s)' % pattern), webpage, name,
+                    default=default)
+
+            nexx_id = extract_id(NEXX_ID_RE, 'nexx id', default=None)
+            if nexx_id:
+                return nexx_result(nexx_id)
+
+            if not jwplatform_id:
+                jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
 
         return self.url_result(
-            'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
-            ie=NexxIE.ie_key(), video_id=video_id)
+            smuggle_url(
+                'jwplatform:%s' % jwplatform_id,
+                {'geo_countries': self._GEO_COUNTRIES}),
+            ie=JWPlatformIE.ie_key(), video_id=jwplatform_id)
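The Tele5 rewrite above hinges on telling the two id shapes apart. A sketch of just that disambiguation (the `classify_video_id` helper name is mine, not the extractor's):

```python
import re

# The two id shapes the extractor now handles, from the diff above.
NEXX_ID_RE = r'\d{6,}'
JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'


def classify_video_id(video_id):
    """Numeric ids of six or more digits are Nexx ids; 8-character
    alphanumeric ids are JW Platform media ids. Order matters: an
    all-digit 8-character id matches both patterns, and Nexx wins."""
    if re.match(NEXX_ID_RE, video_id):
        return 'nexx'
    if re.match(JWPLATFORM_ID_RE, video_id):
        return 'jwplatform'
    return None
```

Like the original, this uses unanchored `re.match`, so only the prefix of the id has to fit the pattern.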
index d37e1b0557cf3ba241a25e7e56d28c8dc679b1d0..9ba3da341dac65d18a599a790bff9c95b0e52eb8 100644 (file)
@@ -11,6 +11,7 @@
     determine_ext,
     int_or_none,
     str_or_none,
+    try_get,
     urljoin,
 )
 
@@ -24,7 +25,7 @@ class TelecincoIE(InfoExtractor):
         'info_dict': {
             'id': '1876350223',
             'title': 'Bacalao con kokotxas al pil-pil',
-            'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
+            'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
         },
         'playlist': [{
             'md5': 'adb28c37238b675dad0f042292f209a7',
@@ -55,6 +56,26 @@ class TelecincoIE(InfoExtractor):
             'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
             'duration': 50,
         },
+    }, {
+        # video in opening's content
+        'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
+        'info_dict': {
+            'id': '2907195140',
+            'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+            'description': 'md5:73f340a7320143d37ab895375b2bf13a',
+        },
+        'playlist': [{
+            'md5': 'adb28c37238b675dad0f042292f209a7',
+            'info_dict': {
+                'id': 'TpI2EttSDAReWpJ1o0NVh2',
+                'ext': 'mp4',
+                'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+                'duration': 1015,
+            },
+        }],
+        'params': {
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
         'only_matching': True,
@@ -135,17 +156,28 @@ def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
         article = self._parse_json(self._search_regex(
-            r'window\.\$REACTBASE_STATE\.article\s*=\s*({.+})',
+            r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
             webpage, 'article'), display_id)['article']
         title = article.get('title')
-        description = clean_html(article.get('leadParagraph'))
+        description = clean_html(article.get('leadParagraph')) or ''
         if article.get('editorialType') != 'VID':
             entries = []
-            for p in article.get('body', []):
+            body = [article.get('opening')]
+            body.extend(try_get(article, lambda x: x['body'], list) or [])
+            for p in body:
+                if not isinstance(p, dict):
+                    continue
                 content = p.get('content')
-                if p.get('type') != 'video' or not content:
+                if not content:
+                    continue
+                type_ = p.get('type')
+                if type_ == 'paragraph':
+                    content_str = str_or_none(content)
+                    if content_str:
+                        description += content_str
                     continue
-                entries.append(self._parse_content(content, url))
+                if type_ == 'video' and isinstance(content, dict):
+                    entries.append(self._parse_content(content, url))
             return self.playlist_result(
                 entries, str_or_none(article.get('id')), title, description)
         content = article['opening']['content']
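The Telecinco change above prepends the article's `opening` block to the body walk and folds paragraph text into the description. A standalone sketch of that logic (using `isinstance` checks in place of `str_or_none`/`try_get`, and a hypothetical payload):

```python
def split_article_body(article):
    """Sketch of the body walk: prepend the opening block, append
    paragraph text to the description, and collect video contents."""
    description = article.get('leadParagraph') or ''
    videos = []
    # Videos placed in the opening were previously missed entirely.
    body = [article.get('opening')]
    body.extend(article.get('body') or [])
    for p in body:
        if not isinstance(p, dict):
            continue
        content = p.get('content')
        if not content:
            continue
        if p.get('type') == 'paragraph' and isinstance(content, str):
            description += content
        elif p.get('type') == 'video' and isinstance(content, dict):
            videos.append(content)
    return description, videos


# Hypothetical article payload for illustration only.
article = {
    'leadParagraph': 'Lead. ',
    'opening': {'type': 'video', 'content': {'id': 'a'}},
    'body': [
        {'type': 'paragraph', 'content': 'More text.'},
        {'type': 'video', 'content': {'id': 'b'}},
    ],
}
```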
index ae9f66787439462967baa63dc58f39870fb89382..c82c94b3a0009da2cf0938c92910feec84de018b 100644 (file)
@@ -38,8 +38,6 @@ class TeleQuebecIE(TeleQuebecBaseIE):
             'ext': 'mp4',
             'title': 'Un petit choc et puis repart!',
             'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
-            'upload_date': '20180222',
-            'timestamp': 1519326631,
         },
         'params': {
             'skip_download': True,
index dff44a4e2f4f5e4105def539c186dc630b411b34..af325fea8fcd68ce5cf9b8bb8ec33975950b5c32 100644 (file)
@@ -10,8 +10,8 @@
 
 
 class TenPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})'
+    _TESTS = [{
         'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
         'info_dict': {
             'id': '6060533435001',
@@ -27,7 +27,10 @@ class TenPlayIE(InfoExtractor):
             'format': 'bestvideo',
             'skip_download': True,
         }
-    }
+    }, {
+        'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
+        'only_matching': True,
+    }]
     BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
index 0e2370cd828f78a2e1a708852a392a05d96e3039..0631cb7aba8a7068a291fb8e67de0d5e04acf482 100644 (file)
@@ -17,14 +17,12 @@ class TFOIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
-        'md5': '47c987d0515561114cf03d1226a9d4c7',
+        'md5': 'cafbe4f47a8dae0ca0159937878100d6',
         'info_dict': {
-            'id': '100463871',
+            'id': '7da3d50e495c406b8fc0b997659cc075',
             'ext': 'mp4',
             'title': 'Video Game Hackathon',
             'description': 'md5:558afeba217c6c8d96c60e5421795c07',
-            'upload_date': '20160212',
-            'timestamp': 1455310233,
         }
     }
 
index 6ab147ad726306ba9250599d34491a50e64e82d0..a3d9b4017b93b1d9381b896967fa9e68da59eaec 100644 (file)
@@ -2,43 +2,46 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import try_get
 
 
 class ThisOldHouseIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode|(?:[^/]+/)?\d+)/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
-        'md5': '568acf9ca25a639f0c4ff905826b662f',
         'info_dict': {
-            'id': '2REGtUDQ',
+            'id': '5dcdddf673c3f956ef5db202',
             'ext': 'mp4',
             'title': 'How to Build a Storage Bench',
             'description': 'In the workshop, Tom Silva and Kevin O\'Connor build a storage bench for an entryway.',
             'timestamp': 1442548800,
             'upload_date': '20150918',
-        }
+        },
+        'params': {
+            'skip_download': True,
+        },
     }, {
         'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
         'only_matching': True,
     }, {
         'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
         'only_matching': True,
+    }, {
+        'url': 'https://www.thisoldhouse.com/furniture/21017078/how-to-build-a-storage-bench',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
+        'only_matching': True,
+    }, {
+        # iframe www.thisoldhouse.com
+        'url': 'https://www.thisoldhouse.com/21083431/seaside-transformation-the-westerly-project',
+        'only_matching': True,
     }]
     }]
+    _ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
         video_id = self._search_regex(
 
-             r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
-            webpage, 'video id', default=None, group='id')
-        if not video_id:
-            drupal_settings = self._parse_json(self._search_regex(
-                r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
-                webpage, 'drupal settings'), display_id)
-            video_id = try_get(
-                drupal_settings, lambda x: x['jwplatform']['video_id'],
-                compat_str) or list(drupal_settings['comScore'])[0]
-        return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)
+            r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
+            webpage, 'video id')
+        return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)
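The ThisOldHouse change above replaces the Drupal-settings scraping with a single regex over the Zype embed iframe. A sketch of that extraction against a synthetic page (the `api_key` query string from `_ZYPE_TMPL` is omitted here):

```python
import re


def extract_zype_id(webpage):
    """Pull the 24-hex-digit Zype video id out of the iframe src,
    using the same pattern as the diff above."""
    m = re.search(
        r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.'
        r'(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})', webpage)
    return m.group(1) if m else None


# Synthetic page snippet for illustration only.
page = ('<iframe src="https://www.thisoldhouse.com/videos/zype/'
        '5dcdddf673c3f956ef5db202"></iframe>')
```

The pattern accepts both `thisoldhouse.com` and the `thisoldhouse.chorus.build` staging host, and protocol-relative `//` srcs.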
index 5e5efda0f0780fb98b7c37b788ad2734a837e90d..ca2e36efe4216ad66d46252c662ea4cc5395c3ca 100644 (file)
@@ -17,9 +17,9 @@
 
 class ToggleIE(InfoExtractor):
     IE_NAME = 'toggle'
-    _VALID_URL = r'https?://video\.toggle\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
     _TESTS = [{
-        'url': 'http://video.toggle.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
+        'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
         'info_dict': {
             'id': '343115',
             'ext': 'mp4',
@@ -33,7 +33,7 @@ class ToggleIE(InfoExtractor):
         }
     }, {
         'note': 'DRM-protected video',
-        'url': 'http://video.toggle.sg/en/movies/dug-s-special-mission/341413',
+        'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
         'info_dict': {
             'id': '341413',
             'ext': 'wvm',
@@ -48,7 +48,7 @@ class ToggleIE(InfoExtractor):
     }, {
         # this also tests correct video id extraction
         'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
-        'url': 'http://video.toggle.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
+        'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
         'info_dict': {
             'id': '332861',
             'ext': 'mp4',
@@ -65,19 +65,22 @@ class ToggleIE(InfoExtractor):
         'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/zh/series/zero-calling-s2-hd/ep13/336367',
+        'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
+        'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/en/movies/seven-days/321936',
+        'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
         'only_matching': True,
     }, {
-        'url': 'https://video.toggle.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
+        'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/en/channels/eleven-plus/401585',
+        'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
         'only_matching': True,
     }]
 
index b0c7caabf336499dea40adbb62a321e43a6a03f4..cca5b5cebd7c7dedd081156fcb670be3b8bd6027 100644 (file)
@@ -1,21 +1,12 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import (
-    dict_get,
-    float_or_none,
-    int_or_none,
-    unified_timestamp,
-    update_url_query,
-    url_or_none,
-)
 
 
 class TruNewsIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?trunews\.com/stream/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'https://www.trunews.com/stream/will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
-        'md5': 'a19c024c3906ff954fac9b96ce66bb08',
         'info_dict': {
             'id': '5c5a21e65d3c196e1c0020cc',
             'display_id': 'will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
@@ -28,48 +19,16 @@ class TruNewsIE(InfoExtractor):
         },
         'add_ie': ['Zype'],
     }
+    _ZYPE_TEMPL = 'https://player.zype.com/embed/%s.js?api_key=X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
 
-        video = self._download_json(
+        zype_id = self._download_json(
             'https://api.zype.com/videos', display_id, query={
                 'app_key': 'PUVKp9WgGUb3-JUw6EqafLx8tFVP6VKZTWbUOR-HOm__g4fNDt1bCsm_LgYf_k9H',
                 'per_page': 1,
                 'active': 'true',
                 'friendly_title': display_id,
-            })['response'][0]
-
-        zype_id = video['_id']
-
-        thumbnails = []
-        thumbnails_list = video.get('thumbnails')
-        if isinstance(thumbnails_list, list):
-            for thumbnail in thumbnails_list:
-                if not isinstance(thumbnail, dict):
-                    continue
-                thumbnail_url = url_or_none(thumbnail.get('url'))
-                if not thumbnail_url:
-                    continue
-                thumbnails.append({
-                    'url': thumbnail_url,
-                    'width': int_or_none(thumbnail.get('width')),
-                    'height': int_or_none(thumbnail.get('height')),
-                })
-
-        return {
-            '_type': 'url_transparent',
-            'url': update_url_query(
-                'https://player.zype.com/embed/%s.js' % zype_id,
-                {'api_key': 'X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'}),
-            'ie_key': 'Zype',
-            'id': zype_id,
-            'display_id': display_id,
-            'title': video.get('title'),
-            'description': dict_get(video, ('description', 'ott_description', 'short_description')),
-            'duration': int_or_none(video.get('duration')),
-            'timestamp': unified_timestamp(video.get('published_at')),
-            'average_rating': float_or_none(video.get('rating')),
-            'view_count': int_or_none(video.get('request_count')),
-            'thumbnails': thumbnails,
-        }
+            })['response'][0]['_id']
+        return self.url_result(self._ZYPE_TEMPL % zype_id, 'Zype', zype_id)
index edbb0aa6944ba82b36415875f2d99e570b3373fc..ae584ad697bdf3f460eff033b8f43e75776942ee 100644 (file)
@@ -4,7 +4,6 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     ExtractorError,
     int_or_none,
@@ -151,7 +150,7 @@ def _real_extract(self, url):
         url = 'http://%s.tumblr.com/post/%s/' % (blog, video_id)
         webpage, urlh = self._download_webpage_handle(url, video_id)
 
-        redirect_url = compat_str(urlh.geturl())
+        redirect_url = urlh.geturl()
         if 'tumblr.com/safe-mode' in redirect_url or redirect_url.startswith('/safe-mode'):
             raise ExtractorError(
                 'This Tumblr may contain sensitive media. '
index 611fdc0c6c7002c1669200c7ace75bf498a85c6c..8bda9348d723073b894d2d77b6556b51d89dad80 100644 (file)
@@ -106,7 +106,7 @@ def _real_extract(self, url):
         video_id = self._match_id(url)
 
         video = self._download_json(
-            'http://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
+            'https://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
             data=json.dumps({
                 'playlist_id': video_id,
                 'serienavn': '',
index a819d048c613929b79f090facc4a82a097e1cb73..c498b0191623220071d38764f04d3ba1fc114558 100644 (file)
@@ -99,7 +99,7 @@ def _real_extract(self, url):
             manifest_url.replace('.m3u8', '.f4m'),
             video_id, f4m_id='hds', fatal=False))
         formats.extend(self._extract_ism_formats(
-            re.sub(r'\.ism/.+?\.m3u8', r'.ism/Manifest', manifest_url),
+            re.sub(r'\.ism/.*?\.m3u8', r'.ism/Manifest', manifest_url),
             video_id, ism_id='mss', fatal=False))
 
         if not formats and info.get('is_geo_restricted'):
index 88b6baa316b54eb58e3deb5d69f2fd04c1795bba..b7fe082b9c00e6600b6bf9f6e29849a3fd4eb3b6 100644 (file)
@@ -3,31 +3,51 @@
 
 from .common import InfoExtractor
 from ..utils import (
-    clean_html,
     determine_ext,
     extract_attributes,
-    get_element_by_class,
     int_or_none,
     parse_duration,
-    parse_iso8601,
 )
 
 
 class TV5MondePlusIE(InfoExtractor):
     IE_DESC = 'TV5MONDE+'
-    _VALID_URL = r'https?://(?:www\.)?tv5mondeplus\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://www.tv5mondeplus.com/toutes-les-videos/documentaire/tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
-        'md5': '12130fc199f020673138a83466542ec6',
+    _VALID_URL = r'https?://(?:www\.)?(?:tv5mondeplus|revoir\.tv5monde)\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        # movie
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/cinema/rendez-vous-a-atlit',
+        'md5': '8cbde5ea7b296cf635073e27895e227f',
         'info_dict': {
-            'id': 'tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
+            'id': '822a4756-0712-7329-1859-a13ac7fd1407',
+            'display_id': 'rendez-vous-a-atlit',
             'ext': 'mp4',
-            'title': 'Tdah, mon amour - Enfants',
-            'description': 'md5:230e3aca23115afcf8006d1bece6df74',
-            'upload_date': '20170401',
-            'timestamp': 1491022860,
-        }
-    }
+            'title': 'Rendez-vous à Atlit',
+            'description': 'md5:2893a4c5e1dbac3eedff2d87956e4efb',
+            'upload_date': '20200130',
+        },
+    }, {
+        # series episode
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/c-est-la-vie-ennemie-juree',
+        'info_dict': {
+            'id': '0df7007c-4900-3936-c601-87a13a93a068',
+            'display_id': 'c-est-la-vie-ennemie-juree',
+            'ext': 'mp4',
+            'title': "C'est la vie - Ennemie jurée",
+            'description': 'md5:dfb5c63087b6f35fe0cc0af4fe44287e',
+            'upload_date': '20200130',
+            'series': "C'est la vie",
+            'episode': 'Ennemie jurée',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/neuf-jours-en-hiver-neuf-jours-en-hiver',
+        'only_matching': True,
+    }, {
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/info-societe/le-journal-de-la-rts-edition-du-30-01-20-19h30',
+        'only_matching': True,
+    }]
     _GEO_BYPASS = False
 
     def _real_extract(self, url):
@@ -37,11 +57,7 @@ def _real_extract(self, url):
         if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage:
             self.raise_geo_restricted(countries=['FR'])
 
-        series = get_element_by_class('video-detail__title', webpage)
-        title = episode = get_element_by_class(
-            'video-detail__subtitle', webpage) or series
-        if series and series != title:
-            title = '%s - %s' % (series, title)
+        title = episode = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
         vpl_data = extract_attributes(self._search_regex(
             r'(<[^>]+class="video_player_loader"[^>]+>)',
             webpage, 'video player loader'))
@@ -65,15 +81,37 @@ def _real_extract(self, url):
                 })
         self._sort_formats(formats)
 
+        description = self._html_search_regex(
+            r'(?s)<div[^>]+class=["\']episode-texte[^>]+>(.+?)</div>', webpage,
+            'description', fatal=False)
+
+        series = self._html_search_regex(
+            r'<p[^>]+class=["\']episode-emission[^>]+>([^<]+)', webpage,
+            'series', default=None)
+
+        if series and series != title:
+            title = '%s - %s' % (series, title)
+
+        upload_date = self._search_regex(
+            r'(?:date_publication|publish_date)["\']\s*:\s*["\'](\d{4}_\d{2}_\d{2})',
+            webpage, 'upload date', default=None)
+        if upload_date:
+            upload_date = upload_date.replace('_', '')
+
+        video_id = self._search_regex(
+            (r'data-guid=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
+             r'id_contenu["\']\s:\s*(\d+)'), webpage, 'video id',
+            default=display_id)
+
         return {
-            'id': display_id,
+            'id': video_id,
             'display_id': display_id,
             'title': title,
-            'description': clean_html(get_element_by_class('video-detail__description', webpage)),
+            'description': description,
             'thumbnail': vpl_data.get('data-image'),
             'duration': int_or_none(vpl_data.get('data-duration')) or parse_duration(self._html_search_meta('duration', webpage)),
-            'timestamp': parse_iso8601(self._html_search_meta('uploadDate', webpage)),
+            'upload_date': upload_date,
             'formats': formats,
-            'episode': episode,
             'series': series,
+            'episode': episode,
         }
index 0b863df2ff4ad214162c6187ac7aaa65fe3fc6c9..443f46e8a3537165d620c2db8863634e9f922ab6 100644 (file)
@@ -9,8 +9,8 @@
 
 
 class TVAIE(InfoExtractor):
-    _VALID_URL = r'https?://videos\.tva\.ca/details/_(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P<id>\d+)'
+    _TESTS = [{
         'url': 'https://videos.tva.ca/details/_5596811470001',
         'info_dict': {
             'id': '5596811470001',
@@ -24,7 +24,10 @@ class TVAIE(InfoExtractor):
             # m3u8 download
             'skip_download': True,
         }
-    }
+    }, {
+        'url': 'https://video.tva.ca/details/_5596811470001',
+        'only_matching': True,
+    }]
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
index d82d48f94ecc026629479d84aa8ca513eaf0db26..3c2450dd0c8733d3a96a1a842c54505294b70ca6 100644 (file)
@@ -6,7 +6,6 @@
 from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
-    compat_str,
     compat_urlparse,
 )
 from ..utils import (
@@ -15,9 +14,7 @@
     int_or_none,
     parse_iso8601,
     qualities,
-    smuggle_url,
     try_get,
-    unsmuggle_url,
     update_url_query,
     url_or_none,
 )
@@ -235,11 +232,6 @@ class TVPlayIE(InfoExtractor):
     ]
 
     def _real_extract(self, url):
-        url, smuggled_data = unsmuggle_url(url, {})
-        self._initialize_geo_bypass({
-            'countries': smuggled_data.get('geo_countries'),
-        })
-
         video_id = self._match_id(url)
         geo_country = self._search_regex(
             r'https?://[^/]+\.([a-z]{2})', url,
@@ -285,8 +277,6 @@ def _real_extract(self, url):
                     'ext': ext,
                 }
                 if video_url.startswith('rtmp'):
-                    if smuggled_data.get('skip_rtmp'):
-                        continue
                     m = re.search(
                         r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
                     if not m:
@@ -347,115 +337,80 @@ class ViafreeIE(InfoExtractor):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:www\.)?
-                        viafree\.
-                        (?:
-                            (?:dk|no)/programmer|
-                            se/program
-                        )
-                        /(?:[^/]+/)+(?P<id>[^/?#&]+)
+                        viafree\.(?P<country>dk|no|se)
+                        /(?P<id>program(?:mer)?/(?:[^/]+/)+[^/?#&]+)
                     '''
     _TESTS = [{
-        'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
+        'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
         'info_dict': {
-            'id': '395375',
+            'id': '757786',
             'ext': 'mp4',
-            'title': 'Husräddarna S02E02',
-            'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
-            'series': 'Husräddarna',
-            'season': 'Säsong 2',
+            'title': 'Det beste vorspielet - Sesong 2 - Episode 1',
+            'description': 'md5:b632cb848331404ccacd8cd03e83b4c3',
+            'series': 'Det beste vorspielet',
             'season_number': 2,
-            'duration': 2576,
-            'timestamp': 1400596321,
-            'upload_date': '20140520',
+            'duration': 1116,
+            'timestamp': 1471200600,
+            'upload_date': '20160814',
         },
         'params': {
             'skip_download': True,
         },
-        'add_ie': [TVPlayIE.ie_key()],
     }, {
         # with relatedClips
         'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
-        'info_dict': {
-            'id': '758770',
-            'ext': 'mp4',
-            'title': 'Sommaren med YouTube-stjärnorna S01E01',
-            'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
-            'series': 'Sommaren med YouTube-stjärnorna',
-            'season': 'Säsong 1',
-            'season_number': 1,
-            'duration': 1326,
-            'timestamp': 1470905572,
-            'upload_date': '20160811',
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'add_ie': [TVPlayIE.ie_key()],
+        'only_matching': True,
     }, {
         # Different og:image URL schema
         'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
         'only_matching': True,
     }, {
-        'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
+        'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
         'only_matching': True,
     }, {
         'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
         'only_matching': True,
     }]
+    _GEO_BYPASS = False
 
     @classmethod
     def suitable(cls, url):
         return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        country, path = re.match(self._VALID_URL, url).groups()
+        content = self._download_json(
+            'https://viafree-content.mtg-api.com/viafree-content/v1/%s/path/%s' % (country, path), path)
+        program = content['_embedded']['viafreeBlocks'][0]['_embedded']['program']
+        guid = program['guid']
+        meta = content['meta']
+        title = meta['title']
 
-        webpage = self._download_webpage(url, video_id)
+        try:
+            stream_href = self._download_json(
+                program['_links']['streamLink']['href'], guid,
+                headers=self.geo_verification_headers())['embedded']['prioritizedStreams'][0]['links']['stream']['href']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                self.raise_geo_restricted(countries=[country])
+            raise
+
+        formats = self._extract_m3u8_formats(stream_href, guid, 'mp4')
+        self._sort_formats(formats)
+        episode = program.get('episode') or {}
 
-        data = self._parse_json(
-            self._search_regex(
-                r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script',
-                webpage, 'data', default='{}'),
-            video_id, transform_source=lambda x: re.sub(
-                r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
-                'null', x), fatal=False)
-
-        video_id = None
-
-        if data:
-            video_id = try_get(
-                data, lambda x: x['context']['dispatcher']['stores'][
-                    'ContentPageProgramStore']['currentVideo']['id'],
-                compat_str)
-
-        # Fallback #1 (extract from og:image URL schema)
-        if not video_id:
-            thumbnail = self._og_search_thumbnail(webpage, default=None)
-            if thumbnail:
-                video_id = self._search_regex(
-                    # Patterns seen:
-                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
-                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
-                    r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
-                    thumbnail, 'video id', default=None)
-
-        # Fallback #2. Extract from raw JSON string.
-        # May extract wrong video id if relatedClips is present.
-        if not video_id:
-            video_id = self._search_regex(
-                r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
-                webpage, 'video id')
-
-        return self.url_result(
-            smuggle_url(
-                'mtg:%s' % video_id,
-                {
-                    'geo_countries': [
-                        compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
-                    # rtmp host mtgfs.fplive.net for viafree is unresolvable
-                    'skip_rtmp': True,
-                }),
-            ie=TVPlayIE.ie_key(), video_id=video_id)
+        return {
+            'id': guid,
+            'title': title,
+            'thumbnail': meta.get('image'),
+            'description': meta.get('description'),
+            'series': episode.get('seriesTitle'),
+            'episode_number': int_or_none(episode.get('episodeNumber')),
+            'season_number': int_or_none(episode.get('seasonNumber')),
+            'duration': int_or_none(try_get(program, lambda x: x['video']['duration']['milliseconds']), 1000),
+            'timestamp': parse_iso8601(try_get(program, lambda x: x['availability']['start'])),
+            'formats': formats,
+        }
 
 
 class TVPlayHomeIE(InfoExtractor):
index 1d66eeaff6e80cd1c629f79ad98a1a68f1d52564..74d14049b482a702bf464a40f2e5f361dc7cd72a 100644 (file)
@@ -17,8 +17,8 @@ class TwentyFourVideoIE(InfoExtractor):
     _VALID_URL = r'''(?x)
                     https?://
                         (?P<host>
-                            (?:(?:www|porno)\.)?24video\.
-                            (?:net|me|xxx|sexy?|tube|adult|site)
+                            (?:(?:www|porno?)\.)?24video\.
+                            (?:net|me|xxx|sexy?|tube|adult|site|vip)
                         )/
                         (?:
                             video/(?:(?:view|xml)/)?|
@@ -59,6 +59,12 @@ class TwentyFourVideoIE(InfoExtractor):
     }, {
         'url': 'https://porno.24video.net/video/2640421-vsya-takaya-gibkaya-i-v-masle',
         'only_matching': True,
+    }, {
+        'url': 'https://www.24video.vip/video/view/1044982',
+        'only_matching': True,
+    }, {
+        'url': 'https://porn.24video.net/video/2640421-vsya-takay',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index a8c2502af8132834a34b8ef9c8ade935dd432604..e211cd4c84cb4c713e937cf0e1eef83de7c40a93 100644 (file)
@@ -21,6 +21,8 @@
     orderedSet,
     parse_duration,
     parse_iso8601,
+    qualities,
+    str_or_none,
     try_get,
     unified_timestamp,
     update_url_query,
@@ -50,8 +52,14 @@ def _handle_error(self, response):
 
     def _call_api(self, path, item_id, *args, **kwargs):
         headers = kwargs.get('headers', {}).copy()
-        headers['Client-ID'] = self._CLIENT_ID
-        kwargs['headers'] = headers
+        headers.update({
+            'Accept': 'application/vnd.twitchtv.v5+json; charset=UTF-8',
+            'Client-ID': self._CLIENT_ID,
+        })
+        kwargs.update({
+            'headers': headers,
+            'expected_status': (400, 410),
+        })
         response = self._download_json(
             '%s/%s' % (self._API_BASE, path), item_id,
             *args, **compat_kwargs(kwargs))
@@ -186,12 +194,27 @@ def _extract_info(self, info):
             is_live = False
         else:
             is_live = None
+        _QUALITIES = ('small', 'medium', 'large')
+        quality_key = qualities(_QUALITIES)
+        thumbnails = []
+        preview = info.get('preview')
+        if isinstance(preview, dict):
+            for thumbnail_id, thumbnail_url in preview.items():
+                thumbnail_url = url_or_none(thumbnail_url)
+                if not thumbnail_url:
+                    continue
+                if thumbnail_id not in _QUALITIES:
+                    continue
+                thumbnails.append({
+                    'url': thumbnail_url,
+                    'preference': quality_key(thumbnail_id),
+                })
         return {
             'id': info['_id'],
             'title': info.get('title') or 'Untitled Broadcast',
             'description': info.get('description'),
             'duration': int_or_none(info.get('length')),
-            'thumbnail': info.get('preview'),
+            'thumbnails': thumbnails,
             'uploader': info.get('channel', {}).get('display_name'),
             'uploader_id': info.get('channel', {}).get('name'),
             'timestamp': parse_iso8601(info.get('recorded_at')),
@@ -572,11 +595,19 @@ def suitable(cls, url):
                 else super(TwitchStreamIE, cls).suitable(url))
 
     def _real_extract(self, url):
-        channel_id = self._match_id(url)
+        channel_name = self._match_id(url)
+
+        access_token = self._call_api(
+            'api/channels/%s/access_token' % channel_name, channel_name,
+            'Downloading access token JSON')
+
+        token = access_token['token']
+        channel_id = compat_str(self._parse_json(
+            token, channel_name)['channel_id'])
 
         stream = self._call_api(
-            'kraken/streams/%s?stream_type=all' % channel_id, channel_id,
-            'Downloading stream JSON').get('stream')
+            'kraken/streams/%s?stream_type=all' % channel_id,
+            channel_id, 'Downloading stream JSON').get('stream')
 
         if not stream:
             raise ExtractorError('%s is offline' % channel_id, expected=True)
@@ -585,11 +616,9 @@ def _real_extract(self, url):
         # (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing
         # an invalid m3u8 URL. Working around by use of original channel name from stream
         # JSON and fallback to lowercase if it's not available.
-        channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
-
-        access_token = self._call_api(
-            'api/channels/%s/access_token' % channel_id, channel_id,
-            'Downloading channel access token')
+        channel_name = try_get(
+            stream, lambda x: x['channel']['name'],
+            compat_str) or channel_name.lower()
 
         query = {
             'allow_source': 'true',
@@ -600,11 +629,11 @@ def _real_extract(self, url):
             'playlist_include_framerate': 'true',
             'segment_preference': '4',
             'sig': access_token['sig'].encode('utf-8'),
-            'token': access_token['token'].encode('utf-8'),
+            'token': token.encode('utf-8'),
         }
         formats = self._extract_m3u8_formats(
             '%s/api/channel/hls/%s.m3u8?%s'
-            % (self._USHER_BASE, channel_id, compat_urllib_parse_urlencode(query)),
+            % (self._USHER_BASE, channel_name, compat_urllib_parse_urlencode(query)),
             channel_id, 'mp4')
         self._prefer_source(formats)
 
@@ -627,8 +656,8 @@ def _real_extract(self, url):
             })
 
         return {
-            'id': compat_str(stream['_id']),
-            'display_id': channel_id,
+            'id': str_or_none(stream.get('_id')) or channel_id,
+            'display_id': channel_name,
             'title': title,
             'description': description,
             'thumbnails': thumbnails,
@@ -643,7 +672,14 @@ def _real_extract(self, url):
 
 class TwitchClipsIE(TwitchBaseIE):
     IE_NAME = 'twitch:clips'
-    _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|
+                            (?:(?:www|go|m)\.)?twitch\.tv/[^/]+/clip/
+                        )
+                        (?P<id>[^/?#&]+)
+                    '''
 
     _TESTS = [{
         'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
@@ -669,6 +705,12 @@ class TwitchClipsIE(TwitchBaseIE):
     }, {
         'url': 'https://clips.twitch.tv/embed?clip=InquisitiveBreakableYogurtJebaited',
         'only_matching': True,
+    }, {
+        'url': 'https://m.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
+        'only_matching': True,
+    }, {
+        'url': 'https://go.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index 5f8d90fb4e5c13d19cd0601f86a44eec5568fadc..4284487db4994b25990c4151afca71c48a271751 100644 (file)
@@ -251,10 +251,10 @@ class TwitterIE(TwitterBaseIE):
         'info_dict': {
             'id': '700207533655363584',
             'ext': 'mp4',
-            'title': 'Simon Vertugo - BEAT PROD: @suhmeduh #Damndaniel',
+            'title': 'simon vetugo - BEAT PROD: @suhmeduh #Damndaniel',
             'description': 'BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ',
             'thumbnail': r're:^https?://.*\.jpg',
-            'uploader': 'Simon Vertugo',
+            'uploader': 'simon vetugo',
             'uploader_id': 'simonvertugo',
             'duration': 30.0,
             'timestamp': 1455777459,
@@ -376,6 +376,10 @@ class TwitterIE(TwitterBaseIE):
         # Twitch Clip Embed
         'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
         'only_matching': True,
+    }, {
+        # promo_video_website card
+        'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -458,10 +462,11 @@ def get_binding_value(k):
                     return try_get(o, lambda x: x[x['type'].lower() + '_value'])
 
                 card_name = card['name'].split(':')[-1]
-                if card_name == 'amplify':
-                    formats = self._extract_formats_from_vmap_url(
-                        get_binding_value('amplify_url_vmap'),
-                        get_binding_value('amplify_content_id') or twid)
+                if card_name in ('amplify', 'promo_video_website'):
+                    is_amplify = card_name == 'amplify'
+                    vmap_url = get_binding_value('amplify_url_vmap') if is_amplify else get_binding_value('player_stream_url')
+                    content_id = get_binding_value('%s_content_id' % (card_name if is_amplify else 'player'))
+                    formats = self._extract_formats_from_vmap_url(vmap_url, content_id or twid)
                     self._sort_formats(formats)
 
                     thumbnails = []
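The amplify branch above is generalized to promo_video_website cards. A rough standalone sketch of the binding-value lookup it relies on; the binding_values dict below is fabricated sample data, not a real card payload:

```python
# Sketch of the card handling added above: Twitter card data stores each value
# under a key derived from its "type" field (e.g. STRING -> string_value).
def get_binding_value(binding_values, k):
    o = binding_values.get(k) or {}
    return o.get(o.get('type', '').lower() + '_value')

# Fabricated sample data for illustration only.
binding_values = {
    'player_stream_url': {'type': 'STRING', 'string_value': 'https://example.com/video.m3u8'},
    'player_content_id': {'type': 'STRING', 'string_value': '1163218564784017422'},
}

card_name = 'promo_video_website'
is_amplify = card_name == 'amplify'
vmap_url = (get_binding_value(binding_values, 'amplify_url_vmap') if is_amplify
            else get_binding_value(binding_values, 'player_stream_url'))
content_id = get_binding_value(
    binding_values, '%s_content_id' % (card_name if is_amplify else 'player'))
print(vmap_url, content_id)
```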
@@ -573,6 +578,18 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
     IE_NAME = 'twitter:broadcast'
     _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
 
+    _TEST = {
+        # untitled Periscope video
+        'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
+        'info_dict': {
+            'id': '1yNGaQLWpejGj',
+            'ext': 'mp4',
+            'title': 'Andrea May Sahouri - Periscope Broadcast',
+            'uploader': 'Andrea May Sahouri',
+            'uploader_id': '1PXEdBZWpGwKe',
+        },
+    }
+
     def _real_extract(self, url):
         broadcast_id = self._match_id(url)
         broadcast = self._call_api(
index 08f0c072e28b09dfbbde2f662b52f5de4cf46f3d..628adf2199ca7a1d7ebbc9ffece05e2447f59133 100644 (file)
@@ -2,12 +2,17 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_urlencode,
+)
 from ..utils import (
     clean_html,
     int_or_none,
     parse_duration,
+    parse_iso8601,
+    qualities,
     update_url_query,
-    str_or_none,
 )
 
 
@@ -16,21 +21,25 @@ class UOLIE(InfoExtractor):
     _VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
     _TESTS = [{
         'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
-        'md5': '25291da27dc45e0afb5718a8603d3816',
+        'md5': '4f1e26683979715ff64e4e29099cf020',
         'info_dict': {
             'id': '15951931',
             'ext': 'mp4',
             'title': 'Miss simpatia é encontrada morta',
             'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
+            'timestamp': 1470421860,
+            'upload_date': '20160805',
         }
     }, {
         'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
-        'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
+        'md5': '2850a0e8dfa0a7307e04a96c5bdc5bc2',
         'info_dict': {
             'id': '15954259',
             'ext': 'mp4',
             'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
             'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
+            'timestamp': 1470674520,
+            'upload_date': '20160808',
         }
     }, {
         'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
@@ -55,91 +64,55 @@ class UOLIE(InfoExtractor):
         'only_matching': True,
     }]
 
-    _FORMATS = {
-        '2': {
-            'width': 640,
-            'height': 360,
-        },
-        '5': {
-            'width': 1280,
-            'height': 720,
-        },
-        '6': {
-            'width': 426,
-            'height': 240,
-        },
-        '7': {
-            'width': 1920,
-            'height': 1080,
-        },
-        '8': {
-            'width': 192,
-            'height': 144,
-        },
-        '9': {
-            'width': 568,
-            'height': 320,
-        },
-        '11': {
-            'width': 640,
-            'height': 360,
-        }
-    }
-
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        media_id = None
-
-        if video_id.isdigit():
-            media_id = video_id
-
-        if not media_id:
-            embed_page = self._download_webpage(
-                'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
-                video_id, 'Downloading embed page', fatal=False)
-            if embed_page:
-                media_id = self._search_regex(
-                    (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
-                    embed_page, 'media id', default=None)
-
-        if not media_id:
-            webpage = self._download_webpage(url, video_id)
-            media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
 
         video_data = self._download_json(
-            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
-            media_id)['item']
+            # https://api.mais.uol.com.br/apiuol/v4/player/data/[MEDIA_ID]
+            'https://api.mais.uol.com.br/apiuol/v3/media/detail/' + video_id,
+            video_id)['item']
+        media_id = compat_str(video_data['mediaId'])
         title = video_data['title']
+        ver = video_data.get('revision', 2)
 
-        query = {
-            'ver': video_data.get('numRevision', 2),
-            'r': 'http://mais.uol.com.br',
-        }
-        for k in ('token', 'sign'):
-            v = video_data.get(k)
-            if v:
-                query[k] = v
-
+        uol_formats = self._download_json(
+            'https://croupier.mais.uol.com.br/v3/formats/%s/jsonp' % media_id,
+            media_id)
+        quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
         formats = []
-        for f in video_data.get('formats', []):
+        for format_id, f in uol_formats.items():
+            if not isinstance(f, dict):
+                continue
             f_url = f.get('url') or f.get('secureUrl')
             if not f_url:
                 continue
+            query = {
+                'ver': ver,
+                'r': 'http://mais.uol.com.br',
+            }
+            for k in ('token', 'sign'):
+                v = f.get(k)
+                if v:
+                    query[k] = v
             f_url = update_url_query(f_url, query)
-            format_id = str_or_none(f.get('id'))
-            if format_id == '10':
-                formats.extend(self._extract_m3u8_formats(
-                    f_url, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
+            format_id = format_id
+            if format_id == 'HLS':
+                m3u8_formats = self._extract_m3u8_formats(
+                    f_url, media_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False)
+                encoded_query = compat_urllib_parse_urlencode(query)
+                for m3u8_f in m3u8_formats:
+                    m3u8_f['extra_param_to_segment_url'] = encoded_query
+                    m3u8_f['url'] = update_url_query(m3u8_f['url'], query)
+                formats.extend(m3u8_formats)
                 continue
-            fmt = {
+            formats.append({
                 'format_id': format_id,
                 'url': f_url,
-                'source_preference': 1,
-            }
-            fmt.update(self._FORMATS.get(format_id, {}))
-            formats.append(fmt)
-        self._sort_formats(formats, ('height', 'width', 'source_preference', 'tbr', 'ext'))
+                'quality': quality(format_id),
+                'preference': -1,
+            })
+        self._sort_formats(formats)
 
         tags = []
         for tag in video_data.get('tags', []):
@@ -148,12 +121,24 @@ def _real_extract(self, url):
                 continue
             tags.append(tag_description)
 
+        thumbnails = []
+        for q in ('Small', 'Medium', 'Wmedium', 'Large', 'Wlarge', 'Xlarge'):
+            q_url = video_data.get('thumb' + q)
+            if not q_url:
+                continue
+            thumbnails.append({
+                'id': q,
+                'url': q_url,
+            })
+
         return {
             'id': media_id,
             'title': title,
-            'description': clean_html(video_data.get('desMedia')),
-            'thumbnail': video_data.get('thumbnail'),
-            'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
+            'description': clean_html(video_data.get('description')),
+            'thumbnails': thumbnails,
+            'duration': parse_duration(video_data.get('duration')),
             'tags': tags,
             'formats': formats,
+            'timestamp': parse_iso8601(video_data.get('publishDate'), ' '),
+            'view_count': int_or_none(video_data.get('viewsQtty')),
         }
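The reworked UOL extractor above ranks the croupier format ids with youtube-dl's `qualities` helper. A self-contained sketch of its behavior (mirroring utils.qualities rather than importing it): later entries in the preference list rank higher, and unknown ids fall back to -1.

```python
# Sketch of youtube-dl's utils.qualities: returns a ranking function where a
# format id's rank is its index in the preference list, or -1 if unknown.
def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

# The preference list used in the UOL hunk above; 'HLS' is intentionally
# absent, as that id is handled separately via m3u8 extraction.
quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
print(quality('1080p'), quality('mobile'), quality('HLS'))
```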
index 8fdfd743d04a6ee4319a64337ea67398d2243896..e37499512856234d9b989aec53dfd59c42647300 100644 (file)
@@ -1,35 +1,50 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-import time
+import functools
 import hashlib
 import json
 import random
+import re
+import time
 
 from .adobepass import AdobePassIE
-from .youtube import YoutubeIE
 from .common import InfoExtractor
+from .youtube import YoutubeIE
 from ..compat import (
     compat_HTTPError,
     compat_str,
 )
 from ..utils import (
+    clean_html,
     ExtractorError,
     int_or_none,
+    OnDemandPagedList,
     parse_age_limit,
     str_or_none,
     try_get,
 )
 
 
-class ViceIE(AdobePassIE):
+class ViceBaseIE(InfoExtractor):
+    def _call_api(self, resource, resource_key, resource_id, locale, fields, args=''):
+        return self._download_json(
+            'https://video.vice.com/api/v1/graphql', resource_id, query={
+                'query': '''{
+  %s(locale: "%s", %s: "%s"%s) {
+    %s
+  }
+}''' % (resource, locale, resource_key, resource_id, args, fields),
+            })['data'][resource]
+
+
+class ViceIE(ViceBaseIE, AdobePassIE):
     IE_NAME = 'vice'
-    _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?viceland)\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
     _TESTS = [{
         'url': 'https://video.vice.com/en_us/video/pet-cremator/58c69e38a55424f1227dc3f7',
         'info_dict': {
-            'id': '5e647f0125e145c9aef2069412c0cbde',
+            'id': '58c69e38a55424f1227dc3f7',
             'ext': 'mp4',
             'title': '10 Questions You Always Wanted To Ask: Pet Cremator',
             'description': 'md5:fe856caacf61fe0e74fab15ce2b07ca5',
@@ -43,17 +58,16 @@ class ViceIE(AdobePassIE):
             # m3u8 download
             'skip_download': True,
         },
-        'add_ie': ['UplynkPreplay'],
     }, {
         # geo restricted to US
         'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
         'info_dict': {
-            'id': '930c0ad1f47141cc955087eecaddb0e2',
+            'id': '5816510690b70e6c5fd39a56',
             'ext': 'mp4',
-            'uploader': 'waypoint',
+            'uploader': 'vice',
             'title': 'The Signal From Tölva',
             'description': 'md5:3927e3c79f9e8094606a2b3c5b5e55d5',
-            'uploader_id': '57f7d621e05ca860fa9ccaf9',
+            'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1477941983,
             'upload_date': '20161031',
         },
@@ -61,15 +75,14 @@ class ViceIE(AdobePassIE):
             # m3u8 download
             'skip_download': True,
         },
-        'add_ie': ['UplynkPreplay'],
     }, {
         'url': 'https://video.vice.com/alps/video/ulfs-wien-beruchtigste-grafitti-crew-part-1/581b12b60a0e1f4c0fb6ea2f',
         'info_dict': {
             'id': '581b12b60a0e1f4c0fb6ea2f',
             'ext': 'mp4',
             'title': 'ULFs - Wien berüchtigste Grafitti Crew - Part 1',
-            'description': '<p>Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.</p>',
-            'uploader': 'VICE',
+            'description': 'Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.',
+            'uploader': 'vice',
             'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1485368119,
             'upload_date': '20170125',
@@ -78,9 +91,7 @@ class ViceIE(AdobePassIE):
         'params': {
             # AES-encrypted m3u8
             'skip_download': True,
-            'proxy': '127.0.0.1:8118',
         },
-        'add_ie': ['UplynkPreplay'],
     }, {
         'url': 'https://video.vice.com/en_us/video/pizza-show-trailer/56d8c9a54d286ed92f7f30e4',
         'only_matching': True,
@@ -98,7 +109,7 @@ class ViceIE(AdobePassIE):
     @staticmethod
     def _extract_urls(webpage):
         return re.findall(
-            r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]+)',
+            r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})',
             webpage)
 
     @staticmethod
@@ -109,31 +120,16 @@ def _extract_url(webpage):
     def _real_extract(self, url):
         locale, video_id = re.match(self._VALID_URL, url).groups()
 
-        webpage = self._download_webpage(
-            'https://video.vice.com/%s/embed/%s' % (locale, video_id),
-            video_id)
-
-        video = self._parse_json(
-            self._search_regex(
-                r'PREFETCH_DATA\s*=\s*({.+?})\s*;\s*\n', webpage,
-                'app state'), video_id)['video']
-        video_id = video.get('vms_id') or video.get('id') or video_id
-        title = video['title']
-        is_locked = video.get('locked')
+        video = self._call_api('videos', 'id', video_id, locale, '''body
+    locked
+    rating
+    thumbnail_url
+    title''')[0]
+        title = video['title'].strip()
         rating = video.get('rating')
-        thumbnail = video.get('thumbnail_url')
-        duration = int_or_none(video.get('duration'))
-        series = try_get(
-            video, lambda x: x['episode']['season']['show']['title'],
-            compat_str)
-        episode_number = try_get(
-            video, lambda x: x['episode']['episode_number'])
-        season_number = try_get(
-            video, lambda x: x['episode']['season']['season_number'])
-        uploader = None
 
         query = {}
-        if is_locked:
+        if video.get('locked'):
             resource = self._get_mvpd_resource(
                 'VICELAND', title, video_id, rating)
             query['tvetoken'] = self._extract_mvpd_auth(
@@ -148,12 +144,9 @@ def _real_extract(self, url):
         query.update({
             'exp': exp,
             'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
-            '_ad_blocked': None,
-            '_ad_unit': '',
-            '_debug': '',
+            'skipadstitching': 1,
             'platform': 'desktop',
             'rn': random.randint(10000, 100000),
-            'fbprebidtoken': '',
         })
 
         try:
@@ -169,85 +162,94 @@ def _real_extract(self, url):
             raise
 
         video_data = preplay['video']
-        base = video_data['base']
-        uplynk_preplay_url = preplay['preplayURL']
-        episode = video_data.get('episode', {})
-        channel = video_data.get('channel', {})
+        formats = self._extract_m3u8_formats(
+            preplay['playURL'], video_id, 'mp4', 'm3u8_native')
+        self._sort_formats(formats)
+        episode = video_data.get('episode') or {}
+        channel = video_data.get('channel') or {}
+        season = video_data.get('season') or {}
 
         subtitles = {}
-        cc_url = preplay.get('ccURL')
-        if cc_url:
-            subtitles['en'] = [{
+        for subtitle in preplay.get('subtitleURLs', []):
+            cc_url = subtitle.get('url')
+            if not cc_url:
+                continue
+            language_code = try_get(subtitle, lambda x: x['languages'][0]['language_code'], compat_str) or 'en'
+            subtitles.setdefault(language_code, []).append({
                 'url': cc_url,
                 'url': cc_url,
-            }]
+            })
 
         return {
-            '_type': 'url_transparent',
-            'url': uplynk_preplay_url,
+            'formats': formats,
             'id': video_id,
             'title': title,
-            'description': base.get('body') or base.get('display_body'),
-            'thumbnail': thumbnail,
-            'duration': int_or_none(video_data.get('video_duration')) or duration,
+            'description': clean_html(video.get('body')),
+            'thumbnail': video.get('thumbnail_url'),
+            'duration': int_or_none(video_data.get('video_duration')),
             'timestamp': int_or_none(video_data.get('created_at'), 1000),
-            'age_limit': parse_age_limit(video_data.get('video_rating')),
-            'series': video_data.get('show_title') or series,
-            'episode_number': int_or_none(episode.get('episode_number') or episode_number),
+            'age_limit': parse_age_limit(video_data.get('video_rating') or rating),
+            'series': try_get(video_data, lambda x: x['show']['base']['display_title'], compat_str),
+            'episode_number': int_or_none(episode.get('episode_number')),
             'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
-            'season_number': int_or_none(season_number),
-            'season_id': str_or_none(episode.get('season_id')),
-            'uploader': channel.get('base', {}).get('title') or channel.get('name') or uploader,
+            'season_number': int_or_none(season.get('season_number')),
+            'season_id': str_or_none(season.get('id') or video_data.get('season_id')),
+            'uploader': channel.get('name'),
             'uploader_id': str_or_none(channel.get('id')),
             'subtitles': subtitles,
-            'ie_key': 'UplynkPreplay',
         }
 
 
-class ViceShowIE(InfoExtractor):
+class ViceShowIE(ViceBaseIE):
     IE_NAME = 'vice:show'
-    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
-
-    _TEST = {
-        'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
+    _VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
+    _PAGE_SIZE = 25
+    _TESTS = [{
+        'url': 'https://video.vice.com/en_us/show/fck-thats-delicious',
         'info_dict': {
-            'id': 'fuck-thats-delicious-2',
-            'title': "Fuck, That's Delicious",
-            'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
+            'id': '57a2040c8cb727dec794c901',
+            'title': 'F*ck, That’s Delicious',
+            'description': 'The life and eating habits of rap’s greatest bon vivant, Action Bronson.',
         },
-        'playlist_count': 17,
-    }
+        'playlist_mincount': 64,
+    }, {
+        'url': 'https://www.vicetv.com/en_us/show/fck-thats-delicious',
+        'only_matching': True,
+    }]
 
-    def _real_extract(self, url):
-        show_id = self._match_id(url)
-        webpage = self._download_webpage(url, show_id)
+    def _fetch_page(self, locale, show_id, page):
+        videos = self._call_api('videos', 'show_id', show_id, locale, '''body
+    id
+    url''', ', page: %d, per_page: %d' % (page + 1, self._PAGE_SIZE))
+        for video in videos:
+            yield self.url_result(
+                video['url'], ViceIE.ie_key(), video.get('id'))
 
-        entries = [
-            self.url_result(video_url, ViceIE.ie_key())
-            for video_url, _ in re.findall(
-                r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
-                % ViceIE._VALID_URL, webpage)]
+    def _real_extract(self, url):
+        locale, display_id = re.match(self._VALID_URL, url).groups()
+        show = self._call_api('shows', 'slug', display_id, locale, '''dek
+    id
+    title''')[0]
+        show_id = show['id']
 
-        title = self._search_regex(
-            r'<title>(.+?)</title>', webpage, 'title', default=None)
-        if title:
-            title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
-        description = self._html_search_meta(
-            'description', webpage, 'description')
+        entries = OnDemandPagedList(
+            functools.partial(self._fetch_page, locale, show_id),
+            self._PAGE_SIZE)
 
-        return self.playlist_result(entries, show_id, title, description)
+        return self.playlist_result(
+            entries, show_id, show.get('title'), show.get('dek'))
 
 
-class ViceArticleIE(InfoExtractor):
+class ViceArticleIE(ViceBaseIE):
     IE_NAME = 'vice:article'
-    _VALID_URL = r'https://www\.vice\.com/[^/]+/article/(?P<id>[^?#]+)'
+    _VALID_URL = r'https://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'
 
     _TESTS = [{
         'url': 'https://www.vice.com/en_us/article/on-set-with-the-woman-making-mormon-porn-in-utah',
         'info_dict': {
-            'id': '41eae2a47b174a1398357cec55f1f6fc',
+            'id': '58dc0a3dee202d2a0ccfcbd8',
             'ext': 'mp4',
-            'title': 'Mormon War on Porn ',
-            'description': 'md5:6394a8398506581d0346b9ab89093fef',
+            'title': 'Mormon War on Porn',
+            'description': 'md5:1c5d91fe25fa8aa304f9def118b92dbf',
             'uploader': 'vice',
             'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1491883129,
@@ -258,10 +260,10 @@ class ViceArticleIE(InfoExtractor):
             # AES-encrypted m3u8
             'skip_download': True,
         },
-        'add_ie': ['UplynkPreplay'],
+        'add_ie': [ViceIE.ie_key()],
     }, {
         'url': 'https://www.vice.com/en_us/article/how-to-hack-a-car',
-        'md5': '7fe8ebc4fa3323efafc127b82bd821d9',
+        'md5': '13010ee0bc694ea87ec40724397c2349',
         'info_dict': {
             'id': '3jstaBeXgAs',
             'ext': 'mp4',
@@ -271,15 +273,15 @@ class ViceArticleIE(InfoExtractor):
             'uploader_id': 'MotherboardTV',
             'upload_date': '20140529',
         },
-        'add_ie': ['Youtube'],
+        'add_ie': [YoutubeIE.ie_key()],
     }, {
         'url': 'https://www.vice.com/en_us/article/znm9dx/karley-sciortino-slutever-reloaded',
         'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
         'info_dict': {
-            'id': 'e2ed435eb67e43efb66e6ef9a6930a88',
+            'id': '57f41d3556a0a80f54726060',
             'ext': 'mp4',
             'title': "Making The World's First Male Sex Doll",
-            'description': 'md5:916078ef0e032d76343116208b6cc2c4',
+            'description': 'md5:19b00b215b99961cf869c40fbe9df755',
             'uploader': 'vice',
             'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1476919911,
@@ -288,6 +290,7 @@ class ViceArticleIE(InfoExtractor):
         },
         'params': {
             'skip_download': True,
+            'format': 'bestvideo',
         },
         'add_ie': [ViceIE.ie_key()],
     }, {
@@ -299,14 +302,11 @@ class ViceArticleIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
+        locale, display_id = re.match(self._VALID_URL, url).groups()
 
-        prefetch_data = self._parse_json(self._search_regex(
-            r'__APP_STATE\s*=\s*({.+?})(?:\s*\|\|\s*{}\s*)?;\s*\n',
-            webpage, 'app state'), display_id)['pageData']
-        body = prefetch_data['body']
+        article = self._call_api('articles', 'slug', display_id, locale, '''body
+    embed_code''')[0]
+        body = article['body']
 
         def _url_res(video_url, ie_key):
             return {
@@ -316,7 +316,7 @@ def _url_res(video_url, ie_key):
                 'ie_key': ie_key,
             }
 
-        vice_url = ViceIE._extract_url(webpage)
+        vice_url = ViceIE._extract_url(body)
         if vice_url:
             return _url_res(vice_url, ViceIE.ie_key())
 
@@ -332,6 +332,6 @@ def _url_res(video_url, ie_key):
 
         video_url = self._html_search_regex(
             r'data-video-url="([^"]+)"',
-            prefetch_data['embed_code'], 'video URL')
+            article['embed_code'], 'video URL')
 
         return _url_res(video_url, ViceIE.ie_key())
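The Vice article refactor above replaces `__APP_STATE` webpage scraping with a `_call_api` lookup, but the final fallback is still a plain regex over the article's `embed_code`. A minimal standalone sketch of that lookup (the embed snippet below is invented for illustration):

```python
import re

def extract_video_url(embed_code):
    # Mirrors the data-video-url search in ViceArticleIE._real_extract,
    # but returns None instead of raising when the attribute is absent.
    m = re.search(r'data-video-url="([^"]+)"', embed_code)
    return m.group(1) if m else None

# Hypothetical embed snippet, for illustration only:
snippet = '<iframe data-video-url="https://video.vice.com/en_us/embed/abc123"></iframe>'
print(extract_video_url(snippet))  # https://video.vice.com/en_us/embed/abc123
```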
index 851ad936cfc012b02c3125c9a0a3e898f9c6f005..d6b92b1c833072bab2bbacb8503693b6da156427 100644 (file)
@@ -1,28 +1,62 @@
 from __future__ import unicode_literals
 
-import base64
+import json
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import compat_HTTPError
 from ..utils import (
     ExtractorError,
-    clean_html,
-    determine_ext,
     int_or_none,
-    js_to_json,
     parse_age_limit,
-    parse_duration,
-    try_get,
 )
 
 
 class ViewLiftBaseIE(InfoExtractor):
-    _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
+    _API_BASE = 'https://prod-api.viewlift.com/'
+    _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm|failarmy|ftfnext|lnppass\.legapallacanestro|moviespree|app\.myoutdoortv|neoufitness|pflmma|theidentitytb)\.com|(?:hoichoi|app\.horseandcountry|kronon|marquee|supercrosslive)\.tv'
+    _SITE_MAP = {
+        'ftfnext': 'lax',
+        'funnyforfree': 'snagfilms',
+        'hoichoi': 'hoichoitv',
+        'kiddovid': 'snagfilms',
+        'laxsportsnetwork': 'lax',
+        'legapallacanestro': 'lnp',
+        'marquee': 'marquee-tv',
+        'monumentalsportsnetwork': 'monumental-network',
+        'moviespree': 'bingeflix',
+        'pflmma': 'pfl',
+        'snagxtreme': 'snagfilms',
+        'theidentitytb': 'tampabay',
+        'vayafilm': 'snagfilms',
+    }
+    _TOKENS = {}
+
+    def _call_api(self, site, path, video_id, query):
+        token = self._TOKENS.get(site)
+        if not token:
+            token_query = {'site': site}
+            email, password = self._get_login_info(netrc_machine=site)
+            if email:
+                resp = self._download_json(
+                    self._API_BASE + 'identity/signin', video_id,
+                    'Logging in', query=token_query, data=json.dumps({
+                        'email': email,
+                        'password': password,
+                    }).encode())
+            else:
+                resp = self._download_json(
+                    self._API_BASE + 'identity/anonymous-token', video_id,
+                    'Downloading authorization token', query=token_query)
+            self._TOKENS[site] = token = resp['authorizationToken']
+        return self._download_json(
+            self._API_BASE + path, video_id,
+            headers={'Authorization': token}, query=query)
 
 
 class ViewLiftEmbedIE(ViewLiftBaseIE):
-    _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
+    IE_NAME = 'viewlift:embed'
+    _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?P<domain>%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
     _TESTS = [{
         'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
         'md5': '2924e9215c6eff7a55ed35b72276bd93',
@@ -30,6 +64,9 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
             'id': '74849a00-85a9-11e1-9660-123139220831',
             'ext': 'mp4',
             'title': '#whilewewatch',
+            'description': 'md5:b542bef32a6f657dadd0df06e26fb0c8',
+            'timestamp': 1334350096,
+            'upload_date': '20120413',
         }
     }, {
         # invalid labels, 360p is better than 480p
@@ -39,7 +76,8 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
             'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
             'ext': 'mp4',
             'title': 'Life in Limbo',
-        }
+        },
+        'skip': 'The video does not exist',
     }, {
         'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
         'only_matching': True,
@@ -54,67 +92,68 @@ def _extract_url(webpage):
             return mobj.group('url')
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        if '>This film is not playable in your area.<' in webpage:
-            raise ExtractorError(
-                'Film %s is not playable in your area.' % video_id, expected=True)
+        domain, film_id = re.match(self._VALID_URL, url).groups()
+        site = domain.split('.')[-2]
+        if site in self._SITE_MAP:
+            site = self._SITE_MAP[site]
+        try:
+            content_data = self._call_api(
+                site, 'entitlement/video/status', film_id, {
+                    'id': film_id
+                })['video']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                error_message = self._parse_json(e.cause.read().decode(), film_id).get('errorMessage')
+                if error_message == 'User does not have a valid subscription or has not purchased this content.':
+                    self.raise_login_required()
+                raise ExtractorError(error_message, expected=True)
+            raise
+        gist = content_data['gist']
+        title = gist['title']
+        video_assets = content_data['streamingInfo']['videoAssets']
 
         formats = []
-        has_bitrate = False
-        sources = self._parse_json(self._search_regex(
-            r'(?s)sources:\s*(\[.+?\]),', webpage,
-            'sources', default='[]'), video_id, js_to_json)
-        for source in sources:
-            file_ = source.get('file')
-            if not file_:
+        mpeg_video_assets = video_assets.get('mpeg') or []
+        for video_asset in mpeg_video_assets:
+            video_asset_url = video_asset.get('url')
+            if not video_asset_url:
                 continue
-            type_ = source.get('type')
-            ext = determine_ext(file_)
-            format_id = source.get('label') or ext
-            if all(v in ('m3u8', 'hls') for v in (type_, ext)):
-                formats.extend(self._extract_m3u8_formats(
-                    file_, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
-            else:
-                bitrate = int_or_none(self._search_regex(
-                    [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
-                    file_, 'bitrate', default=None))
-                if not has_bitrate and bitrate:
-                    has_bitrate = True
-                height = int_or_none(self._search_regex(
-                    r'^(\d+)[pP]$', format_id, 'height', default=None))
-                formats.append({
-                    'url': file_,
-                    'format_id': 'http-%s%s' % (format_id, ('-%dk' % bitrate if bitrate else '')),
-                    'tbr': bitrate,
-                    'height': height,
-                })
-        if not formats:
-            hls_url = self._parse_json(self._search_regex(
-                r'filmInfo\.src\s*=\s*({.+?});',
-                webpage, 'src'), video_id, js_to_json)['src']
-            formats = self._extract_m3u8_formats(
-                hls_url, video_id, 'mp4', 'm3u8_native',
-                m3u8_id='hls', fatal=False)
-        field_preference = None if has_bitrate else ('height', 'tbr', 'format_id')
-        self._sort_formats(formats, field_preference)
-
-        title = self._search_regex(
-            [r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
-            webpage, 'title')
-
-        return {
-            'id': video_id,
+            bitrate = int_or_none(video_asset.get('bitrate'))
+            height = int_or_none(self._search_regex(
+                r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
+                'height', default=None))
+            formats.append({
+                'url': video_asset_url,
+                'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
+                'tbr': bitrate,
+                'height': height,
+                'vcodec': video_asset.get('codec'),
+            })
+
+        hls_url = video_assets.get('hls')
+        if hls_url:
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats, ('height', 'tbr', 'format_id'))
+
+        info = {
+            'id': film_id,
             'title': title,
+            'description': gist.get('description'),
+            'thumbnail': gist.get('videoImageUrl'),
+            'duration': int_or_none(gist.get('runtime')),
+            'age_limit': parse_age_limit(content_data.get('parentalRating')),
+            'timestamp': int_or_none(gist.get('publishDate'), 1000),
             'formats': formats,
         }
+        for k in ('categories', 'tags'):
+            info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
+        return info
 
 
 class ViewLiftIE(ViewLiftBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?:/(?:films/title|show|(?:news/)?videos?))?/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
+    IE_NAME = 'viewlift'
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?P<path>(?:/(?:films/title|show|(?:news/)?videos?|watch))?/(?P<id>[^?#]+))' % ViewLiftBaseIE._DOMAINS_REGEX
     _TESTS = [{
         'url': 'http://www.snagfilms.com/films/title/lost_for_life',
         'md5': '19844f897b35af219773fd63bdec2942',
@@ -151,10 +190,13 @@ class ViewLiftIE(ViewLiftBaseIE):
             'id': '00000148-7b53-de26-a9fb-fbf306f70020',
             'display_id': 'augie_alone/s_2_ep_12_love',
             'ext': 'mp4',
-            'title': 'Augie, Alone:S. 2 Ep. 12 - Love',
-            'description': 'md5:db2a5c72d994f16a780c1eb353a8f403',
+            'title': 'S. 2 Ep. 12 - Love',
+            'description': 'Augie finds love.',
             'thumbnail': r're:^https?://.*\.jpg',
             'duration': 107,
+            'upload_date': '20141012',
+            'timestamp': 1413129540,
+            'age_limit': 17,
         },
         'params': {
             'skip_download': True,
@@ -177,6 +219,9 @@ class ViewLiftIE(ViewLiftBaseIE):
         # Was once Kaltura embed
         'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
         'only_matching': True,
+    }, {
+        'url': 'https://www.marquee.tv/watch/sadlerswells-sacredmonsters',
+        'only_matching': True,
     }]
 
     @classmethod
@@ -184,119 +229,22 @@ def suitable(cls, url):
         return False if ViewLiftEmbedIE.suitable(url) else super(ViewLiftIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        domain, display_id = re.match(self._VALID_URL, url).groups()
-
-        webpage = self._download_webpage(url, display_id)
-
-        if ">Sorry, the Film you're looking for is not available.<" in webpage:
-            raise ExtractorError(
-                'Film %s is not available.' % display_id, expected=True)
-
-        initial_store_state = self._search_regex(
-            r"window\.initialStoreState\s*=.*?JSON\.parse\(unescape\(atob\('([^']+)'\)\)\)",
-            webpage, 'Initial Store State', default=None)
-        if initial_store_state:
-            modules = self._parse_json(compat_urllib_parse_unquote(base64.b64decode(
-                initial_store_state).decode()), display_id)['page']['data']['modules']
-            content_data = next(m['contentData'][0] for m in modules if m.get('moduleType') == 'VideoDetailModule')
-            gist = content_data['gist']
-            film_id = gist['id']
-            title = gist['title']
-            video_assets = try_get(
-                content_data, lambda x: x['streamingInfo']['videoAssets'], dict)
-            if not video_assets:
-                token = self._download_json(
-                    'https://prod-api.viewlift.com/identity/anonymous-token',
-                    film_id, 'Downloading authorization token',
-                    query={'site': 'snagfilms'})['authorizationToken']
-                video_assets = self._download_json(
-                    'https://prod-api.viewlift.com/entitlement/video/status',
-                    film_id, headers={
-                        'Authorization': token,
-                        'Referer': url,
-                    }, query={
-                        'id': film_id
-                    })['video']['streamingInfo']['videoAssets']
-
-            formats = []
-            mpeg_video_assets = video_assets.get('mpeg') or []
-            for video_asset in mpeg_video_assets:
-                video_asset_url = video_asset.get('url')
-                if not video_asset:
-                    continue
-                bitrate = int_or_none(video_asset.get('bitrate'))
-                height = int_or_none(self._search_regex(
-                    r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
-                    'height', default=None))
-                formats.append({
-                    'url': video_asset_url,
-                    'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
-                    'tbr': bitrate,
-                    'height': height,
-                    'vcodec': video_asset.get('codec'),
-                })
-
-            hls_url = video_assets.get('hls')
-            if hls_url:
-                formats.extend(self._extract_m3u8_formats(
-                    hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-            self._sort_formats(formats, ('height', 'tbr', 'format_id'))
-
-            info = {
-                'id': film_id,
-                'display_id': display_id,
-                'title': title,
-                'description': gist.get('description'),
-                'thumbnail': gist.get('videoImageUrl'),
-                'duration': int_or_none(gist.get('runtime')),
-                'age_limit': parse_age_limit(content_data.get('parentalRating')),
-                'timestamp': int_or_none(gist.get('publishDate'), 1000),
-                'formats': formats,
-            }
-            for k in ('categories', 'tags'):
-                info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
-            return info
-        else:
-            film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
-
-            snag = self._parse_json(
-                self._search_regex(
-                    r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag', default='[]'),
-                display_id)
-
-            for item in snag:
-                if item.get('data', {}).get('film', {}).get('id') == film_id:
-                    data = item['data']['film']
-                    title = data['title']
-                    description = clean_html(data.get('synopsis'))
-                    thumbnail = data.get('image')
-                    duration = int_or_none(data.get('duration') or data.get('runtime'))
-                    categories = [
-                        category['title'] for category in data.get('categories', [])
-                        if category.get('title')]
-                    break
-            else:
-                title = self._html_search_regex(
-                    (r'itemprop="title">([^<]+)<',
-                     r'(?s)itemprop="title">(.+?)<div'), webpage, 'title')
-                description = self._html_search_regex(
-                    r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
-                    webpage, 'description', default=None) or self._og_search_description(webpage)
-                thumbnail = self._og_search_thumbnail(webpage)
-                duration = parse_duration(self._search_regex(
-                    r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
-                    webpage, 'duration', fatal=False))
-                categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
-
-            return {
-                '_type': 'url_transparent',
-                'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
-                'id': film_id,
-                'display_id': display_id,
-                'title': title,
-                'description': description,
-                'thumbnail': thumbnail,
-                'duration': duration,
-                'categories': categories,
-                'ie_key': 'ViewLiftEmbed',
-            }
+        domain, path, display_id = re.match(self._VALID_URL, url).groups()
+        site = domain.split('.')[-2]
+        if site in self._SITE_MAP:
+            site = self._SITE_MAP[site]
+        modules = self._call_api(
+            site, 'content/pages', display_id, {
+                'includeContent': 'true',
+                'moduleOffset': 1,
+                'path': path,
+                'site': site,
+            })['modules']
+        film_id = next(m['contentData'][0]['gist']['id'] for m in modules if m.get('moduleType') == 'VideoDetailModule')
+        return {
+            '_type': 'url_transparent',
+            'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
+            'id': film_id,
+            'display_id': display_id,
+            'ie_key': 'ViewLiftEmbed',
+        }
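The new `ViewLiftBaseIE._call_api` above keeps one authorization token per site in a class-level dict, fetching it lazily on the first request and reusing it afterwards. A self-contained sketch of that caching pattern, with `fake_fetch` standing in for the `identity/anonymous-token` download:

```python
class TokenCache:
    # Minimal sketch of the per-site token cache in ViewLiftBaseIE._call_api:
    # fetch once per site, reuse on every later call.
    def __init__(self, fetch_token):
        self._fetch = fetch_token
        self._tokens = {}

    def get(self, site):
        token = self._tokens.get(site)
        if not token:
            token = self._fetch(site)
            self._tokens[site] = token
        return token

calls = []
def fake_fetch(site):
    # Stand-in for the real HTTP request; records how often it runs.
    calls.append(site)
    return 'token-' + site

cache = TokenCache(fake_fetch)
print(cache.get('snagfilms'))  # token-snagfilms (fetched)
print(cache.get('snagfilms'))  # token-snagfilms (cached, no second fetch)
print(calls)                   # ['snagfilms']
```

The real `_call_api` adds a login branch (`identity/signin` with netrc credentials) before falling back to the anonymous token; the cache mechanics are the same.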
index baa46d5f3513cbde337f144c7143a9c501455ff4..421795b94d9f6cf10fe3d5725503baa86383062c 100644 (file)
@@ -33,6 +33,7 @@
     unified_timestamp,
     unsmuggle_url,
     urlencode_postdata,
+    urljoin,
     unescapeHTML,
 )
 
@@ -139,28 +140,28 @@ def _parse_config(self, config, video_id):
             })
 
         # TODO: fix handling of 308 status code returned for live archive manifest requests
+        sep_pattern = r'/sep/video/'
         for files_type in ('hls', 'dash'):
             for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
                 manifest_url = cdn_data.get('url')
                 if not manifest_url:
                     continue
                 format_id = '%s-%s' % (files_type, cdn_name)
-                if files_type == 'hls':
-                    formats.extend(self._extract_m3u8_formats(
-                        manifest_url, video_id, 'mp4',
-                        'm3u8' if is_live else 'm3u8_native', m3u8_id=format_id,
-                        note='Downloading %s m3u8 information' % cdn_name,
-                        fatal=False))
-                elif files_type == 'dash':
-                    mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
-                    mpd_manifest_urls = []
-                    if re.search(mpd_pattern, manifest_url):
-                        for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
-                            mpd_manifest_urls.append((format_id + suffix, re.sub(
-                                mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url)))
-                    else:
-                        mpd_manifest_urls = [(format_id, manifest_url)]
-                    for f_id, m_url in mpd_manifest_urls:
+                sep_manifest_urls = []
+                if re.search(sep_pattern, manifest_url):
+                    for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
+                        sep_manifest_urls.append((format_id + suffix, re.sub(
+                            sep_pattern, '/%s/' % repl, manifest_url)))
+                else:
+                    sep_manifest_urls = [(format_id, manifest_url)]
+                for f_id, m_url in sep_manifest_urls:
+                    if files_type == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            m_url, video_id, 'mp4',
+                            'm3u8' if is_live else 'm3u8_native', m3u8_id=f_id,
+                            note='Downloading %s m3u8 information' % cdn_name,
+                            fatal=False))
+                    elif files_type == 'dash':
                         if 'json=1' in m_url:
                             real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url')
                             if real_m_url:
@@ -169,11 +170,6 @@ def _parse_config(self, config, video_id):
                             m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
                             'Downloading %s MPD information' % cdn_name,
                             fatal=False)
-                        for f in mpd_formats:
-                            if f.get('vcodec') == 'none':
-                                f['preference'] = -50
-                            elif f.get('acodec') == 'none':
-                                f['preference'] = -40
                         formats.extend(mpd_formats)
 
         live_archive = live_event.get('archive') or {}
@@ -185,13 +181,19 @@ def _parse_config(self, config, video_id):
                 'preference': 1,
             })
 
+        for f in formats:
+            if f.get('vcodec') == 'none':
+                f['preference'] = -50
+            elif f.get('acodec') == 'none':
+                f['preference'] = -40
+
         subtitles = {}
         text_tracks = config['request'].get('text_tracks')
         if text_tracks:
             for tt in text_tracks:
                 subtitles[tt['lang']] = [{
                     'ext': 'vtt',
-                    'url': 'https://vimeo.com' + tt['url'],
+                    'url': urljoin('https://vimeo.com', tt['url']),
                 }]
 
         thumbnails = []
@@ -591,7 +593,7 @@ def _real_extract(self, url):
             # Retrieve video webpage to extract further information
             webpage, urlh = self._download_webpage_handle(
                 url, video_id, headers=headers)
-            redirect_url = compat_str(urlh.geturl())
+            redirect_url = urlh.geturl()
         except ExtractorError as ee:
             if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
                 errmsg = ee.cause.read()
@@ -841,33 +843,6 @@ def _extract_list_title(self, webpage):
         return self._TITLE or self._html_search_regex(
             self._TITLE_RE, webpage, 'list title', fatal=False)
 
-    def _login_list_password(self, page_url, list_id, webpage):
-        login_form = self._search_regex(
-            r'(?s)<form[^>]+?id="pw_form"(.*?)</form>',
-            webpage, 'login form', default=None)
-        if not login_form:
-            return webpage
-
-        password = self._downloader.params.get('videopassword')
-        if password is None:
-            raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
-        fields = self._hidden_inputs(login_form)
-        token, vuid = self._extract_xsrft_and_vuid(webpage)
-        fields['token'] = token
-        fields['password'] = password
-        post = urlencode_postdata(fields)
-        password_path = self._search_regex(
-            r'action="([^"]+)"', login_form, 'password URL')
-        password_url = compat_urlparse.urljoin(page_url, password_path)
-        password_request = sanitized_Request(password_url, post)
-        password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
-        self._set_vimeo_cookie('vuid', vuid)
-        self._set_vimeo_cookie('xsrft', token)
-
-        return self._download_webpage(
-            password_request, list_id,
-            'Verifying the password', 'Wrong password')
-
     def _title_and_entries(self, list_id, base_url):
         for pagenum in itertools.count(1):
             page_url = self._page_url(base_url, pagenum)
@@ -876,7 +851,6 @@ def _title_and_entries(self, list_id, base_url):
                 'Downloading page %s' % pagenum)
 
             if pagenum == 1:
-                webpage = self._login_list_password(page_url, list_id, webpage)
                 yield self._extract_list_title(webpage)
 
             # Try extracting href first since not all videos are available via
@@ -923,7 +897,7 @@ class VimeoUserIE(VimeoChannelIE):
     _BASE_URL_TEMPL = 'https://vimeo.com/%s'
 
 
-class VimeoAlbumIE(VimeoChannelIE):
+class VimeoAlbumIE(VimeoBaseInfoExtractor):
     IE_NAME = 'vimeo:album'
     _VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
     _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
@@ -973,13 +947,39 @@ def _fetch_page(self, album_id, authorizaion, hashed_pass, page):
     def _real_extract(self, url):
         album_id = self._match_id(url)
         webpage = self._download_webpage(url, album_id)
-        webpage = self._login_list_password(url, album_id, webpage)
-        api_config = self._extract_vimeo_config(webpage, album_id)['api']
+        viewer = self._parse_json(self._search_regex(
+            r'bootstrap_data\s*=\s*({.+?})</script>',
+            webpage, 'bootstrap data'), album_id)['viewer']
+        jwt = viewer['jwt']
+        album = self._download_json(
+            'https://api.vimeo.com/albums/' + album_id,
+            album_id, headers={'Authorization': 'jwt ' + jwt},
+            query={'fields': 'description,name,privacy'})
+        hashed_pass = None
+        if try_get(album, lambda x: x['privacy']['view']) == 'password':
+            password = self._downloader.params.get('videopassword')
+            if not password:
+                raise ExtractorError(
+                    'This album is protected by a password, use the --video-password option',
+                    expected=True)
+            self._set_vimeo_cookie('vuid', viewer['vuid'])
+            try:
+                hashed_pass = self._download_json(
+                    'https://vimeo.com/showcase/%s/auth' % album_id,
+                    album_id, 'Verifying the password', data=urlencode_postdata({
+                        'password': password,
+                        'token': viewer['xsrft'],
+                    }), headers={
+                        'X-Requested-With': 'XMLHttpRequest',
+                    })['hashed_pass']
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                    raise ExtractorError('Wrong password', expected=True)
+                raise
         entries = OnDemandPagedList(functools.partial(
-            self._fetch_page, album_id, api_config['jwt'],
-            api_config.get('hashed_pass')), self._PAGE_SIZE)
-        return self.playlist_result(entries, album_id, self._html_search_regex(
-            r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
+            self._fetch_page, album_id, jwt, hashed_pass), self._PAGE_SIZE)
+        return self.playlist_result(
+            entries, album_id, album.get('name'), album.get('description'))
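The album hunk above replaces the old `_extract_vimeo_config` path with a viewer JWT pulled from the page's inline `bootstrap_data` JSON, which is then sent as `Authorization: jwt <token>` to `api.vimeo.com`. A minimal standalone sketch of that extraction step (the page fragment is invented for illustration):

```python
import json
import re


def extract_viewer_jwt(webpage):
    # Same regex as the diff: capture the bootstrap_data object that
    # Vimeo inlines directly before a closing </script> tag.
    m = re.search(r'bootstrap_data\s*=\s*({.+?})</script>', webpage)
    if not m:
        return None
    return json.loads(m.group(1))['viewer']['jwt']


# Invented page fragment for illustration only:
page = '<script>window.bootstrap_data = {"viewer": {"jwt": "abc123", "vuid": "v42", "xsrft": "t0"}}</script>'
```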
 
 
 class VimeoGroupsIE(VimeoChannelIE):
index c3429f723ddec36cb89dfc9f329f766f45b953e1..f79531e6f3a2e922b0369706cc0d76a22feb2499 100644 (file)
@@ -6,22 +6,18 @@
 import itertools
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_str,
-)
+from .naver import NaverBaseIE
+from ..compat import compat_str
 from ..utils import (
-    dict_get,
     ExtractorError,
-    float_or_none,
-    int_or_none,
+    merge_dicts,
     remove_start,
     try_get,
     urlencode_postdata,
 )
 
 
-class VLiveIE(InfoExtractor):
+class VLiveIE(NaverBaseIE):
     IE_NAME = 'vlive'
     _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
     _NETRC_MACHINE = 'vlive'
@@ -34,6 +30,7 @@ class VLiveIE(InfoExtractor):
             'title': "[V LIVE] Girl's Day's Broadcast",
             'creator': "Girl's Day",
             'view_count': int,
+            'uploader_id': 'muploader_a',
         },
     }, {
         'url': 'http://www.vlive.tv/video/16937',
@@ -44,6 +41,7 @@ class VLiveIE(InfoExtractor):
             'creator': 'EXO',
             'view_count': int,
             'subtitles': 'mincount:12',
+            'uploader_id': 'muploader_j',
         },
         'params': {
             'skip_download': True,
@@ -187,45 +185,9 @@ def _replay(self, video_id, webpage, long_video_id, key):
                     'This video is only available for CH+ subscribers')
             long_video_id, key = video_info['vid'], video_info['inkey']
 
-        playinfo = self._download_json(
-            'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
-            % compat_urllib_parse_urlencode({
-                'videoId': long_video_id,
-                'key': key,
-                'ptc': 'http',
-                'doct': 'json',  # document type (xml or json)
-                'cpt': 'vtt',  # captions type (vtt or ttml)
-            }), video_id)
-
-        formats = [{
-            'url': vid['source'],
-            'format_id': vid.get('encodingOption', {}).get('name'),
-            'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
-            'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
-            'width': int_or_none(vid.get('encodingOption', {}).get('width')),
-            'height': int_or_none(vid.get('encodingOption', {}).get('height')),
-            'filesize': int_or_none(vid.get('size')),
-        } for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
-        self._sort_formats(formats)
-
-        view_count = int_or_none(playinfo.get('meta', {}).get('count'))
-
-        subtitles = {}
-        for caption in playinfo.get('captions', {}).get('list', []):
-            lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
-            if lang and caption.get('source'):
-                subtitles[lang] = [{
-                    'ext': 'vtt',
-                    'url': caption['source']}]
-
-        info = self._get_common_fields(webpage)
-        info.update({
-            'id': video_id,
-            'formats': formats,
-            'view_count': view_count,
-            'subtitles': subtitles,
-        })
-        return info
+        return merge_dicts(
+            self._get_common_fields(webpage),
+            self._extract_video_info(video_id, long_video_id, key))
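The replay path now delegates format extraction to `NaverBaseIE._extract_video_info` and combines the two metadata dicts with `merge_dicts`. A simplified sketch of that utility's semantics (earlier dicts win, `None` values are skipped, and an empty string can be filled in by a later truthy value; the real youtube-dl helper additionally type-checks that both values are strings):

```python
def merge_dicts(*dicts):
    # Simplified: earlier dicts take priority; None values are skipped,
    # and an empty-string value may be replaced by a later truthy one.
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v is None:
                continue
            if k not in merged or (merged[k] == '' and v):
                merged[k] = v
    return merged
```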
 
     def _download_init_page(self, video_id):
         return self._download_webpage(
index 239644340384b60c8e1a80d40b50cabbd0fd2c9e..74d2257e7c059cdb410db69d77c25e1c7e8c9902 100644 (file)
@@ -6,8 +6,8 @@
 
 
 class VODPlatformIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vod-platform\.net/[eE]mbed/(?P<id>[^/?#]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/(?P<id>[^/?#]+)'
+    _TESTS = [{
         # from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
         'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
         'md5': '1db2b7249ce383d6be96499006e951fc',
@@ -16,7 +16,10 @@ class VODPlatformIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
         }
-    }
+    }, {
+        'url': 'http://embed.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
index 59e1359c48628af9b4c53bedc337fa6b9b3d1396..a52e40afa2892a10538251ba40e4d2a44a10a67d 100644 (file)
@@ -1,17 +1,12 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urlparse,
-)
+from ..compat import compat_str
 from ..utils import (
     ExtractorError,
     determine_ext,
     int_or_none,
-    sanitized_Request,
+    urljoin,
 )
 
 
@@ -26,8 +21,7 @@ class VoiceRepublicIE(InfoExtractor):
             'ext': 'm4a',
             'title': 'Watching the Watchers: Building a Sousveillance State',
             'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
-            'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
-            'duration': 1800,
+            'duration': 1556,
             'view_count': int,
         }
     }, {
@@ -38,63 +32,31 @@ class VoiceRepublicIE(InfoExtractor):
     def _real_extract(self, url):
         display_id = self._match_id(url)
 
-        req = sanitized_Request(
-            compat_urlparse.urljoin(url, '/talks/%s' % display_id))
-        # Older versions of Firefox get redirected to an "upgrade browser" page
-        req.add_header('User-Agent', 'youtube-dl')
-        webpage = self._download_webpage(req, display_id)
+        webpage = self._download_webpage(url, display_id)
 
         if '>Queued for processing, please stand by...<' in webpage:
             raise ExtractorError(
                 'Audio is still queued for processing', expected=True)
 
-        config = self._search_regex(
-            r'(?s)return ({.+?});\s*\n', webpage,
-            'data', default=None)
-        data = self._parse_json(config, display_id, fatal=False) if config else None
-        if data:
-            title = data['title']
-            description = data.get('teaser')
-            talk_id = compat_str(data.get('talk_id') or display_id)
-            talk = data['talk']
-            duration = int_or_none(talk.get('duration'))
-            formats = [{
-                'url': compat_urlparse.urljoin(url, talk_url),
-                'format_id': format_id,
-                'ext': determine_ext(talk_url) or format_id,
-                'vcodec': 'none',
-            } for format_id, talk_url in talk['links'].items()]
-        else:
-            title = self._og_search_title(webpage)
-            description = self._html_search_regex(
-                r"(?s)<div class='talk-teaser'[^>]*>(.+?)</div>",
-                webpage, 'description', fatal=False)
-            talk_id = self._search_regex(
-                [r"id='jc-(\d+)'", r"data-shareable-id='(\d+)'"],
-                webpage, 'talk id', default=None) or display_id
-            duration = None
-            player = self._search_regex(
-                r"class='vr-player jp-jplayer'([^>]+)>", webpage, 'player')
-            formats = [{
-                'url': compat_urlparse.urljoin(url, talk_url),
-                'format_id': format_id,
-                'ext': determine_ext(talk_url) or format_id,
-                'vcodec': 'none',
-            } for format_id, talk_url in re.findall(r"data-([^=]+)='([^']+)'", player)]
+        talk = self._parse_json(self._search_regex(
+            r'initialSnapshot\s*=\s*({.+?});',
+            webpage, 'talk'), display_id)['talk']
+        title = talk['title']
+        formats = [{
+            'url': urljoin(url, talk_url),
+            'format_id': format_id,
+            'ext': determine_ext(talk_url) or format_id,
+            'vcodec': 'none',
+        } for format_id, talk_url in talk['media_links'].items()]
         self._sort_formats(formats)
 
-        thumbnail = self._og_search_thumbnail(webpage)
-        view_count = int_or_none(self._search_regex(
-            r"class='play-count[^']*'>\s*(\d+) plays",
-            webpage, 'play count', fatal=False))
-
         return {
-            'id': talk_id,
+            'id': compat_str(talk.get('id') or display_id),
             'display_id': display_id,
             'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
+            'description': talk.get('teaser'),
+            'thumbnail': talk.get('image_url'),
+            'duration': int_or_none(talk.get('archived_duration')),
+            'view_count': int_or_none(talk.get('play_count')),
             'formats': formats,
         }
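The rewritten VoiceRepublic extractor above reads everything from a single inline `initialSnapshot = {...};` assignment instead of scraping the jPlayer markup and falling back to OpenGraph tags. A standalone sketch of that parse; the JSON shape below is an assumption based only on the fields the diff reads (`talk`, `title`, `media_links`, etc.):

```python
import json
import re


def parse_talk(webpage):
    # Same regex as the diff; returns the 'talk' object or None.
    m = re.search(r'initialSnapshot\s*=\s*({.+?});', webpage)
    if not m:
        return None
    return json.loads(m.group(1))['talk']


# Invented page fragment; field names follow the diff (media_links, title, ...).
page = ('<script>var initialSnapshot = {"talk": {"id": 1858, '
        '"title": "Watching the Watchers", '
        '"media_links": {"mp3": "/talks/1858.mp3"}}};</script>')
```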
index 0fbc888ec03ce6d09f853a82ea96328c927e4ae5..77febd2eb1b1cada3942c212739725135a36682b 100644 (file)
@@ -13,8 +13,7 @@
 
 class WistiaIE(InfoExtractor):
     _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
-    _API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
-    _IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
+    _EMBED_BASE_URL = 'http://fast.wistia.com/embed/'
 
     _TESTS = [{
         'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
@@ -46,31 +45,32 @@ class WistiaIE(InfoExtractor):
     # https://wistia.com/support/embed-and-share/video-on-your-website
     @staticmethod
     def _extract_url(webpage):
-        match = re.search(
-            r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage)
-        if match:
-            return unescapeHTML(match.group('url'))
+        urls = WistiaIE._extract_urls(webpage)
+        return urls[0] if urls else None
 
-        match = re.search(
-            r'''(?sx)
-                <script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
-                <div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
-            ''', webpage)
-        if match:
-            return 'wistia:%s' % match.group('id')
-
-        match = re.search(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage)
-        if match:
-            return 'wistia:%s' % match.group('id')
+    @staticmethod
+    def _extract_urls(webpage):
+        urls = []
+        for match in re.finditer(
+                r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage):
+            urls.append(unescapeHTML(match.group('url')))
+        for match in re.finditer(
+                r'''(?sx)
+                    <div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
+                ''', webpage):
+            urls.append('wistia:%s' % match.group('id'))
+        for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):
+            urls.append('wistia:%s' % match.group('id'))
+        return urls
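`_extract_url` is now a thin wrapper over the new `_extract_urls`, so generic pages embedding several Wistia videos yield every embed instead of only the first. A sketch of the iframe/script branch in isolation (the page and the second video id are invented):

```python
import re

IFRAME_RE = (r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\']'
             r'(?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/'
             r'(?:iframe|medias)/[a-z0-9]{10})')


def find_wistia_embeds(webpage):
    # Collect every embed URL in page order (re.finditer vs. re.search).
    return [m.group('url') for m in re.finditer(IFRAME_RE, webpage)]


html = ('<iframe src="//fast.wistia.net/embed/iframe/sh7fpupwlt"></iframe>'
        '<iframe src="//fast.wistia.net/embed/iframe/abcde01234"></iframe>')
```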
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
         data_json = self._download_json(
-            self._API_URL % video_id, video_id,
+            self._EMBED_BASE_URL + 'medias/%s.json' % video_id, video_id,
             # Some videos require this.
             headers={
-                'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
+                'Referer': url if url.startswith('http') else self._EMBED_BASE_URL + 'iframe/' + video_id,
             })
 
         if data_json.get('error'):
@@ -95,27 +95,61 @@ def _real_extract(self, url):
                     'url': aurl,
                     'width': int_or_none(a.get('width')),
                     'height': int_or_none(a.get('height')),
+                    'filesize': int_or_none(a.get('size')),
                 })
             else:
                 aext = a.get('ext')
-                is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
-                formats.append({
-                    'format_id': atype,
+                display_name = a.get('display_name')
+                format_id = atype
+                if atype and atype.endswith('_video') and display_name:
+                    format_id = '%s-%s' % (atype[:-6], display_name)
+                f = {
+                    'format_id': format_id,
                     'url': aurl,
                     'url': aurl,
-                    'tbr': int_or_none(a.get('bitrate')),
-                    'vbr': int_or_none(a.get('opt_vbitrate')),
-                    'width': int_or_none(a.get('width')),
-                    'height': int_or_none(a.get('height')),
-                    'filesize': int_or_none(a.get('size')),
-                    'vcodec': a.get('codec'),
-                    'container': a.get('container'),
-                    'ext': 'mp4' if is_m3u8 else aext,
-                    'protocol': 'm3u8' if is_m3u8 else None,
+                    'tbr': int_or_none(a.get('bitrate')) or None,
                     'preference': 1 if atype == 'original' else None,
-                })
+                }
+                if display_name == 'Audio':
+                    f.update({
+                        'vcodec': 'none',
+                    })
+                else:
+                    f.update({
+                        'width': int_or_none(a.get('width')),
+                        'height': int_or_none(a.get('height')),
+                        'vcodec': a.get('codec'),
+                    })
+                if a.get('container') == 'm3u8' or aext == 'm3u8':
+                    ts_f = f.copy()
+                    ts_f.update({
+                        'ext': 'ts',
+                        'format_id': f['format_id'].replace('hls-', 'ts-'),
+                        'url': f['url'].replace('.bin', '.ts'),
+                    })
+                    formats.append(ts_f)
+                    f.update({
+                        'ext': 'mp4',
+                        'protocol': 'm3u8_native',
+                    })
+                else:
+                    f.update({
+                        'container': a.get('container'),
+                        'ext': aext,
+                        'filesize': int_or_none(a.get('size')),
+                    })
+                formats.append(f)
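For HLS assets the new Wistia code emits two entries per asset: a progressive TS copy (the `.bin` URL rewritten to `.ts`, with the `hls-` prefix swapped for `ts-`) and an `m3u8_native` MP4 entry. That duplication can be sketched as follows; the sample format dict in the test is invented:

```python
def expand_hls_format(f):
    # Progressive TS variant: rename the format id and rewrite the URL.
    ts_f = dict(f)
    ts_f.update({
        'ext': 'ts',
        'format_id': f['format_id'].replace('hls-', 'ts-'),
        'url': f['url'].replace('.bin', '.ts'),
    })
    # HLS variant: same URL, fetched with the native m3u8 downloader.
    hls_f = dict(f)
    hls_f.update({'ext': 'mp4', 'protocol': 'm3u8_native'})
    return [ts_f, hls_f]
```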
 
         self._sort_formats(formats)
 
+        subtitles = {}
+        for caption in data.get('captions', []):
+            language = caption.get('language')
+            if not language:
+                continue
+            subtitles[language] = [{
+                'url': self._EMBED_BASE_URL + 'captions/' + video_id + '.vtt?language=' + language,
+            }]
+
         return {
             'id': video_id,
             'title': title,
@@ -124,4 +158,5 @@ def _real_extract(self, url):
             'thumbnails': thumbnails,
             'duration': float_or_none(data.get('duration')),
             'timestamp': int_or_none(data.get('createdAt')),
+            'subtitles': subtitles,
         }
index a5b94d2794166d452464b728ec40b2b258459c64..902a3ed338e914c9e20edc3c643d0e5274d71fe7 100644 (file)
 
 
 class XHamsterIE(InfoExtractor):
-    _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
+    _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com)'
     _VALID_URL = r'''(?x)
                     https?://
                         (?:.+?\.)?%s/
                         (?:
-                            movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
-                            videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
+                            movies/(?P<id>[\dA-Za-z]+)/(?P<display_id>[^/]*)\.html|
+                            videos/(?P<display_id_2>[^/]*)-(?P<id_2>[\dA-Za-z]+)
                         )
                     ''' % _DOMAINS
     _TESTS = [{
@@ -99,12 +99,21 @@ class XHamsterIE(InfoExtractor):
     }, {
         'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
         'only_matching': True,
+    }, {
+        'url': 'https://xhamster11.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'only_matching': True,
+    }, {
+        'url': 'https://xhamster26.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'only_matching': True,
     }, {
         'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
         'only_matching': True,
     }, {
         'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
         'only_matching': True,
+    }, {
+        'url': 'http://de.xhamster.com/videos/skinny-girl-fucks-herself-hard-in-the-forest-xhnBJZx',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -113,7 +122,7 @@ def _real_extract(self, url):
         display_id = mobj.group('display_id') or mobj.group('display_id_2')
 
         desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
-        webpage = self._download_webpage(desktop_url, video_id)
+        webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
 
         error = self._html_search_regex(
             r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
@@ -129,7 +138,7 @@ def get_height(s):
 
         initials = self._parse_json(
             self._search_regex(
-                r'window\.initials\s*=\s*({.+?})\s*;\s*\n', webpage, 'initials',
+                r'window\.initials\s*=\s*({.+?})\s*;', webpage, 'initials',
                 default='{}'),
             video_id, fatal=False)
         if initials:
@@ -161,6 +170,9 @@ def get_height(s):
                         'ext': determine_ext(format_url, 'mp4'),
                         'height': get_height(quality),
                         'filesize': filesize,
+                        'http_headers': {
+                            'Referer': urlh.geturl(),
+                        },
                     })
             self._sort_formats(formats)
 
index c6c0b3291c8320064fa0a7529be5b5d78f14461c..01b253dcb1e8c92232a06c0b2b4153a545dabcc1 100644 (file)
@@ -47,7 +47,7 @@ class XTubeIE(InfoExtractor):
             'display_id': 'A-Super-Run-Part-1-YT',
             'ext': 'flv',
             'title': 'A Super Run - Part 1 (YT)',
-            'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
+            'description': 'md5:4cc3af1aa1b0413289babc88f0d4f616',
             'uploader': 'tshirtguy59',
             'duration': 579,
             'view_count': int,
@@ -87,10 +87,24 @@ def _real_extract(self, url):
                 'Cookie': 'age_verified=1; cookiesAccepted=1',
             })
 
-        sources = self._parse_json(self._search_regex(
-            r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
-            webpage, 'sources', group='sources'), video_id,
-            transform_source=js_to_json)
+        title, thumbnail, duration = [None] * 3
+
+        config = self._parse_json(self._search_regex(
+            r'playerConf\s*=\s*({.+?})\s*,\s*\n', webpage, 'config',
+            default='{}'), video_id, transform_source=js_to_json, fatal=False)
+        if config:
+            config = config.get('mainRoll')
+            if isinstance(config, dict):
+                title = config.get('title')
+                thumbnail = config.get('poster')
+                duration = int_or_none(config.get('duration'))
+                sources = config.get('sources') or config.get('format')
+
+        if not isinstance(sources, dict):
+            sources = self._parse_json(self._search_regex(
+                r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
+                webpage, 'sources', group='sources'), video_id,
+                transform_source=js_to_json)
 
         formats = []
         for format_id, format_url in sources.items():
@@ -102,20 +116,25 @@ def _real_extract(self, url):
         self._remove_duplicate_formats(formats)
         self._sort_formats(formats)
 
-        title = self._search_regex(
-            (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
-            webpage, 'title', group='title')
-        description = self._search_regex(
+        if not title:
+            title = self._search_regex(
+                (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
+                webpage, 'title', group='title')
+        description = self._og_search_description(
+            webpage, default=None) or self._html_search_meta(
+            'twitter:description', webpage, default=None) or self._search_regex(
             r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
         uploader = self._search_regex(
             (r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
              r'<span[^>]+class="nickname"[^>]*>([^<]+)'),
             webpage, 'uploader', fatal=False)
-        duration = parse_duration(self._search_regex(
-            r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
-            webpage, 'duration', fatal=False))
+        if not duration:
+            duration = parse_duration(self._search_regex(
+                r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
+                webpage, 'duration', fatal=False))
         view_count = str_to_int(self._search_regex(
-            r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>',
+            (r'["\']viewsCount["\'][^>]*>(\d+)\s+views',
+             r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>'),
             webpage, 'view count', fatal=False))
         comment_count = str_to_int(self._html_search_regex(
             r'>Comments? \(([\d,\.]+)\)<',
@@ -126,6 +145,7 @@ def _real_extract(self, url):
             'display_id': display_id,
             'title': title,
             'description': description,
+            'thumbnail': thumbnail,
             'uploader': uploader,
             'duration': duration,
             'view_count': view_count,
@@ -144,7 +164,7 @@ class XTubeUserIE(InfoExtractor):
             'id': 'greenshowers-4056496',
             'age_limit': 18,
         },
-        'playlist_mincount': 155,
+        'playlist_mincount': 154,
     }
 
     def _real_extract(self, url):
index 238d9cea0c729912351895e5bd6ad453d43b7d31..e4615376c428432f7035c2141d1cbecc738496cc 100644 (file)
@@ -12,6 +12,7 @@
 )
 from ..utils import (
     clean_html,
+    ExtractorError,
     int_or_none,
     mimetype2ext,
     parse_iso8601,
@@ -368,31 +369,47 @@ class YahooGyaOPlayerIE(InfoExtractor):
         'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682',
         'only_matching': True,
     }]
+    _GEO_BYPASS = False
 
     def _real_extract(self, url):
         video_id = self._match_id(url).replace('/', ':')
-        video = self._download_json(
-            'https://gyao.yahoo.co.jp/dam/v1/videos/' + video_id,
-            video_id, query={
-                'fields': 'longDescription,title,videoId',
-            }, headers={
-                'X-User-Agent': 'Unknown Pc GYAO!/2.0.0 Web',
-            })
+        headers = self.geo_verification_headers()
+        headers['Accept'] = 'application/json'
+        resp = self._download_json(
+            'https://gyao.yahoo.co.jp/apis/playback/graphql', video_id, query={
+                'appId': 'dj00aiZpPUNJeDh2cU1RazU3UCZzPWNvbnN1bWVyc2VjcmV0Jng9NTk-',
+                'query': '''{
+  content(parameter: {contentId: "%s", logicaAgent: PC_WEB}) {
+    video {
+      delivery {
+        id
+      }
+      title
+    }
+  }
+}''' % video_id,
+            }, headers=headers)
+        content = resp['data']['content']
+        if not content:
+            msg = resp['errors'][0]['message']
+            if msg == 'not in japan':
+                self.raise_geo_restricted(countries=['JP'])
+            raise ExtractorError(msg)
+        video = content['video']
         return {
             '_type': 'url_transparent',
             'id': video_id,
             'title': video['title'],
             'url': smuggle_url(
-                'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['videoId'],
+                'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['delivery']['id'],
                 {'geo_countries': ['JP']}),
-            'description': video.get('longDescription'),
             'ie_key': BrightcoveNewIE.ie_key(),
         }
 
 
 class YahooGyaOIE(InfoExtractor):
     IE_NAME = 'yahoo:gyao'
-    _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title/[^/]+)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title(?:/[^/]+)?)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
     _TESTS = [{
         'url': 'https://gyao.yahoo.co.jp/p/00449/v03102/',
         'info_dict': {
@@ -405,6 +422,9 @@ class YahooGyaOIE(InfoExtractor):
     }, {
         'url': 'https://gyao.yahoo.co.jp/title/%E3%81%97%E3%82%83%E3%81%B9%E3%81%8F%E3%82%8A007/5b025a49-b2e5-4dc7-945c-09c6634afacf',
         'only_matching': True,
+    }, {
+        'url': 'https://gyao.yahoo.co.jp/title/5b025a49-b2e5-4dc7-945c-09c6634afacf',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index dff69fcb7aca250373fc0e70b2f8278ed2661755..88aabd272c98e944523f3b333174342ba23c9fe1 100644 (file)
@@ -44,7 +44,7 @@ def _real_extract(self, url):
 
         encodings = self._parse_json(
             self._search_regex(
-                r'encodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
+                r'[Ee]ncodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
                 default='[]'),
             video_id, fatal=False)
         for encoding in encodings:
index d4eccb4b2a48efafec0232a451b3ee617e6bc859..e7fca22dec9a17b10ef76efcd43c4b9e0e4da208 100644 (file)
@@ -5,7 +5,6 @@
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
-    sanitized_Request,
     str_to_int,
     unescapeHTML,
     unified_strdate,
@@ -15,7 +14,7 @@
 
 
 class YouPornIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?youporn\.com/(?:watch|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
     _TESTS = [{
         'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
         'md5': '3744d24c50438cf5b6f6d59feb5055c2',
@@ -57,16 +56,28 @@ class YouPornIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+    }, {
+        'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.youporn.com/watch/505835',
+        'only_matching': True,
     }]
 
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?youporn\.com/embed/\d+)',
+            webpage)
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') or video_id
 
-        request = sanitized_Request(url)
-        request.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(request, display_id)
+        webpage = self._download_webpage(
+            'http://www.youporn.com/watch/%s' % video_id, display_id,
+            headers={'Cookie': 'age_verified=1'})
 
         title = self._html_search_regex(
             r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',
index 8a2d5f63bdb929edd310c250b9242bd2ed409208..98347491ee00b66d2c2f8df69ddf663c1bffbd84 100644 (file)
@@ -1,6 +1,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     parse_duration,
     urljoin,
@@ -8,9 +9,9 @@
 
 
 class YourPornIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:yourporn\.sexy|sxyprn\.com)/post/(?P<id>[^/?#&.]+)'
+    _VALID_URL = r'https?://(?:www\.)?sxyprn\.com/post/(?P<id>[^/?#&.]+)'
     _TESTS = [{
-        'url': 'https://yourporn.sexy/post/57ffcb2e1179b.html',
+        'url': 'https://sxyprn.com/post/57ffcb2e1179b.html',
         'md5': '6f8682b6464033d87acaa7a8ff0c092e',
         'info_dict': {
             'id': '57ffcb2e1179b',
@@ -33,11 +34,19 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        video_url = urljoin(url, self._parse_json(
+        parts = self._parse_json(
             self._search_regex(
                 r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
                 group='data'),
-            video_id)[video_id]).replace('/cdn/', '/cdn5/')
+            video_id)[video_id].split('/')
+
+        num = 0
+        for c in parts[6] + parts[7]:
+            if c.isnumeric():
+                num += int(c)
+        parts[5] = compat_str(int(parts[5]) - num)
+        parts[1] += '8'
+        video_url = urljoin(url, '/'.join(parts))
 
         title = (self._search_regex(
             r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
@@ -54,4 +63,5 @@ def _real_extract(self, url):
             'thumbnail': thumbnail,
             'duration': duration,
             'age_limit': 18,
+            'ext': 'mp4',
         }
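The sxyprn.com path manipulation introduced in this hunk can be sketched standalone (the function name and sample path below are illustrative, not from the extractor): every digit appearing in segments 6 and 7 is summed and subtracted from segment 5, and '8' is appended to segment 1.

```python
def deobfuscate_path(obfuscated):
    # Split the data-vnfo path into '/'-separated segments.
    parts = obfuscated.split('/')
    # Sum every digit character found in segments 6 and 7.
    num = sum(int(c) for c in parts[6] + parts[7] if c.isnumeric())
    # Subtract that sum from segment 5 (a numeric string in real URLs).
    parts[5] = str(int(parts[5]) - num)
    # Append '8' to segment 1, then rejoin.
    parts[1] += '8'
    return '/'.join(parts)
```

With a synthetic path, `deobfuscate_path('a/b/c/d/e/100/1x2/3y4')` yields `'a/b8/c/d/e/90/1x2/3y4'` (digits 1+2+3+4 = 10 subtracted from 100).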
index b913d07a63920de2c56570644f7b8175afd5fd8a..b35bf03aafc7c7c45b3c35735a68d00f86aed988 100644 (file)
@@ -29,7 +29,6 @@
 from ..utils import (
     bool_or_none,
     clean_html,
-    dict_get,
     error_to_compat_str,
     extract_attributes,
     ExtractorError,
@@ -71,9 +70,14 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
 
     _PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}'
 
+    _YOUTUBE_CLIENT_HEADERS = {
+        'x-youtube-client-name': '1',
+        'x-youtube-client-version': '1.20200609.04.02',
+    }
+
     def _set_language(self):
         self._set_cookie(
-            '.youtube.com', 'PREF', 'f1=50000000&hl=en',
+            '.youtube.com', 'PREF', 'f1=50000000&f6=8&hl=en',
             # YouTube sets the expire time to about two months
             expire_time=time.time() + 2 * 30 * 24 * 3600)
 
@@ -299,10 +303,11 @@ def _entries(self, page, playlist_id):
                     # Downloading page may result in intermittent 5xx HTTP error
                     # that is usually worked around with a retry
                     more = self._download_json(
-                        'https://youtube.com/%s' % mobj.group('more'), playlist_id,
+                        'https://www.youtube.com/%s' % mobj.group('more'), playlist_id,
                         'Downloading page #%s%s'
                         % (page_num, ' (retry #%d)' % count if count else ''),
-                        transform_source=uppercase_escape)
+                        transform_source=uppercase_escape,
+                        headers=self._YOUTUBE_CLIENT_HEADERS)
                     break
                 except ExtractorError as e:
                     if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
@@ -389,8 +394,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                             (?:www\.)?invidious\.drycat\.fr/|
                             (?:www\.)?tube\.poal\.co/|
                             (?:www\.)?vid\.wxzm\.sx/|
+                            (?:www\.)?yewtu\.be/|
                             (?:www\.)?yt\.elukerio\.org/|
                             (?:www\.)?yt\.lelux\.fi/|
+                            (?:www\.)?invidious\.ggc-project\.de/|
+                            (?:www\.)?yt\.maisputain\.ovh/|
+                            (?:www\.)?invidious\.13ad\.de/|
+                            (?:www\.)?invidious\.toot\.koeln/|
+                            (?:www\.)?invidious\.fdn\.fr/|
+                            (?:www\.)?watch\.nettohikari\.com/|
                             (?:www\.)?kgg2m7yk5aybusll\.onion/|
                             (?:www\.)?qklhadlycap4cnod\.onion/|
                             (?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
@@ -398,6 +410,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                             (?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/|
                             (?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/|
                             (?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p/|
+                            (?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion/|
                             youtube\.googleapis\.com/)                        # the various hostnames, with wildcard subdomains
                          (?:.*?\#/)?                                          # handle anchor (#/) redirect urls
                          (?:                                                  # the various things that can precede the ID:
@@ -427,6 +440,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      (?(1).+)?                                                # if we found the ID, everything can follow
                      $""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
     _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
+    _PLAYER_INFO_RE = (
+        r'/(?P<id>[a-zA-Z0-9_-]{8,})/player_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?/base\.(?P<ext>[a-z]+)$',
+        r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.(?P<ext>[a-z]+)$',
+    )
     _formats = {
         '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
         '6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@@ -570,7 +587,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20120506',
                 'title': 'Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]',
                 'alt_title': 'I Love It (feat. Charli XCX)',
-                'description': 'md5:f3ceb5ef83a08d95b9d146f973157cc8',
+                'description': 'md5:19a2f98d9032b9311e686ed039564f63',
                 'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                          'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                          'iconic ep', 'iconic', 'love', 'it'],
@@ -685,12 +702,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'id': 'nfWlot6h_JM',
                 'ext': 'm4a',
                 'title': 'Taylor Swift - Shake It Off',
-                'description': 'md5:bec2185232c05479482cb5a9b82719bf',
+                'description': 'md5:307195cd21ff7fa352270fe884570ef0',
                 'duration': 242,
                 'uploader': 'TaylorSwiftVEVO',
                 'uploader_id': 'TaylorSwiftVEVO',
                 'upload_date': '20140818',
-                'creator': 'Taylor Swift',
             },
             'params': {
                 'youtube_include_dash_manifest': True,
@@ -755,11 +771,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20100430',
                 'uploader_id': 'deadmau5',
                 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
-                'creator': 'deadmau5',
+                'creator': 'Dada Life, deadmau5',
                 'description': 'md5:12c56784b8032162bb936a5f76d55360',
                 'uploader': 'deadmau5',
                 'title': 'Deadmau5 - Some Chords (HD)',
-                'alt_title': 'Some Chords',
+                'alt_title': 'This Machine Kills Some Chords',
             },
             'expected_warnings': [
                 'DASH manifest missing',
@@ -1135,6 +1151,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'skip_download': True,
                 'youtube_include_dash_manifest': False,
             },
+            'skip': 'not actual anymore',
         },
         {
             # Youtube Music Auto-generated description
@@ -1145,8 +1162,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'title': 'Voyeur Girl',
                 'description': 'md5:7ae382a65843d6df2685993e90a8628f',
                 'upload_date': '20190312',
-                'uploader': 'Various Artists - Topic',
-                'uploader_id': 'UCVWKBi1ELZn0QX2CBLSkiyw',
+                'uploader': 'Stephen - Topic',
+                'uploader_id': 'UC-pWHpBjdGG69N9mM2auIAA',
                 'artist': 'Stephen',
                 'track': 'Voyeur Girl',
                 'album': 'it\'s too much love to know my dear',
@@ -1210,7 +1227,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'id': '-hcAI0g-f5M',
                 'ext': 'mp4',
                 'title': 'Put It On Me',
-                'description': 'md5:93c55acc682ae7b0c668f2e34e1c069e',
+                'description': 'md5:f6422397c07c4c907c6638e1fee380a5',
                 'upload_date': '20180426',
                 'uploader': 'Matt Maeson - Topic',
                 'uploader_id': 'UCnEkIGqtGcQMLk73Kp-Q5LQ',
@@ -1228,6 +1245,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q',
             'only_matching': True,
         },
+        {
+            # invalid -> valid video id redirection
+            'url': 'DJztXj2GPfl',
+            'info_dict': {
+                'id': 'DJztXj2GPfk',
+                'ext': 'mp4',
+                'title': 'Panjabi MC - Mundian To Bach Ke (The Dictator Soundtrack)',
+                'description': 'md5:bf577a41da97918e94fa9798d9228825',
+                'upload_date': '20090125',
+                'uploader': 'Prochorowka',
+                'uploader_id': 'Prochorowka',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Prochorowka',
+                'artist': 'Panjabi MC',
+                'track': 'Beware of the Boys (Mundian to Bach Ke) - Motivo Hi-Lectro Remix',
+                'album': 'Beware of the Boys (Mundian To Bach Ke)',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        }
     ]
 
     def __init__(self, *args, **kwargs):
@@ -1254,14 +1291,18 @@ def _signature_cache_id(self, example_sig):
         """ Return a string representation of a signature """
         return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
 
         """ Return a string representation of a signature """
         return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
 
-    def _extract_signature_function(self, video_id, player_url, example_sig):
-        id_m = re.match(
-            r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$',
-            player_url)
-        if not id_m:
+    @classmethod
+    def _extract_player_info(cls, player_url):
+        for player_re in cls._PLAYER_INFO_RE:
+            id_m = re.search(player_re, player_url)
+            if id_m:
+                break
+        else:
             raise ExtractorError('Cannot identify player %r' % player_url)
-        player_type = id_m.group('ext')
-        player_id = id_m.group('id')
+        return id_m.group('ext'), id_m.group('id')
+
+    def _extract_signature_function(self, video_id, player_url, example_sig):
+        player_type, player_id = self._extract_player_info(player_url)
 
         # Read from filesystem cache
         func_id = '%s_%s_%s' % (
@@ -1343,6 +1384,7 @@ def _parse_sig_js(self, jscode):
         funcname = self._search_regex(
             (r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
              r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
              r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
              # Obsolete patterns
              r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
@@ -1616,8 +1658,63 @@ def extract_id(cls, url):
         video_id = mobj.group(2)
         return video_id
 
+    def _extract_chapters_from_json(self, webpage, video_id, duration):
+        if not webpage:
+            return
+        player = self._parse_json(
+            self._search_regex(
+                r'RELATED_PLAYER_ARGS["\']\s*:\s*({.+})\s*,?\s*\n', webpage,
+                'player args', default='{}'),
+            video_id, fatal=False)
+        if not player or not isinstance(player, dict):
+            return
+        watch_next_response = player.get('watch_next_response')
+        if not isinstance(watch_next_response, compat_str):
+            return
+        response = self._parse_json(watch_next_response, video_id, fatal=False)
+        if not response or not isinstance(response, dict):
+            return
+        chapters_list = try_get(
+            response,
+            lambda x: x['playerOverlays']
+                       ['playerOverlayRenderer']
+                       ['decoratedPlayerBarRenderer']
+                       ['decoratedPlayerBarRenderer']
+                       ['playerBar']
+                       ['chapteredPlayerBarRenderer']
+                       ['chapters'],
+            list)
+        if not chapters_list:
+            return
+
+        def chapter_time(chapter):
+            return float_or_none(
+                try_get(
+                    chapter,
+                    lambda x: x['chapterRenderer']['timeRangeStartMillis'],
+                    int),
+                scale=1000)
+        chapters = []
+        for next_num, chapter in enumerate(chapters_list, start=1):
+            start_time = chapter_time(chapter)
+            if start_time is None:
+                continue
+            end_time = (chapter_time(chapters_list[next_num])
+                        if next_num < len(chapters_list) else duration)
+            if end_time is None:
+                continue
+            title = try_get(
+                chapter, lambda x: x['chapterRenderer']['title']['simpleText'],
+                compat_str)
+            chapters.append({
+                'start_time': start_time,
+                'end_time': end_time,
+                'title': title,
+            })
+        return chapters
+
     @staticmethod
-    def _extract_chapters(description, duration):
+    def _extract_chapters_from_description(description, duration):
         if not description:
             return None
         chapter_lines = re.findall(
@@ -1651,6 +1748,10 @@ def _extract_chapters(description, duration):
             })
         return chapters
 
+    def _extract_chapters(self, webpage, description, video_id, duration):
+        return (self._extract_chapters_from_json(webpage, video_id, duration)
+                or self._extract_chapters_from_description(description, duration))
+
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
 
@@ -1678,7 +1779,10 @@ def _real_extract(self, url):
 
         # Get video webpage
         url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id
-        video_webpage = self._download_webpage(url, video_id)
+        video_webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        qs = compat_parse_qs(compat_urllib_parse_urlparse(urlh.geturl()).query)
+        video_id = qs.get('v', [None])[0] or video_id
 
         # Attempt to extract SWF player URL
         mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
@@ -1707,9 +1811,6 @@ def add_dash_mpd_pr(pl_response):
         def extract_view_count(v_info):
             return int_or_none(try_get(v_info, lambda x: x['view_count'][0]))
 
-        def extract_token(v_info):
-            return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token'))
-
         def extract_player_response(player_response, video_id):
             pl_response = str_or_none(player_response)
             if not pl_response:
@@ -1722,6 +1823,7 @@ def extract_player_response(player_response, video_id):
         player_response = {}
 
         # Get video info
+        video_info = {}
         embed_webpage = None
         if re.search(r'player-age-gate-content">', video_webpage) is not None:
             age_gate = True
@@ -1736,19 +1838,21 @@ def extract_player_response(player_response, video_id):
                     r'"sts"\s*:\s*(\d+)', embed_webpage, 'sts', default=''),
             })
             video_info_url = proto + '://www.youtube.com/get_video_info?' + data
                     r'"sts"\s*:\s*(\d+)', embed_webpage, 'sts', default=''),
             })
             video_info_url = proto + '://www.youtube.com/get_video_info?' + data
-            video_info_webpage = self._download_webpage(
-                video_info_url, video_id,
-                note='Refetching age-gated info webpage',
-                errnote='unable to download video info webpage')
-            video_info = compat_parse_qs(video_info_webpage)
-            pl_response = video_info.get('player_response', [None])[0]
-            player_response = extract_player_response(pl_response, video_id)
-            add_dash_mpd(video_info)
-            view_count = extract_view_count(video_info)
+            try:
+                video_info_webpage = self._download_webpage(
+                    video_info_url, video_id,
+                    note='Refetching age-gated info webpage',
+                    errnote='unable to download video info webpage')
+            except ExtractorError:
+                video_info_webpage = None
+            if video_info_webpage:
+                video_info = compat_parse_qs(video_info_webpage)
+                pl_response = video_info.get('player_response', [None])[0]
+                player_response = extract_player_response(pl_response, video_id)
+                add_dash_mpd(video_info)
+                view_count = extract_view_count(video_info)
         else:
             age_gate = False
-            video_info = None
-            sts = None
             # Try looking directly into the video webpage
             ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
             if ytplayer_config:
@@ -1765,61 +1869,10 @@ def extract_player_response(player_response, video_id):
                         args['ypc_vid'], YoutubeIE.ie_key(), video_id=args['ypc_vid'])
                 if args.get('livestream') == '1' or args.get('live_playback') == 1:
                     is_live = True
-                sts = ytplayer_config.get('sts')
                 if not player_response:
                     player_response = extract_player_response(args.get('player_response'), video_id)
             if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
                 add_dash_mpd_pr(player_response)
-                # We also try looking in get_video_info since it may contain different dashmpd
-                # URL that points to a DASH manifest with possibly different itag set (some itags
-                # are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH
-                # manifest pointed by get_video_info's dashmpd).
-                # The general idea is to take a union of itags of both DASH manifests (for example
-                # video with such 'manifest behavior' see https://github.com/ytdl-org/youtube-dl/issues/6093)
-                self.report_video_info_webpage_download(video_id)
-                for el in ('embedded', 'detailpage', 'vevo', ''):
-                    query = {
-                        'video_id': video_id,
-                        'ps': 'default',
-                        'eurl': '',
-                        'gl': 'US',
-                        'hl': 'en',
-                    }
-                    if el:
-                        query['el'] = el
-                    if sts:
-                        query['sts'] = sts
-                    video_info_webpage = self._download_webpage(
-                        '%s://www.youtube.com/get_video_info' % proto,
-                        video_id, note=False,
-                        errnote='unable to download video info webpage',
-                        fatal=False, query=query)
-                    if not video_info_webpage:
-                        continue
-                    get_video_info = compat_parse_qs(video_info_webpage)
-                    if not player_response:
-                        pl_response = get_video_info.get('player_response', [None])[0]
-                        player_response = extract_player_response(pl_response, video_id)
-                    add_dash_mpd(get_video_info)
-                    if view_count is None:
-                        view_count = extract_view_count(get_video_info)
-                    if not video_info:
-                        video_info = get_video_info
-                    get_token = extract_token(get_video_info)
-                    if get_token:
-                        # Different get_video_info requests may report different results, e.g.
-                        # some may report video unavailability, but some may serve it without
-                        # any complaint (see https://github.com/ytdl-org/youtube-dl/issues/7362,
-                        # the original webpage as well as el=info and el=embedded get_video_info
-                        # requests report video unavailability due to geo restriction while
-                        # el=detailpage succeeds and returns valid data). This is probably
-                        # due to YouTube measures against IP ranges of hosting providers.
-                        # Working around by preferring the first succeeded video_info containing
-                        # the token if no such video_info yet was found.
-                        token = extract_token(video_info)
-                        if not token:
-                            video_info = get_video_info
-                        break
 
         def extract_unavailable_message():
             messages = []
@@ -1832,16 +1885,22 @@ def extract_unavailable_message():
             if messages:
                 return '\n'.join(messages)
 
-        if not video_info:
+        if not video_info and not player_response:
             unavailable_message = extract_unavailable_message()
             if not unavailable_message:
                 unavailable_message = 'Unable to extract video data'
             raise ExtractorError(
                 'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
 
+        if not isinstance(video_info, dict):
+            video_info = {}
+
         video_details = try_get(
             player_response, lambda x: x['videoDetails'], dict) or {}
 
+        microformat = try_get(
+            player_response, lambda x: x['microformat']['playerMicroformatRenderer'], dict) or {}
+
         video_title = video_info.get('title', [None])[0] or video_details.get('title')
         if not video_title:
             self._downloader.report_warning('Unable to extract video title')
@@ -1871,7 +1930,7 @@ def replace_url(m):
             ''', replace_url, video_description)
             video_description = clean_html(video_description)
         else:
-            video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
+            video_description = video_details.get('shortDescription') or self._html_search_meta('description', video_webpage)
 
         if not smuggled_data.get('force_singlefeed', False):
             if not self._downloader.params.get('noplaylist'):
@@ -1888,15 +1947,26 @@ def replace_url(m):
                         # fields may contain comma as well (see
                         # https://github.com/ytdl-org/youtube-dl/issues/8536)
                         feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
+
+                        def feed_entry(name):
+                            return try_get(feed_data, lambda x: x[name][0], compat_str)
+
+                        feed_id = feed_entry('id')
+                        if not feed_id:
+                            continue
+                        feed_title = feed_entry('title')
+                        title = video_title
+                        if feed_title:
+                            title += ' (%s)' % feed_title
                         entries.append({
                             '_type': 'url_transparent',
                             'ie_key': 'Youtube',
                             'url': smuggle_url(
                                 '%s://www.youtube.com/watch?v=%s' % (proto, feed_data['id'][0]),
                                 {'force_singlefeed': True}),
-                            'title': '%s (%s)' % (video_title, feed_data['title'][0]),
+                            'title': title,
                         })
-                        feed_ids.append(feed_data['id'][0])
+                        feed_ids.append(feed_id)
                     self.to_screen(
                         'Downloading multifeed video (%s) - add --no-playlist to just download video %s'
                         % (', '.join(feed_ids), video_id))
@@ -1908,6 +1978,8 @@ def replace_url(m):
             view_count = extract_view_count(video_info)
         if view_count is None and video_details:
             view_count = int_or_none(video_details.get('viewCount'))
+        if view_count is None and microformat:
+            view_count = int_or_none(microformat.get('viewCount'))
 
         if is_live is None:
             is_live = bool_or_none(video_details.get('isLive'))
@@ -1967,12 +2039,12 @@ def _extract_filesize(media_url):
                 }
 
             for fmt in streaming_formats:
-                if fmt.get('drm_families'):
+                if fmt.get('drmFamilies') or fmt.get('drm_families'):
                     continue
                 url = url_or_none(fmt.get('url'))
 
                 if not url:
-                    cipher = fmt.get('cipher')
+                    cipher = fmt.get('cipher') or fmt.get('signatureCipher')
                     if not cipher:
                         continue
                     url_data = compat_parse_qs(cipher)
@@ -2023,22 +2095,10 @@ def _extract_filesize(media_url):
 
                         if self._downloader.params.get('verbose'):
                             if player_url is None:
-                                player_version = 'unknown'
                                 player_desc = 'unknown'
                             else:
-                                if player_url.endswith('swf'):
-                                    player_version = self._search_regex(
-                                        r'-(.+?)(?:/watch_as3)?\.swf$', player_url,
-                                        'flash player', fatal=False)
-                                    player_desc = 'flash player %s' % player_version
-                                else:
-                                    player_version = self._search_regex(
-                                        [r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
-                                         r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
-                                        player_url,
-                                        'html5 player', fatal=False)
-                                    player_desc = 'html5 player %s' % player_version
-
+                                player_type, player_version = self._extract_player_info(player_url)
+                                player_desc = '%s player %s' % ('flash' if player_type == 'swf' else 'html5', player_version)
                             parts_sizes = self._signature_cache_id(encrypted_sig)
                             self.to_screen('{%s} signature length %s, %s' %
                                            (format_id, parts_sizes, player_desc))
@@ -2171,7 +2231,12 @@ def _extract_filesize(media_url):
             video_uploader_id = mobj.group('uploader_id')
             video_uploader_url = mobj.group('uploader_url')
         else:
-            self._downloader.report_warning('unable to extract uploader nickname')
+            owner_profile_url = url_or_none(microformat.get('ownerProfileUrl'))
+            if owner_profile_url:
+                video_uploader_id = self._search_regex(
+                    r'(?:user|channel)/([^/]+)', owner_profile_url, 'uploader id',
+                    default=None)
+                video_uploader_url = owner_profile_url
 
         channel_id = (
             str_or_none(video_details.get('channelId'))
@@ -2182,17 +2247,33 @@ def _extract_filesize(media_url):
                 video_webpage, 'channel id', default=None, group='id'))
         channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
 
-        # thumbnail image
-        # We try first to get a high quality image:
-        m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
-                            video_webpage, re.DOTALL)
-        if m_thumb is not None:
-            video_thumbnail = m_thumb.group(1)
-        elif 'thumbnail_url' not in video_info:
-            self._downloader.report_warning('unable to extract video thumbnail')
+        thumbnails = []
+        thumbnails_list = try_get(
+            video_details, lambda x: x['thumbnail']['thumbnails'], list) or []
+        for t in thumbnails_list:
+            if not isinstance(t, dict):
+                continue
+            thumbnail_url = url_or_none(t.get('url'))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'width': int_or_none(t.get('width')),
+                'height': int_or_none(t.get('height')),
+            })
+
+        if not thumbnails:
             video_thumbnail = None
-        else:   # don't panic if we can't find it
-            video_thumbnail = compat_urllib_parse_unquote_plus(video_info['thumbnail_url'][0])
+            # We try first to get a high quality image:
+            m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
+                                video_webpage, re.DOTALL)
+            if m_thumb is not None:
+                video_thumbnail = m_thumb.group(1)
+            thumbnail_url = try_get(video_info, lambda x: x['thumbnail_url'][0], compat_str)
+            if thumbnail_url:
+                video_thumbnail = compat_urllib_parse_unquote_plus(thumbnail_url)
+            if video_thumbnail:
+                thumbnails.append({'url': video_thumbnail})
 
         # upload date
         upload_date = self._html_search_meta(
@@ -2202,6 +2283,8 @@ def _extract_filesize(media_url):
                 [r'(?s)id="eow-date.*?>(.*?)</span>',
                  r'(?:id="watch-uploader-info".*?>.*?|["\']simpleText["\']\s*:\s*["\'])(?:Published|Uploaded|Streamed live|Started) on (.+?)[<"\']'],
                 video_webpage, 'upload date', default=None)
+        if not upload_date:
+            upload_date = microformat.get('publishDate') or microformat.get('uploadDate')
         upload_date = unified_strdate(upload_date)
 
         video_license = self._html_search_regex(
@@ -2273,17 +2356,21 @@ def extract_meta(field):
         m_cat_container = self._search_regex(
             r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
             video_webpage, 'categories', default=None)
+        category = None
         if m_cat_container:
             category = self._html_search_regex(
                 r'(?s)<a[^<]+>(.*?)</a>', m_cat_container, 'category',
                 default=None)
-            video_categories = None if category is None else [category]
-        else:
-            video_categories = None
+        if not category:
+            category = try_get(
+                microformat, lambda x: x['category'], compat_str)
+        video_categories = None if category is None else [category]
 
         video_tags = [
             unescapeHTML(m.group('content'))
             for m in re.finditer(self._meta_regex('og:video:tag'), video_webpage)]
+        if not video_tags:
+            video_tags = try_get(video_details, lambda x: x['keywords'], list)
 
         def _extract_count(count_name):
             return str_to_int(self._search_regex(
@@ -2334,7 +2421,7 @@ def _extract_count(count_name):
                     errnote='Unable to download video annotations', fatal=False,
                     data=urlencode_postdata({xsrf_field_name: xsrf_token}))
 
-        chapters = self._extract_chapters(description_original, video_duration)
+        chapters = self._extract_chapters(video_webpage, description_original, video_id, video_duration)
 
         # Look for the DASH manifest
         if self._downloader.params.get('youtube_include_dash_manifest', True):
@@ -2391,30 +2478,23 @@ def decrypt_sig(mobj):
                         f['stretched_ratio'] = ratio
 
         if not formats:
-            token = extract_token(video_info)
-            if not token:
-                if 'reason' in video_info:
-                    if 'The uploader has not made this video available in your country.' in video_info['reason']:
-                        regions_allowed = self._html_search_meta(
-                            'regionsAllowed', video_webpage, default=None)
-                        countries = regions_allowed.split(',') if regions_allowed else None
-                        self.raise_geo_restricted(
-                            msg=video_info['reason'][0], countries=countries)
-                    reason = video_info['reason'][0]
-                    if 'Invalid parameters' in reason:
-                        unavailable_message = extract_unavailable_message()
-                        if unavailable_message:
-                            reason = unavailable_message
-                    raise ExtractorError(
-                        'YouTube said: %s' % reason,
-                        expected=True, video_id=video_id)
-                else:
-                    raise ExtractorError(
-                        '"token" parameter not in video info for unknown reason',
-                        video_id=video_id)
-
-        if not formats and (video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos'])):
-            raise ExtractorError('This video is DRM protected.', expected=True)
+            if 'reason' in video_info:
+                if 'The uploader has not made this video available in your country.' in video_info['reason']:
+                    regions_allowed = self._html_search_meta(
+                        'regionsAllowed', video_webpage, default=None)
+                    countries = regions_allowed.split(',') if regions_allowed else None
+                    self.raise_geo_restricted(
+                        msg=video_info['reason'][0], countries=countries)
+                reason = video_info['reason'][0]
+                if 'Invalid parameters' in reason:
+                    unavailable_message = extract_unavailable_message()
+                    if unavailable_message:
+                        reason = unavailable_message
+                raise ExtractorError(
+                    'YouTube said: %s' % reason,
+                    expected=True, video_id=video_id)
+            if video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos']):
+                raise ExtractorError('This video is DRM protected.', expected=True)
 
         self._sort_formats(formats)
 
@@ -2432,7 +2512,7 @@ def decrypt_sig(mobj):
             'creator': video_creator or artist,
             'title': video_title,
             'alt_title': video_alt_title or track,
-            'thumbnail': video_thumbnail,
+            'thumbnails': thumbnails,
             'description': video_description,
             'categories': video_categories,
             'tags': video_tags,
@@ -2494,20 +2574,23 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
     _VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'
     IE_NAME = 'youtube:playlist'
     _TESTS = [{
-        'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
+        'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
         'info_dict': {
-            'title': 'ytdl test PL',
-            'id': 'PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
+            'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
+            'uploader': 'Sergey M.',
+            'id': 'PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
+            'title': 'youtube-dl public playlist',
         },
-        'playlist_count': 3,
+        'playlist_count': 1,
     }, {
-        'url': 'https://www.youtube.com/playlist?list=PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx',
+        'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
         'info_dict': {
-            'id': 'PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx',
-            'title': 'YDL_Empty_List',
+            'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
+            'uploader': 'Sergey M.',
+            'id': 'PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
+            'title': 'youtube-dl empty playlist',
         },
         'playlist_count': 0,
-        'skip': 'This playlist is private',
     }, {
         'note': 'Playlist with deleted videos (#651). As a bonus, the video #51 is also twice in this list.',
         'url': 'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
@@ -2517,7 +2600,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
             'uploader': 'Christiaan008',
             'uploader_id': 'ChRiStIaAn008',
         },
-        'playlist_count': 95,
+        'playlist_count': 96,
     }, {
         'note': 'issue #673',
         'url': 'PLBB231211A4F62143',
@@ -2693,7 +2776,7 @@ def _extract_mix(self, playlist_id):
         ids = []
         last_id = playlist_id[-11:]
         for n in itertools.count(1):
-            url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+            url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
             webpage = self._download_webpage(
                 url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
             new_ids = orderedSet(re.findall(
@@ -3033,7 +3116,7 @@ def _real_extract(self, url):
 
 class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
     IE_DESC = 'YouTube.com user/channel playlists'
-    _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+)/playlists'
+    _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel|c)/(?P<id>[^/]+)/playlists'
     IE_NAME = 'youtube:playlists'
 
     _TESTS = [{
@@ -3059,6 +3142,9 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
             'title': 'Chem Player',
         },
         'skip': 'Blocked',
+    }, {
+        'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
+        'only_matching': True,
     }]
 
 
@@ -3203,9 +3289,10 @@ def _entries(self, page):
                 break
 
             more = self._download_json(
-                'https://youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
+                'https://www.youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
                 'Downloading page #%s' % page_num,
-                transform_source=uppercase_escape)
+                transform_source=uppercase_escape,
+                headers=self._YOUTUBE_CLIENT_HEADERS)
             content_html = more['content_html']
             more_widget_html = more['load_more_widget_html']
 
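The youtube.py hunks above switch the format loop to the newer JSON field names (`drmFamilies`, `signatureCipher`) that the player response now serves alongside the legacy ones. As background, a `signatureCipher` value is just a URL-encoded query string bundling the encrypted signature with its target stream URL; a minimal standalone sketch of splitting one (the helper name and the sample cipher value are invented for illustration, this is not youtube-dl's actual code):

```python
from urllib.parse import parse_qs


def split_signature_cipher(cipher):
    """Split a signatureCipher-style query string into its parts.

    Such a value typically carries three fields: 's' (the encrypted
    signature), 'sp' (the query parameter the decrypted signature should
    be attached to), and 'url' (the bare stream URL). parse_qs both
    splits the pairs and percent-decodes the values.
    """
    data = parse_qs(cipher)
    return {
        'encrypted_sig': data.get('s', [None])[0],
        'sig_param': data.get('sp', ['signature'])[0],
        'stream_url': data.get('url', [None])[0],
    }


# Invented sample value, decoded into its three components:
parts = split_signature_cipher(
    's=ABCDEF&sp=sig&url=https%3A%2F%2Fexample.com%2Fvideoplayback')
# parts['sig_param'] is 'sig'; parts['stream_url'] is the decoded URL
```

The decrypted signature would then be appended to `stream_url` under the `sig_param` name, which is why the extractor treats a missing `url` or `s` field as an unusable format.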
index bacb82eeeb2a549edbb0cbf6d0a67e07f28b595b..f6496f5168cf057c9c415cc7461105462ad66370 100644 (file)
@@ -29,7 +29,6 @@ class ZapiksIE(InfoExtractor):
                 'timestamp': 1359044972,
                 'upload_date': '20130124',
                 'view_count': int,
-                'comment_count': int,
             },
         },
         {
index 145c123a42fee5e67c0fd8c2750ea13562632666..656864b2ed8a9982c1da934b01aff6bcb7126acf 100644 (file)
@@ -244,14 +244,14 @@ class ZDFChannelIE(ZDFBaseIE):
             'id': 'das-aktuelle-sportstudio',
             'title': 'das aktuelle sportstudio | ZDF',
         },
-        'playlist_count': 21,
+        'playlist_mincount': 23,
     }, {
         'url': 'https://www.zdf.de/dokumentation/planet-e',
         'info_dict': {
             'id': 'planet-e',
             'title': 'planet e.',
         },
-        'playlist_count': 4,
+        'playlist_mincount': 50,
     }, {
         'url': 'https://www.zdf.de/filme/taunuskrimi/',
         'only_matching': True,
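The ZDF test updates above replace exact `playlist_count` expectations with `playlist_mincount` floors, since both channel pages keep gaining entries and an exact count goes stale. A simplified sketch of the difference between the two checks (an illustration only, not youtube-dl's actual test harness):

```python
def check_playlist_size(info_dict, expected):
    """Check an extracted playlist against test expectations.

    'playlist_count' demands an exact match, so it breaks whenever the
    source gains or loses an entry; 'playlist_mincount' only sets a
    floor, which suits channels that keep publishing new videos.
    """
    got = len(info_dict['entries'])
    if 'playlist_count' in expected:
        assert got == expected['playlist_count'], (
            'expected exactly %d entries, got %d'
            % (expected['playlist_count'], got))
    if 'playlist_mincount' in expected:
        assert got >= expected['playlist_mincount'], (
            'expected at least %d entries, got %d'
            % (expected['playlist_mincount'], got))


# A growing channel still passes a mincount check after new uploads:
check_playlist_size({'entries': list(range(60))}, {'playlist_mincount': 50})
```

This is why the commit bumps `planet-e` from a brittle `'playlist_count': 4` to `'playlist_mincount': 50`: the floor only needs updating if entries are ever removed.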
index 3b16e703b9b9ef54497b3172deac0d299efb7373..2e2e97a0c4454971dab30136518ca26e903b9470 100644 (file)
@@ -4,10 +4,20 @@
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+    dict_get,
+    ExtractorError,
+    int_or_none,
+    js_to_json,
+    parse_iso8601,
+)
 
 
 class ZypeIE(InfoExtractor):
-    _VALID_URL = r'https?://player\.zype\.com/embed/(?P<id>[\da-fA-F]+)\.js\?.*?api_key=[^&]+'
+    _ID_RE = r'[\da-fA-F]+'
+    _COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
+    _VALID_URL = r'https?:%s[^&]+' % (_COMMON_RE % ('(?P<id>%s)' % _ID_RE))
     _TEST = {
         'url': 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ&autoplay=false&controls=true&da=false',
         'md5': 'eaee31d474c76a955bdaba02a505c595',
@@ -16,6 +26,9 @@ class ZypeIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Smoky Barbecue Favorites',
             'thumbnail': r're:^https?://.*\.jpe?g',
+            'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+            'timestamp': 1504915200,
+            'upload_date': '20170909',
         },
     }
 
@@ -24,34 +37,98 @@ def _extract_urls(webpage):
         return [
             mobj.group('url')
             for mobj in re.finditer(
-                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//player\.zype\.com/embed/[\da-fA-F]+\.js\?.*?api_key=.+?)\1',
+                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.+?)\1' % (ZypeIE._COMMON_RE % ZypeIE._ID_RE),
                 webpage)]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, video_id)
+        try:
+            response = self._download_json(re.sub(
+                r'\.(?:js|html)\?', '.json?', url), video_id)['response']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401, 403):
+                raise ExtractorError(self._parse_json(
+                    e.cause.read().decode(), video_id)['message'], expected=True)
+            raise
 
-        title = self._search_regex(
-            r'video_title\s*[:=]\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'title', group='value')
+        body = response['body']
+        video = response['video']
+        title = video['title']
 
-        m3u8_url = self._search_regex(
-            r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1', webpage,
-            'm3u8 url', group='url')
-
-        formats = self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
-            m3u8_id='hls')
+        if isinstance(body, dict):
+            formats = []
+            for output in body.get('outputs', []):
+                output_url = output.get('url')
+                if not output_url:
+                    continue
+                name = output.get('name')
+                if name == 'm3u8':
+                    formats = self._extract_m3u8_formats(
+                        output_url, video_id, 'mp4',
+                        'm3u8_native', m3u8_id='hls', fatal=False)
+                else:
+                    f = {
+                        'format_id': name,
+                        'tbr': int_or_none(output.get('bitrate')),
+                        'url': output_url,
+                    }
+                    if name in ('m4a', 'mp3'):
+                        f['vcodec'] = 'none'
+                    else:
+                        f.update({
+                            'height': int_or_none(output.get('height')),
+                            'width': int_or_none(output.get('width')),
+                        })
+                    formats.append(f)
+            text_tracks = body.get('subtitles') or []
+        else:
+            m3u8_url = self._search_regex(
+                r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1',
+                body, 'm3u8 url', group='url')
+            formats = self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
+            text_tracks = self._search_regex(
+                r'textTracks\s*:\s*(\[[^]]+\])',
+                body, 'text tracks', default=None)
+            if text_tracks:
+                text_tracks = self._parse_json(
+                    text_tracks, video_id, js_to_json, False)
         self._sort_formats(formats)
 
-        thumbnail = self._search_regex(
-            r'poster\s*[:=]\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'thumbnail',
-            default=False, group='url')
+        subtitles = {}
+        if text_tracks:
+            for text_track in text_tracks:
+                tt_url = dict_get(text_track, ('file', 'src'))
+                if not tt_url:
+                    continue
+                subtitles.setdefault(text_track.get('label') or 'English', []).append({
+                    'url': tt_url,
+                })
+
+        thumbnails = []
+        for thumbnail in video.get('thumbnails', []):
+            thumbnail_url = thumbnail.get('url')
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'width': int_or_none(thumbnail.get('width')),
+                'height': int_or_none(thumbnail.get('height')),
+            })
 
         return {
             'id': video_id,
+            'display_id': video.get('friendly_title'),
             'title': title,
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
+            'description': dict_get(video, ('description', 'ott_description', 'short_description')),
+            'timestamp': parse_iso8601(video.get('published_at')),
+            'duration': int_or_none(video.get('duration')),
+            'view_count': int_or_none(video.get('request_count')),
+            'average_rating': int_or_none(video.get('rating')),
+            'season_number': int_or_none(video.get('season')),
+            'episode_number': int_or_none(video.get('episode')),
             'formats': formats,
+            'subtitles': subtitles,
         }
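The updated extractor rewrites the embed URL's `.js`/`.html` suffix to `.json` before requesting metadata from the API. A minimal sketch of that rewrite, using the same `re.sub()` pattern as the new `_real_extract()` (the `api_key` value below is a placeholder):

```python
import re


def zype_json_url(embed_url):
    # Rewrite a Zype .js/.html embed URL to its .json counterpart,
    # mirroring the re.sub() call in the new _real_extract().
    return re.sub(r'\.(?:js|html)\?', '.json?', embed_url)


print(zype_json_url(
    'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=XYZ'))
# → https://player.zype.com/embed/5b400b834b32992a310622b9.json?api_key=XYZ
```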
index 1ffabc62bedacb42aeb34f585d04ed7bc3ff8045..6d5ac62b3bab68de226060c6bcd3cd9299f02825 100644 (file)
@@ -134,7 +134,7 @@ def _comma_separated_values_options_callback(option, opt_str, value, parser):
         action='help',
         help='Print this help text and exit')
     general.add_option(
-        '-v', '--version',
+        '--version',
         action='version',
         help='Print program version and exit')
     general.add_option(
@@ -853,7 +853,7 @@ def _comma_separated_values_options_callback(option, opt_str, value, parser):
     postproc.add_option(
         '--exec',
         metavar='CMD', dest='exec_cmd',
-        help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
+        help='Execute a command on the file after downloading and post-processing, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
     postproc.add_option(
         '--convert-subs', '--convert-subtitles',
         metavar='FORMAT', dest='convertsubtitles', default=None,
index fd3f921a8a11da2e8c31573889ea4d7f5a9fea25..5f7298345b3f550c9a8d8b7ba2b602d2f084ed36 100644 (file)
@@ -447,6 +447,13 @@ def add(meta_list, info_list=None):
                         metadata[meta_f] = info[info_f]
                     break
 
+        # See [1-4] for some info on media metadata/metadata supported
+        # by ffmpeg.
+        # 1. https://kdenlive.org/en/project/adding-meta-data-to-mp4-video/
+        # 2. https://wiki.multimedia.cx/index.php/FFmpeg_Metadata
+        # 3. https://kodi.wiki/view/Video_file_tagging
+        # 4. http://atomicparsley.sourceforge.net/mpeg-4files.html
+
         add('title', ('track', 'title'))
         add('date', 'upload_date')
         add(('description', 'comment'), 'description')
@@ -457,6 +464,10 @@ def add(meta_list, info_list=None):
         add('album')
         add('album_artist')
         add('disc', 'disc_number')
+        add('show', 'series')
+        add('season_number')
+        add('episode_id', ('episode', 'episode_id'))
+        add('episode_sort', 'episode_number')
 
         if not metadata:
             self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')
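The new mappings feed the existing `add()` helper, which copies the first populated info-dict field onto each ffmpeg metadata key. A simplified standalone sketch of that logic (the helper body is paraphrased from the diff's context, not copied verbatim):

```python
def build_metadata(info):
    metadata = {}

    def add(meta_list, info_list=None):
        # Accept either a single key or a tuple of keys on both sides,
        # and take the first info-dict field that has a value.
        if not isinstance(meta_list, (list, tuple)):
            meta_list = (meta_list,)
        keys = info_list or meta_list
        if not isinstance(keys, (list, tuple)):
            keys = (keys,)
        for info_f in keys:
            if info.get(info_f) is not None:
                for meta_f in meta_list:
                    metadata[meta_f] = info[info_f]
                break

    # The episode-related mappings added in this commit:
    add('show', 'series')
    add('season_number')
    add('episode_id', ('episode', 'episode_id'))
    add('episode_sort', 'episode_number')
    return metadata


print(build_metadata({'series': 'Foo', 'season_number': 2, 'episode': 'Pilot'}))
# → {'show': 'Foo', 'season_number': 2, 'episode_id': 'Pilot'}
```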
index 002ea7f3386215c61bcf3bc60419d0059abf2bc2..84c9646171e0b8f8d6a6397bff5339205cdadcd7 100644 (file)
@@ -9,6 +9,7 @@
 import sys
 from zipimport import zipimporter
 
+from .compat import compat_realpath
 from .utils import encode_compat_str
 
 from .version import __version__
@@ -84,7 +85,9 @@ def version_tuple(version_str):
     print_notes(to_screen, versions_info['versions'])
 
     # sys.executable is set to the full pathname of the exe-file for py2exe
-    filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
+    # though symlinks are not followed, so we need to resolve them manually
+    # with the help of realpath
+    filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
 
     if not os.access(filename, os.W_OK):
         to_screen('ERROR: no write permissions on %s' % filename)
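The updater change resolves symlinks before the write-permission check; `compat_realpath` is `os.path.realpath` on most platforms, so a plain-stdlib sketch of the fixed check looks like:

```python
import os
import sys

# Resolve symlinks before testing write access on the running binary,
# as the updater now does via compat_realpath. Without this, a symlink
# to the executable could pass os.access() while the real target is
# not writable (or vice versa).
filename = os.path.realpath(
    sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
can_update = os.access(filename, os.W_OK)
```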
index f6204692a81002cdfc44b02d183126e755283bd9..d1eca3760a66e89fbfc1673e70a2d526035776cf 100644 (file)
@@ -7,6 +7,7 @@
 import binascii
 import calendar
 import codecs
+import collections
 import contextlib
 import ctypes
 import datetime
@@ -30,6 +31,7 @@
 import subprocess
 import sys
 import tempfile
+import time
 import traceback
 import xml.etree.ElementTree
 import zlib
@@ -1835,6 +1837,12 @@ def write_json_file(obj, fn):
                 os.unlink(fn)
             except OSError:
                 pass
+        try:
+            mask = os.umask(0)
+            os.umask(mask)
+            os.chmod(tf.name, 0o666 & ~mask)
+        except OSError:
+            pass
         os.rename(tf.name, fn)
     except Exception:
         try:
@@ -2729,15 +2737,72 @@ def https_open(self, req):
 
 
 class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
+    """
+    See [1] for cookie file format.
+
+    1. https://curl.haxx.se/docs/http-cookies.html
+    """
     _HTTPONLY_PREFIX = '#HttpOnly_'
+    _ENTRY_LEN = 7
+    _HEADER = '''# Netscape HTTP Cookie File
+# This file is generated by youtube-dl.  Do not edit.
+
+'''
+    _CookieFileEntry = collections.namedtuple(
+        'CookieFileEntry',
+        ('domain_name', 'include_subdomains', 'path', 'https_only', 'expires_at', 'name', 'value'))
 
     def save(self, filename=None, ignore_discard=False, ignore_expires=False):
+        """
+        Save cookies to a file.
+
+        Most of the code is taken from CPython 3.8 and slightly adapted
+        to support cookie files with UTF-8 in both python 2 and 3.
+        """
+        if filename is None:
+            if self.filename is not None:
+                filename = self.filename
+            else:
+                raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
+
         # Store session cookies with `expires` set to 0 instead of an empty
         # string
         for cookie in self:
             if cookie.expires is None:
                 cookie.expires = 0
-        compat_cookiejar.MozillaCookieJar.save(self, filename, ignore_discard, ignore_expires)
+
+        with io.open(filename, 'w', encoding='utf-8') as f:
+            f.write(self._HEADER)
+            now = time.time()
+            for cookie in self:
+                if not ignore_discard and cookie.discard:
+                    continue
+                if not ignore_expires and cookie.is_expired(now):
+                    continue
+                if cookie.secure:
+                    secure = 'TRUE'
+                else:
+                    secure = 'FALSE'
+                if cookie.domain.startswith('.'):
+                    initial_dot = 'TRUE'
+                else:
+                    initial_dot = 'FALSE'
+                if cookie.expires is not None:
+                    expires = compat_str(cookie.expires)
+                else:
+                    expires = ''
+                if cookie.value is None:
+                    # cookies.txt regards 'Set-Cookie: foo' as a cookie
+                    # with no name, whereas http.cookiejar regards it as a
+                    # cookie with no value.
+                    name = ''
+                    value = cookie.name
+                else:
+                    name = cookie.name
+                    value = cookie.value
+                f.write(
+                    '\t'.join([cookie.domain, initial_dot, cookie.path,
+                               secure, expires, name, value]) + '\n')
 
     def load(self, filename=None, ignore_discard=False, ignore_expires=False):
         """Load cookies from a file."""
@@ -2747,12 +2812,30 @@ def load(self, filename=None, ignore_discard=False, ignore_expires=False):
             else:
                 raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
 
+        def prepare_line(line):
+            if line.startswith(self._HTTPONLY_PREFIX):
+                line = line[len(self._HTTPONLY_PREFIX):]
+            # comments and empty lines are fine
+            if line.startswith('#') or not line.strip():
+                return line
+            cookie_list = line.split('\t')
+            if len(cookie_list) != self._ENTRY_LEN:
+                raise compat_cookiejar.LoadError('invalid length %d' % len(cookie_list))
+            cookie = self._CookieFileEntry(*cookie_list)
+            if cookie.expires_at and not cookie.expires_at.isdigit():
+                raise compat_cookiejar.LoadError('invalid expires at %s' % cookie.expires_at)
+            return line
+
         cf = io.StringIO()
-        with open(filename) as f:
+        with io.open(filename, encoding='utf-8') as f:
             for line in f:
-                if line.startswith(self._HTTPONLY_PREFIX):
-                    line = line[len(self._HTTPONLY_PREFIX):]
-                cf.write(compat_str(line))
+                try:
+                    cf.write(prepare_line(line))
+                except compat_cookiejar.LoadError as e:
+                    write_string(
+                        'WARNING: skipping cookie file entry due to %s: %r\n'
+                        % (e, line), sys.stderr)
+                    continue
         cf.seek(0)
         self._really_load(cf, filename, ignore_discard, ignore_expires)
         # Session cookies are denoted by either `expires` field set to
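The new `prepare_line()` validator splits each cookie line into the seven Netscape fields and skips malformed entries instead of aborting the whole load. A standalone sketch of that validation, reusing the `_CookieFileEntry` field names from the diff:

```python
import collections

CookieFileEntry = collections.namedtuple(
    'CookieFileEntry',
    ('domain_name', 'include_subdomains', 'path', 'https_only',
     'expires_at', 'name', 'value'))


def validate_cookie_line(line):
    # Mirror the loader: strip the #HttpOnly_ marker, pass comments
    # and blank lines through, and check the 7 tab-separated fields.
    if line.startswith('#HttpOnly_'):
        line = line[len('#HttpOnly_'):]
    if line.startswith('#') or not line.strip():
        return line
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 7:
        raise ValueError('invalid length %d' % len(fields))
    entry = CookieFileEntry(*fields)
    if entry.expires_at and not entry.expires_at.isdigit():
        raise ValueError('invalid expires at %s' % entry.expires_at)
    return line


validate_cookie_line(
    '.example.com\tTRUE\t/\tTRUE\t1700000000\tsid\tabc123\n')  # accepted
```

In the real `load()`, a raised error is caught per line and reported as a warning, so one corrupt entry no longer poisons the entire cookies.txt file.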
@@ -2795,6 +2878,15 @@ def http_response(self, request, response):
     https_response = http_response
 
 
+class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
+    if sys.version_info[0] < 3:
+        def redirect_request(self, req, fp, code, msg, headers, newurl):
+            # On python 2 urlh.geturl() may sometimes return redirect URL
+            # as byte string instead of unicode. This workaround allows
+            # to force it always return unicode.
+            return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl))
+
+
 def extract_timezone(date_str):
     m = re.search(
         r'^.{8,}?(?P<tz>Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
index 1227abc0a74c7a7617650e1ac37a7eab7ebbe78c..17101fa47501d9bae1d6f223e35d7cb4dd3f8d5e 100644 (file)
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2019.11.28'
+__version__ = '2020.07.28'