jfr.im git - yt-dlp.git/commitdiff
pull changes from remote master (#190)
authorAakash Gajjar <redacted>
Tue, 25 Aug 2020 14:53:34 +0000 (20:23 +0530)
committerGitHub <redacted>
Tue, 25 Aug 2020 14:53:34 +0000 (20:23 +0530)
* [scrippsnetworks] Add new extractor(closes #19857)(closes #22981)

* [teachable] Improve locked lessons detection (#23528)

* [teachable] Fail with error message if no video URL found

* [extractors] add missing import for ScrippsNetworksIE

* [brightcove] cache brightcove player policy keys

* [prosiebensat1] improve geo restriction handling(closes #23571)

* [soundcloud] automatically update client id on failing requests

* [spankbang] Fix extraction (closes #23307, closes #23423, closes #23444)

* [spankbang] Improve removed video detection (#23423)

* [brightcove] update policy key on failing requests

* [pornhub] Fix extraction and add support for m3u8 formats (closes #22749, closes #23082)

* [pornhub] Improve locked videos detection (closes #22449, closes #22780)

* [brightcove] invalidate policy key cache on failing requests
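
The brightcove and soundcloud entries above share one pattern: cache a credential (a policy key or client id) and, when a request fails with an auth error, invalidate the cache, fetch a fresh credential, and retry once. A minimal sketch of that pattern (the names `AuthError`, `CachedKeyClient`, `fetch_key` and `request` are illustrative, not youtube-dl's actual API):

```python
class AuthError(Exception):
    pass


class CachedKeyClient:
    """Cache a token; on an auth failure, refresh it once and retry."""

    def __init__(self, fetch_key, request):
        self._fetch_key = fetch_key  # callable returning a fresh key
        self._request = request      # callable(url, key) -> response, raises AuthError
        self._key = None

    def get(self, url):
        if self._key is None:
            self._key = self._fetch_key()
        try:
            return self._request(url, self._key)
        except AuthError:
            # Invalidate the cached key and retry exactly once with a fresh one
            self._key = self._fetch_key()
            return self._request(url, self._key)
```

A cold cache triggers one fetch; a key rejected mid-session triggers exactly one refresh-and-retry, so a genuinely broken endpoint still surfaces its error instead of looping.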

* [soundcloud] fix client id extraction for non fatal requests

* [ChangeLog] Actualize
[ci skip]

* [devscripts/create-github-release] Switch to using PAT for authentication

Basic authentication will be deprecated soon

* release 2020.01.01

* [redtube] Detect private videos (#23518)

* [vice] improve extraction(closes #23631)

* [devscripts/create-github-release] Remove unused import

* [wistia] improve format extraction and extract subtitles(closes #22590)

* [nrktv:seriebase] Fix extraction (closes #23625) (#23537)

* [discovery] fix anonymous token extraction(closes #23650)

* [scrippsnetworks] add support for www.discovery.com videos

* [scrippsnetworks] correct test case URL

* [dctp] fix format extraction(closes #23656)

* [pandatv] Remove extractor (#23630)

* [naver] improve extraction

- improve geo-restriction handling
- extract automatic captions
- extract uploader metadata
- extract VLive HLS formats

* [naver] improve metadata extraction

* [cloudflarestream] improve extraction

- add support for bytehighway.net domain
- add support for signed URLs
- extract thumbnail

* [cloudflarestream] improve embed URL extraction

* [lego] fix extraction and extract subtitle(closes #23687)

* [safari] Fix kaltura session extraction (closes #23679) (#23670)

* [orf:fm4] Fix extraction (#23599)

* [orf:radio] Clean description and improve extraction

* [twitter] add support for promo_video_website cards(closes #23711)

* [vodplatform] add support for embed.kwikmotion.com domain

* [ndr:base:embed] Improve thumbnails extraction (closes #23731)

* [canvas] Add support for new API endpoint and update tests (closes #17680, closes #18629)

* [travis] Add flake8 job (#23720)

* [yourporn] Fix extraction (closes #21645, closes #22255, closes #23459)

* [ChangeLog] Actualize
[ci skip]

* release 2020.01.15

* [soundcloud] Restore previews extraction (closes #23739)

* [orf:tvthek] Improve geo restricted videos detection (closes #23741)

* [zype] improve extraction

- extract subtitles(closes #21258)
- support URLs with alternative keys/tokens(#21258)
- extract more metadata

* [americastestkitchen] fix extraction

* [nbc] add support for nbc multi network URLs(closes #23049)

* [ard] improve extraction(closes #23761)

- simplify extraction
- extract age limit and series
- bypass geo-restriction

* [ivi:compilation] Fix entries extraction (closes #23770)

* [24video] Add support for 24video.vip (closes #23753)

* [businessinsider] Fix jwplatform id extraction (closes #22929) (#22954)

* [ard] add a missing condition

* [azmedien] fix extraction(closes #23783)

* [voicerepublic] fix extraction

* [stretchinternet] fix extraction(closes #4319)

* [youtube] Fix sigfunc name extraction (closes #23819)

* [ChangeLog] Actualize
[ci skip]

* release 2020.01.24

* [soundcloud] improve private playlist/set tracks extraction

https://github.com/ytdl-org/youtube-dl/issues/3707#issuecomment-577873539

* [svt] fix article extraction(closes #22897)(closes #22919)

* [svt] fix series extraction(closes #22297)

* [viewlift] improve extraction

- fix extraction(closes #23851)
- add support for authentication
- add support for more domains

* [vimeo] fix album extraction(closes #23864)

* [tva] Relax _VALID_URL (closes #23903)

* [tv5mondeplus] Fix extraction (closes #23907, closes #23911)

* [twitch:stream] Lowercase channel id for stream request (closes #23917)

* [sportdeutschland] Update to new sportdeutschland API

They switched to SSL, but under a different host AND path...
Remove the old test cases because these videos have become unavailable.

* [popcorntimes] Add extractor (closes #23949)

* [thisoldhouse] fix extraction(closes #23951)

* [toggle] Add support for mewatch.sg (closes #23895) (#23930)

* [compat] Introduce compat_realpath (refs #23991)

* [update] Fix updating via symlinks (closes #23991)
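
The symlink fix above comes down to resolving the executable's real path before overwriting it, so the update replaces the target file rather than clobbering the symlink. A hedged sketch (the actual updater does considerably more; `resolve_update_target` is an illustrative name):

```python
import os


def resolve_update_target(exe_path):
    # When youtube-dl is invoked via a symlink, os.path.realpath
    # follows the link so the new binary is written to the real file.
    return os.path.realpath(exe_path)
```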

* [nytimes] improve format sorting(closes #24010)

* [abc:iview] Support 720p (#22907) (#22921)

* [nova:embed] Fix extraction (closes #23672)

* [nova:embed] Improve (closes #23690)

* [nova] Improve extraction (refs #23690)

* [jpopsuki] Remove extractor (closes #23858)

* [YoutubeDL] Fix playlist entry indexing with --playlist-items (closes #10591, closes #10622)

* [test_YoutubeDL] Fix get_ids

* [test_YoutubeDL] Add tests for #10591 (closes #23873)

* [24video] Add support for porn.24video.net (closes #23779, closes #23784)

* [npr] Add support for streams (closes #24042)

* [ChangeLog] Actualize
[ci skip]

* release 2020.02.16

* [tv2dk:bornholm:play] Fix extraction (#24076)

* [imdb] Fix extraction (closes #23443)

* [wistia] Add support for multiple generic embeds (closes #8347, closes #11385)

* [teachable] Add support for multiple videos per lecture (closes #24101)

* [pornhd] Fix extraction (closes #24128)

* [options] Remove duplicate short option -v for --version (#24162)

* [extractor/common] Convert ISM manifest to unicode before processing on python 2 (#24152)

* [YoutubeDL] Force redirect URL to unicode on python 2

* Remove no longer needed compat_str around geturl

* [youjizz] Fix extraction (closes #24181)

* [test_subtitles] Remove obsolete test

* [zdf:channel] Fix tests

* [zapiks] Fix test

* [xtube] Fix metadata extraction (closes #21073, closes #22455)

* [xtube:user] Fix test

* [telecinco] Fix extraction (refs #24195)

* [telecinco] Add support for article opening videos

* [franceculture] Fix extraction (closes #24204)

* [xhamster] Fix extraction (closes #24205)

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.01

* [vimeo] Fix subtitles URLs (#24209)

* [servus] Add support for new URL schema (closes #23475, closes #23583, closes #24142)

* [youtube:playlist] Fix tests (closes #23872) (#23885)

* [peertube] Improve extraction

* [peertube] Fix issues and improve extraction (closes #23657)

* [pornhub] Improve title extraction (closes #24184)

* [vimeo] fix showcase password protected video extraction(closes #24224)

* [youtube] Fix age-gated videos support without login (closes #24248)

* [youtube] Fix tests

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.06

* [nhk] update API version(closes #24270)

* [youtube] Improve extraction in 429 error conditions (closes #24283)

* [youtube] Improve age-gated videos extraction in 429 error conditions (refs #24283)

* [youtube] Remove outdated code

Additional get_video_info requests don't seem to provide any extra itags any longer

* [README.md] Clarify 429 error

* [pornhub] Add support for pornhubpremium.com (#24288)

* [utils] Add support for cookies with spaces used instead of tabs

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.08

* Revert "[utils] Add support for cookies with spaces used instead of tabs"

According to [1] TABs must be used as separators between fields.
Files produced by some tools with spaces as separators are considered
malformed.

1. https://curl.haxx.se/docs/http-cookies.html

This reverts commit cff99c91d150df2a4e21962a3ca8d4ae94533b8c.
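
The curl documentation cited above specifies that fields in a Netscape-format cookie file are TAB-separated; a line whose fields are separated by spaces is malformed. A minimal validity check along those lines (illustrative only, not youtube-dl's actual implementation):

```python
def is_valid_cookie_line(line):
    """Accept comments and blank lines; otherwise require the seven
    TAB-separated Netscape cookie fields (domain, include-subdomains
    flag, path, secure flag, expiry, name, value)."""
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return True
    return len(line.split('\t')) == 7
```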

* [utils] Add reference to cookie file format

* Revert "[vimeo] fix showcase password protected video extraction(closes #24224)"

This reverts commit 12ee431676bb655f04c7dd416a73c1f142ed368d.

* [nhk] Relax _VALID_URL (#24329)

* [nhk] Remove obsolete rtmp formats (closes #24329)

* [nhk] Update m3u8 URL and use native hls (#24329)

* [ndr] Fix extraction (closes #24326)

* [xtube] Fix formats extraction (closes #24348)

* [xtube] Fix typo

* [hellporno] Fix extraction (closes #24399)

* [cbc:watch] Add support for authentication

* [cbc:watch] Fix authenticated device token caching (closes #19160)

* [soundcloud] fix download url extraction(closes #24394)

* [limelight] remove disabled API requests(closes #24255)

* [bilibili] Add support for new URL schema with BV ids (closes #24439, closes #24442)

* [bilibili] Add support for player.bilibili.com (closes #24402)

* [teachable] Extract chapter metadata (closes #24421)

* [generic] Look for teachable embeds before wistia

* [teachable] Update upskillcourses domain

New version does not use teachable platform any longer

* [teachable] Update gns3 domain

* [teachable] Update test

* [ChangeLog] Actualize
[ci skip]

* [ChangeLog] Actualize
[ci skip]

* release 2020.03.24

* [spankwire] Fix extraction (closes #18924, closes #20648)

* [spankwire] Add support for generic embeds (refs #24633)

* [youporn] Add support for generic embeds

* [mofosex] Add support for generic embeds (closes #24633)

* [tele5] Fix extraction (closes #24553)

* [extractor/common] Skip malformed ISM manifest XMLs while extracting ISM formats (#24667)

* [tv4] Fix ISM formats extraction (closes #24667)

* [twitch:clips] Extend _VALID_URL (closes #24290) (#24642)

* [motherless] Fix extraction (closes #24699)

* [nova:embed] Fix extraction (closes #24700)

* [youtube] Skip broken multifeed videos (closes #24711)

* [soundcloud] Extract AAC format

* [soundcloud] Improve AAC format extraction (closes #19173, closes #24708)

* [thisoldhouse] Fix video id extraction (closes #24548)

Added support for:
with or without "www."
and either ".chorus.build" or ".com"

It now validates correctly on older URLs
```
<iframe src="https://thisoldhouse.chorus.build/videos/zype/5e33baec27d2e50001d5f52f
```
and newer ones
```
<iframe src="https://www.thisoldhouse.com/videos/zype/5e2b70e95216cc0001615120
```
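
A single pattern covering both iframe variants above could look like the following sketch (the 24-hex-digit id and the exact pattern are assumptions; the extractor's real _VALID_URL differs):

```python
import re

# Optional "www." plus either the .com or the .chorus.build host
VIDEO_URL_RE = re.compile(
    r'https?://(?:www\.)?thisoldhouse\.(?:com|chorus\.build)'
    r'/videos/zype/([0-9a-f]{24})')


def extract_video_id(url):
    m = VIDEO_URL_RE.match(url)
    return m.group(1) if m else None
```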

* [thisoldhouse] Improve video id extraction (closes #24549)

* [youtube] Fix DRM videos detection (refs #24736)

* [options] Clarify doc on --exec command (closes #19087) (#24883)

* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)

* [prosiebensat1] Extract series metadata

* [tenplay] Relax _VALID_URL (closes #25001)

* [tvplay] fix Viafree extraction(closes #15189)(closes #24473)(closes #24789)

* [yahoo] fix GYAO Player extraction and relax title URL regex(closes #24178)(closes #24778)

* [youtube] Use redirected video id if any (closes #25063)

* [youtube] Improve player id extraction and add tests

* [extractor/common] Extract multiple JSON-LD entries

* [crunchyroll] Fix and improve extraction (closes #25096, closes #25060)

* [ChangeLog] Actualize
[ci skip]

* release 2020.05.03

* [puhutv] Remove no longer available HTTP formats (closes #25124)

* [utils] Improve cookie files support

+ Add support for UTF-8 in cookie files
* Skip malformed cookie file entries instead of crashing (invalid entry len, invalid expires at)

* [dailymotion] Fix typo

* [compat] Introduce compat_cookiejar_Cookie

* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie (closes #23256, closes #24776)

To always ensure cookie name and value are bytestrings on python 2.

* [orf] Add support for more radio stations (closes #24938) (#24968)

* [uol] fix extraction(closes #22007)

* [downloader/http] Finish downloading once received data length matches expected

Always do this if possible, i.e. if Content-Length or expected length is known, not only in test.
This saves an unnecessary final loop iteration that tries to read 0 bytes.

* [downloader/http] Request last data block of exact remaining size

Always request the last data block with the exact size remaining to download if possible, not the current block size.
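
Taken together, the two downloader changes amount to: clamp each read to the bytes still remaining, and let the loop end as soon as the received count matches the expected length. A simplified sketch with assumed names (the real downloader also handles resuming, retries and rate limiting):

```python
def download(read_chunk, expected_len, block_size=1024):
    """read_chunk(n) returns up to n bytes (b'' on EOF)."""
    received = b''
    while len(received) < expected_len:
        # Request the last block with the exact remaining size,
        # not the full block size.
        want = min(block_size, expected_len - len(received))
        chunk = read_chunk(want)
        if not chunk:
            break  # connection closed early
        received += chunk
    # The loop condition finishes the download once the received
    # length matches the expected length, skipping a 0-byte read.
    return received
```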

* [iprima] Improve extraction (closes #25138)

* [youtube] Improve signature cipher extraction (closes #25188)

* [ChangeLog] Actualize
[ci skip]

* release 2020.05.08

* [spike] fix Bellator mgid extraction(closes #25195)

* [bbccouk] PEP8

* [mailru] Fix extraction (closes #24530) (#25239)

* [README.md] flake8 HTTPS URL (#25230)

* [youtube] Add support for yewtu.be (#25226)

* [soundcloud] reduce API playlist page limit(closes #25274)

* [vimeo] improve format extraction and sorting(closes #25285)

* [redtube] Improve title extraction (#25208)

* [indavideo] Switch to HTTPS for API request (#25191)

* [utils] Fix file permissions in write_json_file (closes #12471) (#25122)

* [redtube] Improve formats extraction and extract m3u8 formats (closes #25311, closes #25321)

* [ard] Improve _VALID_URL (closes #25134) (#25198)

* [giantbomb] Extend _VALID_URL (#25222)

* [postprocessor/ffmpeg] Embed series metadata with --add-metadata

* [youtube] Add support for more invidious instances (#25417)

* [ard:beta] Extend _VALID_URL (closes #25405)

* [ChangeLog] Actualize
[ci skip]

* release 2020.05.29

* [jwplatform] Improve embeds extraction (closes #25467)

* [periscope] Fix untitled broadcasts (#25482)

* [twitter:broadcast] Add untitled periscope broadcast test

* [malltv] Add support for sk.mall.tv (#25445)

* [brightcove] Fix subtitles extraction (closes #25540)

* [brightcove] Sort imports

* [twitch] Pass v5 accept header and fix thumbnails extraction (closes #25531)

* [twitch:stream] Fix extraction (closes #25528)

* [twitch:stream] Expect 400 and 410 HTTP errors from API

* [tele5] Prefer jwplatform over nexx (closes #25533)

* [jwplatform] Add support for bypass geo restriction

* [tele5] Bypass geo restriction

* [ChangeLog] Actualize
[ci skip]

* release 2020.06.06

* [kaltura] Add support for multiple embeds on a webpage (closes #25523)

* [youtube] Extract chapters from JSON (closes #24819)

* [facebook] Support single-video ID links

I stumbled upon this at https://www.facebook.com/bwfbadminton/posts/10157127020046316 . No idea how prevalent it is yet.

* [youtube] Fix playlist and feed extraction (closes #25675)

* [youtube] Fix thumbnails extraction and remove uploader id extraction warning (closes #25676)

* [youtube] Fix upload date extraction

* [youtube] Improve view count extraction

* [youtube] Fix uploader id and uploader URL extraction

* [ChangeLog] Actualize
[ci skip]

* release 2020.06.16

* [youtube] Fix categories and improve tags extraction

* [youtube] Force old layout (closes #25682, closes #25683, closes #25680, closes #25686)

* [ChangeLog] Actualize
[ci skip]

* release 2020.06.16.1

* [brightcove] Improve embed detection (closes #25674)

* [bellmedia] add support for cp24.com clip URLs(closes #25764)

* [youtube:playlists] Extend _VALID_URL (closes #25810)

* [youtube] Prevent excess HTTP 301 (#25786)

* [wistia] Restrict embed regex (closes #25969)

* [youtube] Improve description extraction (closes #25937) (#25980)

* [youtube] Fix sigfunc name extraction (closes #26134, closes #26135, closes #26136, closes #26137)

* [ChangeLog] Actualize
[ci skip]

* release 2020.07.28

* [xhamster] Extend _VALID_URL (closes #25789) (#25804)

* [xhamster] Fix extraction (closes #26157) (#26254)

* [xhamster] Extend _VALID_URL (closes #25927)

Co-authored-by: Remita Amine <redacted>
Co-authored-by: Sergey M․ <redacted>
Co-authored-by: nmeum <redacted>
Co-authored-by: Roxedus <redacted>
Co-authored-by: Singwai Chan <redacted>
Co-authored-by: cdarlint <redacted>
Co-authored-by: Johannes N <redacted>
Co-authored-by: jnozsc <redacted>
Co-authored-by: Moritz Patelscheck <redacted>
Co-authored-by: PB <redacted>
Co-authored-by: Philipp Hagemeister <redacted>
Co-authored-by: Xaver Hellauer <redacted>
Co-authored-by: d2au <redacted>
Co-authored-by: Jan 'Yenda' Trmal <redacted>
Co-authored-by: jxu <redacted>
Co-authored-by: Martin Ström <redacted>
Co-authored-by: The Hatsune Daishi <redacted>
Co-authored-by: tsia <redacted>
Co-authored-by: 3risian <redacted>
Co-authored-by: Tristan Waddington <redacted>
Co-authored-by: Devon Meunier <redacted>
Co-authored-by: Felix Stupp <redacted>
Co-authored-by: tom <redacted>
Co-authored-by: AndrewMBL <redacted>
Co-authored-by: willbeaufoy <redacted>
Co-authored-by: Philipp Stehle <redacted>
Co-authored-by: hh0rva1h <redacted>
Co-authored-by: comsomisha <redacted>
Co-authored-by: TotalCaesar659 <redacted>
Co-authored-by: Juan Francisco Cantero Hurtado <redacted>
Co-authored-by: Dave Loyall <redacted>
Co-authored-by: tlsssl <redacted>
Co-authored-by: Rob <redacted>
Co-authored-by: Michael Klein <redacted>
Co-authored-by: JordanWeatherby <redacted>
Co-authored-by: striker.sh <redacted>
Co-authored-by: Matej Dujava <redacted>
Co-authored-by: Glenn Slayden <redacted>
Co-authored-by: MRWITEK <redacted>
Co-authored-by: JChris246 <redacted>
Co-authored-by: TheRealDude2 <redacted>
134 files changed:
.github/ISSUE_TEMPLATE/1_broken_site.md
.github/ISSUE_TEMPLATE/2_site_support_request.md
.github/ISSUE_TEMPLATE/3_site_feature_request.md
.github/ISSUE_TEMPLATE/4_bug_report.md
.github/ISSUE_TEMPLATE/5_feature_request.md
.travis.yml
CONTRIBUTING.md
ChangeLog
README.md
devscripts/create-github-release.py
docs/supportedsites.md
test/test_YoutubeDL.py
test/test_YoutubeDLCookieJar.py
test/test_subtitles.py
test/test_youtube_chapters.py
test/test_youtube_signature.py
test/testdata/cookies/malformed_cookies.txt [new file with mode: 0644]
youtube_dl/YoutubeDL.py
youtube_dl/compat.py
youtube_dl/downloader/http.py
youtube_dl/extractor/abc.py
youtube_dl/extractor/americastestkitchen.py
youtube_dl/extractor/ard.py
youtube_dl/extractor/azmedien.py
youtube_dl/extractor/bbc.py
youtube_dl/extractor/bellmedia.py
youtube_dl/extractor/bilibili.py
youtube_dl/extractor/brightcove.py
youtube_dl/extractor/businessinsider.py
youtube_dl/extractor/canvas.py
youtube_dl/extractor/cbc.py
youtube_dl/extractor/cloudflarestream.py
youtube_dl/extractor/common.py
youtube_dl/extractor/crunchyroll.py
youtube_dl/extractor/dailymotion.py
youtube_dl/extractor/dctp.py
youtube_dl/extractor/discovery.py
youtube_dl/extractor/eporner.py
youtube_dl/extractor/extractors.py
youtube_dl/extractor/facebook.py
youtube_dl/extractor/franceculture.py
youtube_dl/extractor/generic.py
youtube_dl/extractor/giantbomb.py
youtube_dl/extractor/hellporno.py
youtube_dl/extractor/imdb.py
youtube_dl/extractor/indavideo.py
youtube_dl/extractor/iprima.py
youtube_dl/extractor/ivi.py
youtube_dl/extractor/jpopsukitv.py [deleted file]
youtube_dl/extractor/jwplatform.py
youtube_dl/extractor/kaltura.py
youtube_dl/extractor/lecturio.py
youtube_dl/extractor/lego.py
youtube_dl/extractor/limelight.py
youtube_dl/extractor/linuxacademy.py
youtube_dl/extractor/mailru.py
youtube_dl/extractor/malltv.py
youtube_dl/extractor/mediaset.py
youtube_dl/extractor/mediasite.py
youtube_dl/extractor/mitele.py
youtube_dl/extractor/mofosex.py
youtube_dl/extractor/motherless.py
youtube_dl/extractor/naver.py
youtube_dl/extractor/nbc.py
youtube_dl/extractor/ndr.py
youtube_dl/extractor/nhk.py
youtube_dl/extractor/nova.py
youtube_dl/extractor/npr.py
youtube_dl/extractor/nrk.py
youtube_dl/extractor/nytimes.py
youtube_dl/extractor/orf.py
youtube_dl/extractor/pandatv.py [deleted file]
youtube_dl/extractor/peertube.py
youtube_dl/extractor/periscope.py
youtube_dl/extractor/platzi.py
youtube_dl/extractor/pokemon.py
youtube_dl/extractor/popcorntimes.py [new file with mode: 0644]
youtube_dl/extractor/pornhd.py
youtube_dl/extractor/pornhub.py
youtube_dl/extractor/prosiebensat1.py
youtube_dl/extractor/puhutv.py
youtube_dl/extractor/redtube.py
youtube_dl/extractor/safari.py
youtube_dl/extractor/scrippsnetworks.py
youtube_dl/extractor/servus.py
youtube_dl/extractor/soundcloud.py
youtube_dl/extractor/spankbang.py
youtube_dl/extractor/spankwire.py
youtube_dl/extractor/spike.py
youtube_dl/extractor/sportdeutschland.py
youtube_dl/extractor/srmediathek.py
youtube_dl/extractor/stretchinternet.py
youtube_dl/extractor/svt.py
youtube_dl/extractor/teachable.py
youtube_dl/extractor/tele5.py
youtube_dl/extractor/telecinco.py
youtube_dl/extractor/telequebec.py
youtube_dl/extractor/tenplay.py
youtube_dl/extractor/tfo.py
youtube_dl/extractor/thisoldhouse.py
youtube_dl/extractor/toggle.py
youtube_dl/extractor/trunews.py
youtube_dl/extractor/tumblr.py
youtube_dl/extractor/tv2dk.py
youtube_dl/extractor/tv4.py
youtube_dl/extractor/tv5mondeplus.py
youtube_dl/extractor/tva.py
youtube_dl/extractor/tvplay.py
youtube_dl/extractor/twentyfourvideo.py
youtube_dl/extractor/twitch.py
youtube_dl/extractor/twitter.py
youtube_dl/extractor/uol.py
youtube_dl/extractor/vice.py
youtube_dl/extractor/viewlift.py
youtube_dl/extractor/vimeo.py
youtube_dl/extractor/vlive.py
youtube_dl/extractor/vodplatform.py
youtube_dl/extractor/voicerepublic.py
youtube_dl/extractor/wistia.py
youtube_dl/extractor/xhamster.py
youtube_dl/extractor/xtube.py
youtube_dl/extractor/yahoo.py
youtube_dl/extractor/youjizz.py
youtube_dl/extractor/youporn.py
youtube_dl/extractor/yourporn.py
youtube_dl/extractor/youtube.py
youtube_dl/extractor/zapiks.py
youtube_dl/extractor/zdf.py
youtube_dl/extractor/zype.py
youtube_dl/options.py
youtube_dl/postprocessor/ffmpeg.py
youtube_dl/update.py
youtube_dl/utils.py
youtube_dl/version.py

index 3a94bd621ff4c68eed4af3b1d7eb3764d6e85c35..f2260db465ea4798b90e08a2a22759dddcc8ee9c 100644 (file)
@@ -18,7 +18,7 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -26,7 +26,7 @@ ## Checklist
 -->
 
 - [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar issues including closed ones
@@ -41,7 +41,7 @@ ## Verbose log
  [debug] User config: []
  [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
  [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2019.11.28
+ [debug] youtube-dl version 2020.07.28
  [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
  [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
  [debug] Proxy map: {}
index 72bee12aa2cde7b5c434dfaab92175259beebec8..8bc05c4ba736a8439c70c7f643aac79ba9366419 100644 (file)
@@ -19,7 +19,7 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
 - Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ ## Checklist
 -->
 
 - [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that none of provided URLs violate any copyrights
 - [ ] I've searched the bugtracker for similar site support requests including closed ones
index ddf67e95183c09de5b25c8ee826d760118a2c2a0..98348e0cd69deacf4a35d012f468d4b705bddee4 100644 (file)
@@ -18,13 +18,13 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 
 - [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've searched the bugtracker for similar site feature requests including closed ones
 
 
index 7122e2714dd92fe4c9655c9022c0c76c123ce4d4..86706f5289dad4f48409616ee3e6521c5b5a0e29 100644 (file)
@@ -18,7 +18,7 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
 - Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
 - Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@@ -27,7 +27,7 @@ ## Checklist
 -->
 
 - [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've checked that all provided URLs are alive and playable in a browser
 - [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
 - [ ] I've searched the bugtracker for similar bug reports including closed ones
@@ -43,7 +43,7 @@ ## Verbose log
  [debug] User config: []
  [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
  [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2019.11.28
+ [debug] youtube-dl version 2020.07.28
  [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
  [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
  [debug] Proxy map: {}
index a93882b39dace5577bb9bb58effe1086319e625c..52c2709f94346e08a8f304cb6e64e7b05ae13b1d 100644 (file)
@@ -19,13 +19,13 @@ ## Checklist
 
 <!--
 Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
 
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.07.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
 - Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
 - Finally, put x into all relevant boxes (like this [x])
 -->
 
 - [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.07.28**
 - [ ] I've searched the bugtracker for similar feature requests including closed ones
 
 
index 14d95fa84c105e3c310b2e0ecf984a7688bfb172..51afd469afe569df116d0dd5c200c426f36546b6 100644 (file)
@@ -13,7 +13,7 @@ dist: trusty
 env:
   - YTDL_TEST_SET=core
   - YTDL_TEST_SET=download
-matrix:
+jobs:
   include:
     - python: 3.7
       dist: xenial
@@ -35,6 +35,11 @@ matrix:
       env: YTDL_TEST_SET=download
     - env: JYTHON=true; YTDL_TEST_SET=core
     - env: JYTHON=true; YTDL_TEST_SET=download
+    - name: flake8
+      python: 3.8
+      dist: xenial
+      install: pip install flake8
+      script: flake8 .
   fast_finish: true
   allow_failures:
     - env: YTDL_TEST_SET=download
index ac759ddc4ee356adc2eb5081d8bdd4325a9d14ff..58ab3a4b8947d5dbadf5f8be1e4bb0868868afec 100644 (file)
@@ -153,7 +153,7 @@ ### Adding support for a new site
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
 
         $ flake8 youtube_dl/extractor/yourextractor.py
 
 
index d2f17ee067c9215cdea2f6d1df5c30cee8a4b5ed..bf515f784b2cfefdcd29820c5a5e22e8057cfa5e 100644 (file)
--- a/ChangeLog
+++ b/ChangeLog
-version <unreleased>
+version 2020.07.28
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
+* [youtube] Improve description extraction (#25937, #25980)
+* [wistia] Restrict embed regular expression (#25969)
+* [youtube] Prevent excess HTTP 301 (#25786)
++ [youtube:playlists] Extend URL regular expression (#25810)
++ [bellmedia] Add support for cp24.com clip URLs (#25764)
+* [brightcove] Improve embed detection (#25674)
+
+
+version 2020.06.16.1
+
+Extractors
+* [youtube] Force old layout (#25682, #25683, #25680, #25686)
+* [youtube] Fix categories and improve tags extraction
+
+
+version 2020.06.16
+
+Extractors
+* [youtube] Fix uploader id and uploader URL extraction
+* [youtube] Improve view count extraction
+* [youtube] Fix upload date extraction (#25677)
+* [youtube] Fix thumbnails extraction (#25676)
+* [youtube] Fix playlist and feed extraction (#25675)
++ [facebook] Add support for single-video ID links
++ [youtube] Extract chapters from JSON (#24819)
++ [kaltura] Add support for multiple embeds on a webpage (#25523)
+
+
+version 2020.06.06
+
+Extractors
+* [tele5] Bypass geo restriction
++ [jwplatform] Add support for bypass geo restriction
+* [tele5] Prefer jwplatform over nexx (#25533)
+* [twitch:stream] Expect 400 and 410 HTTP errors from API
+* [twitch:stream] Fix extraction (#25528)
+* [twitch] Fix thumbnails extraction (#25531)
++ [twitch] Pass v5 Accept HTTP header (#25531)
+* [brightcove] Fix subtitles extraction (#25540)
++ [malltv] Add support for sk.mall.tv (#25445)
+* [periscope] Fix untitled broadcasts (#25482)
+* [jwplatform] Improve embeds extraction (#25467)
+
+
+version 2020.05.29
+
+Core
+* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
+* [utils] Fix file permissions in write_json_file (#12471, #25122)
+
+Extractors
+* [ard:beta] Extend URL regular expression (#25405)
++ [youtube] Add support for more invidious instances (#25417)
+* [giantbomb] Extend URL regular expression (#25222)
+* [ard] Improve URL regular expression (#25134, #25198)
+* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
+  #25321)
+* [indavideo] Switch to HTTPS for API request (#25191)
+* [redtube] Improve title extraction (#25208)
+* [vimeo] Improve format extraction and sorting (#25285)
+* [soundcloud] Reduce API playlist page limit (#25274)
++ [youtube] Add support for yewtu.be (#25226)
+* [mailru] Fix extraction (#24530, #25239)
+* [bellator] Fix mgid extraction (#25195)
+
+
+version 2020.05.08
+
+Core
+* [downloader/http] Request last data block of exact remaining size
+* [downloader/http] Finish downloading once received data length matches
+  expected
+* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
+  ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
++ [compat] Introduce compat_cookiejar_Cookie
+* [utils] Improve cookie files support
+    + Add support for UTF-8 in cookie files
+    * Skip malformed cookie file entries instead of crashing (invalid entry
+      length, invalid expires at)
+
+Extractors
+* [youtube] Improve signature cipher extraction (#25187, #25188)
+* [iprima] Improve extraction (#25138)
+* [uol] Fix extraction (#22007)
++ [orf] Add support for more radio stations (#24938, #24968)
+* [dailymotion] Fix typo
+- [puhutv] Remove no longer available HTTP formats (#25124)
+
+
+version 2020.05.03
+
+Core
++ [extractor/common] Extract multiple JSON-LD entries
+* [options] Clarify doc on --exec command (#19087, #24883)
+* [extractor/common] Skip malformed ISM manifest XMLs while extracting
+  ISM formats (#24667)
+
+Extractors
+* [crunchyroll] Fix and improve extraction (#25096, #25060)
+* [youtube] Improve player id extraction
+* [youtube] Use redirected video id if any (#25063)
+* [yahoo] Fix GYAO Player extraction and relax URL regular expression
+  (#24178, #24778)
+* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
+* [tenplay] Relax URL regular expression (#25001)
++ [prosiebensat1] Extract series metadata
+* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
+- [prosiebensat1] Remove 7tv.de support (#24948)
+* [youtube] Fix DRM videos detection (#24736)
+* [thisoldhouse] Fix video id extraction (#24548, #24549)
++ [soundcloud] Extract AAC format (#19173, #24708)
+* [youtube] Skip broken multifeed videos (#24711)
+* [nova:embed] Fix extraction (#24700)
+* [motherless] Fix extraction (#24699)
+* [twitch:clips] Extend URL regular expression (#24290, #24642)
+* [tv4] Fix ISM formats extraction (#24667)
+* [tele5] Fix extraction (#24553)
++ [mofosex] Add support for generic embeds (#24633)
++ [youporn] Add support for generic embeds
++ [spankwire] Add support for generic embeds (#24633)
+* [spankwire] Fix extraction (#18924, #20648)
+
+
+version 2020.03.24
+
+Core
+- [utils] Revert support for cookie files with spaces used instead of tabs
+
+Extractors
+* [teachable] Update upskillcourses and gns3 domains
+* [generic] Look for teachable embeds before wistia
++ [teachable] Extract chapter metadata (#24421)
++ [bilibili] Add support for player.bilibili.com (#24402)
++ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
+* [limelight] Remove disabled API requests (#24255)
+* [soundcloud] Fix download URL extraction (#24394)
++ [cbc:watch] Add support for authentication (#19160)
+* [hellporno] Fix extraction (#24399)
+* [xtube] Fix formats extraction (#24348)
+* [ndr] Fix extraction (#24326)
+* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
+- [nhk] Remove obsolete rtmp formats (#24329)
+* [nhk] Relax URL regular expression (#24329)
+- [vimeo] Revert fix showcase password protected video extraction (#24224)
+
+
+version 2020.03.08
+
+Core
++ [utils] Add support for cookie files with spaces used instead of tabs
+
+Extractors
++ [pornhub] Add support for pornhubpremium.com (#24288)
+- [youtube] Remove outdated code and unnecessary requests
+* [youtube] Improve extraction in 429 HTTP error conditions (#24283)
+* [nhk] Update API version (#24270)
+
+
+version 2020.03.06
+
+Extractors
+* [youtube] Fix age-gated videos support without login (#24248)
+* [vimeo] Fix showcase password protected video extraction (#24224)
+* [pornhub] Improve title extraction (#24184)
+* [peertube] Improve extraction (#23657)
++ [servus] Add support for new URL schema (#23475, #23583, #24142)
+* [vimeo] Fix subtitles URLs (#24209)
+
+
+version 2020.03.01
+
+Core
+* [YoutubeDL] Force redirect URL to unicode on python 2
+- [options] Remove duplicate short option -v for --version (#24162)
+
+Extractors
+* [xhamster] Fix extraction (#24205)
+* [franceculture] Fix extraction (#24204)
++ [telecinco] Add support for article opening videos
+* [telecinco] Fix extraction (#24195)
+* [xtube] Fix metadata extraction (#21073, #22455)
+* [youjizz] Fix extraction (#24181)
+- Remove no longer needed compat_str around geturl
+* [pornhd] Fix extraction (#24128)
++ [teachable] Add support for multiple videos per lecture (#24101)
++ [wistia] Add support for multiple generic embeds (#8347, #11385)
+* [imdb] Fix extraction (#23443)
+* [tv2dk:bornholm:play] Fix extraction (#24076)
+
+
+version 2020.02.16
+
+Core
+* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
+  #10622)
+* [update] Fix updating via symlinks (#23991)
++ [compat] Introduce compat_realpath (#23991)
+
+Extractors
++ [npr] Add support for streams (#24042)
++ [24video] Add support for porn.24video.net (#23779, #23784)
+- [jpopsuki] Remove extractor (#23858)
+* [nova] Improve extraction (#23690)
+* [nova:embed] Improve (#23690)
+* [nova:embed] Fix extraction (#23672)
++ [abc:iview] Add support for 720p (#22907, #22921)
+* [nytimes] Improve format sorting (#24010)
++ [toggle] Add support for mewatch.sg (#23895, #23930)
+* [thisoldhouse] Fix extraction (#23951)
++ [popcorntimes] Add support for popcorntimes.tv (#23949)
+* [sportdeutschland] Update to new API
+* [twitch:stream] Lowercase channel id for stream request (#23917)
+* [tv5mondeplus] Fix extraction (#23907, #23911)
+* [tva] Relax URL regular expression (#23903)
+* [vimeo] Fix album extraction (#23864)
+* [viewlift] Improve extraction
+    * Fix extraction (#23851)
+    + Add support for authentication
+    + Add support for more domains
+* [svt] Fix series extraction (#22297)
+* [svt] Fix article extraction (#22897, #22919)
+* [soundcloud] Improve private playlist/set tracks extraction (#3707)
+
+
+version 2020.01.24
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#23819)
+* [stretchinternet] Fix extraction (#4319)
+* [voicerepublic] Fix extraction
+* [azmedien] Fix extraction (#23783)
+* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
++ [24video] Add support for 24video.vip (#23753)
+* [ivi:compilation] Fix entries extraction (#23770)
+* [ard] Improve extraction (#23761)
+    * Simplify extraction
+    + Extract age limit and series
+    * Bypass geo-restriction
++ [nbc] Add support for nbc multi network URLs (#23049)
+* [americastestkitchen] Fix extraction
+* [zype] Improve extraction
+    + Extract subtitles (#21258)
+    + Support URLs with alternative keys/tokens (#21258)
+    + Extract more metadata
+* [orf:tvthek] Improve geo restricted videos detection (#23741)
+* [soundcloud] Restore previews extraction (#23739)
+
+
+version 2020.01.15
+
+Extractors
+* [yourporn] Fix extraction (#21645, #22255, #23459)
++ [canvas] Add support for new API endpoint (#17680, #18629)
+* [ndr:base:embed] Improve thumbnails extraction (#23731)
++ [vodplatform] Add support for embed.kwikmotion.com domain
++ [twitter] Add support for promo_video_website cards (#23711)
+* [orf:radio] Clean description and improve extraction
+* [orf:fm4] Fix extraction (#23599)
+* [safari] Fix kaltura session extraction (#23679, #23670)
+* [lego] Fix extraction and extract subtitle (#23687)
+* [cloudflarestream] Improve extraction
+    + Add support for bytehighway.net domain
+    + Add support for signed URLs
+    + Extract thumbnail
+* [naver] Improve extraction
+    * Improve geo-restriction handling
+    + Extract automatic captions
+    + Extract uploader metadata
+    + Extract VLive HLS formats
+    * Improve metadata extraction
+- [pandatv] Remove extractor (#23630)
+* [dctp] Fix format extraction (#23656)
++ [scrippsnetworks] Add support for www.discovery.com videos
+* [discovery] Fix anonymous token extraction (#23650)
+* [nrktv:seriebase] Fix extraction (#23625, #23537)
+* [wistia] Improve format extraction and extract subtitles (#22590)
+* [vice] Improve extraction (#23631)
+* [redtube] Detect private videos (#23518)
+
+
+version 2020.01.01
+
+Extractors
+* [brightcove] Invalidate policy key cache on failing requests
+* [pornhub] Improve locked videos detection (#22449, #22780)
++ [pornhub] Add support for m3u8 formats
+* [pornhub] Fix extraction (#22749, #23082)
+* [brightcove] Update policy key on failing requests
+* [spankbang] Improve removed video detection (#23423)
+* [spankbang] Fix extraction (#23307, #23423, #23444)
+* [soundcloud] Automatically update client id on failing requests
+* [prosiebensat1] Improve geo restriction handling (#23571)
+* [brightcove] Cache brightcove player policy keys
+* [teachable] Fail with error message if no video URL found
+* [teachable] Improve locked lessons detection (#23528)
++ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
+* [mitele] Fix extraction (#21354, #23456)
+* [soundcloud] Update client id (#23516)
+* [mailru] Relax URL regular expressions (#23509)
+
+
+version 2019.12.25
 
 Core
 * [utils] Improve str_to_int
 + [downloader/hls] Add ability to override AES decryption key URL (#17521)
 
 Extractors
+* [mediaset] Fix parse formats (#23508)
 + [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
 + [slideslive] Add support for url and vimeo service names (#23414)
 * [slideslive] Fix extraction (#23413)
index 01f975958c8370016a39c9f3fb872241c977c62b..45326c69ec5bf3fa6665cca18e808a06546ce8ea 100644 (file)
--- a/README.md
+++ b/README.md
@@ -434,9 +434,9 @@ ## Post-processing Options:
                                      either the path to the binary or its
                                      containing directory.
     --exec CMD                       Execute a command on the file after
-                                     downloading, similar to find's -exec
-                                     syntax. Example: --exec 'adb push {}
-                                     /sdcard/Music/ && rm {}'
+                                     downloading and post-processing, similar to
+                                     find's -exec syntax. Example: --exec 'adb
+                                     push {} /sdcard/Music/ && rm {}'
     --convert-subs FORMAT            Convert the subtitles to other format
                                      (currently supported: srt|ass|vtt|lrc)
 
@@ -835,7 +835,9 @@ ### ExtractorError: Could not find JS function u'OF'
 
 ### HTTP Error 429: Too Many Requests or 402: Payment Required
 
 
-These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
+These two error codes indicate that the service is blocking your IP address because of overuse. Usually this is a soft block meaning that you can gain access again after solving CAPTCHA. Just open a browser and solve a CAPTCHA the service suggests you and after that [pass cookies](#how-do-i-pass-cookies-to-youtube-dl) to youtube-dl. Note that if your machine has multiple external IPs then you should also pass exactly the same IP you've used for solving CAPTCHA with [`--source-address`](#network-options). Also you may need to pass a `User-Agent` HTTP header of your browser with [`--user-agent`](#workarounds).
+
+If this is not the case (no CAPTCHA suggested to solve by the service) then you can contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
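The cookie and source-address workaround described above can also be kept in a youtube-dl configuration file so it applies to every run — a sketch with placeholder values (the paths, IP address and User-Agent string are assumptions, not values from this document):

```
# ~/.config/youtube-dl/config (sketch; all values are placeholders)
--cookies /path/to/cookies.txt
--source-address 203.0.113.5
--user-agent "Mozilla/5.0 (X11; Linux x86_64)"
```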
 
 ### SyntaxError: Non-ASCII character
 
 
@@ -1030,7 +1032,7 @@ ### Adding support for a new site
 5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
 6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
 7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
 
         $ flake8 youtube_dl/extractor/yourextractor.py
 
 
index 428111b3f0e893d9ae53da648844833e87dd72b3..2ddfa109698864f642b97f8bcb9846f84631e4d9 100644 (file)
@@ -1,7 +1,6 @@
 #!/usr/bin/env python
 from __future__ import unicode_literals
 
-import base64
 import io
 import json
 import mimetypes
@@ -15,7 +14,6 @@
 
 from youtube_dl.compat import (
     compat_basestring,
 
-    compat_input,
     compat_getpass,
     compat_print,
     compat_urllib_request,
@@ -40,28 +38,20 @@ def _init_github_account(self):
         try:
             info = netrc.netrc().authenticators(self._NETRC_MACHINE)
             if info is not None:
-                self._username = info[0]
-                self._password = info[2]
+                self._token = info[2]
                 compat_print('Using GitHub credentials found in .netrc...')
                 return
             else:
                 compat_print('No GitHub credentials found in .netrc')
         except (IOError, netrc.NetrcParseError):
             compat_print('Unable to parse .netrc')
-        self._username = compat_input(
-            'Type your GitHub username or email address and press [Return]: ')
-        self._password = compat_getpass(
-            'Type your GitHub password and press [Return]: ')
+        self._token = compat_getpass(
+            'Type your GitHub PAT (personal access token) and press [Return]: ')
 
     def _call(self, req):
         if isinstance(req, compat_basestring):
             req = sanitized_Request(req)
 
-        # Authorizing manually since GitHub does not response with 401 with
-        # WWW-Authenticate header set (see
-        # https://developer.github.com/v3/#basic-authentication)
-        b64 = base64.b64encode(
-            ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
-        req.add_header('Authorization', 'Basic %s' % b64)
+        req.add_header('Authorization', 'token %s' % self._token)
         response = self._opener.open(req).read().decode('utf-8')
         return json.loads(response)
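The switch from Basic authentication to a PAT boils down to the request pattern below — a sketch using Python 3's urllib for brevity (the actual script stays Python 2 compatible via the compat layer); the URL and token are placeholders:

```python
import json
import urllib.request


def github_request(url, token):
    # A personal access token goes in the Authorization header,
    # replacing the deprecated Basic (username:password) scheme.
    req = urllib.request.Request(url)
    req.add_header('Authorization', 'token %s' % token)
    return req


def github_call(req):
    # Send the request and decode the JSON response (needs network access).
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))
```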
 
index 2744dfca846d59520e8944231d569e0d1744ad2c..35c1050e5499238917243f09b5c142328ed37969 100644 (file)
@@ -28,10 +28,11 @@ # Supported sites
  - **acast:channel**
  - **ADN**: Anime Digital Network
  - **AdobeConnect**
- - **AdobeTV**
- - **AdobeTVChannel**
- - **AdobeTVShow**
- - **AdobeTVVideo**
+ - **adobetv**
+ - **adobetv:channel**
+ - **adobetv:embed**
+ - **adobetv:show**
+ - **adobetv:video**
  - **AdultSwim**
  - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
  - **afreecatv**: afreecatv.com
@@ -97,6 +98,7 @@ # Supported sites
  - **BiliBili**
  - **BilibiliAudio**
  - **BilibiliAudioAlbum**
+ - **BiliBiliPlayer**
  - **BioBioChileTV**
  - **BIQLE**
  - **BitChute**
@@ -388,7 +390,6 @@ # Supported sites
  - **JeuxVideo**
  - **Joj**
  - **Jove**
- - **jpopsuki.tv**
  - **JWPlatform**
  - **Kakao**
  - **Kaltura**
@@ -396,6 +397,7 @@ # Supported sites
  - **Kankan**
  - **Karaoketv**
  - **KarriereVideos**
+ - **Katsomo**
  - **KeezMovies**
  - **Ketnet**
  - **KhanAcademy**
@@ -403,7 +405,6 @@ # Supported sites
  - **KinjaEmbed**
  - **KinoPoisk**
  - **KonserthusetPlay**
- - **kontrtube**: KontrTube.ru - Труба зовёт
  - **KrasView**: Красвью
  - **Ku6**
  - **KUSI**
@@ -496,6 +497,7 @@ # Supported sites
  - **MNetTV**
  - **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
  - **Mofosex**
+ - **MofosexEmbed**
  - **Mojvideo**
  - **Morningstar**: morningstar.com
  - **Motherless**
@@ -513,7 +515,6 @@ # Supported sites
  - **mtvjapan**
  - **mtvservices:embedded**
  - **MuenchenTV**: münchen.tv
- - **MusicPlayOn**
  - **mva**: Microsoft Virtual Academy videos
  - **mva:course**: Microsoft Virtual Academy courses
  - **Mwave**
@@ -619,16 +620,25 @@ # Supported sites
  - **Ooyala**
  - **OoyalaExternal**
  - **OraTV**
+ - **orf:burgenland**: Radio Burgenland
  - **orf:fm4**: radio FM4
  - **orf:fm4:story**: fm4.orf.at stories
  - **orf:iptv**: iptv.ORF.at
+ - **orf:kaernten**: Radio Kärnten
+ - **orf:noe**: Radio Niederösterreich
+ - **orf:oberoesterreich**: Radio Oberösterreich
  - **orf:oe1**: Radio Österreich 1
+ - **orf:oe3**: Radio Österreich 3
+ - **orf:salzburg**: Radio Salzburg
+ - **orf:steiermark**: Radio Steiermark
+ - **orf:tirol**: Radio Tirol
  - **orf:tvthek**: ORF TVthek
+ - **orf:vorarlberg**: Radio Vorarlberg
+ - **orf:wien**: Radio Wien
  - **OsnatelTV**
  - **OutsideTV**
  - **PacktPub**
  - **PacktPubCourse**
- - **PandaTV**: 熊猫TV
  - **pandora.tv**: 판도라TV
  - **ParamountNetwork**
  - **parliamentlive.tv**: UK parliament videos
@@ -664,6 +674,7 @@ # Supported sites
  - **Pokemon**
  - **PolskieRadio**
  - **PolskieRadioCategory**
+ - **Popcorntimes**
  - **PopcornTV**
  - **PornCom**
  - **PornerBros**
@@ -761,6 +772,7 @@ # Supported sites
  - **screen.yahoo:search**: Yahoo screen search
  - **Screencast**
  - **ScreencastOMatic**
+ - **ScrippsNetworks**
  - **scrippsnetworks:watch**
  - **SCTE**
  - **SCTECourse**
@@ -913,6 +925,7 @@ # Supported sites
  - **tv2.hu**
  - **TV2Article**
  - **TV2DK**
+ - **TV2DKBornholmPlay**
  - **TV4**: tv4.se and tv4play.se
  - **TV5MondePlus**: TV5MONDE+
  - **TVA**
@@ -954,6 +967,7 @@ # Supported sites
  - **udemy**
  - **udemy:course**
  - **UDNEmbed**: 聯合影音
+ - **UFCArabia**
  - **UFCTV**
  - **UKTVPlay**
  - **umg:de**: Universal Music Deutschland
@@ -993,7 +1007,6 @@ # Supported sites
  - **videomore**
  - **videomore:season**
  - **videomore:video**
- - **VideoPremium**
  - **VideoPress**
  - **Vidio**
  - **VidLii**
@@ -1003,8 +1016,8 @@ # Supported sites
  - **Vidzi**
  - **vier**: vier.be and vijf.be
  - **vier:videos**
- - **ViewLift**
- - **ViewLiftEmbed**
+ - **viewlift**
+ - **viewlift:embed**
  - **Viidea**
  - **viki**
  - **viki:channel**
index ce96661716c42ae0bf9c6a8ccb9ddf48c715e0a2..1e204e551b499edead22ce65e371787ca22ffc35 100644 (file)
@@ -816,11 +816,15 @@ def test_playlist_items_selection(self):
             'webpage_url': 'http://example.com',
         }
 
-        def get_ids(params):
+        def get_downloaded_info_dicts(params):
             ydl = YDL(params)
-            # make a copy because the dictionary can be modified
-            ydl.process_ie_result(playlist.copy())
-            return [int(v['id']) for v in ydl.downloaded_info_dicts]
+            # make a deep copy because the dictionary and nested entries
+            # can be modified
+            ydl.process_ie_result(copy.deepcopy(playlist))
+            return ydl.downloaded_info_dicts
+
+        def get_ids(params):
+            return [int(v['id']) for v in get_downloaded_info_dicts(params)]
 
         result = get_ids({})
         self.assertEqual(result, [1, 2, 3, 4])
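The switch from `playlist.copy()` to `copy.deepcopy(playlist)` above matters because `process_ie_result` mutates the nested entry dicts, which a shallow copy still shares with the original. A minimal sketch of the difference (hypothetical dict, not the test's playlist fixture):

```python
import copy

# A shallow copy shares the nested entry dicts with the original,
# so mutating an entry through the copy also changes the original.
playlist = {'id': 'pl', 'entries': [{'id': '1'}, {'id': '2'}]}

shallow = playlist.copy()
shallow['entries'][0]['id'] = 'mutated'
assert playlist['entries'][0]['id'] == 'mutated'  # original changed too

playlist['entries'][0]['id'] = '1'  # reset
deep = copy.deepcopy(playlist)
deep['entries'][0]['id'] = 'mutated'
assert playlist['entries'][0]['id'] == '1'  # original is untouched
```

This is why rerunning the selection tests against the same `playlist` dict requires a deep copy per run.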
@@ -852,6 +856,22 @@ def get_ids(params):
         result = get_ids({'playlist_items': '2-4,3-4,3'})
         self.assertEqual(result, [2, 3, 4])
 
+        # Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
+        # @{
+        result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
+        self.assertEqual(result[0]['playlist_index'], 2)
+        self.assertEqual(result[1]['playlist_index'], 3)
+        self.assertEqual(result[2]['playlist_index'], 4)
+
+        result = get_downloaded_info_dicts({'playlist_items': '4,2'})
+        self.assertEqual(result[0]['playlist_index'], 4)
+        self.assertEqual(result[1]['playlist_index'], 2)
+        # @}
+
     def test_urlopen_no_file_protocol(self):
         # see https://github.com/ytdl-org/youtube-dl/issues/8227
         ydl = YDL()
index f959798deb595165fddac6a8e555570c0420e454..05f48bd7417e478ab1ae9ae7eddaca4cbc7039c3 100644 (file)
@@ -39,6 +39,13 @@ def assert_cookie_has_value(key):
         assert_cookie_has_value('HTTPONLY_COOKIE')
         assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')
 
+    def test_malformed_cookies(self):
+        cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
+        cookiejar.load(ignore_discard=True, ignore_expires=True)
+        # Cookies should be empty since all malformed cookie file entries
+        # will be ignored
+        self.assertFalse(cookiejar._cookies)
+
 
 if __name__ == '__main__':
     unittest.main()
index 7d57a628e5ef79c5e12d13ccd0a2b515548ffa60..17aaaf20d9a002336af43d1afa2f7d49a186ac9a 100644 (file)
@@ -26,7 +26,6 @@
     ThePlatformIE,
     ThePlatformFeedIE,
     RTVEALaCartaIE,
-    FunnyOrDieIE,
     DemocracynowIE,
 )
 
@@ -322,18 +321,6 @@ def test_allsubtitles(self):
         self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')
 
 
-class TestFunnyOrDieSubtitles(BaseTestSubtitles):
-    url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
-    IE = FunnyOrDieIE
-
-    def test_allsubtitles(self):
-        self.DL.params['writesubtitles'] = True
-        self.DL.params['allsubtitles'] = True
-        subtitles = self.getSubtitles()
-        self.assertEqual(set(subtitles.keys()), set(['en']))
-        self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
-
-
 class TestDemocracynowSubtitles(BaseTestSubtitles):
     url = 'http://www.democracynow.org/shows/2015/7/3'
     IE = DemocracynowIE
index 324ca852578531757d9964f2c90cf6f8e1c4d3b1..e69c57377e617e2864a80e2e736bb72c87c4122e 100644 (file)
@@ -267,7 +267,7 @@ def test_youtube_chapters(self):
         for description, duration, expected_chapters in self._TEST_CASES:
             ie = YoutubeIE()
             expect_value(
-                self, ie._extract_chapters(description, duration),
+                self, ie._extract_chapters_from_description(description, duration),
                 expected_chapters, None)
 
 
index f0c370eeedc8942abc0b8cd8c10e57b4361d00c2..69df30edaa75efbeb51a1cad650c8e29b2596b66 100644 (file)
 ]
 
 
+class TestPlayerInfo(unittest.TestCase):
+    def test_youtube_extract_player_info(self):
+        PLAYER_URLS = (
+            ('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
+            # obsolete
+            ('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
+            ('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
+            ('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
+            ('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
+            ('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
+            ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
+            ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
+            ('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
+            ('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
+        )
+        for player_url, expected_player_id in PLAYER_URLS:
+            expected_player_type = player_url.split('.')[-1]
+            player_type, player_id = YoutubeIE._extract_player_info(player_url)
+            self.assertEqual(player_type, expected_player_type)
+            self.assertEqual(player_id, expected_player_id)
+
+
 class TestSignature(unittest.TestCase):
     def setUp(self):
         TEST_DIR = os.path.dirname(os.path.abspath(__file__))
diff --git a/test/testdata/cookies/malformed_cookies.txt b/test/testdata/cookies/malformed_cookies.txt
new file mode 100644 (file)
index 0000000..17bc403
--- /dev/null
@@ -0,0 +1,9 @@
+# Netscape HTTP Cookie File
+# http://curl.haxx.se/rfc/cookie_spec.html
+# This is a generated file!  Do not edit.
+
+# Cookie file entry with invalid number of fields - 6 instead of 7
+www.foobar.foobar      FALSE   /       FALSE   0       COOKIE
+
+# Cookie file entry with invalid expires at
+www.foobar.foobar      FALSE   /       FALSE   1.7976931348623157e+308 COOKIE  VALUE
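The fixture above exercises the Netscape cookie-file format, in which each non-comment line must carry exactly 7 tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value) with an integer expiry. A rough sketch of the per-line validation the loader is expected to perform (hypothetical helper, not yt-dlp's actual `YoutubeDLCookieJar` code):

```python
def is_valid_netscape_cookie_line(line):
    # Comment and blank lines are legal in the file but carry no cookie.
    line = line.rstrip('\n')
    if not line or line.startswith('#'):
        return False
    fields = line.split('\t')
    if len(fields) != 7:            # e.g. the 6-field entry in the fixture
        return False
    try:
        int(fields[4])              # expiry must be an integer timestamp
    except ValueError:              # e.g. 1.7976931348623157e+308
        return False
    return True

# Both malformed fixture entries are rejected; a well-formed line passes.
assert not is_valid_netscape_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t0\tCOOKIE')
assert not is_valid_netscape_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t1.7976931348623157e+308\tCOOKIE\tVALUE')
assert is_valid_netscape_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t0\tNAME\tVALUE')
```

With every fixture entry rejected, the jar stays empty, which is exactly what `test_malformed_cookies` asserts.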
index f5cb46308198e4c65316fba10ccf30d2f3e14b6a..19370f62b0d3ddb91c74ae4b6cf6c569341fbdc5 100755 (executable)
@@ -92,6 +92,7 @@
     YoutubeDLCookieJar,
     YoutubeDLCookieProcessor,
     YoutubeDLHandler,
+    YoutubeDLRedirectHandler,
 )
 from .cache import Cache
 from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
@@ -990,7 +991,7 @@ def report_download(num_entries):
                     'playlist_title': ie_result.get('title'),
                     'playlist_uploader': ie_result.get('uploader'),
                     'playlist_uploader_id': ie_result.get('uploader_id'),
-                    'playlist_index': i + playliststart,
+                    'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
                     'extractor': ie_result['extractor'],
                     'webpage_url': ie_result['webpage_url'],
                     'webpage_url_basename': url_basename(ie_result['webpage_url']),
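The one-line change in this hunk makes `playlist_index` report an entry's position in the original playlist whenever `--playlist-items` is given, instead of a running counter. A sketch of just that selection logic (simplified, not YoutubeDL's full entry loop):

```python
def playlist_index(i, playlistitems, playliststart=0):
    # i: 1-based position among the entries actually being downloaded.
    # With --playlist-items the reported index is the entry's position in
    # the original playlist; otherwise it falls back to the old counter
    # (playliststart here stands in for YoutubeDL's adjusted start offset).
    return playlistitems[i - 1] if playlistitems else i + playliststart

# --playlist-items 4,2: two downloads, reported as original indices 4 and 2,
# matching the new assertions in test_playlist_items_selection.
assert [playlist_index(i, [4, 2]) for i in (1, 2)] == [4, 2]
```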
@@ -2343,6 +2344,7 @@ def _setup_opener(self):
         debuglevel = 1 if self.params.get('debug_printtraffic') else 0
         https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
         ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
+        redirect_handler = YoutubeDLRedirectHandler()
         data_handler = compat_urllib_request_DataHandler()
 
         # When passing our own FileHandler instance, build_opener won't add the
@@ -2356,7 +2358,7 @@ def file_open(*args, **kwargs):
         file_handler.file_open = file_open
 
         opener = compat_urllib_request.build_opener(
-            proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
+            proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
 
         # Delete the default user-agent header, which would otherwise apply in
         # cases where our custom HTTP handler doesn't come into play
index c75ab131b9955cec1367ec42aa41d8dadde423da..0ee9bc76020377dc811c0d06736eba8082a53dbd 100644 (file)
 except ImportError:  # Python 2
     import cookielib as compat_cookiejar
 
+if sys.version_info[0] == 2:
+    class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
+        def __init__(self, version, name, value, *args, **kwargs):
+            if isinstance(name, compat_str):
+                name = name.encode()
+            if isinstance(value, compat_str):
+                value = value.encode()
+            compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
+else:
+    compat_cookiejar_Cookie = compat_cookiejar.Cookie
+
 try:
     import http.cookies as compat_cookies
 except ImportError:  # Python 2
@@ -2754,6 +2765,17 @@ def compat_expanduser(path):
         compat_expanduser = os.path.expanduser
 
 
+if compat_os_name == 'nt' and sys.version_info < (3, 8):
+    # os.path.realpath on Windows does not follow symbolic links
+    # prior to Python 3.8 (see https://bugs.python.org/issue9949)
+    def compat_realpath(path):
+        while os.path.islink(path):
+            path = os.path.abspath(os.readlink(path))
+        return path
+else:
+    compat_realpath = os.path.realpath
+
+
 if sys.version_info < (3, 0):
     def compat_print(s):
         from .utils import preferredencoding
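The `compat_realpath` fallback added above resolves links manually for pre-3.8 Windows; the same loop can be exercised anywhere symlinks exist. A minimal sketch (assumes a POSIX-style filesystem with symlink support, and an absolute link target):

```python
import os
import tempfile

def compat_realpath(path):
    # Same loop as the fallback above: follow links until a real path remains.
    while os.path.islink(path):
        path = os.path.abspath(os.readlink(path))
    return path

tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, 'target.txt')
open(target, 'w').close()
link = os.path.join(tmpdir, 'link.txt')
os.symlink(target, link)

assert compat_realpath(link) == target
assert compat_realpath(target) == target  # non-links pass through unchanged
```

On interpreters where `os.path.realpath` already follows links, the alias branch makes the wrapper a plain pass-through.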
@@ -2976,6 +2998,7 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
     'compat_basestring',
     'compat_chr',
     'compat_cookiejar',
+    'compat_cookiejar_Cookie',
     'compat_cookies',
     'compat_ctypes_WINFUNCTYPE',
     'compat_etree_Element',
@@ -2998,6 +3021,7 @@ def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
     'compat_os_name',
     'compat_parse_qs',
     'compat_print',
+    'compat_realpath',
     'compat_setenv',
     'compat_shlex_quote',
     'compat_shlex_split',
index 3c72ea18b2304befd5221960503ff5b6141304c3..5046878dfcd874013e737e85d32764a95737406e 100644 (file)
@@ -227,7 +227,7 @@ def retry(e):
             while True:
                 try:
                     # Download and write
-                    data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
+                    data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
                 # socket.timeout is a subclass of socket.error but may not have
                 # errno set
                 except socket.timeout as e:
@@ -299,7 +299,7 @@ def retry(e):
                     'elapsed': now - ctx.start_time,
                 })
 
-                if is_test and byte_counter == data_len:
+                if data_len is not None and byte_counter == data_len:
                     break
 
             if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
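The two hunks above replace the `is_test`-only special case with a general bound: reads are capped whenever the total content length is known. The fixed loop behaves like this sketch (with `io.BytesIO` standing in for `ctx.data`):

```python
import io

def read_stream(data, block_size, data_len=None):
    # Mirrors the fixed loop: cap each read only when total length is known.
    byte_counter = 0
    while True:
        chunk = data.read(block_size if data_len is None
                          else min(block_size, data_len - byte_counter))
        if not chunk:
            break
        byte_counter += len(chunk)
        if data_len is not None and byte_counter == data_len:
            break
    return byte_counter

assert read_stream(io.BytesIO(b'x' * 10), 4) == 10             # unknown length
assert read_stream(io.BytesIO(b'x' * 10), 4, data_len=6) == 6  # bounded read
```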
index 4ac323bf6de6d17016c2425c133aad460072cadd..6637f4f3537591b46870cb7ec3f35d1167cb3cb7 100644 (file)
@@ -110,17 +110,17 @@ class ABCIViewIE(InfoExtractor):
 
     # ABC iview programs are normally available for 14 days only.
     _TESTS = [{
-        'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
-        'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
+        'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
+        'md5': '67715ce3c78426b11ba167d875ac6abf',
         'info_dict': {
-            'id': 'ZX9371A050S00',
+            'id': 'LE1927H001S00',
             'ext': 'mp4',
-            'title': "Gaston's Birthday",
-            'series': "Ben And Holly's Little Kingdom",
-            'description': 'md5:f9de914d02f226968f598ac76f105bcf',
-            'upload_date': '20180604',
-            'uploader_id': 'abc4kids',
-            'timestamp': 1528140219,
+            'title': "Series 11 Ep 1",
+            'series': "Gruen",
+            'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
+            'upload_date': '20190925',
+            'uploader_id': 'abc1',
+            'timestamp': 1569445289,
         },
         'params': {
             'skip_download': True,
@@ -148,7 +148,7 @@ def tokenize_url(url, token):
                 'hdnea': token,
             })
 
-        for sd in ('sd', 'sd-low'):
+        for sd in ('720', 'sd', 'sd-low'):
             sd_url = try_get(
                 stream, lambda x: x['streams']['hls'][sd], compat_str)
             if not sd_url:
index 8b32aa886e9696e9334f73a777a70264f28c9433..9c9d77ae107e0b822b46368d89445f21e9e830a6 100644 (file)
@@ -5,6 +5,7 @@
 from ..utils import (
     clean_html,
     int_or_none,
+    js_to_json,
     try_get,
     unified_strdate,
 )
 class AmericasTestKitchenIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
     _TESTS = [{
-        'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
+        'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
         'md5': 'b861c3e365ac38ad319cfd509c30577f',
         'info_dict': {
-            'id': '1_5g5zua6e',
-            'title': 'Summer Dinner Party',
+            'id': '5b400b9ee338f922cb06450c',
+            'title': 'Weeknight Japanese Suppers',
             'ext': 'mp4',
-            'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
-            'thumbnail': r're:^https?://.*\.jpg',
-            'timestamp': 1497285541,
-            'upload_date': '20170612',
-            'uploader_id': 'roger.metcalf@americastestkitchen.com',
-            'release_date': '20170617',
+            'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
+            'thumbnail': r're:^https?://',
+            'timestamp': 1523664000,
+            'upload_date': '20180414',
+            'release_date': '20180414',
             'series': "America's Test Kitchen",
-            'season_number': 17,
-            'episode': 'Summer Dinner Party',
-            'episode_number': 24,
+            'season_number': 18,
+            'episode': 'Weeknight Japanese Suppers',
+            'episode_number': 15,
         },
         'params': {
             'skip_download': True,
@@ -47,7 +47,7 @@ def _real_extract(self, url):
             self._search_regex(
                 r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
                 webpage, 'initial context'),
-            video_id)
+            video_id, js_to_json)
 
         ep_data = try_get(
             video_data,
@@ -55,17 +55,7 @@ def _real_extract(self, url):
              lambda x: x['videoDetail']['content']['data']), dict)
         ep_meta = ep_data.get('full_video', {})
 
-        zype_id = ep_meta.get('zype_id')
-        if zype_id:
-            embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
-            ie_key = 'Zype'
-        else:
-            partner_id = self._search_regex(
-                r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
-                webpage, 'kaltura partner id')
-            external_id = ep_data.get('external_id') or ep_meta['external_id']
-            embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
-            ie_key = 'Kaltura'
+        zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
 
         title = ep_data.get('title') or ep_meta.get('title')
         description = clean_html(ep_meta.get('episode_description') or ep_data.get(
@@ -79,8 +69,8 @@ def _real_extract(self, url):
 
         return {
             '_type': 'url_transparent',
-            'url': embed_url,
-            'ie_key': ie_key,
+            'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
+            'ie_key': 'Zype',
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
index 8adae46449232fe487ec6796b5b60345f144d310..5b7b2dd6d2ee2427d0df3920f982b3eece6d67fc 100644 (file)
@@ -1,6 +1,7 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import json
 import re
 
 from .common import InfoExtractor
 from ..compat import compat_etree_fromstring
 
 
-class ARDMediathekIE(InfoExtractor):
-    IE_NAME = 'ARD:mediathek'
-    _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
-
-    _TESTS = [{
-        # available till 26.07.2022
-        'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
-        'info_dict': {
-            'id': '44726822',
-            'ext': 'mp4',
-            'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
-            'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
-            'duration': 1740,
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
-    }, {
-        'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
-        'only_matching': True,
-    }, {
-        # audio
-        'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
-        'only_matching': True,
-    }, {
-        'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
-        'only_matching': True,
-    }, {
-        # audio
-        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
-        'only_matching': True,
-    }, {
-        'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
-        'only_matching': True,
-    }]
-
-    @classmethod
-    def suitable(cls, url):
-        return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+class ARDMediathekBaseIE(InfoExtractor):
+    _GEO_COUNTRIES = ['DE']
 
     def _extract_media_info(self, media_info_url, webpage, video_id):
         media_info = self._download_json(
             media_info_url, video_id, 'Downloading media JSON')
 
+        return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
 
 
+    def _parse_media_info(self, media_info, video_id, fsk):
         formats = self._extract_formats(media_info, video_id)
 
         if not formats:
-            if '"fsk"' in webpage:
+            if fsk:
                 raise ExtractorError(
                     'This video is only available after 20:00', expected=True)
             elif media_info.get('_geoblocked'):
-                raise ExtractorError('This video is not available due to geo restriction', expected=True)
+                self.raise_geo_restricted(
+                    'This video is not available due to geoblocking',
+                    countries=self._GEO_COUNTRIES)
 
         self._sort_formats(formats)
 
-        duration = int_or_none(media_info.get('_duration'))
-        thumbnail = media_info.get('_previewImage')
-        is_live = media_info.get('_isLive') is True
-
         subtitles = {}
         subtitle_url = media_info.get('_subtitleUrl')
         if subtitle_url:
@@ -92,9 +55,9 @@ def _extract_media_info(self, media_info_url, webpage, video_id):
 
         return {
             'id': video_id,
-            'duration': duration,
-            'thumbnail': thumbnail,
-            'is_live': is_live,
+            'duration': int_or_none(media_info.get('_duration')),
+            'thumbnail': media_info.get('_previewImage'),
+            'is_live': media_info.get('_isLive') is True,
             'formats': formats,
             'subtitles': subtitles,
         }
@@ -123,11 +86,11 @@ def _extract_formats(self, media_info, video_id):
                             update_url_query(stream_url, {
                                 'hdcore': '3.1.1',
                                 'plugin': 'aasp-3.1.1.69.124'
-                            }),
-                            video_id, f4m_id='hds', fatal=False))
+                            }), video_id, f4m_id='hds', fatal=False))
                     elif ext == 'm3u8':
                         formats.extend(self._extract_m3u8_formats(
-                            stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                            stream_url, video_id, 'mp4', 'm3u8_native',
+                            m3u8_id='hls', fatal=False))
                     else:
                         if server and server.startswith('rtmp'):
                             f = {
@@ -140,7 +103,9 @@ def _extract_formats(self, media_info, video_id):
                                 'url': stream_url,
                                 'format_id': 'a%s-%s-%s' % (num, ext, quality)
                             }
-                        m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
+                        m = re.search(
+                            r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
+                            stream_url)
                         if m:
                             f.update({
                                 'width': int(m.group('width')),
@@ -151,6 +116,48 @@ def _extract_formats(self, media_info, video_id):
                         formats.append(f)
         return formats
 
+
+class ARDMediathekIE(ARDMediathekBaseIE):
+    IE_NAME = 'ARD:mediathek'
+    _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+
+    _TESTS = [{
+        # available till 26.07.2022
+        'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
+        'info_dict': {
+            'id': '44726822',
+            'ext': 'mp4',
+            'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
+            'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
+            'duration': 1740,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
+        'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
+        'only_matching': True,
+    }, {
+        'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
+        'only_matching': True,
+    }, {
+        # audio
+        'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
+        'only_matching': True,
+    }, {
+        'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+
     def _real_extract(self, url):
         # determine video id from url
         m = re.match(self._VALID_URL, url)
@@ -242,7 +249,7 @@ def _real_extract(self, url):
 
 
 class ARDIE(InfoExtractor):
-    _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
+    _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
     _TESTS = [{
         # available till 14.02.2019
         'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
     _TESTS = [{
         # available till 14.02.2019
         'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
@@ -256,6 +263,9 @@ class ARDIE(InfoExtractor):
             'upload_date': '20180214',
             'thumbnail': r're:^https?://.*\.jpg$',
         },
             'upload_date': '20180214',
             'thumbnail': r're:^https?://.*\.jpg$',
         },
+    }, {
+        'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
+        'only_matching': True,
     }, {
         'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
         'only_matching': True,
@@ -302,21 +312,31 @@ def _real_extract(self, url):
         }
 
 
-class ARDBetaMediathekIE(InfoExtractor):
-    _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
+class ARDBetaMediathekIE(ARDMediathekBaseIE):
+    _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?:player|live|video)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
     _TESTS = [{
-        'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
-        'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
+        'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
+        'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
         'info_dict': {
             'display_id': 'die-robuste-roswita',
-            'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
-            'title': 'Tatort: Die robuste Roswita',
+            'id': '70153354',
+            'title': 'Die robuste Roswita',
             'description': r're:^Der Mord.*trüber ist als die Ilm.',
             'duration': 5316,
-            'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
-            'upload_date': '20180826',
+            'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
+            'timestamp': 1577047500,
+            'upload_date': '20191222',
             'ext': 'mp4',
         },
+    }, {
+        'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
+        'only_matching': True,
+    }, {
+        'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
+        'only_matching': True,
     }, {
         'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
         'only_matching': True,
@@ -328,73 +348,75 @@ class ARDBetaMediathekIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('video_id')
-        display_id = mobj.group('display_id') or video_id
-
-        webpage = self._download_webpage(url, display_id)
-        data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
-        data = self._parse_json(data_json, display_id)
-
-        res = {
-            'id': video_id,
-            'display_id': display_id,
+        display_id = mobj.group('display_id')
+        if display_id:
+            display_id = display_id.rstrip('/')
+        if not display_id:
+            display_id = video_id
+
+        player_page = self._download_json(
+            'https://api.ardmediathek.de/public-gateway',
+            display_id, data=json.dumps({
+                'query': '''{
+  playerPage(client:"%s", clipId: "%s") {
+    blockedByFsk
+    broadcastedOn
+    maturityContentRating
+    mediaCollection {
+      _duration
+      _geoblocked
+      _isLive
+      _mediaArray {
+        _mediaStreamArray {
+          _quality
+          _server
+          _stream
         }
-        formats = []
-        subtitles = {}
-        geoblocked = False
-        for widget in data.values():
-            if widget.get('_geoblocked') is True:
-                geoblocked = True
-            if '_duration' in widget:
-                res['duration'] = int_or_none(widget['_duration'])
-            if 'clipTitle' in widget:
-                res['title'] = widget['clipTitle']
-            if '_previewImage' in widget:
-                res['thumbnail'] = widget['_previewImage']
-            if 'broadcastedOn' in widget:
-                res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
-            if 'synopsis' in widget:
-                res['description'] = widget['synopsis']
-            subtitle_url = url_or_none(widget.get('_subtitleUrl'))
-            if subtitle_url:
-                subtitles.setdefault('de', []).append({
-                    'ext': 'ttml',
-                    'url': subtitle_url,
-                })
-            if '_quality' in widget:
-                format_url = url_or_none(try_get(
-                    widget, lambda x: x['_stream']['json'][0]))
-                if not format_url:
-                    continue
-                ext = determine_ext(format_url)
-                if ext == 'f4m':
-                    formats.extend(self._extract_f4m_formats(
-                        format_url + '?hdcore=3.11.0',
-                        video_id, f4m_id='hds', fatal=False))
-                elif ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        format_url, video_id, 'mp4', m3u8_id='hls',
-                        fatal=False))
-                else:
-                    # HTTP formats are not available when geoblocked is True,
-                    # other formats are fine though
-                    if geoblocked:
-                        continue
-                    quality = str_or_none(widget.get('_quality'))
-                    formats.append({
-                        'format_id': ('http-' + quality) if quality else 'http',
-                        'url': format_url,
-                        'preference': 10,  # Plain HTTP, that's nice
-                    })
-
-        if not formats and geoblocked:
-            self.raise_geo_restricted(
-                msg='This video is not available due to geoblocking',
-                countries=['DE'])
-
-        self._sort_formats(formats)
-        res.update({
-            'subtitles': subtitles,
-            'formats': formats,
+      }
+      _previewImage
+      _subtitleUrl
+      _type
+    }
+    show {
+      title
+    }
+    synopsis
+    title
+    tracking {
+      atiCustomVars {
+        contentId
+      }
+    }
+  }
+}''' % (mobj.group('client'), video_id),
+            }).encode(), headers={
+                'Content-Type': 'application/json'
+            })['data']['playerPage']
+        title = player_page['title']
+        content_id = str_or_none(try_get(
+            player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
+        media_collection = player_page.get('mediaCollection') or {}
+        if not media_collection and content_id:
+            media_collection = self._download_json(
+                'https://www.ardmediathek.de/play/media/' + content_id,
+                content_id, fatal=False) or {}
+        info = self._parse_media_info(
+            media_collection, content_id or video_id,
+            player_page.get('blockedByFsk'))
+        age_limit = None
+        description = player_page.get('synopsis')
+        maturity_content_rating = player_page.get('maturityContentRating')
+        if maturity_content_rating:
+            age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
+        if not age_limit and description:
+            age_limit = int_or_none(self._search_regex(
+                r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
+        info.update({
+            'age_limit': age_limit,
+            'display_id': display_id,
+            'title': title,
+            'description': description,
+            'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
+            'series': try_get(player_page, lambda x: x['show']['title']),
         })
-
-        return res
+        return info
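The rewritten ARDBetaMediathekIE above POSTs a hand-built GraphQL query to `https://api.ardmediathek.de/public-gateway`. A minimal, self-contained sketch of how that request body is assembled (field list trimmed for illustration; the diff requests many more fields, and no network call is made here):

```python
import json

def build_playerpage_query(client, clip_id):
    """Build the JSON body POSTed to the ARD public-gateway endpoint.

    The query text mirrors the structure used in the diff above, with
    only a few representative fields kept.
    """
    query = '''{
  playerPage(client:"%s", clipId: "%s") {
    title
    synopsis
    broadcastedOn
  }
}''' % (client, clip_id)
    # The extractor sends this with a Content-Type: application/json header
    return json.dumps({'query': query}).encode()
```

The `client` value comes from the URL path (e.g. `ard`), matching the new `(?P<client>[^/]+)` group in `_VALID_URL`.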
index fcbdc71b98d98076852e0f88559f4a2ed428d7af..b1e20def5343e6b1a077ff3ba0b36f6a96c4f2c4 100644 (file)
@@ -47,39 +47,19 @@ class AZMedienIE(InfoExtractor):
         'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
         'only_matching': True
     }]
-
+    _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
     _PARTNER_ID = '1719221'
 
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        host = mobj.group('host')
-        video_id = mobj.group('id')
-        entry_id = mobj.group('kaltura_id')
+        host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
 
         if not entry_id:
-            api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
-            payload = {
-                'query': '''query VideoContext($articleId: ID!) {
-                    article: node(id: $articleId) {
-                      ... on Article {
-                        mainAssetRelation {
-                          asset {
-                            ... on VideoAsset {
-                              kalturaId
-                            }
-                          }
-                        }
-                      }
-                    }
-                  }''',
-                'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
-            }
-            json_data = self._download_json(
-                api_url, video_id, headers={
-                    'Content-Type': 'application/json',
-                },
-                data=json.dumps(payload).encode())
-            entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
+            entry_id = self._download_json(
+                self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
+                    'variables': json.dumps({
+                        'contextId': 'NewsArticle:' + article_id,
+                    }),
+                })['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
 
         return self.url_result(
             'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
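The AZMedien change above replaces an inline GraphQL POST with a GET against a persisted query (the hash in `_API_TEMPL` identifies the stored query server-side), so only the `variables` travel in the URL. A sketch of how that request URL is built, under the assumption that only URL construction (no download) is shown:

```python
import json
from urllib.parse import urlencode

# Template copied from the diff above; the trailing hash names the
# persisted NewsArticleTeaser query on the server.
API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'

def build_kaltura_id_request(host, article_id):
    """Return the GET URL the new code fetches to resolve the Kaltura id."""
    url = API_TEMPL % (host, host.split('.')[0])
    variables = json.dumps({'contextId': 'NewsArticle:' + article_id})
    return url + '?' + urlencode({'variables': variables})
```

Compared to the removed inline query, this shifts the query text out of the client entirely; only the article id varies per request.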
index 901c5a54fb6f9d3320fbd0827a222c2c9e9676f0..002c39c394bf5d8bd3b74600d92a594e1a8ca2b2 100644 (file)
@@ -528,7 +528,7 @@ def _extract_from_legacy_playlist(self, playlist, playlist_id):
 
             def get_programme_id(item):
                 def get_from_attributes(item):
-                    for p in('identifier', 'group'):
+                    for p in ('identifier', 'group'):
                         value = item.get(p)
                         if value and re.match(r'^[pb][\da-z]{7}$', value):
                             return value
index 485173774d9f9c2534f9b18f1668a8d5fb204dc9..9f9de96c61332ac405b33bfc1f5758f2c8fd6456 100644 (file)
@@ -25,8 +25,8 @@ class BellMediaIE(InfoExtractor):
                 etalk|
                 marilyn
             )\.ca|
-            much\.com
-        )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
+            (?:much|cp24)\.com
+        )/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
     _TESTS = [{
         'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
         'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@@ -62,6 +62,9 @@ class BellMediaIE(InfoExtractor):
     }, {
         'url': 'http://www.etalk.ca/video?videoid=663455',
         'only_matching': True,
+    }, {
+        'url': 'https://www.cp24.com/video?clipId=1982548',
+        'only_matching': True,
     }]
     _DOMAINS = {
         'thecomedynetwork': 'comedy',
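The BellMedia hunk above extends `_VALID_URL` to accept cp24.com and the `clipId` query parameter. A trimmed sketch reproducing just that new tail of the pattern (the full domain alternation is elided in the diff, so only the visible `much`/`cp24` branch is used here):

```python
import re

# Tail of the extended BellMedia pattern from the diff above; only the
# much.com / cp24.com branch is reproduced for illustration.
BELL_ID_RE = r'(?:much|cp24)\.com/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'

def bell_media_id(url):
    """Return the 6+ digit media id, or None when nothing matches."""
    m = re.search(BELL_ID_RE, url)
    return m.group('id') if m else None
```

This is exactly the shape the new `only_matching` test exercises: `clipId=1982548` now yields the id where the old pattern would not match at all.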
index 80bd696e21f3a4af3c996e9899ce439116e13d19..4dc597e160bcea4f0821719dc82e2908f20d1972 100644 (file)
 
 
 class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:(?:www|bangumi)\.)?
+                        bilibili\.(?:tv|com)/
+                        (?:
+                            (?:
+                                video/[aA][vV]|
+                                anime/(?P<anime_id>\d+)/play\#
+                            )(?P<id_bv>\d+)|
+                            video/[bB][vV](?P<id>[^/?#&]+)
+                        )
+                    '''
 
     _TESTS = [{
         'url': 'http://www.bilibili.tv/video/av1074402/',
@@ -92,6 +103,10 @@ class BiliBiliIE(InfoExtractor):
                 'skip_download': True,  # Test metadata only
             },
         }]
+    }, {
+        # new BV video id format
+        'url': 'https://www.bilibili.com/video/BV1JE411F741',
+        'only_matching': True,
     }]
 
     _APP_KEY = 'iVGUTjsxvpLeuDCf'
@@ -109,7 +124,7 @@ def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
 
         mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = mobj.group('id') or mobj.group('id_bv')
         anime_id = mobj.group('anime_id')
         webpage = self._download_webpage(url, video_id)
 
@@ -419,3 +434,17 @@ def _real_extract(self, url):
                     entries, am_id, album_title, album_data.get('intro'))
 
         return self.playlist_result(entries, am_id)
+
+
+class BiliBiliPlayerIE(InfoExtractor):
+    _VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
+        'only_matching': True,
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'http://www.bilibili.tv/video/av%s/' % video_id,
+            ie=BiliBiliIE.ie_key(), video_id=video_id)
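The BiliBili change above teaches `_VALID_URL` the new alphanumeric BV id format alongside the legacy numeric av ids, and `_real_extract` picks whichever group matched. A self-contained sketch of that matching logic (note the group naming is as in the diff: the numeric av id lands in `id_bv`, the BV id in `id`):

```python
import re

# The updated pattern, copied from the diff above.
BILIBILI_VALID_URL = r'''(?x)
                    https?://
                        (?:(?:www|bangumi)\.)?
                        bilibili\.(?:tv|com)/
                        (?:
                            (?:
                                video/[aA][vV]|
                                anime/(?P<anime_id>\d+)/play\#
                            )(?P<id_bv>\d+)|
                            video/[bB][vV](?P<id>[^/?#&]+)
                        )
                    '''

def extract_video_id(url):
    """Mirror the new `mobj.group('id') or mobj.group('id_bv')` logic."""
    mobj = re.match(BILIBILI_VALID_URL, url)
    if not mobj:
        return None
    return mobj.group('id') or mobj.group('id_bv')
```

Only one alternative can match, so exactly one of the two groups is ever non-None.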
index 8e2f7217ab85a81a58d1bb902af02b6e62ec2ab6..2aa9f4782e0dfdb2b78225c2d1fe83a8568effe3 100644 (file)
@@ -5,32 +5,34 @@
 import re
 import struct
 
-from .common import InfoExtractor
 from .adobepass import AdobePassIE
+from .common import InfoExtractor
 from ..compat import (
     compat_etree_fromstring,
+    compat_HTTPError,
     compat_parse_qs,
     compat_urllib_parse_urlparse,
     compat_urlparse,
     compat_xml_parse_error,
-    compat_HTTPError,
 )
 from ..utils import (
-    ExtractorError,
+    clean_html,
     extract_attributes,
+    ExtractorError,
     find_xpath_attr,
     fix_xml_ampersands,
     float_or_none,
-    js_to_json,
     int_or_none,
+    js_to_json,
+    mimetype2ext,
     parse_iso8601,
     smuggle_url,
+    str_or_none,
     unescapeHTML,
     unsmuggle_url,
-    update_url_query,
-    clean_html,
-    mimetype2ext,
     UnsupportedError,
+    update_url_query,
+    url_or_none,
 )
 
 
@@ -424,7 +426,7 @@ def _extract_urls(ie, webpage):
         # [2] looks like:
         for video, script_tag, account_id, player_id, embed in re.findall(
                 r'''(?isx)
-                    (<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
+                    (<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
                     (?:.*?
                         (<script[^>]+
                             src=["\'](?:https?:)?//players\.brightcove\.net/
@@ -553,10 +555,16 @@ def build_format_id(kind):
 
         subtitles = {}
         for text_track in json_data.get('text_tracks', []):
-            if text_track.get('src'):
-                subtitles.setdefault(text_track.get('srclang'), []).append({
-                    'url': text_track['src'],
-                })
+            if text_track.get('kind') != 'captions':
+                continue
+            text_track_url = url_or_none(text_track.get('src'))
+            if not text_track_url:
+                continue
+            lang = (str_or_none(text_track.get('srclang'))
+                    or str_or_none(text_track.get('label')) or 'en').lower()
+            subtitles.setdefault(lang, []).append({
+                'url': text_track_url,
+            })
 
         is_live = False
         duration = float_or_none(json_data.get('duration'), 1000)
@@ -586,45 +594,63 @@ def _real_extract(self, url):
 
         account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
 
-        webpage = self._download_webpage(
-            'http://players.brightcove.net/%s/%s_%s/index.min.js'
-            % (account_id, player_id, embed), video_id)
+        policy_key_id = '%s_%s' % (account_id, player_id)
+        policy_key = self._downloader.cache.load('brightcove', policy_key_id)
+        policy_key_extracted = False
+        store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
 
-        policy_key = None
+        def extract_policy_key():
+            webpage = self._download_webpage(
+                'http://players.brightcove.net/%s/%s_%s/index.min.js'
+                % (account_id, player_id, embed), video_id)
 
-        catalog = self._search_regex(
-            r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
-        if catalog:
-            catalog = self._parse_json(
-                js_to_json(catalog), video_id, fatal=False)
+            policy_key = None
+
+            catalog = self._search_regex(
+                r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
             if catalog:
-                policy_key = catalog.get('policyKey')
+                catalog = self._parse_json(
+                    js_to_json(catalog), video_id, fatal=False)
+                if catalog:
+                    policy_key = catalog.get('policyKey')
+
+            if not policy_key:
+                policy_key = self._search_regex(
+                    r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+                    webpage, 'policy key', group='pk')
 
-        if not policy_key:
-            policy_key = self._search_regex(
-                r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
-                webpage, 'policy key', group='pk')
+            store_pk(policy_key)
+            return policy_key
 
         api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
-        headers = {
-            'Accept': 'application/json;pk=%s' % policy_key,
-        }
+        headers = {}
         referrer = smuggled_data.get('referrer')
         if referrer:
             headers.update({
                 'Referer': referrer,
                 'Origin': re.search(r'https?://[^/]+', referrer).group(0),
             })
-        try:
-            json_data = self._download_json(api_url, video_id, headers=headers)
-        except ExtractorError as e:
-            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
-                json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
-                message = json_data.get('message') or json_data['error_code']
-                if json_data.get('error_subcode') == 'CLIENT_GEO':
-                    self.raise_geo_restricted(msg=message)
-                raise ExtractorError(message, expected=True)
-            raise
+
+        for _ in range(2):
+            if not policy_key:
+                policy_key = extract_policy_key()
+                policy_key_extracted = True
+            headers['Accept'] = 'application/json;pk=%s' % policy_key
+            try:
+                json_data = self._download_json(api_url, video_id, headers=headers)
+                break
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
+                    json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
+                    message = json_data.get('message') or json_data['error_code']
+                    if json_data.get('error_subcode') == 'CLIENT_GEO':
+                        self.raise_geo_restricted(msg=message)
+                    elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
+                        policy_key = None
+                        store_pk(None)
+                        continue
+                    raise ExtractorError(message, expected=True)
+                raise
 
         errors = json_data.get('errors')
         if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
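The Brightcove hunk above introduces a cache-then-retry pattern for the policy key: use the cached key first, and only re-extract it (at most once) when the API rejects it. A minimal sketch of that control flow, with a plain dict standing in for the downloader cache and injected callables standing in for the webpage scrape and JSON download (none of these are the extractor's real APIs):

```python
class PolicyKeyClient:
    def __init__(self, cache, fetch_key, call_api):
        # cache: dict-like store; fetch_key/call_api are hypothetical
        # stand-ins for the extractor's key extraction and API request.
        self._cache = cache
        self._fetch_key = fetch_key
        self._call_api = call_api

    def get_json(self, key_id):
        policy_key = self._cache.get(key_id)
        key_extracted = False
        for _ in range(2):
            if not policy_key:
                policy_key = self._fetch_key()
                self._cache[key_id] = policy_key
                key_extracted = True
            try:
                return self._call_api(policy_key)
            except PermissionError:  # stands in for HTTP 401/403
                if not key_extracted:
                    # cached key went stale: invalidate and retry once
                    policy_key = None
                    self._cache[key_id] = None
                    continue
                raise
```

The second loop iteration only runs when the first attempt used a stale cached key; a freshly extracted key that still fails propagates the error, matching the diff's `policy_key_extracted` guard.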
index dfcf9bc6b50b9274d2e45ff7e0b6d1af9920cab0..73a57b1e4db835ab09ac308704bd105d796628ac 100644 (file)
@@ -9,21 +9,26 @@ class BusinessInsiderIE(InfoExtractor):
     _VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
-        'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
+        'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
         'info_dict': {
-            'id': 'hZRllCfw',
+            'id': 'cjGDb0X9',
             'ext': 'mp4',
-            'title': "Here's how much radiation you're exposed to in everyday life",
-            'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
-            'upload_date': '20170709',
-            'timestamp': 1499606400,
-        },
-        'params': {
-            'skip_download': True,
+            'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
+            'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
+            'upload_date': '20160611',
+            'timestamp': 1465675620,
         },
     }, {
         'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
-        'only_matching': True,
+        'md5': '43f438dbc6da0b89f5ac42f68529d84a',
+        'info_dict': {
+            'id': '5zJwd4FK',
+            'ext': 'mp4',
+            'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
+            'description': 'md5:2af8975825d38a4fed24717bbe51db49',
+            'upload_date': '20170705',
+            'timestamp': 1499270528,
+        },
     }, {
         'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
         'only_matching': True,
@@ -35,7 +40,8 @@ def _real_extract(self, url):
         jwplatform_id = self._search_regex(
             (r'data-media-id=["\']([a-zA-Z0-9]{8})',
              r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
-             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
+             r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
+             r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
             webpage, 'jwplatform id')
         return self.url_result(
             'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
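The BusinessInsider change above appends a fourth fallback pattern for the JW Platform id. A sketch of the first-match-wins behaviour of `_search_regex` over that tuple (simplified to plain `re.search`, without the error handling the real helper adds):

```python
import re

# Pattern tuple from the diff above, including the new
# jwplatform.com/players/ fallback.
JWPLATFORM_ID_RES = (
    r'data-media-id=["\']([a-zA-Z0-9]{8})',
    r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
    r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
    r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})',
)

def find_jwplatform_id(webpage):
    """Try each pattern in order; return the first captured id."""
    for pattern in JWPLATFORM_ID_RES:
        m = re.search(pattern, webpage)
        if m:
            return m.group(1)
    return None
```

The new last pattern catches pages that only reference the player script URL, which the earlier attribute-based patterns miss.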
index c506bc5dd2402a95752bdf3223fe4a24cf9d06ae..8667a0d0457cccfc145cc52bc1eb1c7816aa04b8 100644 (file)
@@ -13,6 +13,8 @@
     int_or_none,
     merge_dicts,
     parse_iso8601,
+    str_or_none,
+    url_or_none,
 )
 
 
@@ -20,15 +22,15 @@ class CanvasIE(InfoExtractor):
     _VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
-        'md5': '90139b746a0a9bd7bb631283f6e2a64e',
+        'md5': '68993eda72ef62386a15ea2cf3c93107',
         'info_dict': {
             'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
             'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'Nachtwacht: De Greystook',
-            'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
+            'description': 'Nachtwacht: De Greystook',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 1468.03,
+            'duration': 1468.04,
         },
         'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
     }, {
@@ -39,23 +41,45 @@ class CanvasIE(InfoExtractor):
         'HLS': 'm3u8_native',
         'HLS_AES': 'm3u8',
     }
+    _REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         site_id, video_id = mobj.group('site_id'), mobj.group('id')
 
+        # Old API endpoint, serves more formats but may fail for some videos
         data = self._download_json(
             'https://mediazone.vrt.be/api/v1/%s/assets/%s'
-            % (site_id, video_id), video_id)
+            % (site_id, video_id), video_id, 'Downloading asset JSON',
+            'Unable to download asset JSON', fatal=False)
+
+        # New API endpoint
+        if not data:
+            token = self._download_json(
+                '%s/tokens' % self._REST_API_BASE, video_id,
+                'Downloading token', data=b'',
+                headers={'Content-Type': 'application/json'})['vrtPlayerToken']
+            data = self._download_json(
+                '%s/videos/%s' % (self._REST_API_BASE, video_id),
+                video_id, 'Downloading video JSON', fatal=False, query={
+                    'vrtPlayerToken': token,
+                    'client': '%s@PROD' % site_id,
+                }, expected_status=400)
+            message = data.get('message')
+            if message and not data.get('title'):
+                if data.get('code') == 'AUTHENTICATION_REQUIRED':
+                    self.raise_login_required(message)
+                raise ExtractorError(message, expected=True)
 
         title = data['title']
         description = data.get('description')
 
         formats = []
         for target in data['targetUrls']:
-            format_url, format_type = target.get('url'), target.get('type')
+            format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
             if not format_url or not format_type:
                 continue
+            format_type = format_type.upper()
             if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
                 formats.extend(self._extract_m3u8_formats(
                     format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
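The old-then-new endpoint fallback added in this hunk can be sketched standalone as follows. `download_json` is a stand-in for the extractor's `_download_json` helper (assumed to return `None` on failure when `fatal=False`); the URLs mirror the patch itself:

```python
def fetch_asset(site_id, video_id, download_json):
    # Legacy mediazone endpoint first: serves more formats but may fail
    data = download_json(
        'https://mediazone.vrt.be/api/v1/%s/assets/%s' % (site_id, video_id),
        fatal=False)
    if data:
        return data
    # New aggregator API: request a short-lived player token, then the video JSON
    rest_api_base = ('https://media-services-public.vrt.be/'
                     'vualto-video-aggregator-web/rest/external/v1')
    token = download_json(
        '%s/tokens' % rest_api_base, data=b'',
        headers={'Content-Type': 'application/json'})['vrtPlayerToken']
    return download_json(
        '%s/videos/%s' % (rest_api_base, video_id),
        query={'vrtPlayerToken': token, 'client': '%s@PROD' % site_id})
```

The legacy endpoint is tried first because it serves more formats; only when it fails is the token-protected aggregator API consulted.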
@@ -134,20 +158,20 @@ class CanvasEenIE(InfoExtractor):
         },
         'skip': 'Pagina niet gevonden',
     }, {
-        'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+        'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
         'info_dict': {
-            'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
-            'display_id': 'herbekijk-sorry-voor-alles',
+            'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
+            'display_id': 'emma-pakt-thilly-aan',
             'ext': 'mp4',
-            'title': 'Herbekijk Sorry voor alles',
-            'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+            'title': 'Emma pakt Thilly aan',
+            'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
             'thumbnail': r're:^https?://.*\.jpg$',
-            'duration': 3788.06,
+            'duration': 118.24,
         },
         'params': {
             'skip_download': True,
         },
-        'skip': 'Episode no longer available',
+        'expected_warnings': ['is not a supported codec'],
     }, {
         'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
         'only_matching': True,
@@ -183,19 +207,44 @@ class VrtNUIE(GigyaBaseIE):
     IE_DESC = 'VrtNU.be'
     _VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
+        # Available via old API endpoint
         'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
         'info_dict': {
             'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': 'De zwarte weduwe',
-            'description': 'md5:d90c21dced7db869a85db89a623998d4',
+            'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
             'duration': 1457.04,
             'thumbnail': r're:^https?://.*\.jpg$',
-            'season': '1',
+            'season': 'Season 1',
             'season_number': 1,
             'episode_number': 1,
         },
-        'skip': 'This video is only available for registered users'
+        'skip': 'This video is only available for registered users',
+        'params': {
+            'username': '<snip>',
+            'password': '<snip>',
+        },
+        'expected_warnings': ['is not a supported codec'],
+    }, {
+        # Only available via new API endpoint
+        'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
+        'info_dict': {
+            'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
+            'ext': 'mp4',
+            'title': 'Aflevering 5',
+            'description': 'Wie valt door de mand tijdens een missie?',
+            'duration': 2967.06,
+            'season': 'Season 1',
+            'season_number': 1,
+            'episode_number': 5,
+        },
+        'skip': 'This video is only available for registered users',
+        'params': {
+            'username': '<snip>',
+            'password': '<snip>',
+        },
+        'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
     }]
     _NETRC_MACHINE = 'vrtnu'
     _APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
index 751a3a8f26c94ecb19c130503593515312bac6c7..fd5ec6033b80513012cf2615fc56e80c7e82cadc 100644 (file)
@@ -1,8 +1,10 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import hashlib
 import json
 import re
+from xml.sax.saxutils import escape
 
 from .common import InfoExtractor
 from ..compat import (
@@ -216,6 +218,29 @@ class CBCWatchBaseIE(InfoExtractor):
         'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
     }
     _GEO_COUNTRIES = ['CA']
+    _LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
+    _TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
+    _API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
+    _NETRC_MACHINE = 'cbcwatch'
+
+    def _signature(self, email, password):
+        data = json.dumps({
+            'email': email,
+            'password': password,
+        }).encode()
+        headers = {'content-type': 'application/json'}
+        query = {'apikey': self._API_KEY}
+        resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
+        access_token = resp['access_token']
+
+        # exchange the access token for a JWT signature
+        query = {
+            'access_token': access_token,
+            'apikey': self._API_KEY,
+            'jwtapp': 'jwt',
+        }
+        resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
+        return resp['signature']
 
     def _call_api(self, path, video_id):
         url = path if path.startswith('http') else self._API_BASE_URL + path
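The two-step LoginRadius flow introduced by `_signature` can be sketched outside the extractor like this; `download_json` again stands in for `_download_json`, and the URLs and query parameters mirror the hunk above:

```python
import json

LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'

def loginradius_signature(email, password, api_key, download_json):
    # Step 1: credentials -> access_token
    resp = download_json(
        LOGIN_URL,
        data=json.dumps({'email': email, 'password': password}).encode(),
        headers={'content-type': 'application/json'},
        query={'apikey': api_key})
    # Step 2: access_token -> JWT signature used for the device login
    resp = download_json(
        TOKEN_URL,
        headers={'content-type': 'application/json'},
        query={
            'access_token': resp['access_token'],
            'apikey': api_key,
            'jwtapp': 'jwt',
        })
    return resp['signature']
```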
@@ -239,7 +264,8 @@ def _call_api(self, path, video_id):
     def _real_initialize(self):
         if self._valid_device_token():
             return
-        device = self._downloader.cache.load('cbcwatch', 'device') or {}
+        device = self._downloader.cache.load(
+            'cbcwatch', self._cache_device_key()) or {}
         self._device_id, self._device_token = device.get('id'), device.get('token')
         if self._valid_device_token():
             return
@@ -248,16 +274,30 @@ def _real_initialize(self):
     def _valid_device_token(self):
         return self._device_id and self._device_token
 
+    def _cache_device_key(self):
+        email, _ = self._get_login_info()
+        return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
+
     def _register_device(self):
-        self._device_id = self._device_token = None
         result = self._download_xml(
             self._API_BASE_URL + 'device/register',
             None, 'Acquiring device token',
             data=b'<device><type>web</type></device>')
         self._device_id = xpath_text(result, 'deviceId', fatal=True)
-        self._device_token = xpath_text(result, 'deviceToken', fatal=True)
+        email, password = self._get_login_info()
+        if email and password:
+            signature = self._signature(email, password)
+            data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
+                escape(signature), escape(self._device_id)).encode()
+            url = self._API_BASE_URL + 'device/login'
+            result = self._download_xml(
+                url, None, data=data,
+                headers={'content-type': 'application/xml'})
+            self._device_token = xpath_text(result, 'token', fatal=True)
+        else:
+            self._device_token = xpath_text(result, 'deviceToken', fatal=True)
         self._downloader.cache.store(
-            'cbcwatch', 'device', {
+            'cbcwatch', self._cache_device_key(), {
                 'id': self._device_id,
                 'token': self._device_token,
             })
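The per-account cache key from `_cache_device_key` is small enough to test in isolation; the behaviour below mirrors the hunk (anonymous sessions keep the plain `'device'` key, logged-in sessions get an e-mail-derived key so two accounts never clobber each other's cached token):

```python
import hashlib

def cache_device_key(email):
    # No login info: fall back to the shared anonymous key
    if not email:
        return 'device'
    # Hash the e-mail so the cache key is stable but not the address itself
    return '%s_device' % hashlib.sha256(email.encode()).hexdigest()
```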
index 8ff2c6531570ee3a06210832cf967c7033d6ab9f..2fdcfbb3af1fbffb9e66abff56b86e31762ad449 100644 (file)
@@ -1,20 +1,24 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
 import re
 
 from .common import InfoExtractor
 
 
 class CloudflareStreamIE(InfoExtractor):
+    _DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
+    _EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
+    _ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
-                            (?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
-                            embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
+                            (?:watch\.)?%s/|
+                            %s
                         )
-                        (?P<id>[\da-f]+)
-                    '''
+                        (?P<id>%s)
+                    ''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
     _TESTS = [{
         'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
         'info_dict': {
@@ -41,23 +45,28 @@ def _extract_urls(webpage):
         return [
             mobj.group('url')
             for mobj in re.finditer(
-                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
+                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
                 webpage)]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
+        domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
+        base_url = 'https://%s/%s/' % (domain, video_id)
+        if '.' in video_id:
+            video_id = self._parse_json(base64.urlsafe_b64decode(
+                video_id.split('.')[1]), video_id)['sub']
+        manifest_base_url = base_url + 'manifest/video.'
 
         formats = self._extract_m3u8_formats(
-            'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
-            video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
-            fatal=False)
+            manifest_base_url + 'm3u8', video_id, 'mp4',
+            'm3u8_native', m3u8_id='hls', fatal=False)
         formats.extend(self._extract_mpd_formats(
-            'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
-            video_id, mpd_id='dash', fatal=False))
+            manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
         self._sort_formats(formats)
 
         return {
             'id': video_id,
             'title': video_id,
+            'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
             'formats': formats,
         }
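Signed Cloudflare Stream ids (the new `[\w-]+\.[\w-]+\.[\w-]+` branch of `_ID_RE`) are JWTs whose payload carries the raw video id in the `sub` claim, which is what the `base64.urlsafe_b64decode(video_id.split('.')[1])` step above extracts. A standalone sketch; padding is restored here defensively, since JWT segments drop it:

```python
import base64
import json

def video_id_from_token(token):
    # JWT layout: header.payload.signature; only the payload matters here
    payload = token.split('.')[1]
    # JWT base64url segments are unpadded; restore padding before decoding
    payload += '=' * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))['sub']
```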
index eaae5e484f99311ccf018301f773c87f9c8cf544..a61753b17cd35835474c347c2b438e5f32949d73 100644 (file)
@@ -15,7 +15,7 @@
 import math
 
 from ..compat import (
-    compat_cookiejar,
+    compat_cookiejar_Cookie,
     compat_cookies,
     compat_etree_Element,
     compat_etree_fromstring,
@@ -1182,16 +1182,33 @@ def _twitter_search_player(self, html):
                                       'twitter card player')
 
     def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
-        json_ld = self._search_regex(
-            JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
+        json_ld_list = list(re.finditer(JSON_LD_RE, html))
         default = kwargs.get('default', NO_DEFAULT)
-        if not json_ld:
-            return default if default is not NO_DEFAULT else {}
         # JSON-LD may be malformed and thus `fatal` should be respected.
         # At the same time `default` may be passed that assumes `fatal=False`
         # for _search_regex. Let's simulate the same behavior here as well.
         fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
-        return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+        json_ld = []
+        for mobj in json_ld_list:
+            json_ld_item = self._parse_json(
+                mobj.group('json_ld'), video_id, fatal=fatal)
+            if not json_ld_item:
+                continue
+            if isinstance(json_ld_item, dict):
+                json_ld.append(json_ld_item)
+            elif isinstance(json_ld_item, (list, tuple)):
+                json_ld.extend(json_ld_item)
+        if json_ld:
+            json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+        if json_ld:
+            return json_ld
+        if default is not NO_DEFAULT:
+            return default
+        elif fatal:
+            raise RegexNotFoundError('Unable to extract JSON-LD')
+        else:
+            self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
+            return {}
 
     def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
         if isinstance(json_ld, compat_str):
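The rewritten `_search_json_ld` now collects every JSON-LD block on the page instead of only the first, flattening arrays and skipping malformed blocks. A simplified standalone sketch (the regex here is a reduced stand-in for the real `JSON_LD_RE`):

```python
import json
import re

# Simplified stand-in for youtube-dl's JSON_LD_RE
JSON_LD_RE = (r'(?is)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>'
              r'(?P<json_ld>.+?)</script>')

def collect_json_ld(html):
    items = []
    for mobj in re.finditer(JSON_LD_RE, html):
        try:
            parsed = json.loads(mobj.group('json_ld'))
        except ValueError:
            # Malformed block: skip it rather than abort
            continue
        if isinstance(parsed, dict):
            items.append(parsed)
        elif isinstance(parsed, (list, tuple)):
            # A block may hold an array of objects; flatten it
            items.extend(parsed)
    return items
```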
@@ -1256,10 +1273,10 @@ def extract_video_object(e):
             extract_interaction_statistic(e)
 
         for e in json_ld:
-            if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
+            if '@context' in e:
                 item_type = e.get('@type')
                 if expected_type is not None and expected_type != item_type:
-                    return info
+                    continue
                 if item_type in ('TVEpisode', 'Episode'):
                     episode_name = unescapeHTML(e.get('name'))
                     info.update({
@@ -1293,11 +1310,17 @@ def extract_video_object(e):
                     })
                 elif item_type == 'VideoObject':
                     extract_video_object(e)
-                    continue
+                    if expected_type is None:
+                        continue
+                    else:
+                        break
                 video = e.get('video')
                 if isinstance(video, dict) and video.get('@type') == 'VideoObject':
                     extract_video_object(video)
-                break
+                if expected_type is None:
+                    continue
+                else:
+                    break
         return dict((k, v) for k, v in info.items() if v is not None)
 
     @staticmethod
@@ -2340,6 +2363,8 @@ def _extract_ism_formats(self, ism_url, video_id, ism_id=None, note=None, errnot
         if res is False:
             return []
         ism_doc, urlh = res
+        if ism_doc is None:
+            return []
 
         return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
 
@@ -2818,7 +2843,7 @@ def _float(self, v, name, fatal=False, **kwargs):
 
     def _set_cookie(self, domain, name, value, expire_time=None, port=None,
                     path='/', secure=False, discard=False, rest={}, **kwargs):
-        cookie = compat_cookiejar.Cookie(
+        cookie = compat_cookiejar_Cookie(
             0, name, value, port, port is not None, domain, True,
             domain.startswith('.'), path, True, secure, expire_time,
             discard, None, None, rest)
index 85a9a577f645395d6edde83e7b72a1f4001561f8..bc2d1fa8b041e3ec1bbc4d6d1b5f055ac31ee140 100644 (file)
@@ -13,6 +13,7 @@
     compat_b64decode,
     compat_etree_Element,
     compat_etree_fromstring,
+    compat_str,
     compat_urllib_parse_urlencode,
     compat_urllib_request,
     compat_urlparse,
@@ -25,9 +26,9 @@
     intlist_to_bytes,
     int_or_none,
     lowercase_escape,
+    merge_dicts,
     remove_end,
     sanitized_Request,
-    unified_strdate,
     urlencode_postdata,
     xpath_text,
 )
@@ -136,6 +137,7 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
             # rtmp
             'skip_download': True,
         },
+        'skip': 'Video gone',
     }, {
         'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
         'info_dict': {
@@ -157,11 +159,12 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
         'info_dict': {
             'id': '702409',
             'ext': 'mp4',
-            'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant',
-            'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
+            'title': compat_str,
+            'description': compat_str,
             'thumbnail': r're:^https?://.*\.jpg$',
-            'uploader': 'TV TOKYO',
-            'upload_date': '20160508',
+            'uploader': 'Re:Zero Partners',
+            'timestamp': 1462098900,
+            'upload_date': '20160501',
         },
         'params': {
             # m3u8 download
@@ -172,12 +175,13 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
         'info_dict': {
             'id': '727589',
             'ext': 'mp4',
-            'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 – Give Me Deliverance From This Judicial Injustice!",
-            'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
+            'title': compat_str,
+            'description': compat_str,
             'thumbnail': r're:^https?://.*\.jpg$',
             'uploader': 'Kadokawa Pictures Inc.',
-            'upload_date': '20170118',
-            'series': "KONOSUBA -God's blessing on this wonderful world!",
+            'timestamp': 1484130900,
+            'upload_date': '20170111',
+            'series': compat_str,
             'season': "KONOSUBA -God's blessing on this wonderful world! 2",
             'season_number': 2,
             'episode': 'Give Me Deliverance From This Judicial Injustice!',
@@ -200,10 +204,11 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
         'info_dict': {
             'id': '535080',
             'ext': 'mp4',
-            'title': '11eyes Episode 1 – Red Night ~ Piros éjszaka',
-            'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
+            'title': compat_str,
+            'description': compat_str,
             'uploader': 'Marvelous AQL Inc.',
-            'upload_date': '20091021',
+            'timestamp': 1255512600,
+            'upload_date': '20091014',
         },
         'params': {
             # Just test metadata extraction
@@ -224,15 +229,17 @@ class CrunchyrollIE(CrunchyrollBaseIE, VRVIE):
             # just test metadata extraction
             'skip_download': True,
         },
+        'skip': 'Video gone',
     }, {
         # A video with a vastly different season name compared to the series name
         'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
         'info_dict': {
             'id': '590532',
             'ext': 'mp4',
-            'title': 'Haiyoru! Nyaruani (ONA) Episode 1 – Test',
-            'description': 'Mahiro and Nyaruko talk about official certification.',
+            'title': compat_str,
+            'description': compat_str,
             'uploader': 'TV TOKYO',
+            'timestamp': 1330956000,
             'upload_date': '20120305',
             'series': 'Nyarko-san: Another Crawling Chaos',
             'season': 'Haiyoru! Nyaruani (ONA)',
@@ -442,23 +449,21 @@ def _real_extract(self, url):
             webpage, 'language', default=None, group='lang')
 
         video_title = self._html_search_regex(
-            r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
-            webpage, 'video_title')
+            (r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
+             r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
+            webpage, 'video_title', default=None)
+        if not video_title:
+            video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
         video_title = re.sub(r' {2,}', ' ', video_title)
         video_description = (self._parse_json(self._html_search_regex(
             r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
             webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
         if video_description:
             video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
-        video_upload_date = self._html_search_regex(
-            [r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
-            webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
-        if video_upload_date:
-            video_upload_date = unified_strdate(video_upload_date)
         video_uploader = self._html_search_regex(
             # try looking for both an uploader that's a link and one that's not
             [r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
-            webpage, 'video_uploader', fatal=False)
+            webpage, 'video_uploader', default=False)
 
         formats = []
         for stream in media.get('streams', []):
@@ -611,14 +616,15 @@ def _real_extract(self, url):
             r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
             webpage, 'season number', default=None))
 
-        return {
+        info = self._search_json_ld(webpage, video_id, default={})
+
+        return merge_dicts({
             'id': video_id,
             'title': video_title,
             'description': video_description,
             'duration': duration,
             'thumbnail': thumbnail,
             'uploader': video_uploader,
-            'upload_date': video_upload_date,
             'series': series,
             'season': season,
             'season_number': season_number,
@@ -626,7 +632,7 @@ def _real_extract(self, url):
             'episode_number': episode_number,
             'subtitles': subtitles,
             'formats': formats,
-        }
+        }, info)
 
 
 class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
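The switch to `merge_dicts` above means the scraped fields take precedence and the JSON-LD info only fills gaps. A sketch of the helper's semantics, consistent with youtube-dl's `utils.merge_dicts` (reproduced here in simplified form, with `str` in place of `compat_str`):

```python
def merge_dicts(*dicts):
    merged = {}
    for a_dict in dicts:
        for k, v in a_dict.items():
            if v is None:
                continue
            # Earlier dicts win, except an empty string is overridden
            # by a later non-empty one
            if (k not in merged
                    or (isinstance(v, str) and v
                        and isinstance(merged[k], str)
                        and not merged[k])):
                merged[k] = v
    return merged
```

In the Crunchyroll return value this lets JSON-LD supply `timestamp`/`upload_date` without clobbering the title and description scraped from the page.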
index 327fdb04a71215b8de170314d7080093ced3bbba..b8529050c45c8ee1479e2624ba26aac3f002b9fe 100644 (file)
@@ -32,7 +32,7 @@ def _get_dailymotion_cookies(self):
 
     @staticmethod
     def _get_cookie_value(cookies, name):
-        cookie = cookies.get('name')
+        cookie = cookies.get(name)
         if cookie:
             return cookie.value
 
index 04ff214f727826a60bbdde5ec17bb48ba004a91e..e700f8d86531415da0f1db0f2ccdaef6ea10ac53 100644 (file)
@@ -16,10 +16,11 @@ class DctpTvIE(InfoExtractor):
     _TESTS = [{
         # 4x3
         'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
+        'md5': '3ffbd1556c3fe210724d7088fad723e3',
         'info_dict': {
             'id': '95eaa4f33dad413aa17b4ee613cccc6c',
             'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
-            'ext': 'flv',
+            'ext': 'm4v',
             'title': 'Videoinstallation für eine Kaufhausfassade',
             'description': 'Kurzfilm',
             'thumbnail': r're:^https?://.*\.jpg$',
@@ -27,10 +28,6 @@ class DctpTvIE(InfoExtractor):
             'timestamp': 1302172322,
             'upload_date': '20110407',
         },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
     }, {
         # 16x9
         'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
@@ -59,33 +56,26 @@ def _real_extract(self, url):
 
         uuid = media['uuid']
         title = media['title']
-        ratio = '16x9' if media.get('is_wide') else '4x3'
-        play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
-
-        servers = self._download_json(
-            'http://www.dctp.tv/streaming_servers/', display_id,
-            note='Downloading server list JSON', fatal=False)
-
-        if servers:
-            endpoint = next(
-                server['endpoint']
-                for server in servers
-                if url_or_none(server.get('endpoint'))
-                and 'cloudfront' in server['endpoint'])
-        else:
-            endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
-
-        app = self._search_regex(
-            r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
-
-        formats = [{
-            'url': endpoint,
-            'app': app,
-            'play_path': play_path,
-            'page_url': url,
-            'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
-            'ext': 'flv',
-        }]
+        is_wide = media.get('is_wide')
+        formats = []
+
+        def add_formats(suffix):
+            templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
+            formats.extend([{
+                'format_id': 'hls-' + suffix,
+                'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
+                'protocol': 'm3u8_native',
+            }, {
+                'format_id': 's3-' + suffix,
+                'url': templ % 'completed-media.s3.amazonaws.com',
+            }, {
+                'format_id': 'http-' + suffix,
+                'url': templ % 'cdn-media.dctp.tv',
+            }])
+
+        add_formats('0500_' + ('16x9' if is_wide else '4x3'))
+        if is_wide:
+            add_formats('720p')
 
         thumbnails = []
         images = media.get('images')
index 6a2712cc50429b7297a9d4fe9e1ec2d80177986e..e0139cc862d74bc3c20a9d6567747b96b642c730 100644 (file)
@@ -13,8 +13,8 @@
 class DiscoveryIE(DiscoveryGoBaseIE):
     _VALID_URL = r'''(?x)https?://
         (?P<site>
-            (?:(?:www|go)\.)?discovery|
-            (?:www\.)?
+            go\.discovery|
+            www\.
                 (?:
                     investigationdiscovery|
                     discoverylife|
@@ -22,8 +22,7 @@ class DiscoveryIE(DiscoveryGoBaseIE):
                     ahctv|
                     destinationamerica|
                     sciencechannel|
                     ahctv|
                     destinationamerica|
                     sciencechannel|
-                    tlc|
-                    velocity
+                    tlc
                 )|
             watch\.
                 (?:
@@ -83,7 +82,7 @@ def _real_extract(self, url):
                     'authRel': 'authorization',
                     'client_id': '3020a40c2356a645b4b4',
                     'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
-                    'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
+                    'redirectUri': 'https://www.discovery.com/',
                 })['access_token']
 
         headers = self.geo_verification_headers()
index c050bf9df3fb7ececed5b3e03a70aea9b2c37417..fe42821c731c711e8f0974fd4ce48f5c9aee8e8f 100644 (file)
@@ -4,7 +4,6 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     encode_base_n,
     ExtractorError,
@@ -55,7 +54,7 @@ def _real_extract(self, url):
 
         webpage, urlh = self._download_webpage_handle(url, display_id)
 
-        video_id = self._match_id(compat_str(urlh.geturl()))
+        video_id = self._match_id(urlh.geturl())
 
         hash = self._search_regex(
             r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')
index 50f69f0b6c153d2be19250b829bf2a28af25cec0..3d3dae7a4d90d00ad2a8de19d09383166db97877 100644 (file)
     BiliBiliBangumiIE,
     BilibiliAudioIE,
     BilibiliAudioAlbumIE,
+    BiliBiliPlayerIE,
 )
 from .biobiochiletv import BioBioChileTVIE
 from .bitchute import (
 from .jove import JoveIE
 from .joj import JojIE
 from .jwplatform import JWPlatformIE
-from .jpopsukitv import JpopsukiIE
 from .kakao import KakaoIE
 from .kaltura import KalturaIE
 from .kanalplay import KanalPlayIE
 from .mlb import MLBIE
 from .mnet import MnetIE
 from .moevideo import MoeVideoIE
-from .mofosex import MofosexIE
+from .mofosex import (
+    MofosexIE,
+    MofosexEmbedIE,
+)
 from .mojvideo import MojvideoIE
 from .morningstar import MorningstarIE
 from .motherless import (
     ORFFM4IE,
     ORFFM4StoryIE,
     ORFOE1IE,
+    ORFOE3IE,
+    ORFNOEIE,
+    ORFWIEIE,
+    ORFBGLIE,
+    ORFOOEIE,
+    ORFSTMIE,
+    ORFKTNIE,
+    ORFSBGIE,
+    ORFTIRIE,
+    ORFVBGIE,
     ORFIPTVIE,
 )
 from .outsidetv import OutsideTVIE
     PacktPubIE,
     PacktPubCourseIE,
 )
-from .pandatv import PandaTVIE
 from .pandoratv import PandoraTVIE
 from .parliamentliveuk import ParliamentLiveUKIE
 from .patreon import PatreonIE
     PolskieRadioIE,
     PolskieRadioCategoryIE,
 )
+from .popcorntimes import PopcorntimesIE
 from .popcorntv import PopcornTVIE
 from .porn91 import Porn91IE
 from .porncom import PornComIE
 from .sbs import SBSIE
 from .screencast import ScreencastIE
 from .screencastomatic import ScreencastOMaticIE
-from .scrippsnetworks import ScrippsNetworksWatchIE
+from .scrippsnetworks import (
+    ScrippsNetworksWatchIE,
+    ScrippsNetworksIE,
+)
 from .scte import (
     SCTEIE,
     SCTECourseIE,
index ce64e26831fdafceb97b6d8ae919c00a78f0f90f..610d6674592384922f9df7af4da5958592ce56bd 100644 (file)
@@ -466,15 +466,18 @@ def _real_extract(self, url):
             return info_dict
 
         if '/posts/' in url:
-            entries = [
-                self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
-                for vid in self._parse_json(
-                    self._search_regex(
-                        r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
-                        webpage, 'video ids', group='ids'),
-                    video_id)]
-
-            return self.playlist_result(entries, video_id)
+            video_id_json = self._search_regex(
+                r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
+                default='')
+            if video_id_json:
+                entries = [
+                    self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
+                    for vid in self._parse_json(video_id_json, video_id)]
+                return self.playlist_result(entries, video_id)
+
+            # Single Video?
+            video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
+            return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
         else:
             _, info_dict = self._extract_from_url(
                 self._VIDEO_PAGE_TEMPLATE % video_id,
index b8fa175880f47d6050e4fa3908994bd9777131ac..306b45fc99a4c3495a233d8fb3c649032641d87a 100644 (file)
@@ -31,7 +31,13 @@ def _real_extract(self, url):
         webpage = self._download_webpage(url, display_id)
 
         video_data = extract_attributes(self._search_regex(
-            r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
+            r'''(?sx)
+                (?:
+                    </h1>|
+                    <div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
+                ).*?
+                (<button[^>]+data-asset-source="[^"]+"[^>]+>)
+            ''',
             webpage, 'video data'))
 
         video_url = video_data['data-asset-source']
index 743ef47dbe2b6a534a65be67dee28e78aa8fe215..355067a509fc197c10a2085bad058d4459729bc9 100644 (file)
@@ -60,6 +60,9 @@
 from .drtuber import DrTuberIE
 from .redtube import RedTubeIE
 from .tube8 import Tube8IE
+from .mofosex import MofosexEmbedIE
+from .spankwire import SpankwireIE
+from .youporn import YouPornIE
 from .vimeo import VimeoIE
 from .dailymotion import DailymotionIE
 from .dailymail import DailyMailIE
@@ -1705,6 +1708,15 @@ class GenericIE(InfoExtractor):
             },
             'add_ie': ['Kaltura'],
         },
+        {
+            # multiple kaltura embeds, nsfw
+            'url': 'https://www.quartier-rouge.be/prive/femmes/kamila-avec-video-jaime-sadomie.html',
+            'info_dict': {
+                'id': 'kamila-avec-video-jaime-sadomie',
+                'title': "Kamila avec vídeo “J'aime sadomie”",
+            },
+            'playlist_count': 8,
+        },
         {
             # Non-standard Vimeo embed
             'url': 'https://openclassrooms.com/courses/understanding-the-web',
@@ -2098,6 +2110,9 @@ class GenericIE(InfoExtractor):
                 'ext': 'mp4',
                 'title': 'Smoky Barbecue Favorites',
                 'thumbnail': r're:^https?://.*\.jpe?g',
+                'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+                'upload_date': '20170909',
+                'timestamp': 1504915200,
             },
             'add_ie': [ZypeIE.ie_key()],
             'params': {
@@ -2284,7 +2299,7 @@ def _real_extract(self, url):
 
         if head_response is not False:
             # Check for redirect
-            new_url = compat_str(head_response.geturl())
+            new_url = head_response.geturl()
             if url != new_url:
                 self.report_following_redirect(new_url)
                 if force_videoid:
@@ -2384,12 +2399,12 @@ def _real_extract(self, url):
                 return self.playlist_result(
                     self._parse_xspf(
                         doc, video_id, xspf_url=url,
-                        xspf_base_url=compat_str(full_response.geturl())),
+                        xspf_base_url=full_response.geturl()),
                     video_id)
             elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
                 info_dict['formats'] = self._parse_mpd_formats(
                     doc,
-                    mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
+                    mpd_base_url=full_response.geturl().rpartition('/')[0],
                     mpd_url=url)
                 self._sort_formats(info_dict['formats'])
                 return info_dict
@@ -2533,15 +2548,21 @@ def _real_extract(self, url):
             return self.playlist_from_matches(
                 dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
 
+        # Look for Teachable embeds, must be before Wistia
+        teachable_url = TeachableIE._extract_url(webpage, url)
+        if teachable_url:
+            return self.url_result(teachable_url)
+
         # Look for embedded Wistia player
-        wistia_url = WistiaIE._extract_url(webpage)
-        if wistia_url:
-            return {
-                '_type': 'url_transparent',
-                'url': self._proto_relative_url(wistia_url),
-                'ie_key': WistiaIE.ie_key(),
-                'uploader': video_uploader,
-            }
+        wistia_urls = WistiaIE._extract_urls(webpage)
+        if wistia_urls:
+            playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
+            for entry in playlist['entries']:
+                entry.update({
+                    '_type': 'url_transparent',
+                    'uploader': video_uploader,
+                })
+            return playlist
 
         # Look for SVT player
         svt_url = SVTIE._extract_url(webpage)
@@ -2706,6 +2727,21 @@ def _real_extract(self, url):
         if tube8_urls:
             return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
 
+        # Look for embedded Mofosex player
+        mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
+        if mofosex_urls:
+            return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
+
+        # Look for embedded Spankwire player
+        spankwire_urls = SpankwireIE._extract_urls(webpage)
+        if spankwire_urls:
+            return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
+
+        # Look for embedded YouPorn player
+        youporn_urls = YouPornIE._extract_urls(webpage)
+        if youporn_urls:
+            return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
+
         # Look for embedded Tvigle player
         mobj = re.search(
             r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
@@ -2817,9 +2853,12 @@ def _real_extract(self, url):
             return self.url_result(mobj.group('url'), 'Zapiks')
 
         # Look for Kaltura embeds
-        kaltura_url = KalturaIE._extract_url(webpage)
-        if kaltura_url:
-            return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
+        kaltura_urls = KalturaIE._extract_urls(webpage)
+        if kaltura_urls:
+            return self.playlist_from_matches(
+                kaltura_urls, video_id, video_title,
+                getter=lambda x: smuggle_url(x, {'source_url': url}),
+                ie=KalturaIE.ie_key())
 
         # Look for EaglePlatform embeds
         eagleplatform_url = EaglePlatformIE._extract_url(webpage)
@@ -2960,7 +2999,7 @@ def _real_extract(self, url):
 
         # Look for VODPlatform embeds
         mobj = re.search(
-            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
             webpage)
         if mobj is not None:
             return self.url_result(
@@ -3137,10 +3176,6 @@ def _real_extract(self, url):
             return self.playlist_from_matches(
                 peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
 
-        teachable_url = TeachableIE._extract_url(webpage, url)
-        if teachable_url:
-            return self.url_result(teachable_url)
-
         indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
         if indavideo_urls:
             return self.playlist_from_matches(
index 6a1b1e96ebf4dc59f7f5e13dc18e3ee08ef1110c..c6477958d2766704ade1ba25bc2ed68676655889 100644 (file)
 
 
 class GiantBombIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
+    _TESTS = [{
         'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
-        'md5': 'c8ea694254a59246a42831155dec57ac',
+        'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
         'info_dict': {
             'id': '2300-9782',
             'display_id': 'quick-look-destiny-the-dark-below',
@@ -26,7 +26,10 @@ class GiantBombIE(InfoExtractor):
             'duration': 2399,
             'thumbnail': r're:^https?://.*\.jpg$',
         }
-    }
+    }, {
+        'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
index 0ee8ea712c72e618a4d7544f26c376e94fcaf70d..fae4251034da29dd252557f86528531b8fcb9235 100644 (file)
@@ -1,12 +1,11 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..utils import (
-    js_to_json,
+    int_or_none,
+    merge_dicts,
     remove_end,
-    determine_ext,
+    unified_timestamp,
 )
 
 
@@ -14,15 +13,21 @@ class HellPornoIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
     _TESTS = [{
         'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
-        'md5': '1fee339c610d2049699ef2aa699439f1',
+        'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
         'info_dict': {
             'id': '149116',
             'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
             'ext': 'mp4',
             'title': 'Dixie is posing with naked ass very erotic',
+            'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
+            'categories': list,
             'thumbnail': r're:https?://.*\.jpg$',
+            'duration': 240,
+            'timestamp': 1398762720,
+            'upload_date': '20140429',
+            'view_count': int,
             'age_limit': 18,
-        }
+        },
     }, {
         'url': 'http://hellporno.net/v/186271/',
         'only_matching': True,
@@ -36,40 +41,36 @@ def _real_extract(self, url):
         title = remove_end(self._html_search_regex(
             r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
 
-        flashvars = self._parse_json(self._search_regex(
-            r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
-            display_id, transform_source=js_to_json)
-
-        video_id = flashvars.get('video_id')
-        thumbnail = flashvars.get('preview_url')
-        ext = determine_ext(flashvars.get('postfix'), 'mp4')
-
-        formats = []
-        for video_url_key in ['video_url', 'video_alt_url']:
-            video_url = flashvars.get(video_url_key)
-            if not video_url:
-                continue
-            video_text = flashvars.get('%s_text' % video_url_key)
-            fmt = {
-                'url': video_url,
-                'ext': ext,
-                'format_id': video_text,
-            }
-            m = re.search(r'^(?P<height>\d+)[pP]', video_text)
-            if m:
-                fmt['height'] = int(m.group('height'))
-            formats.append(fmt)
-        self._sort_formats(formats)
+        info = self._parse_html5_media_entries(url, webpage, display_id)[0]
+        self._sort_formats(info['formats'])
 
-        categories = self._html_search_meta(
-            'keywords', webpage, 'categories', default='').split(',')
+        video_id = self._search_regex(
+            (r'chs_object\s*=\s*["\'](\d+)',
+             r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
+            default=display_id)
+        description = self._search_regex(
+            r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
+            'description', fatal=False)
+        categories = [
+            c.strip()
+            for c in self._html_search_meta(
+                'keywords', webpage, 'categories', default='').split(',')
+            if c.strip()]
+        duration = int_or_none(self._og_search_property(
+            'video:duration', webpage, fatal=False))
+        timestamp = unified_timestamp(self._og_search_property(
+            'video:release_date', webpage, fatal=False))
+        view_count = int_or_none(self._search_regex(
+            r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
 
-        return {
+        return merge_dicts(info, {
             'id': video_id,
             'display_id': display_id,
             'title': title,
-            'thumbnail': thumbnail,
+            'description': description,
             'categories': categories,
+            'duration': duration,
+            'timestamp': timestamp,
+            'view_count': view_count,
             'age_limit': 18,
-            'formats': formats,
-        }
+        })
index 436759da5480da347a578aba6cb388cb01e97448..a31301985b0c7d212886a2e6e495c7d705714041 100644 (file)
@@ -1,5 +1,7 @@
 from __future__ import unicode_literals
 
+import base64
+import json
 import re
 
 from .common import InfoExtractor
@@ -8,6 +10,7 @@
     mimetype2ext,
     parse_duration,
     qualities,
+    try_get,
     url_or_none,
 )
 
 class ImdbIE(InfoExtractor):
     IE_NAME = 'imdb'
     IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
 
     _TESTS = [{
         'url': 'http://www.imdb.com/video/imdb/vi2524815897',
         'info_dict': {
             'id': '2524815897',
             'ext': 'mp4',
-            'title': 'No. 2 from Ice Age: Continental Drift (2012)',
+            'title': 'No. 2',
             'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
+            'duration': 152,
         }
     }, {
         'url': 'http://www.imdb.com/video/_/vi2524815897',
@@ -47,21 +51,23 @@ class ImdbIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        webpage = self._download_webpage(
-            'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
-        video_metadata = self._parse_json(self._search_regex(
-            r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
-            'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
-        title = self._html_search_meta(
-            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
-            r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
+
+        data = self._download_json(
+            'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
+            query={
+                'key': base64.b64encode(json.dumps({
+                    'type': 'VIDEO_PLAYER',
+                    'subType': 'FORCE_LEGACY',
+                    'id': 'vi%s' % video_id,
+                }).encode()).decode(),
+            })[0]
 
         quality = qualities(('SD', '480p', '720p', '1080p'))
         formats = []
-        for encoding in video_metadata.get('encodings', []):
+        for encoding in data['videoLegacyEncodings']:
             if not encoding or not isinstance(encoding, dict):
                 continue
-            video_url = url_or_none(encoding.get('videoUrl'))
+            video_url = url_or_none(encoding.get('url'))
             if not video_url:
                 continue
             ext = mimetype2ext(encoding.get(
@@ -69,7 +75,7 @@ def _real_extract(self, url):
             if ext == 'm3u8':
                 formats.extend(self._extract_m3u8_formats(
                     video_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False))
+                    preference=1, m3u8_id='hls', fatal=False))
                 continue
             format_id = encoding.get('definition')
             formats.append({
@@ -80,13 +86,33 @@ def _real_extract(self, url):
             })
         self._sort_formats(formats)
 
+        webpage = self._download_webpage(
+            'https://www.imdb.com/video/vi' + video_id, video_id)
+        video_metadata = self._parse_json(self._search_regex(
+            r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
+            'video metadata'), video_id)
+
+        video_info = video_metadata.get('VIDEO_INFO')
+        if video_info and isinstance(video_info, dict):
+            info = try_get(
+                video_info, lambda x: x[list(video_info.keys())[0]][0], dict)
+        else:
+            info = {}
+
+        title = self._html_search_meta(
+            ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
+            r'<title>(.+?)</title>', webpage, 'title',
+            default=None) or info['videoTitle']
+
         return {
             'id': video_id,
             'title': title,
+            'alt_title': info.get('videoSubTitle'),
             'formats': formats,
-            'description': video_metadata.get('description'),
-            'thumbnail': video_metadata.get('slate', {}).get('url'),
-            'duration': parse_duration(video_metadata.get('duration')),
+            'description': info.get('videoDescription'),
+            'thumbnail': url_or_none(try_get(
+                video_metadata, lambda x: x['videoSlate']['source'])),
+            'duration': parse_duration(info.get('videoRuntime')),
         }
 
 
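The IMDb change above leans on two helpers from `youtube_dl/utils.py`, `try_get` and `url_or_none`, to probe nested, possibly-missing JSON without raising. A minimal sketch of that pattern, using simplified stand-ins for the real helpers (the actual `try_get` also accepts a tuple of getters; the sample `video_metadata` dict is invented for illustration):

```python
# Simplified stand-ins for youtube_dl.utils.try_get / url_or_none,
# illustrating the defensive-access pattern used in the IMDb change.

def try_get(src, getter, expected_type=None):
    # Apply getter, swallowing the errors raised by missing keys/indices.
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    return v if expected_type is None or isinstance(v, expected_type) else None

def url_or_none(url):
    # Keep only values that look like usable http(s) URLs.
    if not url or not isinstance(url, str):
        return None
    url = url.strip()
    return url if url.startswith(('http://', 'https://', '//')) else None

# Hypothetical payload shaped like the args.push(...) JSON the extractor parses.
video_metadata = {
    'VIDEO_INFO': {'vi12345': [{'videoTitle': 'Example', 'videoRuntime': '1:23'}]},
    'videoSlate': {'source': 'https://example.com/slate.jpg'},
}
video_info = video_metadata.get('VIDEO_INFO')
info = try_get(video_info, lambda x: x[list(video_info.keys())[0]][0], dict) or {}
thumbnail = url_or_none(try_get(video_metadata, lambda x: x['videoSlate']['source']))
```

Every lookup degrades to `None` (or `{}`) instead of crashing when the site reshuffles its JSON, which is why the diff prefers `info.get(...)` over direct indexing.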
index 2b5b2b5b0b303aa4c1b6bdb4a6e1226dea11e218..4c16243ec1976676391a5e07a3b35e1140a5ec7a 100644 (file)
@@ -58,7 +58,7 @@ def _real_extract(self, url):
         video_id = self._match_id(url)
 
         video = self._download_json(
-            'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
+            'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
             video_id)['data']
 
         title = video['title']
index 11bbeb5922a9d85e05977196c822a076c8b45ec3..53a550c11e4407813deb12f646a0c714436862b5 100644 (file)
@@ -16,12 +16,22 @@ class IPrimaIE(InfoExtractor):
     _GEO_BYPASS = False
 
     _TESTS = [{
-        'url': 'http://play.iprima.cz/gondici-s-r-o-33',
+        'url': 'https://prima.iprima.cz/particka/92-epizoda',
         'info_dict': {
-            'id': 'p136534',
+            'id': 'p51388',
             'ext': 'mp4',
-            'title': 'Gondíci s. r. o. (34)',
-            'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
+            'title': 'Partička (92)',
+            'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
+        },
+        'params': {
+            'skip_download': True,  # m3u8 download
+        },
+    }, {
+        'url': 'https://cnn.iprima.cz/videa/70-epizoda',
+        'info_dict': {
+            'id': 'p681554',
+            'ext': 'mp4',
+            'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
         },
         'params': {
             'skip_download': True,  # m3u8 download
         },
@@ -68,9 +78,15 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
+        title = self._og_search_title(
+            webpage, default=None) or self._search_regex(
+            r'<h1>([^<]+)', webpage, 'title')
+
         video_id = self._search_regex(
             (r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
-             r'data-product="([^"]+)">'),
+             r'data-product="([^"]+)">',
+             r'id=["\']player-(p\d+)"',
+             r'playerId\s*:\s*["\']player-(p\d+)'),
             webpage, 'real id')
 
         playerpage = self._download_webpage(
@@ -125,8 +141,8 @@ def extract_formats(format_url, format_key=None, lang=None):
 
         return {
             'id': video_id,
-            'title': self._og_search_title(webpage),
-            'thumbnail': self._og_search_thumbnail(webpage),
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage, default=None),
             'formats': formats,
-            'description': self._og_search_description(webpage),
+            'description': self._og_search_description(webpage, default=None),
         }
index a502e88066850b14de284e6e3ec7d47ea9397f3d..b5a740a01e1d360ca667869edef88f85e3559842 100644 (file)
@@ -239,7 +239,7 @@ def _extract_entries(self, html, compilation_id):
             self.url_result(
                 'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
             for serie in re.findall(
-                r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
+                r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
 
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
diff --git a/youtube_dl/extractor/jpopsukitv.py b/youtube_dl/extractor/jpopsukitv.py
deleted file mode 100644 (file)
index 4b5f346..0000000
+++ /dev/null
@@ -1,68 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    unified_strdate,
-)
-
-
-class JpopsukiIE(InfoExtractor):
-    IE_NAME = 'jpopsuki.tv'
-    _VALID_URL = r'https?://(?:www\.)?jpopsuki\.tv/(?:category/)?video/[^/]+/(?P<id>\S+)'
-
-    _TEST = {
-        'url': 'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
-        'md5': '88018c0c1a9b1387940e90ec9e7e198e',
-        'info_dict': {
-            'id': '00be659d23b0b40508169cdee4545771',
-            'ext': 'mp4',
-            'title': 'ayumi hamasaki - evolution',
-            'description': 'Release date: 2001.01.31\r\n浜崎あゆみ - evolution',
-            'thumbnail': 'http://www.jpopsuki.tv/cache/89722c74d2a2ebe58bcac65321c115b2.jpg',
-            'uploader': 'plama_chan',
-            'uploader_id': '404',
-            'upload_date': '20121101'
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = 'http://www.jpopsuki.tv' + self._html_search_regex(
-            r'<source src="(.*?)" type', webpage, 'video url')
-
-        video_title = self._og_search_title(webpage)
-        description = self._og_search_description(webpage)
-        thumbnail = self._og_search_thumbnail(webpage)
-        uploader = self._html_search_regex(
-            r'<li>from: <a href="/user/view/user/(.*?)/uid/',
-            webpage, 'video uploader', fatal=False)
-        uploader_id = self._html_search_regex(
-            r'<li>from: <a href="/user/view/user/\S*?/uid/(\d*)',
-            webpage, 'video uploader_id', fatal=False)
-        upload_date = unified_strdate(self._html_search_regex(
-            r'<li>uploaded: (.*?)</li>', webpage, 'video upload_date',
-            fatal=False))
-        view_count_str = self._html_search_regex(
-            r'<li>Hits: ([0-9]+?)</li>', webpage, 'video view_count',
-            fatal=False)
-        comment_count_str = self._html_search_regex(
-            r'<h2>([0-9]+?) comments</h2>', webpage, 'video comment_count',
-            fatal=False)
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': video_title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'upload_date': upload_date,
-            'view_count': int_or_none(view_count_str),
-            'comment_count': int_or_none(comment_count_str),
-        }
index 2aabd98b5bbaf36efecdc76415bf52854705bd1e..c34b5f5e6bd9e7d38e762f5d82f3669ac2c438a2 100644 (file)
@@ -4,6 +4,7 @@
 import re
 
 from .common import InfoExtractor
+from ..utils import unsmuggle_url
 
 
 class JWPlatformIE(InfoExtractor):
@@ -32,10 +33,14 @@ def _extract_url(webpage):
     @staticmethod
     def _extract_urls(webpage):
         return re.findall(
-            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
+            r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//(?:content\.jwplatform|cdn\.jwplayer)\.com/players/[a-zA-Z0-9]{8})',
             webpage)
 
     def _real_extract(self, url):
+        url, smuggled_data = unsmuggle_url(url, {})
+        self._initialize_geo_bypass({
+            'countries': smuggled_data.get('geo_countries'),
+        })
         video_id = self._match_id(url)
         json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
         return self._parse_jwplayer_data(json_data, video_id)
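The `unsmuggle_url` call added to `JWPlatformIE._real_extract` is the receiving end of youtube-dl's URL-smuggling convention: a calling extractor packs extra hints (here `geo_countries`) into the URL before handing it off, and the target extractor unpacks them. A simplified sketch of the round trip, assuming the same `#__youtubedl_smuggle=` fragment marker the real helpers in `youtube_dl/utils.py` use:

```python
import json
from urllib.parse import quote, unquote

# Simplified sketch of youtube_dl.utils.smuggle_url / unsmuggle_url: extra data
# rides along in a URL fragment so one extractor can pass hints to another.

def smuggle_url(url, data):
    return url + '#__youtubedl_smuggle=' + quote(json.dumps(data))

def unsmuggle_url(smug_url, default=None):
    if '#__youtubedl_smuggle' not in smug_url:
        return smug_url, default
    url, _, sdata = smug_url.rpartition('#')
    jsond = sdata.partition('=')[2]
    return url, json.loads(unquote(jsond))

# Round trip: a parent page extractor smuggles its geo hint to JWPlatform.
url = smuggle_url('https://cdn.jwplayer.com/players/abcd1234', {'geo_countries': ['NL']})
clean, data = unsmuggle_url(url, {})
```

The fragment never reaches the server, so the smuggled payload is invisible to the HTTP request while surviving the hand-off between extractors.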
index 2d38b758b72a852c6d9718f0537c62e7c215e903..49d13460df7f0edd4d2a08f97deaf831ba9d6a46 100644 (file)
@@ -113,9 +113,14 @@ class KalturaIE(InfoExtractor):
 
     @staticmethod
     def _extract_url(webpage):
+        urls = KalturaIE._extract_urls(webpage)
+        return urls[0] if urls else None
+
+    @staticmethod
+    def _extract_urls(webpage):
         # Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
-        mobj = (
-            re.search(
+        finditer = (
+            re.finditer(
                 r"""(?xs)
                     kWidget\.(?:thumb)?[Ee]mbed\(
                     \{.*?
@@ -124,7 +129,7 @@ def _extract_url(webpage):
                         (?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
                         (?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
                 """, webpage)
-            or re.search(
+            or re.finditer(
                 r'''(?xs)
                     (?P<q1>["'])
                         (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
@@ -138,7 +143,7 @@ def _extract_url(webpage):
                     )
                     (?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
                 ''', webpage)
-            or re.search(
+            or re.finditer(
                 r'''(?xs)
                     <(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
                       (?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
@@ -148,7 +153,8 @@ def _extract_url(webpage):
                     (?P=q1)
                 ''', webpage)
         )
-        if mobj:
+        urls = []
+        for mobj in finditer:
             embed_info = mobj.groupdict()
             for k, v in embed_info.items():
                 if v:
@@ -160,7 +166,8 @@ def _extract_url(webpage):
                 webpage)
             if service_mobj:
                 url = smuggle_url(url, {'service_url': service_mobj.group('id')})
-            return url
+            urls.append(url)
+        return urls
 
     def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
         params = actions[0]
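A subtlety worth noting in the Kaltura `re.search` → `re.finditer` refactor above: unlike `re.search`, which returns `None` on no match and so works naturally in an `or` chain, `re.finditer` returns an iterator object that is truthy even when it yields nothing, so an `or` chain of bare `finditer` calls only ever consults the first pattern unless each call is wrapped in `list(...)`. A small illustration against a hypothetical page snippet:

```python
import re

page = '<iframe src="https://example.com/players/a1b2c3d4"></iframe>'

# re.search returns None on no match, so `or` falls through to the next pattern.
mobj = re.search(r'no-such-pattern', page) or re.search(r'players/(\w{8})', page)
first_id = mobj.group(1)

# re.finditer returns an iterator object that is truthy even when empty,
# so the second pattern below is never consulted.
empty_iter = re.finditer(r'no-such-pattern', page)
chained = empty_iter or re.finditer(r'players/(\w{8})', page)
matches = list(chained)  # empty: the first (empty) iterator "won" the `or`

# Wrapping each call in list() restores the fallback behaviour.
hits = (list(re.finditer(r'no-such-pattern', page))
        or list(re.finditer(r'players/(\w{8})', page)))
ids = [m.group(1) for m in hits]
```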
index 6ed7da4abaa7a2a45f924b4bf9f919261a40bec9..1b2dcef46621237fd7c7ce376165a6bc5c674606 100644 (file)
@@ -4,7 +4,6 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     clean_html,
     determine_ext,
@@ -36,7 +35,7 @@ def _login(self):
             self._LOGIN_URL, None, 'Downloading login popup')
 
         def is_logged(url_handle):
-            return self._LOGIN_URL not in compat_str(url_handle.geturl())
+            return self._LOGIN_URL not in url_handle.geturl()
 
         # Already logged in
         if is_logged(urlh):
index b312e77f1abd5d4a05f43c763b7c9f56aefbd0e5..1e3c19dfd65b442ee7b4f3abf1803022c5975d8e 100644 (file)
@@ -2,23 +2,24 @@
 from __future__ import unicode_literals
 
 import re
+import uuid
 
 from .common import InfoExtractor
-from ..compat import compat_str
+from ..compat import compat_HTTPError
 from ..utils import (
-    unescapeHTML,
-    parse_duration,
-    get_element_by_class,
+    ExtractorError,
+    int_or_none,
+    qualities,
 )
 
 
 class LEGOIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
+    _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
     _TESTS = [{
         'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
         'md5': 'f34468f176cfd76488767fc162c405fa',
         'info_dict': {
-            'id': '55492d823b1b4d5e985787fa8c2973b1',
+            'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
             'ext': 'mp4',
             'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
             'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
@@ -26,103 +27,123 @@ class LEGOIE(InfoExtractor):
     }, {
         # geo-restricted but the contentUrl contain a valid url
         'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
-        'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
+        'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
         'info_dict': {
-            'id': '13bdc2299ab24d9685701a915b3d71e7',
+            'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
             'ext': 'mp4',
-            'title': 'Aflevering 20 - Helden van het koninkrijk',
+            'title': 'Aflevering 20 Helden van het koninkrijk',
             'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
+            'age_limit': 5,
         },
     }, {
-        # special characters in title
-        'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
+        # with subtitle
+        'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
         'info_dict': {
-            'id': '9685ee9d12e84ff38e84b4e3d0db533d',
+            'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
             'ext': 'mp4',
-            'title': 'Force Surprise – LEGO® Star Wars™ Microfighters',
-            'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
+            'title': 'De kleine puppy',
+            'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
+            'age_limit': 1,
+            'subtitles': {
+                'nl': [{
+                    'ext': 'srt',
+                    'url': r're:^https://.+\.srt$',
+                }],
+            },
         },
         'params': {
             'skip_download': True,
         },
     }]
-    _BITRATES = [256, 512, 1024, 1536, 2560]
+    _QUALITIES = {
+        'Lowest': (64, 180, 320),
+        'Low': (64, 270, 480),
+        'Medium': (96, 360, 640),
+        'High': (128, 540, 960),
+        'Highest': (128, 720, 1280),
+    }
 
     def _real_extract(self, url):
         locale, video_id = re.match(self._VALID_URL, url).groups()
-        webpage = self._download_webpage(url, video_id)
-        title = get_element_by_class('video-header', webpage).strip()
-        progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
-        streaming_base = 'http://legoprod-f.akamaihd.net/'
-        content_url = self._html_search_meta('contentUrl', webpage)
-        path = self._search_regex(
-            r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
-            content_url, 'video path', default=None)
-        if not path:
-            player_url = self._proto_relative_url(self._search_regex(
-                r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
-                webpage, 'player url', default=None))
-            if not player_url:
-                base_url = self._proto_relative_url(self._search_regex(
-                    r'data-baseurl="([^"]+)"', webpage, 'base url',
-                    default='http://www.lego.com/%s/mediaplayer/video/' % locale))
-                player_url = base_url + video_id
-            player_webpage = self._download_webpage(player_url, video_id)
-            video_data = self._parse_json(unescapeHTML(self._search_regex(
-                r"video='([^']+)'", player_webpage, 'video data')), video_id)
-            progressive_base = self._search_regex(
-                r'data-video-progressive-url="([^"]+)"',
-                player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
-            streaming_base = self._search_regex(
-                r'data-video-streaming-url="([^"]+)"',
-                player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
-            item_id = video_data['ItemId']
+        countries = [locale.split('-')[1].upper()]
+        self._initialize_geo_bypass({
+            'countries': countries,
+        })
 
-            net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
-            base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
-            path = '/'.join([net_storage_path, base_path])
-        streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
+        try:
+            item = self._download_json(
+                # https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
+                'https://services.slingshot.lego.com/mediaplayer/v2',
+                video_id, query={
+                    'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
+                }, headers=self.geo_verification_headers())
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
+                self.raise_geo_restricted(countries=countries)
+            raise
 
-        formats = self._extract_akamai_formats(
-            '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
-        m3u8_formats = list(filter(
-            lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
-            formats))
-        if len(m3u8_formats) == len(self._BITRATES):
-            self._sort_formats(m3u8_formats)
-            for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
-                progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
-                mp4_f = m3u8_format.copy()
-                mp4_f.update({
-                    'url': progressive_base_url + 'mp4',
-                    'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
-                    'protocol': 'http',
-                })
-                web_f = {
-                    'url': progressive_base_url + 'webm',
-                    'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
-                    'width': m3u8_format['width'],
-                    'height': m3u8_format['height'],
-                    'tbr': m3u8_format.get('tbr'),
-                    'ext': 'webm',
+        video = item['Video']
+        video_id = video['Id']
+        title = video['Title']
+
+        q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
+        formats = []
+        for video_source in item.get('VideoFormats', []):
+            video_source_url = video_source.get('Url')
+            if not video_source_url:
+                continue
+            video_source_format = video_source.get('Format')
+            if video_source_format == 'F4M':
+                formats.extend(self._extract_f4m_formats(
+                    video_source_url, video_id,
+                    f4m_id=video_source_format, fatal=False))
+            elif video_source_format == 'M3U8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_source_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id=video_source_format, fatal=False))
+            else:
+                video_source_quality = video_source.get('Quality')
+                format_id = []
+                for v in (video_source_format, video_source_quality):
+                    if v:
+                        format_id.append(v)
+                f = {
+                    'format_id': '-'.join(format_id),
+                    'quality': q(video_source_quality),
+                    'url': video_source_url,
                 }
-                formats.extend([web_f, mp4_f])
-        else:
-            for bitrate in self._BITRATES:
-                for ext in ('web', 'mp4'):
-                    formats.append({
-                        'format_id': '%s-%s' % (ext, bitrate),
-                        'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
-                        'tbr': bitrate,
-                        'ext': ext,
-                    })
+                quality = self._QUALITIES.get(video_source_quality)
+                if quality:
+                    f.update({
+                        'abr': quality[0],
+                        'height': quality[1],
+                        'width': quality[2],
+                    }),
+                formats.append(f)
         self._sort_formats(formats)
 
+        subtitles = {}
+        sub_file_id = video.get('SubFileId')
+        if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
+            net_storage_path = video.get('NetstoragePath')
+            invariant_id = video.get('InvariantId')
+            video_file_id = video.get('VideoFileId')
+            video_version = video.get('VideoVersion')
+            if net_storage_path and invariant_id and video_file_id and video_version:
+                subtitles.setdefault(locale[:2], []).append({
+                    'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
+                })
+
         return {
             'id': video_id,
             'title': title,
-            'description': self._html_search_meta('description', webpage),
-            'thumbnail': self._html_search_meta('thumbnail', webpage),
-            'duration': parse_duration(self._html_search_meta('duration', webpage)),
+            'description': video.get('Description'),
+            'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
+            'duration': int_or_none(video.get('Length')),
             'formats': formats,
+            'subtitles': subtitles,
+            'age_limit': int_or_none(video.get('AgeFrom')),
+            'season': video.get('SeasonTitle'),
+            'season_number': int_or_none(video.get('Season')) or None,
+            'episode_number': int_or_none(video.get('Episode')) or None,
         }
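The `q = qualities([...])` call in the new LEGO code ranks the service's named `Quality` strings so that `_sort_formats` can order formats worst-to-best. The helper in `youtube_dl/utils.py` is essentially an index lookup over the preference list; a minimal reimplementation of that idea (the sample format list is invented for illustration):

```python
# Minimal reimplementation of the qualities() preference helper used by the
# LEGO extractor: maps a quality name to its index in the preference list,
# with unknown names sorting below everything else.

def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
formats = [{'format_id': fid, 'quality': q(fid)}
           for fid in ('High', 'Lowest', 'Unknown', 'Highest')]
formats.sort(key=lambda f: f['quality'])  # worst first, as _sort_formats expects
order = [f['format_id'] for f in formats]
```

Because unknown names map to -1, formats the preference list does not cover sink to the bottom instead of raising, which suits scraped metadata.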
index 729d8de50fab70cd69bab41fae9db0cba4d7da9b..39f74d2822bc7296df8a5c16e5edfce3298e82ab 100644 (file)
@@ -18,7 +18,6 @@
 
 class LimelightBaseIE(InfoExtractor):
     _PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
-    _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
 
     @classmethod
     def _extract_urls(cls, webpage, source_url):
@@ -70,7 +69,8 @@ def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
         try:
             return self._download_json(
                 self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
-                item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
+                item_id, 'Downloading PlaylistService %s JSON' % method,
+                fatal=fatal, headers=headers)
         except ExtractorError as e:
             if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
                 error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
@@ -79,22 +79,22 @@ def _call_playlist_service(self, item_id, method, fatal=True, referer=None):
                 raise ExtractorError(error, expected=True)
             raise
 
-    def _call_api(self, organization_id, item_id, method):
-        return self._download_json(
-            self._API_URL % (organization_id, self._API_PATH, item_id, method),
-            item_id, 'Downloading API %s JSON' % method)
-
-    def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
+    def _extract(self, item_id, pc_method, mobile_method, referer=None):
         pc = self._call_playlist_service(item_id, pc_method, referer=referer)
-        metadata = self._call_api(pc['orgId'], item_id, meta_method)
-        mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
-        return pc, mobile, metadata
+        mobile = self._call_playlist_service(
+            item_id, mobile_method, fatal=False, referer=referer)
+        return pc, mobile
+
+    def _extract_info(self, pc, mobile, i, referer):
+        get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
+        pc_item = get_item(pc, 'playlistItems')
+        mobile_item = get_item(mobile, 'mediaList')
+        video_id = pc_item.get('mediaId') or mobile_item['mediaId']
+        title = pc_item.get('title') or mobile_item['title']
 
-    def _extract_info(self, streams, mobile_urls, properties):
-        video_id = properties['media_id']
         formats = []
         urls = []
-        for stream in streams:
+        for stream in pc_item.get('streams', []):
             stream_url = stream.get('url')
             if not stream_url or stream.get('drmProtected') or stream_url in urls:
                 continue
@@ -155,7 +155,7 @@ def _extract_info(self, streams, mobile_urls, properties):
                     })
                 formats.append(fmt)
 
-        for mobile_url in mobile_urls:
+        for mobile_url in mobile_item.get('mobileUrls', []):
             media_url = mobile_url.get('mobileUrl')
             format_id = mobile_url.get('targetMediaPlatform')
             if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
@@ -179,54 +179,34 @@ def _extract_info(self, streams, mobile_urls, properties):
 
         self._sort_formats(formats)
 
-        title = properties['title']
-        description = properties.get('description')
-        timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
-        duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
-        filesize = int_or_none(properties.get('total_storage_in_bytes'))
-        categories = [properties.get('category')]
-        tags = properties.get('tags', [])
-        thumbnails = [{
-            'url': thumbnail['url'],
-            'width': int_or_none(thumbnail.get('width')),
-            'height': int_or_none(thumbnail.get('height')),
-        } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
-
         subtitles = {}
-        for caption in properties.get('captions', []):
-            lang = caption.get('language_code')
-            subtitles_url = caption.get('url')
-            if lang and subtitles_url:
-                subtitles.setdefault(lang, []).append({
-                    'url': subtitles_url,
-                })
-        closed_captions_url = properties.get('closed_captions_url')
-        if closed_captions_url:
-            subtitles.setdefault('en', []).append({
-                'url': closed_captions_url,
-                'ext': 'ttml',
-            })
+        for flag in mobile_item.get('flags') or []:
+            if flag == 'ClosedCaptions':
+                closed_captions = self._call_playlist_service(
+                    video_id, 'getClosedCaptionsDetailsByMediaId',
+                    False, referer) or []
+                for cc in closed_captions:
+                    cc_url = cc.get('webvttFileUrl')
+                    if not cc_url:
+                        continue
+                    lang = cc.get('languageCode') or self._search_regex(r'/([a-z]{2})\.vtt', cc_url, 'lang', default='en')
+                    subtitles.setdefault(lang, []).append({
+                        'url': cc_url,
+                    })
+                break
+
+        get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
 
         return {
             'id': video_id,
             'title': title,
-            'description': description,
+            'description': get_meta('description'),
             'formats': formats,
-            'timestamp': timestamp,
-            'duration': duration,
-            'filesize': filesize,
-            'categories': categories,
-            'tags': tags,
-            'thumbnails': thumbnails,
+            'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
+            'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
             'subtitles': subtitles,
         }
 
-    def _extract_info_helper(self, pc, mobile, i, metadata):
-        return self._extract_info(
-            try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
-            try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
-            metadata)
-
 
 class LimelightMediaIE(LimelightBaseIE):
     IE_NAME = 'limelight'
@@ -251,8 +231,6 @@ class LimelightMediaIE(LimelightBaseIE):
             'description': 'md5:8005b944181778e313d95c1237ddb640',
             'thumbnail': r're:^https?://.*\.jpeg$',
             'duration': 144.23,
-            'timestamp': 1244136834,
-            'upload_date': '20090604',
         },
         'params': {
             # m3u8 download
@@ -268,30 +246,29 @@ class LimelightMediaIE(LimelightBaseIE):
             'title': '3Play Media Overview Video',
             'thumbnail': r're:^https?://.*\.jpeg$',
             'duration': 78.101,
-            'timestamp': 1338929955,
-            'upload_date': '20120605',
-            'subtitles': 'mincount:9',
+            # TODO: extract all languages that were accessible via API
+            # 'subtitles': 'mincount:9',
+            'subtitles': 'mincount:1',
         },
     }, {
         'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
         'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'media'
-    _API_PATH = 'media'
 
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
         video_id = self._match_id(url)
+        source_url = smuggled_data.get('source_url')
         self._initialize_geo_bypass({
             'countries': smuggled_data.get('geo_countries'),
         })
 
-        pc, mobile, metadata = self._extract(
+        pc, mobile = self._extract(
             video_id, 'getPlaylistByMediaId',
-            'getMobilePlaylistByMediaId', 'properties',
-            smuggled_data.get('source_url'))
+            'getMobilePlaylistByMediaId', source_url)
 
-        return self._extract_info_helper(pc, mobile, 0, metadata)
+        return self._extract_info(pc, mobile, 0, source_url)
 
 
 class LimelightChannelIE(LimelightBaseIE):
@@ -313,6 +290,7 @@ class LimelightChannelIE(LimelightBaseIE):
         'info_dict': {
             'id': 'ab6a524c379342f9b23642917020c082',
             'title': 'Javascript Sample Code',
+            'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
         },
         'playlist_mincount': 3,
     }, {
@@ -320,22 +298,23 @@ class LimelightChannelIE(LimelightBaseIE):
         'only_matching': True,
     }]
     _PLAYLIST_SERVICE_PATH = 'channel'
-    _API_PATH = 'channels'
 
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
         channel_id = self._match_id(url)
+        source_url = smuggled_data.get('source_url')
 
-        pc, mobile, medias = self._extract(
+        pc, mobile = self._extract(
             channel_id, 'getPlaylistByChannelId',
             'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
-            'media', smuggled_data.get('source_url'))
+            source_url)
 
         entries = [
-            self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
-            for i in range(len(medias['media_list']))]
+            self._extract_info(pc, mobile, i, source_url)
+            for i in range(len(pc['playlistItems']))]
 
-        return self.playlist_result(entries, channel_id, pc['title'])
+        return self.playlist_result(
+            entries, channel_id, pc.get('title'), mobile.get('description'))
 
 
 class LimelightChannelListIE(LimelightBaseIE):
@@ -368,10 +347,12 @@ class LimelightChannelListIE(LimelightBaseIE):
     def _real_extract(self, url):
         channel_list_id = self._match_id(url)
 
-        channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById')
+        channel_list = self._call_playlist_service(
+            channel_list_id, 'getMobileChannelListById')
 
         entries = [
             self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
             for channel in channel_list['channelList']]
 
-        return self.playlist_result(entries, channel_list_id, channel_list['title'])
+        return self.playlist_result(
+            entries, channel_list_id, channel_list['title'])
index a78c6556e105220a09b66dac94d8fc27780e3dc5..23ca965d977b1ec682101f048684f20f1b70834c 100644 (file)
@@ -8,7 +8,6 @@
 from ..compat import (
     compat_b64decode,
     compat_HTTPError,
-    compat_str,
 )
 from ..utils import (
     ExtractorError,
@@ -99,7 +98,7 @@ def random_string():
             'sso': 'true',
         })
 
-        login_state_url = compat_str(urlh.geturl())
+        login_state_url = urlh.geturl()
 
         try:
             login_page = self._download_webpage(
@@ -129,7 +128,7 @@ def random_string():
             })
 
         access_token = self._search_regex(
-            r'access_token=([^=&]+)', compat_str(urlh.geturl()),
+            r'access_token=([^=&]+)', urlh.geturl(),
             'access token')
 
         self._download_webpage(
index 6b0e64b7f1032159262220dcf77c6ffaa358d014..65cc474db0dbc84e323a1610ab7692f2aca075f0 100644 (file)
@@ -20,10 +20,10 @@ class MailRuIE(InfoExtractor):
     IE_DESC = 'Видео@Mail.Ru'
     _VALID_URL = r'''(?x)
                     https?://
-                        (?:(?:www|m)\.)?my\.mail\.ru/
+                        (?:(?:www|m)\.)?my\.mail\.ru/+
                         (?:
                             video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
-                            (?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
+                            (?:(?P<idv2prefix>(?:[^/]+/+){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
                             (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
                         )
                     '''
@@ -85,6 +85,14 @@ class MailRuIE(InfoExtractor):
         {
             'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
             'only_matching': True,
+        },
+        {
+            'url': 'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
+            'only_matching': True,
+        },
+        {
+            'url': 'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
+            'only_matching': True,
         }
     ]
 
@@ -120,6 +128,12 @@ def _real_extract(self, url):
                 'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
                 video_id, 'Downloading video JSON')
 
+        headers = {}
+
+        video_key = self._get_cookies('https://my.mail.ru').get('video_key')
+        if video_key:
+            headers['Cookie'] = 'video_key=%s' % video_key.value
+
         formats = []
         for f in video_data['videos']:
             video_url = f.get('url')
@@ -132,6 +146,7 @@ def _real_extract(self, url):
                 'url': video_url,
                 'format_id': format_id,
                 'height': height,
+                'http_headers': headers,
             })
         self._sort_formats(formats)
 
@@ -237,7 +252,7 @@ def _extract_track(t, fatal=True):
 class MailRuMusicIE(MailRuMusicSearchBaseIE):
     IE_NAME = 'mailru:music'
     IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/+music/+songs/+[^/?#&]+-(?P<id>[\da-f]+)'
     _TESTS = [{
         'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
         'md5': '0f8c22ef8c5d665b13ac709e63025610',
@@ -273,7 +288,7 @@ def _real_extract(self, url):
 class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
     IE_NAME = 'mailru:music:search'
     IE_DESC = 'Музыка@Mail.Ru'
-    _VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://my.mail.ru/music/search/black%20shadow',
         'info_dict': {
index e13c2e11a5baf301d32a34d8343776ab249ea821..6f4fd927fa3c5a607cb7caee632f2d7aed2471d5 100644 (file)
@@ -8,7 +8,7 @@
 
 
 class MallTVIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:(?:www|sk)\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.mall.tv/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
         'md5': '1c4a37f080e1f3023103a7b43458e518',
@@ -26,6 +26,9 @@ class MallTVIE(InfoExtractor):
     }, {
         'url': 'https://www.mall.tv/kdo-to-plati/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
         'only_matching': True,
+    }, {
+        'url': 'https://sk.mall.tv/gejmhaus/reklamacia-nehreje-vyrobnik-tepla-alebo-spekacka',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index f976506f416b9f5461dbfb85f988e2816c43b334..933df14952d5cc16857485e306be07f2d32384d3 100644 (file)
@@ -6,7 +6,6 @@
 from .theplatform import ThePlatformBaseIE
 from ..compat import (
     compat_parse_qs,
-    compat_str,
     compat_urllib_parse_urlparse,
 )
 from ..utils import (
@@ -114,7 +113,7 @@ def _program_guid(qs):
                 continue
             urlh = ie._request_webpage(
                 embed_url, video_id, note='Following embed URL redirect')
-            embed_url = compat_str(urlh.geturl())
+            embed_url = urlh.geturl()
             program_guid = _program_guid(_qs(embed_url))
             if program_guid:
                 entries.append(embed_url)
@@ -123,7 +122,7 @@ def _program_guid(qs):
     def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
         for video in smil.findall(self._xpath_ns('.//video', namespace)):
             video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
-        return super()._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
+        return super(MediasetIE, self)._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
 
     def _real_extract(self, url):
         guid = self._match_id(url)
index 694a264d672288b47c2700b9265bfc0635158ff2..d6eb1574065dece67e28a4b36fa43478dd48dfa3 100644 (file)
@@ -129,7 +129,7 @@ def _real_extract(self, url):
         query = mobj.group('query')
 
         webpage, urlh = self._download_webpage_handle(url, resource_id)  # XXX: add UrlReferrer?
-        redirect_url = compat_str(urlh.geturl())
+        redirect_url = urlh.geturl()
 
         # XXX: might have also extracted UrlReferrer and QueryString from the html
         service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(
index 40f214a873a7dcbb5087726aa270b74a5dab3e76..ad9da96125b92d66d4e6367e23b7ed9a1a633aaf 100644 (file)
@@ -4,8 +4,8 @@
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
+    parse_iso8601,
     smuggle_url,
-    parse_duration,
 )
 
 
@@ -18,16 +18,18 @@ class MiTeleIE(InfoExtractor):
         'info_dict': {
             'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
             'ext': 'mp4',
-            'title': 'Tor, la web invisible',
-            'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
+            'title': 'Diario de La redacción Programa 144',
+            'description': 'md5:07c35a7b11abb05876a6a79185b58d27',
             'series': 'Diario de',
-            'season': 'La redacción',
+            'season': 'Season 14',
             'season_number': 14,
-            'season_id': 'diario_de_t14_11981',
-            'episode': 'Programa 144',
+            'episode': 'Tor, la web invisible',
             'episode_number': 3,
             'thumbnail': r're:(?i)^https?://.*\.jpg$',
             'duration': 2913,
+            'age_limit': 16,
+            'timestamp': 1471209401,
+            'upload_date': '20160814',
         },
         'add_ie': ['Ooyala'],
     }, {
@@ -39,13 +41,15 @@ class MiTeleIE(InfoExtractor):
             'title': 'Cuarto Milenio Temporada 6 Programa 226',
             'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
             'series': 'Cuarto Milenio',
-            'season': 'Temporada 6',
+            'season': 'Season 6',
             'season_number': 6,
-            'season_id': 'cuarto_milenio_t06_12715',
-            'episode': 'Programa 226',
+            'episode': 'Episode 24',
             'episode_number': 24,
             'thumbnail': r're:(?i)^https?://.*\.jpg$',
             'duration': 7313,
+            'age_limit': 12,
+            'timestamp': 1471209021,
+            'upload_date': '20160814',
         },
         'params': {
             'skip_download': True,
@@ -54,67 +58,36 @@ class MiTeleIE(InfoExtractor):
     }, {
         'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
         'only_matching': True,
+    }, {
+        'url': 'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        paths = self._download_json(
-            'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
-            video_id, 'Downloading paths JSON')
-
-        ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
-        base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
-        full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
-        source = self._download_json(
-            '%s://%s%s%s/docs/%s' % (
-                ooyala_s.get('protocol', 'https'), base_url, full_path,
-                ooyala_s.get('provider_id', '104951'), video_id),
-            video_id, 'Downloading data JSON', query={
-                'include_titles': 'Series,Season',
-                'product_name': ooyala_s.get('product_name', 'test'),
-                'format': 'full',
-            })['hits']['hits'][0]['_source']
-
-        embedCode = source['offers'][0]['embed_codes'][0]
-        titles = source['localizable_titles'][0]
-
-        title = titles.get('title_medium') or titles['title_long']
-
-        description = titles.get('summary_long') or titles.get('summary_medium')
-
-        def get(key1, key2):
-            value1 = source.get(key1)
-            if not value1 or not isinstance(value1, list):
-                return
-            if not isinstance(value1[0], dict):
-                return
-            return value1[0].get(key2)
-
-        series = get('localizable_titles_series', 'title_medium')
-
-        season = get('localizable_titles_season', 'title_medium')
-        season_number = int_or_none(source.get('season_number'))
-        season_id = source.get('season_id')
-
-        episode = titles.get('title_sort_name')
-        episode_number = int_or_none(source.get('episode_number'))
-
-        duration = parse_duration(get('videos', 'duration'))
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+        pre_player = self._parse_json(self._search_regex(
+            r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
+            webpage, 'Pre Player'), display_id)['prePlayer']
+        title = pre_player['title']
+        video = pre_player['video']
+        video_id = video['dataMediaId']
+        content = pre_player.get('content') or {}
+        info = content.get('info') or {}
 
         return {
             '_type': 'url_transparent',
             # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
+            'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
             'id': video_id,
             'title': title,
-            'description': description,
-            'series': series,
-            'season': season,
-            'season_number': season_number,
-            'season_id': season_id,
-            'episode': episode,
-            'episode_number': episode_number,
-            'duration': duration,
-            'thumbnail': get('images', 'url'),
+            'description': info.get('synopsis'),
+            'series': content.get('title'),
+            'season_number': int_or_none(info.get('season_number')),
+            'episode': content.get('subtitle'),
+            'episode_number': int_or_none(info.get('episode_number')),
+            'duration': int_or_none(info.get('duration')),
+            'thumbnail': video.get('dataPoster'),
+            'age_limit': int_or_none(info.get('rating')),
+            'timestamp': parse_iso8601(pre_player.get('publishedTime')),
         }
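The rewritten mitele extractor stops calling the Ooyala search API and instead reads metadata from a JSON blob the page assigns to `window.$REACTBASE_STATE.prePlayer_mtweb`. A standalone sketch of that extraction with a made-up page snippet (the extractor itself uses `_search_regex`/`_parse_json`):

```python
import json
import re

# Hypothetical page fragment shaped like the diff expects.
webpage = ('<script>window.$REACTBASE_STATE.prePlayer_mtweb = '
           '{"prePlayer": {"title": "Programa 144", '
           '"video": {"dataMediaId": "FhYW1iNTE6"}}};</script>')

# Same pattern as the diff; the greedy {.+} grabs the whole JSON object.
raw = re.search(
    r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
    webpage).group(1)
pre_player = json.loads(raw)['prePlayer']
title = pre_player['title']
video_id = pre_player['video']['dataMediaId']
```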
index 1c652813adb96b994c3ba517db994805e8ea8eb3..5234cac02632d9cddde53ebb10e2a0d91c4ec508 100644 (file)
@@ -1,5 +1,8 @@
 from __future__ import unicode_literals
 
+import re
+
+from .common import InfoExtractor
 from ..utils import (
     int_or_none,
     str_to_int,
@@ -54,3 +57,23 @@ def _real_extract(self, url):
         })
 
         return info
+
+
+class MofosexEmbedIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.mofosex.com/embed/?videoid=318131&referrer=KM',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
+            webpage)
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            'http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id),
+            ie=MofosexIE.ie_key(), video_id=video_id)
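The new `MofosexEmbedIE._extract_urls` lets the generic extractor discover embed iframes in arbitrary pages. Running the same regex over a made-up page shows what it collects:

```python
import re

# Hypothetical host page embedding a player iframe.
webpage = ('<iframe src="https://www.mofosex.com/embed/'
           '?videoid=318131&referrer=KM"></iframe>')

# Regex copied from the diff; the lazy .*? lets other query
# parameters precede videoid.
urls = re.findall(
    r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
    webpage)
```

Note the captured group stops at the end of the `videoid` digits, so trailing parameters like `referrer` are dropped from the returned URL.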
index 43fd70f112005a893377c8e5cf489291fb8cc812..b1615b4d8e4bce8b580942f717477e6ed57ee92e 100644 (file)
@@ -26,7 +26,7 @@ class MotherlessIE(InfoExtractor):
             'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
             'upload_date': '20100913',
             'uploader_id': 'famouslyfuckedup',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
             'age_limit': 18,
         }
     }, {
@@ -40,7 +40,7 @@ class MotherlessIE(InfoExtractor):
                            'game', 'hairy'],
             'upload_date': '20140622',
             'uploader_id': 'Sulivana7x',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
             'age_limit': 18,
         },
         'skip': '404',
@@ -54,7 +54,7 @@ class MotherlessIE(InfoExtractor):
             'categories': ['superheroine heroine  superher'],
             'upload_date': '20140827',
             'uploader_id': 'shade0230',
-            'thumbnail': r're:http://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
             'age_limit': 18,
         }
     }, {
@@ -76,7 +76,8 @@ def _real_extract(self, url):
             raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
 
         title = self._html_search_regex(
-            r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
+            (r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
+             r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
         video_url = (self._html_search_regex(
             (r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
              r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
@@ -84,14 +85,15 @@ def _real_extract(self, url):
             or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
         age_limit = self._rta_search(webpage)
         view_count = str_to_int(self._html_search_regex(
-            r'<strong>Views</strong>\s+([^<]+)<',
+            (r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
             webpage, 'view count', fatal=False))
         like_count = str_to_int(self._html_search_regex(
-            r'<strong>Favorited</strong>\s+([^<]+)<',
+            (r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
             webpage, 'like count', fatal=False))
 
         upload_date = self._html_search_regex(
-            r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date')
+            (r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
+             r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
         if 'Ago' in upload_date:
             days = int(re.search(r'([0-9]+)', upload_date).group(1))
             upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')
index bb3d944133d6a1e2685779b86a7565ad9b0985f0..61fc59126f61ef3599f69ac7ad1d920e9bc87817 100644 (file)
@@ -1,68 +1,33 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
 from ..utils import (
+    clean_html,
+    dict_get,
     ExtractorError,
     int_or_none,
+    parse_duration,
+    try_get,
     update_url_query,
 )
 
 
-class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
+class NaverBaseIE(InfoExtractor):
+    _CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
 
-    _TESTS = [{
-        'url': 'http://tv.naver.com/v/81652',
-        'info_dict': {
-            'id': '81652',
-            'ext': 'mp4',
-            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
-            'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
-            'upload_date': '20130903',
-        },
-    }, {
-        'url': 'http://tv.naver.com/v/395837',
-        'md5': '638ed4c12012c458fefcddfd01f173cd',
-        'info_dict': {
-            'id': '395837',
-            'ext': 'mp4',
-            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
-            'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
-            'upload_date': '20150519',
-        },
-        'skip': 'Georestricted',
-    }, {
-        'url': 'http://tvcast.naver.com/v/81652',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        vid = self._search_regex(
-            r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'video id', fatal=None, group='value')
-        in_key = self._search_regex(
-            r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'key', default=None, group='value')
-
-        if not vid or not in_key:
-            error = self._html_search_regex(
-                r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
-                webpage, 'error', default=None)
-            if error:
-                raise ExtractorError(error, expected=True)
-            raise ExtractorError('couldn\'t extract vid and key')
+    def _extract_video_info(self, video_id, vid, key):
         video_data = self._download_json(
             'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
             video_id, query={
-                'key': in_key,
+                'key': key,
             })
         meta = video_data['meta']
         title = meta['subject']
         formats = []
+        get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []
 
         def extract_formats(streams, stream_type, query={}):
             for stream in streams:
@@ -73,7 +38,7 @@ def extract_formats(streams, stream_type, query={}):
                 encoding_option = stream.get('encodingOption', {})
                 bitrate = stream.get('bitrate', {})
                 formats.append({
-                    'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
+                    'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
                     'url': stream_url,
                     'width': int_or_none(encoding_option.get('width')),
                     'height': int_or_none(encoding_option.get('height')),
@@ -83,7 +48,7 @@ def extract_formats(streams, stream_type, query={}):
                     'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
                 })
 
-        extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
+        extract_formats(get_list('video'), 'H264')
         for stream_set in video_data.get('streams', []):
             query = {}
             for param in stream_set.get('keys', []):
@@ -101,28 +66,101 @@ def extract_formats(streams, stream_type, query={}):
                     'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
         self._sort_formats(formats)
 
+        replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
+
+        def get_subs(caption_url):
+            if re.search(self._CAPTION_EXT_RE, caption_url):
+                return [{
+                    'url': replace_ext(caption_url, 'ttml'),
+                }, {
+                    'url': replace_ext(caption_url, 'vtt'),
+                }]
+            else:
+                return [{'url': caption_url}]
+
+        automatic_captions = {}
         subtitles = {}
-        for caption in video_data.get('captions', {}).get('list', []):
+        for caption in get_list('caption'):
             caption_url = caption.get('source')
             if not caption_url:
                 continue
-            subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
-                'url': caption_url,
-            })
+            sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
+            sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
 
-        upload_date = self._search_regex(
-            r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
-            webpage, 'upload date', fatal=False)
-        if upload_date:
-            upload_date = upload_date.replace('.', '')
+        user = meta.get('user', {})
 
         return {
             'id': video_id,
             'title': title,
             'formats': formats,
             'subtitles': subtitles,
-            'description': self._og_search_description(webpage),
-            'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
+            'automatic_captions': automatic_captions,
+            'thumbnail': try_get(meta, lambda x: x['cover']['source']),
             'view_count': int_or_none(meta.get('count')),
-            'upload_date': upload_date,
+            'uploader_id': user.get('id'),
+            'uploader': user.get('name'),
+            'uploader_url': user.get('url'),
         }
+
+
+class NaverIE(NaverBaseIE):
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
+    _GEO_BYPASS = False
+    _TESTS = [{
+        'url': 'http://tv.naver.com/v/81652',
+        'info_dict': {
+            'id': '81652',
+            'ext': 'mp4',
+            'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
+            'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
+            'timestamp': 1378200754,
+            'upload_date': '20130903',
+            'uploader': '메가스터디, 합격불변의 법칙',
+            'uploader_id': 'megastudy',
+        },
+    }, {
+        'url': 'http://tv.naver.com/v/395837',
+        'md5': '8a38e35354d26a17f73f4e90094febd3',
+        'info_dict': {
+            'id': '395837',
+            'ext': 'mp4',
+            'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
+            'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
+            'timestamp': 1432030253,
+            'upload_date': '20150519',
+            'uploader': '4가지쇼 시즌2',
+            'uploader_id': 'wrappinguser29',
+        },
+        'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        content = self._download_json(
+            'https://tv.naver.com/api/json/v/' + video_id,
+            video_id, headers=self.geo_verification_headers())
+        player_info_json = content.get('playerInfoJson') or {}
+        current_clip = player_info_json.get('currentClip') or {}
+
+        vid = current_clip.get('videoId')
+        in_key = current_clip.get('inKey')
+
+        if not vid or not in_key:
+            player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
+            if player_auth == 'notCountry':
+                self.raise_geo_restricted(countries=['KR'])
+            elif player_auth == 'notLogin':
+                self.raise_login_required()
+            raise ExtractorError('couldn\'t extract vid and key')
+        info = self._extract_video_info(video_id, vid, in_key)
+        info.update({
+            'description': clean_html(current_clip.get('description')),
+            'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
+            'duration': parse_duration(current_clip.get('displayPlayTime')),
+            'like_count': int_or_none(current_clip.get('recommendPoint')),
+            'age_limit': 19 if current_clip.get('adult') else None,
+        })
+        return info
index 5bc39d00242c78e7dca8d479dad38b8df8728a0e..6f3cb30034da7f5fcebb99fc6dec05f1ff3cd8e4 100644 (file)
@@ -87,11 +87,25 @@ class NBCIE(AdobePassIE):
     def _real_extract(self, url):
         permalink, video_id = re.match(self._VALID_URL, url).groups()
         permalink = 'http' + compat_urllib_parse_unquote(permalink)
-        response = self._download_json(
+        video_data = self._download_json(
             'https://friendship.nbc.co/v2/graphql', video_id, query={
-                'query': '''{
-  page(name: "%s", platform: web, type: VIDEO, userId: "0") {
-    data {
+                'query': '''query bonanzaPage(
+  $app: NBCUBrands! = nbc
+  $name: String!
+  $oneApp: Boolean
+  $platform: SupportedPlatforms! = web
+  $type: EntityPageType! = VIDEO
+  $userId: String!
+) {
+  bonanzaPage(
+    app: $app
+    name: $name
+    oneApp: $oneApp
+    platform: $platform
+    type: $type
+    userId: $userId
+  ) {
+    metadata {
       ... on VideoPageData {
         description
         episodeNumber
@@ -100,15 +114,20 @@ def _real_extract(self, url):
         mpxAccountId
         mpxGuid
         rating
+        resourceId
         seasonNumber
         secondaryTitle
         seriesShortTitle
       }
     }
   }
-}''' % permalink,
-            })
-        video_data = response['data']['page']['data']
+}''',
+                'variables': json.dumps({
+                    'name': permalink,
+                    'oneApp': True,
+                    'userId': '0',
+                }),
+            })['data']['bonanzaPage']['metadata']
         query = {
             'mbr': 'true',
             'manifest': 'm3u',
@@ -117,8 +136,8 @@ def _real_extract(self, url):
         title = video_data['secondaryTitle']
         if video_data.get('locked'):
             resource = self._get_mvpd_resource(
-                'nbcentertainment', title, video_id,
-                video_data.get('rating'))
+                video_data.get('resourceId') or 'nbcentertainment',
+                title, video_id, video_data.get('rating'))
             query['auth'] = self._extract_mvpd_auth(
                 url, video_id, 'nbcentertainment', resource)
         theplatform_url = smuggle_url(update_url_query(
index aec2ea1331f3c909957e50d4166e7657618fa1a6..2447c812e021e73991082aefab4bd98e6dd000a1 100644 (file)
@@ -7,8 +7,11 @@
 from ..utils import (
     determine_ext,
     int_or_none,
+    merge_dicts,
     parse_iso8601,
     qualities,
+    try_get,
+    urljoin,
 )
 
 
@@ -85,21 +88,25 @@ class NDRIE(NDRBaseIE):
 
     def _extract_embed(self, webpage, display_id):
         embed_url = self._html_search_meta(
-            'embedURL', webpage, 'embed URL', fatal=True)
+            'embedURL', webpage, 'embed URL',
+            default=None) or self._search_regex(
+            r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'embed URL', group='url')
         description = self._search_regex(
             r'<p[^>]+itemprop="description">([^<]+)</p>',
             webpage, 'description', default=None) or self._og_search_description(webpage)
         timestamp = parse_iso8601(
             self._search_regex(
                 r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
-                webpage, 'upload date', fatal=False))
-        return {
+                webpage, 'upload date', default=None))
+        info = self._search_json_ld(webpage, display_id, default={})
+        return merge_dicts({
             '_type': 'url_transparent',
             'url': embed_url,
             'display_id': display_id,
             'description': description,
             'timestamp': timestamp,
-        }
+        }, info)
 
 
 class NJoyIE(NDRBaseIE):
@@ -220,11 +227,17 @@ def _real_extract(self, url):
         upload_date = ppjson.get('config', {}).get('publicationDate')
         duration = int_or_none(config.get('duration'))
 
-        thumbnails = [{
-            'id': thumbnail.get('quality') or thumbnail_id,
-            'url': thumbnail['src'],
-            'preference': quality_key(thumbnail.get('quality')),
-        } for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]
+        thumbnails = []
+        poster = try_get(config, lambda x: x['poster'], dict) or {}
+        for thumbnail_id, thumbnail in poster.items():
+            thumbnail_url = urljoin(url, thumbnail.get('src'))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'id': thumbnail.get('quality') or thumbnail_id,
+                'url': thumbnail_url,
+                'preference': quality_key(thumbnail.get('quality')),
+            })
 
         return {
             'id': video_id,
index 6a2c6cb7bb6d039c56fcf7325de422846c437ab5..de6a707c4265c4fc61a57db117a432a95468ab54 100644 (file)
@@ -6,7 +6,7 @@
 
 
 class NhkVodIE(InfoExtractor):
-    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
+    _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'
     # Content available only for a limited period of time. Visit
     # https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
     _TESTS = [{
@@ -30,8 +30,11 @@ class NhkVodIE(InfoExtractor):
     }, {
         'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
         'only_matching': True,
+    }, {
+        'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
+        'only_matching': True,
     }]
-    _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json'
+    _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
 
     def _real_extract(self, url):
         lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
@@ -82,15 +85,9 @@ def get_clean_field(key):
             audio = episode['audio']
             audio_path = audio['audio']
             info['formats'] = self._extract_m3u8_formats(
-                'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
-                episode_id, 'm4a', m3u8_id='hls', fatal=False)
-            for proto in ('rtmpt', 'rtmp'):
-                info['formats'].append({
-                    'ext': 'flv',
-                    'format_id': proto,
-                    'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
-                    'vcodec': 'none',
-                })
+                'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
+                episode_id, 'm4a', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False)
             for f in info['formats']:
                 f['language'] = lang
         return info
index 901f44b54f40c2c02e120c636a80b0b5bfb4ea2e..47b9748f0202d76ddef8ee5f96257bebe88e4169 100644 (file)
@@ -6,6 +6,7 @@
 from .common import InfoExtractor
 from ..utils import (
     clean_html,
+    determine_ext,
     int_or_none,
     js_to_json,
     qualities,
@@ -18,7 +19,7 @@ class NovaEmbedIE(InfoExtractor):
     _VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
-        'md5': 'b3834f6de5401baabf31ed57456463f7',
+        'md5': 'ee009bafcc794541570edd44b71cbea3',
         'info_dict': {
             'id': '8o0n0r',
             'ext': 'mp4',
@@ -33,36 +34,76 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        bitrates = self._parse_json(
+        duration = None
+        formats = []
+
+        player = self._parse_json(
             self._search_regex(
-                r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
-            video_id, transform_source=js_to_json)
+                r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
+                webpage, 'player', default='{}'), video_id, fatal=False)
+        if player:
+            for format_id, format_list in player['tracks'].items():
+                if not isinstance(format_list, list):
+                    format_list = [format_list]
+                for format_dict in format_list:
+                    if not isinstance(format_dict, dict):
+                        continue
+                    format_url = url_or_none(format_dict.get('src'))
+                    format_type = format_dict.get('type')
+                    ext = determine_ext(format_url)
+                    if (format_type == 'application/x-mpegURL'
+                            or format_id == 'HLS' or ext == 'm3u8'):
+                        formats.extend(self._extract_m3u8_formats(
+                            format_url, video_id, 'mp4',
+                            entry_protocol='m3u8_native', m3u8_id='hls',
+                            fatal=False))
+                    elif (format_type == 'application/dash+xml'
+                          or format_id == 'DASH' or ext == 'mpd'):
+                        formats.extend(self._extract_mpd_formats(
+                            format_url, video_id, mpd_id='dash', fatal=False))
+                    else:
+                        formats.append({
+                            'url': format_url,
+                        })
+            duration = int_or_none(player.get('duration'))
+        else:
+            # Old path, not actual as of 08.04.2020
+            bitrates = self._parse_json(
+                self._search_regex(
+                    r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
+                video_id, transform_source=js_to_json)
 
 
-        QUALITIES = ('lq', 'mq', 'hq', 'hd')
-        quality_key = qualities(QUALITIES)
+            QUALITIES = ('lq', 'mq', 'hq', 'hd')
+            quality_key = qualities(QUALITIES)
+
+            for format_id, format_list in bitrates.items():
+                if not isinstance(format_list, list):
+                    format_list = [format_list]
+                for format_url in format_list:
+                    format_url = url_or_none(format_url)
+                    if not format_url:
+                        continue
+                    if format_id == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            format_url, video_id, ext='mp4',
+                            entry_protocol='m3u8_native', m3u8_id='hls',
+                            fatal=False))
+                        continue
+                    f = {
+                        'url': format_url,
+                    }
+                    f_id = format_id
+                    for quality in QUALITIES:
+                        if '%s.mp4' % quality in format_url:
+                            f_id += '-%s' % quality
+                            f.update({
+                                'quality': quality_key(quality),
+                                'format_note': quality.upper(),
+                            })
+                            break
+                    f['format_id'] = f_id
+                    formats.append(f)
 
-        formats = []
-        for format_id, format_list in bitrates.items():
-            if not isinstance(format_list, list):
-                continue
-            for format_url in format_list:
-                format_url = url_or_none(format_url)
-                if not format_url:
-                    continue
-                f = {
-                    'url': format_url,
-                }
-                f_id = format_id
-                for quality in QUALITIES:
-                    if '%s.mp4' % quality in format_url:
-                        f_id += '-%s' % quality
-                        f.update({
-                            'quality': quality_key(quality),
-                            'format_note': quality.upper(),
-                        })
-                        break
-                f['format_id'] = f_id
-                formats.append(f)
         self._sort_formats(formats)
 
         title = self._og_search_title(
@@ -75,7 +116,8 @@ def _real_extract(self, url):
             r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
             'thumbnail', fatal=False, group='value')
         duration = int_or_none(self._search_regex(
-            r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
+            r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
+            default=duration))
 
         return {
             'id': video_id,
@@ -91,7 +133,7 @@ class NovaIE(InfoExtractor):
     _VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
     _TESTS = [{
         'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
-        'md5': '1dd7b9d5ea27bc361f110cd855a19bd3',
+        'md5': '249baab7d0104e186e78b0899c7d5f28',
         'info_dict': {
             'id': '1757139',
             'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci',
@@ -113,7 +155,8 @@ class NovaIE(InfoExtractor):
         'params': {
             # rtmp download
             'skip_download': True,
-        }
+        },
+        'skip': 'gone',
     }, {
         # media.cms.nova.cz embed
         'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
@@ -128,6 +171,7 @@ class NovaIE(InfoExtractor):
             'skip_download': True,
         },
         'add_ie': [NovaEmbedIE.ie_key()],
+        'skip': 'CHYBA 404: STRÁNKA NENALEZENA',
     }, {
         'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
         'only_matching': True,
@@ -152,14 +196,29 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, display_id)
 
+        description = clean_html(self._og_search_description(webpage, default=None))
+        if site == 'novaplus':
+            upload_date = unified_strdate(self._search_regex(
+                r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
+        elif site == 'fanda':
+            upload_date = unified_strdate(self._search_regex(
+                r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
+        else:
+            upload_date = None
+
         # novaplus
         embed_id = self._search_regex(
             r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
             webpage, 'embed url', default=None)
         if embed_id:
-            return self.url_result(
-                'https://media.cms.nova.cz/embed/%s' % embed_id,
-                ie=NovaEmbedIE.ie_key(), video_id=embed_id)
+            return {
+                '_type': 'url_transparent',
+                'url': 'https://media.cms.nova.cz/embed/%s' % embed_id,
+                'ie_key': NovaEmbedIE.ie_key(),
+                'id': embed_id,
+                'description': description,
+                'upload_date': upload_date
+            }
 
         video_id = self._search_regex(
             [r"(?:media|video_id)\s*:\s*'(\d+)'",
 
         self._sort_formats(formats)
 
         title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage)
-        description = clean_html(self._og_search_description(webpage, default=None))
         thumbnail = config.get('poster')
 
-        if site == 'novaplus':
-            upload_date = unified_strdate(self._search_regex(
-                r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
-        elif site == 'fanda':
-            upload_date = unified_strdate(self._search_regex(
-                r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
-        else:
-            upload_date = None
-
         return {
             'id': video_id,
             'display_id': display_id,
index a5e8baa7e2542f4e8d6a8c83dea7bddecf82413d..53acc6e574c0743a223d552d95d8806dc071ed41 100644 (file)
@@ -4,6 +4,7 @@
 from ..utils import (
     int_or_none,
     qualities,
+    url_or_none,
 )
 
 
@@ -48,6 +49,10 @@ class NprIE(InfoExtractor):
             },
         }],
         'expected_warnings': ['Failed to download m3u8 information'],
+    }, {
+        # multimedia, no formats, stream
+        'url': 'https://www.npr.org/2020/02/14/805476846/laura-stevenson-tiny-desk-concert',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -95,6 +100,17 @@ def _real_extract(self, url):
                             'format_id': format_id,
                             'quality': quality(format_id),
                         })
+            for stream_id, stream_entry in media.get('stream', {}).items():
+                if not isinstance(stream_entry, dict):
+                    continue
+                if stream_id != 'hlsUrl':
+                    continue
+                stream_url = url_or_none(stream_entry.get('$text'))
+                if not stream_url:
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    stream_url, stream_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
             self._sort_formats(formats)
 
             entries.append({
index 60933f069c4ca4d38cfb19cad742fd2cb1d1b537..94115534b72ac19f3aaea3e35ef06fa3eaef3d7f 100644 (file)
@@ -12,6 +12,7 @@
     ExtractorError,
     int_or_none,
     JSON_LD_RE,
+    js_to_json,
     NO_DEFAULT,
     parse_age_limit,
     parse_duration,
@@ -105,6 +106,7 @@ def video_id_and_title(idx):
             MESSAGES = {
                 'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
                 'ProgramRightsHasExpired': 'Programmet har gått ut',
+                'NoProgramRights': 'Ikke tilgjengelig',
                 'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
             }
             message_type = data.get('messageType', '')
@@ -255,6 +257,17 @@ class NRKTVIE(NRKBaseIE):
                     ''' % _EPISODE_RE
     _API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
     _TESTS = [{
+        'url': 'https://tv.nrk.no/program/MDDP12000117',
+        'md5': '8270824df46ec629b66aeaa5796b36fb',
+        'info_dict': {
+            'id': 'MDDP12000117AA',
+            'ext': 'mp4',
+            'title': 'Alarm Trolltunga',
+            'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
+            'duration': 2223,
+            'age_limit': 6,
+        },
+    }, {
         'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
         'md5': '9a167e54d04671eb6317a37b7bc8a280',
         'info_dict': {
@@ -266,6 +279,7 @@ class NRKTVIE(NRKBaseIE):
             'series': '20 spørsmål',
             'episode': '23.05.2014',
         },
+        'skip': 'NoProgramRights',
     }, {
         'url': 'https://tv.nrk.no/program/mdfp15000514',
         'info_dict': {
@@ -370,7 +384,24 @@ class NRKTVIE(NRKBaseIE):
 
 class NRKTVEpisodeIE(InfoExtractor):
     _VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
-    _TEST = {
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
+        'info_dict': {
+            'id': 'MUHH36005220BA',
+            'ext': 'mp4',
+            'title': 'Kro, krig og kjærlighet 2:6',
+            'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
+            'duration': 1563,
+            'series': 'Hellums kro',
+            'season_number': 1,
+            'episode_number': 2,
+            'episode': '2:6',
+            'age_limit': 6,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
         'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
         'info_dict': {
             'id': 'MSUI14000816AA',
@@ -386,7 +417,8 @@ class NRKTVEpisodeIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
-    }
+        'skip': 'ProgramRightsHasExpired',
+    }]
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -409,7 +441,7 @@ def _extract_series(self, webpage, display_id, fatal=True):
                 (r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
                  r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
                 webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
-            display_id, fatal=False)
+            display_id, fatal=False, transform_source=js_to_json)
         if not config:
             return
         return try_get(
@@ -479,6 +511,14 @@ class NRKTVSeriesIE(NRKTVSerieBaseIE):
     _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
     _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
     _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/blank',
+        'info_dict': {
+            'id': 'blank',
+            'title': 'Blank',
+            'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
+        },
+        'playlist_mincount': 30,
+    }, {
         # new layout, seasons
         'url': 'https://tv.nrk.no/serie/backstage',
         'info_dict': {
@@ -648,7 +688,7 @@ class NRKSkoleIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
-        'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
+        'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
         'info_dict': {
             'id': '6021',
             'ext': 'mp4',
index 2bb77ab249239163d8318a57e8fd0fdb57d2e32a..fc78ca56c90d37b00c1f396aee7c896d54fb91c9 100644 (file)
@@ -69,10 +69,10 @@ def get_file_size(file_size):
                     'width': int_or_none(video.get('width')),
                     'height': int_or_none(video.get('height')),
                     'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
-                    'tbr': int_or_none(video.get('bitrate'), 1000),
+                    'tbr': int_or_none(video.get('bitrate'), 1000) or None,
                     'ext': ext,
                 })
-        self._sort_formats(formats)
+        self._sort_formats(formats, ('height', 'width', 'filesize', 'tbr', 'fps', 'format_id'))
 
         thumbnails = []
         for image in video_data.get('images', []):
index 3425f76024c04cdb937502361480a81e0b958c16..700ce448c4b8faa925aa1dae179f030acaa0b4f6 100644 (file)
@@ -6,12 +6,14 @@
 from .common import InfoExtractor
 from ..compat import compat_str
 from ..utils import (
+    clean_html,
     determine_ext,
     float_or_none,
     HEADRequest,
     int_or_none,
     orderedSet,
     remove_end,
+    str_or_none,
     strip_jsonp,
     unescapeHTML,
     unified_strdate,
@@ -88,8 +90,11 @@ def _real_extract(self, url):
                 format_id = '-'.join(format_id_list)
                 ext = determine_ext(src)
                 if ext == 'm3u8':
-                    formats.extend(self._extract_m3u8_formats(
-                        src, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+                    m3u8_formats = self._extract_m3u8_formats(
+                        src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
+                    if any('/geoprotection' in f['url'] for f in m3u8_formats):
+                        self.raise_geo_restricted()
+                    formats.extend(m3u8_formats)
                 elif ext == 'f4m':
                     formats.extend(self._extract_f4m_formats(
                         src, video_id, f4m_id=format_id, fatal=False))
@@ -157,48 +162,53 @@ def _real_extract(self, url):
 class ORFRadioIE(InfoExtractor):
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
-        station = mobj.group('station')
         show_date = mobj.group('date')
         show_id = mobj.group('show')
 
-        if station == 'fm4':
-            show_id = '4%s' % show_id
-
         data = self._download_json(
-            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date),
-            show_id
-        )
-
-        def extract_entry_dict(info, title, subtitle):
-            return {
-                'id': info['loopStreamId'].replace('.mp3', ''),
-                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']),
+            'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
+            % (self._API_STATION, show_id, show_date), show_id)
+
+        entries = []
+        for info in data['streams']:
+            loop_stream_id = str_or_none(info.get('loopStreamId'))
+            if not loop_stream_id:
+                continue
+            title = str_or_none(data.get('title'))
+            if not title:
+                continue
+            start = int_or_none(info.get('start'), scale=1000)
+            end = int_or_none(info.get('end'), scale=1000)
+            duration = end - start if end and start else None
+            entries.append({
+                'id': loop_stream_id.replace('.mp3', ''),
+                'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
                 'title': title,
-                'description': subtitle,
-                'duration': (info['end'] - info['start']) / 1000,
-                'timestamp': info['start'] / 1000,
+                'description': clean_html(data.get('subtitle')),
+                'duration': duration,
+                'timestamp': start,
                 'ext': 'mp3',
-                'series': data.get('programTitle')
-            }
-
-        entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]
+                'series': data.get('programTitle'),
+            })
 
         return {
             '_type': 'playlist',
             'id': show_id,
-            'title': data['title'],
-            'description': data['subtitle'],
-            'entries': entries
+            'title': data.get('title'),
+            'description': clean_html(data.get('subtitle')),
+            'entries': entries,
         }
 
 
 class ORFFM4IE(ORFRadioIE):
     IE_NAME = 'orf:fm4'
     IE_DESC = 'radio FM4'
-    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
+    _API_STATION = 'fm4'
+    _LOOP_STATION = 'fm4'
 
     _TEST = {
-        'url': 'http://fm4.orf.at/player/20170107/CC',
+        'url': 'http://fm4.orf.at/player/20170107/4CC',
         'md5': '2b0be47375432a7ef104453432a19212',
         'info_dict': {
             'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
@@ -209,7 +219,138 @@ class ORFFM4IE(ORFRadioIE):
             'timestamp': 1483819257,
             'upload_date': '20170107',
         },
-        'skip': 'Shows from ORF radios are only available for 7 days.'
+        'skip': 'Shows from ORF radios are only available for 7 days.',
+        'only_matching': True,
+    }
+
+
+class ORFNOEIE(ORFRadioIE):
+    IE_NAME = 'orf:noe'
+    IE_DESC = 'Radio Niederösterreich'
+    _VALID_URL = r'https?://(?P<station>noe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'noe'
+    _LOOP_STATION = 'oe2n'
+
+    _TEST = {
+        'url': 'https://noe.orf.at/player/20200423/NGM',
+        'only_matching': True,
+    }
+
+
+class ORFWIEIE(ORFRadioIE):
+    IE_NAME = 'orf:wien'
+    IE_DESC = 'Radio Wien'
+    _VALID_URL = r'https?://(?P<station>wien)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'wie'
+    _LOOP_STATION = 'oe2w'
+
+    _TEST = {
+        'url': 'https://wien.orf.at/player/20200423/WGUM',
+        'only_matching': True,
+    }
+
+
+class ORFBGLIE(ORFRadioIE):
+    IE_NAME = 'orf:burgenland'
+    IE_DESC = 'Radio Burgenland'
+    _VALID_URL = r'https?://(?P<station>burgenland)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'bgl'
+    _LOOP_STATION = 'oe2b'
+
+    _TEST = {
+        'url': 'https://burgenland.orf.at/player/20200423/BGM',
+        'only_matching': True,
+    }
+
+
+class ORFOOEIE(ORFRadioIE):
+    IE_NAME = 'orf:oberoesterreich'
+    IE_DESC = 'Radio Oberösterreich'
+    _VALID_URL = r'https?://(?P<station>ooe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'ooe'
+    _LOOP_STATION = 'oe2o'
+
+    _TEST = {
+        'url': 'https://ooe.orf.at/player/20200423/OGMO',
+        'only_matching': True,
+    }
+
+
+class ORFSTMIE(ORFRadioIE):
+    IE_NAME = 'orf:steiermark'
+    IE_DESC = 'Radio Steiermark'
+    _VALID_URL = r'https?://(?P<station>steiermark)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'stm'
+    _LOOP_STATION = 'oe2st'
+
+    _TEST = {
+        'url': 'https://steiermark.orf.at/player/20200423/STGMS',
+        'only_matching': True,
+    }
+
+
+class ORFKTNIE(ORFRadioIE):
+    IE_NAME = 'orf:kaernten'
+    IE_DESC = 'Radio Kärnten'
+    _VALID_URL = r'https?://(?P<station>kaernten)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'ktn'
+    _LOOP_STATION = 'oe2k'
+
+    _TEST = {
+        'url': 'https://kaernten.orf.at/player/20200423/KGUMO',
+        'only_matching': True,
+    }
+
+
+class ORFSBGIE(ORFRadioIE):
+    IE_NAME = 'orf:salzburg'
+    IE_DESC = 'Radio Salzburg'
+    _VALID_URL = r'https?://(?P<station>salzburg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'sbg'
+    _LOOP_STATION = 'oe2s'
+
+    _TEST = {
+        'url': 'https://salzburg.orf.at/player/20200423/SGUM',
+        'only_matching': True,
+    }
+
+
+class ORFTIRIE(ORFRadioIE):
+    IE_NAME = 'orf:tirol'
+    IE_DESC = 'Radio Tirol'
+    _VALID_URL = r'https?://(?P<station>tirol)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'tir'
+    _LOOP_STATION = 'oe2t'
+
+    _TEST = {
+        'url': 'https://tirol.orf.at/player/20200423/TGUMO',
+        'only_matching': True,
+    }
+
+
+class ORFVBGIE(ORFRadioIE):
+    IE_NAME = 'orf:vorarlberg'
+    IE_DESC = 'Radio Vorarlberg'
+    _VALID_URL = r'https?://(?P<station>vorarlberg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'vbg'
+    _LOOP_STATION = 'oe2v'
+
+    _TEST = {
+        'url': 'https://vorarlberg.orf.at/player/20200423/VGUM',
+        'only_matching': True,
+    }
+
+
+class ORFOE3IE(ORFRadioIE):
+    IE_NAME = 'orf:oe3'
+    IE_DESC = 'Radio Österreich 3'
+    _VALID_URL = r'https?://(?P<station>oe3)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'oe3'
+    _LOOP_STATION = 'oe3'
+
+    _TEST = {
+        'url': 'https://oe3.orf.at/player/20200424/3WEK',
+        'only_matching': True,
     }
 
 
@@ -217,6 +358,8 @@ class ORFOE1IE(ORFRadioIE):
     IE_NAME = 'orf:oe1'
     IE_DESC = 'Radio Österreich 1'
     _VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+    _API_STATION = 'oe1'
+    _LOOP_STATION = 'oe1'
 
     _TEST = {
         'url': 'http://oe1.orf.at/player/20170108/456544',
diff --git a/youtube_dl/extractor/pandatv.py b/youtube_dl/extractor/pandatv.py
deleted file mode 100644 (file)
index 4219802..0000000
+++ /dev/null
@@ -1,99 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
-    ExtractorError,
-    qualities,
-)
-
-
-class PandaTVIE(InfoExtractor):
-    IE_DESC = '熊猫TV'
-    _VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
-    _TESTS = [{
-        'url': 'http://www.panda.tv/66666',
-        'info_dict': {
-            'id': '66666',
-            'title': 're:.+',
-            'uploader': '刘杀鸡',
-            'ext': 'flv',
-            'is_live': True,
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'skip': 'Live stream is offline',
-    }, {
-        'url': 'https://www.panda.tv/66666',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        config = self._download_json(
-            'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
-
-        error_code = config.get('errno', 0)
-        if error_code != 0:
-            raise ExtractorError(
-                '%s returned error %s: %s'
-                % (self.IE_NAME, error_code, config['errmsg']),
-                expected=True)
-
-        data = config['data']
-        video_info = data['videoinfo']
-
-        # 2 = live, 3 = offline
-        if video_info.get('status') != '2':
-            raise ExtractorError(
-                'Live stream is offline', expected=True)
-
-        title = data['roominfo']['name']
-        uploader = data.get('hostinfo', {}).get('name')
-        room_key = video_info['room_key']
-        stream_addr = video_info.get(
-            'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
-
-        # Reverse engineered from web player swf
-        # (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
-        # writing).
-        plflag0, plflag1 = video_info['plflag'].split('_')
-        plflag0 = int(plflag0) - 1
-        if plflag1 == '21':
-            plflag0 = 10
-            plflag1 = '4'
-        live_panda = 'live_panda' if plflag0 < 1 else ''
-
-        plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
-        sign = plflag_auth['auth']['sign']
-        ts = plflag_auth['auth']['time']
-        rid = plflag_auth['auth']['rid']
-
-        quality_key = qualities(['OD', 'HD', 'SD'])
-        suffix = ['_small', '_mid', '']
-        formats = []
-        for k, v in stream_addr.items():
-            if v != '1':
-                continue
-            quality = quality_key(k)
-            if quality <= 0:
-                continue
-            for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
-                formats.append({
-                    'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
-                    % (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
-                    'format_id': '%s-%s' % (k, ext),
-                    'quality': quality,
-                    'source_preference': pref,
-                })
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'title': self._live_title(title),
-            'uploader': uploader,
-            'formats': formats,
-            'is_live': True,
-        }
index d3a83ea2bb5215e34e72ad85bf99867697e2e1b2..48fb9541693c35878317f22ed9dd6e2da4412ced 100644 (file)
@@ -8,6 +8,7 @@
 from ..utils import (
     int_or_none,
     parse_resolution,
+    str_or_none,
     try_get,
     unified_timestamp,
     url_or_none,
@@ -415,6 +416,7 @@ class PeerTubeIE(InfoExtractor):
                             peertube\.cpy\.re
                         )'''
     _UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
+    _API_BASE = 'https://%s/api/v1/videos/%s/%s'
     _VALID_URL = r'''(?x)
                     (?:
                         peertube:(?P<host>[^:]+):|
@@ -423,26 +425,30 @@ class PeerTubeIE(InfoExtractor):
                     (?P<id>%s)
                     ''' % (_INSTANCES_RE, _UUID_RE)
     _TESTS = [{
-        'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
-        'md5': '80f24ff364cc9d333529506a263e7feb',
+        'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
+        'md5': '9bed8c0137913e17b86334e5885aacff',
         'info_dict': {
-            'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
+            'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
             'ext': 'mp4',
-            'title': 'wow',
-            'description': 'wow such video, so gif',
+            'title': 'What is PeerTube?',
+            'description': 'md5:3fefb8dde2b189186ce0719fda6f7b10',
             'thumbnail': r're:https?://.*\.(?:jpg|png)',
-            'timestamp': 1519297480,
-            'upload_date': '20180222',
-            'uploader': 'Luclu7',
-            'uploader_id': '7fc42640-efdb-4505-a45d-a15b1a5496f1',
-            'uploder_url': 'https://peertube.nsa.ovh/accounts/luclu7',
-            'license': 'Unknown',
-            'duration': 3,
+            'timestamp': 1538391166,
+            'upload_date': '20181001',
+            'uploader': 'Framasoft',
+            'uploader_id': '3',
+            'uploader_url': 'https://framatube.org/accounts/framasoft',
+            'channel': 'Les vidéos de Framasoft',
+            'channel_id': '2',
+            'channel_url': 'https://framatube.org/video-channels/bf54d359-cfad-4935-9d45-9d6be93f63e8',
+            'language': 'en',
+            'license': 'Attribution - Share Alike',
+            'duration': 113,
             'view_count': int,
             'like_count': int,
             'dislike_count': int,
-            'tags': list,
-            'categories': list,
+            'tags': ['framasoft', 'peertube'],
+            'categories': ['Science & Technology'],
         }
     }, {
         'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
@@ -484,13 +490,38 @@ def _extract_urls(webpage, source_url):
                 entries = [peertube_url]
         return entries
 
+    def _call_api(self, host, video_id, path, note=None, errnote=None, fatal=True):
+        return self._download_json(
+            self._API_BASE % (host, video_id, path), video_id,
+            note=note, errnote=errnote, fatal=fatal)
+
+    def _get_subtitles(self, host, video_id):
+        captions = self._call_api(
+            host, video_id, 'captions', note='Downloading captions JSON',
+            fatal=False)
+        if not isinstance(captions, dict):
+            return
+        data = captions.get('data')
+        if not isinstance(data, list):
+            return
+        subtitles = {}
+        for e in data:
+            language_id = try_get(e, lambda x: x['language']['id'], compat_str)
+            caption_url = urljoin('https://%s' % host, e.get('captionPath'))
+            if not caption_url:
+                continue
+            subtitles.setdefault(language_id or 'en', []).append({
+                'url': caption_url,
+            })
+        return subtitles
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         host = mobj.group('host') or mobj.group('host_2')
         video_id = mobj.group('id')
 
-        video = self._download_json(
-            'https://%s/api/v1/videos/%s' % (host, video_id), video_id)
+        video = self._call_api(
+            host, video_id, '', note='Downloading video JSON')
 
         title = video['name']
 
 
         title = video['name']
 
@@ -513,10 +544,28 @@ def _real_extract(self, url):
             formats.append(f)
         self._sort_formats(formats)
 
-        def account_data(field):
-            return try_get(video, lambda x: x['account'][field], compat_str)
+        full_description = self._call_api(
+            host, video_id, 'description', note='Downloading description JSON',
+            fatal=False)
+
+        description = None
+        if isinstance(full_description, dict):
+            description = str_or_none(full_description.get('description'))
+        if not description:
+            description = video.get('description')
+
+        subtitles = self.extract_subtitles(host, video_id)
+
+        def data(section, field, type_):
+            return try_get(video, lambda x: x[section][field], type_)
+
+        def account_data(field, type_):
+            return data('account', field, type_)
+
+        def channel_data(field, type_):
+            return data('channel', field, type_)
 
-        category = try_get(video, lambda x: x['category']['label'], compat_str)
+        category = data('category', 'label', compat_str)
         categories = [category] if category else None
 
         nsfw = video.get('nsfw')
@@ -528,14 +577,17 @@ def account_data(field):
         return {
             'id': video_id,
             'title': title,
-            'description': video.get('description'),
+            'description': description,
             'thumbnail': urljoin(url, video.get('thumbnailPath')),
             'timestamp': unified_timestamp(video.get('publishedAt')),
-            'uploader': account_data('displayName'),
-            'uploader_id': account_data('uuid'),
-            'uploder_url': account_data('url'),
-            'license': try_get(
-                video, lambda x: x['licence']['label'], compat_str),
+            'uploader': account_data('displayName', compat_str),
+            'uploader_id': str_or_none(account_data('id', int)),
+            'uploader_url': url_or_none(account_data('url', compat_str)),
+            'channel': channel_data('displayName', compat_str),
+            'channel_id': str_or_none(channel_data('id', int)),
+            'channel_url': url_or_none(channel_data('url', compat_str)),
+            'language': data('language', 'id', compat_str),
+            'license': data('licence', 'label', compat_str),
             'duration': int_or_none(video.get('duration')),
             'view_count': int_or_none(video.get('views')),
             'like_count': int_or_none(video.get('likes')),
@@ -544,4 +596,5 @@ def account_data(field):
             'tags': try_get(video, lambda x: x['tags'], list),
             'categories': categories,
             'formats': formats,
+            'subtitles': subtitles
         }
index c02e34abac8720361f94b7085e5f7fb3814df312..b15906390d07715494b5653dce5499ca0ad72141 100644 (file)
@@ -18,7 +18,7 @@ def _call_api(self, method, query, item_id):
             item_id, query=query)
 
     def _parse_broadcast_data(self, broadcast, video_id):
-        title = broadcast['status']
+        title = broadcast.get('status') or 'Periscope Broadcast'
         uploader = broadcast.get('user_display_name') or broadcast.get('username')
         title = '%s - %s' % (uploader, title) if uploader else title
         is_live = broadcast.get('state').lower() == 'running'
index 602207bebdd6a01d7f33dbf08302ab5a75ccf207..23c8256b59dab4a92ae79ef48dc8e3b0adf0ff68 100644 (file)
@@ -46,7 +46,7 @@ def _login(self):
             headers={'Referer': self._LOGIN_URL})
 
         # login succeeded
-        if 'platzi.com/login' not in compat_str(urlh.geturl()):
+        if 'platzi.com/login' not in urlh.geturl():
             return
 
         login_error = self._webpage_read_content(
index dd5f17f1192c3543636f6ff24624b0c9cc9a0bd6..80222d42831b8e58116449f96a615ea250b2985f 100644 (file)
@@ -20,20 +20,16 @@ class PokemonIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'The Ol’ Raise and Switch!',
             'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
-            'timestamp': 1511824728,
-            'upload_date': '20171127',
         },
         'add_id': ['LimelightMedia'],
     }, {
         # no data-video-title
-        'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008',
+        'url': 'https://www.pokemon.com/fr/episodes-pokemon/films-pokemon/pokemon-lascension-de-darkrai-2008',
         'info_dict': {
-            'id': '99f3bae270bf4e5097274817239ce9c8',
+            'id': 'dfbaf830d7e54e179837c50c0c6cc0e1',
             'ext': 'mp4',
-            'title': 'Pokémon: The Rise of Darkrai',
-            'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d',
-            'timestamp': 1417778347,
-            'upload_date': '20141205',
+            'title': "Pokémon : L'ascension de Darkrai",
+            'description': 'md5:d1dbc9e206070c3e14a06ff557659fb5',
         },
         'add_id': ['LimelightMedia'],
         'params': {
diff --git a/youtube_dl/extractor/popcorntimes.py b/youtube_dl/extractor/popcorntimes.py
new file mode 100644 (file)
index 0000000..7bf7f98
--- /dev/null
@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_b64decode,
+    compat_chr,
+)
+from ..utils import int_or_none
+
+
+class PopcorntimesIE(InfoExtractor):
+    _VALID_URL = r'https?://popcorntimes\.tv/[^/]+/m/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://popcorntimes.tv/de/m/A1XCFvz/haensel-und-gretel-opera-fantasy',
+        'md5': '93f210991ad94ba8c3485950a2453257',
+        'info_dict': {
+            'id': 'A1XCFvz',
+            'display_id': 'haensel-und-gretel-opera-fantasy',
+            'ext': 'mp4',
+            'title': 'Hänsel und Gretel',
+            'description': 'md5:1b8146791726342e7b22ce8125cf6945',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'creator': 'John Paul',
+            'release_date': '19541009',
+            'duration': 4260,
+            'tbr': 5380,
+            'width': 720,
+            'height': 540,
+        },
+    }
+
+    def _real_extract(self, url):
+        mobj = re.match(self._VALID_URL, url)
+        video_id, display_id = mobj.group('id', 'display_id')
+
+        webpage = self._download_webpage(url, display_id)
+
+        title = self._search_regex(
+            r'<h1>([^<]+)', webpage, 'title',
+            default=None) or self._html_search_meta(
+            'ya:ovs:original_name', webpage, 'title', fatal=True)
+
+        loc = self._search_regex(
+            r'PCTMLOC\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, 'loc',
+            group='value')
+
+        loc_b64 = ''
+        for c in loc:
+            c_ord = ord(c)
+            if ord('a') <= c_ord <= ord('z') or ord('A') <= c_ord <= ord('Z'):
+                upper = ord('Z') if c_ord <= ord('Z') else ord('z')
+                c_ord += 13
+                if upper < c_ord:
+                    c_ord -= 26
+            loc_b64 += compat_chr(c_ord)
+
+        video_url = compat_b64decode(loc_b64).decode('utf-8')
+
+        description = self._html_search_regex(
+            r'(?s)<div[^>]+class=["\']pt-movie-desc[^>]+>(.+?)</div>', webpage,
+            'description', fatal=False)
+
+        thumbnail = self._search_regex(
+            r'<img[^>]+class=["\']video-preview[^>]+\bsrc=(["\'])(?P<value>(?:(?!\1).)+)\1',
+            webpage, 'thumbnail', default=None,
+            group='value') or self._og_search_thumbnail(webpage)
+
+        creator = self._html_search_meta(
+            'video:director', webpage, 'creator', default=None)
+
+        release_date = self._html_search_meta(
+            'video:release_date', webpage, default=None)
+        if release_date:
+            release_date = release_date.replace('-', '')
+
+        def int_meta(name):
+            return int_or_none(self._html_search_meta(
+                name, webpage, default=None))
+
+        return {
+            'id': video_id,
+            'display_id': display_id,
+            'url': video_url,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'creator': creator,
+            'release_date': release_date,
+            'duration': int_meta('video:duration'),
+            'tbr': int_meta('ya:ovs:bitrate'),
+            'width': int_meta('og:video:width'),
+            'height': int_meta('og:video:height'),
+            'http_headers': {
+                'Referer': url,
+            },
+        }
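The PCTMLOC deobfuscation implemented character-by-character in the new extractor above is plain ROT13 followed by base64 decoding. A minimal standalone sketch of the same transform (the helper name and sample value are illustrative, not from the site):

```python
import base64
import codecs

def decode_pctmloc(loc):
    # The extractor's loop shifts letters by 13 positions, which is
    # exactly ROT13; Python's codecs module provides the same
    # transform. The result is a standard base64 string that decodes
    # to the media URL.
    return base64.b64decode(codecs.decode(loc, 'rot13')).decode('utf-8')
```

Round-tripping a sample URL through base64-then-ROT13 and back recovers the original string, which is all the page-side obfuscation amounts to.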
index 27d65d4b9cdcf1e068b0d6971502eaa0caccf895..c6052ac9f966f332d0cfb1f7acfe68b0a143d2b7 100644 (file)
@@ -8,6 +8,7 @@
     ExtractorError,
     int_or_none,
     js_to_json,
+    merge_dicts,
     urljoin,
 )
 
@@ -27,23 +28,22 @@ class PornHdIE(InfoExtractor):
             'view_count': int,
             'like_count': int,
             'age_limit': 18,
-        }
+        },
+        'skip': 'HTTP Error 404: Not Found',
     }, {
-        # removed video
         'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
-        'md5': '956b8ca569f7f4d8ec563e2c41598441',
+        'md5': '1b7b3a40b9d65a8e5b25f7ab9ee6d6de',
         'info_dict': {
             'id': '1962',
             'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
             'ext': 'mp4',
-            'title': 'Sierra loves doing laundry',
+            'title': 'md5:98c6f8b2d9c229d0f0fde47f61a1a759',
             'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
             'thumbnail': r're:^https?://.*\.jpg',
             'view_count': int,
             'like_count': int,
             'age_limit': 18,
         },
-        'skip': 'Not available anymore',
     }]
 
     def _real_extract(self, url):
@@ -61,7 +61,13 @@ def _real_extract(self, url):
             r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
             webpage, 'sources', default='{}')), video_id)
 
             r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
             webpage, 'sources', default='{}')), video_id)
 
+        info = {}
         if not sources:
+            entries = self._parse_html5_media_entries(url, webpage, video_id)
+            if entries:
+                info = entries[0]
+
+        if not sources and not info:
             message = self._html_search_regex(
                 r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
                 webpage, 'error message', group='value')
@@ -80,23 +86,29 @@ def _real_extract(self, url):
                 'format_id': format_id,
                 'height': height,
             })
-        self._sort_formats(formats)
+        if formats:
+            info['formats'] = formats
+        self._sort_formats(info['formats'])
 
         description = self._html_search_regex(
-            r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
-            webpage, 'description', fatal=False, group='value')
+            (r'(?s)<section[^>]+class=["\']video-description[^>]+>(?P<value>.+?)</section>',
+             r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1'),
+            webpage, 'description', fatal=False,
+            group='value') or self._html_search_meta(
+            'description', webpage, default=None) or self._og_search_description(webpage)
         view_count = int_or_none(self._html_search_regex(
             r'(\d+) views\s*<', webpage, 'view count', fatal=False))
         thumbnail = self._search_regex(
             r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage,
-            'thumbnail', fatal=False, group='url')
+            'thumbnail', default=None, group='url')
 
         like_count = int_or_none(self._search_regex(
-            (r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
+            (r'(\d+)</span>\s*likes',
+             r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
              r'class=["\']save-count["\'][^>]*>\s*(\d+)'),
             webpage, 'like count', fatal=False))
 
-        return {
+        return merge_dicts(info, {
             'id': video_id,
             'display_id': display_id,
             'title': title,
@@ -106,4 +118,4 @@ def _real_extract(self, url):
             'like_count': like_count,
             'formats': formats,
             'age_limit': 18,
-        }
+        })
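The switch to `merge_dicts` above lets the HTML5-media entry back-fill fields the regex pass missed: for each key, the first non-empty value in argument order wins. A simplified model of that behavior (not the exact `youtube_dl.utils` implementation, which has extra string-emptiness rules):

```python
def merge_dicts_simple(*dicts):
    # Simplified model of youtube_dl.utils.merge_dicts: walk the
    # dicts left to right, skip None values, and only overwrite an
    # existing value when it was an empty string.
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v is None:
                continue
            if k not in merged or (merged[k] == '' and v != ''):
                merged[k] = v
    return merged
```

So `merge_dicts(info, {...})` keeps everything already extracted into `info` and fills only the gaps from the literal dict.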
index ba0ad7da29d188f5e920376805bf7532a1613bee..3567a32839eef2f75123a3f1b939038cf3eaf678 100644 (file)
@@ -17,6 +17,7 @@
     determine_ext,
     ExtractorError,
     int_or_none,
+    NO_DEFAULT,
     orderedSet,
     remove_quotes,
     str_to_int,
@@ -51,7 +52,7 @@ class PornHubIE(PornHubBaseIE):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:
-                            (?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
+                            (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
                             (?:www\.)?thumbzilla\.com/video/
                         )
                         (?P<id>[\da-z]+)
@@ -148,6 +149,9 @@ class PornHubIE(PornHubBaseIE):
     }, {
         'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
         'only_matching': True,
+    }, {
+        'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
+        'only_matching': True,
     }]
 
     @staticmethod
@@ -165,6 +169,13 @@ def _real_extract(self, url):
         host = mobj.group('host') or 'pornhub.com'
         video_id = mobj.group('id')
 
+        if 'premium' in host:
+            if not self._downloader.params.get('cookiefile'):
+                raise ExtractorError(
+                    'PornHub Premium requires authentication.'
+                    ' You may want to use --cookies.',
+                    expected=True)
+
         self._set_cookie(host, 'age_verified', '1')
 
         def dl_webpage(platform):
@@ -188,10 +199,10 @@ def dl_webpage(platform):
         # http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
         # on that anymore.
         title = self._html_search_meta(
-            'twitter:title', webpage, default=None) or self._search_regex(
-            (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
-             r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
-             r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
+            'twitter:title', webpage, default=None) or self._html_search_regex(
+            (r'(?s)<h1[^>]+class=["\']title["\'][^>]*>(?P<title>.+?)</h1>',
+             r'<div[^>]+data-video-title=(["\'])(?P<title>(?:(?!\1).)+)\1',
+             r'shareTitle["\']\s*[=:]\s*(["\'])(?P<title>(?:(?!\1).)+)\1'),
             webpage, 'title', group='title')
 
         video_urls = []
@@ -227,12 +238,13 @@ def dl_webpage(platform):
         else:
             thumbnail, duration = [None] * 2
 
-        if not video_urls:
-            tv_webpage = dl_webpage('tv')
-
+        def extract_js_vars(webpage, pattern, default=NO_DEFAULT):
             assignments = self._search_regex(
-                r'(var.+?mediastring.+?)</script>', tv_webpage,
-                'encoded url').split(';')
+                pattern, webpage, 'encoded url', default=default)
+            if not assignments:
+                return {}
+
+            assignments = assignments.split(';')
 
             js_vars = {}
 
@@ -254,11 +266,35 @@ def parse_js_value(inp):
                 assn = re.sub(r'var\s+', '', assn)
                 vname, value = assn.split('=', 1)
                 js_vars[vname] = parse_js_value(value)
+            return js_vars
 
-            video_url = js_vars['mediastring']
-            if video_url not in video_urls_set:
-                video_urls.append((video_url, None))
-                video_urls_set.add(video_url)
+        def add_video_url(video_url):
+            v_url = url_or_none(video_url)
+            if not v_url:
+                return
+            if v_url in video_urls_set:
+                return
+            video_urls.append((v_url, None))
+            video_urls_set.add(v_url)
+
+        if not video_urls:
+            FORMAT_PREFIXES = ('media', 'quality')
+            js_vars = extract_js_vars(
+                webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
+                default=None)
+            if js_vars:
+                for key, format_url in js_vars.items():
+                    if any(key.startswith(p) for p in FORMAT_PREFIXES):
+                        add_video_url(format_url)
+            if not video_urls and re.search(
+                    r'<[^>]+\bid=["\']lockedPlayer', webpage):
+                raise ExtractorError(
+                    'Video %s is locked' % video_id, expected=True)
+
+        if not video_urls:
+            js_vars = extract_js_vars(
+                dl_webpage('tv'), r'(var.+?mediastring.+?)</script>')
+            add_video_url(js_vars['mediastring'])
 
         for mobj in re.finditer(
                 r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1',
@@ -276,10 +312,16 @@ def parse_js_value(inp):
                     r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
                 if upload_date:
                     upload_date = upload_date.replace('/', '')
-            if determine_ext(video_url) == 'mpd':
+            ext = determine_ext(video_url)
+            if ext == 'mpd':
                 formats.extend(self._extract_mpd_formats(
                     video_url, video_id, mpd_id='dash', fatal=False))
                 continue
+            elif ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+                continue
             tbr = None
             mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
             if mobj:
@@ -373,7 +415,7 @@ def _real_extract(self, url):
 
 
 class PornHubUserIE(PornHubPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph',
         'playlist_mincount': 118,
@@ -441,7 +483,7 @@ def _real_extract(self, url):
 
 
 class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
+    _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/model/zoe_ph/videos',
         'only_matching': True,
@@ -556,7 +598,7 @@ def suitable(cls, url):
 
 
 class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
-    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
+    _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
     _TESTS = [{
         'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
         'info_dict': {
index e19a470a5eee5efda48fe88a694c7e3a62010963..e470882922ffe8f22fad735f73899e0104a4a561 100644 (file)
     determine_ext,
     float_or_none,
     int_or_none,
+    merge_dicts,
     unified_strdate,
 )
 
 
 class ProSiebenSat1BaseIE(InfoExtractor):
-    _GEO_COUNTRIES = ['DE']
+    _GEO_BYPASS = False
     _ACCESS_ID = None
     _SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
     _V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
@@ -39,14 +40,18 @@ def _extract_video_info(self, url, clip_id):
         formats = []
         if self._ACCESS_ID:
             raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
-            server_token = (self._download_json(
+            protocols = self._download_json(
                 self._V4_BASE_URL + 'protocols', clip_id,
                 'Downloading protocols JSON',
                 headers=self.geo_verification_headers(), query={
                     'access_id': self._ACCESS_ID,
                     'client_token': sha1((raw_ct).encode()).hexdigest(),
                     'video_id': clip_id,
-                }, fatal=False) or {}).get('server_token')
+                }, fatal=False, expected_status=(403,)) or {}
+            error = protocols.get('error') or {}
+            if error.get('title') == 'Geo check failed':
+                self.raise_geo_restricted(countries=['AT', 'CH', 'DE'])
+            server_token = protocols.get('server_token')
             if server_token:
                 urls = (self._download_json(
                     self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
@@ -171,7 +176,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                         (?:
                             (?:beta\.)?
                             (?:
-                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
+                                prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|advopedia
                             )\.(?:de|at|ch)|
                             ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
                         )
@@ -189,10 +194,14 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'info_dict': {
                 'id': '2104602',
                 'ext': 'mp4',
-                'title': 'Episode 18 - Staffel 2',
+                'title': 'CIRCUS HALLIGALLI - Episode 18 - Staffel 2',
                 'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
                 'upload_date': '20131231',
                 'duration': 5845.04,
+                'series': 'CIRCUS HALLIGALLI',
+                'season_number': 2,
+                'episode': 'Episode 18 - Staffel 2',
+                'episode_number': 18,
             },
         },
         {
@@ -296,8 +305,9 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'info_dict': {
                 'id': '2572814',
                 'ext': 'mp4',
-                'title': 'Andreas Kümmert: Rocket Man',
+                'title': 'The Voice of Germany - Andreas Kümmert: Rocket Man',
                 'description': 'md5:6ddb02b0781c6adf778afea606652e38',
+                'timestamp': 1382041620,
                 'upload_date': '20131017',
                 'duration': 469.88,
             },
@@ -306,7 +316,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             },
         },
         {
-            'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
+            'url': 'http://www.fem.com/videos/beauty-lifestyle/kurztrips-zum-valentinstag',
             'info_dict': {
                 'id': '2156342',
                 'ext': 'mp4',
@@ -328,19 +338,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
             'playlist_count': 2,
             'skip': 'This video is unavailable',
         },
-        {
-            'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
-            'info_dict': {
-                'id': '4187506',
-                'ext': 'mp4',
-                'title': 'Best of Circus HalliGalli',
-                'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
-                'upload_date': '20151229',
-            },
-            'params': {
-                'skip_download': True,
-            },
-        },
         {
             # title in <h2 class="subtitle">
             'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
@@ -417,7 +414,6 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
         r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
     ]
     _UPLOAD_DATE_REGEXES = [
-        r'<meta property="og:published_time" content="(.+?)">',
         r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
         r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
         r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
@@ -447,17 +443,21 @@ def _extract_clip(self, url, webpage):
         if description is None:
             description = self._og_search_description(webpage)
         thumbnail = self._og_search_thumbnail(webpage)
-        upload_date = unified_strdate(self._html_search_regex(
-            self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
+        upload_date = unified_strdate(
+            self._html_search_meta('og:published_time', webpage,
+                                   'upload date', default=None)
+            or self._html_search_regex(self._UPLOAD_DATE_REGEXES,
+                                       webpage, 'upload date', default=None))
+
+        json_ld = self._search_json_ld(webpage, clip_id, default={})
 
-        info.update({
+        return merge_dicts(info, {
             'id': clip_id,
             'title': title,
             'description': description,
             'thumbnail': thumbnail,
             'upload_date': upload_date,
-        })
-        return info
+        }, json_ld)
 
     def _extract_playlist(self, url, webpage):
         playlist_id = self._html_search_regex(
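The geo-restriction handling added to `ProSiebenSat1BaseIE` above downloads the protocols JSON while tolerating an HTTP 403 (`expected_status=(403,)`), then inspects the error payload before falling back to the server token. The same pattern in isolation (class and function names are illustrative stand-ins, not youtube-dl API):

```python
class GeoRestrictedError(Exception):
    # Stand-in for the error raised by raise_geo_restricted().
    def __init__(self, countries):
        super().__init__('geo restricted to %s' % ', '.join(countries))
        self.countries = countries

def server_token_or_geo_error(protocols):
    # Mirror of the added logic: a 403 response body is parsed rather
    # than treated as fatal, and a 'Geo check failed' error title is
    # turned into an explicit geo-restriction error for AT/CH/DE.
    protocols = protocols or {}
    error = protocols.get('error') or {}
    if error.get('title') == 'Geo check failed':
        raise GeoRestrictedError(['AT', 'CH', 'DE'])
    return protocols.get('server_token')
```

Reporting the restriction explicitly (instead of the old blanket `_GEO_COUNTRIES = ['DE']` bypass) gives the user an actionable error rather than a silent download failure.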
index fb704a3c4390b9da5b6fa3a5dad027c6e812b7eb..ca71665e0fabf958738192b497130ee12d6ad1f6 100644 (file)
@@ -82,17 +82,6 @@ def _real_extract(self, url):
         urls = []
         formats = []
 
-        def add_http_from_hls(m3u8_f):
-            http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
-            if http_url != m3u8_f['url']:
-                f = m3u8_f.copy()
-                f.update({
-                    'format_id': f['format_id'].replace('hls', 'http'),
-                    'protocol': 'http',
-                    'url': http_url,
-                })
-                formats.append(f)
-
         for video in videos['data']['videos']:
             media_url = url_or_none(video.get('url'))
             if not media_url or media_url in urls:
@@ -101,12 +90,9 @@ def add_http_from_hls(m3u8_f):
 
             playlist = video.get('is_playlist')
             if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
-                m3u8_formats = self._extract_m3u8_formats(
+                formats.extend(self._extract_m3u8_formats(
                     media_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id='hls', fatal=False)
-                for m3u8_f in m3u8_formats:
-                    formats.append(m3u8_f)
-                    add_http_from_hls(m3u8_f)
+                    m3u8_id='hls', fatal=False))
                 continue
 
             quality = int_or_none(video.get('quality'))
@@ -128,8 +114,6 @@ def add_http_from_hls(m3u8_f):
                 format_id += '-%sp' % quality
             f['format_id'] = format_id
             formats.append(f)
-            if is_hls:
-                add_http_from_hls(f)
         self._sort_formats(formats)
 
         creator = try_get(
index 5c84028ef97e8220d494633f5ada42804fa2ae7f..2d2f6a98c97dba8605cb9f640c7c73d860caa1d0 100644 (file)
@@ -4,6 +4,7 @@
 
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     ExtractorError,
     int_or_none,
     merge_dicts,
@@ -43,14 +44,21 @@ def _real_extract(self, url):
         webpage = self._download_webpage(
             'http://www.redtube.com/%s' % video_id, video_id)
 
-        if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
-            raise ExtractorError('Video %s has been removed' % video_id, expected=True)
+        ERRORS = (
+            (('video-deleted-info', '>This video has been removed'), 'has been removed'),
+            (('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
+        )
+
+        for patterns, message in ERRORS:
+            if any(p in webpage for p in patterns):
+                raise ExtractorError(
+                    'Video %s %s' % (video_id, message), expected=True)
 
         info = self._search_json_ld(webpage, video_id, default={})
 
         if not info.get('title'):
             info['title'] = self._html_search_regex(
-                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
+                (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle|video_title)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
                  r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
                 webpage, 'title', group='title',
                 default=None) or self._og_search_title(webpage)
@@ -70,7 +78,7 @@ def _real_extract(self, url):
                     })
         medias = self._parse_json(
             self._search_regex(
-                r'mediaDefinition\s*:\s*(\[.+?\])', webpage,
+                r'mediaDefinition["\']?\s*:\s*(\[.+?}\s*\])', webpage,
                 'media definitions', default='{}'),
             video_id, fatal=False)
         if medias and isinstance(medias, list):
@@ -78,6 +86,12 @@ def _real_extract(self, url):
                 format_url = url_or_none(media.get('videoUrl'))
                 if not format_url:
                     continue
+                if media.get('format') == 'hls' or determine_ext(format_url) == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native', m3u8_id='hls',
+                        fatal=False))
+                    continue
                 format_id = media.get('quality')
                 formats.append({
                     'url': format_url,
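The redtube hunk replaces a single removed-video check with a table-driven `ERRORS` tuple. A standalone sketch of that pattern; `ExtractorError` is swapped for a plain `ValueError` so the snippet runs on its own:

```python
ERRORS = (
    (('video-deleted-info', '>This video has been removed'), 'has been removed'),
    (('private_video_text', '>This video is private'), 'is private'),
)

def check_availability(webpage, video_id):
    # Each entry pairs page markers with the message suffix to raise,
    # as the extractor does with ExtractorError(..., expected=True).
    for patterns, message in ERRORS:
        if any(p in webpage for p in patterns):
            raise ValueError('Video %s %s' % (video_id, message))
```

Adding a new error case is then a one-line change to the table rather than another `if` block.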
index bd9ee1647d47d47bfc8d8341139c2bf953ecf158..2cc66512241dbc6f65589e52d842cf70b7250ccb 100644 (file)
@@ -8,7 +8,6 @@
 
 from ..compat import (
     compat_parse_qs,
-    compat_str,
     compat_urlparse,
 )
 from ..utils import (
@@ -39,13 +38,13 @@ def _login(self):
             'Downloading login page')
 
         def is_logged(urlh):
-            return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())
+            return 'learning.oreilly.com/home/' in urlh.geturl()
 
         if is_logged(urlh):
             self.LOGGED_IN = True
             return
 
-        redirect_url = compat_str(urlh.geturl())
+        redirect_url = urlh.geturl()
         parsed_url = compat_urlparse.urlparse(redirect_url)
         qs = compat_parse_qs(parsed_url.query)
         next_uri = compat_urlparse.urljoin(
@@ -165,7 +164,8 @@ def _real_extract(self, url):
             kaltura_session = self._download_json(
                 '%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
                 video_id, 'Downloading kaltura session JSON',
-                'Unable to download kaltura session JSON', fatal=False)
+                'Unable to download kaltura session JSON', fatal=False,
+                headers={'Accept': 'application/json'})
             if kaltura_session:
                 session = kaltura_session.get('session')
                 if session:
index 8b3275735b1638b98c9c64f8360a6498c525a8f7..b40b4c4afded1b6f9541d60b3b2d3fb5fe0c5973 100644 (file)
@@ -7,6 +7,7 @@
 
 from .aws import AWSIE
 from .anvato import AnvatoIE
+from .common import InfoExtractor
 from ..utils import (
     smuggle_url,
     urlencode_postdata,
@@ -102,3 +103,50 @@ def get(key):
                 'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
                 {'geo_countries': ['US']}),
             AnvatoIE.ie_key(), video_id=mcp_id)
+
+
+class ScrippsNetworksIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
+        'info_dict': {
+            'id': '0260338',
+            'ext': 'mp4',
+            'title': 'The Best of the Best',
+            'description': 'Catch a new episode of MasterChef Canada Tuedsay at 9/8c.',
+            'timestamp': 1475678834,
+            'upload_date': '20161005',
+            'uploader': 'SCNI-SCND',
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'https://www.diynetwork.com/videos/diy-barnwood-tablet-stand-0265790',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.hgtv.com/videos/cookie-decorating-101-0301929',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
+        'only_matching': True,
+    }]
+    _ACCOUNT_MAP = {
+        'cookingchanneltv': 2433005105,
+        'discovery': 2706091867,
+        'diynetwork': 2433004575,
+        'foodnetwork': 2433005105,
+        'hgtv': 2433004575,
+        'travelchannel': 2433005739,
+    }
+    _TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'
+
+    def _real_extract(self, url):
+        site, guid = re.match(self._VALID_URL, url).groups()
+        return self.url_result(smuggle_url(
+            self._TP_TEMPL % (self._ACCOUNT_MAP[site], guid),
+            {'force_smil_url': True}), 'ThePlatform', guid)
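The new `ScrippsNetworksIE` maps the `(?P<site>...)` group of `_VALID_URL` to a ThePlatform account id and formats the trailing digits into a media URL. A standalone sketch of that mapping, using the regex and tables from the class above:

```python
import re

_VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
_ACCOUNT_MAP = {
    'cookingchanneltv': 2433005105, 'discovery': 2706091867,
    'diynetwork': 2433004575, 'foodnetwork': 2433005105,
    'hgtv': 2433004575, 'travelchannel': 2433005739,
}
_TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'

def theplatform_url(url):
    # The site group picks the account id; the digits after the last
    # hyphen become the media GUID, exactly as in _real_extract above.
    site, guid = re.match(_VALID_URL, url).groups()
    return _TP_TEMPL % (_ACCOUNT_MAP[site], guid)
```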
index e579d42cf525b56500b98c84272b67b279e36baa..9401bf2cf7fcdad2eb218f5a3d072399932fea9a 100644 (file)
@@ -7,9 +7,18 @@
 
 
 class ServusIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)/(?P<id>[aA]{2}-\w+|\d+-\d+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
+                            servustv\.com/videos
+                        )
+                        /(?P<id>[aA]{2}-\w+|\d+-\d+)
+                    '''
     _TESTS = [{
-        'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
+        # new URL schema
+        'url': 'https://www.servustv.com/videos/aa-1t6vbu5pw1w12/',
         'md5': '3e1dd16775aa8d5cbef23628cfffc1f4',
         'info_dict': {
             'id': 'AA-1T6VBU5PW1W12',
@@ -18,6 +27,10 @@ class ServusIE(InfoExtractor):
             'description': 'md5:1247204d85783afe3682644398ff2ec4',
             'thumbnail': r're:^https?://.*\.jpg',
         }
+    }, {
+        # old URL schema
+        'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
+        'only_matching': True,
     }, {
         'url': 'https://www.servus.com/at/p/Wie-das-Leben-beginnt/1309984137314-381415152/',
         'only_matching': True,
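The servus hunk widens `_VALID_URL` into a verbose `(?x)` pattern covering both the old servus.com and new servustv.com schemas. A quick standalone check of that regex (helper name is illustrative):

```python
import re

# The widened ServusIE pattern from the hunk above.
_VALID_URL = r'''(?x)
                https?://
                    (?:www\.)?
                    (?:
                        servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
                        servustv\.com/videos
                    )
                    /(?P<id>[aA]{2}-\w+|\d+-\d+)
                '''

def extract_id(url):
    # Return the video id if the URL matches either schema, else None.
    mobj = re.match(_VALID_URL, url)
    return mobj.group('id') if mobj else None
```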
index c2ee54457e90909fea418f38320d9e5480ccc379..d37c52543f1469652d69bea603e7d4f296d2bb50 100644 (file)
@@ -9,10 +9,13 @@
     SearchInfoExtractor
 )
 from ..compat import (
+    compat_HTTPError,
+    compat_kwargs,
     compat_str,
     compat_urlparse,
 )
 from ..utils import (
+    error_to_compat_str,
     ExtractorError,
     float_or_none,
     HEADRequest,
@@ -24,6 +27,7 @@
     unified_timestamp,
     update_url_query,
     url_or_none,
+    urlhandle_detect_ext,
 )
 
 
@@ -93,7 +97,7 @@ class SoundcloudIE(InfoExtractor):
                 'repost_count': int,
             }
         },
-        # not streamable song
+        # geo-restricted
         {
             'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
             'info_dict': {
@@ -105,18 +109,13 @@ class SoundcloudIE(InfoExtractor):
                 'uploader_id': '9615865',
                 'timestamp': 1337635207,
                 'upload_date': '20120521',
-                'duration': 30,
+                'duration': 227.155,
                 'license': 'all-rights-reserved',
                 'view_count': int,
                 'like_count': int,
                 'comment_count': int,
                 'repost_count': int,
             },
-            'params': {
-                # rtmp
-                'skip_download': True,
-            },
-            'skip': 'Preview',
         },
         # private link
         {
@@ -227,7 +226,6 @@ class SoundcloudIE(InfoExtractor):
                 'skip_download': True,
             },
         },
-        # not available via api.soundcloud.com/i1/tracks/id/streams
         {
             'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
             'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
@@ -236,7 +234,7 @@ class SoundcloudIE(InfoExtractor):
                 'ext': 'mp3',
                 'title': 'Mezzo Valzer',
                 'description': 'md5:4138d582f81866a530317bae316e8b61',
-                'uploader': 'Giovanni Sarani',
+                'uploader': 'Micronie',
                 'uploader_id': '3352531',
                 'timestamp': 1551394171,
                 'upload_date': '20190228',
@@ -248,14 +246,16 @@ class SoundcloudIE(InfoExtractor):
                 'comment_count': int,
                 'repost_count': int,
             },
-            'expected_warnings': ['Unable to download JSON metadata'],
-        }
+        },
+        {
+            # with AAC HQ format available via OAuth token
+            'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
+            'only_matching': True,
+        },
     ]
 
-    _API_BASE = 'https://api.soundcloud.com/'
     _API_V2_BASE = 'https://api-v2.soundcloud.com/'
     _BASE_URL = 'https://soundcloud.com/'
-    _CLIENT_ID = 'UW9ajvMgVdMMW3cdeBi8lPfN6dvOVGji'
     _IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
 
     _ARTWORK_MAP = {
@@ -271,14 +271,53 @@ class SoundcloudIE(InfoExtractor):
         'original': 0,
     }
 
+    def _store_client_id(self, client_id):
+        self._downloader.cache.store('soundcloud', 'client_id', client_id)
+
+    def _update_client_id(self):
+        webpage = self._download_webpage('https://soundcloud.com/', None)
+        for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
+            script = self._download_webpage(src, None, fatal=False)
+            if script:
+                client_id = self._search_regex(
+                    r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
+                    script, 'client id', default=None)
+                if client_id:
+                    self._CLIENT_ID = client_id
+                    self._store_client_id(client_id)
+                    return
+        raise ExtractorError('Unable to extract client id')
+
+    def _download_json(self, *args, **kwargs):
+        non_fatal = kwargs.get('fatal') is False
+        if non_fatal:
+            del kwargs['fatal']
+        query = kwargs.get('query', {}).copy()
+        for _ in range(2):
+            query['client_id'] = self._CLIENT_ID
+            kwargs['query'] = query
+            try:
+                return super(SoundcloudIE, self)._download_json(*args, **compat_kwargs(kwargs))
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                    self._store_client_id(None)
+                    self._update_client_id()
+                    continue
+                elif non_fatal:
+                    self._downloader.report_warning(error_to_compat_str(e))
+                    return False
+                raise
+
+    def _real_initialize(self):
+        self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
+
     @classmethod
     def _resolv_url(cls, url):
-        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url + '&client_id=' + cls._CLIENT_ID
+        return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
 
 
-    def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2):
+    def _extract_info_dict(self, info, full_title=None, secret_token=None):
         track_id = compat_str(info['id'])
         title = info['title']
-        track_base_url = self._API_BASE + 'tracks/%s' % track_id
 
         format_urls = set()
         formats = []
@@ -287,26 +326,27 @@ def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2
             query['secret_token'] = secret_token
 
         if info.get('downloadable') and info.get('has_downloads_left'):
-            format_url = update_url_query(
-                info.get('download_url') or track_base_url + '/download', query)
-            format_urls.add(format_url)
-            if version == 2:
-                v1_info = self._download_json(
-                    track_base_url, track_id, query=query, fatal=False) or {}
-            else:
-                v1_info = info
-            formats.append({
-                'format_id': 'download',
-                'ext': v1_info.get('original_format') or 'mp3',
-                'filesize': int_or_none(v1_info.get('original_content_size')),
-                'url': format_url,
-                'preference': 10,
-            })
+            download_url = update_url_query(
+                self._API_V2_BASE + 'tracks/' + track_id + '/download', query)
+            redirect_url = (self._download_json(download_url, track_id, fatal=False) or {}).get('redirectUri')
+            if redirect_url:
+                urlh = self._request_webpage(
+                    HEADRequest(redirect_url), track_id, fatal=False)
+                if urlh:
+                    format_url = urlh.geturl()
+                    format_urls.add(format_url)
+                    formats.append({
+                        'format_id': 'download',
+                        'ext': urlhandle_detect_ext(urlh) or 'mp3',
+                        'filesize': int_or_none(urlh.headers.get('Content-Length')),
+                        'url': format_url,
+                        'preference': 10,
+                    })
 
         def invalid_url(url):
-            return not url or url in format_urls or re.search(r'/(?:preview|playlist)/0/30/', url)
+            return not url or url in format_urls
 
 
-        def add_format(f, protocol):
+        def add_format(f, protocol, is_preview=False):
             mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
             if mobj:
                 for k, v in mobj.groupdict().items():
@@ -315,16 +355,27 @@ def add_format(f, protocol):
             format_id_list = []
             if protocol:
                 format_id_list.append(protocol)
+            ext = f.get('ext')
+            if ext == 'aac':
+                f['abr'] = '256'
             for k in ('ext', 'abr'):
                 v = f.get(k)
                 if v:
                     format_id_list.append(v)
+            preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
+            if preview:
+                format_id_list.append('preview')
             abr = f.get('abr')
             if abr:
                 f['abr'] = int(abr)
+            if protocol == 'hls':
+                protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
+            else:
+                protocol = 'http'
             f.update({
                 'format_id': '_'.join(format_id_list),
-                'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
+                'protocol': protocol,
+                'preference': -10 if preview else None,
             })
             formats.append(f)
 
@@ -335,7 +386,7 @@ def add_format(f, protocol):
             if not isinstance(t, dict):
                 continue
             format_url = url_or_none(t.get('url'))
-            if not format_url or t.get('snipped') or '/preview/' in format_url:
+            if not format_url:
                 continue
             stream = self._download_json(
                 format_url, track_id, query=query, fatal=False)
@@ -358,44 +409,14 @@ def add_format(f, protocol):
             add_format({
                 'url': stream_url,
                 'ext': ext,
-            }, 'http' if protocol == 'progressive' else protocol)
-
-        if not formats:
-            # Old API, does not work for some tracks (e.g.
-            # https://soundcloud.com/giovannisarani/mezzo-valzer)
-            # and might serve preview URLs (e.g.
-            # http://www.soundcloud.com/snbrn/ele)
-            format_dict = self._download_json(
-                track_base_url + '/streams', track_id,
-                'Downloading track url', query=query, fatal=False) or {}
-
-            for key, stream_url in format_dict.items():
-                if invalid_url(stream_url):
-                    continue
-                format_urls.add(stream_url)
-                mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
-                if mobj:
-                    protocol, ext, abr = mobj.groups()
-                    add_format({
-                        'abr': abr,
-                        'ext': ext,
-                        'url': stream_url,
-                    }, protocol)
-
-        if not formats:
-            # We fallback to the stream_url in the original info, this
-            # cannot be always used, sometimes it can give an HTTP 404 error
-            urlh = self._request_webpage(
-                HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
-                track_id, query=query, fatal=False)
-            if urlh:
-                stream_url = urlh.geturl()
-                if not invalid_url(stream_url):
-                    add_format({'url': stream_url}, 'http')
+            }, 'http' if protocol == 'progressive' else protocol,
+                t.get('snipped') or '/preview/' in format_url)
 
         for f in formats:
             f['vcodec'] = 'none'
 
+        if not formats and info.get('policy') == 'BLOCK':
+            self.raise_geo_restricted()
         self._sort_formats(formats)
 
         user = info.get('user') or {}
@@ -451,9 +472,7 @@ def _real_extract(self, url):
 
         track_id = mobj.group('track_id')
 
-        query = {
-            'client_id': self._CLIENT_ID,
-        }
+        query = {}
         if track_id:
             info_json_url = self._API_V2_BASE + 'tracks/' + track_id
             full_title = track_id
@@ -467,20 +486,24 @@ def _real_extract(self, url):
                 resolve_title += '/%s' % token
             info_json_url = self._resolv_url(self._BASE_URL + resolve_title)
 
-        version = 2
         info = self._download_json(
-            info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False)
-        if not info:
-            info = self._download_json(
-                info_json_url.replace(self._API_V2_BASE, self._API_BASE),
-                full_title, 'Downloading info JSON', query=query)
-            version = 1
+            info_json_url, full_title, 'Downloading info JSON', query=query)
 
 
-        return self._extract_info_dict(info, full_title, token, version)
+        return self._extract_info_dict(info, full_title, token)
 
 
 class SoundcloudPlaylistBaseIE(SoundcloudIE):
-    def _extract_track_entries(self, tracks, token=None):
+    def _extract_set(self, playlist, token=None):
+        playlist_id = compat_str(playlist['id'])
+        tracks = playlist.get('tracks') or []
+        if not all([t.get('permalink_url') for t in tracks]) and token:
+            tracks = self._download_json(
+                self._API_V2_BASE + 'tracks', playlist_id,
+                'Downloading tracks', query={
+                    'ids': ','.join([compat_str(t['id']) for t in tracks]),
+                    'playlistId': playlist_id,
+                    'playlistSecretToken': token,
+                })
         entries = []
         for track in tracks:
             track_id = str_or_none(track.get('id'))
@@ -493,7 +516,10 @@ def _extract_track_entries(self, tracks, token=None):
                     url += '?secret_token=' + token
             entries.append(self.url_result(
                 url, SoundcloudIE.ie_key(), track_id))
-        return entries
+        return self.playlist_result(
+            entries, playlist_id,
+            playlist.get('title'),
+            playlist.get('description'))
 
 
 class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
@@ -504,6 +530,7 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
         'info_dict': {
             'id': '2284613',
             'title': 'The Royal Concept EP',
+            'description': 'md5:71d07087c7a449e8941a70a29e34671e',
         },
         'playlist_mincount': 5,
     }, {
@@ -526,17 +553,13 @@ def _real_extract(self, url):
             msgs = (compat_str(err['error_message']) for err in info['errors'])
             raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
 
-        entries = self._extract_track_entries(info['tracks'], token)
-
-        return self.playlist_result(
-            entries, str_or_none(info.get('id')), info.get('title'))
+        return self._extract_set(info, token)
 
 
 
 
-class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
+class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
     def _extract_playlist(self, base_url, playlist_id, playlist_title):
         COMMON_QUERY = {
-            'limit': 2000000000,
-            'client_id': self._CLIENT_ID,
+            'limit': 80000,
             'linked_partitioning': '1',
         }
 
@@ -722,9 +745,7 @@ def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         playlist_id = mobj.group('id')
 
-        query = {
-            'client_id': self._CLIENT_ID,
-        }
+        query = {}
         token = mobj.group('token')
         if token:
             query['secret_token'] = token
@@ -733,10 +754,7 @@ def _real_extract(self, url):
             self._API_V2_BASE + 'playlists/' + playlist_id,
             playlist_id, 'Downloading playlist', query=query)
 
-        entries = self._extract_track_entries(data['tracks'], token)
-
-        return self.playlist_result(
-            entries, playlist_id, data.get('title'), data.get('description'))
+        return self._extract_set(data, token)
 
 
 class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
@@ -761,7 +779,6 @@ def _get_collection(self, endpoint, collection_id, **query):
             self._MAX_RESULTS_PER_PAGE)
         query.update({
             'limit': limit,
-            'client_id': self._CLIENT_ID,
             'linked_partitioning': 1,
             'offset': 0,
         })
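The soundcloud changes above wrap `_download_json` so that a 401 invalidates the cached `client_id`, scrapes a fresh one, and retries the request once. A standalone sketch of that retry shape, with the HTTP layer abstracted into callables (all names here are illustrative, not yt-dlp API):

```python
class Unauthorized(Exception):
    pass

def fetch_with_client_id(fetch, refresh_client_id, client_id):
    # One retry, as in the overridden SoundcloudIE._download_json:
    # on the first Unauthorized, refresh the client_id and re-send;
    # on the second, give up and propagate the error.
    for attempt in range(2):
        try:
            return fetch(client_id)
        except Unauthorized:
            if attempt:
                raise
            client_id = refresh_client_id()
```

The real code additionally persists the fresh id through the downloader cache (`_store_client_id`) so later runs skip the scrape.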
index e040ada29b24542582f72f08f31b843d928af251..61ca902ce286e6274c7d5776bd10c265a023643f 100644 (file)
@@ -4,6 +4,7 @@
 
 from .common import InfoExtractor
 from ..utils import (
+    determine_ext,
     ExtractorError,
     merge_dicts,
     orderedSet,
@@ -64,7 +65,7 @@ def _real_extract(self, url):
             url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
             video_id, headers={'Cookie': 'country=US'})
 
-        if re.search(r'<[^>]+\bid=["\']video_removed', webpage):
+        if re.search(r'<[^>]+\b(?:id|class)=["\']video_removed', webpage):
             raise ExtractorError(
                 'Video %s is not available' % video_id, expected=True)
 
@@ -75,11 +76,20 @@ def extract_format(format_id, format_url):
             if not f_url:
                 return
             f = parse_resolution(format_id)
-            f.update({
-                'url': f_url,
-                'format_id': format_id,
-            })
-            formats.append(f)
+            ext = determine_ext(f_url)
+            if format_id.startswith('m3u8') or ext == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif format_id.startswith('mpd') or ext == 'mpd':
+                formats.extend(self._extract_mpd_formats(
+                    f_url, video_id, mpd_id='dash', fatal=False))
+            elif ext == 'mp4' or f.get('width') or f.get('height'):
+                f.update({
+                    'url': f_url,
+                    'format_id': format_id,
+                })
+                formats.append(f)
 
         STREAM_URL_PREFIX = 'stream_url_'
 
@@ -93,28 +103,22 @@ def extract_format(format_id, format_url):
                 r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
                 webpage, 'stream key', group='value')
 
-            sb_csrf_session = self._get_cookies(
-                'https://spankbang.com')['sb_csrf_session'].value
-
             stream = self._download_json(
                 'https://spankbang.com/api/videos/stream', video_id,
                 'Downloading stream JSON', data=urlencode_postdata({
                     'id': stream_key,
                     'data': 0,
-                    'sb_csrf_session': sb_csrf_session,
                 }), headers={
                     'Referer': url,
-                    'X-CSRFToken': sb_csrf_session,
+                    'X-Requested-With': 'XMLHttpRequest',
                 })
 
             for format_id, format_url in stream.items():
-                if format_id.startswith(STREAM_URL_PREFIX):
-                    if format_url and isinstance(format_url, list):
-                        format_url = format_url[0]
-                    extract_format(
-                        format_id[len(STREAM_URL_PREFIX):], format_url)
+                if format_url and isinstance(format_url, list):
+                    format_url = format_url[0]
+                extract_format(format_id, format_url)
 
 
-        self._sort_formats(formats)
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
 
         info = self._search_json_ld(webpage, video_id, default={})
 
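The reworked `extract_format` above dispatches on the `format_id` prefix and the URL's file extension before falling back to a progressive-download entry. A minimal standalone sketch of that dispatch, using hypothetical helper names rather than the extractor's own methods:

```python
from os.path import splitext
from urllib.parse import urlparse

def classify_stream(format_id, url):
    # Mirror of the hunk above: route a stream to HLS, DASH, or
    # progressive handling based on format_id prefix or URL extension.
    ext = splitext(urlparse(url).path)[1][1:].lower()
    if format_id.startswith('m3u8') or ext == 'm3u8':
        return 'hls'
    if format_id.startswith('mpd') or ext == 'mpd':
        return 'dash'
    return 'progressive'
```

In the extractor the `hls` and `dash` branches hand off to `_extract_m3u8_formats` and `_extract_mpd_formats` with `fatal=False`, so a broken manifest degrades to a warning instead of aborting extraction.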
index 44d8fa52f3071ca00971624db81ce4ad6b2141e3..35ab9ec375989d412d94f7d40d2babadf93c44c4 100644 (file)
@@ -3,34 +3,47 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_unquote,
-    compat_urllib_parse_urlparse,
-)
 from ..utils import (
-    sanitized_Request,
+    float_or_none,
+    int_or_none,
+    merge_dicts,
+    str_or_none,
     str_to_int,
-    unified_strdate,
+    url_or_none,
 )
-from ..aes import aes_decrypt_text
 
 
 class SpankwireIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?spankwire\.com/
+                        (?:
+                            [^/]+/video|
+                            EmbedPlayer\.aspx/?\?.*?\bArticleId=
+                        )
+                        (?P<id>\d+)
+                    '''
     _TESTS = [{
         # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
         'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
-        'md5': '8bbfde12b101204b39e4b9fe7eb67095',
+        'md5': '5aa0e4feef20aad82cbcae3aed7ab7cd',
         'info_dict': {
             'id': '103545',
             'ext': 'mp4',
             'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
             'description': 'Crazy Bitch X rated music video.',
+            'duration': 222,
             'uploader': 'oreusz',
             'uploader_id': '124697',
-            'upload_date': '20070507',
+            'timestamp': 1178587885,
+            'upload_date': '20070508',
+            'average_rating': float,
+            'view_count': int,
+            'comment_count': int,
             'age_limit': 18,
-        }
+            'categories': list,
+            'tags': list,
+        },
     }, {
         # download URL pattern: */mp4_<format_id>_<video_id>.mp4
         'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
@@ -45,83 +58,125 @@ class SpankwireIE(InfoExtractor):
             'upload_date': '20150822',
             'age_limit': 18,
         },
+        'params': {
+            'proxy': '127.0.0.1:8118'
+        },
+        'skip': 'removed',
+    }, {
+        'url': 'https://www.spankwire.com/EmbedPlayer.aspx/?ArticleId=156156&autostart=true',
+        'only_matching': True,
     }]
 
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
+            webpage)
+
     def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        req = sanitized_Request('http://www.' + mobj.group('url'))
-        req.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(req, video_id)
-
-        title = self._html_search_regex(
-            r'<h1>([^<]+)', webpage, 'title')
-        description = self._html_search_regex(
-            r'(?s)<div\s+id="descriptionContent">(.+?)</div>',
-            webpage, 'description', fatal=False)
-        thumbnail = self._html_search_regex(
-            r'playerData\.screenShot\s*=\s*["\']([^"\']+)["\']',
-            webpage, 'thumbnail', fatal=False)
-
-        uploader = self._html_search_regex(
-            r'by:\s*<a [^>]*>(.+?)</a>',
-            webpage, 'uploader', fatal=False)
-        uploader_id = self._html_search_regex(
-            r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
-            webpage, 'uploader id', fatal=False)
-        upload_date = unified_strdate(self._html_search_regex(
-            r'</a> on (.+?) at \d+:\d+',
-            webpage, 'upload date', fatal=False))
-
-        view_count = str_to_int(self._html_search_regex(
-            r'<div id="viewsCounter"><span>([\d,\.]+)</span> views</div>',
-            webpage, 'view count', fatal=False))
-        comment_count = str_to_int(self._html_search_regex(
-            r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
-            webpage, 'comment count', fatal=False))
-
-        videos = re.findall(
-            r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
-        heights = [int(video[0]) for video in videos]
-        video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
-        if webpage.find(r'flashvars\.encrypted = "true"') != -1:
-            password = self._search_regex(
-                r'flashvars\.video_title = "([^"]+)',
-                webpage, 'password').replace('+', ' ')
-            video_urls = list(map(
-                lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'),
-                video_urls))
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://www.spankwire.com/api/video/%s.json' % video_id, video_id)
+
+        title = video['title']
 
         formats = []
-        for height, video_url in zip(heights, video_urls):
-            path = compat_urllib_parse_urlparse(video_url).path
-            m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
-            if m:
-                tbr = int(m.group('tbr'))
-                height = int(m.group('height'))
-            else:
-                tbr = None
-            formats.append({
-                'url': video_url,
-                'format_id': '%dp' % height,
-                'height': height,
-                'tbr': tbr,
+        videos = video.get('videos')
+        if isinstance(videos, dict):
+            for format_id, format_url in videos.items():
+                video_url = url_or_none(format_url)
+                if not video_url:
+                    continue
+                height = int_or_none(self._search_regex(
+                    r'(\d+)[pP]', format_id, 'height', default=None))
+                m = re.search(
+                    r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
+                if m:
+                    tbr = int(m.group('tbr'))
+                    height = height or int(m.group('height'))
+                else:
+                    tbr = None
+                formats.append({
+                    'url': video_url,
+                    'format_id': '%dp' % height if height else format_id,
+                    'height': height,
+                    'tbr': tbr,
+                })
+        m3u8_url = url_or_none(video.get('HLS'))
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+        self._sort_formats(formats, ('height', 'tbr', 'width', 'format_id'))
+
+        view_count = str_to_int(video.get('viewed'))
+
+        thumbnails = []
+        for preference, t in enumerate(('', '2x'), start=0):
+            thumbnail_url = url_or_none(video.get('poster%s' % t))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'preference': preference,
             })
-        self._sort_formats(formats)
 
 
-        age_limit = self._rta_search(webpage)
+        def extract_names(key):
+            entries_list = video.get(key)
+            if not isinstance(entries_list, list):
+                return
+            entries = []
+            for entry in entries_list:
+                name = str_or_none(entry.get('name'))
+                if name:
+                    entries.append(name)
+            return entries
+
+        categories = extract_names('categories')
+        tags = extract_names('tags')
 
 
-        return {
+        uploader = None
+        info = {}
+
+        webpage = self._download_webpage(
+            'https://www.spankwire.com/_/video%s/' % video_id, video_id,
+            fatal=False)
+        if webpage:
+            info = self._search_json_ld(webpage, video_id, default={})
+            thumbnail_url = None
+            if 'thumbnail' in info:
+                thumbnail_url = url_or_none(info['thumbnail'])
+                del info['thumbnail']
+            if not thumbnail_url:
+                thumbnail_url = self._og_search_thumbnail(webpage)
+            if thumbnail_url:
+                thumbnails.append({
+                    'url': thumbnail_url,
+                    'preference': 10,
+                })
+            uploader = self._html_search_regex(
+                r'(?s)by\s*<a[^>]+\bclass=["\']uploaded__by[^>]*>(.+?)</a>',
+                webpage, 'uploader', fatal=False)
+            if not view_count:
+                view_count = str_to_int(self._search_regex(
+                    r'data-views=["\']([\d,.]+)', webpage, 'view count',
+                    fatal=False))
+
+        return merge_dicts({
             'id': video_id,
             'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
+            'description': video.get('description'),
+            'duration': int_or_none(video.get('duration')),
+            'thumbnails': thumbnails,
             'uploader': uploader,
             'uploader': uploader,
-            'uploader_id': uploader_id,
-            'upload_date': upload_date,
+            'uploader_id': str_or_none(video.get('userId')),
+            'timestamp': int_or_none(video.get('time_approved_on')),
+            'average_rating': float_or_none(video.get('rating')),
             'view_count': view_count,
-            'comment_count': comment_count,
+            'comment_count': int_or_none(video.get('comments')),
+            'age_limit': 18,
+            'categories': categories,
+            'tags': tags,
             'formats': formats,
             'formats': formats,
-            'age_limit': age_limit,
-        }
+        }, info)
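The `extract_names` helper in the hunk above reads list-of-dict metadata defensively, returning `None` when the field is missing or malformed instead of raising mid-extraction. A standalone variant of the same pattern (the closure in the diff reads `video` from its enclosing scope; here it is passed explicitly):

```python
def extract_names(video, key):
    # Return the 'name' of each dict under video[key]; None signals
    # that the field was absent or not the expected list shape.
    entries_list = video.get(key)
    if not isinstance(entries_list, list):
        return None
    names = []
    for entry in entries_list:
        if isinstance(entry, dict) and entry.get('name'):
            names.append(str(entry['name']))
    return names
```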
index 7c11ea7aaf9306181fee00adab6080d5c40ac1d7..aabff7a3ce78d76b9aec3772a4e88a5bf802d192 100644 (file)
@@ -8,15 +8,10 @@ class BellatorIE(MTVServicesInfoExtractor):
     _TESTS = [{
         'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
         'info_dict': {
-            'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4',
-            'ext': 'mp4',
-            'title': 'Douglas Lima vs. Paul Daley - Round 1',
-            'description': 'md5:805a8dd29310fd611d32baba2f767885',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
+            'title': 'Michael Page vs. Evangelista Cyborg',
+            'description': 'md5:0d917fc00ffd72dd92814963fc6cbb05',
         },
+        'playlist_count': 3,
     }, {
         'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
         'only_matching': True,
@@ -25,6 +20,9 @@ class BellatorIE(MTVServicesInfoExtractor):
     _FEED_URL = 'http://www.bellator.com/feeds/mrss/'
     _GEO_COUNTRIES = ['US']
 
+    def _extract_mgid(self, webpage):
+        return self._extract_triforce_mgid(webpage)
+
 
 class ParamountNetworkIE(MTVServicesInfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
 
 class SportDeutschlandIE(InfoExtractor):
     _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
     _TESTS = [{
 class SportDeutschlandIE(InfoExtractor):
     _VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
     _TESTS = [{
-        'url': 'http://sportdeutschland.tv/badminton/live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
+        'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
         'info_dict': {
         'info_dict': {
-            'id': 'live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
+            'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
             'ext': 'mp4',
             'ext': 'mp4',
-            'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
-            'categories': ['Badminton'],
+            'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
+            'categories': ['Badminton-Deutschland'],
             'view_count': int,
             'view_count': int,
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
             'timestamp': int,
             'timestamp': int,
-            'upload_date': 're:^201408[23][0-9]$',
+            'upload_date': '20200201',
+            'description': 're:.*',  # meaningless description for THIS video
         },
         },
-        'params': {
-            'skip_download': 'Live stream',
-        },
-    }, {
-        'url': 'http://sportdeutschland.tv/li-ning-badminton-wm-2014/lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
-        'info_dict': {
-            'id': 'lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
-            'ext': 'mp4',
-            'upload_date': '20140825',
-            'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
-            'timestamp': 1408976060,
-            'duration': 2732,
-            'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
-            'thumbnail': r're:^https?://.*\.jpg$',
-            'view_count': int,
-            'categories': ['Li-Ning Badminton WM 2014'],
-
-        }
     }]
 
     def _real_extract(self, url):
     }]
 
     def _real_extract(self, url):
@@ -50,7 +32,7 @@ def _real_extract(self, url):
         video_id = mobj.group('id')
         sport_id = mobj.group('sport')
 
         video_id = mobj.group('id')
         sport_id = mobj.group('sport')
 
-        api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
+        api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
             sport_id, video_id)
         req = sanitized_Request(api_url, headers={
             'Accept': 'application/vnd.vidibus.v2.html+json',
             sport_id, video_id)
         req = sanitized_Request(api_url, headers={
             'Accept': 'application/vnd.vidibus.v2.html+json',
index 28baf901c9f021c15544f099f78dd5d5a6b9165c..359dadaa3cce4540f5abb2fb58a43596379e0c56 100644 (file)
@@ -1,14 +1,14 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
 # coding: utf-8
 from __future__ import unicode_literals
 
-from .ard import ARDMediathekIE
+from .ard import ARDMediathekBaseIE
 from ..utils import (
     ExtractorError,
     get_element_by_attribute,
 )
 
 
 from ..utils import (
     ExtractorError,
     get_element_by_attribute,
 )
 
 
-class SRMediathekIE(ARDMediathekIE):
+class SRMediathekIE(ARDMediathekBaseIE):
     IE_NAME = 'sr:mediathek'
     IE_DESC = 'Saarländischer Rundfunk'
     _VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
index ae2ac1b42fe0021c8b904721221d441368bf50ca..4dbead2ba428abdbfc678aea31403b998a1eb06c 100644 (file)
@@ -5,44 +5,28 @@
 
 
 class StretchInternetIE(InfoExtractor):
-    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
+    _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
     _TEST = {
-        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
+        'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
         'info_dict': {
-            'id': '313900',
+            'id': '573272',
             'ext': 'mp4',
-            'title': 'Augustana (S.D.) Baseball vs University of Mary',
-            'description': 'md5:7578478614aae3bdd4a90f578f787438',
-            'timestamp': 1490468400,
-            'upload_date': '20170325',
+            'title': 'University of Mary Wrestling vs. Upper Iowa',
+            'timestamp': 1575668361,
+            'upload_date': '20191206',
         }
     }
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        stream = self._download_json(
-            'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
-            % video_id, video_id)
-
-        video_url = 'https://%s' % stream['source']
-
         event = self._download_json(
         event = self._download_json(
-            'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
-            video_id, query={
-                'clientID': 99997,
-                'eventID': video_id,
-                'token': 'asdf',
-            })['event']
-
-        title = event.get('title') or event['mobileTitle']
-        description = event.get('customText')
-        timestamp = int_or_none(event.get('longtime'))
+            'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
+            video_id)[0]
 
         return {
             'id': video_id,
-            'title': title,
-            'description': description,
-            'timestamp': timestamp,
-            'url': video_url,
+            'title': event['title'],
+            'timestamp': int_or_none(event.get('dateCreated'), 1000),
+            'url': 'https://' + event['media'][0]['url'],
         }
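`int_or_none(event.get('dateCreated'), 1000)` above relies on the utility's `scale` argument to turn the API's millisecond timestamp into Unix seconds. A simplified sketch of that behaviour (the real helper in `..utils` takes additional parameters such as `default` handling and an inverse scale):

```python
def int_or_none(v, scale=1, default=None):
    # Simplified sketch of the yt-dlp utility: coerce to int, dividing
    # by 'scale' (e.g. 1000 converts milliseconds to seconds), and fall
    # back to 'default' on missing or non-numeric input.
    if v is None:
        return default
    try:
        return int(v) // scale
    except (TypeError, ValueError):
        return default
```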
index 0901c3163e6cab4723d451b30df8574e359ba899..e12389cad80a83612e10d052b7f36bceba0f1fbf 100644 (file)
@@ -4,19 +4,14 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_parse_qs,
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
 from ..utils import (
     determine_ext,
     dict_get,
     int_or_none,
-    orderedSet,
+    str_or_none,
     strip_or_none,
     try_get,
-    urljoin,
-    compat_str,
 )
 
 
@@ -237,23 +232,23 @@ def _real_extract(self, url):
 
 
 class SVTSeriesIE(SVTPlayBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)'
+    _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
     _TESTS = [{
         'url': 'https://www.svtplay.se/rederiet',
         'info_dict': {
-            'id': 'rederiet',
+            'id': '14445680',
             'title': 'Rederiet',
-            'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+            'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
         },
         'playlist_mincount': 318,
     }, {
-        'url': 'https://www.svtplay.se/rederiet?tab=sasong2',
+        'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680',
         'info_dict': {
-            'id': 'rederiet-sasong2',
+            'id': 'season-2-14445680',
             'title': 'Rederiet - Säsong 2',
-            'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+            'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
         },
-        'playlist_count': 12,
+        'playlist_mincount': 12,
     }]
 
     @classmethod
@@ -261,83 +256,87 @@ def suitable(cls, url):
         return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super(SVTSeriesIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        series_id = self._match_id(url)
-
-        qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
-        season_slug = qs.get('tab', [None])[0]
-
-        if season_slug:
-            series_id += '-%s' % season_slug
-
-        webpage = self._download_webpage(
-            url, series_id, 'Downloading series page')
-
-        root = self._parse_json(
-            self._search_regex(
-                self._SVTPLAY_RE, webpage, 'content', group='json'),
-            series_id)
+        series_slug, season_id = re.match(self._VALID_URL, url).groups()
+
+        series = self._download_json(
+            'https://api.svt.se/contento/graphql', series_slug,
+            'Downloading series page', query={
+                'query': '''{
+  listablesBySlug(slugs: ["%s"]) {
+    associatedContent(include: [productionPeriod, season]) {
+      items {
+        item {
+          ... on Episode {
+            videoSvtId
+          }
+        }
+      }
+      id
+      name
+    }
+    id
+    longDescription
+    name
+    shortDescription
+  }
+}''' % series_slug,
+            })['data']['listablesBySlug'][0]
 
         season_name = None
 
         entries = []
-        for season in root['relatedVideoContent']['relatedVideosAccordion']:
+        for season in series['associatedContent']:
             if not isinstance(season, dict):
                 continue
-            if season_slug:
-                if season.get('slug') != season_slug:
+            if season_id:
+                if season.get('id') != season_id:
                     continue
                 season_name = season.get('name')
-            videos = season.get('videos')
-            if not isinstance(videos, list):
+            items = season.get('items')
+            if not isinstance(items, list):
                 continue
-            for video in videos:
-                content_url = video.get('contentUrl')
-                if not content_url or not isinstance(content_url, compat_str):
+            for item in items:
+                video = item.get('item') or {}
+                content_id = video.get('videoSvtId')
+                if not content_id or not isinstance(content_id, compat_str):
                     continue
-                entries.append(
-                    self.url_result(
-                        urljoin(url, content_url),
-                        ie=SVTPlayIE.ie_key(),
-                        video_title=video.get('title')
-                    ))
-
-        metadata = root.get('metaData')
-        if not isinstance(metadata, dict):
-            metadata = {}
+                entries.append(self.url_result(
+                    'svt:' + content_id, SVTPlayIE.ie_key(), content_id))
 
 
-        title = metadata.get('title')
-        season_name = season_name or season_slug
+        title = series.get('name')
+        season_name = season_name or season_id
 
         if title and season_name:
             title = '%s - %s' % (title, season_name)
-        elif season_slug:
-            title = season_slug
+        elif season_id:
+            title = season_id
 
         return self.playlist_result(
-            entries, series_id, title, metadata.get('description'))
+            entries, season_id or series.get('id'), title,
+            dict_get(series, ('longDescription', 'shortDescription')))
 
 
 class SVTPageIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/]+/)*(?P<id>[^/?&#]+)'
+    _VALID_URL = r'https?://(?:www\.)?svt\.se/(?P<path>(?:[^/]+/)*(?P<id>[^/?&#]+))'
     _TESTS = [{
-        'url': 'https://www.svt.se/sport/oseedat/guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
+        'url': 'https://www.svt.se/sport/ishockey/bakom-masken-lehners-kamp-mot-mental-ohalsa',
         'info_dict': {
-            'id': 'guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
-            'title': 'GUIDE: Sommarträning du kan göra var och när du vill',
+            'id': '25298267',
+            'title': 'Bakom masken – Lehners kamp mot mental ohälsa',
         },
-        'playlist_count': 7,
+        'playlist_count': 4,
     }, {
-        'url': 'https://www.svt.se/nyheter/inrikes/ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
+        'url': 'https://www.svt.se/nyheter/utrikes/svenska-andrea-ar-en-mil-fran-branderna-i-kalifornien',
         'info_dict': {
-            'id': 'ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
-            'title': 'Ebba Busch Thor har bara delvis rätt om ”no-go-zoner”',
+            'id': '24243746',
+            'title': 'Svenska Andrea redo att fly sitt hem i Kalifornien',
         },
-        'playlist_count': 1,
+        'playlist_count': 2,
     }, {
         # only programTitle
         'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
         'info_dict': {
-            'id': '2900353',
+            'id': '8439V2K',
             'ext': 'mp4',
             'title': 'Stjärnorna skojar till det - under SVT-intervjun',
             'duration': 27,
@@ -356,16 +355,26 @@ def suitable(cls, url):
         return False if SVTIE.suitable(url) else super(SVTPageIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        playlist_id = self._match_id(url)
+        path, display_id = re.match(self._VALID_URL, url).groups()
 
 
-        webpage = self._download_webpage(url, playlist_id)
+        article = self._download_json(
+            'https://api.svt.se/nss-api/page/' + path, display_id,
+            query={'q': 'articles'})['articles']['content'][0]
 
-        entries = [
-            self.url_result(
-                'svt:%s' % video_id, ie=SVTPlayIE.ie_key(), video_id=video_id)
-            for video_id in orderedSet(re.findall(
-                r'data-video-id=["\'](\d+)', webpage))]
+        entries = []
 
-        title = strip_or_none(self._og_search_title(webpage, default=None))
+        def _process_content(content):
+            if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
+                video_id = compat_str(content['image']['svtId'])
+                entries.append(self.url_result(
+                    'svt:' + video_id, SVTPlayIE.ie_key(), video_id))
 
-        return self.playlist_result(entries, playlist_id, title)
+        for media in article.get('media', []):
+            _process_content(media)
+
+        for obj in article.get('structuredBody', []):
+            _process_content(obj.get('content') or {})
+
+        return self.playlist_result(
+            entries, str_or_none(article.get('id')),
+            strip_or_none(article.get('title')))
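The rewritten SVT `_real_extract` above stops scraping `data-video-id` attributes and instead walks the article JSON. A minimal sketch of that traversal, with a hypothetical `article` dict standing in for the nss-api response:

```python
def collect_video_ids(article):
    """Mirror the media/structuredBody walk from the diff above,
    returning SVT video ids as strings."""
    entries = []

    def _process_content(content):
        # Only clip and episode blocks carry a playable svtId.
        if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
            entries.append(str(content['image']['svtId']))

    for media in article.get('media', []):
        _process_content(media)
    for obj in article.get('structuredBody', []):
        _process_content(obj.get('content') or {})
    return entries


# Hypothetical article payload for illustration only.
article = {
    'media': [{'_type': 'VIDEOCLIP', 'image': {'svtId': 24243746}}],
    'structuredBody': [
        {'content': {'_type': 'TEXT'}},
        {'content': {'_type': 'VIDEOEPISODE', 'image': {'svtId': 25298267}}},
    ],
}
```

Scanning both `media` and `structuredBody` matters because clips can appear in either place depending on the article layout.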
index 7d2e34b3bc4204d3ec999968c1ac76db7687a0c4..a75369dbe8a3582595ae339d58887eaefd220536 100644 (file)
@@ -4,11 +4,12 @@
 
 from .common import InfoExtractor
 from .wistia import WistiaIE
-from ..compat import compat_str
 from ..utils import (
     clean_html,
     ExtractorError,
+    int_or_none,
     get_element_by_class,
+    strip_or_none,
     urlencode_postdata,
     urljoin,
 )
@@ -20,8 +21,8 @@ class TeachableBaseIE(InfoExtractor):
 
     _SITES = {
         # Only notable ones here
-        'upskillcourses.com': 'upskill',
-        'academy.gns3.com': 'gns3',
+        'v1.upskillcourses.com': 'upskill',
+        'gns3.teachable.com': 'gns3',
         'academyhacker.com': 'academyhacker',
         'stackskills.com': 'stackskills',
         'market.saleshacker.com': 'saleshacker',
@@ -58,7 +59,7 @@ def is_logged(webpage):
             self._logged_in = True
             return
 
-        login_url = compat_str(urlh.geturl())
+        login_url = urlh.geturl()
 
         login_form = self._hidden_inputs(login_page)
 
@@ -110,27 +111,29 @@ class TeachableIE(TeachableBaseIE):
                     ''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
 
     _TESTS = [{
-        'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+        'url': 'https://gns3.teachable.com/courses/gns3-certified-associate/lectures/6842364',
         'info_dict': {
-            'id': 'uzw6zw58or',
-            'ext': 'mp4',
-            'title': 'Welcome to the Course!',
-            'description': 'md5:65edb0affa582974de4625b9cdea1107',
-            'duration': 138.763,
-            'timestamp': 1479846621,
-            'upload_date': '20161122',
+            'id': 'untlgzk1v7',
+            'ext': 'bin',
+            'title': 'Overview',
+            'description': 'md5:071463ff08b86c208811130ea1c2464c',
+            'duration': 736.4,
+            'timestamp': 1542315762,
+            'upload_date': '20181115',
+            'chapter': 'Welcome',
+            'chapter_number': 1,
         },
         'params': {
             'skip_download': True,
         },
     }, {
-        'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
+        'url': 'http://v1.upskillcourses.com/courses/119763/lectures/1747100',
         'only_matching': True,
     }, {
-        'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
+        'url': 'https://gns3.teachable.com/courses/423415/lectures/6885939',
         'only_matching': True,
     }, {
-        'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+        'url': 'teachable:https://v1.upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
         'only_matching': True,
     }]
 
@@ -160,22 +163,51 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        wistia_url = WistiaIE._extract_url(webpage)
-        if not wistia_url:
+        wistia_urls = WistiaIE._extract_urls(webpage)
+        if not wistia_urls:
             if any(re.search(p, webpage) for p in (
                     r'class=["\']lecture-contents-locked',
                     r'>\s*Lecture contents locked',
-                    r'id=["\']lecture-locked')):
+                    r'id=["\']lecture-locked',
+                    # https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
+                    r'class=["\'](?:inner-)?lesson-locked',
+                    r'>LESSON LOCKED<')):
                 self.raise_login_required('Lecture contents locked')
+            raise ExtractorError('Unable to find video URL')
 
         title = self._og_search_title(webpage, default=None)
 
-        return {
+        chapter = None
+        chapter_number = None
+        section_item = self._search_regex(
+            r'(?s)(?P<li><li[^>]+\bdata-lecture-id=["\']%s[^>]+>.+?</li>)' % video_id,
+            webpage, 'section item', default=None, group='li')
+        if section_item:
+            chapter_number = int_or_none(self._search_regex(
+                r'data-ss-position=["\'](\d+)', section_item, 'section id',
+                default=None))
+            if chapter_number is not None:
+                sections = []
+                for s in re.findall(
+                        r'(?s)<div[^>]+\bclass=["\']section-title[^>]+>(.+?)</div>', webpage):
+                    section = strip_or_none(clean_html(s))
+                    if not section:
+                        sections = []
+                        break
+                    sections.append(section)
+                if chapter_number <= len(sections):
+                    chapter = sections[chapter_number - 1]
+
+        entries = [{
             '_type': 'url_transparent',
             'url': wistia_url,
             'ie_key': WistiaIE.ie_key(),
             'title': title,
-        }
+            'chapter': chapter,
+            'chapter_number': chapter_number,
+        } for wistia_url in wistia_urls]
+
+        return self.playlist_result(entries, video_id, title)
 
 
 class TeachableCourseIE(TeachableBaseIE):
@@ -187,20 +219,20 @@ class TeachableCourseIE(TeachableBaseIE):
                         /(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
                     ''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
     _TESTS = [{
-        'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
+        'url': 'http://v1.upskillcourses.com/courses/essential-web-developer-course/',
         'info_dict': {
             'id': 'essential-web-developer-course',
             'title': 'The Essential Web Developer Course (Free)',
         },
         'playlist_count': 192,
     }, {
-        'url': 'http://upskillcourses.com/courses/119763/',
+        'url': 'http://v1.upskillcourses.com/courses/119763/',
         'only_matching': True,
     }, {
-        'url': 'http://upskillcourses.com/courses/enrolled/119763',
+        'url': 'http://v1.upskillcourses.com/courses/enrolled/119763',
         'only_matching': True,
     }, {
-        'url': 'https://academy.gns3.com/courses/enrolled/423415',
+        'url': 'https://gns3.teachable.com/courses/enrolled/423415',
         'only_matching': True,
     }, {
         'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',
index 33a72083bff96a74e04020257ba3305d1cfee2ca..3e1a7a9e609a9eb80732348b86e5976e12093a0b 100644 (file)
@@ -1,13 +1,21 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import re
+
 from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
 from .nexx import NexxIE
 from ..compat import compat_urlparse
+from ..utils import (
+    NO_DEFAULT,
+    smuggle_url,
+)
 
 
 class Tele5IE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+    _GEO_COUNTRIES = ['DE']
     _TESTS = [{
         'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
         'info_dict': {
@@ -20,6 +28,21 @@ class Tele5IE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+    }, {
+        # jwplatform, nexx unavailable
+        'url': 'https://www.tele5.de/filme/ghoul-das-geheimnis-des-friedhofmonsters/',
+        'info_dict': {
+            'id': 'WJuiOlUp',
+            'ext': 'mp4',
+            'upload_date': '20200603',
+            'timestamp': 1591214400,
+            'title': 'Ghoul - Das Geheimnis des Friedhofmonsters',
+            'description': 'md5:42002af1d887ff3d5b2b3ca1f8137d97',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': [JWPlatformIE.ie_key()],
     }, {
         'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
         'only_matching': True,
@@ -44,14 +67,42 @@ def _real_extract(self, url):
         qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
         video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
 
-        if not video_id:
+        NEXX_ID_RE = r'\d{6,}'
+        JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'
+
+        def nexx_result(nexx_id):
+            return self.url_result(
+                'https://api.nexx.cloud/v3/759/videos/byid/%s' % nexx_id,
+                ie=NexxIE.ie_key(), video_id=nexx_id)
+
+        nexx_id = jwplatform_id = None
+
+        if video_id:
+            if re.match(NEXX_ID_RE, video_id):
+                return nexx_result(video_id)
+            elif re.match(JWPLATFORM_ID_RE, video_id):
+                jwplatform_id = video_id
+
+        if not nexx_id:
             display_id = self._match_id(url)
             webpage = self._download_webpage(url, display_id)
-            video_id = self._html_search_regex(
-                (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
-                 r'\s+id\s*=\s*["\']player_(\d{6,})',
-                 r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
+
+            def extract_id(pattern, name, default=NO_DEFAULT):
+                return self._html_search_regex(
+                    (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](%s)' % pattern,
+                     r'\s+id\s*=\s*["\']player_(%s)' % pattern,
+                     r'\bdata-id\s*=\s*["\'](%s)' % pattern), webpage, name,
+                    default=default)
+
+            nexx_id = extract_id(NEXX_ID_RE, 'nexx id', default=None)
+            if nexx_id:
+                return nexx_result(nexx_id)
+
+            if not jwplatform_id:
+                jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
 
         return self.url_result(
-            'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
-            ie=NexxIE.ie_key(), video_id=video_id)
+            smuggle_url(
+                'jwplatform:%s' % jwplatform_id,
+                {'geo_countries': self._GEO_COUNTRIES}),
+            ie=JWPlatformIE.ie_key(), video_id=jwplatform_id)
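The Tele5 rewrite above hinges on telling the two id shapes apart. A sketch of just that disambiguation (the `classify_video_id` helper name is mine, not the extractor's):

```python
import re

# The two id shapes the extractor now handles, from the diff above.
NEXX_ID_RE = r'\d{6,}'
JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'


def classify_video_id(video_id):
    """Numeric ids of six or more digits are Nexx ids; 8-character
    alphanumeric ids are JW Platform media ids. Order matters: an
    all-digit 8-character id matches both patterns, and Nexx wins."""
    if re.match(NEXX_ID_RE, video_id):
        return 'nexx'
    if re.match(JWPLATFORM_ID_RE, video_id):
        return 'jwplatform'
    return None
```

Like the original, this uses unanchored `re.match`, so only the prefix of the id has to fit the pattern.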
index d37e1b0557cf3ba241a25e7e56d28c8dc679b1d0..9ba3da341dac65d18a599a790bff9c95b0e52eb8 100644 (file)
@@ -11,6 +11,7 @@
     determine_ext,
     int_or_none,
     str_or_none,
+    try_get,
     urljoin,
 )
 
@@ -24,7 +25,7 @@ class TelecincoIE(InfoExtractor):
         'info_dict': {
             'id': '1876350223',
             'title': 'Bacalao con kokotxas al pil-pil',
-            'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
+            'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
         },
         'playlist': [{
             'md5': 'adb28c37238b675dad0f042292f209a7',
@@ -55,6 +56,26 @@ class TelecincoIE(InfoExtractor):
             'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
             'duration': 50,
         },
+    }, {
+        # video in opening's content
+        'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
+        'info_dict': {
+            'id': '2907195140',
+            'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+            'description': 'md5:73f340a7320143d37ab895375b2bf13a',
+        },
+        'playlist': [{
+            'md5': 'adb28c37238b675dad0f042292f209a7',
+            'info_dict': {
+                'id': 'TpI2EttSDAReWpJ1o0NVh2',
+                'ext': 'mp4',
+                'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+                'duration': 1015,
+            },
+        }],
+        'params': {
+            'skip_download': True,
+        },
     }, {
         'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
         'only_matching': True,
@@ -135,17 +156,28 @@ def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
         article = self._parse_json(self._search_regex(
-            r'window\.\$REACTBASE_STATE\.article\s*=\s*({.+})',
+            r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
             webpage, 'article'), display_id)['article']
         title = article.get('title')
-        description = clean_html(article.get('leadParagraph'))
+        description = clean_html(article.get('leadParagraph')) or ''
         if article.get('editorialType') != 'VID':
             entries = []
-            for p in article.get('body', []):
+            body = [article.get('opening')]
+            body.extend(try_get(article, lambda x: x['body'], list) or [])
+            for p in body:
+                if not isinstance(p, dict):
+                    continue
                 content = p.get('content')
-                if p.get('type') != 'video' or not content:
+                if not content:
+                    continue
+                type_ = p.get('type')
+                if type_ == 'paragraph':
+                    content_str = str_or_none(content)
+                    if content_str:
+                        description += content_str
                     continue
-                entries.append(self._parse_content(content, url))
+                if type_ == 'video' and isinstance(content, dict):
+                    entries.append(self._parse_content(content, url))
             return self.playlist_result(
                 entries, str_or_none(article.get('id')), title, description)
         content = article['opening']['content']
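The Telecinco change above prepends the article's `opening` block to the body walk and folds paragraph text into the description. A standalone sketch of that logic (using `isinstance` checks in place of `str_or_none`/`try_get`, and a hypothetical payload):

```python
def split_article_body(article):
    """Sketch of the body walk: prepend the opening block, append
    paragraph text to the description, and collect video contents."""
    description = article.get('leadParagraph') or ''
    videos = []
    # Videos placed in the opening were previously missed entirely.
    body = [article.get('opening')]
    body.extend(article.get('body') or [])
    for p in body:
        if not isinstance(p, dict):
            continue
        content = p.get('content')
        if not content:
            continue
        if p.get('type') == 'paragraph' and isinstance(content, str):
            description += content
        elif p.get('type') == 'video' and isinstance(content, dict):
            videos.append(content)
    return description, videos


# Hypothetical article payload for illustration only.
article = {
    'leadParagraph': 'Lead. ',
    'opening': {'type': 'video', 'content': {'id': 'a'}},
    'body': [
        {'type': 'paragraph', 'content': 'More text.'},
        {'type': 'video', 'content': {'id': 'b'}},
    ],
}
```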
index ae9f66787439462967baa63dc58f39870fb89382..c82c94b3a0009da2cf0938c92910feec84de018b 100644 (file)
@@ -38,8 +38,6 @@ class TeleQuebecIE(TeleQuebecBaseIE):
             'ext': 'mp4',
             'title': 'Un petit choc et puis repart!',
             'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
-            'upload_date': '20180222',
-            'timestamp': 1519326631,
         },
         'params': {
             'skip_download': True,
index dff44a4e2f4f5e4105def539c186dc630b411b34..af325fea8fcd68ce5cf9b8bb8ec33975950b5c32 100644 (file)
@@ -10,8 +10,8 @@
 
 
 class TenPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})'
+    _TESTS = [{
         'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
         'info_dict': {
             'id': '6060533435001',
@@ -27,7 +27,10 @@ class TenPlayIE(InfoExtractor):
             'format': 'bestvideo',
             'skip_download': True,
         }
-    }
+    }, {
+        'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
+        'only_matching': True,
+    }]
     BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
index 0e2370cd828f78a2e1a708852a392a05d96e3039..0631cb7aba8a7068a291fb8e67de0d5e04acf482 100644 (file)
@@ -17,14 +17,12 @@ class TFOIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
     _TEST = {
         'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
-        'md5': '47c987d0515561114cf03d1226a9d4c7',
+        'md5': 'cafbe4f47a8dae0ca0159937878100d6',
         'info_dict': {
-            'id': '100463871',
+            'id': '7da3d50e495c406b8fc0b997659cc075',
             'ext': 'mp4',
             'title': 'Video Game Hackathon',
             'description': 'md5:558afeba217c6c8d96c60e5421795c07',
-            'upload_date': '20160212',
-            'timestamp': 1455310233,
         }
     }
 
index 6ab147ad726306ba9250599d34491a50e64e82d0..a3d9b4017b93b1d9381b896967fa9e68da59eaec 100644 (file)
@@ -2,43 +2,46 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import try_get
 
 
 class ThisOldHouseIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode|(?:[^/]+/)?\d+)/(?P<id>[^/?#]+)'
     _TESTS = [{
         'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
-        'md5': '568acf9ca25a639f0c4ff905826b662f',
         'info_dict': {
-            'id': '2REGtUDQ',
+            'id': '5dcdddf673c3f956ef5db202',
             'ext': 'mp4',
             'title': 'How to Build a Storage Bench',
             'description': 'In the workshop, Tom Silva and Kevin O\'Connor build a storage bench for an entryway.',
             'timestamp': 1442548800,
             'upload_date': '20150918',
-        }
+        },
+        'params': {
+            'skip_download': True,
+        },
     }, {
         'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
         'only_matching': True,
     }, {
         'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
         'only_matching': True,
+    }, {
+        'url': 'https://www.thisoldhouse.com/furniture/21017078/how-to-build-a-storage-bench',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
+        'only_matching': True,
+    }, {
+        # iframe www.thisoldhouse.com
+        'url': 'https://www.thisoldhouse.com/21083431/seaside-transformation-the-westerly-project',
+        'only_matching': True,
     }]
     }]
+    _ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
         webpage = self._download_webpage(url, display_id)
         video_id = self._search_regex(
 
-             r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
-            webpage, 'video id', default=None, group='id')
-        if not video_id:
-            drupal_settings = self._parse_json(self._search_regex(
-                r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
-                webpage, 'drupal settings'), display_id)
-            video_id = try_get(
-                drupal_settings, lambda x: x['jwplatform']['video_id'],
-                compat_str) or list(drupal_settings['comScore'])[0]
-        return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)
+            r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
+            webpage, 'video id')
+        return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)
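The ThisOldHouse change above replaces the Drupal-settings scraping with a single regex over the Zype embed iframe. A sketch of that extraction against a synthetic page (the `api_key` query string from `_ZYPE_TMPL` is omitted here):

```python
import re


def extract_zype_id(webpage):
    """Pull the 24-hex-digit Zype video id out of the iframe src,
    using the same pattern as the diff above."""
    m = re.search(
        r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.'
        r'(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})', webpage)
    return m.group(1) if m else None


# Synthetic page snippet for illustration only.
page = ('<iframe src="https://www.thisoldhouse.com/videos/zype/'
        '5dcdddf673c3f956ef5db202"></iframe>')
```

The pattern accepts both `thisoldhouse.com` and the `thisoldhouse.chorus.build` staging host, and protocol-relative `//` srcs.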
index 5e5efda0f0780fb98b7c37b788ad2734a837e90d..ca2e36efe4216ad66d46252c662ea4cc5395c3ca 100644 (file)
@@ -17,9 +17,9 @@
 
 class ToggleIE(InfoExtractor):
     IE_NAME = 'toggle'
-    _VALID_URL = r'https?://video\.toggle\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
     _TESTS = [{
-        'url': 'http://video.toggle.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
+        'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
         'info_dict': {
             'id': '343115',
             'ext': 'mp4',
@@ -33,7 +33,7 @@ class ToggleIE(InfoExtractor):
         }
     }, {
         'note': 'DRM-protected video',
-        'url': 'http://video.toggle.sg/en/movies/dug-s-special-mission/341413',
+        'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
         'info_dict': {
             'id': '341413',
             'ext': 'wvm',
@@ -48,7 +48,7 @@ class ToggleIE(InfoExtractor):
     }, {
         # this also tests correct video id extraction
         'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
-        'url': 'http://video.toggle.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
+        'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
         'info_dict': {
             'id': '332861',
             'ext': 'mp4',
@@ -65,19 +65,22 @@ class ToggleIE(InfoExtractor):
         'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/zh/series/zero-calling-s2-hd/ep13/336367',
+        'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
+        'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/en/movies/seven-days/321936',
+        'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
         'only_matching': True,
     }, {
-        'url': 'https://video.toggle.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
+        'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
         'only_matching': True,
     }, {
-        'url': 'http://video.toggle.sg/en/channels/eleven-plus/401585',
+        'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
         'only_matching': True,
     }]
 
index b0c7caabf336499dea40adbb62a321e43a6a03f4..cca5b5cebd7c7dedd081156fcb670be3b8bd6027 100644 (file)
@@ -1,21 +1,12 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import (
-    dict_get,
-    float_or_none,
-    int_or_none,
-    unified_timestamp,
-    update_url_query,
-    url_or_none,
-)
 
 
 class TruNewsIE(InfoExtractor):
     _VALID_URL = r'https?://(?:www\.)?trunews\.com/stream/(?P<id>[^/?#&]+)'
     _TEST = {
         'url': 'https://www.trunews.com/stream/will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
-        'md5': 'a19c024c3906ff954fac9b96ce66bb08',
         'info_dict': {
             'id': '5c5a21e65d3c196e1c0020cc',
             'display_id': 'will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
@@ -28,48 +19,16 @@ class TruNewsIE(InfoExtractor):
         },
         'add_ie': ['Zype'],
     }
+    _ZYPE_TEMPL = 'https://player.zype.com/embed/%s.js?api_key=X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
 
-        video = self._download_json(
+        zype_id = self._download_json(
             'https://api.zype.com/videos', display_id, query={
                 'app_key': 'PUVKp9WgGUb3-JUw6EqafLx8tFVP6VKZTWbUOR-HOm__g4fNDt1bCsm_LgYf_k9H',
                 'per_page': 1,
                 'active': 'true',
                 'friendly_title': display_id,
-            })['response'][0]
-
-        zype_id = video['_id']
-
-        thumbnails = []
-        thumbnails_list = video.get('thumbnails')
-        if isinstance(thumbnails_list, list):
-            for thumbnail in thumbnails_list:
-                if not isinstance(thumbnail, dict):
-                    continue
-                thumbnail_url = url_or_none(thumbnail.get('url'))
-                if not thumbnail_url:
-                    continue
-                thumbnails.append({
-                    'url': thumbnail_url,
-                    'width': int_or_none(thumbnail.get('width')),
-                    'height': int_or_none(thumbnail.get('height')),
-                })
-
-        return {
-            '_type': 'url_transparent',
-            'url': update_url_query(
-                'https://player.zype.com/embed/%s.js' % zype_id,
-                {'api_key': 'X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'}),
-            'ie_key': 'Zype',
-            'id': zype_id,
-            'display_id': display_id,
-            'title': video.get('title'),
-            'description': dict_get(video, ('description', 'ott_description', 'short_description')),
-            'duration': int_or_none(video.get('duration')),
-            'timestamp': unified_timestamp(video.get('published_at')),
-            'average_rating': float_or_none(video.get('rating')),
-            'view_count': int_or_none(video.get('request_count')),
-            'thumbnails': thumbnails,
-        }
+            })['response'][0]['_id']
+        return self.url_result(self._ZYPE_TEMPL % zype_id, 'Zype', zype_id)
index edbb0aa6944ba82b36415875f2d99e570b3373fc..ae584ad697bdf3f460eff033b8f43e75776942ee 100644 (file)
@@ -4,7 +4,6 @@
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_str
 from ..utils import (
     ExtractorError,
     int_or_none,
@@ -151,7 +150,7 @@ def _real_extract(self, url):
         url = 'http://%s.tumblr.com/post/%s/' % (blog, video_id)
         webpage, urlh = self._download_webpage_handle(url, video_id)
 
-        redirect_url = compat_str(urlh.geturl())
+        redirect_url = urlh.geturl()
         if 'tumblr.com/safe-mode' in redirect_url or redirect_url.startswith('/safe-mode'):
             raise ExtractorError(
                 'This Tumblr may contain sensitive media. '
index 611fdc0c6c7002c1669200c7ace75bf498a85c6c..8bda9348d723073b894d2d77b6556b51d89dad80 100644 (file)
@@ -106,7 +106,7 @@ def _real_extract(self, url):
         video_id = self._match_id(url)
 
         video = self._download_json(
-            'http://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
+            'https://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
             data=json.dumps({
                 'playlist_id': video_id,
                 'serienavn': '',
index a819d048c613929b79f090facc4a82a097e1cb73..c498b0191623220071d38764f04d3ba1fc114558 100644 (file)
@@ -99,7 +99,7 @@ def _real_extract(self, url):
             manifest_url.replace('.m3u8', '.f4m'),
             video_id, f4m_id='hds', fatal=False))
         formats.extend(self._extract_ism_formats(
-            re.sub(r'\.ism/.+?\.m3u8', r'.ism/Manifest', manifest_url),
+            re.sub(r'\.ism/.*?\.m3u8', r'.ism/Manifest', manifest_url),
             video_id, ism_id='mss', fatal=False))
 
         if not formats and info.get('is_geo_restricted'):
index 88b6baa316b54eb58e3deb5d69f2fd04c1795bba..b7fe082b9c00e6600b6bf9f6e29849a3fd4eb3b6 100644 (file)
@@ -3,31 +3,51 @@
 
 from .common import InfoExtractor
 from ..utils import (
-    clean_html,
     determine_ext,
     extract_attributes,
-    get_element_by_class,
     int_or_none,
     parse_duration,
-    parse_iso8601,
 )
 
 
 class TV5MondePlusIE(InfoExtractor):
     IE_DESC = 'TV5MONDE+'
-    _VALID_URL = r'https?://(?:www\.)?tv5mondeplus\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
-    _TEST = {
-        'url': 'http://www.tv5mondeplus.com/toutes-les-videos/documentaire/tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
-        'md5': '12130fc199f020673138a83466542ec6',
+    _VALID_URL = r'https?://(?:www\.)?(?:tv5mondeplus|revoir\.tv5monde)\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
+    _TESTS = [{
+        # movie
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/cinema/rendez-vous-a-atlit',
+        'md5': '8cbde5ea7b296cf635073e27895e227f',
         'info_dict': {
-            'id': 'tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
+            'id': '822a4756-0712-7329-1859-a13ac7fd1407',
+            'display_id': 'rendez-vous-a-atlit',
             'ext': 'mp4',
-            'title': 'Tdah, mon amour - Enfants',
-            'description': 'md5:230e3aca23115afcf8006d1bece6df74',
-            'upload_date': '20170401',
-            'timestamp': 1491022860,
-        }
-    }
+            'title': 'Rendez-vous à Atlit',
+            'description': 'md5:2893a4c5e1dbac3eedff2d87956e4efb',
+            'upload_date': '20200130',
+        },
+    }, {
+        # series episode
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/c-est-la-vie-ennemie-juree',
+        'info_dict': {
+            'id': '0df7007c-4900-3936-c601-87a13a93a068',
+            'display_id': 'c-est-la-vie-ennemie-juree',
+            'ext': 'mp4',
+            'title': "C'est la vie - Ennemie jurée",
+            'description': 'md5:dfb5c63087b6f35fe0cc0af4fe44287e',
+            'upload_date': '20200130',
+            'series': "C'est la vie",
+            'episode': 'Ennemie jurée',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/neuf-jours-en-hiver-neuf-jours-en-hiver',
+        'only_matching': True,
+    }, {
+        'url': 'https://revoir.tv5monde.com/toutes-les-videos/info-societe/le-journal-de-la-rts-edition-du-30-01-20-19h30',
+        'only_matching': True,
+    }]
     _GEO_BYPASS = False
 
     def _real_extract(self, url):
@@ -37,11 +57,7 @@ def _real_extract(self, url):
         if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage:
             self.raise_geo_restricted(countries=['FR'])
 
-        series = get_element_by_class('video-detail__title', webpage)
-        title = episode = get_element_by_class(
-            'video-detail__subtitle', webpage) or series
-        if series and series != title:
-            title = '%s - %s' % (series, title)
+        title = episode = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
         vpl_data = extract_attributes(self._search_regex(
             r'(<[^>]+class="video_player_loader"[^>]+>)',
             webpage, 'video player loader'))
@@ -65,15 +81,37 @@ def _real_extract(self, url):
                 })
         self._sort_formats(formats)
 
+        description = self._html_search_regex(
+            r'(?s)<div[^>]+class=["\']episode-texte[^>]+>(.+?)</div>', webpage,
+            'description', fatal=False)
+
+        series = self._html_search_regex(
+            r'<p[^>]+class=["\']episode-emission[^>]+>([^<]+)', webpage,
+            'series', default=None)
+
+        if series and series != title:
+            title = '%s - %s' % (series, title)
+
+        upload_date = self._search_regex(
+            r'(?:date_publication|publish_date)["\']\s*:\s*["\'](\d{4}_\d{2}_\d{2})',
+            webpage, 'upload date', default=None)
+        if upload_date:
+            upload_date = upload_date.replace('_', '')
+
+        video_id = self._search_regex(
+            (r'data-guid=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
+             r'id_contenu["\']\s:\s*(\d+)'), webpage, 'video id',
+            default=display_id)
+
         return {
-            'id': display_id,
+            'id': video_id,
             'display_id': display_id,
             'title': title,
-            'description': clean_html(get_element_by_class('video-detail__description', webpage)),
+            'description': description,
             'thumbnail': vpl_data.get('data-image'),
             'duration': int_or_none(vpl_data.get('data-duration')) or parse_duration(self._html_search_meta('duration', webpage)),
-            'timestamp': parse_iso8601(self._html_search_meta('uploadDate', webpage)),
+            'upload_date': upload_date,
             'formats': formats,
-            'episode': episode,
             'series': series,
+            'episode': episode,
         }
index 0b863df2ff4ad214162c6187ac7aaa65fe3fc6c9..443f46e8a3537165d620c2db8863634e9f922ab6 100644 (file)
@@ -9,8 +9,8 @@
 
 
 class TVAIE(InfoExtractor):
-    _VALID_URL = r'https?://videos\.tva\.ca/details/_(?P<id>\d+)'
-    _TEST = {
+    _VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P<id>\d+)'
+    _TESTS = [{
         'url': 'https://videos.tva.ca/details/_5596811470001',
         'info_dict': {
             'id': '5596811470001',
@@ -24,7 +24,10 @@ class TVAIE(InfoExtractor):
             # m3u8 download
             'skip_download': True,
         }
-    }
+    }, {
+        'url': 'https://video.tva.ca/details/_5596811470001',
+        'only_matching': True,
+    }]
     BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s'
 
     def _real_extract(self, url):
index d82d48f94ecc026629479d84aa8ca513eaf0db26..3c2450dd0c8733d3a96a1a842c54505294b70ca6 100644 (file)
@@ -6,7 +6,6 @@
 from .common import InfoExtractor
 from ..compat import (
     compat_HTTPError,
-    compat_str,
     compat_urlparse,
 )
 from ..utils import (
@@ -15,9 +14,7 @@
     int_or_none,
     parse_iso8601,
     qualities,
-    smuggle_url,
     try_get,
-    unsmuggle_url,
     update_url_query,
     url_or_none,
 )
@@ -235,11 +232,6 @@ class TVPlayIE(InfoExtractor):
     ]
 
     def _real_extract(self, url):
-        url, smuggled_data = unsmuggle_url(url, {})
-        self._initialize_geo_bypass({
-            'countries': smuggled_data.get('geo_countries'),
-        })
-
         video_id = self._match_id(url)
         geo_country = self._search_regex(
             r'https?://[^/]+\.([a-z]{2})', url,
@@ -285,8 +277,6 @@ def _real_extract(self, url):
                     'ext': ext,
                 }
                 if video_url.startswith('rtmp'):
-                    if smuggled_data.get('skip_rtmp'):
-                        continue
                     m = re.search(
                         r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
                     if not m:
@@ -347,115 +337,80 @@ class ViafreeIE(InfoExtractor):
     _VALID_URL = r'''(?x)
                     https?://
                         (?:www\.)?
-                        viafree\.
-                        (?:
-                            (?:dk|no)/programmer|
-                            se/program
-                        )
-                        /(?:[^/]+/)+(?P<id>[^/?#&]+)
+                        viafree\.(?P<country>dk|no|se)
+                        /(?P<id>program(?:mer)?/(?:[^/]+/)+[^/?#&]+)
                     '''
     _TESTS = [{
-        'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
+        'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
         'info_dict': {
-            'id': '395375',
+            'id': '757786',
             'ext': 'mp4',
-            'title': 'Husräddarna S02E02',
-            'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
-            'series': 'Husräddarna',
-            'season': 'Säsong 2',
+            'title': 'Det beste vorspielet - Sesong 2 - Episode 1',
+            'description': 'md5:b632cb848331404ccacd8cd03e83b4c3',
+            'series': 'Det beste vorspielet',
             'season_number': 2,
-            'duration': 2576,
-            'timestamp': 1400596321,
-            'upload_date': '20140520',
+            'duration': 1116,
+            'timestamp': 1471200600,
+            'upload_date': '20160814',
         },
         'params': {
             'skip_download': True,
         },
-        'add_ie': [TVPlayIE.ie_key()],
     }, {
         # with relatedClips
         'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
-        'info_dict': {
-            'id': '758770',
-            'ext': 'mp4',
-            'title': 'Sommaren med YouTube-stjärnorna S01E01',
-            'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
-            'series': 'Sommaren med YouTube-stjärnorna',
-            'season': 'Säsong 1',
-            'season_number': 1,
-            'duration': 1326,
-            'timestamp': 1470905572,
-            'upload_date': '20160811',
-        },
-        'params': {
-            'skip_download': True,
-        },
-        'add_ie': [TVPlayIE.ie_key()],
+        'only_matching': True,
     }, {
         # Different og:image URL schema
         'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
         'only_matching': True,
     }, {
-        'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
+        'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
         'only_matching': True,
     }, {
         'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
         'only_matching': True,
     }]
+    _GEO_BYPASS = False
 
     @classmethod
     def suitable(cls, url):
         return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
+        country, path = re.match(self._VALID_URL, url).groups()
+        content = self._download_json(
+            'https://viafree-content.mtg-api.com/viafree-content/v1/%s/path/%s' % (country, path), path)
+        program = content['_embedded']['viafreeBlocks'][0]['_embedded']['program']
+        guid = program['guid']
+        meta = content['meta']
+        title = meta['title']
 
-        webpage = self._download_webpage(url, video_id)
+        try:
+            stream_href = self._download_json(
+                program['_links']['streamLink']['href'], guid,
+                headers=self.geo_verification_headers())['embedded']['prioritizedStreams'][0]['links']['stream']['href']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                self.raise_geo_restricted(countries=[country])
+            raise
+
+        formats = self._extract_m3u8_formats(stream_href, guid, 'mp4')
+        self._sort_formats(formats)
+        episode = program.get('episode') or {}
 
-        data = self._parse_json(
-            self._search_regex(
-                r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script',
-                webpage, 'data', default='{}'),
-            video_id, transform_source=lambda x: re.sub(
-                r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
-                'null', x), fatal=False)
-
-        video_id = None
-
-        if data:
-            video_id = try_get(
-                data, lambda x: x['context']['dispatcher']['stores'][
-                    'ContentPageProgramStore']['currentVideo']['id'],
-                compat_str)
-
-        # Fallback #1 (extract from og:image URL schema)
-        if not video_id:
-            thumbnail = self._og_search_thumbnail(webpage, default=None)
-            if thumbnail:
-                video_id = self._search_regex(
-                    # Patterns seen:
-                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
-                    #  http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
-                    r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
-                    thumbnail, 'video id', default=None)
-
-        # Fallback #2. Extract from raw JSON string.
-        # May extract wrong video id if relatedClips is present.
-        if not video_id:
-            video_id = self._search_regex(
-                r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
-                webpage, 'video id')
-
-        return self.url_result(
-            smuggle_url(
-                'mtg:%s' % video_id,
-                {
-                    'geo_countries': [
-                        compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
-                    # rtmp host mtgfs.fplive.net for viafree is unresolvable
-                    'skip_rtmp': True,
-                }),
-            ie=TVPlayIE.ie_key(), video_id=video_id)
+        return {
+            'id': guid,
+            'title': title,
+            'thumbnail': meta.get('image'),
+            'description': meta.get('description'),
+            'series': episode.get('seriesTitle'),
+            'episode_number': int_or_none(episode.get('episodeNumber')),
+            'season_number': int_or_none(episode.get('seasonNumber')),
+            'duration': int_or_none(try_get(program, lambda x: x['video']['duration']['milliseconds']), 1000),
+            'timestamp': parse_iso8601(try_get(program, lambda x: x['availability']['start'])),
+            'formats': formats,
+        }
 
 
 class TVPlayHomeIE(InfoExtractor):
index 1d66eeaff6e80cd1c629f79ad98a1a68f1d52564..74d14049b482a702bf464a40f2e5f361dc7cd72a 100644 (file)
@@ -17,8 +17,8 @@ class TwentyFourVideoIE(InfoExtractor):
     _VALID_URL = r'''(?x)
                     https?://
                         (?P<host>
-                            (?:(?:www|porno)\.)?24video\.
-                            (?:net|me|xxx|sexy?|tube|adult|site)
+                            (?:(?:www|porno?)\.)?24video\.
+                            (?:net|me|xxx|sexy?|tube|adult|site|vip)
                         )/
                         (?:
                             video/(?:(?:view|xml)/)?|
@@ -59,6 +59,12 @@ class TwentyFourVideoIE(InfoExtractor):
     }, {
         'url': 'https://porno.24video.net/video/2640421-vsya-takaya-gibkaya-i-v-masle',
         'only_matching': True,
+    }, {
+        'url': 'https://www.24video.vip/video/view/1044982',
+        'only_matching': True,
+    }, {
+        'url': 'https://porn.24video.net/video/2640421-vsya-takay',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index a8c2502af8132834a34b8ef9c8ade935dd432604..e211cd4c84cb4c713e937cf0e1eef83de7c40a93 100644 (file)
@@ -21,6 +21,8 @@
     orderedSet,
     parse_duration,
     parse_iso8601,
+    qualities,
+    str_or_none,
     try_get,
     unified_timestamp,
     update_url_query,
@@ -50,8 +52,14 @@ def _handle_error(self, response):
 
     def _call_api(self, path, item_id, *args, **kwargs):
         headers = kwargs.get('headers', {}).copy()
-        headers['Client-ID'] = self._CLIENT_ID
-        kwargs['headers'] = headers
+        headers.update({
+            'Accept': 'application/vnd.twitchtv.v5+json; charset=UTF-8',
+            'Client-ID': self._CLIENT_ID,
+        })
+        kwargs.update({
+            'headers': headers,
+            'expected_status': (400, 410),
+        })
         response = self._download_json(
             '%s/%s' % (self._API_BASE, path), item_id,
             *args, **compat_kwargs(kwargs))
@@ -186,12 +194,27 @@ def _extract_info(self, info):
             is_live = False
         else:
             is_live = None
+        _QUALITIES = ('small', 'medium', 'large')
+        quality_key = qualities(_QUALITIES)
+        thumbnails = []
+        preview = info.get('preview')
+        if isinstance(preview, dict):
+            for thumbnail_id, thumbnail_url in preview.items():
+                thumbnail_url = url_or_none(thumbnail_url)
+                if not thumbnail_url:
+                    continue
+                if thumbnail_id not in _QUALITIES:
+                    continue
+                thumbnails.append({
+                    'url': thumbnail_url,
+                    'preference': quality_key(thumbnail_id),
+                })
         return {
             'id': info['_id'],
             'title': info.get('title') or 'Untitled Broadcast',
             'description': info.get('description'),
             'duration': int_or_none(info.get('length')),
-            'thumbnail': info.get('preview'),
+            'thumbnails': thumbnails,
             'uploader': info.get('channel', {}).get('display_name'),
             'uploader_id': info.get('channel', {}).get('name'),
             'timestamp': parse_iso8601(info.get('recorded_at')),
@@ -572,11 +595,19 @@ def suitable(cls, url):
                 else super(TwitchStreamIE, cls).suitable(url))
 
     def _real_extract(self, url):
-        channel_id = self._match_id(url)
+        channel_name = self._match_id(url)
+
+        access_token = self._call_api(
+            'api/channels/%s/access_token' % channel_name, channel_name,
+            'Downloading access token JSON')
+
+        token = access_token['token']
+        channel_id = compat_str(self._parse_json(
+            token, channel_name)['channel_id'])
 
         stream = self._call_api(
-            'kraken/streams/%s?stream_type=all' % channel_id, channel_id,
-            'Downloading stream JSON').get('stream')
+            'kraken/streams/%s?stream_type=all' % channel_id,
+            channel_id, 'Downloading stream JSON').get('stream')
 
         if not stream:
             raise ExtractorError('%s is offline' % channel_id, expected=True)
@@ -585,11 +616,9 @@ def _real_extract(self, url):
         # (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing
         # an invalid m3u8 URL. Working around by use of original channel name from stream
         # JSON and fallback to lowercase if it's not available.
-        channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
-
-        access_token = self._call_api(
-            'api/channels/%s/access_token' % channel_id, channel_id,
-            'Downloading channel access token')
+        channel_name = try_get(
+            stream, lambda x: x['channel']['name'],
+            compat_str) or channel_name.lower()
 
         query = {
             'allow_source': 'true',
@@ -600,11 +629,11 @@ def _real_extract(self, url):
             'playlist_include_framerate': 'true',
             'segment_preference': '4',
             'sig': access_token['sig'].encode('utf-8'),
-            'token': access_token['token'].encode('utf-8'),
+            'token': token.encode('utf-8'),
         }
         formats = self._extract_m3u8_formats(
             '%s/api/channel/hls/%s.m3u8?%s'
-            % (self._USHER_BASE, channel_id, compat_urllib_parse_urlencode(query)),
+            % (self._USHER_BASE, channel_name, compat_urllib_parse_urlencode(query)),
             channel_id, 'mp4')
         self._prefer_source(formats)
 
@@ -627,8 +656,8 @@ def _real_extract(self, url):
             })
 
         return {
-            'id': compat_str(stream['_id']),
-            'display_id': channel_id,
+            'id': str_or_none(stream.get('_id')) or channel_id,
+            'display_id': channel_name,
             'title': title,
             'description': description,
             'thumbnails': thumbnails,
@@ -643,7 +672,14 @@ def _real_extract(self, url):
 
 class TwitchClipsIE(TwitchBaseIE):
     IE_NAME = 'twitch:clips'
-    _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|
+                            (?:(?:www|go|m)\.)?twitch\.tv/[^/]+/clip/
+                        )
+                        (?P<id>[^/?#&]+)
+                    '''
 
     _TESTS = [{
         'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
@@ -669,6 +705,12 @@ class TwitchClipsIE(TwitchBaseIE):
     }, {
         'url': 'https://clips.twitch.tv/embed?clip=InquisitiveBreakableYogurtJebaited',
         'only_matching': True,
+    }, {
+        'url': 'https://m.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
+        'only_matching': True,
+    }, {
+        'url': 'https://go.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index 5f8d90fb4e5c13d19cd0601f86a44eec5568fadc..4284487db4994b25990c4151afca71c48a271751 100644 (file)
@@ -251,10 +251,10 @@ class TwitterIE(TwitterBaseIE):
         'info_dict': {
             'id': '700207533655363584',
             'ext': 'mp4',
-            'title': 'Simon Vertugo - BEAT PROD: @suhmeduh #Damndaniel',
+            'title': 'simon vetugo - BEAT PROD: @suhmeduh #Damndaniel',
             'description': 'BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ',
             'thumbnail': r're:^https?://.*\.jpg',
-            'uploader': 'Simon Vertugo',
+            'uploader': 'simon vetugo',
             'uploader_id': 'simonvertugo',
             'duration': 30.0,
             'timestamp': 1455777459,
@@ -376,6 +376,10 @@ class TwitterIE(TwitterBaseIE):
         # Twitch Clip Embed
         'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
         'only_matching': True,
+    }, {
+        # promo_video_website card
+        'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -458,10 +462,11 @@ def get_binding_value(k):
                     return try_get(o, lambda x: x[x['type'].lower() + '_value'])
 
                 card_name = card['name'].split(':')[-1]
-                if card_name == 'amplify':
-                    formats = self._extract_formats_from_vmap_url(
-                        get_binding_value('amplify_url_vmap'),
-                        get_binding_value('amplify_content_id') or twid)
+                if card_name in ('amplify', 'promo_video_website'):
+                    is_amplify = card_name == 'amplify'
+                    vmap_url = get_binding_value('amplify_url_vmap') if is_amplify else get_binding_value('player_stream_url')
+                    content_id = get_binding_value('%s_content_id' % (card_name if is_amplify else 'player'))
+                    formats = self._extract_formats_from_vmap_url(vmap_url, content_id or twid)
                     self._sort_formats(formats)
 
                     thumbnails = []
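The amplify branch above is generalized to promo_video_website cards. A rough standalone sketch of the binding-value lookup it relies on; the binding_values dict below is fabricated sample data, not a real card payload:

```python
# Sketch of the card handling added above: Twitter card data stores each value
# under a key derived from its "type" field (e.g. STRING -> string_value).
def get_binding_value(binding_values, k):
    o = binding_values.get(k) or {}
    return o.get(o.get('type', '').lower() + '_value')

# Fabricated sample data for illustration only.
binding_values = {
    'player_stream_url': {'type': 'STRING', 'string_value': 'https://example.com/video.m3u8'},
    'player_content_id': {'type': 'STRING', 'string_value': '1163218564784017422'},
}

card_name = 'promo_video_website'
is_amplify = card_name == 'amplify'
vmap_url = (get_binding_value(binding_values, 'amplify_url_vmap') if is_amplify
            else get_binding_value(binding_values, 'player_stream_url'))
content_id = get_binding_value(
    binding_values, '%s_content_id' % (card_name if is_amplify else 'player'))
print(vmap_url, content_id)
```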
@@ -573,6 +578,18 @@ class TwitterBroadcastIE(TwitterBaseIE, PeriscopeBaseIE):
     IE_NAME = 'twitter:broadcast'
     _VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
 
+    _TEST = {
+        # untitled Periscope video
+        'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
+        'info_dict': {
+            'id': '1yNGaQLWpejGj',
+            'ext': 'mp4',
+            'title': 'Andrea May Sahouri - Periscope Broadcast',
+            'uploader': 'Andrea May Sahouri',
+            'uploader_id': '1PXEdBZWpGwKe',
+        },
+    }
+
     def _real_extract(self, url):
         broadcast_id = self._match_id(url)
         broadcast = self._call_api(
index 08f0c072e28b09dfbbde2f662b52f5de4cf46f3d..628adf2199ca7a1d7ebbc9ffece05e2447f59133 100644 (file)
@@ -2,12 +2,17 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_urllib_parse_urlencode,
+)
 from ..utils import (
     clean_html,
     int_or_none,
     parse_duration,
+    parse_iso8601,
+    qualities,
     update_url_query,
-    str_or_none,
 )
 
 
@@ -16,21 +21,25 @@ class UOLIE(InfoExtractor):
     _VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
     _TESTS = [{
         'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
-        'md5': '25291da27dc45e0afb5718a8603d3816',
+        'md5': '4f1e26683979715ff64e4e29099cf020',
         'info_dict': {
             'id': '15951931',
             'ext': 'mp4',
             'title': 'Miss simpatia é encontrada morta',
             'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
+            'timestamp': 1470421860,
+            'upload_date': '20160805',
         }
     }, {
         'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
-        'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
+        'md5': '2850a0e8dfa0a7307e04a96c5bdc5bc2',
         'info_dict': {
             'id': '15954259',
             'ext': 'mp4',
             'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
             'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
+            'timestamp': 1470674520,
+            'upload_date': '20160808',
         }
     }, {
         'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
@@ -55,91 +64,55 @@ class UOLIE(InfoExtractor):
         'only_matching': True,
     }]
 
-    _FORMATS = {
-        '2': {
-            'width': 640,
-            'height': 360,
-        },
-        '5': {
-            'width': 1280,
-            'height': 720,
-        },
-        '6': {
-            'width': 426,
-            'height': 240,
-        },
-        '7': {
-            'width': 1920,
-            'height': 1080,
-        },
-        '8': {
-            'width': 192,
-            'height': 144,
-        },
-        '9': {
-            'width': 568,
-            'height': 320,
-        },
-        '11': {
-            'width': 640,
-            'height': 360,
-        }
-    }
-
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        media_id = None
-
-        if video_id.isdigit():
-            media_id = video_id
-
-        if not media_id:
-            embed_page = self._download_webpage(
-                'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
-                video_id, 'Downloading embed page', fatal=False)
-            if embed_page:
-                media_id = self._search_regex(
-                    (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
-                    embed_page, 'media id', default=None)
-
-        if not media_id:
-            webpage = self._download_webpage(url, video_id)
-            media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
 
         video_data = self._download_json(
-            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
-            media_id)['item']
+            # https://api.mais.uol.com.br/apiuol/v4/player/data/[MEDIA_ID]
+            'https://api.mais.uol.com.br/apiuol/v3/media/detail/' + video_id,
+            video_id)['item']
+        media_id = compat_str(video_data['mediaId'])
         title = video_data['title']
+        ver = video_data.get('revision', 2)
 
-        query = {
-            'ver': video_data.get('numRevision', 2),
-            'r': 'http://mais.uol.com.br',
-        }
-        for k in ('token', 'sign'):
-            v = video_data.get(k)
-            if v:
-                query[k] = v
-
+        uol_formats = self._download_json(
+            'https://croupier.mais.uol.com.br/v3/formats/%s/jsonp' % media_id,
+            media_id)
+        quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
         formats = []
-        for f in video_data.get('formats', []):
+        for format_id, f in uol_formats.items():
+            if not isinstance(f, dict):
+                continue
             f_url = f.get('url') or f.get('secureUrl')
             if not f_url:
                 continue
+            query = {
+                'ver': ver,
+                'r': 'http://mais.uol.com.br',
+            }
+            for k in ('token', 'sign'):
+                v = f.get(k)
+                if v:
+                    query[k] = v
             f_url = update_url_query(f_url, query)
-            format_id = str_or_none(f.get('id'))
-            if format_id == '10':
-                formats.extend(self._extract_m3u8_formats(
-                    f_url, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
+            format_id = format_id
+            if format_id == 'HLS':
+                m3u8_formats = self._extract_m3u8_formats(
+                    f_url, media_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False)
+                encoded_query = compat_urllib_parse_urlencode(query)
+                for m3u8_f in m3u8_formats:
+                    m3u8_f['extra_param_to_segment_url'] = encoded_query
+                    m3u8_f['url'] = update_url_query(m3u8_f['url'], query)
+                formats.extend(m3u8_formats)
                 continue
-            fmt = {
+            formats.append({
                 'format_id': format_id,
                 'url': f_url,
-                'source_preference': 1,
-            }
-            fmt.update(self._FORMATS.get(format_id, {}))
-            formats.append(fmt)
-        self._sort_formats(formats, ('height', 'width', 'source_preference', 'tbr', 'ext'))
+                'quality': quality(format_id),
+                'preference': -1,
+            })
+        self._sort_formats(formats)
 
         tags = []
         for tag in video_data.get('tags', []):
@@ -148,12 +121,24 @@ def _real_extract(self, url):
                 continue
             tags.append(tag_description)
 
+        thumbnails = []
+        for q in ('Small', 'Medium', 'Wmedium', 'Large', 'Wlarge', 'Xlarge'):
+            q_url = video_data.get('thumb' + q)
+            if not q_url:
+                continue
+            thumbnails.append({
+                'id': q,
+                'url': q_url,
+            })
+
         return {
             'id': media_id,
             'title': title,
-            'description': clean_html(video_data.get('desMedia')),
-            'thumbnail': video_data.get('thumbnail'),
-            'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
+            'description': clean_html(video_data.get('description')),
+            'thumbnails': thumbnails,
+            'duration': parse_duration(video_data.get('duration')),
             'tags': tags,
             'formats': formats,
+            'timestamp': parse_iso8601(video_data.get('publishDate'), ' '),
+            'view_count': int_or_none(video_data.get('viewsQtty')),
         }
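The reworked UOL extractor above ranks the croupier format ids with youtube-dl's `qualities` helper. A self-contained sketch of its behavior (mirroring utils.qualities rather than importing it): later entries in the preference list rank higher, and unknown ids fall back to -1.

```python
# Sketch of youtube-dl's utils.qualities: returns a ranking function where a
# format id's rank is its index in the preference list, or -1 if unknown.
def qualities(quality_ids):
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q

# The preference list used in the UOL hunk above; 'HLS' is intentionally
# absent, as that id is handled separately via m3u8 extraction.
quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
print(quality('1080p'), quality('mobile'), quality('HLS'))
```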
index 8fdfd743d04a6ee4319a64337ea67398d2243896..e37499512856234d9b989aec53dfd59c42647300 100644 (file)
@@ -1,35 +1,50 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-import time
+import functools
 import hashlib
 import json
 import random
+import re
+import time
 
 from .adobepass import AdobePassIE
-from .youtube import YoutubeIE
 from .common import InfoExtractor
+from .youtube import YoutubeIE
 from ..compat import (
     compat_HTTPError,
     compat_str,
 )
 from ..utils import (
+    clean_html,
     ExtractorError,
     int_or_none,
+    OnDemandPagedList,
     parse_age_limit,
     str_or_none,
     try_get,
 )
 
 
-class ViceIE(AdobePassIE):
+class ViceBaseIE(InfoExtractor):
+    def _call_api(self, resource, resource_key, resource_id, locale, fields, args=''):
+        return self._download_json(
+            'https://video.vice.com/api/v1/graphql', resource_id, query={
+                'query': '''{
+  %s(locale: "%s", %s: "%s"%s) {
+    %s
+  }
+}''' % (resource, locale, resource_key, resource_id, args, fields),
+            })['data'][resource]
+
+
+class ViceIE(ViceBaseIE, AdobePassIE):
     IE_NAME = 'vice'
-    _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?viceland)\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]+)'
+    _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
     _TESTS = [{
         'url': 'https://video.vice.com/en_us/video/pet-cremator/58c69e38a55424f1227dc3f7',
         'info_dict': {
-            'id': '5e647f0125e145c9aef2069412c0cbde',
+            'id': '58c69e38a55424f1227dc3f7',
             'ext': 'mp4',
             'title': '10 Questions You Always Wanted To Ask: Pet Cremator',
             'description': 'md5:fe856caacf61fe0e74fab15ce2b07ca5',
@@ -43,17 +58,16 @@ class ViceIE(AdobePassIE):
             # m3u8 download
             'skip_download': True,
         },
-        'add_ie': ['UplynkPreplay'],
     }, {
         # geo restricted to US
         'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
         'info_dict': {
-            'id': '930c0ad1f47141cc955087eecaddb0e2',
+            'id': '5816510690b70e6c5fd39a56',
             'ext': 'mp4',
-            'uploader': 'waypoint',
+            'uploader': 'vice',
             'title': 'The Signal From Tölva',
             'description': 'md5:3927e3c79f9e8094606a2b3c5b5e55d5',
-            'uploader_id': '57f7d621e05ca860fa9ccaf9',
+            'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1477941983,
             'upload_date': '20161031',
         },
@@ -61,15 +75,14 @@ class ViceIE(AdobePassIE):
             # m3u8 download
             'skip_download': True,
         },
-        'add_ie': ['UplynkPreplay'],
     }, {
         'url': 'https://video.vice.com/alps/video/ulfs-wien-beruchtigste-grafitti-crew-part-1/581b12b60a0e1f4c0fb6ea2f',
         'info_dict': {
             'id': '581b12b60a0e1f4c0fb6ea2f',
             'ext': 'mp4',
             'title': 'ULFs - Wien berüchtigste Grafitti Crew - Part 1',
-            'description': '<p>Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.</p>',
-            'uploader': 'VICE',
+            'description': 'Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.',
+            'uploader': 'vice',
             'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1485368119,
             'upload_date': '20170125',
@@ -78,9 +91,7 @@ class ViceIE(AdobePassIE):
         'params': {
             # AES-encrypted m3u8
             'skip_download': True,
-            'proxy': '127.0.0.1:8118',
         },
-        'add_ie': ['UplynkPreplay'],
     }, {
         'url': 'https://video.vice.com/en_us/video/pizza-show-trailer/56d8c9a54d286ed92f7f30e4',
         'only_matching': True,
@@ -98,7 +109,7 @@ class ViceIE(AdobePassIE):
     @staticmethod
     def _extract_urls(webpage):
         return re.findall(
-            r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]+)',
+            r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})',
             webpage)
 
     @staticmethod
@@ -109,31 +120,16 @@ def _extract_url(webpage):
     def _real_extract(self, url):
         locale, video_id = re.match(self._VALID_URL, url).groups()
 
-        webpage = self._download_webpage(
-            'https://video.vice.com/%s/embed/%s' % (locale, video_id),
-            video_id)
-
-        video = self._parse_json(
-            self._search_regex(
-                r'PREFETCH_DATA\s*=\s*({.+?})\s*;\s*\n', webpage,
-                'app state'), video_id)['video']
-        video_id = video.get('vms_id') or video.get('id') or video_id
-        title = video['title']
-        is_locked = video.get('locked')
+        video = self._call_api('videos', 'id', video_id, locale, '''body
+    locked
+    rating
+    thumbnail_url
+    title''')[0]
+        title = video['title'].strip()
         rating = video.get('rating')
-        thumbnail = video.get('thumbnail_url')
-        duration = int_or_none(video.get('duration'))
-        series = try_get(
-            video, lambda x: x['episode']['season']['show']['title'],
-            compat_str)
-        episode_number = try_get(
-            video, lambda x: x['episode']['episode_number'])
-        season_number = try_get(
-            video, lambda x: x['episode']['season']['season_number'])
-        uploader = None
 
         query = {}
-        if is_locked:
+        if video.get('locked'):
             resource = self._get_mvpd_resource(
                 'VICELAND', title, video_id, rating)
             query['tvetoken'] = self._extract_mvpd_auth(
@@ -148,12 +144,9 @@ def _real_extract(self, url):
         query.update({
             'exp': exp,
             'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
-            '_ad_blocked': None,
-            '_ad_unit': '',
-            '_debug': '',
+            'skipadstitching': 1,
             'platform': 'desktop',
             'rn': random.randint(10000, 100000),
-            'fbprebidtoken': '',
         })
 
         try:
@@ -169,85 +162,94 @@ def _real_extract(self, url):
             raise
 
         video_data = preplay['video']
-        base = video_data['base']
-        uplynk_preplay_url = preplay['preplayURL']
-        episode = video_data.get('episode', {})
-        channel = video_data.get('channel', {})
+        formats = self._extract_m3u8_formats(
+            preplay['playURL'], video_id, 'mp4', 'm3u8_native')
+        self._sort_formats(formats)
+        episode = video_data.get('episode') or {}
+        channel = video_data.get('channel') or {}
+        season = video_data.get('season') or {}
 
         subtitles = {}
-        cc_url = preplay.get('ccURL')
-        if cc_url:
-            subtitles['en'] = [{
+        for subtitle in preplay.get('subtitleURLs', []):
+            cc_url = subtitle.get('url')
+            if not cc_url:
+                continue
+            language_code = try_get(subtitle, lambda x: x['languages'][0]['language_code'], compat_str) or 'en'
+            subtitles.setdefault(language_code, []).append({
                 'url': cc_url,
                 'url': cc_url,
-            }]
+            })
 
         return {
-            '_type': 'url_transparent',
-            'url': uplynk_preplay_url,
+            'formats': formats,
             'id': video_id,
             'title': title,
-            'description': base.get('body') or base.get('display_body'),
-            'thumbnail': thumbnail,
-            'duration': int_or_none(video_data.get('video_duration')) or duration,
+            'description': clean_html(video.get('body')),
+            'thumbnail': video.get('thumbnail_url'),
+            'duration': int_or_none(video_data.get('video_duration')),
             'timestamp': int_or_none(video_data.get('created_at'), 1000),
-            'age_limit': parse_age_limit(video_data.get('video_rating')),
-            'series': video_data.get('show_title') or series,
-            'episode_number': int_or_none(episode.get('episode_number') or episode_number),
+            'age_limit': parse_age_limit(video_data.get('video_rating') or rating),
+            'series': try_get(video_data, lambda x: x['show']['base']['display_title'], compat_str),
+            'episode_number': int_or_none(episode.get('episode_number')),
             'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
-            'season_number': int_or_none(season_number),
-            'season_id': str_or_none(episode.get('season_id')),
-            'uploader': channel.get('base', {}).get('title') or channel.get('name') or uploader,
+            'season_number': int_or_none(season.get('season_number')),
+            'season_id': str_or_none(season.get('id') or video_data.get('season_id')),
+            'uploader': channel.get('name'),
             'uploader_id': str_or_none(channel.get('id')),
             'subtitles': subtitles,
-            'ie_key': 'UplynkPreplay',
         }
 
 
-class ViceShowIE(InfoExtractor):
+class ViceShowIE(ViceBaseIE):
     IE_NAME = 'vice:show'
-    _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
-
-    _TEST = {
-        'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
+    _VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
+    _PAGE_SIZE = 25
+    _TESTS = [{
+        'url': 'https://video.vice.com/en_us/show/fck-thats-delicious',
         'info_dict': {
-            'id': 'fuck-thats-delicious-2',
-            'title': "Fuck, That's Delicious",
-            'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
+            'id': '57a2040c8cb727dec794c901',
+            'title': 'F*ck, That’s Delicious',
+            'description': 'The life and eating habits of rap’s greatest bon vivant, Action Bronson.',
         },
-        'playlist_count': 17,
-    }
+        'playlist_mincount': 64,
+    }, {
+        'url': 'https://www.vicetv.com/en_us/show/fck-thats-delicious',
+        'only_matching': True,
+    }]
 
-    def _real_extract(self, url):
-        show_id = self._match_id(url)
-        webpage = self._download_webpage(url, show_id)
+    def _fetch_page(self, locale, show_id, page):
+        videos = self._call_api('videos', 'show_id', show_id, locale, '''body
+    id
+    url''', ', page: %d, per_page: %d' % (page + 1, self._PAGE_SIZE))
+        for video in videos:
+            yield self.url_result(
+                video['url'], ViceIE.ie_key(), video.get('id'))
 
-        entries = [
-            self.url_result(video_url, ViceIE.ie_key())
-            for video_url, _ in re.findall(
-                r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
-                % ViceIE._VALID_URL, webpage)]
+    def _real_extract(self, url):
+        locale, display_id = re.match(self._VALID_URL, url).groups()
+        show = self._call_api('shows', 'slug', display_id, locale, '''dek
+    id
+    title''')[0]
+        show_id = show['id']
 
-        title = self._search_regex(
-            r'<title>(.+?)</title>', webpage, 'title', default=None)
-        if title:
-            title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
-        description = self._html_search_meta(
-            'description', webpage, 'description')
+        entries = OnDemandPagedList(
+            functools.partial(self._fetch_page, locale, show_id),
+            self._PAGE_SIZE)
 
-        return self.playlist_result(entries, show_id, title, description)
+        return self.playlist_result(
+            entries, show_id, show.get('title'), show.get('dek'))
 
 
-class ViceArticleIE(InfoExtractor):
+class ViceArticleIE(ViceBaseIE):
     IE_NAME = 'vice:article'
-    _VALID_URL = r'https://www\.vice\.com/[^/]+/article/(?P<id>[^?#]+)'
+    _VALID_URL = r'https://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'
 
     _TESTS = [{
         'url': 'https://www.vice.com/en_us/article/on-set-with-the-woman-making-mormon-porn-in-utah',
         'info_dict': {
-            'id': '41eae2a47b174a1398357cec55f1f6fc',
+            'id': '58dc0a3dee202d2a0ccfcbd8',
             'ext': 'mp4',
-            'title': 'Mormon War on Porn ',
-            'description': 'md5:6394a8398506581d0346b9ab89093fef',
+            'title': 'Mormon War on Porn',
+            'description': 'md5:1c5d91fe25fa8aa304f9def118b92dbf',
             'uploader': 'vice',
             'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1491883129,
@@ -258,10 +260,10 @@ class ViceArticleIE(InfoExtractor):
             # AES-encrypted m3u8
             'skip_download': True,
         },
-        'add_ie': ['UplynkPreplay'],
+        'add_ie': [ViceIE.ie_key()],
     }, {
         'url': 'https://www.vice.com/en_us/article/how-to-hack-a-car',
-        'md5': '7fe8ebc4fa3323efafc127b82bd821d9',
+        'md5': '13010ee0bc694ea87ec40724397c2349',
         'info_dict': {
             'id': '3jstaBeXgAs',
             'ext': 'mp4',
@@ -271,15 +273,15 @@ class ViceArticleIE(InfoExtractor):
             'uploader_id': 'MotherboardTV',
             'upload_date': '20140529',
         },
-        'add_ie': ['Youtube'],
+        'add_ie': [YoutubeIE.ie_key()],
     }, {
         'url': 'https://www.vice.com/en_us/article/znm9dx/karley-sciortino-slutever-reloaded',
         'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
         'info_dict': {
-            'id': 'e2ed435eb67e43efb66e6ef9a6930a88',
+            'id': '57f41d3556a0a80f54726060',
             'ext': 'mp4',
             'title': "Making The World's First Male Sex Doll",
-            'description': 'md5:916078ef0e032d76343116208b6cc2c4',
+            'description': 'md5:19b00b215b99961cf869c40fbe9df755',
             'uploader': 'vice',
             'uploader_id': '57a204088cb727dec794c67b',
             'timestamp': 1476919911,
@@ -288,6 +290,7 @@ class ViceArticleIE(InfoExtractor):
         },
         'params': {
             'skip_download': True,
+            'format': 'bestvideo',
         },
         'add_ie': [ViceIE.ie_key()],
     }, {
@@ -299,14 +302,11 @@ class ViceArticleIE(InfoExtractor):
     }]
 
     def _real_extract(self, url):
-        display_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, display_id)
+        locale, display_id = re.match(self._VALID_URL, url).groups()
 
-        prefetch_data = self._parse_json(self._search_regex(
-            r'__APP_STATE\s*=\s*({.+?})(?:\s*\|\|\s*{}\s*)?;\s*\n',
-            webpage, 'app state'), display_id)['pageData']
-        body = prefetch_data['body']
+        article = self._call_api('articles', 'slug', display_id, locale, '''body
+    embed_code''')[0]
+        body = article['body']
 
         def _url_res(video_url, ie_key):
             return {
@@ -316,7 +316,7 @@ def _url_res(video_url, ie_key):
                 'ie_key': ie_key,
             }
 
-        vice_url = ViceIE._extract_url(webpage)
+        vice_url = ViceIE._extract_url(body)
         if vice_url:
             return _url_res(vice_url, ViceIE.ie_key())
 
@@ -332,6 +332,6 @@ def _url_res(video_url, ie_key):
 
         video_url = self._html_search_regex(
             r'data-video-url="([^"]+)"',
-            prefetch_data['embed_code'], 'video URL')
+            article['embed_code'], 'video URL')
 
         return _url_res(video_url, ViceIE.ie_key())
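The Vice article refactor above replaces `__APP_STATE` webpage scraping with a `_call_api` lookup, but the final fallback is still a plain regex over the article's `embed_code`. A minimal standalone sketch of that lookup (the embed snippet below is invented for illustration):

```python
import re

def extract_video_url(embed_code):
    # Mirrors the data-video-url search in ViceArticleIE._real_extract,
    # but returns None instead of raising when the attribute is absent.
    m = re.search(r'data-video-url="([^"]+)"', embed_code)
    return m.group(1) if m else None

# Hypothetical embed snippet, for illustration only:
snippet = '<iframe data-video-url="https://video.vice.com/en_us/embed/abc123"></iframe>'
print(extract_video_url(snippet))  # https://video.vice.com/en_us/embed/abc123
```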
index 851ad936cfc012b02c3125c9a0a3e898f9c6f005..d6b92b1c833072bab2bbacb8503693b6da156427 100644 (file)
@@ -1,28 +1,62 @@
 from __future__ import unicode_literals
 
-import base64
+import json
 import re
 
 from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
+from ..compat import compat_HTTPError
 from ..utils import (
     ExtractorError,
-    clean_html,
-    determine_ext,
     int_or_none,
-    js_to_json,
     parse_age_limit,
-    parse_duration,
-    try_get,
 )
 
 
 class ViewLiftBaseIE(InfoExtractor):
-    _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
+    _API_BASE = 'https://prod-api.viewlift.com/'
+    _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm|failarmy|ftfnext|lnppass\.legapallacanestro|moviespree|app\.myoutdoortv|neoufitness|pflmma|theidentitytb)\.com|(?:hoichoi|app\.horseandcountry|kronon|marquee|supercrosslive)\.tv'
+    _SITE_MAP = {
+        'ftfnext': 'lax',
+        'funnyforfree': 'snagfilms',
+        'hoichoi': 'hoichoitv',
+        'kiddovid': 'snagfilms',
+        'laxsportsnetwork': 'lax',
+        'legapallacanestro': 'lnp',
+        'marquee': 'marquee-tv',
+        'monumentalsportsnetwork': 'monumental-network',
+        'moviespree': 'bingeflix',
+        'pflmma': 'pfl',
+        'snagxtreme': 'snagfilms',
+        'theidentitytb': 'tampabay',
+        'vayafilm': 'snagfilms',
+    }
+    _TOKENS = {}
+
+    def _call_api(self, site, path, video_id, query):
+        token = self._TOKENS.get(site)
+        if not token:
+            token_query = {'site': site}
+            email, password = self._get_login_info(netrc_machine=site)
+            if email:
+                resp = self._download_json(
+                    self._API_BASE + 'identity/signin', video_id,
+                    'Logging in', query=token_query, data=json.dumps({
+                        'email': email,
+                        'password': password,
+                    }).encode())
+            else:
+                resp = self._download_json(
+                    self._API_BASE + 'identity/anonymous-token', video_id,
+                    'Downloading authorization token', query=token_query)
+            self._TOKENS[site] = token = resp['authorizationToken']
+        return self._download_json(
+            self._API_BASE + path, video_id,
+            headers={'Authorization': token}, query=query)
 
 
 class ViewLiftEmbedIE(ViewLiftBaseIE):
-    _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
+    IE_NAME = 'viewlift:embed'
+    _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?P<domain>%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
     _TESTS = [{
         'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
         'md5': '2924e9215c6eff7a55ed35b72276bd93',
@@ -30,6 +64,9 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
             'id': '74849a00-85a9-11e1-9660-123139220831',
             'ext': 'mp4',
             'title': '#whilewewatch',
+            'description': 'md5:b542bef32a6f657dadd0df06e26fb0c8',
+            'timestamp': 1334350096,
+            'upload_date': '20120413',
         }
     }, {
         # invalid labels, 360p is better than 480p
@@ -39,7 +76,8 @@ class ViewLiftEmbedIE(ViewLiftBaseIE):
             'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
             'ext': 'mp4',
             'title': 'Life in Limbo',
-        }
+        },
+        'skip': 'The video does not exist',
     }, {
         'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
         'only_matching': True,
@@ -54,67 +92,68 @@ def _extract_url(webpage):
             return mobj.group('url')
 
     def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        webpage = self._download_webpage(url, video_id)
-
-        if '>This film is not playable in your area.<' in webpage:
-            raise ExtractorError(
-                'Film %s is not playable in your area.' % video_id, expected=True)
+        domain, film_id = re.match(self._VALID_URL, url).groups()
+        site = domain.split('.')[-2]
+        if site in self._SITE_MAP:
+            site = self._SITE_MAP[site]
+        try:
+            content_data = self._call_api(
+                site, 'entitlement/video/status', film_id, {
+                    'id': film_id
+                })['video']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+                error_message = self._parse_json(e.cause.read().decode(), film_id).get('errorMessage')
+                if error_message == 'User does not have a valid subscription or has not purchased this content.':
+                    self.raise_login_required()
+                raise ExtractorError(error_message, expected=True)
+            raise
+        gist = content_data['gist']
+        title = gist['title']
+        video_assets = content_data['streamingInfo']['videoAssets']
 
         formats = []
-        has_bitrate = False
-        sources = self._parse_json(self._search_regex(
-            r'(?s)sources:\s*(\[.+?\]),', webpage,
-            'sources', default='[]'), video_id, js_to_json)
-        for source in sources:
-            file_ = source.get('file')
-            if not file_:
+        mpeg_video_assets = video_assets.get('mpeg') or []
+        for video_asset in mpeg_video_assets:
+            video_asset_url = video_asset.get('url')
+            if not video_asset_url:
                 continue
-            type_ = source.get('type')
-            ext = determine_ext(file_)
-            format_id = source.get('label') or ext
-            if all(v in ('m3u8', 'hls') for v in (type_, ext)):
-                formats.extend(self._extract_m3u8_formats(
-                    file_, video_id, 'mp4', 'm3u8_native',
-                    m3u8_id='hls', fatal=False))
-            else:
-                bitrate = int_or_none(self._search_regex(
-                    [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
-                    file_, 'bitrate', default=None))
-                if not has_bitrate and bitrate:
-                    has_bitrate = True
-                height = int_or_none(self._search_regex(
-                    r'^(\d+)[pP]$', format_id, 'height', default=None))
-                formats.append({
-                    'url': file_,
-                    'format_id': 'http-%s%s' % (format_id, ('-%dk' % bitrate if bitrate else '')),
-                    'tbr': bitrate,
-                    'height': height,
-                })
-        if not formats:
-            hls_url = self._parse_json(self._search_regex(
-                r'filmInfo\.src\s*=\s*({.+?});',
-                webpage, 'src'), video_id, js_to_json)['src']
-            formats = self._extract_m3u8_formats(
-                hls_url, video_id, 'mp4', 'm3u8_native',
-                m3u8_id='hls', fatal=False)
-        field_preference = None if has_bitrate else ('height', 'tbr', 'format_id')
-        self._sort_formats(formats, field_preference)
-
-        title = self._search_regex(
-            [r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
-            webpage, 'title')
-
-        return {
-            'id': video_id,
+            bitrate = int_or_none(video_asset.get('bitrate'))
+            height = int_or_none(self._search_regex(
+                r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
+                'height', default=None))
+            formats.append({
+                'url': video_asset_url,
+                'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
+                'tbr': bitrate,
+                'height': height,
+                'vcodec': video_asset.get('codec'),
+            })
+
+        hls_url = video_assets.get('hls')
+        if hls_url:
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats, ('height', 'tbr', 'format_id'))
+
+        info = {
+            'id': film_id,
             'title': title,
+            'description': gist.get('description'),
+            'thumbnail': gist.get('videoImageUrl'),
+            'duration': int_or_none(gist.get('runtime')),
+            'age_limit': parse_age_limit(content_data.get('parentalRating')),
+            'timestamp': int_or_none(gist.get('publishDate'), 1000),
             'formats': formats,
         }
+        for k in ('categories', 'tags'):
+            info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
+        return info
 
 
 class ViewLiftIE(ViewLiftBaseIE):
-    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?:/(?:films/title|show|(?:news/)?videos?))?/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
+    IE_NAME = 'viewlift'
+    _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?P<path>(?:/(?:films/title|show|(?:news/)?videos?|watch))?/(?P<id>[^?#]+))' % ViewLiftBaseIE._DOMAINS_REGEX
     _TESTS = [{
         'url': 'http://www.snagfilms.com/films/title/lost_for_life',
         'md5': '19844f897b35af219773fd63bdec2942',
@@ -151,10 +190,13 @@ class ViewLiftIE(ViewLiftBaseIE):
             'id': '00000148-7b53-de26-a9fb-fbf306f70020',
             'display_id': 'augie_alone/s_2_ep_12_love',
             'ext': 'mp4',
-            'title': 'Augie, Alone:S. 2 Ep. 12 - Love',
-            'description': 'md5:db2a5c72d994f16a780c1eb353a8f403',
+            'title': 'S. 2 Ep. 12 - Love',
+            'description': 'Augie finds love.',
             'thumbnail': r're:^https?://.*\.jpg',
             'duration': 107,
+            'upload_date': '20141012',
+            'timestamp': 1413129540,
+            'age_limit': 17,
         },
         'params': {
             'skip_download': True,
@@ -177,6 +219,9 @@ class ViewLiftIE(ViewLiftBaseIE):
         # Was once Kaltura embed
         'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
         'only_matching': True,
+    }, {
+        'url': 'https://www.marquee.tv/watch/sadlerswells-sacredmonsters',
+        'only_matching': True,
     }]
 
     @classmethod
@@ -184,119 +229,22 @@ def suitable(cls, url):
         return False if ViewLiftEmbedIE.suitable(url) else super(ViewLiftIE, cls).suitable(url)
 
     def _real_extract(self, url):
-        domain, display_id = re.match(self._VALID_URL, url).groups()
-
-        webpage = self._download_webpage(url, display_id)
-
-        if ">Sorry, the Film you're looking for is not available.<" in webpage:
-            raise ExtractorError(
-                'Film %s is not available.' % display_id, expected=True)
-
-        initial_store_state = self._search_regex(
-            r"window\.initialStoreState\s*=.*?JSON\.parse\(unescape\(atob\('([^']+)'\)\)\)",
-            webpage, 'Initial Store State', default=None)
-        if initial_store_state:
-            modules = self._parse_json(compat_urllib_parse_unquote(base64.b64decode(
-                initial_store_state).decode()), display_id)['page']['data']['modules']
-            content_data = next(m['contentData'][0] for m in modules if m.get('moduleType') == 'VideoDetailModule')
-            gist = content_data['gist']
-            film_id = gist['id']
-            title = gist['title']
-            video_assets = try_get(
-                content_data, lambda x: x['streamingInfo']['videoAssets'], dict)
-            if not video_assets:
-                token = self._download_json(
-                    'https://prod-api.viewlift.com/identity/anonymous-token',
-                    film_id, 'Downloading authorization token',
-                    query={'site': 'snagfilms'})['authorizationToken']
-                video_assets = self._download_json(
-                    'https://prod-api.viewlift.com/entitlement/video/status',
-                    film_id, headers={
-                        'Authorization': token,
-                        'Referer': url,
-                    }, query={
-                        'id': film_id
-                    })['video']['streamingInfo']['videoAssets']
-
-            formats = []
-            mpeg_video_assets = video_assets.get('mpeg') or []
-            for video_asset in mpeg_video_assets:
-                video_asset_url = video_asset.get('url')
-                if not video_asset:
-                    continue
-                bitrate = int_or_none(video_asset.get('bitrate'))
-                height = int_or_none(self._search_regex(
-                    r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
-                    'height', default=None))
-                formats.append({
-                    'url': video_asset_url,
-                    'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
-                    'tbr': bitrate,
-                    'height': height,
-                    'vcodec': video_asset.get('codec'),
-                })
-
-            hls_url = video_assets.get('hls')
-            if hls_url:
-                formats.extend(self._extract_m3u8_formats(
-                    hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
-            self._sort_formats(formats, ('height', 'tbr', 'format_id'))
-
-            info = {
-                'id': film_id,
-                'display_id': display_id,
-                'title': title,
-                'description': gist.get('description'),
-                'thumbnail': gist.get('videoImageUrl'),
-                'duration': int_or_none(gist.get('runtime')),
-                'age_limit': parse_age_limit(content_data.get('parentalRating')),
-                'timestamp': int_or_none(gist.get('publishDate'), 1000),
-                'formats': formats,
-            }
-            for k in ('categories', 'tags'):
-                info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
-            return info
-        else:
-            film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
-
-            snag = self._parse_json(
-                self._search_regex(
-                    r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag', default='[]'),
-                display_id)
-
-            for item in snag:
-                if item.get('data', {}).get('film', {}).get('id') == film_id:
-                    data = item['data']['film']
-                    title = data['title']
-                    description = clean_html(data.get('synopsis'))
-                    thumbnail = data.get('image')
-                    duration = int_or_none(data.get('duration') or data.get('runtime'))
-                    categories = [
-                        category['title'] for category in data.get('categories', [])
-                        if category.get('title')]
-                    break
-            else:
-                title = self._html_search_regex(
-                    (r'itemprop="title">([^<]+)<',
-                     r'(?s)itemprop="title">(.+?)<div'), webpage, 'title')
-                description = self._html_search_regex(
-                    r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
-                    webpage, 'description', default=None) or self._og_search_description(webpage)
-                thumbnail = self._og_search_thumbnail(webpage)
-                duration = parse_duration(self._search_regex(
-                    r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
-                    webpage, 'duration', fatal=False))
-                categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
-
-            return {
-                '_type': 'url_transparent',
-                'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
-                'id': film_id,
-                'display_id': display_id,
-                'title': title,
-                'description': description,
-                'thumbnail': thumbnail,
-                'duration': duration,
-                'categories': categories,
-                'ie_key': 'ViewLiftEmbed',
-            }
+        domain, path, display_id = re.match(self._VALID_URL, url).groups()
+        site = domain.split('.')[-2]
+        if site in self._SITE_MAP:
+            site = self._SITE_MAP[site]
+        modules = self._call_api(
+            site, 'content/pages', display_id, {
+                'includeContent': 'true',
+                'moduleOffset': 1,
+                'path': path,
+                'site': site,
+            })['modules']
+        film_id = next(m['contentData'][0]['gist']['id'] for m in modules if m.get('moduleType') == 'VideoDetailModule')
+        return {
+            '_type': 'url_transparent',
+            'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
+            'id': film_id,
+            'display_id': display_id,
+            'ie_key': 'ViewLiftEmbed',
+        }
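The new `ViewLiftBaseIE._call_api` above keeps one authorization token per site in a class-level dict, fetching it lazily on the first request and reusing it afterwards. A self-contained sketch of that caching pattern, with `fake_fetch` standing in for the `identity/anonymous-token` download:

```python
class TokenCache:
    # Minimal sketch of the per-site token cache in ViewLiftBaseIE._call_api:
    # fetch once per site, reuse on every later call.
    def __init__(self, fetch_token):
        self._fetch = fetch_token
        self._tokens = {}

    def get(self, site):
        token = self._tokens.get(site)
        if not token:
            token = self._fetch(site)
            self._tokens[site] = token
        return token

calls = []
def fake_fetch(site):
    # Stand-in for the real HTTP request; records how often it runs.
    calls.append(site)
    return 'token-' + site

cache = TokenCache(fake_fetch)
print(cache.get('snagfilms'))  # token-snagfilms (fetched)
print(cache.get('snagfilms'))  # token-snagfilms (cached, no second fetch)
print(calls)                   # ['snagfilms']
```

The real `_call_api` adds a login branch (`identity/signin` with netrc credentials) before falling back to the anonymous token; the cache mechanics are the same.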
index baa46d5f3513cbde337f144c7143a9c501455ff4..421795b94d9f6cf10fe3d5725503baa86383062c 100644 (file)
@@ -33,6 +33,7 @@
     unified_timestamp,
     unsmuggle_url,
     urlencode_postdata,
+    urljoin,
     unescapeHTML,
 )
 
@@ -139,28 +140,28 @@ def _parse_config(self, config, video_id):
             })
 
         # TODO: fix handling of 308 status code returned for live archive manifest requests
+        sep_pattern = r'/sep/video/'
         for files_type in ('hls', 'dash'):
             for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
                 manifest_url = cdn_data.get('url')
                 if not manifest_url:
                     continue
                 format_id = '%s-%s' % (files_type, cdn_name)
-                if files_type == 'hls':
-                    formats.extend(self._extract_m3u8_formats(
-                        manifest_url, video_id, 'mp4',
-                        'm3u8' if is_live else 'm3u8_native', m3u8_id=format_id,
-                        note='Downloading %s m3u8 information' % cdn_name,
-                        fatal=False))
-                elif files_type == 'dash':
-                    mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
-                    mpd_manifest_urls = []
-                    if re.search(mpd_pattern, manifest_url):
-                        for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
-                            mpd_manifest_urls.append((format_id + suffix, re.sub(
-                                mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url)))
-                    else:
-                        mpd_manifest_urls = [(format_id, manifest_url)]
-                    for f_id, m_url in mpd_manifest_urls:
+                sep_manifest_urls = []
+                if re.search(sep_pattern, manifest_url):
+                    for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
+                        sep_manifest_urls.append((format_id + suffix, re.sub(
+                            sep_pattern, '/%s/' % repl, manifest_url)))
+                else:
+                    sep_manifest_urls = [(format_id, manifest_url)]
+                for f_id, m_url in sep_manifest_urls:
+                    if files_type == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            m_url, video_id, 'mp4',
+                            'm3u8' if is_live else 'm3u8_native', m3u8_id=f_id,
+                            note='Downloading %s m3u8 information' % cdn_name,
+                            fatal=False))
+                    elif files_type == 'dash':
                         if 'json=1' in m_url:
                             real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url')
                             if real_m_url:
@@ -169,11 +170,6 @@ def _parse_config(self, config, video_id):
                             m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
                             'Downloading %s MPD information' % cdn_name,
                             fatal=False)
-                        for f in mpd_formats:
-                            if f.get('vcodec') == 'none':
-                                f['preference'] = -50
-                            elif f.get('acodec') == 'none':
-                                f['preference'] = -40
                         formats.extend(mpd_formats)
 
         live_archive = live_event.get('archive') or {}
@@ -185,13 +181,19 @@ def _parse_config(self, config, video_id):
                 'preference': 1,
             })
 
+        for f in formats:
+            if f.get('vcodec') == 'none':
+                f['preference'] = -50
+            elif f.get('acodec') == 'none':
+                f['preference'] = -40
+
         subtitles = {}
         text_tracks = config['request'].get('text_tracks')
         if text_tracks:
             for tt in text_tracks:
                 subtitles[tt['lang']] = [{
                     'ext': 'vtt',
-                    'url': 'https://vimeo.com' + tt['url'],
+                    'url': urljoin('https://vimeo.com', tt['url']),
                 }]
 
         thumbnails = []
@@ -591,7 +593,7 @@ def _real_extract(self, url):
             # Retrieve video webpage to extract further information
             webpage, urlh = self._download_webpage_handle(
                 url, video_id, headers=headers)
-            redirect_url = compat_str(urlh.geturl())
+            redirect_url = urlh.geturl()
         except ExtractorError as ee:
             if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
                 errmsg = ee.cause.read()
@@ -841,33 +843,6 @@ def _extract_list_title(self, webpage):
         return self._TITLE or self._html_search_regex(
             self._TITLE_RE, webpage, 'list title', fatal=False)
 
-    def _login_list_password(self, page_url, list_id, webpage):
-        login_form = self._search_regex(
-            r'(?s)<form[^>]+?id="pw_form"(.*?)</form>',
-            webpage, 'login form', default=None)
-        if not login_form:
-            return webpage
-
-        password = self._downloader.params.get('videopassword')
-        if password is None:
-            raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
-        fields = self._hidden_inputs(login_form)
-        token, vuid = self._extract_xsrft_and_vuid(webpage)
-        fields['token'] = token
-        fields['password'] = password
-        post = urlencode_postdata(fields)
-        password_path = self._search_regex(
-            r'action="([^"]+)"', login_form, 'password URL')
-        password_url = compat_urlparse.urljoin(page_url, password_path)
-        password_request = sanitized_Request(password_url, post)
-        password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
-        self._set_vimeo_cookie('vuid', vuid)
-        self._set_vimeo_cookie('xsrft', token)
-
-        return self._download_webpage(
-            password_request, list_id,
-            'Verifying the password', 'Wrong password')
-
     def _title_and_entries(self, list_id, base_url):
         for pagenum in itertools.count(1):
             page_url = self._page_url(base_url, pagenum)
@@ -876,7 +851,6 @@ def _title_and_entries(self, list_id, base_url):
                 'Downloading page %s' % pagenum)
 
             if pagenum == 1:
-                webpage = self._login_list_password(page_url, list_id, webpage)
                 yield self._extract_list_title(webpage)
 
             # Try extracting href first since not all videos are available via
@@ -923,7 +897,7 @@ class VimeoUserIE(VimeoChannelIE):
     _BASE_URL_TEMPL = 'https://vimeo.com/%s'
 
 
-class VimeoAlbumIE(VimeoChannelIE):
+class VimeoAlbumIE(VimeoBaseInfoExtractor):
     IE_NAME = 'vimeo:album'
     _VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
     _TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
@@ -973,13 +947,39 @@ def _fetch_page(self, album_id, authorizaion, hashed_pass, page):
     def _real_extract(self, url):
         album_id = self._match_id(url)
         webpage = self._download_webpage(url, album_id)
-        webpage = self._login_list_password(url, album_id, webpage)
-        api_config = self._extract_vimeo_config(webpage, album_id)['api']
+        viewer = self._parse_json(self._search_regex(
+            r'bootstrap_data\s*=\s*({.+?})</script>',
+            webpage, 'bootstrap data'), album_id)['viewer']
+        jwt = viewer['jwt']
+        album = self._download_json(
+            'https://api.vimeo.com/albums/' + album_id,
+            album_id, headers={'Authorization': 'jwt ' + jwt},
+            query={'fields': 'description,name,privacy'})
+        hashed_pass = None
+        if try_get(album, lambda x: x['privacy']['view']) == 'password':
+            password = self._downloader.params.get('videopassword')
+            if not password:
+                raise ExtractorError(
+                    'This album is protected by a password, use the --video-password option',
+                    expected=True)
+            self._set_vimeo_cookie('vuid', viewer['vuid'])
+            try:
+                hashed_pass = self._download_json(
+                    'https://vimeo.com/showcase/%s/auth' % album_id,
+                    album_id, 'Verifying the password', data=urlencode_postdata({
+                        'password': password,
+                        'token': viewer['xsrft'],
+                    }), headers={
+                        'X-Requested-With': 'XMLHttpRequest',
+                    })['hashed_pass']
+            except ExtractorError as e:
+                if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+                    raise ExtractorError('Wrong password', expected=True)
+                raise
         entries = OnDemandPagedList(functools.partial(
-            self._fetch_page, album_id, api_config['jwt'],
-            api_config.get('hashed_pass')), self._PAGE_SIZE)
-        return self.playlist_result(entries, album_id, self._html_search_regex(
-            r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
+            self._fetch_page, album_id, jwt, hashed_pass), self._PAGE_SIZE)
+        return self.playlist_result(
+            entries, album_id, album.get('name'), album.get('description'))
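The album hunk above replaces the old `_extract_vimeo_config` path with a viewer JWT pulled from the page's inline `bootstrap_data` JSON, which is then sent as `Authorization: jwt <token>` to `api.vimeo.com`. A minimal standalone sketch of that extraction step (the page fragment is invented for illustration):

```python
import json
import re


def extract_viewer_jwt(webpage):
    # Same regex as the diff: capture the bootstrap_data object that
    # Vimeo inlines directly before a closing </script> tag.
    m = re.search(r'bootstrap_data\s*=\s*({.+?})</script>', webpage)
    if not m:
        return None
    return json.loads(m.group(1))['viewer']['jwt']


# Invented page fragment for illustration only:
page = '<script>window.bootstrap_data = {"viewer": {"jwt": "abc123", "vuid": "v42", "xsrft": "t0"}}</script>'
```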
 
 
 class VimeoGroupsIE(VimeoChannelIE):
index c3429f723ddec36cb89dfc9f329f766f45b953e1..f79531e6f3a2e922b0369706cc0d76a22feb2499 100644 (file)
@@ -6,22 +6,18 @@
 import itertools
 
 from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_str,
-)
+from .naver import NaverBaseIE
+from ..compat import compat_str
 from ..utils import (
-    dict_get,
     ExtractorError,
-    float_or_none,
-    int_or_none,
+    merge_dicts,
     remove_start,
     try_get,
     urlencode_postdata,
 )
 
 
-class VLiveIE(InfoExtractor):
+class VLiveIE(NaverBaseIE):
     IE_NAME = 'vlive'
     _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
     _NETRC_MACHINE = 'vlive'
@@ -34,6 +30,7 @@ class VLiveIE(InfoExtractor):
             'title': "[V LIVE] Girl's Day's Broadcast",
             'creator': "Girl's Day",
             'view_count': int,
+            'uploader_id': 'muploader_a',
         },
     }, {
         'url': 'http://www.vlive.tv/video/16937',
@@ -44,6 +41,7 @@ class VLiveIE(InfoExtractor):
             'creator': 'EXO',
             'view_count': int,
             'subtitles': 'mincount:12',
+            'uploader_id': 'muploader_j',
         },
         'params': {
             'skip_download': True,
@@ -187,45 +185,9 @@ def _replay(self, video_id, webpage, long_video_id, key):
                     'This video is only available for CH+ subscribers')
             long_video_id, key = video_info['vid'], video_info['inkey']
 
-        playinfo = self._download_json(
-            'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
-            % compat_urllib_parse_urlencode({
-                'videoId': long_video_id,
-                'key': key,
-                'ptc': 'http',
-                'doct': 'json',  # document type (xml or json)
-                'cpt': 'vtt',  # captions type (vtt or ttml)
-            }), video_id)
-
-        formats = [{
-            'url': vid['source'],
-            'format_id': vid.get('encodingOption', {}).get('name'),
-            'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
-            'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
-            'width': int_or_none(vid.get('encodingOption', {}).get('width')),
-            'height': int_or_none(vid.get('encodingOption', {}).get('height')),
-            'filesize': int_or_none(vid.get('size')),
-        } for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
-        self._sort_formats(formats)
-
-        view_count = int_or_none(playinfo.get('meta', {}).get('count'))
-
-        subtitles = {}
-        for caption in playinfo.get('captions', {}).get('list', []):
-            lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
-            if lang and caption.get('source'):
-                subtitles[lang] = [{
-                    'ext': 'vtt',
-                    'url': caption['source']}]
-
-        info = self._get_common_fields(webpage)
-        info.update({
-            'id': video_id,
-            'formats': formats,
-            'view_count': view_count,
-            'subtitles': subtitles,
-        })
-        return info
+        return merge_dicts(
+            self._get_common_fields(webpage),
+            self._extract_video_info(video_id, long_video_id, key))
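The replay path now delegates format extraction to `NaverBaseIE._extract_video_info` and combines the two metadata dicts with `merge_dicts`. A simplified sketch of that utility's semantics (earlier dicts win, `None` values are skipped, and an empty string can be filled in by a later truthy value; the real youtube-dl helper additionally type-checks that both values are strings):

```python
def merge_dicts(*dicts):
    # Simplified: earlier dicts take priority; None values are skipped,
    # and an empty-string value may be replaced by a later truthy one.
    merged = {}
    for d in dicts:
        for k, v in d.items():
            if v is None:
                continue
            if k not in merged or (merged[k] == '' and v):
                merged[k] = v
    return merged
```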
 
     def _download_init_page(self, video_id):
         return self._download_webpage(
index 239644340384b60c8e1a80d40b50cabbd0fd2c9e..74d2257e7c059cdb410db69d77c25e1c7e8c9902 100644 (file)
@@ -6,8 +6,8 @@
 
 
 class VODPlatformIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vod-platform\.net/[eE]mbed/(?P<id>[^/?#]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/(?P<id>[^/?#]+)'
+    _TESTS = [{
         # from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
         'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
         'md5': '1db2b7249ce383d6be96499006e951fc',
@@ -16,7 +16,10 @@ class VODPlatformIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
         }
-    }
+    }, {
+        'url': 'http://embed.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
index 59e1359c48628af9b4c53bedc337fa6b9b3d1396..a52e40afa2892a10538251ba40e4d2a44a10a67d 100644 (file)
@@ -1,17 +1,12 @@
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
-from ..compat import (
-    compat_str,
-    compat_urlparse,
-)
+from ..compat import compat_str
 from ..utils import (
     ExtractorError,
     determine_ext,
     int_or_none,
-    sanitized_Request,
+    urljoin,
 )
 
 
@@ -26,8 +21,7 @@ class VoiceRepublicIE(InfoExtractor):
             'ext': 'm4a',
             'title': 'Watching the Watchers: Building a Sousveillance State',
             'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
-            'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
-            'duration': 1800,
+            'duration': 1556,
             'view_count': int,
         }
     }, {
@@ -38,63 +32,31 @@ class VoiceRepublicIE(InfoExtractor):
     def _real_extract(self, url):
         display_id = self._match_id(url)
 
-        req = sanitized_Request(
-            compat_urlparse.urljoin(url, '/talks/%s' % display_id))
-        # Older versions of Firefox get redirected to an "upgrade browser" page
-        req.add_header('User-Agent', 'youtube-dl')
-        webpage = self._download_webpage(req, display_id)
+        webpage = self._download_webpage(url, display_id)
 
         if '>Queued for processing, please stand by...<' in webpage:
             raise ExtractorError(
                 'Audio is still queued for processing', expected=True)
 
-        config = self._search_regex(
-            r'(?s)return ({.+?});\s*\n', webpage,
-            'data', default=None)
-        data = self._parse_json(config, display_id, fatal=False) if config else None
-        if data:
-            title = data['title']
-            description = data.get('teaser')
-            talk_id = compat_str(data.get('talk_id') or display_id)
-            talk = data['talk']
-            duration = int_or_none(talk.get('duration'))
-            formats = [{
-                'url': compat_urlparse.urljoin(url, talk_url),
-                'format_id': format_id,
-                'ext': determine_ext(talk_url) or format_id,
-                'vcodec': 'none',
-            } for format_id, talk_url in talk['links'].items()]
-        else:
-            title = self._og_search_title(webpage)
-            description = self._html_search_regex(
-                r"(?s)<div class='talk-teaser'[^>]*>(.+?)</div>",
-                webpage, 'description', fatal=False)
-            talk_id = self._search_regex(
-                [r"id='jc-(\d+)'", r"data-shareable-id='(\d+)'"],
-                webpage, 'talk id', default=None) or display_id
-            duration = None
-            player = self._search_regex(
-                r"class='vr-player jp-jplayer'([^>]+)>", webpage, 'player')
-            formats = [{
-                'url': compat_urlparse.urljoin(url, talk_url),
-                'format_id': format_id,
-                'ext': determine_ext(talk_url) or format_id,
-                'vcodec': 'none',
-            } for format_id, talk_url in re.findall(r"data-([^=]+)='([^']+)'", player)]
+        talk = self._parse_json(self._search_regex(
+            r'initialSnapshot\s*=\s*({.+?});',
+            webpage, 'talk'), display_id)['talk']
+        title = talk['title']
+        formats = [{
+            'url': urljoin(url, talk_url),
+            'format_id': format_id,
+            'ext': determine_ext(talk_url) or format_id,
+            'vcodec': 'none',
+        } for format_id, talk_url in talk['media_links'].items()]
         self._sort_formats(formats)
 
-        thumbnail = self._og_search_thumbnail(webpage)
-        view_count = int_or_none(self._search_regex(
-            r"class='play-count[^']*'>\s*(\d+) plays",
-            webpage, 'play count', fatal=False))
-
         return {
-            'id': talk_id,
+            'id': compat_str(talk.get('id') or display_id),
             'display_id': display_id,
             'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
+            'description': talk.get('teaser'),
+            'thumbnail': talk.get('image_url'),
+            'duration': int_or_none(talk.get('archived_duration')),
+            'view_count': int_or_none(talk.get('play_count')),
             'formats': formats,
         }
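The rewritten VoiceRepublic extractor above reads everything from a single inline `initialSnapshot = {...};` assignment instead of scraping the jPlayer markup and falling back to OpenGraph tags. A standalone sketch of that parse; the JSON shape below is an assumption based only on the fields the diff reads (`talk`, `title`, `media_links`, etc.):

```python
import json
import re


def parse_talk(webpage):
    # Same regex as the diff; returns the 'talk' object or None.
    m = re.search(r'initialSnapshot\s*=\s*({.+?});', webpage)
    if not m:
        return None
    return json.loads(m.group(1))['talk']


# Invented page fragment; field names follow the diff (media_links, title, ...).
page = ('<script>var initialSnapshot = {"talk": {"id": 1858, '
        '"title": "Watching the Watchers", '
        '"media_links": {"mp3": "/talks/1858.mp3"}}};</script>')
```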
index 0fbc888ec03ce6d09f853a82ea96328c927e4ae5..77febd2eb1b1cada3942c212739725135a36682b 100644 (file)
@@ -13,8 +13,7 @@
 
 class WistiaIE(InfoExtractor):
     _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
-    _API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
-    _IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
+    _EMBED_BASE_URL = 'http://fast.wistia.com/embed/'
 
     _TESTS = [{
         'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
@@ -46,31 +45,32 @@ class WistiaIE(InfoExtractor):
     # https://wistia.com/support/embed-and-share/video-on-your-website
     @staticmethod
     def _extract_url(webpage):
-        match = re.search(
-            r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage)
-        if match:
-            return unescapeHTML(match.group('url'))
+        urls = WistiaIE._extract_urls(webpage)
+        return urls[0] if urls else None
 
-        match = re.search(
-            r'''(?sx)
-                <script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
-                <div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
-            ''', webpage)
-        if match:
-            return 'wistia:%s' % match.group('id')
-
-        match = re.search(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage)
-        if match:
-            return 'wistia:%s' % match.group('id')
+    @staticmethod
+    def _extract_urls(webpage):
+        urls = []
+        for match in re.finditer(
+                r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage):
+            urls.append(unescapeHTML(match.group('url')))
+        for match in re.finditer(
+                r'''(?sx)
+                    <div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
+                ''', webpage):
+            urls.append('wistia:%s' % match.group('id'))
+        for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):
+            urls.append('wistia:%s' % match.group('id'))
+        return urls
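`_extract_url` is now a thin wrapper over the new `_extract_urls`, so generic pages embedding several Wistia videos yield every embed instead of only the first. A sketch of the iframe/script branch in isolation (the page and the second video id are invented):

```python
import re

IFRAME_RE = (r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\']'
             r'(?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/'
             r'(?:iframe|medias)/[a-z0-9]{10})')


def find_wistia_embeds(webpage):
    # Collect every embed URL in page order (re.finditer vs. re.search).
    return [m.group('url') for m in re.finditer(IFRAME_RE, webpage)]


html = ('<iframe src="//fast.wistia.net/embed/iframe/sh7fpupwlt"></iframe>'
        '<iframe src="//fast.wistia.net/embed/iframe/abcde01234"></iframe>')
```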
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
         data_json = self._download_json(
-            self._API_URL % video_id, video_id,
+            self._EMBED_BASE_URL + 'medias/%s.json' % video_id, video_id,
             # Some videos require this.
             headers={
-                'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
+                'Referer': url if url.startswith('http') else self._EMBED_BASE_URL + 'iframe/' + video_id,
             })
 
         if data_json.get('error'):
@@ -95,27 +95,61 @@ def _real_extract(self, url):
                     'url': aurl,
                     'width': int_or_none(a.get('width')),
                     'height': int_or_none(a.get('height')),
+                    'filesize': int_or_none(a.get('size')),
                 })
             else:
                 aext = a.get('ext')
-                is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
-                formats.append({
-                    'format_id': atype,
+                display_name = a.get('display_name')
+                format_id = atype
+                if atype and atype.endswith('_video') and display_name:
+                    format_id = '%s-%s' % (atype[:-6], display_name)
+                f = {
+                    'format_id': format_id,
                     'url': aurl,
                     'url': aurl,
-                    'tbr': int_or_none(a.get('bitrate')),
-                    'vbr': int_or_none(a.get('opt_vbitrate')),
-                    'width': int_or_none(a.get('width')),
-                    'height': int_or_none(a.get('height')),
-                    'filesize': int_or_none(a.get('size')),
-                    'vcodec': a.get('codec'),
-                    'container': a.get('container'),
-                    'ext': 'mp4' if is_m3u8 else aext,
-                    'protocol': 'm3u8' if is_m3u8 else None,
+                    'tbr': int_or_none(a.get('bitrate')) or None,
                     'preference': 1 if atype == 'original' else None,
-                })
+                }
+                if display_name == 'Audio':
+                    f.update({
+                        'vcodec': 'none',
+                    })
+                else:
+                    f.update({
+                        'width': int_or_none(a.get('width')),
+                        'height': int_or_none(a.get('height')),
+                        'vcodec': a.get('codec'),
+                    })
+                if a.get('container') == 'm3u8' or aext == 'm3u8':
+                    ts_f = f.copy()
+                    ts_f.update({
+                        'ext': 'ts',
+                        'format_id': f['format_id'].replace('hls-', 'ts-'),
+                        'url': f['url'].replace('.bin', '.ts'),
+                    })
+                    formats.append(ts_f)
+                    f.update({
+                        'ext': 'mp4',
+                        'protocol': 'm3u8_native',
+                    })
+                else:
+                    f.update({
+                        'container': a.get('container'),
+                        'ext': aext,
+                        'filesize': int_or_none(a.get('size')),
+                    })
+                formats.append(f)
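For HLS assets the new Wistia code emits two entries per asset: a progressive TS copy (the `.bin` URL rewritten to `.ts`, with the `hls-` prefix swapped for `ts-`) and an `m3u8_native` MP4 entry. That duplication can be sketched as follows; the sample format dict in the test is invented:

```python
def expand_hls_format(f):
    # Progressive TS variant: rename the format id and rewrite the URL.
    ts_f = dict(f)
    ts_f.update({
        'ext': 'ts',
        'format_id': f['format_id'].replace('hls-', 'ts-'),
        'url': f['url'].replace('.bin', '.ts'),
    })
    # HLS variant: same URL, fetched with the native m3u8 downloader.
    hls_f = dict(f)
    hls_f.update({'ext': 'mp4', 'protocol': 'm3u8_native'})
    return [ts_f, hls_f]
```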
 
         self._sort_formats(formats)
 
+        subtitles = {}
+        for caption in data.get('captions', []):
+            language = caption.get('language')
+            if not language:
+                continue
+            subtitles[language] = [{
+                'url': self._EMBED_BASE_URL + 'captions/' + video_id + '.vtt?language=' + language,
+            }]
+
         return {
             'id': video_id,
             'title': title,
@@ -124,4 +158,5 @@ def _real_extract(self, url):
             'thumbnails': thumbnails,
             'duration': float_or_none(data.get('duration')),
             'timestamp': int_or_none(data.get('createdAt')),
+            'subtitles': subtitles,
         }
index a5b94d2794166d452464b728ec40b2b258459c64..902a3ed338e914c9e20edc3c643d0e5274d71fe7 100644 (file)
 
 
 class XHamsterIE(InfoExtractor):
-    _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
+    _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com)'
     _VALID_URL = r'''(?x)
                     https?://
                         (?:.+?\.)?%s/
                         (?:
-                            movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
-                            videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
+                            movies/(?P<id>[\dA-Za-z]+)/(?P<display_id>[^/]*)\.html|
+                            videos/(?P<display_id_2>[^/]*)-(?P<id_2>[\dA-Za-z]+)
                         )
                     ''' % _DOMAINS
     _TESTS = [{
@@ -99,12 +99,21 @@ class XHamsterIE(InfoExtractor):
     }, {
         'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
         'only_matching': True,
+    }, {
+        'url': 'https://xhamster11.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'only_matching': True,
+    }, {
+        'url': 'https://xhamster26.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+        'only_matching': True,
     }, {
         'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
         'only_matching': True,
     }, {
         'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
         'only_matching': True,
+    }, {
+        'url': 'http://de.xhamster.com/videos/skinny-girl-fucks-herself-hard-in-the-forest-xhnBJZx',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
@@ -113,7 +122,7 @@ def _real_extract(self, url):
         display_id = mobj.group('display_id') or mobj.group('display_id_2')
 
         desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
-        webpage = self._download_webpage(desktop_url, video_id)
+        webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
 
         error = self._html_search_regex(
             r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
@@ -129,7 +138,7 @@ def get_height(s):
 
         initials = self._parse_json(
             self._search_regex(
-                r'window\.initials\s*=\s*({.+?})\s*;\s*\n', webpage, 'initials',
+                r'window\.initials\s*=\s*({.+?})\s*;', webpage, 'initials',
                 default='{}'),
             video_id, fatal=False)
         if initials:
@@ -161,6 +170,9 @@ def get_height(s):
                         'ext': determine_ext(format_url, 'mp4'),
                         'height': get_height(quality),
                         'filesize': filesize,
+                        'http_headers': {
+                            'Referer': urlh.geturl(),
+                        },
                     })
             self._sort_formats(formats)
 
index c6c0b3291c8320064fa0a7529be5b5d78f14461c..01b253dcb1e8c92232a06c0b2b4153a545dabcc1 100644 (file)
@@ -47,7 +47,7 @@ class XTubeIE(InfoExtractor):
             'display_id': 'A-Super-Run-Part-1-YT',
             'ext': 'flv',
             'title': 'A Super Run - Part 1 (YT)',
-            'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
+            'description': 'md5:4cc3af1aa1b0413289babc88f0d4f616',
             'uploader': 'tshirtguy59',
             'duration': 579,
             'view_count': int,
@@ -87,10 +87,24 @@ def _real_extract(self, url):
                 'Cookie': 'age_verified=1; cookiesAccepted=1',
             })
 
-        sources = self._parse_json(self._search_regex(
-            r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
-            webpage, 'sources', group='sources'), video_id,
-            transform_source=js_to_json)
+        title, thumbnail, duration = [None] * 3
+
+        config = self._parse_json(self._search_regex(
+            r'playerConf\s*=\s*({.+?})\s*,\s*\n', webpage, 'config',
+            default='{}'), video_id, transform_source=js_to_json, fatal=False)
+        if config:
+            config = config.get('mainRoll')
+            if isinstance(config, dict):
+                title = config.get('title')
+                thumbnail = config.get('poster')
+                duration = int_or_none(config.get('duration'))
+                sources = config.get('sources') or config.get('format')
+
+        if not isinstance(sources, dict):
+            sources = self._parse_json(self._search_regex(
+                r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
+                webpage, 'sources', group='sources'), video_id,
+                transform_source=js_to_json)
 
         formats = []
         for format_id, format_url in sources.items():
@@ -102,20 +116,25 @@ def _real_extract(self, url):
         self._remove_duplicate_formats(formats)
         self._sort_formats(formats)
 
-        title = self._search_regex(
-            (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
-            webpage, 'title', group='title')
-        description = self._search_regex(
+        if not title:
+            title = self._search_regex(
+                (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
+                webpage, 'title', group='title')
+        description = self._og_search_description(
+            webpage, default=None) or self._html_search_meta(
+            'twitter:description', webpage, default=None) or self._search_regex(
             r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
         uploader = self._search_regex(
             (r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
              r'<span[^>]+class="nickname"[^>]*>([^<]+)'),
             webpage, 'uploader', fatal=False)
-        duration = parse_duration(self._search_regex(
-            r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
-            webpage, 'duration', fatal=False))
+        if not duration:
+            duration = parse_duration(self._search_regex(
+                r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
+                webpage, 'duration', fatal=False))
         view_count = str_to_int(self._search_regex(
-            r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>',
+            (r'["\']viewsCount["\'][^>]*>(\d+)\s+views',
+             r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>'),
             webpage, 'view count', fatal=False))
         comment_count = str_to_int(self._html_search_regex(
             r'>Comments? \(([\d,\.]+)\)<',
@@ -126,6 +145,7 @@ def _real_extract(self, url):
             'display_id': display_id,
             'title': title,
             'description': description,
+            'thumbnail': thumbnail,
             'uploader': uploader,
             'duration': duration,
             'view_count': view_count,
@@ -144,7 +164,7 @@ class XTubeUserIE(InfoExtractor):
             'id': 'greenshowers-4056496',
             'age_limit': 18,
         },
-        'playlist_mincount': 155,
+        'playlist_mincount': 154,
     }
 
     def _real_extract(self, url):
index 238d9cea0c729912351895e5bd6ad453d43b7d31..e4615376c428432f7035c2141d1cbecc738496cc 100644 (file)
@@ -12,6 +12,7 @@
 )
 from ..utils import (
     clean_html,
+    ExtractorError,
     int_or_none,
     mimetype2ext,
     parse_iso8601,
@@ -368,31 +369,47 @@ class YahooGyaOPlayerIE(InfoExtractor):
         'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682',
         'only_matching': True,
     }]
+    _GEO_BYPASS = False
 
     def _real_extract(self, url):
         video_id = self._match_id(url).replace('/', ':')
-        video = self._download_json(
-            'https://gyao.yahoo.co.jp/dam/v1/videos/' + video_id,
-            video_id, query={
-                'fields': 'longDescription,title,videoId',
-            }, headers={
-                'X-User-Agent': 'Unknown Pc GYAO!/2.0.0 Web',
-            })
+        headers = self.geo_verification_headers()
+        headers['Accept'] = 'application/json'
+        resp = self._download_json(
+            'https://gyao.yahoo.co.jp/apis/playback/graphql', video_id, query={
+                'appId': 'dj00aiZpPUNJeDh2cU1RazU3UCZzPWNvbnN1bWVyc2VjcmV0Jng9NTk-',
+                'query': '''{
+  content(parameter: {contentId: "%s", logicaAgent: PC_WEB}) {
+    video {
+      delivery {
+        id
+      }
+      title
+    }
+  }
+}''' % video_id,
+            }, headers=headers)
+        content = resp['data']['content']
+        if not content:
+            msg = resp['errors'][0]['message']
+            if msg == 'not in japan':
+                self.raise_geo_restricted(countries=['JP'])
+            raise ExtractorError(msg)
+        video = content['video']
         return {
             '_type': 'url_transparent',
             'id': video_id,
             'title': video['title'],
             'url': smuggle_url(
-                'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['videoId'],
+                'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['delivery']['id'],
                 {'geo_countries': ['JP']}),
-            'description': video.get('longDescription'),
             'ie_key': BrightcoveNewIE.ie_key(),
         }
 
 
 class YahooGyaOIE(InfoExtractor):
     IE_NAME = 'yahoo:gyao'
-    _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title/[^/]+)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+    _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title(?:/[^/]+)?)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
     _TESTS = [{
         'url': 'https://gyao.yahoo.co.jp/p/00449/v03102/',
         'info_dict': {
@@ -405,6 +422,9 @@ class YahooGyaOIE(InfoExtractor):
     }, {
         'url': 'https://gyao.yahoo.co.jp/title/%E3%81%97%E3%82%83%E3%81%B9%E3%81%8F%E3%82%8A007/5b025a49-b2e5-4dc7-945c-09c6634afacf',
         'only_matching': True,
+    }, {
+        'url': 'https://gyao.yahoo.co.jp/title/5b025a49-b2e5-4dc7-945c-09c6634afacf',
+        'only_matching': True,
     }]
 
     def _real_extract(self, url):
index dff69fcb7aca250373fc0e70b2f8278ed2661755..88aabd272c98e944523f3b333174342ba23c9fe1 100644 (file)
@@ -44,7 +44,7 @@ def _real_extract(self, url):
 
         encodings = self._parse_json(
             self._search_regex(
-                r'encodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
+                r'[Ee]ncodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
                 default='[]'),
             video_id, fatal=False)
         for encoding in encodings:
index d4eccb4b2a48efafec0232a451b3ee617e6bc859..e7fca22dec9a17b10ef76efcd43c4b9e0e4da208 100644 (file)
@@ -5,7 +5,6 @@
 from .common import InfoExtractor
 from ..utils import (
     int_or_none,
-    sanitized_Request,
     str_to_int,
     unescapeHTML,
     unified_strdate,
@@ -15,7 +14,7 @@
 
 
 class YouPornIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
+    _VALID_URL = r'https?://(?:www\.)?youporn\.com/(?:watch|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
     _TESTS = [{
         'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
         'md5': '3744d24c50438cf5b6f6d59feb5055c2',
@@ -57,16 +56,28 @@ class YouPornIE(InfoExtractor):
         'params': {
             'skip_download': True,
         },
+    }, {
+        'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.youporn.com/watch/505835',
+        'only_matching': True,
     }]
 
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?youporn\.com/embed/\d+)',
+            webpage)
+
     def _real_extract(self, url):
         mobj = re.match(self._VALID_URL, url)
         video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
+        display_id = mobj.group('display_id') or video_id
 
-        request = sanitized_Request(url)
-        request.add_header('Cookie', 'age_verified=1')
-        webpage = self._download_webpage(request, display_id)
+        webpage = self._download_webpage(
+            'http://www.youporn.com/watch/%s' % video_id, display_id,
+            headers={'Cookie': 'age_verified=1'})
 
         title = self._html_search_regex(
             r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',
index 8a2d5f63bdb929edd310c250b9242bd2ed409208..98347491ee00b66d2c2f8df69ddf663c1bffbd84 100644 (file)
@@ -1,6 +1,7 @@
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
     parse_duration,
     urljoin,
@@ -8,9 +9,9 @@
 
 
 class YourPornIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:yourporn\.sexy|sxyprn\.com)/post/(?P<id>[^/?#&.]+)'
+    _VALID_URL = r'https?://(?:www\.)?sxyprn\.com/post/(?P<id>[^/?#&.]+)'
     _TESTS = [{
-        'url': 'https://yourporn.sexy/post/57ffcb2e1179b.html',
+        'url': 'https://sxyprn.com/post/57ffcb2e1179b.html',
         'md5': '6f8682b6464033d87acaa7a8ff0c092e',
         'info_dict': {
             'id': '57ffcb2e1179b',
@@ -33,11 +34,19 @@ def _real_extract(self, url):
 
         webpage = self._download_webpage(url, video_id)
 
-        video_url = urljoin(url, self._parse_json(
+        parts = self._parse_json(
             self._search_regex(
                 r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
                 group='data'),
-            video_id)[video_id]).replace('/cdn/', '/cdn5/')
+            video_id)[video_id].split('/')
+
+        num = 0
+        for c in parts[6] + parts[7]:
+            if c.isnumeric():
+                num += int(c)
+        parts[5] = compat_str(int(parts[5]) - num)
+        parts[1] += '8'
+        video_url = urljoin(url, '/'.join(parts))
 
         title = (self._search_regex(
             r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
@@ -54,4 +63,5 @@ def _real_extract(self, url):
             'thumbnail': thumbnail,
             'duration': duration,
             'age_limit': 18,
+            'ext': 'mp4',
         }
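The sxyprn.com path manipulation introduced in this hunk can be sketched standalone (the function name and sample path below are illustrative, not from the extractor): every digit appearing in segments 6 and 7 is summed and subtracted from segment 5, and '8' is appended to segment 1.

```python
def deobfuscate_path(obfuscated):
    # Split the data-vnfo path into '/'-separated segments.
    parts = obfuscated.split('/')
    # Sum every digit character found in segments 6 and 7.
    num = sum(int(c) for c in parts[6] + parts[7] if c.isnumeric())
    # Subtract that sum from segment 5 (a numeric string in real URLs).
    parts[5] = str(int(parts[5]) - num)
    # Append '8' to segment 1, then rejoin.
    parts[1] += '8'
    return '/'.join(parts)
```

With a synthetic path, `deobfuscate_path('a/b/c/d/e/100/1x2/3y4')` yields `'a/b8/c/d/e/90/1x2/3y4'` (digits 1+2+3+4 = 10 subtracted from 100).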
index b913d07a63920de2c56570644f7b8175afd5fd8a..b35bf03aafc7c7c45b3c35735a68d00f86aed988 100644 (file)
@@ -29,7 +29,6 @@
 from ..utils import (
     bool_or_none,
     clean_html,
-    dict_get,
     error_to_compat_str,
     extract_attributes,
     ExtractorError,
@@ -71,9 +70,14 @@ class YoutubeBaseInfoExtractor(InfoExtractor):
 
     _PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}'
 
+    _YOUTUBE_CLIENT_HEADERS = {
+        'x-youtube-client-name': '1',
+        'x-youtube-client-version': '1.20200609.04.02',
+    }
+
     def _set_language(self):
         self._set_cookie(
-            '.youtube.com', 'PREF', 'f1=50000000&hl=en',
+            '.youtube.com', 'PREF', 'f1=50000000&f6=8&hl=en',
             # YouTube sets the expire time to about two months
             expire_time=time.time() + 2 * 30 * 24 * 3600)
 
@@ -299,10 +303,11 @@ def _entries(self, page, playlist_id):
                     # Downloading page may result in intermittent 5xx HTTP error
                     # that is usually worked around with a retry
                     more = self._download_json(
-                        'https://youtube.com/%s' % mobj.group('more'), playlist_id,
+                        'https://www.youtube.com/%s' % mobj.group('more'), playlist_id,
                         'Downloading page #%s%s'
                         % (page_num, ' (retry #%d)' % count if count else ''),
-                        transform_source=uppercase_escape)
+                        transform_source=uppercase_escape,
+                        headers=self._YOUTUBE_CLIENT_HEADERS)
                     break
                 except ExtractorError as e:
                     if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
@@ -389,8 +394,15 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                             (?:www\.)?invidious\.drycat\.fr/|
                             (?:www\.)?tube\.poal\.co/|
                             (?:www\.)?vid\.wxzm\.sx/|
+                            (?:www\.)?yewtu\.be/|
                             (?:www\.)?yt\.elukerio\.org/|
                             (?:www\.)?yt\.lelux\.fi/|
+                            (?:www\.)?invidious\.ggc-project\.de/|
+                            (?:www\.)?yt\.maisputain\.ovh/|
+                            (?:www\.)?invidious\.13ad\.de/|
+                            (?:www\.)?invidious\.toot\.koeln/|
+                            (?:www\.)?invidious\.fdn\.fr/|
+                            (?:www\.)?watch\.nettohikari\.com/|
                             (?:www\.)?kgg2m7yk5aybusll\.onion/|
                             (?:www\.)?qklhadlycap4cnod\.onion/|
                             (?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
@@ -398,6 +410,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                             (?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/|
                             (?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/|
                             (?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p/|
+                            (?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion/|
                             youtube\.googleapis\.com/)                        # the various hostnames, with wildcard subdomains
                          (?:.*?\#/)?                                          # handle anchor (#/) redirect urls
                          (?:                                                  # the various things that can precede the ID:
@@ -427,6 +440,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      (?(1).+)?                                                # if we found the ID, everything can follow
                      $""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
     _NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
+    _PLAYER_INFO_RE = (
+        r'/(?P<id>[a-zA-Z0-9_-]{8,})/player_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?/base\.(?P<ext>[a-z]+)$',
+        r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.(?P<ext>[a-z]+)$',
+    )
     _formats = {
         '5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
         '6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
@@ -570,7 +587,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20120506',
                 'title': 'Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]',
                 'alt_title': 'I Love It (feat. Charli XCX)',
-                'description': 'md5:f3ceb5ef83a08d95b9d146f973157cc8',
+                'description': 'md5:19a2f98d9032b9311e686ed039564f63',
                 'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                          'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                          'iconic ep', 'iconic', 'love', 'it'],
@@ -685,12 +702,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'id': 'nfWlot6h_JM',
                 'ext': 'm4a',
                 'title': 'Taylor Swift - Shake It Off',
-                'description': 'md5:bec2185232c05479482cb5a9b82719bf',
+                'description': 'md5:307195cd21ff7fa352270fe884570ef0',
                 'duration': 242,
                 'uploader': 'TaylorSwiftVEVO',
                 'uploader_id': 'TaylorSwiftVEVO',
                 'upload_date': '20140818',
-                'creator': 'Taylor Swift',
             },
             'params': {
                 'youtube_include_dash_manifest': True,
@@ -755,11 +771,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'upload_date': '20100430',
                 'uploader_id': 'deadmau5',
                 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
-                'creator': 'deadmau5',
+                'creator': 'Dada Life, deadmau5',
                 'description': 'md5:12c56784b8032162bb936a5f76d55360',
                 'uploader': 'deadmau5',
                 'title': 'Deadmau5 - Some Chords (HD)',
-                'alt_title': 'Some Chords',
+                'alt_title': 'This Machine Kills Some Chords',
             },
             'expected_warnings': [
                 'DASH manifest missing',
@@ -1135,6 +1151,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'skip_download': True,
                 'youtube_include_dash_manifest': False,
             },
+            'skip': 'not actual anymore',
         },
         {
             # Youtube Music Auto-generated description
@@ -1145,8 +1162,8 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'title': 'Voyeur Girl',
                 'description': 'md5:7ae382a65843d6df2685993e90a8628f',
                 'upload_date': '20190312',
-                'uploader': 'Various Artists - Topic',
-                'uploader_id': 'UCVWKBi1ELZn0QX2CBLSkiyw',
+                'uploader': 'Stephen - Topic',
+                'uploader_id': 'UC-pWHpBjdGG69N9mM2auIAA',
                 'artist': 'Stephen',
                 'track': 'Voyeur Girl',
                 'album': 'it\'s too much love to know my dear',
@@ -1210,7 +1227,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                 'id': '-hcAI0g-f5M',
                 'ext': 'mp4',
                 'title': 'Put It On Me',
-                'description': 'md5:93c55acc682ae7b0c668f2e34e1c069e',
+                'description': 'md5:f6422397c07c4c907c6638e1fee380a5',
                 'upload_date': '20180426',
                 'uploader': 'Matt Maeson - Topic',
                 'uploader_id': 'UCnEkIGqtGcQMLk73Kp-Q5LQ',
@@ -1228,6 +1245,26 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
             'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q',
             'only_matching': True,
         },
+        {
+            # invalid -> valid video id redirection
+            'url': 'DJztXj2GPfl',
+            'info_dict': {
+                'id': 'DJztXj2GPfk',
+                'ext': 'mp4',
+                'title': 'Panjabi MC - Mundian To Bach Ke (The Dictator Soundtrack)',
+                'description': 'md5:bf577a41da97918e94fa9798d9228825',
+                'upload_date': '20090125',
+                'uploader': 'Prochorowka',
+                'uploader_id': 'Prochorowka',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Prochorowka',
+                'artist': 'Panjabi MC',
+                'track': 'Beware of the Boys (Mundian to Bach Ke) - Motivo Hi-Lectro Remix',
+                'album': 'Beware of the Boys (Mundian To Bach Ke)',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        }
     ]
 
     def __init__(self, *args, **kwargs):
@@ -1254,14 +1291,18 @@ def _signature_cache_id(self, example_sig):
         """ Return a string representation of a signature """
         return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
 
         """ Return a string representation of a signature """
         return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
 
-    def _extract_signature_function(self, video_id, player_url, example_sig):
-        id_m = re.match(
-            r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$',
-            player_url)
-        if not id_m:
+    @classmethod
+    def _extract_player_info(cls, player_url):
+        for player_re in cls._PLAYER_INFO_RE:
+            id_m = re.search(player_re, player_url)
+            if id_m:
+                break
+        else:
             raise ExtractorError('Cannot identify player %r' % player_url)
-        player_type = id_m.group('ext')
-        player_id = id_m.group('id')
+        return id_m.group('ext'), id_m.group('id')
+
+    def _extract_signature_function(self, video_id, player_url, example_sig):
+        player_type, player_id = self._extract_player_info(player_url)
 
         # Read from filesystem cache
         func_id = '%s_%s_%s' % (
@@ -1343,6 +1384,7 @@ def _parse_sig_js(self, jscode):
         funcname = self._search_regex(
             (r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
              r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
              r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
              # Obsolete patterns
              r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
@@ -1616,8 +1658,63 @@ def extract_id(cls, url):
         video_id = mobj.group(2)
         return video_id
 
+    def _extract_chapters_from_json(self, webpage, video_id, duration):
+        if not webpage:
+            return
+        player = self._parse_json(
+            self._search_regex(
+                r'RELATED_PLAYER_ARGS["\']\s*:\s*({.+})\s*,?\s*\n', webpage,
+                'player args', default='{}'),
+            video_id, fatal=False)
+        if not player or not isinstance(player, dict):
+            return
+        watch_next_response = player.get('watch_next_response')
+        if not isinstance(watch_next_response, compat_str):
+            return
+        response = self._parse_json(watch_next_response, video_id, fatal=False)
+        if not response or not isinstance(response, dict):
+            return
+        chapters_list = try_get(
+            response,
+            lambda x: x['playerOverlays']
+                       ['playerOverlayRenderer']
+                       ['decoratedPlayerBarRenderer']
+                       ['decoratedPlayerBarRenderer']
+                       ['playerBar']
+                       ['chapteredPlayerBarRenderer']
+                       ['chapters'],
+            list)
+        if not chapters_list:
+            return
+
+        def chapter_time(chapter):
+            return float_or_none(
+                try_get(
+                    chapter,
+                    lambda x: x['chapterRenderer']['timeRangeStartMillis'],
+                    int),
+                scale=1000)
+        chapters = []
+        for next_num, chapter in enumerate(chapters_list, start=1):
+            start_time = chapter_time(chapter)
+            if start_time is None:
+                continue
+            end_time = (chapter_time(chapters_list[next_num])
+                        if next_num < len(chapters_list) else duration)
+            if end_time is None:
+                continue
+            title = try_get(
+                chapter, lambda x: x['chapterRenderer']['title']['simpleText'],
+                compat_str)
+            chapters.append({
+                'start_time': start_time,
+                'end_time': end_time,
+                'title': title,
+            })
+        return chapters
+
     @staticmethod
-    def _extract_chapters(description, duration):
+    def _extract_chapters_from_description(description, duration):
         if not description:
             return None
         chapter_lines = re.findall(
@@ -1651,6 +1748,10 @@ def _extract_chapters(description, duration):
             })
         return chapters
 
+    def _extract_chapters(self, webpage, description, video_id, duration):
+        return (self._extract_chapters_from_json(webpage, video_id, duration)
+                or self._extract_chapters_from_description(description, duration))
+
     def _real_extract(self, url):
         url, smuggled_data = unsmuggle_url(url, {})
 
@@ -1678,7 +1779,10 @@ def _real_extract(self, url):
 
         # Get video webpage
         url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id
-        video_webpage = self._download_webpage(url, video_id)
+        video_webpage, urlh = self._download_webpage_handle(url, video_id)
+
+        qs = compat_parse_qs(compat_urllib_parse_urlparse(urlh.geturl()).query)
+        video_id = qs.get('v', [None])[0] or video_id
 
         # Attempt to extract SWF player URL
         mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
@@ -1707,9 +1811,6 @@ def add_dash_mpd_pr(pl_response):
         def extract_view_count(v_info):
             return int_or_none(try_get(v_info, lambda x: x['view_count'][0]))
 
-        def extract_token(v_info):
-            return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token'))
-
         def extract_player_response(player_response, video_id):
             pl_response = str_or_none(player_response)
             if not pl_response:
@@ -1722,6 +1823,7 @@ def extract_player_response(player_response, video_id):
         player_response = {}
 
         # Get video info
+        video_info = {}
         embed_webpage = None
         if re.search(r'player-age-gate-content">', video_webpage) is not None:
             age_gate = True
@@ -1736,19 +1838,21 @@ def extract_player_response(player_response, video_id):
                     r'"sts"\s*:\s*(\d+)', embed_webpage, 'sts', default=''),
             })
             video_info_url = proto + '://www.youtube.com/get_video_info?' + data
                     r'"sts"\s*:\s*(\d+)', embed_webpage, 'sts', default=''),
             })
             video_info_url = proto + '://www.youtube.com/get_video_info?' + data
-            video_info_webpage = self._download_webpage(
-                video_info_url, video_id,
-                note='Refetching age-gated info webpage',
-                errnote='unable to download video info webpage')
-            video_info = compat_parse_qs(video_info_webpage)
-            pl_response = video_info.get('player_response', [None])[0]
-            player_response = extract_player_response(pl_response, video_id)
-            add_dash_mpd(video_info)
-            view_count = extract_view_count(video_info)
+            try:
+                video_info_webpage = self._download_webpage(
+                    video_info_url, video_id,
+                    note='Refetching age-gated info webpage',
+                    errnote='unable to download video info webpage')
+            except ExtractorError:
+                video_info_webpage = None
+            if video_info_webpage:
+                video_info = compat_parse_qs(video_info_webpage)
+                pl_response = video_info.get('player_response', [None])[0]
+                player_response = extract_player_response(pl_response, video_id)
+                add_dash_mpd(video_info)
+                view_count = extract_view_count(video_info)
         else:
             age_gate = False
-            video_info = None
-            sts = None
             # Try looking directly into the video webpage
             ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
             if ytplayer_config:
@@ -1765,61 +1869,10 @@ def extract_player_response(player_response, video_id):
                         args['ypc_vid'], YoutubeIE.ie_key(), video_id=args['ypc_vid'])
                 if args.get('livestream') == '1' or args.get('live_playback') == 1:
                     is_live = True
-                sts = ytplayer_config.get('sts')
                 if not player_response:
                     player_response = extract_player_response(args.get('player_response'), video_id)
             if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
                 add_dash_mpd_pr(player_response)
-                # We also try looking in get_video_info since it may contain different dashmpd
-                # URL that points to a DASH manifest with possibly different itag set (some itags
-                # are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH
-                # manifest pointed by get_video_info's dashmpd).
-                # The general idea is to take a union of itags of both DASH manifests (for example
-                # video with such 'manifest behavior' see https://github.com/ytdl-org/youtube-dl/issues/6093)
-                self.report_video_info_webpage_download(video_id)
-                for el in ('embedded', 'detailpage', 'vevo', ''):
-                    query = {
-                        'video_id': video_id,
-                        'ps': 'default',
-                        'eurl': '',
-                        'gl': 'US',
-                        'hl': 'en',
-                    }
-                    if el:
-                        query['el'] = el
-                    if sts:
-                        query['sts'] = sts
-                    video_info_webpage = self._download_webpage(
-                        '%s://www.youtube.com/get_video_info' % proto,
-                        video_id, note=False,
-                        errnote='unable to download video info webpage',
-                        fatal=False, query=query)
-                    if not video_info_webpage:
-                        continue
-                    get_video_info = compat_parse_qs(video_info_webpage)
-                    if not player_response:
-                        pl_response = get_video_info.get('player_response', [None])[0]
-                        player_response = extract_player_response(pl_response, video_id)
-                    add_dash_mpd(get_video_info)
-                    if view_count is None:
-                        view_count = extract_view_count(get_video_info)
-                    if not video_info:
-                        video_info = get_video_info
-                    get_token = extract_token(get_video_info)
-                    if get_token:
-                        # Different get_video_info requests may report different results, e.g.
-                        # some may report video unavailability, but some may serve it without
-                        # any complaint (see https://github.com/ytdl-org/youtube-dl/issues/7362,
-                        # the original webpage as well as el=info and el=embedded get_video_info
-                        # requests report video unavailability due to geo restriction while
-                        # el=detailpage succeeds and returns valid data). This is probably
-                        # due to YouTube measures against IP ranges of hosting providers.
-                        # Working around by preferring the first succeeded video_info containing
-                        # the token if no such video_info yet was found.
-                        token = extract_token(video_info)
-                        if not token:
-                            video_info = get_video_info
-                        break
 
         def extract_unavailable_message():
             messages = []
@@ -1832,16 +1885,22 @@ def extract_unavailable_message():
             if messages:
                 return '\n'.join(messages)
 
-        if not video_info:
+        if not video_info and not player_response:
             unavailable_message = extract_unavailable_message()
             if not unavailable_message:
                 unavailable_message = 'Unable to extract video data'
             raise ExtractorError(
                 'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
 
+        if not isinstance(video_info, dict):
+            video_info = {}
+
         video_details = try_get(
             player_response, lambda x: x['videoDetails'], dict) or {}
 
+        microformat = try_get(
+            player_response, lambda x: x['microformat']['playerMicroformatRenderer'], dict) or {}
+
         video_title = video_info.get('title', [None])[0] or video_details.get('title')
         if not video_title:
             self._downloader.report_warning('Unable to extract video title')
@@ -1871,7 +1930,7 @@ def replace_url(m):
             ''', replace_url, video_description)
             video_description = clean_html(video_description)
         else:
-            video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
+            video_description = video_details.get('shortDescription') or self._html_search_meta('description', video_webpage)
 
         if not smuggled_data.get('force_singlefeed', False):
             if not self._downloader.params.get('noplaylist'):
@@ -1888,15 +1947,26 @@ def replace_url(m):
                         # fields may contain comma as well (see
                         # https://github.com/ytdl-org/youtube-dl/issues/8536)
                         feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
+
+                        def feed_entry(name):
+                            return try_get(feed_data, lambda x: x[name][0], compat_str)
+
+                        feed_id = feed_entry('id')
+                        if not feed_id:
+                            continue
+                        feed_title = feed_entry('title')
+                        title = video_title
+                        if feed_title:
+                            title += ' (%s)' % feed_title
                         entries.append({
                             '_type': 'url_transparent',
                             'ie_key': 'Youtube',
                             'url': smuggle_url(
                                 '%s://www.youtube.com/watch?v=%s' % (proto, feed_data['id'][0]),
                                 {'force_singlefeed': True}),
-                            'title': '%s (%s)' % (video_title, feed_data['title'][0]),
+                            'title': title,
                         })
-                        feed_ids.append(feed_data['id'][0])
+                        feed_ids.append(feed_id)
                     self.to_screen(
                         'Downloading multifeed video (%s) - add --no-playlist to just download video %s'
                         % (', '.join(feed_ids), video_id))
@@ -1908,6 +1978,8 @@ def replace_url(m):
             view_count = extract_view_count(video_info)
         if view_count is None and video_details:
             view_count = int_or_none(video_details.get('viewCount'))
+        if view_count is None and microformat:
+            view_count = int_or_none(microformat.get('viewCount'))
 
         if is_live is None:
             is_live = bool_or_none(video_details.get('isLive'))
@@ -1967,12 +2039,12 @@ def _extract_filesize(media_url):
                 }
 
             for fmt in streaming_formats:
-                if fmt.get('drm_families'):
+                if fmt.get('drmFamilies') or fmt.get('drm_families'):
                     continue
                 url = url_or_none(fmt.get('url'))
 
                 if not url:
-                    cipher = fmt.get('cipher')
+                    cipher = fmt.get('cipher') or fmt.get('signatureCipher')
                     if not cipher:
                         continue
                     url_data = compat_parse_qs(cipher)
@@ -2023,22 +2095,10 @@ def _extract_filesize(media_url):
 
                         if self._downloader.params.get('verbose'):
                             if player_url is None:
-                                player_version = 'unknown'
                                 player_desc = 'unknown'
                             else:
-                                if player_url.endswith('swf'):
-                                    player_version = self._search_regex(
-                                        r'-(.+?)(?:/watch_as3)?\.swf$', player_url,
-                                        'flash player', fatal=False)
-                                    player_desc = 'flash player %s' % player_version
-                                else:
-                                    player_version = self._search_regex(
-                                        [r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
-                                         r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
-                                        player_url,
-                                        'html5 player', fatal=False)
-                                    player_desc = 'html5 player %s' % player_version
-
+                                player_type, player_version = self._extract_player_info(player_url)
+                                player_desc = '%s player %s' % ('flash' if player_type == 'swf' else 'html5', player_version)
                             parts_sizes = self._signature_cache_id(encrypted_sig)
                             self.to_screen('{%s} signature length %s, %s' %
                                            (format_id, parts_sizes, player_desc))
@@ -2171,7 +2231,12 @@ def _extract_filesize(media_url):
             video_uploader_id = mobj.group('uploader_id')
             video_uploader_url = mobj.group('uploader_url')
         else:
-            self._downloader.report_warning('unable to extract uploader nickname')
+            owner_profile_url = url_or_none(microformat.get('ownerProfileUrl'))
+            if owner_profile_url:
+                video_uploader_id = self._search_regex(
+                    r'(?:user|channel)/([^/]+)', owner_profile_url, 'uploader id',
+                    default=None)
+                video_uploader_url = owner_profile_url
 
         channel_id = (
             str_or_none(video_details.get('channelId'))
@@ -2182,17 +2247,33 @@ def _extract_filesize(media_url):
                 video_webpage, 'channel id', default=None, group='id'))
         channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
 
-        # thumbnail image
-        # We try first to get a high quality image:
-        m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
-                            video_webpage, re.DOTALL)
-        if m_thumb is not None:
-            video_thumbnail = m_thumb.group(1)
-        elif 'thumbnail_url' not in video_info:
-            self._downloader.report_warning('unable to extract video thumbnail')
+        thumbnails = []
+        thumbnails_list = try_get(
+            video_details, lambda x: x['thumbnail']['thumbnails'], list) or []
+        for t in thumbnails_list:
+            if not isinstance(t, dict):
+                continue
+            thumbnail_url = url_or_none(t.get('url'))
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'width': int_or_none(t.get('width')),
+                'height': int_or_none(t.get('height')),
+            })
+
+        if not thumbnails:
             video_thumbnail = None
-        else:   # don't panic if we can't find it
-            video_thumbnail = compat_urllib_parse_unquote_plus(video_info['thumbnail_url'][0])
+            # We try first to get a high quality image:
+            m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
+                                video_webpage, re.DOTALL)
+            if m_thumb is not None:
+                video_thumbnail = m_thumb.group(1)
+            thumbnail_url = try_get(video_info, lambda x: x['thumbnail_url'][0], compat_str)
+            if thumbnail_url:
+                video_thumbnail = compat_urllib_parse_unquote_plus(thumbnail_url)
+            if video_thumbnail:
+                thumbnails.append({'url': video_thumbnail})
 
         # upload date
         upload_date = self._html_search_meta(
@@ -2202,6 +2283,8 @@ def _extract_filesize(media_url):
                 [r'(?s)id="eow-date.*?>(.*?)</span>',
                  r'(?:id="watch-uploader-info".*?>.*?|["\']simpleText["\']\s*:\s*["\'])(?:Published|Uploaded|Streamed live|Started) on (.+?)[<"\']'],
                 video_webpage, 'upload date', default=None)
+        if not upload_date:
+            upload_date = microformat.get('publishDate') or microformat.get('uploadDate')
         upload_date = unified_strdate(upload_date)
 
         video_license = self._html_search_regex(
@@ -2273,17 +2356,21 @@ def extract_meta(field):
         m_cat_container = self._search_regex(
             r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
             video_webpage, 'categories', default=None)
+        category = None
         if m_cat_container:
             category = self._html_search_regex(
                 r'(?s)<a[^<]+>(.*?)</a>', m_cat_container, 'category',
                 default=None)
-            video_categories = None if category is None else [category]
-        else:
-            video_categories = None
+        if not category:
+            category = try_get(
+                microformat, lambda x: x['category'], compat_str)
+        video_categories = None if category is None else [category]
 
         video_tags = [
             unescapeHTML(m.group('content'))
             for m in re.finditer(self._meta_regex('og:video:tag'), video_webpage)]
+        if not video_tags:
+            video_tags = try_get(video_details, lambda x: x['keywords'], list)
 
         def _extract_count(count_name):
             return str_to_int(self._search_regex(
@@ -2334,7 +2421,7 @@ def _extract_count(count_name):
                     errnote='Unable to download video annotations', fatal=False,
                     data=urlencode_postdata({xsrf_field_name: xsrf_token}))
 
-        chapters = self._extract_chapters(description_original, video_duration)
+        chapters = self._extract_chapters(video_webpage, description_original, video_id, video_duration)
 
         # Look for the DASH manifest
         if self._downloader.params.get('youtube_include_dash_manifest', True):
@@ -2391,30 +2478,23 @@ def decrypt_sig(mobj):
                         f['stretched_ratio'] = ratio
 
         if not formats:
-            token = extract_token(video_info)
-            if not token:
-                if 'reason' in video_info:
-                    if 'The uploader has not made this video available in your country.' in video_info['reason']:
-                        regions_allowed = self._html_search_meta(
-                            'regionsAllowed', video_webpage, default=None)
-                        countries = regions_allowed.split(',') if regions_allowed else None
-                        self.raise_geo_restricted(
-                            msg=video_info['reason'][0], countries=countries)
-                    reason = video_info['reason'][0]
-                    if 'Invalid parameters' in reason:
-                        unavailable_message = extract_unavailable_message()
-                        if unavailable_message:
-                            reason = unavailable_message
-                    raise ExtractorError(
-                        'YouTube said: %s' % reason,
-                        expected=True, video_id=video_id)
-                else:
-                    raise ExtractorError(
-                        '"token" parameter not in video info for unknown reason',
-                        video_id=video_id)
-
-        if not formats and (video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos'])):
-            raise ExtractorError('This video is DRM protected.', expected=True)
+            if 'reason' in video_info:
+                if 'The uploader has not made this video available in your country.' in video_info['reason']:
+                    regions_allowed = self._html_search_meta(
+                        'regionsAllowed', video_webpage, default=None)
+                    countries = regions_allowed.split(',') if regions_allowed else None
+                    self.raise_geo_restricted(
+                        msg=video_info['reason'][0], countries=countries)
+                reason = video_info['reason'][0]
+                if 'Invalid parameters' in reason:
+                    unavailable_message = extract_unavailable_message()
+                    if unavailable_message:
+                        reason = unavailable_message
+                raise ExtractorError(
+                    'YouTube said: %s' % reason,
+                    expected=True, video_id=video_id)
+            if video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos']):
+                raise ExtractorError('This video is DRM protected.', expected=True)
 
         self._sort_formats(formats)
 
@@ -2432,7 +2512,7 @@ def decrypt_sig(mobj):
             'creator': video_creator or artist,
             'title': video_title,
             'alt_title': video_alt_title or track,
-            'thumbnail': video_thumbnail,
+            'thumbnails': thumbnails,
             'description': video_description,
             'categories': video_categories,
             'tags': video_tags,
@@ -2494,20 +2574,23 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
     _VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'
     IE_NAME = 'youtube:playlist'
     _TESTS = [{
-        'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
+        'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
         'info_dict': {
-            'title': 'ytdl test PL',
-            'id': 'PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
+            'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
+            'uploader': 'Sergey M.',
+            'id': 'PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
+            'title': 'youtube-dl public playlist',
         },
-        'playlist_count': 3,
+        'playlist_count': 1,
     }, {
-        'url': 'https://www.youtube.com/playlist?list=PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx',
+        'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
         'info_dict': {
-            'id': 'PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx',
-            'title': 'YDL_Empty_List',
+            'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
+            'uploader': 'Sergey M.',
+            'id': 'PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
+            'title': 'youtube-dl empty playlist',
         },
         'playlist_count': 0,
-        'skip': 'This playlist is private',
     }, {
         'note': 'Playlist with deleted videos (#651). As a bonus, the video #51 is also twice in this list.',
         'url': 'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
@@ -2517,7 +2600,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
             'uploader': 'Christiaan008',
             'uploader_id': 'ChRiStIaAn008',
         },
-        'playlist_count': 95,
+        'playlist_count': 96,
     }, {
         'note': 'issue #673',
         'url': 'PLBB231211A4F62143',
@@ -2693,7 +2776,7 @@ def _extract_mix(self, playlist_id):
         ids = []
         last_id = playlist_id[-11:]
         for n in itertools.count(1):
-            url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+            url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
             webpage = self._download_webpage(
                 url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
             new_ids = orderedSet(re.findall(
@@ -3033,7 +3116,7 @@ def _real_extract(self, url):
 
 class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
     IE_DESC = 'YouTube.com user/channel playlists'
-    _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+)/playlists'
+    _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel|c)/(?P<id>[^/]+)/playlists'
     IE_NAME = 'youtube:playlists'
 
     _TESTS = [{
@@ -3059,6 +3142,9 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
             'title': 'Chem Player',
         },
         'skip': 'Blocked',
+    }, {
+        'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
+        'only_matching': True,
     }]
 
 
@@ -3203,9 +3289,10 @@ def _entries(self, page):
                 break
 
             more = self._download_json(
-                'https://youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
+                'https://www.youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
                 'Downloading page #%s' % page_num,
-                transform_source=uppercase_escape)
+                transform_source=uppercase_escape,
+                headers=self._YOUTUBE_CLIENT_HEADERS)
             content_html = more['content_html']
             more_widget_html = more['load_more_widget_html']
 
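The youtube.py hunks above switch the format loop to the newer JSON field names (`drmFamilies`, `signatureCipher`) that the player response now serves alongside the legacy ones. As background, a `signatureCipher` value is just a URL-encoded query string bundling the encrypted signature with its target stream URL; a minimal standalone sketch of splitting one (the helper name and the sample cipher value are invented for illustration, this is not youtube-dl's actual code):

```python
from urllib.parse import parse_qs


def split_signature_cipher(cipher):
    """Split a signatureCipher-style query string into its parts.

    Such a value typically carries three fields: 's' (the encrypted
    signature), 'sp' (the query parameter the decrypted signature should
    be attached to), and 'url' (the bare stream URL). parse_qs both
    splits the pairs and percent-decodes the values.
    """
    data = parse_qs(cipher)
    return {
        'encrypted_sig': data.get('s', [None])[0],
        'sig_param': data.get('sp', ['signature'])[0],
        'stream_url': data.get('url', [None])[0],
    }


# Invented sample value, decoded into its three components:
parts = split_signature_cipher(
    's=ABCDEF&sp=sig&url=https%3A%2F%2Fexample.com%2Fvideoplayback')
# parts['sig_param'] is 'sig'; parts['stream_url'] is the decoded URL
```

The decrypted signature would then be appended to `stream_url` under the `sig_param` name, which is why the extractor treats a missing `url` or `s` field as an unusable format.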
index bacb82eeeb2a549edbb0cbf6d0a67e07f28b595b..f6496f5168cf057c9c415cc7461105462ad66370 100644 (file)
@@ -29,7 +29,6 @@ class ZapiksIE(InfoExtractor):
                 'timestamp': 1359044972,
                 'upload_date': '20130124',
                 'view_count': int,
-                'comment_count': int,
             },
         },
         {
index 145c123a42fee5e67c0fd8c2750ea13562632666..656864b2ed8a9982c1da934b01aff6bcb7126acf 100644 (file)
@@ -244,14 +244,14 @@ class ZDFChannelIE(ZDFBaseIE):
             'id': 'das-aktuelle-sportstudio',
             'title': 'das aktuelle sportstudio | ZDF',
         },
-        'playlist_count': 21,
+        'playlist_mincount': 23,
     }, {
         'url': 'https://www.zdf.de/dokumentation/planet-e',
         'info_dict': {
             'id': 'planet-e',
             'title': 'planet e.',
         },
-        'playlist_count': 4,
+        'playlist_mincount': 50,
     }, {
         'url': 'https://www.zdf.de/filme/taunuskrimi/',
         'only_matching': True,
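The ZDF test updates above replace exact `playlist_count` expectations with `playlist_mincount` floors, since both channel pages keep gaining entries and an exact count goes stale. A simplified sketch of the difference between the two checks (an illustration only, not youtube-dl's actual test harness):

```python
def check_playlist_size(info_dict, expected):
    """Check an extracted playlist against test expectations.

    'playlist_count' demands an exact match, so it breaks whenever the
    source gains or loses an entry; 'playlist_mincount' only sets a
    floor, which suits channels that keep publishing new videos.
    """
    got = len(info_dict['entries'])
    if 'playlist_count' in expected:
        assert got == expected['playlist_count'], (
            'expected exactly %d entries, got %d'
            % (expected['playlist_count'], got))
    if 'playlist_mincount' in expected:
        assert got >= expected['playlist_mincount'], (
            'expected at least %d entries, got %d'
            % (expected['playlist_mincount'], got))


# A growing channel still passes a mincount check after new uploads:
check_playlist_size({'entries': list(range(60))}, {'playlist_mincount': 50})
```

This is why the commit bumps `planet-e` from a brittle `'playlist_count': 4` to `'playlist_mincount': 50`: the floor only needs updating if entries are ever removed.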
index 3b16e703b9b9ef54497b3172deac0d299efb7373..2e2e97a0c4454971dab30136518ca26e903b9470 100644 (file)
@@ -4,10 +4,20 @@
 import re
 
 from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+    dict_get,
+    ExtractorError,
+    int_or_none,
+    js_to_json,
+    parse_iso8601,
+)
 
 
 class ZypeIE(InfoExtractor):
-    _VALID_URL = r'https?://player\.zype\.com/embed/(?P<id>[\da-fA-F]+)\.js\?.*?api_key=[^&]+'
+    _ID_RE = r'[\da-fA-F]+'
+    _COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
+    _VALID_URL = r'https?:%s[^&]+' % (_COMMON_RE % ('(?P<id>%s)' % _ID_RE))
     _TEST = {
         'url': 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ&autoplay=false&controls=true&da=false',
         'md5': 'eaee31d474c76a955bdaba02a505c595',
@@ -16,6 +26,9 @@ class ZypeIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Smoky Barbecue Favorites',
             'thumbnail': r're:^https?://.*\.jpe?g',
+            'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+            'timestamp': 1504915200,
+            'upload_date': '20170909',
         },
     }
 
@@ -24,34 +37,98 @@ def _extract_urls(webpage):
         return [
             mobj.group('url')
             for mobj in re.finditer(
-                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//player\.zype\.com/embed/[\da-fA-F]+\.js\?.*?api_key=.+?)\1',
+                r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.+?)\1' % (ZypeIE._COMMON_RE % ZypeIE._ID_RE),
                 webpage)]
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
 
-        webpage = self._download_webpage(url, video_id)
+        try:
+            response = self._download_json(re.sub(
+                r'\.(?:js|html)\?', '.json?', url), video_id)['response']
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401, 403):
+                raise ExtractorError(self._parse_json(
+                    e.cause.read().decode(), video_id)['message'], expected=True)
+            raise
 
-        title = self._search_regex(
-            r'video_title\s*[:=]\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
-            'title', group='value')
+        body = response['body']
+        video = response['video']
+        title = video['title']
 
-        m3u8_url = self._search_regex(
-            r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1', webpage,
-            'm3u8 url', group='url')
-
-        formats = self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
-            m3u8_id='hls')
+        if isinstance(body, dict):
+            formats = []
+            for output in body.get('outputs', []):
+                output_url = output.get('url')
+                if not output_url:
+                    continue
+                name = output.get('name')
+                if name == 'm3u8':
+                    formats = self._extract_m3u8_formats(
+                        output_url, video_id, 'mp4',
+                        'm3u8_native', m3u8_id='hls', fatal=False)
+                else:
+                    f = {
+                        'format_id': name,
+                        'tbr': int_or_none(output.get('bitrate')),
+                        'url': output_url,
+                    }
+                    if name in ('m4a', 'mp3'):
+                        f['vcodec'] = 'none'
+                    else:
+                        f.update({
+                            'height': int_or_none(output.get('height')),
+                            'width': int_or_none(output.get('width')),
+                        })
+                    formats.append(f)
+            text_tracks = body.get('subtitles') or []
+        else:
+            m3u8_url = self._search_regex(
+                r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1',
+                body, 'm3u8 url', group='url')
+            formats = self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
+            text_tracks = self._search_regex(
+                r'textTracks\s*:\s*(\[[^]]+\])',
+                body, 'text tracks', default=None)
+            if text_tracks:
+                text_tracks = self._parse_json(
+                    text_tracks, video_id, js_to_json, False)
         self._sort_formats(formats)
 
-        thumbnail = self._search_regex(
-            r'poster\s*[:=]\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'thumbnail',
-            default=False, group='url')
+        subtitles = {}
+        if text_tracks:
+            for text_track in text_tracks:
+                tt_url = dict_get(text_track, ('file', 'src'))
+                if not tt_url:
+                    continue
+                subtitles.setdefault(text_track.get('label') or 'English', []).append({
+                    'url': tt_url,
+                })
+
+        thumbnails = []
+        for thumbnail in video.get('thumbnails', []):
+            thumbnail_url = thumbnail.get('url')
+            if not thumbnail_url:
+                continue
+            thumbnails.append({
+                'url': thumbnail_url,
+                'width': int_or_none(thumbnail.get('width')),
+                'height': int_or_none(thumbnail.get('height')),
+            })
 
         return {
             'id': video_id,
+            'display_id': video.get('friendly_title'),
             'title': title,
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
+            'description': dict_get(video, ('description', 'ott_description', 'short_description')),
+            'timestamp': parse_iso8601(video.get('published_at')),
+            'duration': int_or_none(video.get('duration')),
+            'view_count': int_or_none(video.get('request_count')),
+            'average_rating': int_or_none(video.get('rating')),
+            'season_number': int_or_none(video.get('season')),
+            'episode_number': int_or_none(video.get('episode')),
             'formats': formats,
+            'subtitles': subtitles,
         }
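The updated extractor rewrites the embed URL's `.js`/`.html` suffix to `.json` before requesting metadata from the API. A minimal sketch of that rewrite, using the same `re.sub()` pattern as the new `_real_extract()` (the `api_key` value below is a placeholder):

```python
import re


def zype_json_url(embed_url):
    # Rewrite a Zype .js/.html embed URL to its .json counterpart,
    # mirroring the re.sub() call in the new _real_extract().
    return re.sub(r'\.(?:js|html)\?', '.json?', embed_url)


print(zype_json_url(
    'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=XYZ'))
# → https://player.zype.com/embed/5b400b834b32992a310622b9.json?api_key=XYZ
```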
index 1ffabc62bedacb42aeb34f585d04ed7bc3ff8045..6d5ac62b3bab68de226060c6bcd3cd9299f02825 100644 (file)
@@ -134,7 +134,7 @@ def _comma_separated_values_options_callback(option, opt_str, value, parser):
         action='help',
         help='Print this help text and exit')
     general.add_option(
-        '-v', '--version',
+        '--version',
         action='version',
         help='Print program version and exit')
     general.add_option(
@@ -853,7 +853,7 @@ def _comma_separated_values_options_callback(option, opt_str, value, parser):
     postproc.add_option(
         '--exec',
         metavar='CMD', dest='exec_cmd',
-        help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
+        help='Execute a command on the file after downloading and post-processing, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
     postproc.add_option(
         '--convert-subs', '--convert-subtitles',
         metavar='FORMAT', dest='convertsubtitles', default=None,
index fd3f921a8a11da2e8c31573889ea4d7f5a9fea25..5f7298345b3f550c9a8d8b7ba2b602d2f084ed36 100644 (file)
@@ -447,6 +447,13 @@ def add(meta_list, info_list=None):
                         metadata[meta_f] = info[info_f]
                     break
 
+        # See [1-4] for some info on media metadata/metadata supported
+        # by ffmpeg.
+        # 1. https://kdenlive.org/en/project/adding-meta-data-to-mp4-video/
+        # 2. https://wiki.multimedia.cx/index.php/FFmpeg_Metadata
+        # 3. https://kodi.wiki/view/Video_file_tagging
+        # 4. http://atomicparsley.sourceforge.net/mpeg-4files.html
+
         add('title', ('track', 'title'))
         add('date', 'upload_date')
         add(('description', 'comment'), 'description')
@@ -457,6 +464,10 @@ def add(meta_list, info_list=None):
         add('album')
         add('album_artist')
         add('disc', 'disc_number')
+        add('show', 'series')
+        add('season_number')
+        add('episode_id', ('episode', 'episode_id'))
+        add('episode_sort', 'episode_number')
 
         if not metadata:
             self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')
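The new mappings feed the existing `add()` helper, which copies the first populated info-dict field onto each ffmpeg metadata key. A simplified standalone sketch of that logic (the helper body is paraphrased from the diff's context, not copied verbatim):

```python
def build_metadata(info):
    metadata = {}

    def add(meta_list, info_list=None):
        # Accept either a single key or a tuple of keys on both sides,
        # and take the first info-dict field that has a value.
        if not isinstance(meta_list, (list, tuple)):
            meta_list = (meta_list,)
        keys = info_list or meta_list
        if not isinstance(keys, (list, tuple)):
            keys = (keys,)
        for info_f in keys:
            if info.get(info_f) is not None:
                for meta_f in meta_list:
                    metadata[meta_f] = info[info_f]
                break

    # The episode-related mappings added in this commit:
    add('show', 'series')
    add('season_number')
    add('episode_id', ('episode', 'episode_id'))
    add('episode_sort', 'episode_number')
    return metadata


print(build_metadata({'series': 'Foo', 'season_number': 2, 'episode': 'Pilot'}))
# → {'show': 'Foo', 'season_number': 2, 'episode_id': 'Pilot'}
```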
index 002ea7f3386215c61bcf3bc60419d0059abf2bc2..84c9646171e0b8f8d6a6397bff5339205cdadcd7 100644 (file)
@@ -9,6 +9,7 @@
 import sys
 from zipimport import zipimporter
 
+from .compat import compat_realpath
 from .utils import encode_compat_str
 
 from .version import __version__
@@ -84,7 +85,9 @@ def version_tuple(version_str):
     print_notes(to_screen, versions_info['versions'])
 
     # sys.executable is set to the full pathname of the exe-file for py2exe
-    filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
+    # though symlinks are not followed, so we need to resolve them manually
+    # with the help of realpath
+    filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
 
     if not os.access(filename, os.W_OK):
         to_screen('ERROR: no write permissions on %s' % filename)
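The updater change resolves symlinks before the write-permission check; `compat_realpath` is `os.path.realpath` on most platforms, so a plain-stdlib sketch of the fixed check looks like:

```python
import os
import sys

# Resolve symlinks before testing write access on the running binary,
# as the updater now does via compat_realpath. Without this, a symlink
# to the executable could pass os.access() while the real target is
# not writable (or vice versa).
filename = os.path.realpath(
    sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
can_update = os.access(filename, os.W_OK)
```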
index f6204692a81002cdfc44b02d183126e755283bd9..d1eca3760a66e89fbfc1673e70a2d526035776cf 100644 (file)
@@ -7,6 +7,7 @@
 import binascii
 import calendar
 import codecs
+import collections
 import contextlib
 import ctypes
 import datetime
@@ -30,6 +31,7 @@
 import subprocess
 import sys
 import tempfile
+import time
 import traceback
 import xml.etree.ElementTree
 import zlib
@@ -1835,6 +1837,12 @@ def write_json_file(obj, fn):
                 os.unlink(fn)
             except OSError:
                 pass
+        try:
+            mask = os.umask(0)
+            os.umask(mask)
+            os.chmod(tf.name, 0o666 & ~mask)
+        except OSError:
+            pass
         os.rename(tf.name, fn)
     except Exception:
         try:
@@ -2729,15 +2737,72 @@ def https_open(self, req):
 
 
 class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
+    """
+    See [1] for cookie file format.
+
+    1. https://curl.haxx.se/docs/http-cookies.html
+    """
     _HTTPONLY_PREFIX = '#HttpOnly_'
+    _ENTRY_LEN = 7
+    _HEADER = '''# Netscape HTTP Cookie File
+# This file is generated by youtube-dl.  Do not edit.
+
+'''
+    _CookieFileEntry = collections.namedtuple(
+        'CookieFileEntry',
+        ('domain_name', 'include_subdomains', 'path', 'https_only', 'expires_at', 'name', 'value'))
 
     def save(self, filename=None, ignore_discard=False, ignore_expires=False):
+        """
+        Save cookies to a file.
+
+        Most of the code is taken from CPython 3.8 and slightly adapted
+        to support cookie files with UTF-8 in both python 2 and 3.
+        """
+        if filename is None:
+            if self.filename is not None:
+                filename = self.filename
+            else:
+                raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
+
         # Store session cookies with `expires` set to 0 instead of an empty
         # string
         for cookie in self:
             if cookie.expires is None:
                 cookie.expires = 0
-        compat_cookiejar.MozillaCookieJar.save(self, filename, ignore_discard, ignore_expires)
+
+        with io.open(filename, 'w', encoding='utf-8') as f:
+            f.write(self._HEADER)
+            now = time.time()
+            for cookie in self:
+                if not ignore_discard and cookie.discard:
+                    continue
+                if not ignore_expires and cookie.is_expired(now):
+                    continue
+                if cookie.secure:
+                    secure = 'TRUE'
+                else:
+                    secure = 'FALSE'
+                if cookie.domain.startswith('.'):
+                    initial_dot = 'TRUE'
+                else:
+                    initial_dot = 'FALSE'
+                if cookie.expires is not None:
+                    expires = compat_str(cookie.expires)
+                else:
+                    expires = ''
+                if cookie.value is None:
+                    # cookies.txt regards 'Set-Cookie: foo' as a cookie
+                    # with no name, whereas http.cookiejar regards it as a
+                    # cookie with no value.
+                    name = ''
+                    value = cookie.name
+                else:
+                    name = cookie.name
+                    value = cookie.value
+                f.write(
+                    '\t'.join([cookie.domain, initial_dot, cookie.path,
+                               secure, expires, name, value]) + '\n')
 
     def load(self, filename=None, ignore_discard=False, ignore_expires=False):
         """Load cookies from a file."""
@@ -2747,12 +2812,30 @@ def load(self, filename=None, ignore_discard=False, ignore_expires=False):
             else:
                 raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
 
+        def prepare_line(line):
+            if line.startswith(self._HTTPONLY_PREFIX):
+                line = line[len(self._HTTPONLY_PREFIX):]
+            # comments and empty lines are fine
+            if line.startswith('#') or not line.strip():
+                return line
+            cookie_list = line.split('\t')
+            if len(cookie_list) != self._ENTRY_LEN:
+                raise compat_cookiejar.LoadError('invalid length %d' % len(cookie_list))
+            cookie = self._CookieFileEntry(*cookie_list)
+            if cookie.expires_at and not cookie.expires_at.isdigit():
+                raise compat_cookiejar.LoadError('invalid expires at %s' % cookie.expires_at)
+            return line
+
         cf = io.StringIO()
-        with open(filename) as f:
+        with io.open(filename, encoding='utf-8') as f:
             for line in f:
-                if line.startswith(self._HTTPONLY_PREFIX):
-                    line = line[len(self._HTTPONLY_PREFIX):]
-                cf.write(compat_str(line))
+                try:
+                    cf.write(prepare_line(line))
+                except compat_cookiejar.LoadError as e:
+                    write_string(
+                        'WARNING: skipping cookie file entry due to %s: %r\n'
+                        % (e, line), sys.stderr)
+                    continue
         cf.seek(0)
         self._really_load(cf, filename, ignore_discard, ignore_expires)
         # Session cookies are denoted by either `expires` field set to
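The new `prepare_line()` validator splits each cookie line into the seven Netscape fields and skips malformed entries instead of aborting the whole load. A standalone sketch of that validation, reusing the `_CookieFileEntry` field names from the diff:

```python
import collections

CookieFileEntry = collections.namedtuple(
    'CookieFileEntry',
    ('domain_name', 'include_subdomains', 'path', 'https_only',
     'expires_at', 'name', 'value'))


def validate_cookie_line(line):
    # Mirror the loader: strip the #HttpOnly_ marker, pass comments
    # and blank lines through, and check the 7 tab-separated fields.
    if line.startswith('#HttpOnly_'):
        line = line[len('#HttpOnly_'):]
    if line.startswith('#') or not line.strip():
        return line
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 7:
        raise ValueError('invalid length %d' % len(fields))
    entry = CookieFileEntry(*fields)
    if entry.expires_at and not entry.expires_at.isdigit():
        raise ValueError('invalid expires at %s' % entry.expires_at)
    return line


validate_cookie_line(
    '.example.com\tTRUE\t/\tTRUE\t1700000000\tsid\tabc123\n')  # accepted
```

In the real `load()`, a raised error is caught per line and reported as a warning, so one corrupt entry no longer poisons the entire cookies.txt file.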
@@ -2795,6 +2878,15 @@ def http_response(self, request, response):
     https_response = http_response
 
 
+class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
+    if sys.version_info[0] < 3:
+        def redirect_request(self, req, fp, code, msg, headers, newurl):
+            # On python 2 urlh.geturl() may sometimes return redirect URL
+            # as byte string instead of unicode. This workaround allows
+            # to force it always return unicode.
+            return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl))
+
+
 def extract_timezone(date_str):
     m = re.search(
         r'^.{8,}?(?P<tz>Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
index 1227abc0a74c7a7617650e1ac37a7eab7ebbe78c..17101fa47501d9bae1d6f223e35d7cb4dd3f8d5e 100644 (file)
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals
 
-__version__ = '2019.11.28'
+__version__ = '2020.07.28'