<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.06. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
-->
- [ ] I'm reporting a broken site support
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.09.06**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2019.11.28
+ [debug] youtube-dl version 2020.09.06
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.06. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
-->
- [ ] I'm reporting a new site support request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.09.06**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.06. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.09.06**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.06. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
-->
- [ ] I'm reporting a broken site support issue
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.09.06**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
- [debug] youtube-dl version 2019.11.28
+ [debug] youtube-dl version 2020.09.06
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
-- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2019.11.28. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
+- First off, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.09.06. If it's not, see https://yt-dl.org/update on how to update. Issues with an outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
-- [ ] I've verified that I'm running youtube-dl version **2019.11.28**
+- [ ] I've verified that I'm running youtube-dl version **2020.09.06**
- [ ] I've searched the bugtracker for similar feature requests including closed ones
--- /dev/null
+# This workflow uploads a Python package using Twine when commits are pushed to the release branch
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+
+name: Upload Python Package
+
+on:
+ push:
+ branches:
+ - release
+
+jobs:
+ deploy:
+
+ runs-on: ubuntu-latest
+
+ steps:
+ - uses: actions/checkout@v2
+ - name: Set up Python
+ uses: actions/setup-python@v2
+ with:
+ python-version: '3.x'
+ - name: Install dependencies
+ run: |
+ python -m pip install --upgrade pip
+ pip install setuptools wheel twine
+ - name: Build and publish
+ env:
+ TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+ TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+ run: |
+ rm -rf dist/*
+ python setup.py sdist bdist_wheel
+ twine upload dist/*
MANIFEST
README.txt
youtube-dl.1
+youtube-dlc.1
youtube-dl.bash-completion
+youtube-dlc.bash-completion
youtube-dl.fish
+youtube-dlc.fish
youtube_dl/extractor/lazy_extractors.py
+youtube_dlc/extractor/lazy_extractors.py
youtube-dl
+youtube-dlc
youtube-dl.exe
+youtube-dlc.exe
youtube-dl.tar.gz
+youtube-dlc.tar.gz
+youtube-dlc.spec
.coverage
cover/
updates_key.pem
test/local_parameters.json
.tox
youtube-dl.zsh
+youtube-dlc.zsh
# IntelliJ related files
.idea
dist: trusty
env:
- YTDL_TEST_SET=core
- - YTDL_TEST_SET=download
-matrix:
+jobs:
include:
- python: 3.7
dist: xenial
env: YTDL_TEST_SET=core
- - python: 3.7
- dist: xenial
- env: YTDL_TEST_SET=download
- python: 3.8
dist: xenial
env: YTDL_TEST_SET=core
- - python: 3.8
- dist: xenial
- env: YTDL_TEST_SET=download
- python: 3.8-dev
dist: xenial
env: YTDL_TEST_SET=core
- - python: 3.8-dev
- dist: xenial
- env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- - env: JYTHON=true; YTDL_TEST_SET=download
+ - name: flake8
+ python: 3.8
+ dist: xenial
+ install: pip install flake8
+ script: flake8 .
fast_finish: true
allow_failures:
- env: YTDL_TEST_SET=download
- env: JYTHON=true; YTDL_TEST_SET=core
- - env: JYTHON=true; YTDL_TEST_SET=download
before_install:
- if [ "$JYTHON" == "true" ]; then ./devscripts/install_jython.sh; export PATH="$HOME/jython/bin:$PATH"; fi
script: ./devscripts/run_tests.sh
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
+8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
$ flake8 youtube_dl/extractor/yourextractor.py
-version <unreleased>
+version 2020.09.06
+
+Core
++ [utils] Recognize wav mimetype (#26463)
+
+Extractors
+* [nrktv:episode] Improve video id extraction (#25594, #26369, #26409)
+* [youtube] Fix age gate content detection (#26100, #26152, #26311, #26384)
+* [youtube:user] Extend URL regular expression (#26443)
+* [xhamster] Improve initials regular expression (#26526, #26353)
+* [svtplay] Fix video id extraction (#26425, #26428, #26438)
+* [twitch] Rework extractors (#12297, #20414, #20604, #21811, #21812, #22979,
+ #24263, #25010, #25553, #25606)
+ * Switch to GraphQL
+ + Add support for collections
+ + Add support for clips and collections playlists
+* [biqle] Improve video ext extraction
+* [xhamster] Fix extraction (#26157, #26254)
+* [xhamster] Extend URL regular expression (#25789, #25804, #25927)
+
+
+version 2020.07.28
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
+* [youtube] Improve description extraction (#25937, #25980)
+* [wistia] Restrict embed regular expression (#25969)
+* [youtube] Prevent excess HTTP 301 (#25786)
++ [youtube:playlists] Extend URL regular expression (#25810)
++ [bellmedia] Add support for cp24.com clip URLs (#25764)
+* [brightcove] Improve embed detection (#25674)
+
+
+version 2020.06.16.1
+
+Extractors
+* [youtube] Force old layout (#25682, #25683, #25680, #25686)
+* [youtube] Fix categories and improve tags extraction
+
+
+version 2020.06.16
+
+Extractors
+* [youtube] Fix uploader id and uploader URL extraction
+* [youtube] Improve view count extraction
+* [youtube] Fix upload date extraction (#25677)
+* [youtube] Fix thumbnails extraction (#25676)
+* [youtube] Fix playlist and feed extraction (#25675)
++ [facebook] Add support for single-video ID links
++ [youtube] Extract chapters from JSON (#24819)
++ [kaltura] Add support for multiple embeds on a webpage (#25523)
+
+
+version 2020.06.06
+
+Extractors
+* [tele5] Bypass geo restriction
++ [jwplatform] Add support for bypass geo restriction
+* [tele5] Prefer jwplatform over nexx (#25533)
+* [twitch:stream] Expect 400 and 410 HTTP errors from API
+* [twitch:stream] Fix extraction (#25528)
+* [twitch] Fix thumbnails extraction (#25531)
++ [twitch] Pass v5 Accept HTTP header (#25531)
+* [brightcove] Fix subtitles extraction (#25540)
++ [malltv] Add support for sk.mall.tv (#25445)
+* [periscope] Fix untitled broadcasts (#25482)
+* [jwplatform] Improve embeds extraction (#25467)
+
+
+version 2020.05.29
+
+Core
+* [postprocessor/ffmpeg] Embed series metadata with --add-metadata
+* [utils] Fix file permissions in write_json_file (#12471, #25122)
+
+Extractors
+* [ard:beta] Extend URL regular expression (#25405)
++ [youtube] Add support for more invidious instances (#25417)
+* [giantbomb] Extend URL regular expression (#25222)
+* [ard] Improve URL regular expression (#25134, #25198)
+* [redtube] Improve formats extraction and extract m3u8 formats (#25311,
+ #25321)
+* [indavideo] Switch to HTTPS for API request (#25191)
+* [redtube] Improve title extraction (#25208)
+* [vimeo] Improve format extraction and sorting (#25285)
+* [soundcloud] Reduce API playlist page limit (#25274)
++ [youtube] Add support for yewtu.be (#25226)
+* [mailru] Fix extraction (#24530, #25239)
+* [bellator] Fix mgid extraction (#25195)
+
+
+version 2020.05.08
+
+Core
+* [downloader/http] Request last data block of exact remaining size
+* [downloader/http] Finish downloading once received data length matches
+ expected
+* [extractor/common] Use compat_cookiejar_Cookie for _set_cookie to always
+ ensure cookie name and value are bytestrings on python 2 (#23256, #24776)
++ [compat] Introduce compat_cookiejar_Cookie
+* [utils] Improve cookie files support
+ + Add support for UTF-8 in cookie files
+ * Skip malformed cookie file entries instead of crashing (invalid entry
+ length, invalid expires at)
+
+Extractors
+* [youtube] Improve signature cipher extraction (#25187, #25188)
+* [iprima] Improve extraction (#25138)
+* [uol] Fix extraction (#22007)
++ [orf] Add support for more radio stations (#24938, #24968)
+* [dailymotion] Fix typo
+- [puhutv] Remove no longer available HTTP formats (#25124)
+
+
+version 2020.05.03
+
+Core
++ [extractor/common] Extract multiple JSON-LD entries
+* [options] Clarify doc on --exec command (#19087, #24883)
+* [extractor/common] Skip malformed ISM manifest XMLs while extracting
+ ISM formats (#24667)
+
+Extractors
+* [crunchyroll] Fix and improve extraction (#25096, #25060)
+* [youtube] Improve player id extraction
+* [youtube] Use redirected video id if any (#25063)
+* [yahoo] Fix GYAO Player extraction and relax URL regular expression
+ (#24178, #24778)
+* [tvplay] Fix Viafree extraction (#15189, #24473, #24789)
+* [tenplay] Relax URL regular expression (#25001)
++ [prosiebensat1] Extract series metadata
+* [prosiebensat1] Improve extraction and remove 7tv.de support (#24948)
+- [prosiebensat1] Remove 7tv.de support (#24948)
+* [youtube] Fix DRM videos detection (#24736)
+* [thisoldhouse] Fix video id extraction (#24548, #24549)
++ [soundcloud] Extract AAC format (#19173, #24708)
+* [youtube] Skip broken multifeed videos (#24711)
+* [nova:embed] Fix extraction (#24700)
+* [motherless] Fix extraction (#24699)
+* [twitch:clips] Extend URL regular expression (#24290, #24642)
+* [tv4] Fix ISM formats extraction (#24667)
+* [tele5] Fix extraction (#24553)
++ [mofosex] Add support for generic embeds (#24633)
++ [youporn] Add support for generic embeds
++ [spankwire] Add support for generic embeds (#24633)
+* [spankwire] Fix extraction (#18924, #20648)
+
+
+version 2020.03.24
+
+Core
+- [utils] Revert support for cookie files with spaces used instead of tabs
+
+Extractors
+* [teachable] Update upskillcourses and gns3 domains
+* [generic] Look for teachable embeds before wistia
++ [teachable] Extract chapter metadata (#24421)
++ [bilibili] Add support for player.bilibili.com (#24402)
++ [bilibili] Add support for new URL schema with BV ids (#24439, #24442)
+* [limelight] Remove disabled API requests (#24255)
+* [soundcloud] Fix download URL extraction (#24394)
++ [cbc:watch] Add support for authentication (#19160)
+* [hellporno] Fix extraction (#24399)
+* [xtube] Fix formats extraction (#24348)
+* [ndr] Fix extraction (#24326)
+* [nhk] Update m3u8 URL and use native HLS downloader (#24329)
+- [nhk] Remove obsolete rtmp formats (#24329)
+* [nhk] Relax URL regular expression (#24329)
+- [vimeo] Revert fix showcase password protected video extraction (#24224)
+
+
+version 2020.03.08
+
+Core
++ [utils] Add support for cookie files with spaces used instead of tabs
+
+Extractors
++ [pornhub] Add support for pornhubpremium.com (#24288)
+- [youtube] Remove outdated code and unnecessary requests
+* [youtube] Improve extraction in 429 HTTP error conditions (#24283)
+* [nhk] Update API version (#24270)
+
+
+version 2020.03.06
+
+Extractors
+* [youtube] Fix age-gated videos support without login (#24248)
+* [vimeo] Fix showcase password protected video extraction (#24224)
+* [pornhub] Improve title extraction (#24184)
+* [peertube] Improve extraction (#23657)
++ [servus] Add support for new URL schema (#23475, #23583, #24142)
+* [vimeo] Fix subtitles URLs (#24209)
+
+
+version 2020.03.01
+
+Core
+* [YoutubeDL] Force redirect URL to unicode on python 2
+- [options] Remove duplicate short option -v for --version (#24162)
+
+Extractors
+* [xhamster] Fix extraction (#24205)
+* [franceculture] Fix extraction (#24204)
++ [telecinco] Add support for article opening videos
+* [telecinco] Fix extraction (#24195)
+* [xtube] Fix metadata extraction (#21073, #22455)
+* [youjizz] Fix extraction (#24181)
+- Remove no longer needed compat_str around geturl
+* [pornhd] Fix extraction (#24128)
++ [teachable] Add support for multiple videos per lecture (#24101)
++ [wistia] Add support for multiple generic embeds (#8347, #11385)
+* [imdb] Fix extraction (#23443)
+* [tv2dk:bornholm:play] Fix extraction (#24076)
+
+
+version 2020.02.16
+
+Core
+* [YoutubeDL] Fix playlist entry indexing with --playlist-items (#10591,
+ #10622)
+* [update] Fix updating via symlinks (#23991)
++ [compat] Introduce compat_realpath (#23991)
+
+Extractors
++ [npr] Add support for streams (#24042)
++ [24video] Add support for porn.24video.net (#23779, #23784)
+- [jpopsuki] Remove extractor (#23858)
+* [nova] Improve extraction (#23690)
+* [nova:embed] Improve (#23690)
+* [nova:embed] Fix extraction (#23672)
++ [abc:iview] Add support for 720p (#22907, #22921)
+* [nytimes] Improve format sorting (#24010)
++ [toggle] Add support for mewatch.sg (#23895, #23930)
+* [thisoldhouse] Fix extraction (#23951)
++ [popcorntimes] Add support for popcorntimes.tv (#23949)
+* [sportdeutschland] Update to new API
+* [twitch:stream] Lowercase channel id for stream request (#23917)
+* [tv5mondeplus] Fix extraction (#23907, #23911)
+* [tva] Relax URL regular expression (#23903)
+* [vimeo] Fix album extraction (#23864)
+* [viewlift] Improve extraction
+ * Fix extraction (#23851)
+ + Add support for authentication
+ + Add support for more domains
+* [svt] Fix series extraction (#22297)
+* [svt] Fix article extraction (#22897, #22919)
+* [soundcloud] Improve private playlist/set tracks extraction (#3707)
+
+
+version 2020.01.24
+
+Extractors
+* [youtube] Fix sigfunc name extraction (#23819)
+* [stretchinternet] Fix extraction (#4319)
+* [voicerepublic] Fix extraction
+* [azmedien] Fix extraction (#23783)
+* [businessinsider] Fix jwplatform id extraction (#22929, #22954)
++ [24video] Add support for 24video.vip (#23753)
+* [ivi:compilation] Fix entries extraction (#23770)
+* [ard] Improve extraction (#23761)
+ * Simplify extraction
+ + Extract age limit and series
+ * Bypass geo-restriction
++ [nbc] Add support for nbc multi network URLs (#23049)
+* [americastestkitchen] Fix extraction
+* [zype] Improve extraction
+ + Extract subtitles (#21258)
+ + Support URLs with alternative keys/tokens (#21258)
+ + Extract more metadata
+* [orf:tvthek] Improve geo restricted videos detection (#23741)
+* [soundcloud] Restore previews extraction (#23739)
+
+
+version 2020.01.15
+
+Extractors
+* [yourporn] Fix extraction (#21645, #22255, #23459)
++ [canvas] Add support for new API endpoint (#17680, #18629)
+* [ndr:base:embed] Improve thumbnails extraction (#23731)
++ [vodplatform] Add support for embed.kwikmotion.com domain
++ [twitter] Add support for promo_video_website cards (#23711)
+* [orf:radio] Clean description and improve extraction
+* [orf:fm4] Fix extraction (#23599)
+* [safari] Fix kaltura session extraction (#23679, #23670)
+* [lego] Fix extraction and extract subtitle (#23687)
+* [cloudflarestream] Improve extraction
+ + Add support for bytehighway.net domain
+ + Add support for signed URLs
+ + Extract thumbnail
+* [naver] Improve extraction
+ * Improve geo-restriction handling
+ + Extract automatic captions
+ + Extract uploader metadata
+ + Extract VLive HLS formats
+ * Improve metadata extraction
+- [pandatv] Remove extractor (#23630)
+* [dctp] Fix format extraction (#23656)
++ [scrippsnetworks] Add support for www.discovery.com videos
+* [discovery] Fix anonymous token extraction (#23650)
+* [nrktv:seriebase] Fix extraction (#23625, #23537)
+* [wistia] Improve format extraction and extract subtitles (#22590)
+* [vice] Improve extraction (#23631)
+* [redtube] Detect private videos (#23518)
+
+
+version 2020.01.01
+
+Extractors
+* [brightcove] Invalidate policy key cache on failing requests
+* [pornhub] Improve locked videos detection (#22449, #22780)
++ [pornhub] Add support for m3u8 formats
+* [pornhub] Fix extraction (#22749, #23082)
+* [brightcove] Update policy key on failing requests
+* [spankbang] Improve removed video detection (#23423)
+* [spankbang] Fix extraction (#23307, #23423, #23444)
+* [soundcloud] Automatically update client id on failing requests
+* [prosiebensat1] Improve geo restriction handling (#23571)
+* [brightcove] Cache brightcove player policy keys
+* [teachable] Fail with error message if no video URL found
+* [teachable] Improve locked lessons detection (#23528)
++ [scrippsnetworks] Add support for Scripps Networks sites (#19857, #22981)
+* [mitele] Fix extraction (#21354, #23456)
+* [soundcloud] Update client id (#23516)
+* [mailru] Relax URL regular expressions (#23509)
+
+
+version 2019.12.25
Core
* [utils] Improve str_to_int
+ [downloader/hls] Add ability to override AES decryption key URL (#17521)
Extractors
+* [mediaset] Fix parse formats (#23508)
+ [tv2dk:bornholm:play] Add support for play.tv2bornholm.dk (#23291)
+ [slideslive] Add support for url and vimeo service names (#23414)
* [slideslive] Fix extraction (#23413)
include LICENSE
include AUTHORS
include ChangeLog
-include youtube-dl.bash-completion
-include youtube-dl.fish
-include youtube-dl.1
+include youtube-dlc.bash-completion
+include youtube-dlc.fish
+include youtube-dlc.1
recursive-include docs Makefile conf.py *.rst
recursive-include test *
-all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
+all: youtube-dlc README.md CONTRIBUTING.md README.txt youtube-dlc.1 youtube-dlc.bash-completion youtube-dlc.zsh youtube-dlc.fish supportedsites
clean:
- rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.ytdl *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
+ rm -rf youtube-dlc.1.temp.md youtube-dlc.1 youtube-dlc.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dlc.tar.gz youtube-dlc.zsh youtube-dlc.fish youtube_dlc/extractor/lazy_extractors.py *.dump *.part* *.ytdl *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp youtube-dlc youtube-dlc.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
# set markdown input format to "markdown-smart" for pandoc version 2 and to "markdown" for pandoc prior to version 2
MARKDOWN = $(shell if [ `pandoc -v | head -n1 | cut -d" " -f2 | head -c1` = "2" ]; then echo markdown-smart; else echo markdown; fi)
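
For readers less used to shell pipelines, the `MARKDOWN` detection above can be restated in Python. This is a sketch of the same logic under the assumption that the first line of `pandoc -v` looks like `pandoc 2.9.2.1`; the function name and the sample version strings are illustrative, not part of the Makefile.

```python
def markdown_input_format(version_output):
    """Mirror the shell pipeline used for MARKDOWN.

    Takes the first line of the output, then the second space-separated
    field, then its first character, and compares that character with "2".
    """
    first_line = version_output.splitlines()[0]
    version = first_line.split(' ')[1]
    return 'markdown-smart' if version[:1] == '2' else 'markdown'


# Illustrative inputs; real `pandoc -v` output has further lines.
print(markdown_input_format('pandoc 2.9.2.1\nCompiled with pandoc-types 1.20'))
# prints markdown-smart
print(markdown_input_format('pandoc 1.19.2.1'))
# prints markdown
```

Only a leading `2` selects `markdown-smart`; any other leading character, including a later major version, falls through to `markdown`.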
-install: youtube-dl youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
+install: youtube-dlc youtube-dlc.1 youtube-dlc.bash-completion youtube-dlc.zsh youtube-dlc.fish
install -d $(DESTDIR)$(BINDIR)
- install -m 755 youtube-dl $(DESTDIR)$(BINDIR)
+ install -m 755 youtube-dlc $(DESTDIR)$(BINDIR)
install -d $(DESTDIR)$(MANDIR)/man1
- install -m 644 youtube-dl.1 $(DESTDIR)$(MANDIR)/man1
+ install -m 644 youtube-dlc.1 $(DESTDIR)$(MANDIR)/man1
install -d $(DESTDIR)$(SYSCONFDIR)/bash_completion.d
- install -m 644 youtube-dl.bash-completion $(DESTDIR)$(SYSCONFDIR)/bash_completion.d/youtube-dl
+ install -m 644 youtube-dlc.bash-completion $(DESTDIR)$(SYSCONFDIR)/bash_completion.d/youtube-dlc
install -d $(DESTDIR)$(SHAREDIR)/zsh/site-functions
- install -m 644 youtube-dl.zsh $(DESTDIR)$(SHAREDIR)/zsh/site-functions/_youtube-dl
+ install -m 644 youtube-dlc.zsh $(DESTDIR)$(SHAREDIR)/zsh/site-functions/_youtube-dlc
install -d $(DESTDIR)$(SYSCONFDIR)/fish/completions
- install -m 644 youtube-dl.fish $(DESTDIR)$(SYSCONFDIR)/fish/completions/youtube-dl.fish
+ install -m 644 youtube-dlc.fish $(DESTDIR)$(SYSCONFDIR)/fish/completions/youtube-dlc.fish
codetest:
flake8 .
test:
- #nosetests --with-coverage --cover-package=youtube_dl --cover-html --verbose --processes 4 test
+ #nosetests --with-coverage --cover-package=youtube_dlc --cover-html --verbose --processes 4 test
nosetests --verbose test
$(MAKE) codetest
--exclude test_youtube_lists.py \
--exclude test_youtube_signature.py
-tar: youtube-dl.tar.gz
+tar: youtube-dlc.tar.gz
.PHONY: all clean install test tar bash-completion pypi-files zsh-completion fish-completion ot offlinetest codetest supportedsites
-pypi-files: youtube-dl.bash-completion README.txt youtube-dl.1 youtube-dl.fish
+pypi-files: youtube-dlc.bash-completion README.txt youtube-dlc.1 youtube-dlc.fish
-youtube-dl: youtube_dl/*.py youtube_dl/*/*.py
+youtube-dlc: youtube_dlc/*.py youtube_dlc/*/*.py
mkdir -p zip
- for d in youtube_dl youtube_dl/downloader youtube_dl/extractor youtube_dl/postprocessor ; do \
+ for d in youtube_dlc youtube_dlc/downloader youtube_dlc/extractor youtube_dlc/postprocessor ; do \
mkdir -p zip/$$d ;\
cp -pPR $$d/*.py zip/$$d/ ;\
done
- touch -t 200001010101 zip/youtube_dl/*.py zip/youtube_dl/*/*.py
- mv zip/youtube_dl/__main__.py zip/
- cd zip ; zip -q ../youtube-dl youtube_dl/*.py youtube_dl/*/*.py __main__.py
+ touch -t 200001010101 zip/youtube_dlc/*.py zip/youtube_dlc/*/*.py
+ mv zip/youtube_dlc/__main__.py zip/
+ cd zip ; zip -q ../youtube-dlc youtube_dlc/*.py youtube_dlc/*/*.py __main__.py
rm -rf zip
- echo '#!$(PYTHON)' > youtube-dl
- cat youtube-dl.zip >> youtube-dl
- rm youtube-dl.zip
- chmod a+x youtube-dl
+ echo '#!$(PYTHON)' > youtube-dlc
+ cat youtube-dlc.zip >> youtube-dlc
+ rm youtube-dlc.zip
+ chmod a+x youtube-dlc
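
The recipe above builds a zip archive with `__main__.py` at its top level, prepends a `#!` line, and marks the result executable. A minimal Python sketch of the same technique follows. It is not part of the build, and the names `build_executable_zip`, `demo_pkg`, and `demo-tool` are made up for illustration.

```python
import io
import os
import stat
import subprocess
import sys
import tempfile
import zipfile


def build_executable_zip(target, sources, interpreter='/usr/bin/env python3'):
    """Write `sources` (archive name -> source text) as a runnable zip."""
    # Build the archive in memory, like `zip -q ../youtube-dlc ...`.
    payload = io.BytesIO()
    with zipfile.ZipFile(payload, 'w', zipfile.ZIP_DEFLATED) as archive:
        for name, text in sorted(sources.items()):
            archive.writestr(name, text)
    # Shebang first, archive bytes after it (the `echo` and `cat` steps).
    with open(target, 'wb') as out:
        out.write(('#!%s\n' % interpreter).encode('utf-8'))
        out.write(payload.getvalue())
    # Equivalent of `chmod a+x`.
    mode = os.stat(target).st_mode
    os.chmod(target, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target


def run_demo():
    """Build a tiny two-file archive and run it with the current interpreter."""
    sources = {
        '__main__.py': 'from demo_pkg import greet\nprint(greet())\n',
        'demo_pkg/__init__.py': 'def greet():\n    return "hello from the zip"\n',
    }
    with tempfile.TemporaryDirectory() as workdir:
        target = build_executable_zip(os.path.join(workdir, 'demo-tool'), sources)
        return subprocess.check_output([sys.executable, target]).decode().strip()


if __name__ == '__main__':
    print(run_demo())
```

Python looks for `__main__.py` at the top level of the archive and tolerates leading bytes before the zip data, which is why the recipe moves `__main__.py` out of the package directory before zipping.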
-README.md: youtube_dl/*.py youtube_dl/*/*.py
- COLUMNS=80 $(PYTHON) youtube_dl/__main__.py --help | $(PYTHON) devscripts/make_readme.py
+README.md: youtube_dlc/*.py youtube_dlc/*/*.py
+ COLUMNS=80 $(PYTHON) youtube_dlc/__main__.py --help | $(PYTHON) devscripts/make_readme.py
CONTRIBUTING.md: README.md
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
-issuetemplates: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl/1_broken_site.md .github/ISSUE_TEMPLATE_tmpl/2_site_support_request.md .github/ISSUE_TEMPLATE_tmpl/3_site_feature_request.md .github/ISSUE_TEMPLATE_tmpl/4_bug_report.md .github/ISSUE_TEMPLATE_tmpl/5_feature_request.md youtube_dl/version.py
+issuetemplates: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl/1_broken_site.md .github/ISSUE_TEMPLATE_tmpl/2_site_support_request.md .github/ISSUE_TEMPLATE_tmpl/3_site_feature_request.md .github/ISSUE_TEMPLATE_tmpl/4_bug_report.md .github/ISSUE_TEMPLATE_tmpl/5_feature_request.md youtube_dlc/version.py
$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl/1_broken_site.md .github/ISSUE_TEMPLATE/1_broken_site.md
$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl/2_site_support_request.md .github/ISSUE_TEMPLATE/2_site_support_request.md
$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl/3_site_feature_request.md .github/ISSUE_TEMPLATE/3_site_feature_request.md
README.txt: README.md
pandoc -f $(MARKDOWN) -t plain README.md -o README.txt
-youtube-dl.1: README.md
- $(PYTHON) devscripts/prepare_manpage.py youtube-dl.1.temp.md
- pandoc -s -f $(MARKDOWN) -t man youtube-dl.1.temp.md -o youtube-dl.1
- rm -f youtube-dl.1.temp.md
+youtube-dlc.1: README.md
+ $(PYTHON) devscripts/prepare_manpage.py youtube-dlc.1.temp.md
+ pandoc -s -f $(MARKDOWN) -t man youtube-dlc.1.temp.md -o youtube-dlc.1
+ rm -f youtube-dlc.1.temp.md
-youtube-dl.bash-completion: youtube_dl/*.py youtube_dl/*/*.py devscripts/bash-completion.in
+youtube-dlc.bash-completion: youtube_dlc/*.py youtube_dlc/*/*.py devscripts/bash-completion.in
$(PYTHON) devscripts/bash-completion.py
-bash-completion: youtube-dl.bash-completion
+bash-completion: youtube-dlc.bash-completion
-youtube-dl.zsh: youtube_dl/*.py youtube_dl/*/*.py devscripts/zsh-completion.in
+youtube-dlc.zsh: youtube_dlc/*.py youtube_dlc/*/*.py devscripts/zsh-completion.in
$(PYTHON) devscripts/zsh-completion.py
-zsh-completion: youtube-dl.zsh
+zsh-completion: youtube-dlc.zsh
-youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
+youtube-dlc.fish: youtube_dlc/*.py youtube_dlc/*/*.py devscripts/fish-completion.in
$(PYTHON) devscripts/fish-completion.py
-fish-completion: youtube-dl.fish
+fish-completion: youtube-dlc.fish
-lazy-extractors: youtube_dl/extractor/lazy_extractors.py
+lazy-extractors: youtube_dlc/extractor/lazy_extractors.py
-_EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
-youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
+_EXTRACTOR_FILES = $(shell find youtube_dlc/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py')
+youtube_dlc/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
-youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog AUTHORS
- @tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
+youtube-dlc.tar.gz: youtube-dlc README.md README.txt youtube-dlc.1 youtube-dlc.bash-completion youtube-dlc.zsh youtube-dlc.fish ChangeLog AUTHORS
+ @tar -czf youtube-dlc.tar.gz --transform "s|^|youtube-dlc/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \
--exclude '*.kate-swp' \
--exclude '*.pyc' \
--exclude '.git' \
--exclude 'docs/_build' \
-- \
- bin devscripts test youtube_dl docs \
+ bin devscripts test youtube_dlc docs \
ChangeLog AUTHORS LICENSE README.md README.txt \
- Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
- youtube-dl.zsh youtube-dl.fish setup.py setup.cfg \
- youtube-dl
+ Makefile MANIFEST.in youtube-dlc.1 youtube-dlc.bash-completion \
+ youtube-dlc.zsh youtube-dlc.fish setup.py setup.cfg \
+ youtube-dlc
-[![Build Status](https://travis-ci.org/ytdl-org/youtube-dl.svg?branch=master)](https://travis-ci.org/ytdl-org/youtube-dl)
+[![PyPi](https://img.shields.io/pypi/v/youtube-dlc.svg)](https://pypi.org/project/youtube-dlc)
+[![Build Status](https://travis-ci.com/blackjack4494/youtube-dlc.svg?branch=master)](https://travis-ci.com/blackjack4494/youtube-dlc)
+[![Downloads](https://pepy.tech/badge/youtube-dlc)](https://pepy.tech/project/youtube-dlc)
-youtube-dl - download videos from youtube.com or other video platforms
+[![Gitter chat](https://badges.gitter.im/youtube-dlc/gitter.png)](https://gitter.im/youtube-dlc)
+[![License: Unlicense](https://img.shields.io/badge/license-Unlicense-blue.svg)](https://github.com/blackjack4494/youtube-dlc/blob/master/LICENSE)
+
+youtube-dlc - download videos from youtube.com or other video platforms
- [INSTALLATION](#installation)
- [DESCRIPTION](#description)
- [OPTIONS](#options)
-- [CONFIGURATION](#configuration)
-- [OUTPUT TEMPLATE](#output-template)
-- [FORMAT SELECTION](#format-selection)
-- [VIDEO SELECTION](#video-selection)
-- [FAQ](#faq)
-- [DEVELOPER INSTRUCTIONS](#developer-instructions)
-- [EMBEDDING YOUTUBE-DL](#embedding-youtube-dl)
-- [BUGS](#bugs)
- [COPYRIGHT](#copyright)
# INSTALLATION
-To install it right away for all UNIX users (Linux, macOS, etc.), type:
+**All Platforms**
+Preferred way using pip:
+You may want to use `python3` instead of `python`
- sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
- sudo chmod a+rx /usr/local/bin/youtube-dl
+ python -m pip install --upgrade youtube-dlc
-If you do not have curl, you can alternatively use a recent wget:
+**UNIX** (Linux, macOS, etc.)
+Using wget:
- sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
- sudo chmod a+rx /usr/local/bin/youtube-dl
+ sudo wget https://github.com/blackjack4494/youtube-dlc/releases/latest/download/youtube-dlc -O /usr/local/bin/youtube-dlc
+ sudo chmod a+rx /usr/local/bin/youtube-dlc
-Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](https://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
+Using curl:
-You can also use pip:
+ sudo curl -L https://github.com/blackjack4494/youtube-dlc/releases/latest/download/youtube-dlc -o /usr/local/bin/youtube-dlc
+ sudo chmod a+rx /usr/local/bin/youtube-dlc
- sudo -H pip install --upgrade youtube-dl
-
-This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
-macOS users can install youtube-dl with [Homebrew](https://brew.sh/):
+**Windows** users can download [youtube-dlc.exe](https://github.com/blackjack4494/youtube-dlc/releases/latest/download/youtube-dlc.exe) (**do not** put in `C:\Windows\System32`!).
+
+**Compile**
+To build the Windows executable yourself:
- brew install youtube-dl
+ python -m pip install --upgrade pyinstaller
+ pyinstaller.exe youtube_dlc\__main__.py --onefile --name youtube-dlc
+
+Or simply run `make_win.bat` if pyinstaller is installed.
+The resulting `youtube-dlc.exe` will be in the `dist` directory.
-Or with [MacPorts](https://www.macports.org/):
+For Unix:
+You will need the required build tools:
+python, make (GNU), pandoc, zip, nosetests.
+Then simply run:
- sudo port install youtube-dl
+ make
-Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://ytdl-org.github.io/youtube-dl/download.html).
# DESCRIPTION
-**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
+**youtube-dlc** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on macOS. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
- youtube-dl [OPTIONS] URL [URL...]
+ youtube-dlc [OPTIONS] URL [URL...]
# OPTIONS
-h, --help Print this help text and exit
extractor
--default-search PREFIX Use this prefix for unqualified URLs. For
example "gvsearch2:" downloads two videos
- from google videos for youtube-dl "large
+ from google videos for youtube-dlc "large
apple". Use the value "auto" to let
- youtube-dl guess ("auto_warning" to emit a
+ youtube-dlc guess ("auto_warning" to emit a
warning when guessing). "error" just throws
an error. The default value "fixup_error"
repairs broken URLs, but emits an error if
this is not possible instead of searching.
--ignore-config Do not read configuration files. When given
in the global configuration file
- /etc/youtube-dl.conf: Do not read the user
+ /etc/youtube-dlc.conf: Do not read the user
configuration in ~/.config/youtube-
- dl/config (%APPDATA%/youtube-dl/config.txt
- on Windows)
+ dlc/config (%APPDATA%/youtube-
+ dlc/config.txt on Windows)
--config-location PATH Location of the configuration file; either
the path to the config or its containing
directory.
filenames
-w, --no-overwrites Do not overwrite files
-c, --continue Force resume of partially downloaded files.
- By default, youtube-dl will resume
+ By default, youtube-dlc will resume
downloads if possible.
--no-continue Do not resume partially downloaded files
(restart from beginning)
option)
--cookies FILE File to read cookies from and dump cookie
jar in
- --cache-dir DIR Location in the filesystem where youtube-dl
- can store some downloaded information
+ --cache-dir DIR Location in the filesystem where youtube-
+ dlc can store some downloaded information
permanently. By default
- $XDG_CACHE_HOME/youtube-dl or
- ~/.cache/youtube-dl . At the moment, only
+ $XDG_CACHE_HOME/youtube-dlc or
+ ~/.cache/youtube-dlc . At the moment, only
YouTube player files (for videos with
obfuscated signatures) are cached, but that
may change.
files in the current directory to debug
problems
--print-traffic Display sent and read HTTP traffic
- -C, --call-home Contact the youtube-dl server for debugging
- --no-call-home Do NOT contact the youtube-dl server for
+ -C, --call-home Contact the youtube-dlc server for
+ debugging
+ --no-call-home Do NOT contact the youtube-dlc server for
debugging
## Workarounds:
## Authentication Options:
-u, --username USERNAME Login with this account ID
-p, --password PASSWORD Account password. If this option is left
- out, youtube-dl will ask interactively.
+ out, youtube-dlc will ask interactively.
-2, --twofactor TWOFACTOR Two-factor authentication code
-n, --netrc Use .netrc authentication data
--video-password PASSWORD Video password (vimeo, smotri, youku)
a list of available MSOs
--ap-username USERNAME Multiple-system operator account login
--ap-password PASSWORD Multiple-system operator account password.
- If this option is left out, youtube-dl will
- ask interactively.
+ If this option is left out, youtube-dlc
+ will ask interactively.
--ap-list-mso List all supported multiple-system
operators
either the path to the binary or its
containing directory.
--exec CMD Execute a command on the file after
- downloading, similar to find's -exec
- syntax. Example: --exec 'adb push {}
- /sdcard/Music/ && rm {}'
+ downloading and post-processing, similar to
+ find's -exec syntax. Example: --exec 'adb
+ push {} /sdcard/Music/ && rm {}'
--convert-subs FORMAT Convert the subtitles to other format
(currently supported: srt|ass|vtt|lrc)
-# CONFIGURATION
-
-You can configure youtube-dl by placing any supported command line option in a configuration file. On Linux and macOS, the system-wide configuration file is located at `/etc/youtube-dl.conf` and the user-wide configuration file at `~/.config/youtube-dl/config`. On Windows, the user-wide configuration file locations are `%APPDATA%\youtube-dl\config.txt` or `C:\Users\<user name>\youtube-dl.conf`. Note that the configuration file may not exist by default, so you may need to create it yourself.
-
-For example, with the following configuration file youtube-dl will always extract the audio, not copy the mtime, use a proxy and save all videos under `Movies` directory in your home directory:
-```
-# Lines starting with # are comments
-
-# Always extract audio
--x
-
-# Do not copy the mtime
---no-mtime
-
-# Use this proxy
---proxy 127.0.0.1:3128
-
-# Save all videos under Movies directory in your home directory
--o ~/Movies/%(title)s.%(ext)s
-```
-
-Note that options in a configuration file are the same options (switches) used in regular command line calls, so there **must be no whitespace** after `-` or `--`, e.g. `-o` or `--proxy` but not `- o` or `-- proxy`.
-
-You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run.
-
-You can also use `--config-location` if you want to use a custom configuration file for a particular youtube-dl run.
-
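As a rough illustration of why stray whitespace matters, the sketch below tokenizes configuration text the way a shell would, treating `#` as a comment marker. It assumes shell-like splitting via Python's standard `shlex` module; `read_config_args` is a hypothetical helper name, not part of the tool.

```python
import shlex


def read_config_args(text):
    """Split configuration text into command-line style arguments.

    Hypothetical helper: assumes shell-like tokenization in which
    everything after an unquoted '#' is treated as a comment.
    """
    return shlex.split(text, comments=True)


example = """
# Always extract audio
-x

# Use this proxy
--proxy 127.0.0.1:3128

# Save all videos under Movies directory in your home directory
-o ~/Movies/%(title)s.%(ext)s
"""

args = read_config_args(example)
print(args)

# A space after the dash splits the switch into two separate tokens,
# so it is no longer recognizable as the option "-o".
print(read_config_args("- o out.mp4"))
```

Under this assumption, `- o` becomes the two tokens `-` and `o`, which is why the option would no longer be recognized.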
-### Authentication with `.netrc` file
-
-You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](https://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
-```
-touch $HOME/.netrc
-chmod a-rwx,u+rw $HOME/.netrc
-```
-After that you can add credentials for an extractor in the following format, where *extractor* is the name of the extractor in lowercase:
-```
-machine <extractor> login <login> password <password>
-```
-For example:
-```
-machine youtube login myaccount@gmail.com password my_youtube_password
-machine twitch login my_twitch_account_name password my_twitch_password
-```
-To activate authentication with the `.netrc` file you should pass `--netrc` to youtube-dl or place it in the [configuration file](#configuration).
-
-On Windows you may also need to set up the `%HOME%` environment variable manually. For example:
-```
-set HOME=%USERPROFILE%
-```
-
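To check that a `.netrc` entry is written in the expected format, it can be read back with Python's standard `netrc` module. This is only a sketch: it writes the placeholder credentials from the example above to a temporary file rather than touching a real `$HOME/.netrc`, and it says nothing about how a particular extractor consumes the entry.

```python
import netrc
import os
import tempfile

# Write a throwaway .netrc-style file with placeholder credentials.
with tempfile.NamedTemporaryFile('w', suffix='.netrc', delete=False) as f:
    f.write('machine youtube login myaccount@gmail.com password my_youtube_password\n')
    f.write('machine twitch login my_twitch_account_name password my_twitch_password\n')
    path = f.name

try:
    creds = netrc.netrc(path)
    # authenticators() returns a (login, account, password) tuple,
    # or None when no entry exists for the given machine name.
    login, _account, password = creds.authenticators('youtube')
    missing = creds.authenticators('vimeo')
finally:
    os.remove(path)

print(login, password, missing)
```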
-# OUTPUT TEMPLATE
-
-The `-o` option allows users to indicate a template for the output file names.
-
-**tl;dr:** [navigate me to examples](#output-template-examples).
-
-The basic usage is not to set any template arguments when downloading a single file, like in `youtube-dl -o funny_video.flv "https://some/video"`. However, it may contain special sequences that will be replaced when downloading each video. The special sequences may be formatted according to [python string formatting operations](https://docs.python.org/2/library/stdtypes.html#string-formatting). For example, `%(NAME)s` or `%(NAME)05d`. To clarify, that is a percent symbol followed by a name in parentheses, followed by formatting operations. Allowed names along with sequence type are:
-
- - `id` (string): Video identifier
- - `title` (string): Video title
- - `url` (string): Video URL
- - `ext` (string): Video filename extension
- - `alt_title` (string): A secondary title of the video
- - `display_id` (string): An alternative identifier for the video
- - `uploader` (string): Full name of the video uploader
- - `license` (string): License name the video is licensed under
- - `creator` (string): The creator of the video
- - `release_date` (string): The date (YYYYMMDD) when the video was released
- - `timestamp` (numeric): UNIX timestamp of the moment the video became available
- - `upload_date` (string): Video upload date (YYYYMMDD)
- - `uploader_id` (string): Nickname or id of the video uploader
- - `channel` (string): Full name of the channel the video is uploaded on
- - `channel_id` (string): Id of the channel
- - `location` (string): Physical location where the video was filmed
- - `duration` (numeric): Length of the video in seconds
- - `view_count` (numeric): How many users have watched the video on the platform
- - `like_count` (numeric): Number of positive ratings of the video
- - `dislike_count` (numeric): Number of negative ratings of the video
- - `repost_count` (numeric): Number of reposts of the video
- - `average_rating` (numeric): Average rating given by users, the scale used depends on the webpage
- - `comment_count` (numeric): Number of comments on the video
- - `age_limit` (numeric): Age restriction for the video (years)
- - `is_live` (boolean): Whether this video is a live stream or a fixed-length video
- - `start_time` (numeric): Time in seconds where the reproduction should start, as specified in the URL
- - `end_time` (numeric): Time in seconds where the reproduction should end, as specified in the URL
- - `format` (string): A human-readable description of the format
- - `format_id` (string): Format code specified by `--format`
- - `format_note` (string): Additional info about the format
- - `width` (numeric): Width of the video
- - `height` (numeric): Height of the video
- - `resolution` (string): Textual description of width and height
- - `tbr` (numeric): Average bitrate of audio and video in KBit/s
- - `abr` (numeric): Average audio bitrate in KBit/s
- - `acodec` (string): Name of the audio codec in use
- - `asr` (numeric): Audio sampling rate in Hertz
- - `vbr` (numeric): Average video bitrate in KBit/s
- - `fps` (numeric): Frame rate
- - `vcodec` (string): Name of the video codec in use
- - `container` (string): Name of the container format
- - `filesize` (numeric): The number of bytes, if known in advance
- - `filesize_approx` (numeric): An estimate for the number of bytes
- - `protocol` (string): The protocol that will be used for the actual download
- - `extractor` (string): Name of the extractor
- - `extractor_key` (string): Key name of the extractor
- - `epoch` (numeric): Unix epoch when creating the file
- - `autonumber` (numeric): Five-digit number that will be increased with each download, starting at zero
- - `playlist` (string): Name or id of the playlist that contains the video
- - `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- - `playlist_id` (string): Playlist identifier
- - `playlist_title` (string): Playlist title
- - `playlist_uploader` (string): Full name of the playlist uploader
- - `playlist_uploader_id` (string): Nickname or id of the playlist uploader
-
-Available for the video that belongs to some logical chapter or section:
-
- - `chapter` (string): Name or title of the chapter the video belongs to
- - `chapter_number` (numeric): Number of the chapter the video belongs to
- - `chapter_id` (string): Id of the chapter the video belongs to
-
-Available for the video that is an episode of some series or programme:
-
- - `series` (string): Title of the series or programme the video episode belongs to
- - `season` (string): Title of the season the video episode belongs to
- - `season_number` (numeric): Number of the season the video episode belongs to
- - `season_id` (string): Id of the season the video episode belongs to
- - `episode` (string): Title of the video episode
- - `episode_number` (numeric): Number of the video episode within a season
- - `episode_id` (string): Id of the video episode
-
-Available for the media that is a track or a part of a music album:
-
- - `track` (string): Title of the track
- - `track_number` (numeric): Number of the track within an album or a disc
- - `track_id` (string): Id of the track
- - `artist` (string): Artist(s) of the track
- - `genre` (string): Genre(s) of the track
- - `album` (string): Title of the album the track belongs to
- - `album_type` (string): Type of the album
- - `album_artist` (string): List of all artists appeared on the album
- - `disc_number` (numeric): Number of the disc or other physical medium the track belongs to
- - `release_year` (numeric): Year (YYYY) when the album was released
-
-Each aforementioned sequence when referenced in an output template will be replaced by the actual value corresponding to the sequence name. Note that some of the sequences are not guaranteed to be present since they depend on the metadata obtained by a particular extractor. Such sequences will be replaced with `NA`.
-
-For example for `-o %(title)s-%(id)s.%(ext)s` and an mp4 video with title `youtube-dl test video` and id `BaW_jenozKcj`, this will result in a `youtube-dl test video-BaW_jenozKcj.mp4` file created in the current directory.
-
-For numeric sequences you can use numeric related formatting, for example, `%(view_count)05d` will result in a string with view count padded with zeros up to 5 characters, like in `00042`.
-
-Output templates can also contain an arbitrary hierarchical path, e.g. `-o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s'` which will result in downloading each video in a directory corresponding to this path template. Any missing directory will be automatically created for you.
-
-To use percent literals in an output template use `%%`. To output to stdout use `-o -`.
-
-The current default template is `%(title)s-%(id)s.%(ext)s`.
-
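Because the template sequences follow Python's string formatting operations, the expansion can be approximated in a few lines of Python. The sketch below is illustrative only: `expand_template` is a hypothetical helper, and the `NA` fallback is modeled with a `defaultdict` to match the behavior described above.

```python
import collections


def expand_template(template, info):
    """Expand an output template with Python %-formatting.

    Hypothetical helper: fields missing from `info` fall back to 'NA'.
    """
    fields = collections.defaultdict(lambda: 'NA', info)
    return template % fields


info = {
    'title': 'youtube-dl test video',
    'id': 'BaW_jenozKcj',
    'ext': 'mp4',
    'view_count': 42,
}

print(expand_template('%(title)s-%(id)s.%(ext)s', info))
print(expand_template('%(view_count)05d', info))
# 'uploader' is absent from info, so it is rendered as NA.
print(expand_template('%(uploader)s - %(title)s.%(ext)s', info))
# '%%' produces a literal percent sign.
print(expand_template('100%% %(title)s', info))
```

Note that in this simplified sketch a missing field combined with a numeric format such as `%(view_count)05d` would raise a `TypeError`, since `NA` is a string.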
-In some cases, you don't want special characters such as 中, spaces, or &, for example when transferring the downloaded filename to a Windows system or passing the filename through an 8bit-unsafe channel. In these cases, add the `--restrict-filenames` flag to get a shorter title.
-
-#### Output template and Windows batch files
-
-If you are using an output template inside a Windows batch file then you must escape plain percent characters (`%`) by doubling, so that `-o "%(title)s-%(id)s.%(ext)s"` should become `-o "%%(title)s-%%(id)s.%%(ext)s"`. However you should not touch `%`'s that are not plain characters, e.g. environment variables for expansion should stay intact: `-o "C:\%HOMEPATH%\Desktop\%%(title)s.%%(ext)s"`.
-
-#### Output template examples
-
-Note that on Windows you may need to use double quotes instead of single.
-
-```bash
-$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc
-youtube-dl test video ''_ä↭𝕐.mp4 # All kinds of weird characters
-
-$ youtube-dl --get-filename -o '%(title)s.%(ext)s' BaW_jenozKc --restrict-filenames
-youtube-dl_test_video_.mp4 # A simple file name
-
-# Download YouTube playlist videos in separate directory indexed by video order in a playlist
-$ youtube-dl -o '%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re
-
-# Download all playlists of YouTube channel/user keeping each playlist in separate directory:
-$ youtube-dl -o '%(uploader)s/%(playlist)s/%(playlist_index)s - %(title)s.%(ext)s' https://www.youtube.com/user/TheLinuxFoundation/playlists
-
-# Download Udemy course keeping each chapter in separate directory under MyVideos directory in your home
-$ youtube-dl -u user -p password -o '~/MyVideos/%(playlist)s/%(chapter_number)s - %(chapter)s/%(title)s.%(ext)s' https://www.udemy.com/java-tutorial/
-
-# Download entire series season keeping each series and each season in separate directory under C:/MyVideos
-$ youtube-dl -o "C:/MyVideos/%(series)s/%(season_number)s - %(season)s/%(episode_number)s - %(episode)s.%(ext)s" https://videomore.ru/kino_v_detalayah/5_sezon/367617
-
-# Stream the video being downloaded to stdout
-$ youtube-dl -o - BaW_jenozKc
-```
-
-# FORMAT SELECTION
-
-By default youtube-dl tries to download the best available quality, i.e. if you want the best quality you **don't need** to pass any special options, youtube-dl will guess it for you by **default**.
-
-But sometimes you may want to download in a different format, for example when you are on a slow or intermittent connection. The key mechanism for achieving this is so-called *format selection*, with which you can explicitly specify the desired format, select formats based on some criterion or criteria, set up precedence and much more.
-
-The general syntax for format selection is `--format FORMAT` or shorter `-f FORMAT` where `FORMAT` is a *selector expression*, i.e. an expression that describes format or formats you would like to download.
-
-**tl;dr:** [navigate me to examples](#format-selection-examples).
-
-The simplest case is requesting a specific format, for example with `-f 22` you can download the format with format code equal to 22. You can get the list of available format codes for particular video using `--list-formats` or `-F`. Note that these format codes are extractor specific.
-
-You can also use a file extension (currently `3gp`, `aac`, `flv`, `m4a`, `mp3`, `mp4`, `ogg`, `wav`, `webm` are supported) to download the best quality format of a particular file extension served as a single file, e.g. `-f webm` will download the best quality format with the `webm` extension served as a single file.
-
-You can also use special names to select particular edge case formats:
-
- - `best`: Select the best quality format represented by a single file with video and audio.
- - `worst`: Select the worst quality format represented by a single file with video and audio.
- - `bestvideo`: Select the best quality video-only format (e.g. DASH video). May not be available.
- - `worstvideo`: Select the worst quality video-only format. May not be available.
- - `bestaudio`: Select the best quality audio-only format. May not be available.
- - `worstaudio`: Select the worst quality audio-only format. May not be available.
-
-For example, to download the worst quality video-only format you can use `-f worstvideo`.
-
-If you want to download multiple videos and they don't have the same formats available, you can specify the order of preference using slashes. Note that slash is left-associative, i.e. formats on the left hand side are preferred, for example `-f 22/17/18` will download format 22 if it's available, otherwise it will download format 17 if it's available, otherwise it will download format 18 if it's available, otherwise it will complain that no suitable formats are available for download.
-
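The slash precedence can be pictured as "take the first alternative that exists". The following sketch assumes plain format codes only (no filters, commas or `+` merging); `pick_first_available` is a hypothetical helper, not the tool's selector implementation.

```python
def pick_first_available(preference, available):
    """Return the first format code in a slash-separated preference
    string that is present in `available`, or None if none match.

    Hypothetical helper handling plain format codes only.
    """
    for format_id in preference.split('/'):
        if format_id in available:
            return format_id
    return None


# Format 22 is not offered here, so the next preference (17) is chosen.
print(pick_first_available('22/17/18', {'17', '18'}))
# No listed format is offered at all.
print(pick_first_available('22/17/18', {'43'}))
```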
-If you want to download several formats of the same video use a comma as a separator, e.g. `-f 22,17,18` will download all these three formats, of course if they are available. Or a more sophisticated example combined with the precedence feature: `-f 136/137/mp4/bestvideo,140/m4a/bestaudio`.
-
-You can also filter the video formats by putting a condition in brackets, as in `-f "best[height=720]"` (or `-f "[filesize>10M]"`).
-
-The following numeric meta fields can be used with comparisons `<`, `<=`, `>`, `>=`, `=` (equals), `!=` (not equals):
-
- - `filesize`: The number of bytes, if known in advance
- - `width`: Width of the video, if known
- - `height`: Height of the video, if known
- - `tbr`: Average bitrate of audio and video in KBit/s
- - `abr`: Average audio bitrate in KBit/s
- - `vbr`: Average video bitrate in KBit/s
- - `asr`: Audio sampling rate in Hertz
- - `fps`: Frame rate
-
-Filtering also works for the comparisons `=` (equals), `^=` (starts with), `$=` (ends with), `*=` (contains) and the following string meta fields:
-
- - `ext`: File extension
- - `acodec`: Name of the audio codec in use
- - `vcodec`: Name of the video codec in use
- - `container`: Name of the container format
- - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `http_dash_segments`, `m3u8`, or `m3u8_native`)
- - `format_id`: A short description of the format
-
-Any string comparison may be prefixed with negation `!` in order to produce an opposite comparison, e.g. `!*=` (does not contain).
-
-Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
-
-Formats for which the value is not known are excluded unless you put a question mark (`?`) after the operator. You can combine format filters, so `-f "[height <=? 720][tbr>500]"` selects up to 720p videos (or videos where the height is not known) with a bitrate of at least 500 KBit/s.
-
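The effect of the question mark can be sketched with a small predicate. This is a simplified model under stated assumptions: the format dictionaries, the `matches` helper and its `none_inclusive` parameter are made up for illustration, and only numeric comparisons are covered.

```python
import operator

# Numeric comparison operators listed above.
OPS = {
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
    '=': operator.eq, '!=': operator.ne,
}


def matches(fmt, field, op, value, none_inclusive=False):
    """Evaluate one numeric filter against a format dictionary.

    Hypothetical helper: `none_inclusive=True` models the '?' suffix,
    i.e. formats with an unknown value are kept instead of dropped.
    """
    actual = fmt.get(field)
    if actual is None:
        return none_inclusive
    return OPS[op](actual, value)


formats = [
    {'format_id': 'a', 'height': 1080, 'tbr': 4000},
    {'format_id': 'b', 'height': 720, 'tbr': 2500},
    {'format_id': 'c', 'height': None, 'tbr': 800},
    {'format_id': 'd', 'height': 480, 'tbr': 400},
]

# Model of "[height <=? 720][tbr>500]": both filters must hold.
selected = [
    f['format_id'] for f in formats
    if matches(f, 'height', '<=', 720, none_inclusive=True)
    and matches(f, 'tbr', '>', 500)
]
print(selected)  # ['b', 'c']
```

Format `c` survives only because of the `?`; without it, the unknown height would exclude it.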
-You can merge the video and audio of two formats into a single file using `-f <video-format>+<audio-format>` (requires ffmpeg or avconv installed), for example `-f bestvideo+bestaudio` will download the best video-only format, the best audio-only format and mux them together with ffmpeg/avconv.
-
-Format selectors can also be grouped using parentheses, for example if you want to download the best mp4 and webm formats with a height lower than 480 you can use `-f '(mp4,webm)[height<480]'`.
-
-Since the end of April 2015 and version 2015.04.26, youtube-dl uses `-f bestvideo+bestaudio/best` as the default format selection (see [#5447](https://github.com/ytdl-org/youtube-dl/issues/5447), [#5456](https://github.com/ytdl-org/youtube-dl/issues/5456)). If ffmpeg or avconv are installed this results in downloading `bestvideo` and `bestaudio` separately and muxing them together into a single file giving the best overall quality available. Otherwise it falls back to `best` and results in downloading the best available quality served as a single file. `best` is also needed for videos that don't come from YouTube because they don't provide the audio and video in two different files. If you want to only download some DASH formats (for example if you are not interested in getting videos with a resolution higher than 1080p), you can add `-f bestvideo[height<=?1080]+bestaudio/best` to your configuration file. Note that if you use youtube-dl to stream to `stdout` (and most likely to pipe it to your media player then), i.e. you explicitly specify output template as `-o -`, youtube-dl still uses `-f best` format selection in order to start content delivery immediately to your player and not to wait until `bestvideo` and `bestaudio` are downloaded and muxed.
-
-If you want to preserve the old format selection behavior (prior to youtube-dl 2015.04.26), i.e. you want to download the best available quality media served as a single file, you should explicitly specify your choice with `-f best`. You may want to add it to the [configuration file](#configuration) in order not to type it every time you run youtube-dl.
-
-#### Format selection examples
-
-Note that on Windows you may need to use double quotes instead of single.
-
-```bash
-# Download best mp4 format available or any other best if no mp4 available
-$ youtube-dl -f 'bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best'
-
-# Download best format available but no better than 480p
-$ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
-
-# Download best format served as a single file but smaller than 50 MB
-$ youtube-dl -f 'best[filesize<50M]'
-
-# Download best format available via direct link over HTTP/HTTPS protocol
-$ youtube-dl -f '(bestvideo+bestaudio/best)[protocol^=http]'
-
-# Download the best video format and the best audio format without merging them
-$ youtube-dl -f 'bestvideo,bestaudio' -o '%(title)s.f%(format_id)s.%(ext)s'
-```
-Note that in the last example, an output template is recommended as bestvideo and bestaudio may have the same file name.
-
-
-# VIDEO SELECTION
-
-Videos can be filtered by their upload date using the options `--date`, `--datebefore` or `--dateafter`. They accept dates in two formats:
-
- - Absolute dates: Dates in the format `YYYYMMDD`.
- - Relative dates: Dates in the format `(now|today)[+-][0-9](day|week|month|year)(s)?`
-
-Examples:
-
-```bash
-# Download only the videos uploaded in the last 6 months
-$ youtube-dl --dateafter now-6months
-
-# Download only the videos uploaded on January 1, 1970
-$ youtube-dl --date 19700101
-
-# Download only the videos uploaded in the 200x decade
-$ youtube-dl --dateafter 20000101 --datebefore 20091231
-```
-
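The two date formats can be told apart with a single regular expression. The sketch below is an approximation under stated assumptions: `parse_date` is a hypothetical helper, months and years are treated as 30 and 365 days, and multi-digit counts are accepted for convenience.

```python
import datetime
import re

# Approximate unit lengths in days. The month and year values are an
# assumption of this sketch, not a calendar-accurate calculation.
UNIT_DAYS = {'day': 1, 'week': 7, 'month': 30, 'year': 365}

RELATIVE_RE = re.compile(r'(now|today)([+-])(\d+)(day|week|month|year)s?')


def parse_date(spec, today=None):
    """Turn an absolute (YYYYMMDD) or relative date into a date object.

    Hypothetical helper; `today` can be passed in to make results
    reproducible.
    """
    today = today or datetime.date.today()
    if spec in ('now', 'today'):
        return today
    match = RELATIVE_RE.fullmatch(spec)
    if match:
        sign = 1 if match.group(2) == '+' else -1
        days = sign * int(match.group(3)) * UNIT_DAYS[match.group(4)]
        return today + datetime.timedelta(days=days)
    return datetime.datetime.strptime(spec, '%Y%m%d').date()


reference = datetime.date(2020, 9, 6)
print(parse_date('now-6months', today=reference))
print(parse_date('19700101'))
```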
-# FAQ
-
-### How do I update youtube-dl?
-
-If you've followed [our manual installation instructions](https://ytdl-org.github.io/youtube-dl/download.html), you can simply run `youtube-dl -U` (or, on Linux, `sudo youtube-dl -U`).
-
-If you have used pip, a simple `sudo pip install -U youtube-dl` is sufficient to update.
-
-If you have installed youtube-dl using a package manager like *apt-get* or *yum*, use the standard system update mechanism to update. Note that distribution packages are often outdated. As a rule of thumb, youtube-dl releases at least once a month, and often weekly or even daily. Simply go to https://yt-dl.org to find out the current version. Unfortunately, there is nothing we youtube-dl developers can do if your distribution serves a really outdated version. You can (and should) complain to your distribution in their bugtracker or support forum.
-
-As a last resort, you can also uninstall the version installed by your package manager and follow our manual installation instructions. For that, remove the distribution's package, with a line like
-
- sudo apt-get remove -y youtube-dl
-
-Afterwards, simply follow [our manual installation instructions](https://ytdl-org.github.io/youtube-dl/download.html):
-
-```
-sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
-sudo chmod a+rx /usr/local/bin/youtube-dl
-hash -r
-```
-
-Again, from then on you'll be able to update with `sudo youtube-dl -U`.
-
-### youtube-dl is extremely slow to start on Windows
-
-Add a file exclusion for `youtube-dl.exe` in Windows Defender settings.
-
-### I'm getting an error `Unable to extract OpenGraph title` on YouTube playlists
-
-YouTube changed their playlist format in March 2014 and later on, so you'll need at least youtube-dl 2014.07.25 to download all YouTube videos.
-
-If you have installed youtube-dl with a package manager, pip, setup.py or a tarball, please use that to update. Note that Ubuntu packages do not seem to get updated anymore. Since we are not affiliated with Ubuntu, there is little we can do. Feel free to [report bugs](https://bugs.launchpad.net/ubuntu/+source/youtube-dl/+filebug) to the [Ubuntu packaging people](mailto:ubuntu-motu@lists.ubuntu.com?subject=outdated%20version%20of%20youtube-dl) - all they have to do is update the package to a somewhat recent version. See above for a way to update.
-
-### I'm getting an error when trying to use output template: `error: using output template conflicts with using title, video ID or auto number`
-
-Make sure you are not using `-o` with any of these options `-t`, `--title`, `--id`, `-A` or `--auto-number` set in command line or in a configuration file. Remove the latter if any.
-
-### Do I always have to pass `-citw`?
-
-By default, youtube-dl intends to have the best options (incidentally, if you have a convincing case that these should be different, [please file an issue where you explain that](https://yt-dl.org/bug)). Therefore, it is unnecessary and sometimes harmful to copy long option strings from webpages. In particular, the only option out of `-citw` that is regularly useful is `-i`.
-
-### Can you please put the `-b` option back?
-
-Most people asking this question are not aware that youtube-dl now defaults to downloading the highest available quality as reported by YouTube, which will be 1080p or 720p in some cases, so you no longer need the `-b` option. For some specific videos, maybe YouTube does not report them to be available in a specific high quality format you're interested in. In that case, simply request it with the `-f` option and youtube-dl will try to download it.
-
-### I get HTTP error 402 when trying to download a video. What's this?
-
-Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering providing a way to let you solve the CAPTCHA](https://github.com/ytdl-org/youtube-dl/issues/154), but at the moment, your best course of action is to point a web browser to the YouTube URL, solve the CAPTCHA, and restart youtube-dl.
-
-### Do I need any other programs?
-
-youtube-dl works fine on its own on most sites. However, if you want to convert video/audio, you'll need [avconv](https://libav.org/) or [ffmpeg](https://www.ffmpeg.org/). On some sites - most notably YouTube - videos can be retrieved in a higher quality format without sound. youtube-dl will detect whether avconv/ffmpeg is present and automatically pick the best option.
-
-Videos or video formats streamed via RTMP protocol can only be downloaded when [rtmpdump](https://rtmpdump.mplayerhq.hu/) is installed. Downloading MMS and RTSP videos requires either [mplayer](https://mplayerhq.hu/) or [mpv](https://mpv.io/) to be installed.
-
-### I have downloaded a video but how can I play it?
-
-Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](https://www.videolan.org/) or [mplayer](https://www.mplayerhq.hu/).
-
-### I extracted a video URL with `-g`, but it does not play on another machine / in my web browser.
-
-It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used; use `--dump-user-agent` to see the one in use by youtube-dl. You can also get the necessary cookies and HTTP headers from the JSON output obtained with `--dump-json`.
-
-It may be beneficial to use IPv6; in some cases, the restrictions are only applied to IPv4. Some services (sometimes only for a subset of videos) do not restrict the video URL by IP address, cookie, or user-agent, but these are the exception rather than the rule.
-
-Please bear in mind that some URL protocols are **not** supported by browsers out of the box, including RTMP. If you are using `-g`, your own downloader must support these as well.
-
-If you want to play the video on a machine that is not running youtube-dl, you can relay the video content from the machine that runs youtube-dl. You can use `-o -` to let youtube-dl stream a video to stdout, or simply allow the player to download the files written by youtube-dl in turn.
-
-### ERROR: no fmt_url_map or conn information found in video info
-
-YouTube switched to a new video info format in July 2011, which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
-
-### ERROR: unable to download video
-
-YouTube has required an additional signature since September 2012, which is not supported by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
-
-### Video URL contains an ampersand and I'm getting some strange output `[1] 2839` or `'v' is not recognized as an internal or external command`
-
-That's actually the output from your shell. Since the ampersand is one of the special shell characters, the shell interprets it, which prevents you from passing the whole URL to youtube-dl. To stop your shell from interpreting the ampersands (or any other special characters), you have to either put the whole URL in quotes or escape them with a backslash (which approach works depends on your shell).
-
-For example, if your URL is https://www.youtube.com/watch?t=4&v=BaW_jenozKc, you should end up with the following command:
-
-```youtube-dl 'https://www.youtube.com/watch?t=4&v=BaW_jenozKc'```
-
-or
-
-```youtube-dl https://www.youtube.com/watch?t=4\&v=BaW_jenozKc```
-
-For Windows you have to use the double quotes:
-
-```youtube-dl "https://www.youtube.com/watch?t=4&v=BaW_jenozKc"```
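
When the command line is assembled by a script rather than typed by hand, the quoting can be delegated to the standard library. The following is a minimal Python sketch, assuming a POSIX shell (`shlex.quote` does not implement the Windows `cmd.exe` rules described above):

```python
import shlex

url = 'https://www.youtube.com/watch?t=4&v=BaW_jenozKc'

# shlex.quote wraps the URL in single quotes so that a POSIX shell
# passes the ampersand through literally.
command = 'youtube-dl ' + shlex.quote(url)
print(command)

# Passing an argument list to subprocess bypasses the shell entirely,
# so no quoting is needed (the call itself is left commented out here).
argv = ['youtube-dl', url]
# subprocess.check_call(argv)
```

The argument-list form is usually the more robust choice, because the URL never passes through a shell at all.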
-
-### ExtractorError: Could not find JS function u'OF'
-
-In February 2015, the new YouTube player contained a character sequence in a string that was misinterpreted by old versions of youtube-dl. See [above](#how-do-i-update-youtube-dl) for how to update youtube-dl.
-
-### HTTP Error 429: Too Many Requests or 402: Payment Required
-
-These two error codes indicate that the service is blocking your IP address because of overuse. Contact the service and ask them to unblock your IP address, or - if you have acquired a whitelisted IP address already - use the [`--proxy` or `--source-address` options](#network-options) to select another IP address.
-
-### SyntaxError: Non-ASCII character
-
-The error
-
-    File "youtube-dl", line 2
-    SyntaxError: Non-ASCII character '\x93' ...
-
-means you're using an outdated version of Python. Please update to Python 2.6 or 2.7.
-
-### What is this binary file? Where has the code gone?
-
-Since June 2012 ([#342](https://github.com/ytdl-org/youtube-dl/issues/342)), youtube-dl has been packed as an executable zipfile. Simply unzip it (it might need renaming to `youtube-dl.zip` first on some systems) or clone the git repository, as laid out above. If you modify the code, you can run it by executing the `__main__.py` file. To recompile the executable, run `make youtube-dl`.
-
-### The exe throws an error due to missing `MSVCR100.dll`
-
-To run the exe, you first need to install the [Microsoft Visual C++ 2010 Redistributable Package (x86)](https://www.microsoft.com/en-US/download/details.aspx?id=5555).
-
-### On Windows, how should I set up ffmpeg and youtube-dl? Where should I put the exe files?
-
-If you put youtube-dl and ffmpeg in the same directory that you're running the command from, it will work, but that's rather cumbersome.
-
-To make a different directory work - either for ffmpeg, or for youtube-dl, or for both - simply create the directory (say, `C:\bin`, or `C:\Users\<User name>\bin`), put all the executables directly in there, and then [set your PATH environment variable](https://www.java.com/en/download/help/path.xml) to include that directory.
-
-From then on, after restarting your shell, you will be able to access both youtube-dl and ffmpeg (and youtube-dl will be able to find ffmpeg) by simply typing `youtube-dl` or `ffmpeg`, no matter what directory you're in.
-
-### How do I put downloads into a specific folder?
-
-Use the `-o` option to specify an [output template](#output-template), for example `-o "/home/user/videos/%(title)s-%(id)s.%(ext)s"`. If you want this for all of your downloads, put the option into your [configuration file](#configuration).
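
The `%(name)s` placeholders follow Python's string formatting syntax, so the effect of a template can be previewed by hand. The sketch below uses hypothetical metadata values chosen for illustration; youtube-dl fills in the real ones itself and additionally sanitizes them for use in file names, which this sketch does not attempt:

```python
template = '/home/user/videos/%(title)s-%(id)s.%(ext)s'

# Hypothetical metadata, for illustration only.
info = {'title': 'youtube-dl test video', 'id': 'BaW_jenozKc', 'ext': 'mp4'}

# Each %(name)s placeholder is replaced by the matching dictionary value.
path = template % info
print(path)
```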
-
-### How do I download a video starting with a `-`?
-
-Either prepend `https://www.youtube.com/watch?v=` or separate the ID from the options with `--`:
-
-    youtube-dl -- -wNyEUrxzFU
-    youtube-dl "https://www.youtube.com/watch?v=-wNyEUrxzFU"
-
-### How do I pass cookies to youtube-dl?
-
-Use the `--cookies` option, for example `--cookies /path/to/cookies/file.txt`.
-
-In order to extract cookies from a browser, use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [cookies.txt](https://addons.mozilla.org/en-US/firefox/addon/cookies-txt/) (for Firefox).
-
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, macOS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
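
The two requirements above (a recognized first line and consistent newlines) can be checked before handing the file to `--cookies`. This is a minimal sketch rather than anything youtube-dl ships; the helper names `check_cookie_text` and `convert_newlines` are made up for this example:

```python
VALID_HEADERS = ('# HTTP Cookie File', '# Netscape HTTP Cookie File')


def check_cookie_text(text):
    """Return a list of problems found in the text of a cookies file."""
    problems = []
    first_line = text.split('\n', 1)[0].rstrip('\r')
    if first_line not in VALID_HEADERS:
        problems.append('first line is not a recognized cookie file header')
    # A bare LF left over after removing every CRLF means the file mixes both.
    if '\r\n' in text and '\n' in text.replace('\r\n', ''):
        problems.append('mixed CRLF and LF newlines')
    return problems


def convert_newlines(text, newline):
    """Rewrite every line ending in text to the given newline sequence."""
    unified = text.replace('\r\n', '\n').replace('\r', '\n')
    return unified.replace('\n', newline)
```

When using these helpers on a real file, read and write it in binary mode (or with `newline=''` on Python 3) so that Python does not translate the line endings on its own.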
-
-Passing cookies to youtube-dl is a good way to work around login when a particular extractor does not implement it explicitly. Another use case is working around a [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) that some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
-
-### How do I stream directly to media player?
-
-You will first need to tell youtube-dl to stream media to stdout with `-o -`, and also tell your media player to read from stdin (it must be capable of this for streaming), and then pipe the former to the latter. For example, streaming to [vlc](https://www.videolan.org/) can be achieved with:
-
-    youtube-dl -o - "https://www.youtube.com/watch?v=BaW_jenozKc" | vlc -
-
-### How do I download only new videos from a playlist?
-
-Use the download-archive feature. With this feature, you should initially download the complete playlist with `--download-archive /path/to/download/archive/file.txt`, which records the identifiers of all the videos in a special file. Each subsequent run with the same `--download-archive` will download only new videos and skip all videos that have been downloaded before. Note that only successful downloads are recorded in the file.
-
-For example, at first,
-
-    youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
-
-will download the complete `PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re` playlist and create a file `archive.txt`. Each subsequent run will only download new videos if any:
-
-    youtube-dl --download-archive archive.txt "https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re"
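
The archive is a plain text file. The sketch below assumes one entry per line, consisting of a lowercase extractor name and a video ID separated by a space (for example `youtube BaW_jenozKc`); treat that layout as an implementation detail rather than a stable interface. The helper name `parse_archive` is made up for this example:

```python
def parse_archive(text):
    """Return the set of (extractor, video_id) pairs recorded in archive text."""
    entries = set()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only; the rest of the line is the video ID.
        extractor, _, video_id = line.partition(' ')
        entries.add((extractor, video_id))
    return entries


# Sample archive content, written out by hand for illustration.
sample = 'youtube BaW_jenozKc\nyoutube -wNyEUrxzFU\n'
recorded = parse_archive(sample)
print(('youtube', 'BaW_jenozKc') in recorded)
```

A script can use such a set to report which videos a later run would skip.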
-
-### Should I add `--hls-prefer-native` into my config?
-
-When youtube-dl detects an HLS video, it can download it either with the built-in downloader or ffmpeg. Since many HLS streams are slightly invalid and ffmpeg/youtube-dl each handle some invalid cases better than the other, there is an option to switch the downloader if needed.
-
-When youtube-dl knows that one particular downloader works better for a given website, that downloader will be picked. Otherwise, youtube-dl will pick the best downloader for general compatibility, which at the moment happens to be ffmpeg. This choice may change in future versions of youtube-dl, with improvements of the built-in downloader and/or ffmpeg.
-
-In particular, the generic extractor (used when your website is not in the [list of supported sites by youtube-dl](https://ytdl-org.github.io/youtube-dl/supportedsites.html)) cannot mandate one specific downloader.
-
-If you put either `--hls-prefer-native` or `--hls-prefer-ffmpeg` into your configuration, a different subset of videos will fail to download correctly. Instead, it is much better to [file an issue](https://yt-dl.org/bug) or a pull request which details why the native or the ffmpeg HLS downloader is a better choice for your use case.
-
-### Can you add support for this anime video site, or site which shows current movies for free?
-
-As a matter of policy (as well as legality), youtube-dl does not include support for services that specialize in infringing copyright. As a rule of thumb, if you cannot easily find a video that the service is quite obviously allowed to distribute (i.e. that has been uploaded by the creator, the creator's distributor, or is published under a free license), the service is probably unfit for inclusion to youtube-dl.
-
-A note on the service that they don't host the infringing content, but just link to those who do, is evidence that the service should **not** be included into youtube-dl. The same goes for any DMCA note when the whole front page of the service is filled with videos they are not allowed to distribute. A "fair use" note is equally unconvincing if the service shows copyright-protected videos in full without authorization.
-
-Support requests for services that **do** purchase the rights to distribute their content are perfectly fine though. If in doubt, you can simply include a source that mentions the legitimate purchase of content.
-
-### How can I speed up work on my issue?
-
-(Also known as: Help, my important issue is not being solved!) The youtube-dl core developer team is quite small. While we do our best to solve as many issues as possible, sometimes that can take quite a while. To speed up your issue, here's what you can do:
-
-First of all, please do report the issue [at our issue tracker](https://yt-dl.org/bugs). That allows us to coordinate all efforts by users and developers, and serves as a unified point. Unfortunately, the youtube-dl project has grown too large to use personal email as an effective communication channel.
-
-Please read the [bug reporting instructions](#bugs) below. A lot of bugs lack all the necessary information. If you can, offer proxy, VPN, or shell access to the youtube-dl developers. If you are able to, test the issue from multiple computers in multiple countries to exclude local censorship or misconfiguration issues.
-
-If nobody is interested in solving your issue, you are welcome to take matters into your own hands and submit a pull request (or coerce/pay somebody else to do so).
-
-Feel free to bump the issue from time to time by writing a small comment ("Issue is still present in youtube-dl version ...from France, but fixed from Belgium"), but please not more than once a month. Please do not declare your issue as `important` or `urgent`.
-
-### How can I detect whether a given URL is supported by youtube-dl?
-
-For one, have a look at the [list of supported sites](docs/supportedsites.md). Note that it can sometimes happen that the site changes its URL scheme (say, from https://example.com/video/1234567 to https://example.com/v/1234567 ) and youtube-dl reports a URL of a service in that list as unsupported. In that case, simply report a bug.
-
-It is *not* possible to detect whether a URL is supported or not. That's because youtube-dl contains a generic extractor which matches **all** URLs. You may be tempted to disable, exclude, or remove the generic extractor, but the generic extractor not only allows users to extract videos from lots of websites that embed a video from another service, but may also be used to extract video from a service that it's hosting itself. Therefore, we neither recommend nor support disabling, excluding, or removing the generic extractor.
-
-If you want to find out whether a given URL is supported, simply call youtube-dl with it. If you get no videos back, chances are the URL is either not referring to a video or unsupported. You can find out which by examining the output (if you run youtube-dl on the console) or catching an `UnsupportedError` exception if you run it from a Python program.
-
-### Why do I need to go through that much red tape when filing bugs?
-
-Before we had the issue template, despite our extensive [bug reporting instructions](#bugs), about 80% of the issue reports we got were useless, for instance because people used ancient versions hundreds of releases old, because of simple syntactic errors (not in youtube-dl but in general shell usage), because the problem had already been reported multiple times before, because people did not actually read an error message, even if it said "please install ffmpeg", because people did not mention the URL they were trying to download, and because of many more simple, easy-to-avoid problems, many of which were totally unrelated to youtube-dl.
-
-youtube-dl is an open-source project manned by too few volunteers, so we'd rather spend time fixing bugs where we are certain none of those simple problems apply, and where we can be reasonably confident to be able to reproduce the issue without asking the reporter repeatedly. As such, the output of `youtube-dl -v YOUR_URL_HERE` is really all that's required to file an issue. The issue template also guides you through some basic steps you can do, such as checking that your version of youtube-dl is current.
-
-# DEVELOPER INSTRUCTIONS
-
-Most users do not need to build youtube-dl and can [download the builds](https://ytdl-org.github.io/youtube-dl/download.html) or get them from their distribution.
-
-To run youtube-dl as a developer, you don't need to build anything either. Simply execute
-
-    python -m youtube_dl
-
-To run the tests, simply invoke your favorite test runner, or execute a test file directly; any of the following work:
-
-    python -m unittest discover
-    python test/test_download.py
-    nosetests
-
-See item 6 of [new extractor tutorial](#adding-support-for-a-new-site) for how to run extractor specific test cases.
-
-If you want to create a build of youtube-dl yourself, you'll need
-
-* python
-* make (only GNU make is supported)
-* pandoc
-* zip
-* nosetests
-
-### Adding support for a new site
-
-If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
-
-After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
-
-1. [Fork this repository](https://github.com/ytdl-org/youtube-dl/fork)
-2. Check out the source code with:
-
-        git clone git@github.com:YOUR_GITHUB_USERNAME/youtube-dl.git
-
-3. Start a new git branch with
-
-        cd youtube-dl
-        git checkout -b yourextractor
-
-4. Start with this simple template and save it to `youtube_dl/extractor/yourextractor.py`:
-
-    ```python
-    # coding: utf-8
-    from __future__ import unicode_literals
-
-    from .common import InfoExtractor
-
-
-    class YourExtractorIE(InfoExtractor):
-        _VALID_URL = r'https?://(?:www\.)?yourextractor\.com/watch/(?P<id>[0-9]+)'
-        _TEST = {
-            'url': 'https://yourextractor.com/watch/42',
-            'md5': 'TODO: md5 sum of the first 10241 bytes of the video file (use --test)',
-            'info_dict': {
-                'id': '42',
-                'ext': 'mp4',
-                'title': 'Video title goes here',
-                'thumbnail': r're:^https?://.*\.jpg$',
-                # TODO more properties, either as:
-                # * A value
-                # * MD5 checksum; start the string with md5:
-                # * A regular expression; start the string with re:
-                # * Any Python type (for example int or float)
-            }
-        }
-
-        def _real_extract(self, url):
-            video_id = self._match_id(url)
-            webpage = self._download_webpage(url, video_id)
-
-            # TODO more code goes here, for example ...
-            title = self._html_search_regex(r'<h1>(.+?)</h1>', webpage, 'title')
-
-            return {
-                'id': video_id,
-                'title': title,
-                'description': self._og_search_description(webpage),
-                'uploader': self._search_regex(r'<div[^>]+id="uploader"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False),
-                # TODO more properties (see youtube_dl/extractor/common.py)
-            }
-    ```
-5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
-6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in.
-7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303). Add tests and code for as many as you want.
-8. Make sure your code follows [youtube-dl coding conventions](#youtube-dl-coding-conventions) and check the code with [flake8](http://flake8.pycqa.org/en/latest/index.html#quickstart):
-
-        $ flake8 youtube_dl/extractor/yourextractor.py
-
-9. Make sure your code works under all [Python](https://www.python.org/) versions claimed supported by youtube-dl, namely 2.6, 2.7, and 3.2+.
-10. When the tests pass, [add](https://git-scm.com/docs/git-add) the new files and [commit](https://git-scm.com/docs/git-commit) them and [push](https://git-scm.com/docs/git-push) the result, like this:
-
-        $ git add youtube_dl/extractor/extractors.py
-        $ git add youtube_dl/extractor/yourextractor.py
-        $ git commit -m '[yourextractor] Add new extractor'
-        $ git push origin yourextractor
-
-11. Finally, [create a pull request](https://help.github.com/articles/creating-a-pull-request). We'll then review and merge it.
-
-In any case, thank you very much for your contributions!
-
-## youtube-dl coding conventions
-
-This section introduces guidelines for writing idiomatic, robust and future-proof extractor code.
-
-Extractors are very fragile by nature, since they depend on the layout of the source data provided by 3rd party media hosters, which is out of your control and tends to change. As an extractor implementer, your task is not only to write code that extracts media links and metadata correctly, but also to minimize dependency on the source's layout and even to anticipate potential future changes. This is important because it allows the extractor to survive minor layout changes, thus keeping old youtube-dl versions working. Even though such breakage is easily fixed by releasing a new version of youtube-dl with the fix incorporated, all previous versions remain broken in all repositories and distros' packages, which may not be so prompt in fetching the update from us. Needless to say, some non-rolling-release distros may never receive an update at all.
-
-### Mandatory and optional metafields
-
-For extraction to work youtube-dl relies on metadata your extractor extracts and provides to youtube-dl expressed by an [information dictionary](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L94-L303) or simply *info dict*. Only the following meta fields in the *info dict* are considered mandatory for a successful extraction process by youtube-dl:
-
- - `id` (media identifier)
- - `title` (media title)
- - `url` (media download URL) or `formats`
-
-In fact only the last option is technically mandatory (i.e. if you can't figure out the download location of the media the extraction does not make any sense). But by convention youtube-dl also treats `id` and `title` as mandatory. Thus the aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken.
-
-[Any field](https://github.com/ytdl-org/youtube-dl/blob/7f41a598b3fba1bcab2817de64a08941200aa3c8/youtube_dl/extractor/common.py#L188-L303) apart from the aforementioned ones is considered **optional**. That means that extraction should be **tolerant** of situations where sources for these fields may be unavailable (even if they are always available at the moment) and **future-proof**, so that a missing optional field does not break the extraction of the mandatory fields.
-
-#### Example
-
-Say you have some source dictionary `meta` that you've fetched as JSON with an HTTP request and it has a key `summary`:
-
-```python
-meta = self._download_json(url, video_id)
-```
-
-Assume at this point `meta`'s layout is:
-
-```python
-{
-    ...
-    "summary": "some fancy summary text",
-    ...
-}
-```
-
-Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field, you should be prepared for this key to be missing from the `meta` dict, so you should extract it like:
-
-```python
-description = meta.get('summary') # correct
-```
-
-and not like:
-
-```python
-description = meta['summary'] # incorrect
-```
-
-The latter will break the extraction process with a `KeyError` if `summary` disappears from `meta` at some later time, but with the former approach extraction will just go ahead with `description` set to `None`, which is perfectly fine (remember that `None` is equivalent to the absence of data).
-
-Similarly, you should pass `fatal=False` when extracting optional data from a webpage with `_search_regex`, `_html_search_regex` or similar methods, for instance:
-
-```python
-description = self._search_regex(
-    r'<span[^>]+id="title"[^>]*>([^<]+)<',
-    webpage, 'description', fatal=False)
-```
-
-With `fatal` set to `False`, if `_search_regex` fails to extract `description`, it will emit a warning and continue extraction.
-
-You can also pass `default=<some fallback value>`, for example:
-
-```python
-description = self._search_regex(
-    r'<span[^>]+id="title"[^>]*>([^<]+)<',
-    webpage, 'description', default=None)
-```
-
-On failure this code will silently continue the extraction with `description` set to `None`. That is useful for metafields that may or may not be present.
-
-### Provide fallbacks
-
-When extracting metadata try to do so from multiple sources. For example if `title` is present in several places, try extracting from at least some of them. This makes it more future-proof in case some of the sources become unavailable.
-
-#### Example
-
-Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
-
-```python
-title = meta['title']
-```
-
-If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
-
-Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
-
-```python
-title = meta.get('title') or self._og_search_title(webpage)
-```
-
-This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
-
-### Regular expressions
-
-#### Don't capture groups you don't use
-
-A capturing group must indicate that it is used somewhere in the code. Any group that is not used must be non-capturing.
-
-##### Example
-
-Don't capture the id attribute name here, since you can't use it for anything anyway.
-
-Correct:
-
-```python
-r'(?:id|ID)=(?P<id>\d+)'
-```
-
-Incorrect:
-```python
-r'(id|ID)=(?P<id>\d+)'
-```
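
The difference is easy to see outside youtube-dl with the standard `re` module. In this small sketch, the input string `ID=12345` is made up for illustration:

```python
import re

url_part = 'ID=12345'

capturing = re.search(r'(id|ID)=(?P<id>\d+)', url_part)
non_capturing = re.search(r'(?:id|ID)=(?P<id>\d+)', url_part)

# The capturing variant carries an extra, unused group along.
print(capturing.groups())
print(non_capturing.groups())

# The named group that the code actually needs is the same in both cases.
print(capturing.group('id') == non_capturing.group('id'))
```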
-
-
-#### Make regular expressions relaxed and flexible
-
-When using regular expressions try to write them fuzzy, relaxed and flexible, skipping insignificant parts that are more likely to change, allowing both single and double quotes for quoted values and so on.
-
-##### Example
-
-Say you need to extract `title` from the following HTML code:
-
-```html
-<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>
-```
-
-The code for that task should look similar to:
-
-```python
-title = self._search_regex(
-    r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
-```
-
-Or even better:
-
-```python
-title = self._search_regex(
-    r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)',
-    webpage, 'title', group='title')
-```
-
-Note how you tolerate potential changes in the `style` attribute's value or a switch from double quotes to single quotes for the `class` attribute.
-
-The code definitely should not look like:
-
-```python
-title = self._search_regex(
-    r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>',
-    webpage, 'title')
-```
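
To see why the relaxed pattern is more future-proof, both patterns can be tried against the original markup and a slightly changed variant using plain `re`. The `changed` markup below is a hypothetical redesign invented for this sketch:

```python
import re

RELAXED = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'
RIGID = r'<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">(.*?)</span>'

original = '<span style="position: absolute; left: 910px; width: 90px; float: right; z-index: 9999;" class="title">some fancy title</span>'
# Hypothetical redesign: different style value, single quotes around the class.
changed = "<span style=\"left: 900px;\" class='title'>some fancy title</span>"

# The relaxed pattern finds the title in both versions of the markup.
relaxed_titles = [re.search(RELAXED, html).group('title') for html in (original, changed)]
# The rigid pattern only matches the exact original markup.
rigid_matches = [re.search(RIGID, html) is not None for html in (original, changed)]

print(relaxed_titles)
print(rigid_matches)
```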
-
-### Long lines policy
-
-There is a soft limit to keep lines of code under 80 characters long. This means it should be respected if possible and if it does not make readability and code maintenance worse.
-
-For example, you should **never** split long string literals like URLs or some other often copied entities over multiple lines to fit this limit:
-
-Correct:
-
-```python
-'https://www.youtube.com/watch?v=FqZTN594JQw&list=PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
-```
-
-Incorrect:
-
-```python
-'https://www.youtube.com/watch?v=FqZTN594JQw&list='
-'PLMYEtVRpaqY00V9W81Cwmzp6N6vZqfUKD4'
-```
-
-### Inline values
-
-Extracting variables is acceptable for reducing code duplication and improving readability of complex expressions. However, you should avoid extracting variables used only once and moving them to opposite parts of the extractor file, which makes reading the linear flow difficult.
-
-#### Example
-
-Correct:
-
-```python
-title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
-```
-
-Incorrect:
-
-```python
-TITLE_RE = r'<title>([^<]+)</title>'
-# ...some lines of code...
-title = self._html_search_regex(TITLE_RE, webpage, 'title')
-```
-
-### Collapse fallbacks
-
-Multiple fallback values can quickly become unwieldy. Collapse multiple fallback values into a single expression via a list of patterns.
-
-#### Example
-
-Good:
-
-```python
-description = self._html_search_meta(
-    ['og:description', 'description', 'twitter:description'],
-    webpage, 'description', default=None)
-```
-
-Unwieldy:
-
-```python
-description = (
-    self._og_search_description(webpage, default=None)
-    or self._html_search_meta('description', webpage, default=None)
-    or self._html_search_meta('twitter:description', webpage, default=None))
-```
-
-Methods supporting a list of patterns are: `_search_regex`, `_html_search_regex`, `_og_search_property`, `_html_search_meta`.
-
-### Trailing parentheses
-
-Always move trailing parentheses after the last argument.
-
-#### Example
-
-Correct:
-
-```python
-    lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
-    list)
-```
-
-Incorrect:
-
-```python
-    lambda x: x['ResultSet']['Result'][0]['VideoUrlSet']['VideoUrl'],
-    list,
-)
-```
-
-### Use convenience conversion and parsing functions
-
-Wrap all extracted numeric data into safe functions from [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py): `int_or_none`, `float_or_none`. Use them for string to number conversions as well.
-
-Use `url_or_none` for safe URL processing.
-
-Use `try_get` for safe metadata extraction from parsed JSON.
-
-Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
-
-Explore [`youtube_dl/utils.py`](https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/utils.py) for more useful convenience functions.
-
-#### More examples
-
-##### Safely extract optional description from parsed JSON
-```python
-description = try_get(response, lambda x: x['result']['video'][0]['summary'], compat_str)
-```
-
-##### Safely extract more optional metadata
-```python
-video = try_get(response, lambda x: x['result']['video'][0], dict) or {}
-description = video.get('summary')
-duration = float_or_none(video.get('durationMs'), scale=1000)
-view_count = int_or_none(video.get('views'))
-```
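
To illustrate what "safe" means here without importing youtube-dl, the sketch below defines two simplified stand-ins. `safe_get` and `safe_int` are hypothetical names that only approximate the behavior of `try_get` and `int_or_none`; they are not the implementations from `youtube_dl/utils.py`, and the `response` data is made up:

```python
def safe_get(src, getter, expected_type=None):
    """Apply getter to src; return None on lookup errors or a type mismatch."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def safe_int(value):
    """Convert value to int, returning None if it is missing or malformed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


# Hypothetical parsed JSON response, for illustration only.
response = {'result': {'video': [{'summary': 'text', 'views': '1024'}]}}
video = safe_get(response, lambda x: x['result']['video'][0], dict) or {}
print(video.get('summary'), safe_int(video.get('views')), safe_int(video.get('likes')))
```

In real extractor code, prefer the helpers from `youtube_dl/utils.py` over hand-written equivalents like these.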
-
-# EMBEDDING YOUTUBE-DL
-
-youtube-dl makes the best effort to be a good command-line program, and thus should be callable from any programming language. If you encounter any problems parsing its output, feel free to [create a report](https://github.com/ytdl-org/youtube-dl/issues/new).
-
-From a Python program, you can embed youtube-dl in a more powerful fashion, like this:
-
-```python
-from __future__ import unicode_literals
-import youtube_dl
-
-ydl_opts = {}
-with youtube_dl.YoutubeDL(ydl_opts) as ydl:
-    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
-```
-
-Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/ytdl-org/youtube-dl/blob/3e4cedf9e8cd3157df2457df7274d0c842421945/youtube_dl/YoutubeDL.py#L137-L312). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
-
-Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
-
-```python
-from __future__ import unicode_literals
-import youtube_dl
-
-
-class MyLogger(object):
-    def debug(self, msg):
-        pass
-
-    def warning(self, msg):
-        pass
-
-    def error(self, msg):
-        print(msg)
-
-
-def my_hook(d):
-    if d['status'] == 'finished':
-        print('Done downloading, now converting ...')
-
-
-ydl_opts = {
-    'format': 'bestaudio/best',
-    'postprocessors': [{
-        'key': 'FFmpegExtractAudio',
-        'preferredcodec': 'mp3',
-        'preferredquality': '192',
-    }],
-    'logger': MyLogger(),
-    'progress_hooks': [my_hook],
-}
-with youtube_dl.YoutubeDL(ydl_opts) as ydl:
-    ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])
-```
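
A progress hook can do more than react to the `finished` status. The sketch below is a hedged illustration rather than part of youtube-dl: `describe_progress` and `my_progress_hook` are hypothetical names, and the keys other than `status` (`downloaded_bytes`, `total_bytes`, `total_bytes_estimate`, `filename`) are assumed to follow the `progress_hooks` description in `youtube_dl/YoutubeDL.py`. Any of them may be absent, so the code reads them defensively.

```python
def describe_progress(d):
    """Return a one-line summary for a progress-hook dictionary."""
    status = d.get('status')
    if status == 'downloading':
        done = d.get('downloaded_bytes')
        # The exact total is not always known; fall back to the estimate.
        total = d.get('total_bytes') or d.get('total_bytes_estimate')
        if done is not None and total:
            return 'Downloading: %.1f%%' % (100.0 * done / total)
        return 'Downloading ...'
    if status == 'finished':
        return 'Finished: %s' % d.get('filename', '<unknown>')
    if status == 'error':
        return 'Download failed'
    return 'Unknown status: %r' % status


def my_progress_hook(d):
    # Thin wrapper so the formatting logic stays easy to test on its own.
    print(describe_progress(d))
```

Such a hook would be registered the same way as `my_hook` above, by listing it under `'progress_hooks'` in the options dictionary.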
-
-# BUGS
-
-Bugs and suggestions should be reported at: <https://github.com/ytdl-org/youtube-dl/issues>. Unless you were prompted to or there is another pertinent reason (e.g. GitHub fails to accept the bug report), please do not send bug reports via personal email. For discussions, join us in the IRC channel [#youtube-dl](irc://chat.freenode.net/#youtube-dl) on freenode ([webchat](https://webchat.freenode.net/?randomnick=1&channels=youtube-dl)).
-
-**Please include the full output of youtube-dl when run with `-v`**, i.e. **add** `-v` flag to **your command line**, copy the **whole** output and post it in the issue body wrapped in \`\`\` for better formatting. It should look similar to this:
-```
-$ youtube-dl -v <your command line>
-[debug] System config: []
-[debug] User config: []
-[debug] Command-line args: [u'-v', u'https://www.youtube.com/watch?v=BaW_jenozKcj']
-[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2015.12.06
-[debug] Git HEAD: 135392e
-[debug] Python version 2.6.6 - Windows-2003Server-5.2.3790-SP2
-[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
-[debug] Proxy map: {}
-...
-```
-**Do not post screenshots of verbose logs; only plain text is acceptable.**
-
-The output (including the first lines) contains important debugging information. Issues without the full output are often not reproducible and therefore do not get solved in short order, if ever.
-
-Please re-read your issue once again to avoid a couple of common mistakes (you can and should use this as a checklist):
-
-### Is the description of the issue itself sufficient?
-
-We often get issue reports that we cannot really decipher. While in most cases we eventually get the required information after asking back multiple times, this poses an unnecessary drain on our resources. Many contributors, including myself, are also not native speakers, so we may misread some parts.
-
-So please elaborate on what feature you are requesting, or what bug you want to be fixed. Make sure that it's obvious
-
-- What the problem is
-- How it could be fixed
-- What your proposed solution would look like
-
-If your report is shorter than two lines, it is almost certainly missing some of these, which makes it hard for us to respond to it. We're often too polite to close the issue outright, but the missing info makes misinterpretation likely. As a committer myself, I often get frustrated by these issues, since the only possible way for me to move forward on them is to ask for clarification over and over.
-
-For bug reports, this means that your report should contain the *complete* output of youtube-dl when called with the `-v` flag. The error message you get for (most) bugs even says so, but you would not believe how many of our bug reports do not contain this information.
-
-If your server has multiple IPs or you suspect censorship, adding `--call-home` may be a good idea to get more diagnostics. If the error is `ERROR: Unable to extract ...` and you cannot reproduce it from multiple countries, add `--dump-pages` (warning: this will yield a rather large output, redirect it to the file `log.txt` by adding `>log.txt 2>&1` to your command-line) or upload the `.dump` files you get when you add `--write-pages` [somewhere](https://gist.github.com/).
-
-**Site support requests must contain an example URL**. An example URL is a URL you might want to download, like `https://www.youtube.com/watch?v=BaW_jenozKc`. There should be an obvious video present. Except under very special circumstances, the main page of a video service (e.g. `https://www.youtube.com/`) is *not* an example URL.
-
-### Are you using the latest version?
-
-Before reporting any issue, type `youtube-dl -U`. This should report that you're up-to-date. About 20% of the reports we receive are already fixed, but people are using outdated versions. This goes for feature requests as well.
-
-### Is the issue already documented?
-
-Make sure that someone has not already opened the issue you're trying to open. Search at the top of the window or browse the [GitHub Issues](https://github.com/ytdl-org/youtube-dl/search?type=Issues) of this repository. If there is an issue, feel free to write something along the lines of "This affects me as well, with version 2015.01.01. Here is some more information on the issue: ...". While some issues may be old, a new post into them often spurs rapid activity.
-
-### Why are existing options not enough?
-
-Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
-
-### Is there enough context in your bug report?
-
-People want to solve problems, and often think they do us a favor by breaking down their larger problems (e.g. wanting to skip already downloaded files) to a specific request (e.g. requesting us to look whether the file exists before downloading the info page). However, what often happens is that they break down the problem into two steps: One simple, and one impossible (or extremely complicated one).
-
-We are then presented with a very complicated request when the original problem could be solved far easier, e.g. by recording the downloaded video IDs in a separate file. To avoid this, you must include the greater context where it is non-obvious. In particular, every feature request that does not consist of adding support for a new site should contain a use case scenario that explains in what situation the missing feature would be useful.
-
-### Does the issue involve one problem, and one problem only?
-
-Some of our users seem to think there is a limit on the number of issues they can or should open. There is no such limit. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
-
-In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for Vimeo user videos, White House podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
-
-### Is anyone going to need the feature?
-
-Only post features that you (or an incapacitated friend you can personally talk to) require. Do not post features because they seem like a good idea. If they are really useful, they will be requested by someone who requires them.
-
-### Is your question about youtube-dl?
-
-It may sound strange, but some bug reports we receive are completely unrelated to youtube-dl and relate to a different, or even the reporter's own, application. Please make sure that you are actually using youtube-dl. If you are using a UI for youtube-dl, report the bug to the maintainer of the actual application providing the UI. On the other hand, if your UI for youtube-dl fails in some way you believe is related to youtube-dl, by all means, go ahead and report the bug.
-
-# COPYRIGHT
-
-youtube-dl is released into the public domain by the copyright holders.
-
-This README file was originally written by [Daniel Bolton](https://github.com/dbbolton) and is likewise released into the public domain.
+++ /dev/null
-#!/usr/bin/env python
-
-import youtube_dl
-
-if __name__ == '__main__':
- youtube_dl.main()
-__youtube_dl()
+__youtube_dlc()
{
local cur prev opts fileopts diropts keywords
COMPREPLY=()
fi
}
-complete -F __youtube_dl youtube-dl
+complete -F __youtube_dlc youtube-dlc
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
-import youtube_dl
+import youtube_dlc
-BASH_COMPLETION_FILE = "youtube-dl.bash-completion"
+BASH_COMPLETION_FILE = "youtube-dlc.bash-completion"
BASH_COMPLETION_TEMPLATE = "devscripts/bash-completion.in"
f.write(filled_template)
-parser = youtube_dl.parseOpts()[0]
+parser = youtube_dlc.parseOpts()[0]
build_completion(parser)
import os.path
sys.path.insert(0, os.path.dirname(os.path.dirname((os.path.abspath(__file__)))))
-from youtube_dl.compat import (
+from youtube_dlc.compat import (
compat_input,
compat_http_server,
compat_str,
authorizedUsers = ['fraca7', 'phihag', 'rg3', 'FiloSottile', 'ytdl-org']
def __init__(self, **kwargs):
- if self.repoName != 'youtube-dl':
+ if self.repoName != 'youtube-dlc':
raise BuildError('Invalid repository "%s"' % self.repoName)
if self.user not in self.authorizedUsers:
raise HTTPError('Unauthorized user "%s"' % self.user, 401)
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import gettestcases
-from youtube_dl.utils import compat_urllib_parse_urlparse
-from youtube_dl.utils import compat_urllib_request
+from youtube_dlc.utils import compat_urllib_parse_urlparse
+from youtube_dlc.utils import compat_urllib_request
if len(sys.argv) > 1:
METHOD = 'LIST'
#!/usr/bin/env python
from __future__ import unicode_literals
-import base64
import io
import json
import mimetypes
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.compat import (
+from youtube_dlc.compat import (
compat_basestring,
- compat_input,
compat_getpass,
compat_print,
compat_urllib_request,
)
-from youtube_dl.utils import (
+from youtube_dlc.utils import (
make_HTTPS_handler,
sanitized_Request,
)
try:
info = netrc.netrc().authenticators(self._NETRC_MACHINE)
if info is not None:
- self._username = info[0]
- self._password = info[2]
+ self._token = info[2]
compat_print('Using GitHub credentials found in .netrc...')
return
else:
compat_print('No GitHub credentials found in .netrc')
except (IOError, netrc.NetrcParseError):
compat_print('Unable to parse .netrc')
- self._username = compat_input(
- 'Type your GitHub username or email address and press [Return]: ')
- self._password = compat_getpass(
- 'Type your GitHub password and press [Return]: ')
+ self._token = compat_getpass(
+ 'Type your GitHub PAT (personal access token) and press [Return]: ')
def _call(self, req):
if isinstance(req, compat_basestring):
req = sanitized_Request(req)
- # Authorizing manually since GitHub does not response with 401 with
- # WWW-Authenticate header set (see
- # https://developer.github.com/v3/#basic-authentication)
- b64 = base64.b64encode(
- ('%s:%s' % (self._username, self._password)).encode('utf-8')).decode('ascii')
- req.add_header('Authorization', 'Basic %s' % b64)
+ req.add_header('Authorization', 'token %s' % self._token)
response = self._opener.open(req).read().decode('utf-8')
return json.loads(response)
releaser = GitHubReleaser()
new_release = releaser.create_release(
- version, name='youtube-dl %s' % version, body=body)
+ version, name='youtube-dlc %s' % version, body=body)
release_id = new_release['id']
for asset in os.listdir(build_path):
{{commands}}
-complete --command youtube-dl --arguments ":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater :ythistory"
+complete --command youtube-dlc --arguments ":ytfavorites :ytrecommended :ytsubscriptions :ytwatchlater :ythistory"
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
-import youtube_dl
-from youtube_dl.utils import shell_quote
+import youtube_dlc
+from youtube_dlc.utils import shell_quote
-FISH_COMPLETION_FILE = 'youtube-dl.fish'
+FISH_COMPLETION_FILE = 'youtube-dlc.fish'
FISH_COMPLETION_TEMPLATE = 'devscripts/fish-completion.in'
EXTRA_ARGS = {
for group in opt_parser.option_groups:
for option in group.option_list:
long_option = option.get_opt_string().strip('-')
- complete_cmd = ['complete', '--command', 'youtube-dl', '--long-option', long_option]
+ complete_cmd = ['complete', '--command', 'youtube-dlc', '--long-option', long_option]
if option._short_opts:
complete_cmd += ['--short-option', option._short_opts[0].strip('-')]
if option.help != optparse.SUPPRESS_HELP:
f.write(filled_template)
-parser = youtube_dl.parseOpts()[0]
+parser = youtube_dlc.parseOpts()[0]
build_completion(parser)
import sys
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.utils import intlist_to_bytes
-from youtube_dl.aes import aes_encrypt, key_expansion
+from youtube_dlc.utils import intlist_to_bytes
+from youtube_dlc.aes import aes_encrypt, key_expansion
secret_msg = b'Secret message goes here'
new_version = {}
filenames = {
- 'bin': 'youtube-dl',
- 'exe': 'youtube-dl.exe',
- 'tar': 'youtube-dl-%s.tar.gz' % version}
+ 'bin': 'youtube-dlc',
+ 'exe': 'youtube-dlc.exe',
+ 'tar': 'youtube-dlc-%s.tar.gz' % version}
build_dir = os.path.join('..', '..', 'build', version)
for key, filename in filenames.items():
url = 'https://yt-dl.org/downloads/%s/%s' % (version, filename)
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<link rel="self" href="http://ytdl-org.github.io/youtube-dl/update/releases.atom" />
- <title>youtube-dl releases</title>
- <id>https://yt-dl.org/feed/youtube-dl-updates-feed</id>
+ <title>youtube-dlc releases</title>
+ <id>https://yt-dl.org/feed/youtube-dlc-updates-feed</id>
<updated>@TIMESTAMP@</updated>
@ENTRIES@
</feed>""")
entry_template = textwrap.dedent("""
<entry>
- <id>https://yt-dl.org/feed/youtube-dl-updates-feed/youtube-dl-@VERSION@</id>
+ <id>https://yt-dl.org/feed/youtube-dlc-updates-feed/youtube-dlc-@VERSION@</id>
<title>New version @VERSION@</title>
- <link href="http://ytdl-org.github.io/youtube-dl" />
+ <link href="http://ytdl-org.github.io/youtube-dlc" />
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
Downloads available at <a href="https://yt-dl.org/downloads/@VERSION@/">https://yt-dl.org/downloads/@VERSION@/</a>
</div>
</content>
<author>
- <name>The youtube-dl maintainers</name>
+ <name>The youtube-dlc maintainers</name>
</author>
<updated>@TIMESTAMP@</updated>
</entry>
import os
import textwrap
-# We must be able to import youtube_dl
+# We must be able to import youtube_dlc
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))
-import youtube_dl
+import youtube_dlc
def main():
template = tmplf.read()
ie_htmls = []
- for ie in youtube_dl.list_extractors(age_limit=None):
+ for ie in youtube_dlc.list_extractors(age_limit=None):
ie_html = '<b>{}</b>'.format(ie.IE_NAME)
ie_desc = getattr(ie, 'IE_DESC', None)
if ie_desc is False:
#!/usr/bin/env python
from __future__ import unicode_literals
-import io
+# import io
import optparse
-import re
+# import re
def main():
if len(args) != 2:
parser.error('Expected an input and an output filename')
- infile, outfile = args
+
+""" infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
readme = inf.read()
- bug_text = re.search(
- r'(?s)#\s*BUGS\s*[^\n]*\s*(.*?)#\s*COPYRIGHT', readme).group(1)
- dev_text = re.search(
- r'(?s)(#\s*DEVELOPER INSTRUCTIONS.*?)#\s*EMBEDDING YOUTUBE-DL',
- readme).group(1)
+ bug_text = re.search( """
+# r'(?s)#\s*BUGS\s*[^\n]*\s*(.*?)#\s*COPYRIGHT', readme).group(1)
+# dev_text = re.search(
+# r'(?s)(#\s*DEVELOPER INSTRUCTIONS.*?)#\s*EMBEDDING youtube-dlc',
+""" readme).group(1)
out = bug_text + dev_text
with io.open(outfile, 'w', encoding='utf-8') as outf:
- outf.write(out)
-
+ outf.write(out) """
if __name__ == '__main__':
main()
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
- # Get the version from youtube_dl/version.py without importing the package
- exec(compile(open('youtube_dl/version.py').read(),
- 'youtube_dl/version.py', 'exec'))
+ # Get the version from youtube_dlc/version.py without importing the package
+ exec(compile(open('youtube_dlc/version.py').read(),
+ 'youtube_dlc/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename)
-from youtube_dl.extractor import _ALL_CLASSES
-from youtube_dl.extractor.common import InfoExtractor, SearchInfoExtractor
+from youtube_dlc.extractor import _ALL_CLASSES
+from youtube_dlc.extractor.common import InfoExtractor, SearchInfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read()
oldreadme = f.read()
header = oldreadme[:oldreadme.index('# OPTIONS')]
-footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
+# footer = oldreadme[oldreadme.index('# CONFIGURATION'):]
options = helptext[helptext.index(' General Options:') + 19:]
options = re.sub(r'(?m)^ (\w.+)$', r'## \1', options)
with io.open(README_FILE, 'w', encoding='utf-8') as f:
f.write(header)
f.write(options)
- f.write(footer)
+ # f.write(footer)
import sys
-# Import youtube_dl
+# Import youtube_dlc
ROOT_DIR = os.path.join(os.path.dirname(__file__), '..')
sys.path.insert(0, ROOT_DIR)
-import youtube_dl
+import youtube_dlc
def main():
ie_md += ' (Currently broken)'
yield ie_md
- ies = sorted(youtube_dl.gen_extractors(), key=lambda i: i.IE_NAME.lower())
+ ies = sorted(youtube_dlc.gen_extractors(), key=lambda i: i.IE_NAME.lower())
out = '# Supported sites\n' + ''.join(
' - ' + md + '\n'
for md in gen_ies_md(ies))
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
README_FILE = os.path.join(ROOT_DIR, 'README.md')
-PREFIX = r'''%YOUTUBE-DL(1)
+PREFIX = r'''%youtube-dlc(1)
# NAME
# SYNOPSIS
-**youtube-dl** \[OPTIONS\] URL [URL...]
+**youtube-dlc** \[OPTIONS\] URL [URL...]
'''
readme = f.read()
readme = re.sub(r'(?s)^.*?(?=# DESCRIPTION)', '', readme)
- readme = re.sub(r'\s+youtube-dl \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
+ readme = re.sub(r'\s+youtube-dlc \[OPTIONS\] URL \[URL\.\.\.\]', '', readme)
readme = PREFIX + readme
readme = filter_options(readme)
if [ ! -z "`git tag | grep "$version"`" ]; then echo 'ERROR: version already present'; exit 1; fi
if [ ! -z "`git status --porcelain | grep -v CHANGELOG`" ]; then echo 'ERROR: the working directory is not clean; commit or stash changes'; exit 1; fi
-useless_files=$(find youtube_dl -type f -not -name '*.py')
-if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dl: $useless_files"; exit 1; fi
+useless_files=$(find youtube_dlc -type f -not -name '*.py')
+if [ ! -z "$useless_files" ]; then echo "ERROR: Non-.py files in youtube_dlc: $useless_files"; exit 1; fi
if [ ! -f "updates_key.pem" ]; then echo 'ERROR: updates_key.pem missing'; exit 1; fi
if ! type pandoc >/dev/null 2>/dev/null; then echo 'ERROR: pandoc is missing'; exit 1; fi
if ! python3 -c 'import rsa' 2>/dev/null; then echo 'ERROR: python3-rsa is missing'; exit 1; fi
if $skip_tests ; then
echo 'SKIPPING TESTS'
else
- nosetests --verbose --with-coverage --cover-package=youtube_dl --cover-html test --stop || exit 1
+ nosetests --verbose --with-coverage --cover-package=youtube_dlc --cover-html test --stop || exit 1
fi
/bin/echo -e "\n### Changing version in version.py..."
-sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
+sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dlc/version.py
/bin/echo -e "\n### Changing version in ChangeLog..."
sed -i "s/<unreleased>/$version/" ChangeLog
-/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
+/bin/echo -e "\n### Committing documentation, templates and youtube_dlc/version.py..."
make README.md CONTRIBUTING.md issuetemplates supportedsites
-git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE/1_broken_site.md .github/ISSUE_TEMPLATE/2_site_support_request.md .github/ISSUE_TEMPLATE/3_site_feature_request.md .github/ISSUE_TEMPLATE/4_bug_report.md .github/ISSUE_TEMPLATE/5_feature_request.md .github/ISSUE_TEMPLATE/6_question.md docs/supportedsites.md youtube_dl/version.py ChangeLog
+git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE/1_broken_site.md .github/ISSUE_TEMPLATE/2_site_support_request.md .github/ISSUE_TEMPLATE/3_site_feature_request.md .github/ISSUE_TEMPLATE/4_bug_report.md .github/ISSUE_TEMPLATE/5_feature_request.md .github/ISSUE_TEMPLATE/6_question.md docs/supportedsites.md youtube_dlc/version.py ChangeLog
git commit $gpg_sign_commits -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."
/bin/echo -e "\n### OK, now it is time to build the binaries..."
REV=$(git rev-parse HEAD)
-make youtube-dl youtube-dl.tar.gz
+make youtube-dlc youtube-dlc.tar.gz
read -p "VM running? (y/n) " -n 1
-wget "http://$buildserver/build/ytdl-org/youtube-dl/youtube-dl.exe?rev=$REV" -O youtube-dl.exe
+wget "http://$buildserver/build/ytdl-org/youtube-dl/youtube-dlc.exe?rev=$REV" -O youtube-dlc.exe
mkdir -p "build/$version"
-mv youtube-dl youtube-dl.exe "build/$version"
-mv youtube-dl.tar.gz "build/$version/youtube-dl-$version.tar.gz"
-RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
+mv youtube-dlc youtube-dlc.exe "build/$version"
+mv youtube-dlc.tar.gz "build/$version/youtube-dlc-$version.tar.gz"
+RELEASE_FILES="youtube-dlc youtube-dlc.exe youtube-dlc-$version.tar.gz"
(cd build/$version/ && md5sum $RELEASE_FILES > MD5SUMS)
(cd build/$version/ && sha1sum $RELEASE_FILES > SHA1SUMS)
(cd build/$version/ && sha256sum $RELEASE_FILES > SHA2-256SUMS)
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.compat import (
+from youtube_dlc.compat import (
compat_print,
compat_urllib_request,
)
-from youtube_dl.utils import format_bytes
+from youtube_dlc.utils import format_bytes
def format_size(bytes):
asset_name = asset['name']
total_bytes += asset['download_count'] * asset['size']
if all(not re.match(p, asset_name) for p in (
- r'^youtube-dl$',
- r'^youtube-dl-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
- r'^youtube-dl\.exe$')):
+ r'^youtube-dlc$',
+ r'^youtube-dlc-\d{4}\.\d{2}\.\d{2}(?:\.\d+)?\.tar\.gz$',
+ r'^youtube-dlc\.exe$')):
continue
compat_print(
' %s size: %s downloads: %d'
-#compdef youtube-dl
+#compdef youtube-dlc
-__youtube_dl() {
+__youtube_dlc() {
local curcontext="$curcontext" fileopts diropts cur prev
typeset -A opt_args
fileopts="{{fileopts}}"
esac
}
-__youtube_dl
\ No newline at end of file
+__youtube_dlc
\ No newline at end of file
import sys
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
-import youtube_dl
+import youtube_dlc
-ZSH_COMPLETION_FILE = "youtube-dl.zsh"
+ZSH_COMPLETION_FILE = "youtube-dlc.zsh"
ZSH_COMPLETION_TEMPLATE = "devscripts/zsh-completion.in"
f.write(template)
-parser = youtube_dl.parseOpts()[0]
+parser = youtube_dlc.parseOpts()[0]
build_completion(parser)
@echo
@echo "Build finished; now you can run "qcollectiongenerator" with the" \
".qhcp project file in $(BUILDDIR)/qthelp, like this:"
- @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/youtube-dl.qhcp"
+ @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/youtube-dlc.qhcp"
@echo "To view the help file:"
- @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/youtube-dl.qhc"
+ @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/youtube-dlc.qhc"
devhelp:
$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
@echo
@echo "Build finished."
@echo "To view the help file:"
- @echo "# mkdir -p $$HOME/.local/share/devhelp/youtube-dl"
- @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/youtube-dl"
+ @echo "# mkdir -p $$HOME/.local/share/devhelp/youtube-dlc"
+ @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/youtube-dlc"
@echo "# devhelp"
epub:
# coding: utf-8
#
-# youtube-dl documentation build configuration file, created by
+# youtube-dlc documentation build configuration file, created by
# sphinx-quickstart on Fri Mar 14 21:05:43 2014.
#
# This file is execfile()d with the current directory set to its
import sys
import os
-# Allows to import youtube_dl
+# Allows to import youtube_dlc
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
# -- General configuration ------------------------------------------------
master_doc = 'index'
# General information about the project.
-project = u'youtube-dl'
+project = u'youtube-dlc'
copyright = u'2014, Ricardo Garcia Gonzalez'
# The version info for the project you're documenting, acts as replacement for
# built documents.
#
# The short X.Y version.
-from youtube_dl.version import __version__
+from youtube_dlc.version import __version__
version = __version__
# The full version, including alpha/beta/rc tags.
release = version
html_static_path = ['_static']
# Output file base name for HTML help builder.
-htmlhelp_basename = 'youtube-dldoc'
+htmlhelp_basename = 'youtube-dlcdoc'
-Welcome to youtube-dl's documentation!
+Welcome to youtube-dlc's documentation!
======================================
-*youtube-dl* is a command-line program to download videos from YouTube.com and more sites.
+*youtube-dlc* is a command-line program to download videos from YouTube.com and more sites.
It can also be used in Python code.
Developer guide
---------------
-This section contains information for using *youtube-dl* from Python programs.
+This section contains information for using *youtube-dlc* from Python programs.
.. toctree::
:maxdepth: 2
-Using the ``youtube_dl`` module
+Using the ``youtube_dlc`` module
===============================
-When using the ``youtube_dl`` module, you start by creating an instance of :class:`YoutubeDL` and adding all the available extractors:
+When using the ``youtube_dlc`` module, you start by creating an instance of :class:`YoutubeDL` and adding all the available extractors:
.. code-block:: python
- >>> from youtube_dl import YoutubeDL
+ >>> from youtube_dlc import YoutubeDL
>>> ydl = YoutubeDL()
>>> ydl.add_default_info_extractors()
[youtube] BaW_jenozKc: Downloading video info webpage
[youtube] BaW_jenozKc: Extracting video information
>>> info['title']
- 'youtube-dl test video "\'/\\ä↭𝕐'
+ 'youtube-dlc test video "\'/\\ä↭𝕐'
>>> info['height'], info['width']
(720, 1280)
- **23video**
- **24video**
- **3qsdn**: 3Q SDN
- - **3sat**
- **4tube**
- **56.com**
- **5min**
- **acast:channel**
- **ADN**: Anime Digital Network
- **AdobeConnect**
- - **AdobeTV**
- - **AdobeTVChannel**
- - **AdobeTVShow**
- - **AdobeTVVideo**
+ - **adobetv**
+ - **adobetv:channel**
+ - **adobetv:embed**
+ - **adobetv:show**
+ - **adobetv:video**
- **AdultSwim**
- **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network and History Vault
- **afreecatv**: afreecatv.com
- **BiliBili**
- **BilibiliAudio**
- **BilibiliAudioAlbum**
+ - **BiliBiliPlayer**
- **BioBioChileTV**
- **BIQLE**
- **BitChute**
- **Disney**
- **dlive:stream**
- **dlive:vod**
+ - **DoodStream**
- **Dotsub**
- **DouyuShow**
- **DouyuTV**: 斗鱼
- **hotstar:playlist**
- **Howcast**
- **HowStuffWorks**
+ - **hrfernsehen**
- **HRTi**
- **HRTiPlaylist**
- **Huajiao**: 花椒直播
- **JeuxVideo**
- **Joj**
- **Jove**
- - **jpopsuki.tv**
- **JWPlatform**
- **Kakao**
- **Kaltura**
- **Kankan**
- **Karaoketv**
- **KarriereVideos**
+ - **Katsomo**
- **KeezMovies**
- **Ketnet**
- **KhanAcademy**
- **KinjaEmbed**
- **KinoPoisk**
- **KonserthusetPlay**
- - **kontrtube**: KontrTube.ru - Труба зовёт
- **KrasView**: Красвью
- **Ku6**
- **KUSI**
- **MNetTV**
- **MoeVideo**: LetitBit video services: moevideo.net, playreplay.net and videochart.net
- **Mofosex**
+ - **MofosexEmbed**
- **Mojvideo**
- **Morningstar**: morningstar.com
- **Motherless**
- **mtvjapan**
- **mtvservices:embedded**
- **MuenchenTV**: münchen.tv
- - **MusicPlayOn**
- **mva**: Microsoft Virtual Academy videos
- **mva:course**: Microsoft Virtual Academy courses
- **Mwave**
- **Ooyala**
- **OoyalaExternal**
- **OraTV**
+ - **orf:burgenland**: Radio Burgenland
- **orf:fm4**: radio FM4
- **orf:fm4:story**: fm4.orf.at stories
- **orf:iptv**: iptv.ORF.at
+ - **orf:kaernten**: Radio Kärnten
+ - **orf:noe**: Radio Niederösterreich
+ - **orf:oberoesterreich**: Radio Oberösterreich
- **orf:oe1**: Radio Österreich 1
+ - **orf:oe3**: Radio Österreich 3
+ - **orf:salzburg**: Radio Salzburg
+ - **orf:steiermark**: Radio Steiermark
+ - **orf:tirol**: Radio Tirol
- **orf:tvthek**: ORF TVthek
+ - **orf:vorarlberg**: Radio Vorarlberg
+ - **orf:wien**: Radio Wien
- **OsnatelTV**
- **OutsideTV**
- **PacktPub**
- **PacktPubCourse**
- - **PandaTV**: 熊猫TV
- **pandora.tv**: 판도라TV
- **ParamountNetwork**
- **parliamentlive.tv**: UK parliament videos
- **plus.google**: Google Plus
- **podomatic**
- **Pokemon**
+ - **PokemonWatch**
- **PolskieRadio**
- **PolskieRadioCategory**
+ - **Popcorntimes**
- **PopcornTV**
- **PornCom**
- **PornerBros**
- **screen.yahoo:search**: Yahoo screen search
- **Screencast**
- **ScreencastOMatic**
+ - **ScrippsNetworks**
- **scrippsnetworks:watch**
- **SCTE**
- **SCTECourse**
- **stanfordoc**: Stanford Open ClassRoom
- **Steam**
- **Stitcher**
+ - **StoryFire**
+ - **StoryFireSeries**
+ - **StoryFireUser**
- **Streamable**
- **streamcloud.eu**
- **StreamCZ**
- **tv2.hu**
- **TV2Article**
- **TV2DK**
+ - **TV2DKBornholmPlay**
- **TV4**: tv4.se and tv4play.se
- **TV5MondePlus**: TV5MONDE+
- **TVA**
- **TVPlayHome**
- **Tweakers**
- **TwitCasting**
- - **twitch:chapter**
- **twitch:clips**
- - **twitch:profile**
- **twitch:stream**
- - **twitch:video**
- - **twitch:videos:all**
- - **twitch:videos:highlights**
- - **twitch:videos:past-broadcasts**
- - **twitch:videos:uploads**
- **twitch:vod**
+ - **TwitchCollection**
+ - **TwitchVideos**
+ - **TwitchVideosClips**
+ - **TwitchVideosCollections**
- **twitter**
- **twitter:amplify**
- **twitter:broadcast**
- **udemy**
- **udemy:course**
- **UDNEmbed**: 聯合影音
+ - **UFCArabia**
- **UFCTV**
- **UKTVPlay**
- **umg:de**: Universal Music Deutschland
- **videomore**
- **videomore:season**
- **videomore:video**
- - **VideoPremium**
- **VideoPress**
- **Vidio**
- **VidLii**
- **Vidzi**
- **vier**: vier.be and vijf.be
- **vier:videos**
- - **ViewLift**
- - **ViewLiftEmbed**
+ - **viewlift**
+ - **viewlift:embed**
- **Viidea**
- **viki**
- **viki:channel**
- **Zaq1**
- **Zattoo**
- **ZattooLive**
- - **ZDF**
+ - **ZDF-3sat**
- **ZDFChannel**
- **zingmp3**: mp3.zing.vn
- **Zype**
--- /dev/null
+pyinstaller.exe youtube_dlc\__main__.py --onefile --name youtube-dlc --version-file win\ver.txt --icon win\icon\cloud.ico
\ No newline at end of file
universal = True
[flake8]
-exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv
+exclude = youtube_dlc/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git,venv
ignore = E402,E501,E731,E741,W503
#!/usr/bin/env python
# coding: utf-8
-from __future__ import print_function
-
+from setuptools import setup, Command, find_packages
import os.path
import warnings
import sys
-
-try:
- from setuptools import setup, Command
- setuptools_available = True
-except ImportError:
- from distutils.core import setup, Command
- setuptools_available = False
from distutils.spawn import spawn
-try:
- # This will create an exe that needs Microsoft Visual C++ 2008
- # Redistributable Package
- import py2exe
-except ImportError:
- if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
- print('Cannot import py2exe', file=sys.stderr)
- exit(1)
-
-py2exe_options = {
- 'bundle_files': 1,
- 'compressed': 1,
- 'optimize': 2,
- 'dist_dir': '.',
- 'dll_excludes': ['w9xpopen.exe', 'crypt32.dll'],
-}
-
-# Get the version from youtube_dl/version.py without importing the package
-exec(compile(open('youtube_dl/version.py').read(),
- 'youtube_dl/version.py', 'exec'))
-
-DESCRIPTION = 'YouTube video downloader'
-LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites'
+# Get the version from youtube_dlc/version.py without importing the package
+exec(compile(open('youtube_dlc/version.py').read(),
+ 'youtube_dlc/version.py', 'exec'))
-py2exe_console = [{
- 'script': './youtube_dl/__main__.py',
- 'dest_base': 'youtube-dl',
- 'version': __version__,
- 'description': DESCRIPTION,
- 'comments': LONG_DESCRIPTION,
- 'product_name': 'youtube-dl',
- 'product_version': __version__,
-}]
-
-py2exe_params = {
- 'console': py2exe_console,
- 'options': {'py2exe': py2exe_options},
- 'zipfile': None
-}
+DESCRIPTION = 'Media downloader supporting various sites such as youtube'
+LONG_DESCRIPTION = 'Command-line program to download videos from YouTube.com and other video sites. Based on a more active community fork.'
if len(sys.argv) >= 2 and sys.argv[1] == 'py2exe':
- params = py2exe_params
+ print("inv")
else:
files_spec = [
- ('etc/bash_completion.d', ['youtube-dl.bash-completion']),
- ('etc/fish/completions', ['youtube-dl.fish']),
- ('share/doc/youtube_dl', ['README.txt']),
- ('share/man/man1', ['youtube-dl.1'])
+ ('etc/bash_completion.d', ['youtube-dlc.bash-completion']),
+ ('etc/fish/completions', ['youtube-dlc.fish']),
+ ('share/doc/youtube_dlc', ['README.txt']),
+ ('share/man/man1', ['youtube-dlc.1'])
]
root = os.path.dirname(os.path.abspath(__file__))
data_files = []
params = {
'data_files': data_files,
}
- if setuptools_available:
- params['entry_points'] = {'console_scripts': ['youtube-dl = youtube_dl:main']}
- else:
- params['scripts'] = ['bin/youtube-dl']
+ #if setuptools_available:
+ params['entry_points'] = {'console_scripts': ['youtube-dlc = youtube_dlc:main']}
+ #else:
+ # params['scripts'] = ['bin/youtube-dlc']
class build_lazy_extractors(Command):
description = 'Build the extractor lazy loading module'
def run(self):
spawn(
- [sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'],
+ [sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dlc/extractor/lazy_extractors.py'],
dry_run=self.dry_run,
)
setup(
- name='youtube_dl',
+ name="youtube_dlc",
version=__version__,
+ maintainer="Tom-Oliver Heidel",
+ maintainer_email="theidel@uni-bremen.de",
description=DESCRIPTION,
long_description=LONG_DESCRIPTION,
- url='https://github.com/ytdl-org/youtube-dl',
- author='Ricardo Garcia',
- author_email='ytdl@yt-dl.org',
- maintainer='Sergey M.',
- maintainer_email='dstftw@gmail.com',
- license='Unlicense',
- packages=[
- 'youtube_dl',
- 'youtube_dl.extractor', 'youtube_dl.downloader',
- 'youtube_dl.postprocessor'],
-
- # Provokes warning on most systems (why?!)
- # test_suite = 'nose.collector',
- # test_requires = ['nosetest'],
-
+ # long_description_content_type="text/markdown",
+ url="https://github.com/blackjack4494/youtube-dlc",
+ packages=find_packages(exclude=("youtube_dl",)),
+ #packages=[
+ # 'youtube_dlc',
+ # 'youtube_dlc.extractor', 'youtube_dlc.downloader',
+ # 'youtube_dlc.postprocessor'],
classifiers=[
- 'Topic :: Multimedia :: Video',
- 'Development Status :: 5 - Production/Stable',
- 'Environment :: Console',
- 'License :: Public Domain',
- 'Programming Language :: Python',
- 'Programming Language :: Python :: 2',
- 'Programming Language :: Python :: 2.6',
- 'Programming Language :: Python :: 2.7',
- 'Programming Language :: Python :: 3',
- 'Programming Language :: Python :: 3.2',
- 'Programming Language :: Python :: 3.3',
- 'Programming Language :: Python :: 3.4',
- 'Programming Language :: Python :: 3.5',
- 'Programming Language :: Python :: 3.6',
- 'Programming Language :: Python :: 3.7',
- 'Programming Language :: Python :: 3.8',
- 'Programming Language :: Python :: Implementation',
- 'Programming Language :: Python :: Implementation :: CPython',
- 'Programming Language :: Python :: Implementation :: IronPython',
- 'Programming Language :: Python :: Implementation :: Jython',
- 'Programming Language :: Python :: Implementation :: PyPy',
+ "Topic :: Multimedia :: Video",
+ "Development Status :: 5 - Production/Stable",
+ "Environment :: Console",
+ "Programming Language :: Python",
+ "Programming Language :: Python :: 2",
+ "Programming Language :: Python :: 2.6",
+ "Programming Language :: Python :: 2.7",
+ "Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3.2",
+ "Programming Language :: Python :: 3.3",
+ "Programming Language :: Python :: 3.4",
+ "Programming Language :: Python :: 3.5",
+ "Programming Language :: Python :: 3.6",
+ "Programming Language :: Python :: 3.7",
+ "Programming Language :: Python :: 3.8",
+ "Programming Language :: Python :: Implementation",
+ "Programming Language :: Python :: Implementation :: CPython",
+ "Programming Language :: Python :: Implementation :: IronPython",
+ "Programming Language :: Python :: Implementation :: Jython",
+ "Programming Language :: Python :: Implementation :: PyPy",
+ "License :: Public Domain",
+ "Operating System :: OS Independent",
],
-
- cmdclass={'build_lazy_extractors': build_lazy_extractors},
+ python_requires='>=2.6',
+
+ cmdclass={'build_lazy_extractors': build_lazy_extractors},
**params
-)
+)
\ No newline at end of file
import ssl
import sys
-import youtube_dl.extractor
-from youtube_dl import YoutubeDL
-from youtube_dl.compat import (
+import youtube_dlc.extractor
+from youtube_dlc import YoutubeDL
+from youtube_dlc.compat import (
compat_os_name,
compat_str,
)
-from youtube_dl.utils import (
+from youtube_dlc.utils import (
preferredencoding,
write_string,
)
def gettestcases(include_onlymatching=False):
- for ie in youtube_dl.extractor.gen_extractors():
+ for ie in youtube_dlc.extractor.gen_extractors():
for tc in ie.get_testcases(include_onlymatching):
yield tc
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL, expect_dict, expect_value, http_server_port
-from youtube_dl.compat import compat_etree_fromstring, compat_http_server
-from youtube_dl.extractor.common import InfoExtractor
-from youtube_dl.extractor import YoutubeIE, get_info_extractor
-from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
+from youtube_dlc.compat import compat_etree_fromstring, compat_http_server
+from youtube_dlc.extractor.common import InfoExtractor
+from youtube_dlc.extractor import YoutubeIE, get_info_extractor
+from youtube_dlc.utils import encode_data_uri, strip_jsonp, ExtractorError, RegexNotFoundError
import threading
import copy
from test.helper import FakeYDL, assertRegexpMatches
-from youtube_dl import YoutubeDL
-from youtube_dl.compat import compat_str, compat_urllib_error
-from youtube_dl.extractor import YoutubeIE
-from youtube_dl.extractor.common import InfoExtractor
-from youtube_dl.postprocessor.common import PostProcessor
-from youtube_dl.utils import ExtractorError, match_filter_func
+from youtube_dlc import YoutubeDL
+from youtube_dlc.compat import compat_str, compat_urllib_error
+from youtube_dlc.extractor import YoutubeIE
+from youtube_dlc.extractor.common import InfoExtractor
+from youtube_dlc.postprocessor.common import PostProcessor
+from youtube_dlc.utils import ExtractorError, match_filter_func
TEST_URL = 'http://localhost/sample.mp4'
'webpage_url': 'http://example.com',
}
- def get_ids(params):
+ def get_downloaded_info_dicts(params):
ydl = YDL(params)
- # make a copy because the dictionary can be modified
- ydl.process_ie_result(playlist.copy())
- return [int(v['id']) for v in ydl.downloaded_info_dicts]
+ # make a deep copy because the dictionary and nested entries
+ # can be modified
+ ydl.process_ie_result(copy.deepcopy(playlist))
+ return ydl.downloaded_info_dicts
+
+ def get_ids(params):
+ return [int(v['id']) for v in get_downloaded_info_dicts(params)]
result = get_ids({})
self.assertEqual(result, [1, 2, 3, 4])
result = get_ids({'playlist_items': '2-4,3-4,3'})
self.assertEqual(result, [2, 3, 4])
+ # Tests for https://github.com/ytdl-org/youtube-dl/issues/10591
+ # @{
+ result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
+ self.assertEqual(result[0]['playlist_index'], 2)
+ self.assertEqual(result[1]['playlist_index'], 3)
+
+ result = get_downloaded_info_dicts({'playlist_items': '2-4,3-4,3'})
+ self.assertEqual(result[0]['playlist_index'], 2)
+ self.assertEqual(result[1]['playlist_index'], 3)
+ self.assertEqual(result[2]['playlist_index'], 4)
+
+ result = get_downloaded_info_dicts({'playlist_items': '4,2'})
+ self.assertEqual(result[0]['playlist_index'], 4)
+ self.assertEqual(result[1]['playlist_index'], 2)
+ # @}
+
def test_urlopen_no_file_protocol(self):
# see https://github.com/ytdl-org/youtube-dl/issues/8227
ydl = YDL()
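
The `playlist_items` assertions above expect a spec such as `'2-4,3-4,3'` to select entries 2, 3 and 4 once each, and `'4,2'` to keep the order given. A minimal sketch of that expansion follows. `parse_playlist_items` is a hypothetical helper written for illustration, not the function youtube-dlc uses internally.

```python
def parse_playlist_items(spec):
    """Expand a spec like '2-4,3-4,3' into ordered, de-duplicated 1-based indices.

    Hypothetical helper for illustration only.
    """
    seen = set()
    indices = []
    for part in spec.split(','):
        part = part.strip()
        if not part:
            continue
        if '-' in part:
            # A range such as '2-4' is inclusive on both ends.
            start, end = (int(x) for x in part.split('-', 1))
        else:
            start = end = int(part)
        for index in range(start, end + 1):
            # Keep the first occurrence only, preserving the requested order.
            if index not in seen:
                seen.add(index)
                indices.append(index)
    return indices


print(parse_playlist_items('2-4,3-4,3'))
print(parse_playlist_items('4,2'))
```

Under this reading, each selected entry keeps its original position in the playlist as its `playlist_index`, which is what the new assertions check.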
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.utils import YoutubeDLCookieJar
+from youtube_dlc.utils import YoutubeDLCookieJar
class TestYoutubeDLCookieJar(unittest.TestCase):
assert_cookie_has_value('HTTPONLY_COOKIE')
assert_cookie_has_value('JS_ACCESSIBLE_COOKIE')
+ def test_malformed_cookies(self):
+ cookiejar = YoutubeDLCookieJar('./test/testdata/cookies/malformed_cookies.txt')
+ cookiejar.load(ignore_discard=True, ignore_expires=True)
+ # Cookies should be empty since all malformed cookie file entries
+ # will be ignored
+ self.assertFalse(cookiejar._cookies)
+
if __name__ == '__main__':
unittest.main()
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.aes import aes_decrypt, aes_encrypt, aes_cbc_decrypt, aes_cbc_encrypt, aes_decrypt_text
-from youtube_dl.utils import bytes_to_intlist, intlist_to_bytes
+from youtube_dlc.aes import aes_decrypt, aes_encrypt, aes_cbc_decrypt, aes_cbc_encrypt, aes_decrypt_text
+from youtube_dlc.utils import bytes_to_intlist, intlist_to_bytes
import base64
# the encrypted data can be generated with 'devscripts/generate_aes_testdata.py'
from test.helper import try_rm
-from youtube_dl import YoutubeDL
+from youtube_dlc import YoutubeDL
def _download_restricted(url, filename, age):
from test.helper import gettestcases
-from youtube_dl.extractor import (
+from youtube_dlc.extractor import (
FacebookIE,
gen_extractors,
YoutubeIE,
def test_youtube_search_matching(self):
self.assertMatch('http://www.youtube.com/results?search_query=making+mustard', ['youtube:search_url'])
- self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dl+test+video&filters=video&lclk=video', ['youtube:search_url'])
+ self.assertMatch('https://www.youtube.com/results?baz=bar&search_query=youtube-dlc+test+video&filters=video&lclk=video', ['youtube:search_url'])
def test_youtube_extract(self):
assertExtractId = lambda url, id: self.assertEqual(YoutubeIE.extract_id(url), id)
from test.helper import FakeYDL
-from youtube_dl.cache import Cache
+from youtube_dlc.cache import Cache
def _is_empty(d):
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.compat import (
+from youtube_dlc.compat import (
compat_getenv,
compat_setenv,
compat_etree_Element,
class TestCompat(unittest.TestCase):
def test_compat_getenv(self):
test_str = 'тест'
- compat_setenv('YOUTUBE_DL_COMPAT_GETENV', test_str)
- self.assertEqual(compat_getenv('YOUTUBE_DL_COMPAT_GETENV'), test_str)
+ compat_setenv('youtube_dlc_COMPAT_GETENV', test_str)
+ self.assertEqual(compat_getenv('youtube_dlc_COMPAT_GETENV'), test_str)
def test_compat_setenv(self):
- test_var = 'YOUTUBE_DL_COMPAT_SETENV'
+ test_var = 'youtube_dlc_COMPAT_SETENV'
test_str = 'тест'
compat_setenv(test_var, test_str)
compat_getenv(test_var)
compat_setenv('HOME', old_home or '')
def test_all_present(self):
- import youtube_dl.compat
- all_names = youtube_dl.compat.__all__
+ import youtube_dlc.compat
+ all_names = youtube_dlc.compat.__all__
present_names = set(filter(
lambda c: '_' in c and not c.startswith('_'),
- dir(youtube_dl.compat))) - set(['unicode_literals'])
+ dir(youtube_dlc.compat))) - set(['unicode_literals'])
self.assertEqual(all_names, sorted(present_names))
def test_compat_urllib_parse_unquote(self):
import json
import socket
-import youtube_dl.YoutubeDL
-from youtube_dl.compat import (
+import youtube_dlc.YoutubeDL
+from youtube_dlc.compat import (
compat_http_client,
compat_urllib_error,
compat_HTTPError,
)
-from youtube_dl.utils import (
+from youtube_dlc.utils import (
DownloadError,
ExtractorError,
format_bytes,
UnavailableVideoError,
)
-from youtube_dl.extractor import get_info_extractor
+from youtube_dlc.extractor import get_info_extractor
RETRIES = 3
-class YoutubeDL(youtube_dl.YoutubeDL):
+class YoutubeDL(youtube_dlc.YoutubeDL):
def __init__(self, *args, **kwargs):
self.to_stderr = self.to_screen
self.processed_info_dicts = []
def generator(test_case, tname):
def test_template(self):
- ie = youtube_dl.extractor.get_info_extractor(test_case['name'])()
+ ie = youtube_dlc.extractor.get_info_extractor(test_case['name'])()
other_ies = [get_info_extractor(ie_key)() for ie_key in test_case.get('add_ie', [])]
is_playlist = any(k.startswith('playlist') for k in test_case)
test_cases = test_case.get(
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import http_server_port, try_rm
-from youtube_dl import YoutubeDL
-from youtube_dl.compat import compat_http_server
-from youtube_dl.downloader.http import HttpFD
-from youtube_dl.utils import encodeFilename
+from youtube_dlc import YoutubeDL
+from youtube_dlc.compat import compat_http_server
+from youtube_dlc.downloader.http import HttpFD
+from youtube_dlc.utils import encodeFilename
import threading
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
import subprocess
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.utils import encodeArgument
+from youtube_dlc.utils import encodeArgument
rootDir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
class TestExecution(unittest.TestCase):
def test_import(self):
- subprocess.check_call([sys.executable, '-c', 'import youtube_dl'], cwd=rootDir)
+ subprocess.check_call([sys.executable, '-c', 'import youtube_dlc'], cwd=rootDir)
def test_module_exec(self):
if sys.version_info >= (2, 7): # Python 2.6 doesn't support package execution
- subprocess.check_call([sys.executable, '-m', 'youtube_dl', '--version'], cwd=rootDir, stdout=_DEV_NULL)
+ subprocess.check_call([sys.executable, '-m', 'youtube_dlc', '--version'], cwd=rootDir, stdout=_DEV_NULL)
def test_main_exec(self):
- subprocess.check_call([sys.executable, 'youtube_dl/__main__.py', '--version'], cwd=rootDir, stdout=_DEV_NULL)
+ subprocess.check_call([sys.executable, 'youtube_dlc/__main__.py', '--version'], cwd=rootDir, stdout=_DEV_NULL)
def test_cmdline_umlauts(self):
p = subprocess.Popen(
- [sys.executable, 'youtube_dl/__main__.py', encodeArgument('ä'), '--version'],
+ [sys.executable, 'youtube_dlc/__main__.py', encodeArgument('ä'), '--version'],
cwd=rootDir, stdout=_DEV_NULL, stderr=subprocess.PIPE)
_, stderr = p.communicate()
self.assertFalse(stderr)
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import http_server_port
-from youtube_dl import YoutubeDL
-from youtube_dl.compat import compat_http_server, compat_urllib_request
+from youtube_dlc import YoutubeDL
+from youtube_dlc.compat import compat_http_server, compat_urllib_request
import ssl
import threading
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
-from youtube_dl.extractor import IqiyiIE
+from youtube_dlc.extractor import IqiyiIE
class IqiyiIEWithCredentials(IqiyiIE):
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.jsinterp import JSInterpreter
+from youtube_dlc.jsinterp import JSInterpreter
class TestJSInterpreter(unittest.TestCase):
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.extractor import (
+from youtube_dlc.extractor import (
gen_extractors,
)
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.options import _hide_login_info
+from youtube_dlc.options import _hide_login_info
class TestOptions(unittest.TestCase):
import unittest
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
-from youtube_dl.postprocessor import MetadataFromTitlePP
+from youtube_dlc.postprocessor import MetadataFromTitlePP
class TestMetadataFromTitle(unittest.TestCase):
FakeYDL,
get_params,
)
-from youtube_dl.compat import (
+from youtube_dlc.compat import (
compat_str,
compat_urllib_request,
)
from test.helper import FakeYDL, md5
-from youtube_dl.extractor import (
+from youtube_dlc.extractor import (
YoutubeIE,
DailymotionIE,
TEDIE,
ThePlatformIE,
ThePlatformFeedIE,
RTVEALaCartaIE,
- FunnyOrDieIE,
DemocracynowIE,
)
self.DL.params['allsubtitles'] = True
subtitles = self.getSubtitles()
self.assertEqual(len(subtitles.keys()), 13)
- self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
- self.assertEqual(md5(subtitles['it']), '6d752b98c31f1cf8d597050c7a2cb4b5')
+ self.assertEqual(md5(subtitles['en']), '688dd1ce0981683867e7fe6fde2a224b')
+ self.assertEqual(md5(subtitles['it']), '31324d30b8430b309f7f5979a504a769')
for lang in ['fr', 'de']:
self.assertTrue(subtitles.get(lang) is not None, 'Subtitles for \'%s\' not extracted' % lang)
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'ttml'
subtitles = self.getSubtitles()
- self.assertEqual(md5(subtitles['en']), 'e306f8c42842f723447d9f63ad65df54')
+ self.assertEqual(md5(subtitles['en']), 'c97ddf1217390906fa9fbd34901f3da2')
def test_youtube_subtitles_vtt_format(self):
self.DL.params['writesubtitles'] = True
self.DL.params['subtitlesformat'] = 'vtt'
subtitles = self.getSubtitles()
- self.assertEqual(md5(subtitles['en']), '3cb210999d3e021bd6c7f0ea751eab06')
+ self.assertEqual(md5(subtitles['en']), 'ae1bd34126571a77aabd4d276b28044d')
def test_youtube_automatic_captions(self):
self.url = '8YoUxe5ncPo'
subtitles = self.getSubtitles()
self.assertTrue(subtitles['it'] is not None)
+ def test_youtube_no_automatic_captions(self):
+ self.url = 'QRS8MkLhQmM'
+ self.DL.params['writeautomaticsub'] = True
+ subtitles = self.getSubtitles()
+ self.assertTrue(not subtitles)
+
def test_youtube_translated_subtitles(self):
# This video has a subtitles track, which can be translated
- self.url = 'Ky9eprVWzlI'
+ self.url = 'i0ZabxXmH4Y'
self.DL.params['writeautomaticsub'] = True
self.DL.params['subtitleslangs'] = ['it']
subtitles = self.getSubtitles()
self.assertEqual(md5(subtitles['es']), '69e70cae2d40574fb7316f31d6eb7fca')
-class TestFunnyOrDieSubtitles(BaseTestSubtitles):
- url = 'http://www.funnyordie.com/videos/224829ff6d/judd-apatow-will-direct-your-vine'
- IE = FunnyOrDieIE
-
- def test_allsubtitles(self):
- self.DL.params['writesubtitles'] = True
- self.DL.params['allsubtitles'] = True
- subtitles = self.getSubtitles()
- self.assertEqual(set(subtitles.keys()), set(['en']))
- self.assertEqual(md5(subtitles['en']), 'c5593c193eacd353596c11c2d4f9ecc4')
-
-
class TestDemocracynowSubtitles(BaseTestSubtitles):
url = 'http://www.democracynow.org/shows/2015/7/3'
IE = DemocracynowIE
import re
import subprocess
-from youtube_dl.swfinterp import SWFInterpreter
+from youtube_dlc.swfinterp import SWFInterpreter
TEST_DIR = os.path.join(
import json
-from youtube_dl.update import rsa_verify
+from youtube_dlc.update import rsa_verify
class TestUpdate(unittest.TestCase):
import json
import xml.etree.ElementTree
-from youtube_dl.utils import (
+from youtube_dlc.utils import (
age_restricted,
args_to_str,
encode_base_n,
cli_bool_option,
parse_codecs,
)
-from youtube_dl.compat import (
+from youtube_dlc.compat import (
compat_chr,
compat_etree_fromstring,
compat_getenv,
def env(var):
return '%{0}%'.format(var) if sys.platform == 'win32' else '${0}'.format(var)
- compat_setenv('YOUTUBE_DL_EXPATH_PATH', 'expanded')
- self.assertEqual(expand_path(env('YOUTUBE_DL_EXPATH_PATH')), 'expanded')
+ compat_setenv('youtube_dlc_EXPATH_PATH', 'expanded')
+ self.assertEqual(expand_path(env('youtube_dlc_EXPATH_PATH')), 'expanded')
self.assertEqual(expand_path(env('HOME')), compat_getenv('HOME'))
self.assertEqual(expand_path('~'), compat_getenv('HOME'))
self.assertEqual(
- expand_path('~/%s' % env('YOUTUBE_DL_EXPATH_PATH')),
+ expand_path('~/%s' % env('youtube_dlc_EXPATH_PATH')),
'%s/expanded' % compat_getenv('HOME'))
def test_prepend_extension(self):
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
+ self.assertEqual(mimetype2ext('audio/x-wav'), 'wav')
+ self.assertEqual(mimetype2ext('audio/x-wav;codec=pcm'), 'wav')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
self.assertEqual(caesar('ebg', 'acegik', -2), 'abc')
def test_rot47(self):
- self.assertEqual(rot47('youtube-dl'), r'J@FEF36\5=')
- self.assertEqual(rot47('YOUTUBE-DL'), r'*~&%&qt\s{')
+ self.assertEqual(rot47('youtube-dlc'), r'J@FEF36\5=4')
+ self.assertEqual(rot47('YOUTUBE-DLC'), r'*~&%&qt\s{r')
def test_urshift(self):
self.assertEqual(urshift(3, 1), 1)
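
The updated `rot47` expectations above follow directly from the cipher's definition: every printable ASCII character (codes 33 to 126) is rotated by 47 places within that 94-character range. The sketch below is a standalone reimplementation for illustration and is not taken from `youtube_dlc.utils`.

```python
def rot47(text):
    """Rotate each printable ASCII character (codes 33..126) by 47 places."""
    out = []
    for ch in text:
        code = ord(ch)
        if 33 <= code <= 126:
            out.append(chr(33 + (code - 33 + 47) % 94))
        else:
            # Characters outside the printable range pass through unchanged.
            out.append(ch)
    return ''.join(out)


print(rot47('youtube-dlc'))
```

Because 47 is half of 94, applying the function twice returns the original string. The extra `c` in the renamed string maps to `4` and the extra `C` to `r`, matching the trailing characters added to the expected values.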
def test_private_info_arg(self):
outp = subprocess.Popen(
[
- sys.executable, 'youtube_dl/__main__.py', '-v',
+ sys.executable, 'youtube_dlc/__main__.py', '-v',
'--username', 'johnsmith@gmail.com',
'--password', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
def test_private_info_shortarg(self):
outp = subprocess.Popen(
[
- sys.executable, 'youtube_dl/__main__.py', '-v',
+ sys.executable, 'youtube_dlc/__main__.py', '-v',
'-u', 'johnsmith@gmail.com',
'-p', 'secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
def test_private_info_eq(self):
outp = subprocess.Popen(
[
- sys.executable, 'youtube_dl/__main__.py', '-v',
+ sys.executable, 'youtube_dlc/__main__.py', '-v',
'--username=johnsmith@gmail.com',
'--password=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
def test_private_info_shortarg_eq(self):
outp = subprocess.Popen(
[
- sys.executable, 'youtube_dl/__main__.py', '-v',
+ sys.executable, 'youtube_dlc/__main__.py', '-v',
'-u=johnsmith@gmail.com',
'-p=secret',
], cwd=rootDir, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
import xml.etree.ElementTree
-import youtube_dl.YoutubeDL
-import youtube_dl.extractor
+import youtube_dlc.YoutubeDL
+import youtube_dlc.extractor
-class YoutubeDL(youtube_dl.YoutubeDL):
+class YoutubeDL(youtube_dlc.YoutubeDL):
def __init__(self, *args, **kwargs):
super(YoutubeDL, self).__init__(*args, **kwargs)
self.to_stderr = self.to_screen
def test_info_json(self):
expected = list(EXPECTED_ANNOTATIONS) # Two annotations could have the same text.
- ie = youtube_dl.extractor.YoutubeIE()
+ ie = youtube_dlc.extractor.YoutubeIE()
ydl = YoutubeDL(params)
ydl.add_info_extractor(ie)
ydl.download([TEST_ID])
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import expect_value
-from youtube_dl.extractor import YoutubeIE
+from youtube_dlc.extractor import YoutubeIE
class TestYoutubeChapters(unittest.TestCase):
for description, duration, expected_chapters in self._TEST_CASES:
ie = YoutubeIE()
expect_value(
- self, ie._extract_chapters(description, duration),
+ self, ie._extract_chapters_from_description(description, duration),
expected_chapters, None)
from test.helper import FakeYDL
-from youtube_dl.extractor import (
+from youtube_dlc.extractor import (
YoutubePlaylistIE,
YoutubeIE,
)
import string
from test.helper import FakeYDL
-from youtube_dl.extractor import YoutubeIE
-from youtube_dl.compat import compat_str, compat_urlretrieve
+from youtube_dlc.extractor import YoutubeIE
+from youtube_dlc.compat import compat_str, compat_urlretrieve
_TESTS = [
(
]
+class TestPlayerInfo(unittest.TestCase):
+ def test_youtube_extract_player_info(self):
+ PLAYER_URLS = (
+ ('https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js', '64dddad9'),
+ # obsolete
+ ('https://www.youtube.com/yts/jsbin/player_ias-vfle4-e03/en_US/base.js', 'vfle4-e03'),
+ ('https://www.youtube.com/yts/jsbin/player_ias-vfl49f_g4/en_US/base.js', 'vfl49f_g4'),
+ ('https://www.youtube.com/yts/jsbin/player_ias-vflCPQUIL/en_US/base.js', 'vflCPQUIL'),
+ ('https://www.youtube.com/yts/jsbin/player-vflzQZbt7/en_US/base.js', 'vflzQZbt7'),
+ ('https://www.youtube.com/yts/jsbin/player-en_US-vflaxXRn1/base.js', 'vflaxXRn1'),
+ ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflXGBaUN.js', 'vflXGBaUN'),
+ ('https://s.ytimg.com/yts/jsbin/html5player-en_US-vflKjOTVq/html5player.js', 'vflKjOTVq'),
+ ('http://s.ytimg.com/yt/swfbin/watch_as3-vflrEm9Nq.swf', 'vflrEm9Nq'),
+ ('https://s.ytimg.com/yts/swfbin/player-vflenCdZL/watch_as3.swf', 'vflenCdZL'),
+ )
+ for player_url, expected_player_id in PLAYER_URLS:
+ expected_player_type = player_url.split('.')[-1]
+ player_type, player_id = YoutubeIE._extract_player_info(player_url)
+ self.assertEqual(player_type, expected_player_type)
+ self.assertEqual(player_id, expected_player_id)
+
+
class TestSignature(unittest.TestCase):
def setUp(self):
TEST_DIR = os.path.dirname(os.path.abspath(__file__))
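
The `TestPlayerInfo` cases above pair each player URL with an expected player ID and derive the expected type from the file extension. One way to satisfy them is to try two regular expressions in order, as sketched below. The patterns and the name `extract_player_info` are assumptions made for this illustration and may differ from what `YoutubeIE._extract_player_info` does.

```python
import re

# Assumed patterns: the first targets the newer '/s/player/<id>/player_ias.vflset/'
# layout, the second the older 'vfl...' identifiers.
PLAYER_INFO_PATTERNS = (
    r'/(?P<id>[a-zA-Z0-9_-]{8,})/player_ias\.vflset'
    r'(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?/base\.(?P<ext>[a-z]+)$',
    r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.(?P<ext>[a-z]+)$',
)


def extract_player_info(player_url):
    """Return (player_type, player_id) for a player URL, or raise ValueError."""
    for pattern in PLAYER_INFO_PATTERNS:
        match = re.search(pattern, player_url)
        if match:
            return match.group('ext'), match.group('id')
    raise ValueError('Cannot identify player %r' % player_url)


print(extract_player_info(
    'https://www.youtube.com/s/player/64dddad9/player_ias.vflset/en_US/base.js'))
```

The order matters in this sketch: the second pattern would also match the literal `vflset` in the newer URL layout, so the more specific pattern has to be tried first.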
--- /dev/null
+# Netscape HTTP Cookie File
+# http://curl.haxx.se/rfc/cookie_spec.html
+# This is a generated file! Do not edit.
+
+# Cookie file entry with invalid number of fields - 6 instead of 7
+www.foobar.foobar FALSE / FALSE 0 COOKIE
+
+# Cookie file entry with invalid expires at
+www.foobar.foobar FALSE / FALSE 1.7976931348623157e+308 COOKIE VALUE
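
The two entries in the fixture above are malformed in different ways: one has six fields instead of seven, the other has a non-integer expiry. A minimal sketch of a per-line check that rejects both is shown below. It assumes tab-separated fields, as in the Netscape cookie file format, and `is_acceptable_cookie_line` is a hypothetical name, not part of `YoutubeDLCookieJar`.

```python
HTTPONLY_PREFIX = '#HttpOnly_'
ENTRY_LEN = 7  # domain, subdomain flag, path, secure flag, expiry, name, value


def is_acceptable_cookie_line(line):
    """Return True for comments, blank lines and well-formed cookie entries."""
    line = line.rstrip('\n')
    if line.startswith(HTTPONLY_PREFIX):
        # HttpOnly cookies are stored behind a comment-like prefix.
        line = line[len(HTTPONLY_PREFIX):]
    if not line.strip() or line.startswith('#'):
        return True
    fields = line.split('\t')
    if len(fields) != ENTRY_LEN:
        return False
    expires_at = fields[4]
    # An empty expiry is tolerated; otherwise it must be a plain integer.
    return expires_at == '' or expires_at.isdigit()


print(is_acceptable_cookie_line(
    'www.foobar.foobar\tFALSE\t/\tFALSE\t0\tCOOKIE'))
```

A loader built on a check like this can skip rejected lines, which leaves the jar empty when every entry is malformed, as `test_malformed_cookies` expects.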
--exclude test_subtitles.py --exclude test_write_annotations.py
--exclude test_youtube_lists.py --exclude test_iqiyi_sdk_interpreter.py
--exclude test_socks.py
-commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dl --cover-html
+commands = nosetests --verbose {posargs:{[testenv]defaultargs}} # --with-coverage --cover-package=youtube_dlc --cover-html
# test.test_download:TestDownload.test_NowVideo
--- /dev/null
+# UTF-8
+#
+# For more details about fixed file info 'ffi' see:
+# http://msdn.microsoft.com/en-us/library/ms646997.aspx
+VSVersionInfo(
+ ffi=FixedFileInfo(
+ # filevers and prodvers should always be a tuple with four items: (1, 2, 3, 4)
+ # Set unused items to 0.
+ filevers=(6, 9, 2020, 0),
+ prodvers=(6, 9, 2020, 0),
+ # Contains a bitmask that specifies the valid bits of 'flags'
+ mask=0x3f,
+ # Contains a bitmask that specifies the Boolean attributes of the file.
+ flags=0x0,
+ # The operating system for which this file was designed.
+ # 0x4 - NT and there is no need to change it.
+ # OS=0x40004,
+ OS=0x4,
+ # The general type of file.
+ # 0x1 - the file is an application.
+ fileType=0x1,
+ # The function of the file.
+ # 0x0 - the function is not defined for this fileType
+ subtype=0x0,
+ # Creation date and time stamp.
+ date=(0, 0)
+ ),
+ kids=[
+ StringFileInfo(
+ [
+ StringTable(
+ u'040904B0',
+ [StringStruct(u'Comments', u'Youtube-dlc Command Line Interface.'),
+ StringStruct(u'CompanyName', u'theidel@uni-bremen.de'),
+ StringStruct(u'FileDescription', u'Media Downloader'),
+ StringStruct(u'FileVersion', u'6.9.2020.0'),
+ StringStruct(u'InternalName', u'youtube-dlc'),
+ StringStruct(u'LegalCopyright', u'theidel@uni-bremen.de | UNLICENSE'),
+ StringStruct(u'OriginalFilename', u'youtube-dlc.exe'),
+ StringStruct(u'ProductName', u'Youtube-dlc'),
+ StringStruct(u'ProductVersion', u'6.9.2020.0 | git.io/JUGsM')])
+ ]),
+ VarFileInfo([VarStruct(u'Translation', [0, 1200])])
+ ]
+)
+++ /dev/null
-# This allows the youtube-dl command to be installed in ZSH using antigen.
-# Antigen is a bundle manager. It allows you to enhance the functionality of
-# your zsh session by installing bundles and themes easily.
-
-# Antigen documentation:
-# http://antigen.sharats.me/
-# https://github.com/zsh-users/antigen
-
-# Install youtube-dl:
-# antigen bundle ytdl-org/youtube-dl
-# Bundles installed by antigen are available for use immediately.
-
-# Update youtube-dl (and all other antigen bundles):
-# antigen update
-
-# The antigen command will download the git repository to a folder and then
-# execute an enabling script (this file). The complete process for loading the
-# code is documented here:
-# https://github.com/zsh-users/antigen#notes-on-writing-plugins
-
-# This specific script just aliases youtube-dl to the python script that this
-# library provides. This requires updating the PYTHONPATH to ensure that the
-# full set of code can be located.
-alias youtube-dl="PYTHONPATH=$(dirname $0) $(dirname $0)/bin/youtube-dl"
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from .generic import GenericIE
-from ..utils import (
- determine_ext,
- ExtractorError,
- int_or_none,
- parse_duration,
- qualities,
- str_or_none,
- try_get,
- unified_strdate,
- unified_timestamp,
- update_url_query,
- url_or_none,
- xpath_text,
-)
-from ..compat import compat_etree_fromstring
-
-
-class ARDMediathekIE(InfoExtractor):
- IE_NAME = 'ARD:mediathek'
- _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
-
- _TESTS = [{
- # available till 26.07.2022
- 'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
- 'info_dict': {
- 'id': '44726822',
- 'ext': 'mp4',
- 'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
- 'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
- 'duration': 1740,
- },
- 'params': {
- # m3u8 download
- 'skip_download': True,
- }
- }, {
- 'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
- 'only_matching': True,
- }, {
- # audio
- 'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
- 'only_matching': True,
- }, {
- 'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
- 'only_matching': True,
- }, {
- # audio
- 'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
- 'only_matching': True,
- }, {
- 'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
- 'only_matching': True,
- }]
-
- @classmethod
- def suitable(cls, url):
- return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
-
- def _extract_media_info(self, media_info_url, webpage, video_id):
- media_info = self._download_json(
- media_info_url, video_id, 'Downloading media JSON')
-
- formats = self._extract_formats(media_info, video_id)
-
- if not formats:
- if '"fsk"' in webpage:
- raise ExtractorError(
- 'This video is only available after 20:00', expected=True)
- elif media_info.get('_geoblocked'):
- raise ExtractorError('This video is not available due to geo restriction', expected=True)
-
- self._sort_formats(formats)
-
- duration = int_or_none(media_info.get('_duration'))
- thumbnail = media_info.get('_previewImage')
- is_live = media_info.get('_isLive') is True
-
- subtitles = {}
- subtitle_url = media_info.get('_subtitleUrl')
- if subtitle_url:
- subtitles['de'] = [{
- 'ext': 'ttml',
- 'url': subtitle_url,
- }]
-
- return {
- 'id': video_id,
- 'duration': duration,
- 'thumbnail': thumbnail,
- 'is_live': is_live,
- 'formats': formats,
- 'subtitles': subtitles,
- }
-
- def _extract_formats(self, media_info, video_id):
- type_ = media_info.get('_type')
- media_array = media_info.get('_mediaArray', [])
- formats = []
- for num, media in enumerate(media_array):
- for stream in media.get('_mediaStreamArray', []):
- stream_urls = stream.get('_stream')
- if not stream_urls:
- continue
- if not isinstance(stream_urls, list):
- stream_urls = [stream_urls]
- quality = stream.get('_quality')
- server = stream.get('_server')
- for stream_url in stream_urls:
- if not url_or_none(stream_url):
- continue
- ext = determine_ext(stream_url)
- if quality != 'auto' and ext in ('f4m', 'm3u8'):
- continue
- if ext == 'f4m':
- formats.extend(self._extract_f4m_formats(
- update_url_query(stream_url, {
- 'hdcore': '3.1.1',
- 'plugin': 'aasp-3.1.1.69.124'
- }),
- video_id, f4m_id='hds', fatal=False))
- elif ext == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- stream_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
- else:
- if server and server.startswith('rtmp'):
- f = {
- 'url': server,
- 'play_path': stream_url,
- 'format_id': 'a%s-rtmp-%s' % (num, quality),
- }
- else:
- f = {
- 'url': stream_url,
- 'format_id': 'a%s-%s-%s' % (num, ext, quality)
- }
- m = re.search(r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$', stream_url)
- if m:
- f.update({
- 'width': int(m.group('width')),
- 'height': int(m.group('height')),
- })
- if type_ == 'audio':
- f['vcodec'] = 'none'
- formats.append(f)
- return formats
-
- def _real_extract(self, url):
- # determine video id from url
- m = re.match(self._VALID_URL, url)
-
- document_id = None
-
- numid = re.search(r'documentId=([0-9]+)', url)
- if numid:
- document_id = video_id = numid.group(1)
- else:
- video_id = m.group('video_id')
-
- webpage = self._download_webpage(url, video_id)
-
- ERRORS = (
- ('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
- ('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
- 'Video %s is no longer available'),
- )
-
- for pattern, message in ERRORS:
- if pattern in webpage:
- raise ExtractorError(message % video_id, expected=True)
-
- if re.search(r'[\?&]rss($|[=&])', url):
- doc = compat_etree_fromstring(webpage.encode('utf-8'))
- if doc.tag == 'rss':
- return GenericIE()._extract_rss(url, video_id, doc)
-
- title = self._html_search_regex(
- [r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
- r'<meta name="dcterms\.title" content="(.*?)"/>',
- r'<h4 class="headline">(.*?)</h4>',
- r'<title[^>]*>(.*?)</title>'],
- webpage, 'title')
- description = self._html_search_meta(
- 'dcterms.abstract', webpage, 'description', default=None)
- if description is None:
- description = self._html_search_meta(
- 'description', webpage, 'meta description', default=None)
- if description is None:
- description = self._html_search_regex(
- r'<p\s+class="teasertext">(.+?)</p>',
- webpage, 'teaser text', default=None)
-
- # Thumbnail is sometimes not present.
- # It is in the mobile version, but that seems to use a different URL
- # structure altogether.
- thumbnail = self._og_search_thumbnail(webpage, default=None)
-
- media_streams = re.findall(r'''(?x)
- mediaCollection\.addMediaStream\([0-9]+,\s*[0-9]+,\s*"[^"]*",\s*
- "([^"]+)"''', webpage)
-
- if media_streams:
- QUALITIES = qualities(['lo', 'hi', 'hq'])
- formats = []
- for furl in set(media_streams):
- if furl.endswith('.f4m'):
- fid = 'f4m'
- else:
- fid_m = re.match(r'.*\.([^.]+)\.[^.]+$', furl)
- fid = fid_m.group(1) if fid_m else None
- formats.append({
- 'quality': QUALITIES(fid),
- 'format_id': fid,
- 'url': furl,
- })
- self._sort_formats(formats)
- info = {
- 'formats': formats,
- }
- else: # request JSON file
- if not document_id:
- video_id = self._search_regex(
- r'/play/(?:config|media)/(\d+)', webpage, 'media id')
- info = self._extract_media_info(
- 'http://www.ardmediathek.de/play/media/%s' % video_id,
- webpage, video_id)
-
- info.update({
- 'id': video_id,
- 'title': self._live_title(title) if info.get('is_live') else title,
- 'description': description,
- 'thumbnail': thumbnail,
- })
-
- return info
-
-
-class ARDIE(InfoExtractor):
- _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
- _TESTS = [{
- # available till 14.02.2019
- 'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
- 'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
- 'info_dict': {
- 'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
- 'id': '102',
- 'ext': 'mp4',
- 'duration': 4435.0,
- 'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
- 'upload_date': '20180214',
- 'thumbnail': r're:^https?://.*\.jpg$',
- },
- }, {
- 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- display_id = mobj.group('display_id')
-
- player_url = mobj.group('mainurl') + '~playerXml.xml'
- doc = self._download_xml(player_url, display_id)
- video_node = doc.find('./video')
- upload_date = unified_strdate(xpath_text(
- video_node, './broadcastDate'))
- thumbnail = xpath_text(video_node, './/teaserImage//variant/url')
-
- formats = []
- for a in video_node.findall('.//asset'):
- f = {
- 'format_id': a.attrib['type'],
- 'width': int_or_none(a.find('./frameWidth').text),
- 'height': int_or_none(a.find('./frameHeight').text),
- 'vbr': int_or_none(a.find('./bitrateVideo').text),
- 'abr': int_or_none(a.find('./bitrateAudio').text),
- 'vcodec': a.find('./codecVideo').text,
- 'tbr': int_or_none(a.find('./totalBitrate').text),
- }
- if a.find('./serverPrefix').text:
- f['url'] = a.find('./serverPrefix').text
- f['playpath'] = a.find('./fileName').text
- else:
- f['url'] = a.find('./fileName').text
- formats.append(f)
- self._sort_formats(formats)
-
- return {
- 'id': mobj.group('id'),
- 'formats': formats,
- 'display_id': display_id,
- 'title': video_node.find('./title').text,
- 'duration': parse_duration(video_node.find('./duration').text),
- 'upload_date': upload_date,
- 'thumbnail': thumbnail,
- }
-
-
-class ARDBetaMediathekIE(InfoExtractor):
- _VALID_URL = r'https://(?:beta|www)\.ardmediathek\.de/[^/]+/(?:player|live)/(?P<video_id>[a-zA-Z0-9]+)(?:/(?P<display_id>[^/?#]+))?'
- _TESTS = [{
- 'url': 'https://beta.ardmediathek.de/ard/player/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE/die-robuste-roswita',
- 'md5': '2d02d996156ea3c397cfc5036b5d7f8f',
- 'info_dict': {
- 'display_id': 'die-robuste-roswita',
- 'id': 'Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
- 'title': 'Tatort: Die robuste Roswita',
- 'description': r're:^Der Mord.*trüber ist als die Ilm.',
- 'duration': 5316,
- 'thumbnail': 'https://img.ardmediathek.de/standard/00/55/43/59/34/-1774185891/16x9/960?mandant=ard',
- 'upload_date': '20180826',
- 'ext': 'mp4',
- },
- }, {
- 'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
- 'only_matching': True,
- }, {
- 'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('video_id')
- display_id = mobj.group('display_id') or video_id
-
- webpage = self._download_webpage(url, display_id)
- data_json = self._search_regex(r'window\.__APOLLO_STATE__\s*=\s*(\{.*);\n', webpage, 'json')
- data = self._parse_json(data_json, display_id)
-
- res = {
- 'id': video_id,
- 'display_id': display_id,
- }
- formats = []
- subtitles = {}
- geoblocked = False
- for widget in data.values():
- if widget.get('_geoblocked') is True:
- geoblocked = True
- if '_duration' in widget:
- res['duration'] = int_or_none(widget['_duration'])
- if 'clipTitle' in widget:
- res['title'] = widget['clipTitle']
- if '_previewImage' in widget:
- res['thumbnail'] = widget['_previewImage']
- if 'broadcastedOn' in widget:
- res['timestamp'] = unified_timestamp(widget['broadcastedOn'])
- if 'synopsis' in widget:
- res['description'] = widget['synopsis']
- subtitle_url = url_or_none(widget.get('_subtitleUrl'))
- if subtitle_url:
- subtitles.setdefault('de', []).append({
- 'ext': 'ttml',
- 'url': subtitle_url,
- })
- if '_quality' in widget:
- format_url = url_or_none(try_get(
- widget, lambda x: x['_stream']['json'][0]))
- if not format_url:
- continue
- ext = determine_ext(format_url)
- if ext == 'f4m':
- formats.extend(self._extract_f4m_formats(
- format_url + '?hdcore=3.11.0',
- video_id, f4m_id='hds', fatal=False))
- elif ext == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- format_url, video_id, 'mp4', m3u8_id='hls',
- fatal=False))
- else:
- # HTTP formats are not available when geoblocked is True,
- # other formats are fine though
- if geoblocked:
- continue
- quality = str_or_none(widget.get('_quality'))
- formats.append({
- 'format_id': ('http-' + quality) if quality else 'http',
- 'url': format_url,
- 'preference': 10, # Plain HTTP, that's nice
- })
-
- if not formats and geoblocked:
- self.raise_geo_restricted(
- msg='This video is not available due to geoblocking',
- countries=['DE'])
-
- self._sort_formats(formats)
- res.update({
- 'subtitles': subtitles,
- 'formats': formats,
- })
-
- return res
+++ /dev/null
-from __future__ import unicode_literals
-
-import json
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- ExtractorError,
- int_or_none,
- orderedSet,
-)
-
-
-class DeezerPlaylistIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?deezer\.com/playlist/(?P<id>[0-9]+)'
- _TEST = {
- 'url': 'http://www.deezer.com/playlist/176747451',
- 'info_dict': {
- 'id': '176747451',
- 'title': 'Best!',
- 'uploader': 'Anonymous',
- 'thumbnail': r're:^https?://cdn-images\.deezer\.com/images/cover/.*\.jpg$',
- },
- 'playlist_count': 30,
- 'skip': 'Only available in .de',
- }
-
- def _real_extract(self, url):
- if 'test' not in self._downloader.params:
- self._downloader.report_warning('For now, this extractor only supports the 30 second previews. Patches welcome!')
-
- mobj = re.match(self._VALID_URL, url)
- playlist_id = mobj.group('id')
-
- webpage = self._download_webpage(url, playlist_id)
- geoblocking_msg = self._html_search_regex(
- r'<p class="soon-txt">(.*?)</p>', webpage, 'geoblocking message',
- default=None)
- if geoblocking_msg is not None:
- raise ExtractorError(
- 'Deezer said: %s' % geoblocking_msg, expected=True)
-
- data_json = self._search_regex(
- (r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
- r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
- webpage, 'data JSON')
- data = json.loads(data_json)
-
- playlist_title = data.get('DATA', {}).get('TITLE')
- playlist_uploader = data.get('DATA', {}).get('PARENT_USERNAME')
- playlist_thumbnail = self._search_regex(
- r'<img id="naboo_playlist_image".*?src="([^"]+)"', webpage,
- 'playlist thumbnail')
-
- preview_pattern = self._search_regex(
- r"var SOUND_PREVIEW_GATEWAY\s*=\s*'([^']+)';", webpage,
- 'preview URL pattern', fatal=False)
- entries = []
- for s in data['SONGS']['data']:
- puid = s['MD5_ORIGIN']
- preview_video_url = preview_pattern.\
- replace('{0}', puid[0]).\
- replace('{1}', puid).\
- replace('{2}', s['MEDIA_VERSION'])
- formats = [{
- 'format_id': 'preview',
- 'url': preview_video_url,
- 'preference': -100, # Only the first 30 seconds
- 'ext': 'mp3',
- }]
- self._sort_formats(formats)
- artists = ', '.join(
- orderedSet(a['ART_NAME'] for a in s['ARTISTS']))
- entries.append({
- 'id': s['SNG_ID'],
- 'duration': int_or_none(s.get('DURATION')),
- 'title': '%s - %s' % (artists, s['SNG_TITLE']),
- 'uploader': s['ART_NAME'],
- 'uploader_id': s['ART_ID'],
- 'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
- 'formats': formats,
- })
-
- return {
- '_type': 'playlist',
- 'id': playlist_id,
- 'title': playlist_title,
- 'uploader': playlist_uploader,
- 'thumbnail': playlist_thumbnail,
- 'entries': entries,
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- int_or_none,
- unified_strdate,
- xpath_text,
- determine_ext,
- float_or_none,
- ExtractorError,
-)
-
-
-class DreiSatIE(InfoExtractor):
- IE_NAME = '3sat'
- _GEO_COUNTRIES = ['DE']
- _VALID_URL = r'https?://(?:www\.)?3sat\.de/mediathek/(?:(?:index|mediathek)\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)'
- _TESTS = [
- {
- 'url': 'http://www.3sat.de/mediathek/index.php?mode=play&obj=45918',
- 'md5': 'be37228896d30a88f315b638900a026e',
- 'info_dict': {
- 'id': '45918',
- 'ext': 'mp4',
- 'title': 'Waidmannsheil',
- 'description': 'md5:cce00ca1d70e21425e72c86a98a56817',
- 'uploader': 'SCHWEIZWEIT',
- 'uploader_id': '100000210',
- 'upload_date': '20140913'
- },
- 'params': {
- 'skip_download': True, # m3u8 downloads
- }
- },
- {
- 'url': 'http://www.3sat.de/mediathek/mediathek.php?mode=play&obj=51066',
- 'only_matching': True,
- },
- ]
-
- def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
- param_groups = {}
- for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
- group_id = param_group.get(self._xpath_ns(
- 'id', 'http://www.w3.org/XML/1998/namespace'))
- params = {}
- for param in param_group:
- params[param.get('name')] = param.get('value')
- param_groups[group_id] = params
-
- formats = []
- for video in smil.findall(self._xpath_ns('.//video', namespace)):
- src = video.get('src')
- if not src:
- continue
- bitrate = int_or_none(self._search_regex(r'_(\d+)k', src, 'bitrate', None)) or float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
- group_id = video.get('paramGroup')
- param_group = param_groups[group_id]
- for proto in param_group['protocols'].split(','):
- formats.append({
- 'url': '%s://%s' % (proto, param_group['host']),
- 'app': param_group['app'],
- 'play_path': src,
- 'ext': 'flv',
- 'format_id': '%s-%d' % (proto, bitrate),
- 'tbr': bitrate,
- })
- self._sort_formats(formats)
- return formats
-
- def extract_from_xml_url(self, video_id, xml_url):
- doc = self._download_xml(
- xml_url, video_id,
- note='Downloading video info',
- errnote='Failed to download video info')
-
- status_code = xpath_text(doc, './status/statuscode')
- if status_code and status_code != 'ok':
- if status_code == 'notVisibleAnymore':
- message = 'Video %s is not available' % video_id
- else:
- message = '%s returned error: %s' % (self.IE_NAME, status_code)
- raise ExtractorError(message, expected=True)
-
- title = xpath_text(doc, './/information/title', 'title', True)
-
- urls = []
- formats = []
- for fnode in doc.findall('.//formitaeten/formitaet'):
- video_url = xpath_text(fnode, 'url')
- if not video_url or video_url in urls:
- continue
- urls.append(video_url)
-
- is_available = 'http://www.metafilegenerator' not in video_url
- geoloced = 'static_geoloced_online' in video_url
- if not is_available or geoloced:
- continue
-
- format_id = fnode.attrib['basetype']
- format_m = re.match(r'''(?x)
- (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
- (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
- ''', format_id)
-
- ext = determine_ext(video_url, None) or format_m.group('container')
-
- if ext == 'meta':
- continue
- elif ext == 'smil':
- formats.extend(self._extract_smil_formats(
- video_url, video_id, fatal=False))
- elif ext == 'm3u8':
- # the certificates are misconfigured (see
- # https://github.com/ytdl-org/youtube-dl/issues/8665)
- if video_url.startswith('https://'):
- continue
- formats.extend(self._extract_m3u8_formats(
- video_url, video_id, 'mp4', 'm3u8_native',
- m3u8_id=format_id, fatal=False))
- elif ext == 'f4m':
- formats.extend(self._extract_f4m_formats(
- video_url, video_id, f4m_id=format_id, fatal=False))
- else:
- quality = xpath_text(fnode, './quality')
- if quality:
- format_id += '-' + quality
-
- abr = int_or_none(xpath_text(fnode, './audioBitrate'), 1000)
- vbr = int_or_none(xpath_text(fnode, './videoBitrate'), 1000)
-
- tbr = int_or_none(self._search_regex(
- r'_(\d+)k', video_url, 'bitrate', None))
- if tbr and vbr and not abr:
- abr = tbr - vbr
-
- formats.append({
- 'format_id': format_id,
- 'url': video_url,
- 'ext': ext,
- 'acodec': format_m.group('acodec'),
- 'vcodec': format_m.group('vcodec'),
- 'abr': abr,
- 'vbr': vbr,
- 'tbr': tbr,
- 'width': int_or_none(xpath_text(fnode, './width')),
- 'height': int_or_none(xpath_text(fnode, './height')),
- 'filesize': int_or_none(xpath_text(fnode, './filesize')),
- 'protocol': format_m.group('proto').lower(),
- })
-
- geolocation = xpath_text(doc, './/details/geolocation')
- if not formats and geolocation and geolocation != 'none':
- self.raise_geo_restricted(countries=self._GEO_COUNTRIES)
-
- self._sort_formats(formats)
-
- thumbnails = []
- for node in doc.findall('.//teaserimages/teaserimage'):
- thumbnail_url = node.text
- if not thumbnail_url:
- continue
- thumbnail = {
- 'url': thumbnail_url,
- }
- thumbnail_key = node.get('key')
- if thumbnail_key:
- m = re.match('^([0-9]+)x([0-9]+)$', thumbnail_key)
- if m:
- thumbnail['width'] = int(m.group(1))
- thumbnail['height'] = int(m.group(2))
- thumbnails.append(thumbnail)
-
- upload_date = unified_strdate(xpath_text(doc, './/details/airtime'))
-
- return {
- 'id': video_id,
- 'title': title,
- 'description': xpath_text(doc, './/information/detail'),
- 'duration': int_or_none(xpath_text(doc, './/details/lengthSec')),
- 'thumbnails': thumbnails,
- 'uploader': xpath_text(doc, './/details/originChannelTitle'),
- 'uploader_id': xpath_text(doc, './/details/originChannelId'),
- 'upload_date': upload_date,
- 'formats': formats,
- }
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
- details_url = 'http://www.3sat.de/mediathek/xmlservice/web/beitragsDetails?id=%s' % video_id
- return self.extract_from_xml_url(video_id, details_url)
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- js_to_json,
- remove_end,
- determine_ext,
-)
-
-
-class HellPornoIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
- _TESTS = [{
- 'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
- 'md5': '1fee339c610d2049699ef2aa699439f1',
- 'info_dict': {
- 'id': '149116',
- 'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
- 'ext': 'mp4',
- 'title': 'Dixie is posing with naked ass very erotic',
- 'thumbnail': r're:https?://.*\.jpg$',
- 'age_limit': 18,
- }
- }, {
- 'url': 'http://hellporno.net/v/186271/',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- display_id = self._match_id(url)
-
- webpage = self._download_webpage(url, display_id)
-
- title = remove_end(self._html_search_regex(
- r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
-
- flashvars = self._parse_json(self._search_regex(
- r'var\s+flashvars\s*=\s*({.+?});', webpage, 'flashvars'),
- display_id, transform_source=js_to_json)
-
- video_id = flashvars.get('video_id')
- thumbnail = flashvars.get('preview_url')
- ext = determine_ext(flashvars.get('postfix'), 'mp4')
-
- formats = []
- for video_url_key in ['video_url', 'video_alt_url']:
- video_url = flashvars.get(video_url_key)
- if not video_url:
- continue
- video_text = flashvars.get('%s_text' % video_url_key)
- fmt = {
- 'url': video_url,
- 'ext': ext,
- 'format_id': video_text,
- }
- m = re.search(r'^(?P<height>\d+)[pP]', video_text)
- if m:
- fmt['height'] = int(m.group('height'))
- formats.append(fmt)
- self._sort_formats(formats)
-
- categories = self._html_search_meta(
- 'keywords', webpage, 'categories', default='').split(',')
-
- return {
- 'id': video_id,
- 'display_id': display_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'categories': categories,
- 'age_limit': 18,
- 'formats': formats,
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
- int_or_none,
- unified_strdate,
-)
-
-
-class JpopsukiIE(InfoExtractor):
- IE_NAME = 'jpopsuki.tv'
- _VALID_URL = r'https?://(?:www\.)?jpopsuki\.tv/(?:category/)?video/[^/]+/(?P<id>\S+)'
-
- _TEST = {
- 'url': 'http://www.jpopsuki.tv/video/ayumi-hamasaki---evolution/00be659d23b0b40508169cdee4545771',
- 'md5': '88018c0c1a9b1387940e90ec9e7e198e',
- 'info_dict': {
- 'id': '00be659d23b0b40508169cdee4545771',
- 'ext': 'mp4',
- 'title': 'ayumi hamasaki - evolution',
- 'description': 'Release date: 2001.01.31\r\n浜崎あゆみ - evolution',
- 'thumbnail': 'http://www.jpopsuki.tv/cache/89722c74d2a2ebe58bcac65321c115b2.jpg',
- 'uploader': 'plama_chan',
- 'uploader_id': '404',
- 'upload_date': '20121101'
- }
- }
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- video_url = 'http://www.jpopsuki.tv' + self._html_search_regex(
- r'<source src="(.*?)" type', webpage, 'video url')
-
- video_title = self._og_search_title(webpage)
- description = self._og_search_description(webpage)
- thumbnail = self._og_search_thumbnail(webpage)
- uploader = self._html_search_regex(
- r'<li>from: <a href="/user/view/user/(.*?)/uid/',
- webpage, 'video uploader', fatal=False)
- uploader_id = self._html_search_regex(
- r'<li>from: <a href="/user/view/user/\S*?/uid/(\d*)',
- webpage, 'video uploader_id', fatal=False)
- upload_date = unified_strdate(self._html_search_regex(
- r'<li>uploaded: (.*?)</li>', webpage, 'video upload_date',
- fatal=False))
- view_count_str = self._html_search_regex(
- r'<li>Hits: ([0-9]+?)</li>', webpage, 'video view_count',
- fatal=False)
- comment_count_str = self._html_search_regex(
- r'<h2>([0-9]+?) comments</h2>', webpage, 'video comment_count',
- fatal=False)
-
- return {
- 'id': video_id,
- 'url': video_url,
- 'title': video_title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'uploader': uploader,
- 'uploader_id': uploader_id,
- 'upload_date': upload_date,
- 'view_count': int_or_none(view_count_str),
- 'comment_count': int_or_none(comment_count_str),
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
- unescapeHTML,
- parse_duration,
- get_element_by_class,
-)
-
-
-class LEGOIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[^/]+)/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]+)'
- _TESTS = [{
- 'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
- 'md5': 'f34468f176cfd76488767fc162c405fa',
- 'info_dict': {
- 'id': '55492d823b1b4d5e985787fa8c2973b1',
- 'ext': 'mp4',
- 'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
- 'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
- },
- }, {
- # geo-restricted but the contentUrl contain a valid url
- 'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
- 'md5': '4c3fec48a12e40c6e5995abc3d36cc2e',
- 'info_dict': {
- 'id': '13bdc2299ab24d9685701a915b3d71e7',
- 'ext': 'mp4',
- 'title': 'Aflevering 20 - Helden van het koninkrijk',
- 'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
- },
- }, {
- # special characters in title
- 'url': 'http://www.lego.com/en-us/starwars/videos/lego-star-wars-force-surprise-9685ee9d12e84ff38e84b4e3d0db533d',
- 'info_dict': {
- 'id': '9685ee9d12e84ff38e84b4e3d0db533d',
- 'ext': 'mp4',
- 'title': 'Force Surprise – LEGO® Star Wars™ Microfighters',
- 'description': 'md5:9c673c96ce6f6271b88563fe9dc56de3',
- },
- 'params': {
- 'skip_download': True,
- },
- }]
- _BITRATES = [256, 512, 1024, 1536, 2560]
-
- def _real_extract(self, url):
- locale, video_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(url, video_id)
- title = get_element_by_class('video-header', webpage).strip()
- progressive_base = 'https://lc-mediaplayerns-live-s.legocdn.com/'
- streaming_base = 'http://legoprod-f.akamaihd.net/'
- content_url = self._html_search_meta('contentUrl', webpage)
- path = self._search_regex(
- r'(?:https?:)?//[^/]+/(?:[iz]/s/)?public/(.+)_[0-9,]+\.(?:mp4|webm)',
- content_url, 'video path', default=None)
- if not path:
- player_url = self._proto_relative_url(self._search_regex(
- r'<iframe[^>]+src="((?:https?)?//(?:www\.)?lego\.com/[^/]+/mediaplayer/video/[^"]+)',
- webpage, 'player url', default=None))
- if not player_url:
- base_url = self._proto_relative_url(self._search_regex(
- r'data-baseurl="([^"]+)"', webpage, 'base url',
- default='http://www.lego.com/%s/mediaplayer/video/' % locale))
- player_url = base_url + video_id
- player_webpage = self._download_webpage(player_url, video_id)
- video_data = self._parse_json(unescapeHTML(self._search_regex(
- r"video='([^']+)'", player_webpage, 'video data')), video_id)
- progressive_base = self._search_regex(
- r'data-video-progressive-url="([^"]+)"',
- player_webpage, 'progressive base', default='https://lc-mediaplayerns-live-s.legocdn.com/')
- streaming_base = self._search_regex(
- r'data-video-streaming-url="([^"]+)"',
- player_webpage, 'streaming base', default='http://legoprod-f.akamaihd.net/')
- item_id = video_data['ItemId']
-
- net_storage_path = video_data.get('NetStoragePath') or '/'.join([item_id[:2], item_id[2:4]])
- base_path = '_'.join([item_id, video_data['VideoId'], video_data['Locale'], compat_str(video_data['VideoVersion'])])
- path = '/'.join([net_storage_path, base_path])
- streaming_path = ','.join(map(lambda bitrate: compat_str(bitrate), self._BITRATES))
-
- formats = self._extract_akamai_formats(
- '%si/s/public/%s_,%s,.mp4.csmil/master.m3u8' % (streaming_base, path, streaming_path), video_id)
- m3u8_formats = list(filter(
- lambda f: f.get('protocol') == 'm3u8_native' and f.get('vcodec') != 'none',
- formats))
- if len(m3u8_formats) == len(self._BITRATES):
- self._sort_formats(m3u8_formats)
- for bitrate, m3u8_format in zip(self._BITRATES, m3u8_formats):
- progressive_base_url = '%spublic/%s_%d.' % (progressive_base, path, bitrate)
- mp4_f = m3u8_format.copy()
- mp4_f.update({
- 'url': progressive_base_url + 'mp4',
- 'format_id': m3u8_format['format_id'].replace('hls', 'mp4'),
- 'protocol': 'http',
- })
- web_f = {
- 'url': progressive_base_url + 'webm',
- 'format_id': m3u8_format['format_id'].replace('hls', 'webm'),
- 'width': m3u8_format['width'],
- 'height': m3u8_format['height'],
- 'tbr': m3u8_format.get('tbr'),
- 'ext': 'webm',
- }
- formats.extend([web_f, mp4_f])
- else:
- for bitrate in self._BITRATES:
- for ext in ('web', 'mp4'):
- formats.append({
- 'format_id': '%s-%s' % (ext, bitrate),
- 'url': '%spublic/%s_%d.%s' % (progressive_base, path, bitrate, ext),
- 'tbr': bitrate,
- 'ext': ext,
- })
- self._sort_formats(formats)
-
- return {
- 'id': video_id,
- 'title': title,
- 'description': self._html_search_meta('description', webpage),
- 'thumbnail': self._html_search_meta('thumbnail', webpage),
- 'duration': parse_duration(self._html_search_meta('duration', webpage)),
- 'formats': formats,
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
- int_or_none,
- smuggle_url,
- parse_duration,
-)
-
-
-class MiTeleIE(InfoExtractor):
- IE_DESC = 'mitele.es'
- _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
-
- _TESTS = [{
- 'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
- 'info_dict': {
- 'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
- 'ext': 'mp4',
- 'title': 'Tor, la web invisible',
- 'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
- 'series': 'Diario de',
- 'season': 'La redacción',
- 'season_number': 14,
- 'season_id': 'diario_de_t14_11981',
- 'episode': 'Programa 144',
- 'episode_number': 3,
- 'thumbnail': r're:(?i)^https?://.*\.jpg$',
- 'duration': 2913,
- },
- 'add_ie': ['Ooyala'],
- }, {
- # no explicit title
- 'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
- 'info_dict': {
- 'id': 'oyNG1iNTE6TAPP-JmCjbwfwJqqMMX3Vq',
- 'ext': 'mp4',
- 'title': 'Cuarto Milenio Temporada 6 Programa 226',
- 'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
- 'series': 'Cuarto Milenio',
- 'season': 'Temporada 6',
- 'season_number': 6,
- 'season_id': 'cuarto_milenio_t06_12715',
- 'episode': 'Programa 226',
- 'episode_number': 24,
- 'thumbnail': r're:(?i)^https?://.*\.jpg$',
- 'duration': 7313,
- },
- 'params': {
- 'skip_download': True,
- },
- 'add_ie': ['Ooyala'],
- }, {
- 'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- paths = self._download_json(
- 'https://www.mitele.es/amd/agp/web/metadata/general_configuration',
- video_id, 'Downloading paths JSON')
-
- ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
- base_url = ooyala_s.get('base_url', 'cdn-search-mediaset.carbyne.ps.ooyala.com')
- full_path = ooyala_s.get('full_path', '/search/v1/full/providers/')
- source = self._download_json(
- '%s://%s%s%s/docs/%s' % (
- ooyala_s.get('protocol', 'https'), base_url, full_path,
- ooyala_s.get('provider_id', '104951'), video_id),
- video_id, 'Downloading data JSON', query={
- 'include_titles': 'Series,Season',
- 'product_name': ooyala_s.get('product_name', 'test'),
- 'format': 'full',
- })['hits']['hits'][0]['_source']
-
- embedCode = source['offers'][0]['embed_codes'][0]
- titles = source['localizable_titles'][0]
-
- title = titles.get('title_medium') or titles['title_long']
-
- description = titles.get('summary_long') or titles.get('summary_medium')
-
- def get(key1, key2):
- value1 = source.get(key1)
- if not value1 or not isinstance(value1, list):
- return
- if not isinstance(value1[0], dict):
- return
- return value1[0].get(key2)
-
- series = get('localizable_titles_series', 'title_medium')
-
- season = get('localizable_titles_season', 'title_medium')
- season_number = int_or_none(source.get('season_number'))
- season_id = source.get('season_id')
-
- episode = titles.get('title_sort_name')
- episode_number = int_or_none(source.get('episode_number'))
-
- duration = parse_duration(get('videos', 'duration'))
-
- return {
- '_type': 'url_transparent',
- # for some reason only HLS is supported
- 'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
- 'id': video_id,
- 'title': title,
- 'description': description,
- 'series': series,
- 'season': season,
- 'season_number': season_number,
- 'season_id': season_id,
- 'episode': episode,
- 'episode_number': episode_number,
- 'duration': duration,
- 'thumbnail': get('images', 'url'),
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
- ExtractorError,
- qualities,
-)
-
-
-class PandaTVIE(InfoExtractor):
- IE_DESC = '熊猫TV'
- _VALID_URL = r'https?://(?:www\.)?panda\.tv/(?P<id>[0-9]+)'
- _TESTS = [{
- 'url': 'http://www.panda.tv/66666',
- 'info_dict': {
- 'id': '66666',
- 'title': 're:.+',
- 'uploader': '刘杀鸡',
- 'ext': 'flv',
- 'is_live': True,
- },
- 'params': {
- 'skip_download': True,
- },
- 'skip': 'Live stream is offline',
- }, {
- 'url': 'https://www.panda.tv/66666',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- config = self._download_json(
- 'https://www.panda.tv/api_room_v2?roomid=%s' % video_id, video_id)
-
- error_code = config.get('errno', 0)
- if error_code != 0:
- raise ExtractorError(
- '%s returned error %s: %s'
- % (self.IE_NAME, error_code, config['errmsg']),
- expected=True)
-
- data = config['data']
- video_info = data['videoinfo']
-
- # 2 = live, 3 = offline
- if video_info.get('status') != '2':
- raise ExtractorError(
- 'Live stream is offline', expected=True)
-
- title = data['roominfo']['name']
- uploader = data.get('hostinfo', {}).get('name')
- room_key = video_info['room_key']
- stream_addr = video_info.get(
- 'stream_addr', {'OD': '1', 'HD': '1', 'SD': '1'})
-
- # Reverse engineered from web player swf
- # (http://s6.pdim.gs/static/07153e425f581151.swf at the moment of
- # writing).
- plflag0, plflag1 = video_info['plflag'].split('_')
- plflag0 = int(plflag0) - 1
- if plflag1 == '21':
- plflag0 = 10
- plflag1 = '4'
- live_panda = 'live_panda' if plflag0 < 1 else ''
-
- plflag_auth = self._parse_json(video_info['plflag_list'], video_id)
- sign = plflag_auth['auth']['sign']
- ts = plflag_auth['auth']['time']
- rid = plflag_auth['auth']['rid']
-
- quality_key = qualities(['OD', 'HD', 'SD'])
- suffix = ['_small', '_mid', '']
- formats = []
- for k, v in stream_addr.items():
- if v != '1':
- continue
- quality = quality_key(k)
- if quality <= 0:
- continue
- for pref, (ext, pl) in enumerate((('m3u8', '-hls'), ('flv', ''))):
- formats.append({
- 'url': 'https://pl%s%s.live.panda.tv/live_panda/%s%s%s.%s?sign=%s&ts=%s&rid=%s'
- % (pl, plflag1, room_key, live_panda, suffix[quality], ext, sign, ts, rid),
- 'format_id': '%s-%s' % (k, ext),
- 'quality': quality,
- 'source_preference': pref,
- })
- self._sort_formats(formats)
-
- return {
- 'id': video_id,
- 'title': self._live_title(title),
- 'uploader': uploader,
- 'formats': formats,
- 'is_live': True,
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-from .dreisat import DreiSatIE
-
-
-class PhoenixIE(DreiSatIE):
- IE_NAME = 'phoenix.de'
- _VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/
- (?:
- phoenix/die_sendungen/(?:[^/]+/)?
- )?
- (?P<id>[0-9]+)'''
- _TESTS = [
- {
- 'url': 'http://www.phoenix.de/content/884301',
- 'md5': 'ed249f045256150c92e72dbb70eadec6',
- 'info_dict': {
- 'id': '884301',
- 'ext': 'mp4',
- 'title': 'Michael Krons mit Hans-Werner Sinn',
- 'description': 'Im Dialog - Sa. 25.10.14, 00.00 - 00.35 Uhr',
- 'upload_date': '20141025',
- 'uploader': 'Im Dialog',
- }
- },
- {
- 'url': 'http://www.phoenix.de/content/phoenix/die_sendungen/869815',
- 'only_matching': True,
- },
- {
- 'url': 'http://www.phoenix.de/content/phoenix/die_sendungen/diskussionen/928234',
- 'only_matching': True,
- },
- ]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
-
- internal_id = self._search_regex(
- r'<div class="phx_vod" id="phx_vod_([0-9]+)"',
- webpage, 'internal video ID')
-
- api_url = 'http://www.phoenix.de/php/mediaplayer/data/beitrags_details.php?ak=web&id=%s' % internal_id
- return self.extract_from_xml_url(video_id, api_url)
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- extract_attributes,
- int_or_none,
-)
-
-
-class PokemonIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/(?:[^/]+/)+(?P<display_id>[^/?#&]+))'
- _TESTS = [{
- 'url': 'https://www.pokemon.com/us/pokemon-episodes/20_30-the-ol-raise-and-switch/',
- 'md5': '2fe8eaec69768b25ef898cda9c43062e',
- 'info_dict': {
- 'id': 'afe22e30f01c41f49d4f1d9eab5cd9a4',
- 'ext': 'mp4',
- 'title': 'The Ol’ Raise and Switch!',
- 'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
- 'timestamp': 1511824728,
- 'upload_date': '20171127',
- },
- 'add_ie': ['LimelightMedia'],
- }, {
- # no data-video-title
- 'url': 'https://www.pokemon.com/us/pokemon-episodes/pokemon-movies/pokemon-the-rise-of-darkrai-2008',
- 'info_dict': {
- 'id': '99f3bae270bf4e5097274817239ce9c8',
- 'ext': 'mp4',
- 'title': 'Pokémon: The Rise of Darkrai',
- 'description': 'md5:ea8fbbf942e1e497d54b19025dd57d9d',
- 'timestamp': 1417778347,
- 'upload_date': '20141205',
- },
- 'add_ie': ['LimelightMedia'],
- 'params': {
- 'skip_download': True,
- },
- }, {
- 'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
- 'only_matching': True,
- }, {
- 'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
- 'only_matching': True,
- }, {
- 'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- video_id, display_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(url, video_id or display_id)
- video_data = extract_attributes(self._search_regex(
- r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
- webpage, 'video data element'))
- video_id = video_data['data-video-id']
- title = video_data.get('data-video-title') or self._html_search_meta(
- 'pkm-title', webpage, ' title', default=None) or self._search_regex(
- r'<h1[^>]+\bclass=["\']us-title[^>]+>([^<]+)', webpage, 'title')
- return {
- '_type': 'url_transparent',
- 'id': video_id,
- 'url': 'limelight:media:%s' % video_id,
- 'title': title,
- 'description': video_data.get('data-video-summary'),
- 'thumbnail': video_data.get('data-video-poster'),
- 'series': 'Pokémon',
- 'season_number': int_or_none(video_data.get('data-video-season')),
- 'episode': title,
- 'episode_number': int_or_none(video_data.get('data-video-episode')),
- 'ie_key': 'LimelightMedia',
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_unquote,
- compat_urllib_parse_urlparse,
-)
-from ..utils import (
- sanitized_Request,
- str_to_int,
- unified_strdate,
-)
-from ..aes import aes_decrypt_text
-
-
-class SpankwireIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?(?P<url>spankwire\.com/[^/]*/video(?P<id>[0-9]+)/?)'
- _TESTS = [{
- # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
- 'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
- 'md5': '8bbfde12b101204b39e4b9fe7eb67095',
- 'info_dict': {
- 'id': '103545',
- 'ext': 'mp4',
- 'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
- 'description': 'Crazy Bitch X rated music video.',
- 'uploader': 'oreusz',
- 'uploader_id': '124697',
- 'upload_date': '20070507',
- 'age_limit': 18,
- }
- }, {
- # download URL pattern: */mp4_<format_id>_<video_id>.mp4
- 'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
- 'md5': '09b3c20833308b736ae8902db2f8d7e6',
- 'info_dict': {
- 'id': '1921551',
- 'ext': 'mp4',
- 'title': 'Titcums Compiloation I',
- 'description': 'cum on tits',
- 'uploader': 'dannyh78999',
- 'uploader_id': '3056053',
- 'upload_date': '20150822',
- 'age_limit': 18,
- },
- }]
-
- def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
-
- req = sanitized_Request('http://www.' + mobj.group('url'))
- req.add_header('Cookie', 'age_verified=1')
- webpage = self._download_webpage(req, video_id)
-
- title = self._html_search_regex(
- r'<h1>([^<]+)', webpage, 'title')
- description = self._html_search_regex(
- r'(?s)<div\s+id="descriptionContent">(.+?)</div>',
- webpage, 'description', fatal=False)
- thumbnail = self._html_search_regex(
- r'playerData\.screenShot\s*=\s*["\']([^"\']+)["\']',
- webpage, 'thumbnail', fatal=False)
-
- uploader = self._html_search_regex(
- r'by:\s*<a [^>]*>(.+?)</a>',
- webpage, 'uploader', fatal=False)
- uploader_id = self._html_search_regex(
- r'by:\s*<a href="/(?:user/viewProfile|Profile\.aspx)\?.*?UserId=(\d+).*?"',
- webpage, 'uploader id', fatal=False)
- upload_date = unified_strdate(self._html_search_regex(
- r'</a> on (.+?) at \d+:\d+',
- webpage, 'upload date', fatal=False))
-
- view_count = str_to_int(self._html_search_regex(
- r'<div id="viewsCounter"><span>([\d,\.]+)</span> views</div>',
- webpage, 'view count', fatal=False))
- comment_count = str_to_int(self._html_search_regex(
- r'<span\s+id="spCommentCount"[^>]*>([\d,\.]+)</span>',
- webpage, 'comment count', fatal=False))
-
- videos = re.findall(
- r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
- heights = [int(video[0]) for video in videos]
- video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
- if webpage.find('flashvars.encrypted = "true"') != -1:
- password = self._search_regex(
- r'flashvars\.video_title = "([^"]+)',
- webpage, 'password').replace('+', ' ')
- video_urls = list(map(
- lambda s: aes_decrypt_text(s, password, 32).decode('utf-8'),
- video_urls))
-
- formats = []
- for height, video_url in zip(heights, video_urls):
- path = compat_urllib_parse_urlparse(video_url).path
- m = re.search(r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', path)
- if m:
- tbr = int(m.group('tbr'))
- height = int(m.group('height'))
- else:
- tbr = None
- formats.append({
- 'url': video_url,
- 'format_id': '%dp' % height,
- 'height': height,
- 'tbr': tbr,
- })
- self._sort_formats(formats)
-
- age_limit = self._rta_search(webpage)
-
- return {
- 'id': video_id,
- 'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'uploader': uploader,
- 'uploader_id': uploader_id,
- 'upload_date': upload_date,
- 'view_count': view_count,
- 'comment_count': comment_count,
- 'formats': formats,
- 'age_limit': age_limit,
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import int_or_none
-
-
-class StretchInternetIE(InfoExtractor):
- _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/portal\.htm\?.*?\beventId=(?P<id>\d+)'
- _TEST = {
- 'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=313900&streamType=video',
- 'info_dict': {
- 'id': '313900',
- 'ext': 'mp4',
- 'title': 'Augustana (S.D.) Baseball vs University of Mary',
- 'description': 'md5:7578478614aae3bdd4a90f578f787438',
- 'timestamp': 1490468400,
- 'upload_date': '20170325',
- }
- }
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- stream = self._download_json(
- 'https://neo-client.stretchinternet.com/streamservice/v1/media/stream/v%s'
- % video_id, video_id)
-
- video_url = 'https://%s' % stream['source']
-
- event = self._download_json(
- 'https://neo-client.stretchinternet.com/portal-ws/getEvent.json',
- video_id, query={
- 'clientID': 99997,
- 'eventID': video_id,
- 'token': 'asdf',
- })['event']
-
- title = event.get('title') or event['mobileTitle']
- description = event.get('customText')
- timestamp = int_or_none(event.get('longtime'))
-
- return {
- 'id': video_id,
- 'title': title,
- 'description': description,
- 'timestamp': timestamp,
- 'url': video_url,
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from .nexx import NexxIE
-from ..compat import compat_urlparse
-
-
-class Tele5IE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
- _TESTS = [{
- 'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
- 'info_dict': {
- 'id': '1549416',
- 'ext': 'mp4',
- 'upload_date': '20180814',
- 'timestamp': 1534290623,
- 'title': 'Pandorum',
- },
- 'params': {
- 'skip_download': True,
- },
- }, {
- 'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
- 'only_matching': True,
- }, {
- 'url': 'https://www.tele5.de/video-clip/?ve_id=1609440',
- 'only_matching': True,
- }, {
- 'url': 'https://www.tele5.de/filme/schlefaz-dragon-crusaders/',
- 'only_matching': True,
- }, {
- 'url': 'https://www.tele5.de/filme/making-of/avengers-endgame/',
- 'only_matching': True,
- }, {
- 'url': 'https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/das-vinculum/',
- 'only_matching': True,
- }, {
- 'url': 'https://www.tele5.de/anders-ist-sevda/',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
- video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
-
- if not video_id:
- display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
- video_id = self._html_search_regex(
- (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](\d+)',
- r'\s+id\s*=\s*["\']player_(\d{6,})',
- r'\bdata-id\s*=\s*["\'](\d{6,})'), webpage, 'video id')
-
- return self.url_result(
- 'https://api.nexx.cloud/v3/759/videos/byid/%s' % video_id,
- ie=NexxIE.ie_key(), video_id=video_id)
+++ /dev/null
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
- dict_get,
- float_or_none,
- int_or_none,
- unified_timestamp,
- update_url_query,
- url_or_none,
-)
-
-
-class TruNewsIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?trunews\.com/stream/(?P<id>[^/?#&]+)'
- _TEST = {
- 'url': 'https://www.trunews.com/stream/will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
- 'md5': 'a19c024c3906ff954fac9b96ce66bb08',
- 'info_dict': {
- 'id': '5c5a21e65d3c196e1c0020cc',
- 'display_id': 'will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
- 'ext': 'mp4',
- 'title': "Will Democrats Stage a Circus During President Trump's State of the Union Speech?",
- 'description': 'md5:c583b72147cc92cf21f56a31aff7a670',
- 'duration': 3685,
- 'timestamp': 1549411440,
- 'upload_date': '20190206',
- },
- 'add_ie': ['Zype'],
- }
-
- def _real_extract(self, url):
- display_id = self._match_id(url)
-
- video = self._download_json(
- 'https://api.zype.com/videos', display_id, query={
- 'app_key': 'PUVKp9WgGUb3-JUw6EqafLx8tFVP6VKZTWbUOR-HOm__g4fNDt1bCsm_LgYf_k9H',
- 'per_page': 1,
- 'active': 'true',
- 'friendly_title': display_id,
- })['response'][0]
-
- zype_id = video['_id']
-
- thumbnails = []
- thumbnails_list = video.get('thumbnails')
- if isinstance(thumbnails_list, list):
- for thumbnail in thumbnails_list:
- if not isinstance(thumbnail, dict):
- continue
- thumbnail_url = url_or_none(thumbnail.get('url'))
- if not thumbnail_url:
- continue
- thumbnails.append({
- 'url': thumbnail_url,
- 'width': int_or_none(thumbnail.get('width')),
- 'height': int_or_none(thumbnail.get('height')),
- })
-
- return {
- '_type': 'url_transparent',
- 'url': update_url_query(
- 'https://player.zype.com/embed/%s.js' % zype_id,
- {'api_key': 'X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'}),
- 'ie_key': 'Zype',
- 'id': zype_id,
- 'display_id': display_id,
- 'title': video.get('title'),
- 'description': dict_get(video, ('description', 'ott_description', 'short_description')),
- 'duration': int_or_none(video.get('duration')),
- 'timestamp': unified_timestamp(video.get('published_at')),
- 'average_rating': float_or_none(video.get('rating')),
- 'view_count': int_or_none(video.get('request_count')),
- 'thumbnails': thumbnails,
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import (
- clean_html,
- determine_ext,
- extract_attributes,
- get_element_by_class,
- int_or_none,
- parse_duration,
- parse_iso8601,
-)
-
-
-class TV5MondePlusIE(InfoExtractor):
- IE_DESC = 'TV5MONDE+'
- _VALID_URL = r'https?://(?:www\.)?tv5mondeplus\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
- _TEST = {
- 'url': 'http://www.tv5mondeplus.com/toutes-les-videos/documentaire/tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
- 'md5': '12130fc199f020673138a83466542ec6',
- 'info_dict': {
- 'id': 'tdah-mon-amour-tele-quebec-tdah-mon-amour-ep001-enfants',
- 'ext': 'mp4',
- 'title': 'Tdah, mon amour - Enfants',
- 'description': 'md5:230e3aca23115afcf8006d1bece6df74',
- 'upload_date': '20170401',
- 'timestamp': 1491022860,
- }
- }
- _GEO_BYPASS = False
-
- def _real_extract(self, url):
- display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
-
- if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage:
- self.raise_geo_restricted(countries=['FR'])
-
- series = get_element_by_class('video-detail__title', webpage)
- title = episode = get_element_by_class(
- 'video-detail__subtitle', webpage) or series
- if series and series != title:
- title = '%s - %s' % (series, title)
- vpl_data = extract_attributes(self._search_regex(
- r'(<[^>]+class="video_player_loader"[^>]+>)',
- webpage, 'video player loader'))
-
- video_files = self._parse_json(
- vpl_data['data-broadcast'], display_id).get('files', [])
- formats = []
- for video_file in video_files:
- v_url = video_file.get('url')
- if not v_url:
- continue
- video_format = video_file.get('format') or determine_ext(v_url)
- if video_format == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- v_url, display_id, 'mp4', 'm3u8_native',
- m3u8_id='hls', fatal=False))
- else:
- formats.append({
- 'url': v_url,
- 'format_id': video_format,
- })
- self._sort_formats(formats)
-
- return {
- 'id': display_id,
- 'display_id': display_id,
- 'title': title,
- 'description': clean_html(get_element_by_class('video-detail__description', webpage)),
- 'thumbnail': vpl_data.get('data-image'),
- 'duration': int_or_none(vpl_data.get('data-duration')) or parse_duration(self._html_search_meta('duration', webpage)),
- 'timestamp': parse_iso8601(self._html_search_meta('uploadDate', webpage)),
- 'formats': formats,
- 'episode': episode,
- 'series': series,
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-import base64
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_urllib_parse_unquote
-from ..utils import (
- ExtractorError,
- clean_html,
- determine_ext,
- int_or_none,
- js_to_json,
- parse_age_limit,
- parse_duration,
- try_get,
-)
-
-
-class ViewLiftBaseIE(InfoExtractor):
- _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm)\.com|hoichoi\.tv'
-
-
-class ViewLiftEmbedIE(ViewLiftBaseIE):
- _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?:%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
- _TESTS = [{
- 'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
- 'md5': '2924e9215c6eff7a55ed35b72276bd93',
- 'info_dict': {
- 'id': '74849a00-85a9-11e1-9660-123139220831',
- 'ext': 'mp4',
- 'title': '#whilewewatch',
- }
- }, {
- # invalid labels, 360p is better than 480p
- 'url': 'http://www.snagfilms.com/embed/player?filmId=17ca0950-a74a-11e0-a92a-0026bb61d036',
- 'md5': '882fca19b9eb27ef865efeeaed376a48',
- 'info_dict': {
- 'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
- 'ext': 'mp4',
- 'title': 'Life in Limbo',
- }
- }, {
- 'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
- 'only_matching': True,
- }]
-
- @staticmethod
- def _extract_url(webpage):
- mobj = re.search(
- r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?(?:%s)/embed/player.+?)\1' % ViewLiftBaseIE._DOMAINS_REGEX,
- webpage)
- if mobj:
- return mobj.group('url')
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- if '>This film is not playable in your area.<' in webpage:
- raise ExtractorError(
- 'Film %s is not playable in your area.' % video_id, expected=True)
-
- formats = []
- has_bitrate = False
- sources = self._parse_json(self._search_regex(
- r'(?s)sources:\s*(\[.+?\]),', webpage,
- 'sources', default='[]'), video_id, js_to_json)
- for source in sources:
- file_ = source.get('file')
- if not file_:
- continue
- type_ = source.get('type')
- ext = determine_ext(file_)
- format_id = source.get('label') or ext
- if all(v in ('m3u8', 'hls') for v in (type_, ext)):
- formats.extend(self._extract_m3u8_formats(
- file_, video_id, 'mp4', 'm3u8_native',
- m3u8_id='hls', fatal=False))
- else:
- bitrate = int_or_none(self._search_regex(
- [r'(\d+)kbps', r'_\d{1,2}x\d{1,2}_(\d{3,})\.%s' % ext],
- file_, 'bitrate', default=None))
- if not has_bitrate and bitrate:
- has_bitrate = True
- height = int_or_none(self._search_regex(
- r'^(\d+)[pP]$', format_id, 'height', default=None))
- formats.append({
- 'url': file_,
- 'format_id': 'http-%s%s' % (format_id, ('-%dk' % bitrate if bitrate else '')),
- 'tbr': bitrate,
- 'height': height,
- })
- if not formats:
- hls_url = self._parse_json(self._search_regex(
- r'filmInfo\.src\s*=\s*({.+?});',
- webpage, 'src'), video_id, js_to_json)['src']
- formats = self._extract_m3u8_formats(
- hls_url, video_id, 'mp4', 'm3u8_native',
- m3u8_id='hls', fatal=False)
- field_preference = None if has_bitrate else ('height', 'tbr', 'format_id')
- self._sort_formats(formats, field_preference)
-
- title = self._search_regex(
- [r"title\s*:\s*'([^']+)'", r'<title>([^<]+)</title>'],
- webpage, 'title')
-
- return {
- 'id': video_id,
- 'title': title,
- 'formats': formats,
- }
-
-
-class ViewLiftIE(ViewLiftBaseIE):
- _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?:/(?:films/title|show|(?:news/)?videos?))?/(?P<id>[^?#]+)' % ViewLiftBaseIE._DOMAINS_REGEX
- _TESTS = [{
- 'url': 'http://www.snagfilms.com/films/title/lost_for_life',
- 'md5': '19844f897b35af219773fd63bdec2942',
- 'info_dict': {
- 'id': '0000014c-de2f-d5d6-abcf-ffef58af0017',
- 'display_id': 'lost_for_life',
- 'ext': 'mp4',
- 'title': 'Lost for Life',
- 'description': 'md5:ea10b5a50405ae1f7b5269a6ec594102',
- 'thumbnail': r're:^https?://.*\.jpg',
- 'duration': 4489,
- 'categories': 'mincount:3',
- 'age_limit': 14,
- 'upload_date': '20150421',
- 'timestamp': 1429656820,
- }
- }, {
- 'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
- 'md5': 'e6292e5b837642bbda82d7f8bf3fbdfd',
- 'info_dict': {
- 'id': '00000145-d75c-d96e-a9c7-ff5c67b20000',
- 'display_id': 'the_world_cut_project/india',
- 'ext': 'mp4',
- 'title': 'India',
- 'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
- 'thumbnail': r're:^https?://.*\.jpg',
- 'duration': 979,
- 'timestamp': 1399478279,
- 'upload_date': '20140507',
- }
- }, {
- 'url': 'http://main.snagfilms.com/augie_alone/s_2_ep_12_love',
- 'info_dict': {
- 'id': '00000148-7b53-de26-a9fb-fbf306f70020',
- 'display_id': 'augie_alone/s_2_ep_12_love',
- 'ext': 'mp4',
- 'title': 'Augie, Alone:S. 2 Ep. 12 - Love',
- 'description': 'md5:db2a5c72d994f16a780c1eb353a8f403',
- 'thumbnail': r're:^https?://.*\.jpg',
- 'duration': 107,
- },
- 'params': {
- 'skip_download': True,
- },
- }, {
- 'url': 'http://main.snagfilms.com/films/title/the_freebie',
- 'only_matching': True,
- }, {
- # Film is not playable in your area.
- 'url': 'http://www.snagfilms.com/films/title/inside_mecca',
- 'only_matching': True,
- }, {
- # Film is not available.
- 'url': 'http://www.snagfilms.com/show/augie_alone/flirting',
- 'only_matching': True,
- }, {
- 'url': 'http://www.winnersview.com/videos/the-good-son',
- 'only_matching': True,
- }, {
- # Was once Kaltura embed
- 'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
- 'only_matching': True,
- }]
-
- @classmethod
- def suitable(cls, url):
- return False if ViewLiftEmbedIE.suitable(url) else super(ViewLiftIE, cls).suitable(url)
-
- def _real_extract(self, url):
- domain, display_id = re.match(self._VALID_URL, url).groups()
-
- webpage = self._download_webpage(url, display_id)
-
- if ">Sorry, the Film you're looking for is not available.<" in webpage:
- raise ExtractorError(
- 'Film %s is not available.' % display_id, expected=True)
-
- initial_store_state = self._search_regex(
- r"window\.initialStoreState\s*=.*?JSON\.parse\(unescape\(atob\('([^']+)'\)\)\)",
- webpage, 'Initial Store State', default=None)
- if initial_store_state:
- modules = self._parse_json(compat_urllib_parse_unquote(base64.b64decode(
- initial_store_state).decode()), display_id)['page']['data']['modules']
- content_data = next(m['contentData'][0] for m in modules if m.get('moduleType') == 'VideoDetailModule')
- gist = content_data['gist']
- film_id = gist['id']
- title = gist['title']
- video_assets = try_get(
- content_data, lambda x: x['streamingInfo']['videoAssets'], dict)
- if not video_assets:
- token = self._download_json(
- 'https://prod-api.viewlift.com/identity/anonymous-token',
- film_id, 'Downloading authorization token',
- query={'site': 'snagfilms'})['authorizationToken']
- video_assets = self._download_json(
- 'https://prod-api.viewlift.com/entitlement/video/status',
- film_id, headers={
- 'Authorization': token,
- 'Referer': url,
- }, query={
- 'id': film_id
- })['video']['streamingInfo']['videoAssets']
-
- formats = []
- mpeg_video_assets = video_assets.get('mpeg') or []
- for video_asset in mpeg_video_assets:
- video_asset_url = video_asset.get('url')
- if not video_asset_url:
- continue
- bitrate = int_or_none(video_asset.get('bitrate'))
- height = int_or_none(self._search_regex(
- r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
- 'height', default=None))
- formats.append({
- 'url': video_asset_url,
- 'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
- 'tbr': bitrate,
- 'height': height,
- 'vcodec': video_asset.get('codec'),
- })
-
- hls_url = video_assets.get('hls')
- if hls_url:
- formats.extend(self._extract_m3u8_formats(
- hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
- self._sort_formats(formats, ('height', 'tbr', 'format_id'))
-
- info = {
- 'id': film_id,
- 'display_id': display_id,
- 'title': title,
- 'description': gist.get('description'),
- 'thumbnail': gist.get('videoImageUrl'),
- 'duration': int_or_none(gist.get('runtime')),
- 'age_limit': parse_age_limit(content_data.get('parentalRating')),
- 'timestamp': int_or_none(gist.get('publishDate'), 1000),
- 'formats': formats,
- }
- for k in ('categories', 'tags'):
- info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
- return info
- else:
- film_id = self._search_regex(r'filmId=([\da-f-]{36})"', webpage, 'film id')
-
- snag = self._parse_json(
- self._search_regex(
- r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag', default='[]'),
- display_id)
-
- for item in snag:
- if item.get('data', {}).get('film', {}).get('id') == film_id:
- data = item['data']['film']
- title = data['title']
- description = clean_html(data.get('synopsis'))
- thumbnail = data.get('image')
- duration = int_or_none(data.get('duration') or data.get('runtime'))
- categories = [
- category['title'] for category in data.get('categories', [])
- if category.get('title')]
- break
- else:
- title = self._html_search_regex(
- (r'itemprop="title">([^<]+)<',
- r'(?s)itemprop="title">(.+?)<div'), webpage, 'title')
- description = self._html_search_regex(
- r'(?s)<div itemprop="description" class="film-synopsis-inner ">(.+?)</div>',
- webpage, 'description', default=None) or self._og_search_description(webpage)
- thumbnail = self._og_search_thumbnail(webpage)
- duration = parse_duration(self._search_regex(
- r'<span itemprop="duration" class="film-duration strong">([^<]+)<',
- webpage, 'duration', fatal=False))
- categories = re.findall(r'<a href="/movies/[^"]+">([^<]+)</a>', webpage)
-
- return {
- '_type': 'url_transparent',
- 'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
- 'id': film_id,
- 'display_id': display_id,
- 'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'duration': duration,
- 'categories': categories,
- 'ie_key': 'ViewLiftEmbed',
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import (
- compat_str,
- compat_urlparse,
-)
-from ..utils import (
- ExtractorError,
- determine_ext,
- int_or_none,
- sanitized_Request,
-)
-
-
-class VoiceRepublicIE(InfoExtractor):
- _VALID_URL = r'https?://voicerepublic\.com/(?:talks|embed)/(?P<id>[0-9a-z-]+)'
- _TESTS = [{
- 'url': 'http://voicerepublic.com/talks/watching-the-watchers-building-a-sousveillance-state',
- 'md5': 'b9174d651323f17783000876347116e3',
- 'info_dict': {
- 'id': '2296',
- 'display_id': 'watching-the-watchers-building-a-sousveillance-state',
- 'ext': 'm4a',
- 'title': 'Watching the Watchers: Building a Sousveillance State',
- 'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
- 'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
- 'duration': 1800,
- 'view_count': int,
- }
- }, {
- 'url': 'http://voicerepublic.com/embed/watching-the-watchers-building-a-sousveillance-state',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- display_id = self._match_id(url)
-
- req = sanitized_Request(
- compat_urlparse.urljoin(url, '/talks/%s' % display_id))
- # Older versions of Firefox get redirected to an "upgrade browser" page
- req.add_header('User-Agent', 'youtube-dl')
- webpage = self._download_webpage(req, display_id)
-
- if '>Queued for processing, please stand by...<' in webpage:
- raise ExtractorError(
- 'Audio is still queued for processing', expected=True)
-
- config = self._search_regex(
- r'(?s)return ({.+?});\s*\n', webpage,
- 'data', default=None)
- data = self._parse_json(config, display_id, fatal=False) if config else None
- if data:
- title = data['title']
- description = data.get('teaser')
- talk_id = compat_str(data.get('talk_id') or display_id)
- talk = data['talk']
- duration = int_or_none(talk.get('duration'))
- formats = [{
- 'url': compat_urlparse.urljoin(url, talk_url),
- 'format_id': format_id,
- 'ext': determine_ext(talk_url) or format_id,
- 'vcodec': 'none',
- } for format_id, talk_url in talk['links'].items()]
- else:
- title = self._og_search_title(webpage)
- description = self._html_search_regex(
- r"(?s)<div class='talk-teaser'[^>]*>(.+?)</div>",
- webpage, 'description', fatal=False)
- talk_id = self._search_regex(
- [r"id='jc-(\d+)'", r"data-shareable-id='(\d+)'"],
- webpage, 'talk id', default=None) or display_id
- duration = None
- player = self._search_regex(
- r"class='vr-player jp-jplayer'([^>]+)>", webpage, 'player')
- formats = [{
- 'url': compat_urlparse.urljoin(url, talk_url),
- 'format_id': format_id,
- 'ext': determine_ext(talk_url) or format_id,
- 'vcodec': 'none',
- } for format_id, talk_url in re.findall(r"data-([^=]+)='([^']+)'", player)]
- self._sort_formats(formats)
-
- thumbnail = self._og_search_thumbnail(webpage)
- view_count = int_or_none(self._search_regex(
- r"class='play-count[^']*'>\s*(\d+) plays",
- webpage, 'play count', fatal=False))
-
- return {
- 'id': talk_id,
- 'display_id': display_id,
- 'title': title,
- 'description': description,
- 'thumbnail': thumbnail,
- 'duration': duration,
- 'view_count': view_count,
- 'formats': formats,
- }
+++ /dev/null
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
- ExtractorError,
- int_or_none,
- float_or_none,
- unescapeHTML,
-)
-
-
-class WistiaIE(InfoExtractor):
- _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
- _API_URL = 'http://fast.wistia.com/embed/medias/%s.json'
- _IFRAME_URL = 'http://fast.wistia.net/embed/iframe/%s'
-
- _TESTS = [{
- 'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
- 'md5': 'cafeb56ec0c53c18c97405eecb3133df',
- 'info_dict': {
- 'id': 'sh7fpupwlt',
- 'ext': 'mov',
- 'title': 'Being Resourceful',
- 'description': 'a Clients From Hell Video Series video from worldwidewebhosting',
- 'upload_date': '20131204',
- 'timestamp': 1386185018,
- 'duration': 117,
- },
- }, {
- 'url': 'wistia:sh7fpupwlt',
- 'only_matching': True,
- }, {
- # with hls video
- 'url': 'wistia:807fafadvk',
- 'only_matching': True,
- }, {
- 'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
- 'only_matching': True,
- }, {
- 'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
- 'only_matching': True,
- }]
-
- # https://wistia.com/support/embed-and-share/video-on-your-website
- @staticmethod
- def _extract_url(webpage):
- match = re.search(
- r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage)
- if match:
- return unescapeHTML(match.group('url'))
-
- match = re.search(
- r'''(?sx)
- <script[^>]+src=(["'])(?:https?:)?//fast\.wistia\.com/assets/external/E-v1\.js\1[^>]*>.*?
- <div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
- ''', webpage)
- if match:
- return 'wistia:%s' % match.group('id')
-
- match = re.search(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage)
- if match:
- return 'wistia:%s' % match.group('id')
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- data_json = self._download_json(
- self._API_URL % video_id, video_id,
- # Some videos require this.
- headers={
- 'Referer': url if url.startswith('http') else self._IFRAME_URL % video_id,
- })
-
- if data_json.get('error'):
- raise ExtractorError(
- 'Error while getting the playlist', expected=True)
-
- data = data_json['media']
- title = data['name']
-
- formats = []
- thumbnails = []
- for a in data['assets']:
- aurl = a.get('url')
- if not aurl:
- continue
- astatus = a.get('status')
- atype = a.get('type')
- if (astatus is not None and astatus != 2) or atype in ('preview', 'storyboard'):
- continue
- elif atype in ('still', 'still_image'):
- thumbnails.append({
- 'url': aurl,
- 'width': int_or_none(a.get('width')),
- 'height': int_or_none(a.get('height')),
- })
- else:
- aext = a.get('ext')
- is_m3u8 = a.get('container') == 'm3u8' or aext == 'm3u8'
- formats.append({
- 'format_id': atype,
- 'url': aurl,
- 'tbr': int_or_none(a.get('bitrate')),
- 'vbr': int_or_none(a.get('opt_vbitrate')),
- 'width': int_or_none(a.get('width')),
- 'height': int_or_none(a.get('height')),
- 'filesize': int_or_none(a.get('size')),
- 'vcodec': a.get('codec'),
- 'container': a.get('container'),
- 'ext': 'mp4' if is_m3u8 else aext,
- 'protocol': 'm3u8' if is_m3u8 else None,
- 'preference': 1 if atype == 'original' else None,
- })
-
- self._sort_formats(formats)
-
- return {
- 'id': video_id,
- 'title': title,
- 'description': data.get('seoDescription'),
- 'formats': formats,
- 'thumbnails': thumbnails,
- 'duration': float_or_none(data.get('duration')),
- 'timestamp': int_or_none(data.get('createdAt')),
- }
+++ /dev/null
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-
-
-class ZypeIE(InfoExtractor):
- _VALID_URL = r'https?://player\.zype\.com/embed/(?P<id>[\da-fA-F]+)\.js\?.*?api_key=[^&]+'
- _TEST = {
- 'url': 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ&autoplay=false&controls=true&da=false',
- 'md5': 'eaee31d474c76a955bdaba02a505c595',
- 'info_dict': {
- 'id': '5b400b834b32992a310622b9',
- 'ext': 'mp4',
- 'title': 'Smoky Barbecue Favorites',
- 'thumbnail': r're:^https?://.*\.jpe?g',
- },
- }
-
- @staticmethod
- def _extract_urls(webpage):
- return [
- mobj.group('url')
- for mobj in re.finditer(
- r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//player\.zype\.com/embed/[\da-fA-F]+\.js\?.*?api_key=.+?)\1',
- webpage)]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
-
- webpage = self._download_webpage(url, video_id)
-
- title = self._search_regex(
- r'video_title\s*[:=]\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
- 'title', group='value')
-
- m3u8_url = self._search_regex(
- r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1', webpage,
- 'm3u8 url', group='url')
-
- formats = self._extract_m3u8_formats(
- m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
- m3u8_id='hls')
- self._sort_formats(formats)
-
- thumbnail = self._search_regex(
- r'poster\s*[:=]\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'thumbnail',
- default=False, group='url')
-
- return {
- 'id': video_id,
- 'title': title,
- 'thumbnail': thumbnail,
- 'formats': formats,
- }
YoutubeDLCookieJar,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
+ YoutubeDLRedirectHandler,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
playlist items.
postprocessors: A list of dictionaries, each with an entry
* key: The name of the postprocessor. See
- youtube_dl/postprocessor/__init__.py for a list.
+ youtube_dlc/postprocessor/__init__.py for a list.
as well as any further keyword arguments for the
postprocessor.
progress_hooks: A list of functions that get called on download
about it, warn otherwise (default)
source_address: Client-side IP address to bind to.
call_home: Boolean, true iff we are allowed to contact the
- youtube-dl servers for debugging.
+ youtube-dlc servers for debugging.
sleep_interval: Number of seconds to sleep before each download when
used alone or a lower bound of a range for randomized
sleep before each download (minimum possible number
use downloader suggested by extractor if None.
The following parameters are not used by YoutubeDL itself, they are used by
- the downloader (see youtube_dl/downloader/common.py):
+ the downloader (see youtube_dlc/downloader/common.py):
nopart, updatetime, buffersize, ratelimit, min_filesize, max_filesize, test,
noresizebuffer, retries, continuedl, noprogress, consoletitle,
xattr_set_filesize, external_downloader_args, hls_use_mpegts,
if re.match(r'^-[0-9A-Za-z_-]{10}$', a)]
if idxs:
correct_argv = (
- ['youtube-dl']
+ ['youtube-dlc']
+ [a for i, a in enumerate(argv) if i not in idxs]
+ ['--'] + [argv[i] for i in idxs]
)
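A minimal sketch of the argument reordering that the hunk above performs, written as a standalone function. The name `suggest_argv` and the `prog` parameter are hypothetical and only for illustration; the regular expression and the list construction mirror the diff.

```python
import re


def suggest_argv(argv, prog='youtube-dlc'):
    """Return a corrected command line, or None if no argument looks like a dash-prefixed video ID."""
    # An 11-character argument starting with '-' is probably a video ID
    # that the option parser would otherwise treat as a flag.
    idxs = [i for i, a in enumerate(argv)
            if re.match(r'^-[0-9A-Za-z_-]{10}$', a)]
    if not idxs:
        return None
    return (
        [prog]
        + [a for i, a in enumerate(argv) if i not in idxs]
        + ['--'] + [argv[i] for i in idxs]
    )
```

Placing the suspicious arguments after `--` tells the option parser to treat them as positional values rather than flags.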
'playlist_title': ie_result.get('title'),
'playlist_uploader': ie_result.get('uploader'),
'playlist_uploader_id': ie_result.get('uploader_id'),
- 'playlist_index': i + playliststart,
+ 'playlist_index': playlistitems[i - 1] if playlistitems else i + playliststart,
'extractor': ie_result['extractor'],
'webpage_url': ie_result['webpage_url'],
'webpage_url_basename': url_basename(ie_result['webpage_url']),
self.report_error('Cannot write annotations file: ' + annofn)
return
+ def dl(name, info):
+ fd = get_suitable_downloader(info, self.params)(self, self.params)
+ for ph in self._progress_hooks:
+ fd.add_progress_hook(ph)
+ if self.params.get('verbose'):
+ self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
+ return fd.download(name, info)
+
subtitles_are_requested = any([self.params.get('writesubtitles', False),
self.params.get('writeautomaticsub')])
# subtitles download errors are already managed as troubles in relevant IE
# that way it will silently go on when used with unsupporting IE
subtitles = info_dict['requested_subtitles']
- ie = self.get_info_extractor(info_dict['extractor_key'])
for sub_lang, sub_info in subtitles.items():
sub_format = sub_info['ext']
sub_filename = subtitles_filename(filename, sub_lang, sub_format, info_dict.get('ext'))
if self.params.get('nooverwrites', False) and os.path.exists(encodeFilename(sub_filename)):
self.to_screen('[info] Video subtitle %s.%s is already present' % (sub_lang, sub_format))
else:
- self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
if sub_info.get('data') is not None:
try:
# Use newline='' to prevent conversion of newline characters
return
else:
try:
- sub_data = ie._request_webpage(
- sub_info['url'], info_dict['id'], note=False).read()
- with io.open(encodeFilename(sub_filename), 'wb') as subfile:
- subfile.write(sub_data)
- except (ExtractorError, IOError, OSError, ValueError) as err:
+ dl(sub_filename, sub_info)
+ except (ExtractorError, IOError, OSError, ValueError,
+ compat_urllib_error.URLError,
+ compat_http_client.HTTPException,
+ socket.error) as err:
self.report_warning('Unable to download subtitle for "%s": %s' %
(sub_lang, error_to_compat_str(err)))
continue
if not self.params.get('skip_download', False):
try:
- def dl(name, info):
- fd = get_suitable_downloader(info, self.params)(self, self.params)
- for ph in self._progress_hooks:
- fd.add_progress_hook(ph)
- if self.params.get('verbose'):
- self.to_stdout('[debug] Invoking downloader on %r' % info.get('url'))
- return fd.download(name, info)
-
if info_dict.get('requested_formats') is not None:
downloaded = []
success = True
self.get_encoding()))
write_string(encoding_str, encoding=None)
- self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
+ self._write_string('[debug] youtube-dlc version ' + __version__ + '\n')
if _LAZY_LOADER:
self._write_string('[debug] Lazy loading extractors enabled' + '\n')
try:
debuglevel = 1 if self.params.get('debug_printtraffic') else 0
https_handler = make_HTTPS_handler(self.params, debuglevel=debuglevel)
ydlh = YoutubeDLHandler(self.params, debuglevel=debuglevel)
+ redirect_handler = YoutubeDLRedirectHandler()
data_handler = compat_urllib_request_DataHandler()
# When passing our own FileHandler instance, build_opener won't add the
file_handler = compat_urllib_request.FileHandler()
def file_open(*args, **kwargs):
- raise compat_urllib_error.URLError('file:// scheme is explicitly disabled in youtube-dl for security reasons')
+ raise compat_urllib_error.URLError('file:// scheme is explicitly disabled in youtube-dlc for security reasons')
file_handler.file_open = file_open
opener = compat_urllib_request.build_opener(
- proxy_handler, https_handler, cookie_processor, ydlh, data_handler, file_handler)
+ proxy_handler, https_handler, cookie_processor, ydlh, redirect_handler, data_handler, file_handler)
# Delete the default user-agent header, which would otherwise apply in
# cases where our custom HTTP handler doesn't come into play
workaround_optparse_bug9161()
- setproctitle('youtube-dl')
+ setproctitle('youtube-dlc')
parser, opts, args = parseOpts(argv)
ydl.warn_if_short_id(sys.argv[1:] if argv is None else argv)
parser.error(
'You must provide at least one URL.\n'
- 'Type youtube-dl --help to see a list of all options.')
+ 'Type youtube-dlc --help to see a list of all options.')
try:
if opts.load_info_filename is not None:
from __future__ import unicode_literals
# Execute with
-# $ python youtube_dl/__main__.py (2.6+)
-# $ python -m youtube_dl (2.7+)
+# $ python youtube_dlc/__main__.py (2.6+)
+# $ python -m youtube_dlc (2.7+)
import sys
path = os.path.realpath(os.path.abspath(__file__))
sys.path.insert(0, os.path.dirname(os.path.dirname(path)))
-import youtube_dl
+import youtube_dlc
if __name__ == '__main__':
- youtube_dl.main()
+ youtube_dlc.main()
res = self._ydl.params.get('cachedir')
if res is None:
cache_root = compat_getenv('XDG_CACHE_HOME', '~/.cache')
- res = os.path.join(cache_root, 'youtube-dl')
+ res = os.path.join(cache_root, 'youtube-dlc')
return expand_path(res)
def _get_cache_fn(self, section, key, dtype):
except ImportError: # Python 2
import cookielib as compat_cookiejar
+if sys.version_info[0] == 2:
+ class compat_cookiejar_Cookie(compat_cookiejar.Cookie):
+ def __init__(self, version, name, value, *args, **kwargs):
+ if isinstance(name, compat_str):
+ name = name.encode()
+ if isinstance(value, compat_str):
+ value = value.encode()
+ compat_cookiejar.Cookie.__init__(self, version, name, value, *args, **kwargs)
+else:
+ compat_cookiejar_Cookie = compat_cookiejar.Cookie
+
try:
import http.cookies as compat_cookies
except ImportError: # Python 2
compat_expanduser = os.path.expanduser
+if compat_os_name == 'nt' and sys.version_info < (3, 8):
+ # os.path.realpath on Windows does not follow symbolic links
+ # prior to Python 3.8 (see https://bugs.python.org/issue9949)
+ def compat_realpath(path):
+ while os.path.islink(path):
+ path = os.path.abspath(os.readlink(path))
+ return path
+else:
+ compat_realpath = os.path.realpath
+
+
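The fallback above follows a chain of symbolic links by hand. The sketch below illustrates the same idea as a standalone helper; it is not the code in the diff. The name `resolve_links` is hypothetical, and two details are assumptions added here: relative link targets are resolved against the directory that contains the link, and a visited set guards against link loops.

```python
import os


def resolve_links(path):
    """Follow symbolic links on the final path component until a non-link is reached."""
    path = os.path.abspath(path)
    seen = set()
    while os.path.islink(path):
        if path in seen:
            raise OSError('symbolic link loop at %s' % path)
        seen.add(path)
        target = os.readlink(path)
        # os.path.join discards the first argument when target is absolute,
        # so both relative and absolute link targets are handled.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
    return path
```

Like the function in the diff, this only resolves links on the last path component; links in parent directories are left untouched.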
if sys.version_info < (3, 0):
def compat_print(s):
from .utils import preferredencoding
if platform.python_implementation() == 'PyPy' and sys.pypy_version_info < (5, 4, 0):
# PyPy2 prior to version 5.4.0 expects byte strings as Windows function
- # names, see the original PyPy issue [1] and the youtube-dl one [2].
+ # names, see the original PyPy issue [1] and the youtube-dlc one [2].
# 1. https://bitbucket.org/pypy/pypy/issues/2360/windows-ctypescdll-typeerror-function-name
# 2. https://github.com/ytdl-org/youtube-dl/pull/4392
def compat_ctypes_WINFUNCTYPE(*args, **kwargs):
'compat_basestring',
'compat_chr',
'compat_cookiejar',
+ 'compat_cookiejar_Cookie',
'compat_cookies',
'compat_ctypes_WINFUNCTYPE',
'compat_etree_Element',
'compat_os_name',
'compat_parse_qs',
'compat_print',
+ 'compat_realpath',
'compat_setenv',
'compat_shlex_quote',
'compat_shlex_split',
from .dash import DashSegmentsFD
from .rtsp import RtspFD
from .ism import IsmFD
+from .youtube_live_chat import YoutubeLiveChatReplayFD
from .external import (
get_external_downloader,
FFmpegFD,
'f4m': F4mFD,
'http_dash_segments': DashSegmentsFD,
'ism': IsmFD,
+ 'youtube_live_chat_replay': YoutubeLiveChatReplayFD,
}
else:
clear_line = ('\r\x1b[K' if sys.stderr.isatty() else '\r')
self.to_screen(clear_line + fullmsg, skip_eol=not is_last_line)
- self.to_console_title('youtube-dl ' + msg)
+ self.to_console_title('youtube-dlc ' + msg)
def report_progress(self, s):
if s['status'] == 'finished':
keep_fragments: Keep downloaded fragments on disk after downloading is
finished
- For each incomplete fragment download youtube-dl keeps on disk a special
+ For each incomplete fragment download youtube-dlc keeps on disk a special
bookkeeping file with download state and metadata (in future such files will
- be used for any incomplete download handled by youtube-dl). This file is
+ be used for any incomplete download handled by youtube-dlc). This file is
used to properly handle resuming, check download file consistency and detect
potential errors. The file has a .ytdl extension and represents a standard
JSON file of the following format:
while True:
try:
# Download and write
- data_block = ctx.data.read(block_size if not is_test else min(block_size, data_len - byte_counter))
+ data_block = ctx.data.read(block_size if data_len is None else min(block_size, data_len - byte_counter))
# socket.timeout is a subclass of socket.error but may not have
# errno set
except socket.timeout as e:
'elapsed': now - ctx.start_time,
})
- if is_test and byte_counter == data_len:
+ if data_len is not None and byte_counter == data_len:
break
if not is_test and ctx.chunk_size and ctx.data_len is not None and byte_counter < ctx.data_len:
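The two changed lines above make the read loop stop at `data_len` whenever a length is known, not only in test mode. A self-contained sketch of that loop shape, under the assumption that the stream is any file-like object with `read()`; the function name `read_capped` is hypothetical.

```python
def read_capped(stream, block_size, data_len=None):
    """Read from stream in blocks, never reading past data_len when it is known."""
    byte_counter = 0
    chunks = []
    while True:
        # Cap the request so the final read does not overshoot the known length.
        to_read = block_size if data_len is None else min(block_size, data_len - byte_counter)
        block = stream.read(to_read)
        byte_counter += len(block)
        if not block:
            # End of stream reached before (or without) a known length.
            break
        chunks.append(block)
        if data_len is not None and byte_counter == data_len:
            break
    return b''.join(chunks)
```

For example, `read_capped(io.BytesIO(b'abcdefghij'), 4, data_len=6)` stops after six bytes, while passing `data_len=None` reads until the stream is exhausted.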
--- /dev/null
+from __future__ import division, unicode_literals
+
+import re
+import json
+
+from .fragment import FragmentFD
+
+
+class YoutubeLiveChatReplayFD(FragmentFD):
+ """ Downloads YouTube live chat replays fragment by fragment """
+
+ FD_NAME = 'youtube_live_chat_replay'
+
+ def real_download(self, filename, info_dict):
+ video_id = info_dict['video_id']
+ self.to_screen('[%s] Downloading live chat' % self.FD_NAME)
+
+ test = self.params.get('test', False)
+
+ ctx = {
+ 'filename': filename,
+ 'live': True,
+ 'total_frags': None,
+ }
+
+ def dl_fragment(url):
+ headers = info_dict.get('http_headers', {})
+ return self._download_fragment(ctx, url, info_dict, headers)
+
+ def parse_yt_initial_data(data):
+ window_patt = b'window\\["ytInitialData"\\]\\s*=\\s*(.*?)(?<=});'
+ var_patt = b'var\\s+ytInitialData\\s*=\\s*(.*?)(?<=});'
+ for patt in window_patt, var_patt:
+ try:
+ raw_json = re.search(patt, data).group(1)
+ return json.loads(raw_json)
+ except AttributeError:
+ continue
+
+ self._prepare_and_start_frag_download(ctx)
+
+ success, raw_fragment = dl_fragment(
+ 'https://www.youtube.com/watch?v={}'.format(video_id))
+ if not success:
+ return False
+ data = parse_yt_initial_data(raw_fragment)
+ continuation_id = data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']
+ # no data yet but required to call _append_fragment
+ self._append_fragment(ctx, b'')
+
+ first = True
+ offset = None
+ while continuation_id is not None:
+ data = None
+ if first:
+ url = 'https://www.youtube.com/live_chat_replay?continuation={}'.format(continuation_id)
+ success, raw_fragment = dl_fragment(url)
+ if not success:
+ return False
+ data = parse_yt_initial_data(raw_fragment)
+ else:
+ url = ('https://www.youtube.com/live_chat_replay/get_live_chat_replay'
+ + '?continuation={}'.format(continuation_id)
+ + '&playerOffsetMs={}'.format(max(offset - 5000, 0))
+ + '&hidden=false'
+ + '&pbj=1')
+ success, raw_fragment = dl_fragment(url)
+ if not success:
+ return False
+ data = json.loads(raw_fragment)['response']
+
+ first = False
+ continuation_id = None
+
+ live_chat_continuation = data['continuationContents']['liveChatContinuation']
+ offset = None
+ processed_fragment = bytearray()
+ if 'actions' in live_chat_continuation:
+ for action in live_chat_continuation['actions']:
+ if 'replayChatItemAction' in action:
+ replay_chat_item_action = action['replayChatItemAction']
+ offset = int(replay_chat_item_action['videoOffsetTimeMsec'])
+ processed_fragment.extend(
+ json.dumps(action, ensure_ascii=False).encode('utf-8') + b'\n')
+ continuation_id = live_chat_continuation['continuations'][0]['liveChatReplayContinuationData']['continuation']
+
+ self._append_fragment(ctx, processed_fragment)
+
+ if test or offset is None:
+ break
+
+ self._finish_frag_download(ctx)
+
+ return True
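A minimal sketch of the `ytInitialData` extraction used by the downloader above, assuming Python 3 and a page body supplied as bytes. The two patterns are taken from the diff; the function name `parse_initial_data` and the sample inputs are hypothetical. Unlike the helper in the diff, this version tests the match object explicitly instead of catching `AttributeError`.

```python
import json
import re

# Both assignment styles seen in the diff: window["ytInitialData"] = {...};
# and var ytInitialData = {...};
_PATTERNS = (
    rb'window\["ytInitialData"\]\s*=\s*(.*?)(?<=});',
    rb'var\s+ytInitialData\s*=\s*(.*?)(?<=});',
)


def parse_initial_data(page):
    """Return the decoded ytInitialData object, or None if neither pattern matches."""
    for patt in _PATTERNS:
        m = re.search(patt, page)
        if m:
            return json.loads(m.group(1).decode('utf-8'))
    return None
```

The lookbehind `(?<=})` makes the lazy group end at the first `;` that directly follows a closing brace, which is how the patterns find the end of the JSON literal.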
js_to_json,
int_or_none,
parse_iso8601,
+ str_or_none,
try_get,
unescapeHTML,
update_url_query,
class ABCIE(InfoExtractor):
IE_NAME = 'abc.net.au'
- _VALID_URL = r'https?://(?:www\.)?abc\.net\.au/news/(?:[^/]+/){1,2}(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:www\.)?abc\.net\.au/(?:news|btn)/(?:[^/]+/){1,4}(?P<id>\d{5,})'
_TESTS = [{
'url': 'http://www.abc.net.au/news/2014-11-05/australia-to-staff-ebola-treatment-centre-in-sierra-leone/5868334',
'skip': 'this video has expired',
}, {
'url': 'http://www.abc.net.au/news/2015-08-17/warren-entsch-introduces-same-sex-marriage-bill/6702326',
- 'md5': 'db2a5369238b51f9811ad815b69dc086',
+ 'md5': '4ebd61bdc82d9a8b722f64f1f4b4d121',
'info_dict': {
'id': 'NvqvPeNZsHU',
'ext': 'mp4',
}, {
'url': 'http://www.abc.net.au/news/2015-10-19/6866214',
'only_matching': True,
+ }, {
+ 'url': 'https://www.abc.net.au/btn/classroom/wwi-centenary/10527914',
+ 'info_dict': {
+ 'id': '10527914',
+ 'ext': 'mp4',
+ 'title': 'WWI Centenary',
+ 'description': 'md5:c2379ec0ca84072e86b446e536954546',
+ }
+ }, {
+ 'url': 'https://www.abc.net.au/news/programs/the-world/2020-06-10/black-lives-matter-protests-spawn-support-for/12342074',
+ 'info_dict': {
+ 'id': '12342074',
+ 'ext': 'mp4',
+ 'title': 'Black Lives Matter protests spawn support for Papuans in Indonesia',
+ 'description': 'md5:2961a17dc53abc558589ccd0fb8edd6f',
+ }
+ }, {
+ 'url': 'https://www.abc.net.au/btn/newsbreak/btn-newsbreak-20200814/12560476',
+ 'info_dict': {
+ 'id': 'tDL8Ld4dK_8',
+ 'ext': 'mp4',
+ 'title': 'Fortnite Banned From Apple and Google App Stores',
+ 'description': 'md5:a6df3f36ce8f816b74af4bd6462f5651',
+ 'upload_date': '20200813',
+ 'uploader': 'Behind the News',
+ 'uploader_id': 'behindthenews',
+ }
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
- mobj = re.search(
- r'inline(?P<type>Video|Audio|YouTube)Data\.push\((?P<json_data>[^)]+)\);',
- webpage)
+ mobj = re.search(r'<a\s+href="(?P<url>[^"]+)"\s+data-duration="\d+"\s+title="Download audio directly">', webpage)
+ if mobj:
+ urls_info = mobj.groupdict()
+ youtube = False
+ video = False
+ else:
+ mobj = re.search(r'<a href="(?P<url>http://www\.youtube\.com/watch\?v=[^"]+)"><span><strong>External Link:</strong>',
+ webpage)
+ if mobj is None:
+ mobj = re.search(r'<iframe width="100%" src="(?P<url>//www\.youtube-nocookie\.com/embed/[^?"]+)', webpage)
+ if mobj:
+ urls_info = mobj.groupdict()
+ youtube = True
+ video = True
+
if mobj is None:
- expired = self._html_search_regex(r'(?s)class="expired-(?:video|audio)".+?<span>(.+?)</span>', webpage, 'expired', None)
- if expired:
- raise ExtractorError('%s said: %s' % (self.IE_NAME, expired), expected=True)
- raise ExtractorError('Unable to extract video urls')
+ mobj = re.search(r'(?P<type>)"sources": (?P<json_data>\[[^\]]+\]),', webpage)
+ if mobj is None:
+ mobj = re.search(
+ r'inline(?P<type>Video|Audio|YouTube)Data\.push\((?P<json_data>[^)]+)\);',
+ webpage)
+ if mobj is None:
+ expired = self._html_search_regex(r'(?s)class="expired-(?:video|audio)".+?<span>(.+?)</span>', webpage, 'expired', None)
+ if expired:
+ raise ExtractorError('%s said: %s' % (self.IE_NAME, expired), expected=True)
+ raise ExtractorError('Unable to extract video urls')
- urls_info = self._parse_json(
- mobj.group('json_data'), video_id, transform_source=js_to_json)
+ urls_info = self._parse_json(
+ mobj.group('json_data'), video_id, transform_source=js_to_json)
+ youtube = mobj.group('type') == 'YouTube'
+ video = mobj.group('type') == 'Video' or urls_info[0]['contentType'] == 'video/mp4'
if not isinstance(urls_info, list):
urls_info = [urls_info]
- if mobj.group('type') == 'YouTube':
+ if youtube:
return self.playlist_result([
self.url_result(url_info['url']) for url_info in urls_info])
- formats = [{
- 'url': url_info['url'],
- 'vcodec': url_info.get('codec') if mobj.group('type') == 'Video' else 'none',
- 'width': int_or_none(url_info.get('width')),
- 'height': int_or_none(url_info.get('height')),
- 'tbr': int_or_none(url_info.get('bitrate')),
- 'filesize': int_or_none(url_info.get('filesize')),
- } for url_info in urls_info]
+ formats = []
+ for url_info in urls_info:
+ height = int_or_none(url_info.get('height'))
+ bitrate = int_or_none(url_info.get('bitrate'))
+ width = int_or_none(url_info.get('width'))
+ format_id = None
+ mobj = re.search(r'_(?:(?P<height>\d+)|(?P<bitrate>\d+)k)\.mp4$', url_info['url'])
+ if mobj:
+ height_from_url = mobj.group('height')
+ if height_from_url:
+ height = height or int_or_none(height_from_url)
+ width = width or int_or_none(url_info.get('label'))
+ else:
+ bitrate = bitrate or int_or_none(mobj.group('bitrate'))
+ format_id = str_or_none(url_info.get('label'))
+ formats.append({
+ 'url': url_info['url'],
+ 'vcodec': url_info.get('codec') if video else 'none',
+ 'width': width,
+ 'height': height,
+ 'tbr': bitrate,
+ 'filesize': int_or_none(url_info.get('filesize')),
+ 'format_id': format_id
+ })
self._sort_formats(formats)
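The new format loop above falls back to the file name when the JSON metadata lacks a height or bitrate. A small sketch of that fallback in isolation; the regular expression is the one in the diff, while the function name `hints_from_url` and the example URLs are hypothetical.

```python
import re

# Matches a trailing "_720.mp4" (height) or "_1000k.mp4" (bitrate in kbit/s).
_SUFFIX_RE = re.compile(r'_(?:(?P<height>\d+)|(?P<bitrate>\d+)k)\.mp4$')


def hints_from_url(url):
    """Return {'height': N}, {'tbr': N}, or {} depending on the URL suffix."""
    m = _SUFFIX_RE.search(url)
    if not m:
        return {}
    if m.group('height'):
        return {'height': int(m.group('height'))}
    return {'tbr': int(m.group('bitrate'))}
```

In the extractor these values are only used when the corresponding JSON field is missing, so metadata from the site takes precedence over the file name.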
# ABC iview programs are normally available for 14 days only.
_TESTS = [{
- 'url': 'https://iview.abc.net.au/show/ben-and-hollys-little-kingdom/series/0/video/ZX9371A050S00',
- 'md5': 'cde42d728b3b7c2b32b1b94b4a548afc',
+ 'url': 'https://iview.abc.net.au/show/gruen/series/11/video/LE1927H001S00',
+ 'md5': '67715ce3c78426b11ba167d875ac6abf',
'info_dict': {
- 'id': 'ZX9371A050S00',
+ 'id': 'LE1927H001S00',
'ext': 'mp4',
- 'title': "Gaston's Birthday",
- 'series': "Ben And Holly's Little Kingdom",
- 'description': 'md5:f9de914d02f226968f598ac76f105bcf',
- 'upload_date': '20180604',
- 'uploader_id': 'abc4kids',
- 'timestamp': 1528140219,
+ 'title': "Series 11 Ep 1",
+ 'series': "Gruen",
+ 'description': 'md5:52cc744ad35045baf6aded2ce7287f67',
+ 'upload_date': '20190925',
+ 'uploader_id': 'abc1',
+ 'timestamp': 1569445289,
},
'params': {
'skip_download': True,
'hdnea': token,
})
- for sd in ('sd', 'sd-low'):
+ for sd in ('720', 'sd', 'sd-low'):
sd_url = try_get(
stream, lambda x: x['streams']['hls'][sd], compat_str)
if not sd_url:
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+from ..compat import (
+ compat_urlparse,
+)
+
+from ..utils import (
+ urlencode_postdata,
+ urljoin,
+ int_or_none,
+ clean_html,
+ ExtractorError
+)
+
+
+class AluraIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:cursos\.)?alura\.com\.br/course/(?P<course_name>[^/]+)/task/(?P<id>\d+)'
+ _LOGIN_URL = 'https://cursos.alura.com.br/loginForm?urlAfterLogin=/loginForm'
+ _VIDEO_URL = 'https://cursos.alura.com.br/course/%s/task/%s/video'
+ _NETRC_MACHINE = 'alura'
+ _TESTS = [{
+ 'url': 'https://cursos.alura.com.br/course/clojure-mutabilidade-com-atoms-e-refs/task/60095',
+ 'info_dict': {
+ 'id': '60095',
+ 'ext': 'mp4',
+ 'title': 'Referências, ref-set e alter'
+ },
+ 'skip': 'Requires alura account credentials'},
+ {
+ # URL without video
+ 'url': 'https://cursos.alura.com.br/course/clojure-mutabilidade-com-atoms-e-refs/task/60098',
+ 'only_matching': True},
+ {
+ 'url': 'https://cursos.alura.com.br/course/fundamentos-market-digital/task/55219',
+ 'only_matching': True}
+ ]
+
+ def _real_extract(self, url):
+
+ video_id = self._match_id(url)
+ course = self._search_regex(self._VALID_URL, url, 'post url', group='course_name')
+ video_url = self._VIDEO_URL % (course, video_id)
+
+ video_dict = self._download_json(video_url, video_id, 'Searching for videos')
+
+ if video_dict:
+ webpage = self._download_webpage(url, video_id)
+ video_title = clean_html(self._search_regex(
+ r'<span[^>]+class=(["\'])task-body-header-title-text\1[^>]*>(?P<title>[^<]+)',
+ webpage, 'title', group='title'))
+
+ formats = []
+ for video_obj in video_dict:
+ video_url_m3u8 = video_obj.get('link')
+ video_format = self._extract_m3u8_formats(
+ video_url_m3u8, None, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False)
+ for f in video_format:
+ m = re.search(r'^[\w \W]*-(?P<res>\w*)\.mp4[\W \w]*', f['url'])
+ if m:
+ if not f.get('height'):
+ f['height'] = int('720' if m.group('res') == 'hd' else '480')
+ formats.extend(video_format)
+
+ self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
+
+ return {
+ 'id': video_id,
+ 'title': video_title,
+ "formats": formats
+ }
+
+ def _real_initialize(self):
+ self._login()
+
+ def _login(self):
+ username, password = self._get_login_info()
+ if username is None:
+ return
+
+ login_page = self._download_webpage(
+ self._LOGIN_URL, None, 'Downloading login popup')
+
+ def is_logged(webpage):
+ return any(re.search(p, webpage) for p in (
+ r'href=["\']?/signout["\']',
+ r'>Logout<'))
+
+ # already logged in
+ if is_logged(login_page):
+ return
+
+ login_form = self._hidden_inputs(login_page)
+
+ login_form.update({
+ 'username': username,
+ 'password': password,
+ })
+
+ post_url = self._search_regex(
+ r'<form[^>]+class=["\']signin-form["\'] action=["\'](?P<url>.+?)["\']', login_page,
+ 'post url', default=self._LOGIN_URL, group='url')
+
+ if not post_url.startswith('http'):
+ post_url = compat_urlparse.urljoin(self._LOGIN_URL, post_url)
+
+ response = self._download_webpage(
+ post_url, None, 'Logging in',
+ data=urlencode_postdata(login_form),
+ headers={'Content-Type': 'application/x-www-form-urlencoded'})
+
+ if not is_logged(response):
+ error = self._html_search_regex(
+ r'(?s)<p[^>]+class="alert-message[^"]*">(.+?)</p>',
+ response, 'error message', default=None)
+ if error:
+ raise ExtractorError('Unable to login: %s' % error, expected=True)
+ raise ExtractorError('Unable to log in')
+
+
+class AluraCourseIE(AluraIE):
+
+ _VALID_URL = r'https?://(?:cursos\.)?alura\.com\.br/course/(?P<id>[^/]+)'
+ _LOGIN_URL = 'https://cursos.alura.com.br/loginForm?urlAfterLogin=/loginForm'
+ _NETRC_MACHINE = 'aluracourse'
+ _TESTS = [{
+ 'url': 'https://cursos.alura.com.br/course/clojure-mutabilidade-com-atoms-e-refs',
+ 'only_matching': True,
+ }]
+
+ @classmethod
+ def suitable(cls, url):
+ return False if AluraIE.suitable(url) else super(AluraCourseIE, cls).suitable(url)
+
+ def _real_extract(self, url):
+
+ course_path = self._match_id(url)
+ webpage = self._download_webpage(url, course_path)
+
+ course_title = self._search_regex(
+ r'<h1.*?>(.*?)<strong>(?P<course_title>.*?)</strong></h[0-9]>', webpage,
+ 'course title', default=course_path, group='course_title')
+
+ entries = []
+ if webpage:
+ for path in re.findall(r'<a\b(?=[^>]* class="[^"]*(?<=[" ])courseSectionList-section[" ])(?=[^>]* href="([^"]*))', webpage):
+ page_url = urljoin(url, path)
+ section_path = self._download_webpage(page_url, course_path)
+ for path_video in re.findall(r'<a\b(?=[^>]* class="[^"]*(?<=[" ])task-menu-nav-item-link-VIDEO[" ])(?=[^>]* href="([^"]*))', section_path):
+ chapter = clean_html(
+ self._search_regex(
+ r'<h3[^>]+class=(["\'])task-menu-section-title-text\1[^>]*>(?P<chapter>[^<]+)',
+ section_path,
+ 'chapter',
+ group='chapter'))
+
+ chapter_number = int_or_none(
+ self._search_regex(
+ r'<span[^>]+class=(["\'])task-menu-section-title-number[^>]*>(.*?)<strong>(?P<chapter_number>[^<]+)</strong>',
+ section_path,
+ 'chapter number',
+ group='chapter_number'))
+ video_url = urljoin(url, path_video)
+
+ entry = {
+ '_type': 'url_transparent',
+ 'id': self._match_id(video_url),
+ 'url': video_url,
+ 'ie_key': AluraIE.ie_key(),
+ 'chapter': chapter,
+ 'chapter_number': chapter_number
+ }
+ entries.append(entry)
+ return self.playlist_result(entries, course_path, course_title)
from ..utils import (
clean_html,
int_or_none,
+ js_to_json,
try_get,
unified_strdate,
)
class AmericasTestKitchenIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?americastestkitchen\.com/(?:episode|videos)/(?P<id>\d+)'
_TESTS = [{
- 'url': 'https://www.americastestkitchen.com/episode/548-summer-dinner-party',
+ 'url': 'https://www.americastestkitchen.com/episode/582-weeknight-japanese-suppers',
'md5': 'b861c3e365ac38ad319cfd509c30577f',
'info_dict': {
- 'id': '1_5g5zua6e',
- 'title': 'Summer Dinner Party',
+ 'id': '5b400b9ee338f922cb06450c',
+ 'title': 'Weeknight Japanese Suppers',
'ext': 'mp4',
- 'description': 'md5:858d986e73a4826979b6a5d9f8f6a1ec',
- 'thumbnail': r're:^https?://.*\.jpg',
- 'timestamp': 1497285541,
- 'upload_date': '20170612',
- 'uploader_id': 'roger.metcalf@americastestkitchen.com',
- 'release_date': '20170617',
+ 'description': 'md5:3d0c1a44bb3b27607ce82652db25b4a8',
+ 'thumbnail': r're:^https?://',
+ 'timestamp': 1523664000,
+ 'upload_date': '20180414',
+ 'release_date': '20180414',
'series': "America's Test Kitchen",
- 'season_number': 17,
- 'episode': 'Summer Dinner Party',
- 'episode_number': 24,
+ 'season_number': 18,
+ 'episode': 'Weeknight Japanese Suppers',
+ 'episode_number': 15,
},
'params': {
'skip_download': True,
self._search_regex(
r'window\.__INITIAL_STATE__\s*=\s*({.+?})\s*;\s*</script>',
webpage, 'initial context'),
- video_id)
+ video_id, js_to_json)
ep_data = try_get(
video_data,
lambda x: x['videoDetail']['content']['data']), dict)
ep_meta = ep_data.get('full_video', {})
- zype_id = ep_meta.get('zype_id')
- if zype_id:
- embed_url = 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id
- ie_key = 'Zype'
- else:
- partner_id = self._search_regex(
- r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
- webpage, 'kaltura partner id')
- external_id = ep_data.get('external_id') or ep_meta['external_id']
- embed_url = 'kaltura:%s:%s' % (partner_id, external_id)
- ie_key = 'Kaltura'
+ zype_id = ep_data.get('zype_id') or ep_meta['zype_id']
title = ep_data.get('title') or ep_meta.get('title')
description = clean_html(ep_meta.get('episode_description') or ep_data.get(
return {
'_type': 'url_transparent',
- 'url': embed_url,
- 'ie_key': ie_key,
+ 'url': 'https://player.zype.com/embed/%s.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ' % zype_id,
+ 'ie_key': 'Zype',
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
'uploader_id': uploader_id,
'http_headers': {
- 'User-Agent': 'QuickTime compatible (youtube-dl)',
+ 'User-Agent': 'QuickTime compatible (youtube-dlc)',
},
})
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from .generic import GenericIE
+from ..utils import (
+ determine_ext,
+ ExtractorError,
+ int_or_none,
+ parse_duration,
+ qualities,
+ str_or_none,
+ try_get,
+ unified_strdate,
+ unified_timestamp,
+ update_url_query,
+ url_or_none,
+ xpath_text,
+)
+from ..compat import compat_etree_fromstring
+
+
+class ARDMediathekBaseIE(InfoExtractor):
+ _GEO_COUNTRIES = ['DE']
+
+ def _extract_media_info(self, media_info_url, webpage, video_id):
+ media_info = self._download_json(
+ media_info_url, video_id, 'Downloading media JSON')
+ return self._parse_media_info(media_info, video_id, '"fsk"' in webpage)
+
+ def _parse_media_info(self, media_info, video_id, fsk):
+ formats = self._extract_formats(media_info, video_id)
+
+ if not formats:
+ if fsk:
+ raise ExtractorError(
+ 'This video is only available after 20:00', expected=True)
+ elif media_info.get('_geoblocked'):
+ self.raise_geo_restricted(
+ 'This video is not available due to geoblocking',
+ countries=self._GEO_COUNTRIES)
+
+ self._sort_formats(formats)
+
+ subtitles = {}
+ subtitle_url = media_info.get('_subtitleUrl')
+ if subtitle_url:
+ subtitles['de'] = [{
+ 'ext': 'ttml',
+ 'url': subtitle_url,
+ }]
+
+ return {
+ 'id': video_id,
+ 'duration': int_or_none(media_info.get('_duration')),
+ 'thumbnail': media_info.get('_previewImage'),
+ 'is_live': media_info.get('_isLive') is True,
+ 'formats': formats,
+ 'subtitles': subtitles,
+ }
+
+ def _ARD_extract_episode_info(self, title):
+ """Try to extract season/episode data from the title."""
+ res = {}
+ if not title:
+ return res
+
+ for pattern in [
+ # Pattern for title like "Homo sapiens (S06/E07) - Originalversion"
+ # from: https://www.ardmediathek.de/one/sendung/doctor-who/Y3JpZDovL3dkci5kZS9vbmUvZG9jdG9yIHdobw
+ r'.*(?P<ep_info> \(S(?P<season_number>\d+)/E(?P<episode_number>\d+)\)).*',
+ # E.g.: title="Fritjof aus Norwegen (2) (AD)"
+ # from: https://www.ardmediathek.de/ard/sammlung/der-krieg-und-ich/68cMkqJdllm639Skj4c7sS/
+ r'.*(?P<ep_info> \((?:Folge |Teil )?(?P<episode_number>\d+)(?:/\d+)?\)).*',
+ r'.*(?P<ep_info>Folge (?P<episode_number>\d+)(?:\:| -|) )\"(?P<episode>.+)\".*',
+ # E.g.: title="Folge 25/42: Symmetrie"
+ # from: https://www.ardmediathek.de/ard/video/grips-mathe/folge-25-42-symmetrie/ard-alpha/Y3JpZDovL2JyLmRlL3ZpZGVvLzMyYzI0ZjczLWQ1N2MtNDAxNC05ZmZhLTFjYzRkZDA5NDU5OQ/
+ # E.g.: title="Folge 1063 - Vertrauen"
+ # from: https://www.ardmediathek.de/ard/sendung/die-fallers/Y3JpZDovL3N3ci5kZS8yMzAyMDQ4/
+ r'.*(?P<ep_info>Folge (?P<episode_number>\d+)(?:/\d+)?(?:\:| -|) ).*',
+ ]:
+ m = re.match(pattern, title)
+ if m:
+ groupdict = m.groupdict()
+ res['season_number'] = int_or_none(groupdict.get('season_number'))
+ res['episode_number'] = int_or_none(groupdict.get('episode_number'))
+ res['episode'] = str_or_none(groupdict.get('episode'))
+ # Build the episode title by removing numeric episode information:
+ if groupdict.get('ep_info') and not res['episode']:
+ res['episode'] = str_or_none(
+ title.replace(groupdict.get('ep_info'), ''))
+ if res['episode']:
+ res['episode'] = res['episode'].strip()
+ break
+
+ # As a fallback use the whole title as the episode name:
+ if not res.get('episode'):
+ res['episode'] = title.strip()
+ return res
+
+ def _extract_formats(self, media_info, video_id):
+ type_ = media_info.get('_type')
+ media_array = media_info.get('_mediaArray', [])
+ formats = []
+ for num, media in enumerate(media_array):
+ for stream in media.get('_mediaStreamArray', []):
+ stream_urls = stream.get('_stream')
+ if not stream_urls:
+ continue
+ if not isinstance(stream_urls, list):
+ stream_urls = [stream_urls]
+ quality = stream.get('_quality')
+ server = stream.get('_server')
+ for stream_url in stream_urls:
+ if not url_or_none(stream_url):
+ continue
+ ext = determine_ext(stream_url)
+ if quality != 'auto' and ext in ('f4m', 'm3u8'):
+ continue
+ if ext == 'f4m':
+ formats.extend(self._extract_f4m_formats(
+ update_url_query(stream_url, {
+ 'hdcore': '3.1.1',
+ 'plugin': 'aasp-3.1.1.69.124'
+ }), video_id, f4m_id='hds', fatal=False))
+ elif ext == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ stream_url, video_id, 'mp4', 'm3u8_native',
+ m3u8_id='hls', fatal=False))
+ else:
+ if server and server.startswith('rtmp'):
+ f = {
+ 'url': server,
+ 'play_path': stream_url,
+ 'format_id': 'a%s-rtmp-%s' % (num, quality),
+ }
+ else:
+ f = {
+ 'url': stream_url,
+ 'format_id': 'a%s-%s-%s' % (num, ext, quality)
+ }
+ m = re.search(
+ r'_(?P<width>\d+)x(?P<height>\d+)\.mp4$',
+ stream_url)
+ if m:
+ f.update({
+ 'width': int(m.group('width')),
+ 'height': int(m.group('height')),
+ })
+ if type_ == 'audio':
+ f['vcodec'] = 'none'
+ formats.append(f)
+ return formats
+
+
+class ARDMediathekIE(ARDMediathekBaseIE):
+ IE_NAME = 'ARD:mediathek'
+ _VALID_URL = r'^https?://(?:(?:(?:www|classic)\.)?ardmediathek\.de|mediathek\.(?:daserste|rbb-online)\.de|one\.ard\.de)/(?:.*/)(?P<video_id>[0-9]+|[^0-9][^/\?]+)[^/\?]*(?:\?.*)?'
+
+ _TESTS = [{
+ # available till 26.07.2022
+ 'url': 'http://www.ardmediathek.de/tv/S%C3%9CDLICHT/Was-ist-die-Kunst-der-Zukunft-liebe-Ann/BR-Fernsehen/Video?bcastId=34633636&documentId=44726822',
+ 'info_dict': {
+ 'id': '44726822',
+ 'ext': 'mp4',
+ 'title': 'Was ist die Kunst der Zukunft, liebe Anna McCarthy?',
+ 'description': 'md5:4ada28b3e3b5df01647310e41f3a62f5',
+ 'duration': 1740,
+ },
+ 'params': {
+ # m3u8 download
+ 'skip_download': True,
+ }
+ }, {
+ 'url': 'https://one.ard.de/tv/Mord-mit-Aussicht/Mord-mit-Aussicht-6-39-T%C3%B6dliche-Nach/ONE/Video?bcastId=46384294&documentId=55586872',
+ 'only_matching': True,
+ }, {
+ # audio
+ 'url': 'http://www.ardmediathek.de/tv/WDR-H%C3%B6rspiel-Speicher/Tod-eines-Fu%C3%9Fballers/WDR-3/Audio-Podcast?documentId=28488308&bcastId=23074086',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://mediathek.daserste.de/sendungen_a-z/328454_anne-will/22429276_vertrauen-ist-gut-spionieren-ist-besser-geht',
+ 'only_matching': True,
+ }, {
+ # audio
+ 'url': 'http://mediathek.rbb-online.de/radio/Hörspiel/Vor-dem-Fest/kulturradio/Audio?documentId=30796318&topRessort=radio&bcastId=9839158',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://classic.ardmediathek.de/tv/Panda-Gorilla-Co/Panda-Gorilla-Co-Folge-274/Das-Erste/Video?bcastId=16355486&documentId=58234698',
+ 'only_matching': True,
+ }]
+
+ @classmethod
+ def suitable(cls, url):
+ return False if ARDBetaMediathekIE.suitable(url) else super(ARDMediathekIE, cls).suitable(url)
+
+ def _real_extract(self, url):
+ # determine video id from url
+ m = re.match(self._VALID_URL, url)
+
+ document_id = None
+
+ numid = re.search(r'documentId=([0-9]+)', url)
+ if numid:
+ document_id = video_id = numid.group(1)
+ else:
+ video_id = m.group('video_id')
+
+ webpage = self._download_webpage(url, video_id)
+
+ ERRORS = (
+ ('>Leider liegt eine Störung vor.', 'Video %s is unavailable'),
+ ('>Der gewünschte Beitrag ist nicht mehr verfügbar.<',
+ 'Video %s is no longer available'),
+ )
+
+ for pattern, message in ERRORS:
+ if pattern in webpage:
+ raise ExtractorError(message % video_id, expected=True)
+
+ if re.search(r'[\?&]rss($|[=&])', url):
+ doc = compat_etree_fromstring(webpage.encode('utf-8'))
+ if doc.tag == 'rss':
+ return GenericIE()._extract_rss(url, video_id, doc)
+
+ title = self._html_search_regex(
+ [r'<h1(?:\s+class="boxTopHeadline")?>(.*?)</h1>',
+ r'<meta name="dcterms\.title" content="(.*?)"/>',
+ r'<h4 class="headline">(.*?)</h4>',
+ r'<title[^>]*>(.*?)</title>'],
+ webpage, 'title')
+ description = self._html_search_meta(
+ 'dcterms.abstract', webpage, 'description', default=None)
+ if description is None:
+ description = self._html_search_meta(
+ 'description', webpage, 'meta description', default=None)
+ if description is None:
+ description = self._html_search_regex(
+ r'<p\s+class="teasertext">(.+?)</p>',
+ webpage, 'teaser text', default=None)
+
+ # Thumbnail is sometimes not present.
+ # It is in the mobile version, but that seems to use a different URL
+ # structure altogether.
+ thumbnail = self._og_search_thumbnail(webpage, default=None)
+
+ media_streams = re.findall(r'''(?x)
+ mediaCollection\.addMediaStream\([0-9]+,\s*[0-9]+,\s*"[^"]*",\s*
+ "([^"]+)"''', webpage)
+
+ if media_streams:
+ QUALITIES = qualities(['lo', 'hi', 'hq'])
+ formats = []
+ for furl in set(media_streams):
+ if furl.endswith('.f4m'):
+ fid = 'f4m'
+ else:
+ fid_m = re.match(r'.*\.([^.]+)\.[^.]+$', furl)
+ fid = fid_m.group(1) if fid_m else None
+ formats.append({
+ 'quality': QUALITIES(fid),
+ 'format_id': fid,
+ 'url': furl,
+ })
+ self._sort_formats(formats)
+ info = {
+ 'formats': formats,
+ }
+ else: # request JSON file
+ if not document_id:
+ video_id = self._search_regex(
+ r'/play/(?:config|media)/(\d+)', webpage, 'media id')
+ info = self._extract_media_info(
+ 'http://www.ardmediathek.de/play/media/%s' % video_id,
+ webpage, video_id)
+
+ info.update({
+ 'id': video_id,
+ 'title': self._live_title(title) if info.get('is_live') else title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ })
+ info.update(self._ARD_extract_episode_info(info['title']))
+
+ return info
+
+
+class ARDIE(InfoExtractor):
+ _VALID_URL = r'(?P<mainurl>https?://(www\.)?daserste\.de/[^?#]+/videos(?:extern)?/(?P<display_id>[^/?#]+)-(?P<id>[0-9]+))\.html'
+ _TESTS = [{
+ # available till 14.02.2019
+ 'url': 'http://www.daserste.de/information/talk/maischberger/videos/das-groko-drama-zerlegen-sich-die-volksparteien-video-102.html',
+ 'md5': '8e4ec85f31be7c7fc08a26cdbc5a1f49',
+ 'info_dict': {
+ 'display_id': 'das-groko-drama-zerlegen-sich-die-volksparteien-video',
+ 'id': '102',
+ 'ext': 'mp4',
+ 'duration': 4435.0,
+ 'title': 'Das GroKo-Drama: Zerlegen sich die Volksparteien?',
+ 'upload_date': '20180214',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ },
+ }, {
+ 'url': 'https://www.daserste.de/information/reportage-dokumentation/erlebnis-erde/videosextern/woelfe-und-herdenschutzhunde-ungleiche-brueder-102.html',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.daserste.de/information/reportage-dokumentation/dokus/videos/die-story-im-ersten-mission-unter-falscher-flagge-100.html',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ display_id = mobj.group('display_id')
+
+ player_url = mobj.group('mainurl') + '~playerXml.xml'
+ doc = self._download_xml(player_url, display_id)
+ video_node = doc.find('./video')
+ upload_date = unified_strdate(xpath_text(
+ video_node, './broadcastDate'))
+ thumbnail = xpath_text(video_node, './/teaserImage//variant/url')
+
+ formats = []
+ for a in video_node.findall('.//asset'):
+ f = {
+ 'format_id': a.attrib['type'],
+ 'width': int_or_none(a.find('./frameWidth').text),
+ 'height': int_or_none(a.find('./frameHeight').text),
+ 'vbr': int_or_none(a.find('./bitrateVideo').text),
+ 'abr': int_or_none(a.find('./bitrateAudio').text),
+ 'vcodec': a.find('./codecVideo').text,
+ 'tbr': int_or_none(a.find('./totalBitrate').text),
+ }
+ if a.find('./serverPrefix').text:
+ f['url'] = a.find('./serverPrefix').text
+ f['play_path'] = a.find('./fileName').text
+ else:
+ f['url'] = a.find('./fileName').text
+ formats.append(f)
+ self._sort_formats(formats)
+
+ return {
+ 'id': mobj.group('id'),
+ 'formats': formats,
+ 'display_id': display_id,
+ 'title': video_node.find('./title').text,
+ 'duration': parse_duration(video_node.find('./duration').text),
+ 'upload_date': upload_date,
+ 'thumbnail': thumbnail,
+ }
+
+
+class ARDBetaMediathekIE(ARDMediathekBaseIE):
+ _VALID_URL = r'https://(?:(?:beta|www)\.)?ardmediathek\.de/(?P<client>[^/]+)/(?P<mode>player|live|video|sendung|sammlung)/(?P<display_id>(?:[^/]+/)*)(?P<video_id>[a-zA-Z0-9]+)'
+ _TESTS = [{
+ 'url': 'https://ardmediathek.de/ard/video/die-robuste-roswita/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
+ 'md5': 'dfdc87d2e7e09d073d5a80770a9ce88f',
+ 'info_dict': {
+ 'display_id': 'die-robuste-roswita',
+ 'id': '70153354',
+ 'title': 'Die robuste Roswita',
+ 'description': r're:^Der Mord.*trüber ist als die Ilm.',
+ 'duration': 5316,
+ 'thumbnail': 'https://img.ardmediathek.de/standard/00/70/15/33/90/-1852531467/16x9/960?mandant=ard',
+ 'timestamp': 1577047500,
+ 'upload_date': '20191222',
+ 'ext': 'mp4',
+ },
+ }, {
+ 'url': 'https://beta.ardmediathek.de/ard/video/Y3JpZDovL2Rhc2Vyc3RlLmRlL3RhdG9ydC9mYmM4NGM1NC0xNzU4LTRmZGYtYWFhZS0wYzcyZTIxNGEyMDE',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://ardmediathek.de/ard/video/saartalk/saartalk-gesellschaftsgift-haltung-gegen-hass/sr-fernsehen/Y3JpZDovL3NyLW9ubGluZS5kZS9TVF84MTY4MA/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.ardmediathek.de/ard/video/trailer/private-eyes-s01-e01/one/Y3JpZDovL3dkci5kZS9CZWl0cmFnLTE1MTgwYzczLWNiMTEtNGNkMS1iMjUyLTg5MGYzOWQxZmQ1YQ/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.ardmediathek.de/ard/player/Y3JpZDovL3N3ci5kZS9hZXgvbzEwNzE5MTU/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.ardmediathek.de/swr/live/Y3JpZDovL3N3ci5kZS8xMzQ4MTA0Mg',
+ 'only_matching': True,
+ }, {
+ # playlist of type 'sendung'
+ 'url': 'https://www.ardmediathek.de/ard/sendung/doctor-who/Y3JpZDovL3dkci5kZS9vbmUvZG9jdG9yIHdobw/',
+ 'only_matching': True,
+ }, {
+ # playlist of type 'sammlung'
+ 'url': 'https://www.ardmediathek.de/ard/sammlung/team-muenster/5JpTzLSbWUAK8184IOvEir/',
+ 'only_matching': True,
+ }]
+
+ def _ARD_load_playlist_snipped(self, playlist_id, display_id, client, mode, pageNumber):
+ """ Query the ARD server for playlist information
+ and return the data in "raw" format """
+ if mode == 'sendung':
+ graphQL = json.dumps({
+ 'query': '''{
+ showPage(
+ client: "%s"
+ showId: "%s"
+ pageNumber: %d
+ ) {
+ pagination {
+ pageSize
+ totalElements
+ }
+ teasers { # Array
+ mediumTitle
+ links { target { id href title } }
+ type
+ }
+ }}''' % (client, playlist_id, pageNumber),
+ }).encode()
+ else: # mode == 'sammlung'
+ graphQL = json.dumps({
+ 'query': '''{
+ morePage(
+ client: "%s"
+ compilationId: "%s"
+ pageNumber: %d
+ ) {
+ widget {
+ pagination {
+ pageSize
+ totalElements
+ }
+ teasers { # Array
+ mediumTitle
+ links { target { id href title } }
+ type
+ }
+ }
+ }}''' % (client, playlist_id, pageNumber),
+ }).encode()
+ # Resources for ARD GraphQL debugging:
+ # https://api-test.ardmediathek.de/public-gateway
+ show_page = self._download_json(
+ 'https://api.ardmediathek.de/public-gateway',
+ '[Playlist] %s' % display_id,
+ data=graphQL,
+ headers={'Content-Type': 'application/json'})['data']
+ # align the structure of the returned data:
+ if mode == 'sendung':
+ show_page = show_page['showPage']
+ else: # mode == 'sammlung'
+ show_page = show_page['morePage']['widget']
+ return show_page
+
+ def _ARD_extract_playlist(self, url, playlist_id, display_id, client, mode):
+ """ Collects all playlist entries and returns them as info dict.
+ Supports playlists of mode 'sendung' and 'sammlung', and also nested
+ playlists. """
+ entries = []
+ pageNumber = 0
+ while True: # iterate by pageNumber
+ show_page = self._ARD_load_playlist_snipped(
+ playlist_id, display_id, client, mode, pageNumber)
+ for teaser in show_page['teasers']: # process playlist items
+ if '/compilation/' in teaser['links']['target']['href']:
+ # alternative condition: teaser['type'] == "compilation"
+ # => This is a nested compilation, e.g.:
+ # https://www.ardmediathek.de/ard/sammlung/die-kirche-bleibt-im-dorf/5eOHzt8XB2sqeFXbIoJlg2/
+ link_mode = 'sammlung'
+ else:
+ link_mode = 'video'
+
+ item_url = 'https://www.ardmediathek.de/%s/%s/%s/%s/%s' % (
+ client, link_mode, display_id,
+ # build a URL slug from the episode title, similar to ARD:
+ re.sub('^-|-$', '', # strip '-' from both ends
+ re.sub('[^a-zA-Z0-9]+', '-', # replace runs of special chars with '-'
+ teaser['links']['target']['title'].lower()
+ .replace('ä', 'ae').replace('ö', 'oe')
+ .replace('ü', 'ue').replace('ß', 'ss'))),
+ teaser['links']['target']['id'])
+ entries.append(self.url_result(
+ item_url,
+ ie=ARDBetaMediathekIE.ie_key()))
+
+ if (show_page['pagination']['pageSize'] * (pageNumber + 1)
+ >= show_page['pagination']['totalElements']):
+ # we've processed enough pages to get all playlist entries
+ break
+ pageNumber = pageNumber + 1
+
+ return self.playlist_result(entries, playlist_title=display_id)
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id = mobj.group('video_id')
+ display_id = mobj.group('display_id')
+ if display_id:
+ display_id = display_id.rstrip('/')
+ if not display_id:
+ display_id = video_id
+
+ if mobj.group('mode') in ('sendung', 'sammlung'):
+ # this is a playlist-URL
+ return self._ARD_extract_playlist(
+ url, video_id, display_id,
+ mobj.group('client'),
+ mobj.group('mode'))
+
+ player_page = self._download_json(
+ 'https://api.ardmediathek.de/public-gateway',
+ display_id, data=json.dumps({
+ 'query': '''{
+ playerPage(client:"%s", clipId: "%s") {
+ blockedByFsk
+ broadcastedOn
+ maturityContentRating
+ mediaCollection {
+ _duration
+ _geoblocked
+ _isLive
+ _mediaArray {
+ _mediaStreamArray {
+ _quality
+ _server
+ _stream
+ }
+ }
+ _previewImage
+ _subtitleUrl
+ _type
+ }
+ show {
+ title
+ }
+ synopsis
+ title
+ tracking {
+ atiCustomVars {
+ contentId
+ }
+ }
+ }
+}''' % (mobj.group('client'), video_id),
+ }).encode(), headers={
+ 'Content-Type': 'application/json'
+ })['data']['playerPage']
+ title = player_page['title']
+ content_id = str_or_none(try_get(
+ player_page, lambda x: x['tracking']['atiCustomVars']['contentId']))
+ media_collection = player_page.get('mediaCollection') or {}
+ if not media_collection and content_id:
+ media_collection = self._download_json(
+ 'https://www.ardmediathek.de/play/media/' + content_id,
+ content_id, fatal=False) or {}
+ info = self._parse_media_info(
+ media_collection, content_id or video_id,
+ player_page.get('blockedByFsk'))
+ age_limit = None
+ description = player_page.get('synopsis')
+ maturity_content_rating = player_page.get('maturityContentRating')
+ if maturity_content_rating:
+ age_limit = int_or_none(maturity_content_rating.lstrip('FSK'))
+ if not age_limit and description:
+ age_limit = int_or_none(self._search_regex(
+ r'\(FSK\s*(\d+)\)\s*$', description, 'age limit', default=None))
+ info.update({
+ 'age_limit': age_limit,
+ 'display_id': display_id,
+ 'title': title,
+ 'description': description,
+ 'timestamp': unified_timestamp(player_page.get('broadcastedOn')),
+ 'series': try_get(player_page, lambda x: x['show']['title']),
+ })
+ info.update(self._ARD_extract_episode_info(info['title']))
+ return info
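Review note: the title parsing in `_ARD_extract_episode_info` is easier to check outside the extractor. The sketch below copies the four patterns from the diff and reproduces the same control flow. `to_int` is a hypothetical stand-in for `int_or_none`, and the function name `extract_episode_info` is only for illustration.

```python
import re

# Patterns copied verbatim from _ARD_extract_episode_info in the diff.
PATTERNS = [
    r'.*(?P<ep_info> \(S(?P<season_number>\d+)/E(?P<episode_number>\d+)\)).*',
    r'.*(?P<ep_info> \((?:Folge |Teil )?(?P<episode_number>\d+)(?:/\d+)?\)).*',
    r'.*(?P<ep_info>Folge (?P<episode_number>\d+)(?:\:| -|) )\"(?P<episode>.+)\".*',
    r'.*(?P<ep_info>Folge (?P<episode_number>\d+)(?:/\d+)?(?:\:| -|) ).*',
]


def to_int(value):
    # Hypothetical stand-in for utils.int_or_none.
    return int(value) if value is not None else None


def extract_episode_info(title):
    """Return season/episode data parsed from an ARD title."""
    res = {}
    if not title:
        return res
    for pattern in PATTERNS:
        m = re.match(pattern, title)
        if not m:
            continue
        groups = m.groupdict()
        res['season_number'] = to_int(groups.get('season_number'))
        res['episode_number'] = to_int(groups.get('episode_number'))
        res['episode'] = groups.get('episode')
        # Without an explicit episode name, drop the numeric part of the title.
        if groups.get('ep_info') and not res['episode']:
            res['episode'] = title.replace(groups['ep_info'], '')
        if res['episode']:
            res['episode'] = res['episode'].strip()
        break
    # Fall back to the whole title as the episode name.
    if not res.get('episode'):
        res['episode'] = title.strip()
    return res
```

Only the first matching pattern is used, so the more specific `(S06/E07)` form has to stay ahead of the generic `(2)` form in the list.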
'url': 'https://www.telebaern.tv/telebaern-news/montag-1-oktober-2018-ganze-sendung-133531189#video=0_7xjo9lf1',
'only_matching': True
}]
-
+ _API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'
_PARTNER_ID = '1719221'
def _real_extract(self, url):
- mobj = re.match(self._VALID_URL, url)
- host = mobj.group('host')
- video_id = mobj.group('id')
- entry_id = mobj.group('kaltura_id')
+ host, display_id, article_id, entry_id = re.match(self._VALID_URL, url).groups()
if not entry_id:
- api_url = 'https://www.%s/api/pub/gql/%s' % (host, host.split('.')[0])
- payload = {
- 'query': '''query VideoContext($articleId: ID!) {
- article: node(id: $articleId) {
- ... on Article {
- mainAssetRelation {
- asset {
- ... on VideoAsset {
- kalturaId
- }
- }
- }
- }
- }
- }''',
- 'variables': {'articleId': 'Article:%s' % mobj.group('article_id')},
- }
- json_data = self._download_json(
- api_url, video_id, headers={
- 'Content-Type': 'application/json',
- },
- data=json.dumps(payload).encode())
- entry_id = json_data['data']['article']['mainAssetRelation']['asset']['kalturaId']
+ entry_id = self._download_json(
+ self._API_TEMPL % (host, host.split('.')[0]), display_id, query={
+ 'variables': json.dumps({
+ 'contextId': 'NewsArticle:' + article_id,
+ }),
+ })['data']['context']['mainAsset']['video']['kaltura']['kalturaId']
return self.url_result(
'kaltura:%s:%s' % (self._PARTNER_ID, entry_id),
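Review note: the new lookup replaces the POSTed GraphQL document with a GET against a persisted query, passing only `variables` in the query string. A rough sketch of the resulting URL is below. It uses `urllib.parse.urlencode` as a stand-in for what `_download_json(..., query=...)` does internally, and it assumes `host` arrives without the `www.` prefix (here `telebaern.tv`, taken from the test URL).

```python
import json
from urllib.parse import urlencode

# Template copied from _API_TEMPL in the diff.
API_TEMPL = 'https://www.%s/api/pub/gql/%s/NewsArticleTeaser/cb9f2f81ed22e9b47f4ca64ea3cc5a5d13e88d1d'


def build_api_url(host, article_id):
    """Build the persisted-query URL for one article (illustrative helper)."""
    base = API_TEMPL % (host, host.split('.')[0])
    query = urlencode({
        'variables': json.dumps({'contextId': 'NewsArticle:' + article_id}),
    })
    return base + '?' + query
```

The Kaltura entry id is then read from `data.context.mainAsset.video.kaltura.kalturaId` in the JSON response, as in the diff.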
class BandcampIE(InfoExtractor):
_VALID_URL = r'https?://[^/]+\.bandcamp\.com/track/(?P<title>[^/?#&]+)'
_TESTS = [{
- 'url': 'http://youtube-dl.bandcamp.com/track/youtube-dl-test-song',
+ 'url': 'http://youtube-dlc.bandcamp.com/track/youtube-dlc-test-song',
'md5': 'c557841d5e50261777a6585648adf439',
'info_dict': {
'id': '1812978515',
'ext': 'mp3',
- 'title': "youtube-dl \"'/\\\u00e4\u21ad - youtube-dl test song \"'/\\\u00e4\u21ad",
+ 'title': "youtube-dlc \"'/\\\u00e4\u21ad - youtube-dlc test song \"'/\\\u00e4\u21ad",
'duration': 9.8485,
},
'_skip': 'There is a limit of 200 free downloads / month for the test song'
def get_programme_id(item):
def get_from_attributes(item):
- for p in('identifier', 'group'):
+ for p in ('identifier', 'group'):
value = item.get(p)
if value and re.match(r'^[pb][\da-z]{7}$', value):
return value
etalk|
marilyn
)\.ca|
- much\.com
- )/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
+ (?:much|cp24)\.com
+ )/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950',
}, {
'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True,
+ }, {
+ 'url': 'https://www.cp24.com/video?clipId=1982548',
+ 'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',
class BiliBiliIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:(?:www|bangumi)\.)?
+ bilibili\.(?:tv|com)/
+ (?:
+ (?:
+ video/[aA][vV]|
+ anime/(?P<anime_id>\d+)/play\#
+ )(?P<id_bv>\d+)|
+ video/[bB][vV](?P<id>[^/?#&]+)
+ )
+ '''
_TESTS = [{
'url': 'http://www.bilibili.tv/video/av1074402/',
'skip_download': True, # Test metadata only
},
}]
+ }, {
+ # new BV video id format
+ 'url': 'https://www.bilibili.com/video/BV1JE411F741',
+ 'only_matching': True,
}]
_APP_KEY = 'iVGUTjsxvpLeuDCf'
url, smuggled_data = unsmuggle_url(url, {})
mobj = re.match(self._VALID_URL, url)
- video_id = mobj.group('id')
+ video_id = mobj.group('id') or mobj.group('id_bv')
anime_id = mobj.group('anime_id')
webpage = self._download_webpage(url, video_id)
webpage, 'player parameters'))['cid'][0]
else:
if 'no_bangumi_tip' not in smuggled_data:
- self.to_screen('Downloading episode %s. To download all videos in anime %s, re-run youtube-dl with %s' % (
+ self.to_screen('Downloading episode %s. To download all videos in anime %s, re-run youtube-dlc with %s' % (
video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
headers = {
'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
entries, am_id, album_title, album_data.get('intro'))
return self.playlist_result(entries, am_id)
+
+
+class BiliBiliPlayerIE(InfoExtractor):
+ _VALID_URL = r'https?://player\.bilibili\.com/player\.html\?.*?\baid=(?P<id>\d+)'
+ _TEST = {
+ 'url': 'http://player.bilibili.com/player.html?aid=92494333&cid=157926707&page=1',
+ 'only_matching': True,
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ return self.url_result(
+ 'http://www.bilibili.tv/video/av%s/' % video_id,
+ ie=BiliBiliIE.ie_key(), video_id=video_id)
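Review note: the group names in the new `_VALID_URL` are easy to misread. As written, `id_bv` captures the numeric av/anime episode id, while `id` captures the part after `BV`. The sketch below copies the pattern from the diff and applies the same `id or id_bv` fallback; `video_id_from_url` is a hypothetical helper, not part of the extractor.

```python
import re

# Pattern copied from the new BiliBiliIE._VALID_URL in the diff.
VALID_URL = r'''(?x)
    https?://
        (?:(?:www|bangumi)\.)?
        bilibili\.(?:tv|com)/
        (?:
            (?:
                video/[aA][vV]|
                anime/(?P<anime_id>\d+)/play\#
            )(?P<id_bv>\d+)|
            video/[bB][vV](?P<id>[^/?#&]+)
        )
'''


def video_id_from_url(url):
    """Return the video id the extractor would use, or None if no match."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    # Same fallback as in _real_extract: BV id first, numeric id otherwise.
    return mobj.group('id') or mobj.group('id_bv')
```

Note that the `BV` prefix itself is not part of the captured id.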
from .common import InfoExtractor
from .vk import VKIE
-from ..utils import (
- HEADRequest,
- int_or_none,
+from ..compat import (
+ compat_b64decode,
+ compat_urllib_parse_unquote,
)
+from ..utils import int_or_none
class BIQLEIE(InfoExtractor):
if VKIE.suitable(embed_url):
return self.url_result(embed_url, VKIE.ie_key(), video_id)
- self._request_webpage(
- HEADRequest(embed_url), video_id, headers={'Referer': url})
- video_id, sig, _, access_token = self._get_cookies(embed_url)['video_ext'].value.split('%3A')
+ embed_page = self._download_webpage(
+ embed_url, video_id, headers={'Referer': url})
+ video_ext = self._get_cookies(embed_url).get('video_ext')
+ if video_ext:
+ video_ext = compat_urllib_parse_unquote(video_ext.value)
+ if not video_ext:
+ video_ext = compat_b64decode(self._search_regex(
+ r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)',
+ embed_page, 'video_ext')).decode()
+ video_id, sig, _, access_token = video_ext.split(':')
item = self._download_json(
'https://api.vk.com/method/video.get', video_id,
headers={'User-Agent': 'okhttp/3.4.1'}, query={
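Review note: the `video_ext` handling now has two sources, a URL-quoted cookie and a base64 string embedded in the page. A minimal sketch of that fallback is below, using the standard library in place of `compat_urllib_parse_unquote` and `compat_b64decode`. The function name and the `ValueError` are assumptions for illustration; the extractor itself goes through `_search_regex`.

```python
import base64
import re
from urllib.parse import unquote


def parse_video_ext(cookie_value, embed_page):
    """Return (video_id, sig, access_token) from the cookie or the page."""
    # Preferred source: the URL-quoted cookie value, e.g. 'a%3Ab%3Ac%3Ad'.
    video_ext = unquote(cookie_value) if cookie_value else None
    if not video_ext:
        # Fallback: base64-encoded value embedded in the embed page.
        m = re.search(
            r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)', embed_page)
        if m is None:
            raise ValueError('video_ext not found')
        video_ext = base64.b64decode(m.group(1)).decode()
    # Four colon-separated fields; the third one is unused.
    video_id, sig, _, access_token = video_ext.split(':')
    return video_id, sig, access_token
```

Both paths end in the same `split(':')`, which is why the cookie value is unquoted first instead of being split on `%3A` as before.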
from .common import InfoExtractor
from ..utils import (
+ ExtractorError,
+ GeoRestrictedError,
orderedSet,
unified_strdate,
urlencode_postdata,
for format_url in orderedSet(format_urls)]
if not formats:
- formats = self._parse_html5_media_entries(
- url, webpage, video_id)[0]['formats']
+ entries = self._parse_html5_media_entries(
+ url, webpage, video_id)
+ if not entries:
+ error = self._html_search_regex(r'<h1 class="page-title">([^<]+)</h1>', webpage, 'error', default='Cannot find video')
+ if error == 'Video Unavailable':
+ raise GeoRestrictedError(error)
+ raise ExtractorError(error)
+ formats = entries[0]['formats']
self._check_formats(formats, video_id)
self._sort_formats(formats)
import re
import struct
-from .common import InfoExtractor
from .adobepass import AdobePassIE
+from .common import InfoExtractor
from ..compat import (
compat_etree_fromstring,
+ compat_HTTPError,
compat_parse_qs,
compat_urllib_parse_urlparse,
compat_urlparse,
compat_xml_parse_error,
- compat_HTTPError,
)
from ..utils import (
- ExtractorError,
+ clean_html,
extract_attributes,
+ ExtractorError,
find_xpath_attr,
fix_xml_ampersands,
float_or_none,
- js_to_json,
int_or_none,
+ js_to_json,
+ mimetype2ext,
parse_iso8601,
smuggle_url,
+ str_or_none,
unescapeHTML,
unsmuggle_url,
- update_url_query,
- clean_html,
- mimetype2ext,
UnsupportedError,
+ update_url_query,
+ url_or_none,
)
# [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx)
- (<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
+ (<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*?
(<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
subtitles = {}
for text_track in json_data.get('text_tracks', []):
- if text_track.get('src'):
- subtitles.setdefault(text_track.get('srclang'), []).append({
- 'url': text_track['src'],
- })
+ if text_track.get('kind') != 'captions':
+ continue
+ text_track_url = url_or_none(text_track.get('src'))
+ if not text_track_url:
+ continue
+ lang = (str_or_none(text_track.get('srclang'))
+ or str_or_none(text_track.get('label')) or 'en').lower()
+ subtitles.setdefault(lang, []).append({
+ 'url': text_track_url,
+ })
is_live = False
duration = float_or_none(json_data.get('duration'), 1000)
account_id, player_id, embed, content_type, video_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(
- 'http://players.brightcove.net/%s/%s_%s/index.min.js'
- % (account_id, player_id, embed), video_id)
+ policy_key_id = '%s_%s' % (account_id, player_id)
+ policy_key = self._downloader.cache.load('brightcove', policy_key_id)
+ policy_key_extracted = False
+ store_pk = lambda x: self._downloader.cache.store('brightcove', policy_key_id, x)
- policy_key = None
+ def extract_policy_key():
+ webpage = self._download_webpage(
+ 'http://players.brightcove.net/%s/%s_%s/index.min.js'
+ % (account_id, player_id, embed), video_id)
- catalog = self._search_regex(
- r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
- if catalog:
- catalog = self._parse_json(
- js_to_json(catalog), video_id, fatal=False)
+ policy_key = None
+
+ catalog = self._search_regex(
+ r'catalog\(({.+?})\);', webpage, 'catalog', default=None)
if catalog:
- policy_key = catalog.get('policyKey')
+ catalog = self._parse_json(
+ js_to_json(catalog), video_id, fatal=False)
+ if catalog:
+ policy_key = catalog.get('policyKey')
+
+ if not policy_key:
+ policy_key = self._search_regex(
+ r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
+ webpage, 'policy key', group='pk')
- if not policy_key:
- policy_key = self._search_regex(
- r'policyKey\s*:\s*(["\'])(?P<pk>.+?)\1',
- webpage, 'policy key', group='pk')
+ store_pk(policy_key)
+ return policy_key
api_url = 'https://edge.api.brightcove.com/playback/v1/accounts/%s/%ss/%s' % (account_id, content_type, video_id)
- headers = {
- 'Accept': 'application/json;pk=%s' % policy_key,
- }
+ headers = {}
referrer = smuggled_data.get('referrer')
if referrer:
headers.update({
'Referer': referrer,
'Origin': re.search(r'https?://[^/]+', referrer).group(0),
})
- try:
- json_data = self._download_json(api_url, video_id, headers=headers)
- except ExtractorError as e:
- if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
- json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
- message = json_data.get('message') or json_data['error_code']
- if json_data.get('error_subcode') == 'CLIENT_GEO':
- self.raise_geo_restricted(msg=message)
- raise ExtractorError(message, expected=True)
- raise
+
+ for _ in range(2):
+ if not policy_key:
+ policy_key = extract_policy_key()
+ policy_key_extracted = True
+ headers['Accept'] = 'application/json;pk=%s' % policy_key
+ try:
+ json_data = self._download_json(api_url, video_id, headers=headers)
+ break
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code in (401, 403):
+ json_data = self._parse_json(e.cause.read().decode(), video_id)[0]
+ message = json_data.get('message') or json_data['error_code']
+ if json_data.get('error_subcode') == 'CLIENT_GEO':
+ self.raise_geo_restricted(msg=message)
+ elif json_data.get('error_code') == 'INVALID_POLICY_KEY' and not policy_key_extracted:
+ policy_key = None
+ store_pk(None)
+ continue
+ raise ExtractorError(message, expected=True)
+ raise
errors = json_data.get('errors')
if errors and errors[0].get('error_subcode') == 'TVE_AUTH':
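Review note: the policy key change boils down to "try the cached key once, and re-extract only if the API rejects it". The sketch below isolates that retry logic with a plain dict as the cache. `InvalidPolicyKey`, `fetch_with_policy_key`, and the two callables are hypothetical stand-ins for the `INVALID_POLICY_KEY` error, `extract_policy_key()`, and the playback API request.

```python
class InvalidPolicyKey(Exception):
    """Hypothetical stand-in for the INVALID_POLICY_KEY API error."""


def fetch_with_policy_key(cache, key_id, extract_policy_key, call_api):
    """Call the API with a cached key, re-extracting it at most once."""
    policy_key = cache.get(key_id)
    freshly_extracted = False
    for _ in range(2):
        if not policy_key:
            policy_key = extract_policy_key()
            cache[key_id] = policy_key
            freshly_extracted = True
        try:
            return call_api(policy_key)
        except InvalidPolicyKey:
            # A key that was just extracted cannot be stale, so give up.
            if freshly_extracted:
                raise
            # Otherwise drop the cached key and retry with a fresh one.
            policy_key = None
            cache[key_id] = None
```

The two-iteration bound keeps the worst case at one player script download and two API calls per video.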
_VALID_URL = r'https?://(?:[^/]+\.)?businessinsider\.(?:com|nl)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'http://uk.businessinsider.com/how-much-radiation-youre-exposed-to-in-everyday-life-2016-6',
- 'md5': 'ca237a53a8eb20b6dc5bd60564d4ab3e',
+ 'md5': 'ffed3e1e12a6f950aa2f7d83851b497a',
'info_dict': {
- 'id': 'hZRllCfw',
+ 'id': 'cjGDb0X9',
'ext': 'mp4',
- 'title': "Here's how much radiation you're exposed to in everyday life",
- 'description': 'md5:9a0d6e2c279948aadaa5e84d6d9b99bd',
- 'upload_date': '20170709',
- 'timestamp': 1499606400,
- },
- 'params': {
- 'skip_download': True,
+ 'title': "Bananas give you more radiation exposure than living next to a nuclear power plant",
+ 'description': 'md5:0175a3baf200dd8fa658f94cade841b3',
+ 'upload_date': '20160611',
+ 'timestamp': 1465675620,
},
}, {
'url': 'https://www.businessinsider.nl/5-scientifically-proven-things-make-you-less-attractive-2017-7/',
- 'only_matching': True,
+ 'md5': '43f438dbc6da0b89f5ac42f68529d84a',
+ 'info_dict': {
+ 'id': '5zJwd4FK',
+ 'ext': 'mp4',
+ 'title': 'Deze dingen zorgen ervoor dat je minder snel een date scoort',
+ 'description': 'md5:2af8975825d38a4fed24717bbe51db49',
+ 'upload_date': '20170705',
+ 'timestamp': 1499270528,
+ },
}, {
'url': 'http://www.businessinsider.com/excel-index-match-vlookup-video-how-to-2015-2?IR=T',
'only_matching': True,
jwplatform_id = self._search_regex(
(r'data-media-id=["\']([a-zA-Z0-9]{8})',
r'id=["\']jwplayer_([a-zA-Z0-9]{8})',
- r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})'),
+ r'id["\']?\s*:\s*["\']?([a-zA-Z0-9]{8})',
+ r'(?:jwplatform\.com/players/|jwplayer_)([a-zA-Z0-9]{8})'),
webpage, 'jwplatform id')
return self.url_result(
'jwplatform:%s' % jwplatform_id, ie=JWPlatformIE.ie_key(),
int_or_none,
merge_dicts,
parse_iso8601,
+ str_or_none,
+ url_or_none,
)
_VALID_URL = r'https?://mediazone\.vrt\.be/api/v1/(?P<site_id>canvas|een|ketnet|vrt(?:video|nieuws)|sporza)/assets/(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://mediazone.vrt.be/api/v1/ketnet/assets/md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
- 'md5': '90139b746a0a9bd7bb631283f6e2a64e',
+ 'md5': '68993eda72ef62386a15ea2cf3c93107',
'info_dict': {
'id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
'display_id': 'md-ast-4ac54990-ce66-4d00-a8ca-9eac86f4c475',
- 'ext': 'flv',
+ 'ext': 'mp4',
'title': 'Nachtwacht: De Greystook',
- 'description': 'md5:1db3f5dc4c7109c821261e7512975be7',
+ 'description': 'Nachtwacht: De Greystook',
'thumbnail': r're:^https?://.*\.jpg$',
- 'duration': 1468.03,
+ 'duration': 1468.04,
},
'expected_warnings': ['is not a supported codec', 'Unknown MIME type'],
}, {
'HLS': 'm3u8_native',
'HLS_AES': 'm3u8',
}
+ _REST_API_BASE = 'https://media-services-public.vrt.be/vualto-video-aggregator-web/rest/external/v1'
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
site_id, video_id = mobj.group('site_id'), mobj.group('id')
+ # Old API endpoint, serves more formats but may fail for some videos
data = self._download_json(
'https://mediazone.vrt.be/api/v1/%s/assets/%s'
- % (site_id, video_id), video_id)
+ % (site_id, video_id), video_id, 'Downloading asset JSON',
+ 'Unable to download asset JSON', fatal=False)
+
+ # New API endpoint
+ if not data:
+ token = self._download_json(
+ '%s/tokens' % self._REST_API_BASE, video_id,
+ 'Downloading token', data=b'',
+ headers={'Content-Type': 'application/json'})['vrtPlayerToken']
+ data = self._download_json(
+ '%s/videos/%s' % (self._REST_API_BASE, video_id),
+ video_id, 'Downloading video JSON', fatal=False, query={
+ 'vrtPlayerToken': token,
+ 'client': '%s@PROD' % site_id,
+ }, expected_status=400)
+ message = data.get('message')
+ if message and not data.get('title'):
+ if data.get('code') == 'AUTHENTICATION_REQUIRED':
+ self.raise_login_required(message)
+ raise ExtractorError(message, expected=True)
title = data['title']
description = data.get('description')
formats = []
for target in data['targetUrls']:
- format_url, format_type = target.get('url'), target.get('type')
+ format_url, format_type = url_or_none(target.get('url')), str_or_none(target.get('type'))
if not format_url or not format_type:
continue
+ format_type = format_type.upper()
if format_type in self._HLS_ENTRY_PROTOCOLS_MAP:
formats.extend(self._extract_m3u8_formats(
format_url, video_id, 'mp4', self._HLS_ENTRY_PROTOCOLS_MAP[format_type],
},
'skip': 'Pagina niet gevonden',
}, {
- 'url': 'https://www.een.be/sorry-voor-alles/herbekijk-sorry-voor-alles',
+ 'url': 'https://www.een.be/thuis/emma-pakt-thilly-aan',
'info_dict': {
- 'id': 'mz-ast-11a587f8-b921-4266-82e2-0bce3e80d07f',
- 'display_id': 'herbekijk-sorry-voor-alles',
+ 'id': 'md-ast-3a24ced2-64d7-44fb-b4ed-ed1aafbf90b8',
+ 'display_id': 'emma-pakt-thilly-aan',
'ext': 'mp4',
- 'title': 'Herbekijk Sorry voor alles',
- 'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
+ 'title': 'Emma pakt Thilly aan',
+ 'description': 'md5:c5c9b572388a99b2690030afa3f3bad7',
'thumbnail': r're:^https?://.*\.jpg$',
- 'duration': 3788.06,
+ 'duration': 118.24,
},
'params': {
'skip_download': True,
},
- 'skip': 'Episode no longer available',
+ 'expected_warnings': ['is not a supported codec'],
}, {
'url': 'https://www.canvas.be/check-point/najaar-2016/de-politie-uw-vriend',
'only_matching': True,
IE_DESC = 'VrtNU.be'
_VALID_URL = r'https?://(?:www\.)?vrt\.be/(?P<site_id>vrtnu)/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
+ # Available via old API endpoint
'url': 'https://www.vrt.be/vrtnu/a-z/postbus-x/1/postbus-x-s1a1/',
'info_dict': {
'id': 'pbs-pub-2e2d8c27-df26-45c9-9dc6-90c78153044d$vid-90c932b1-e21d-4fb8-99b1-db7b49cf74de',
- 'ext': 'flv',
+ 'ext': 'mp4',
'title': 'De zwarte weduwe',
- 'description': 'md5:d90c21dced7db869a85db89a623998d4',
+ 'description': 'md5:db1227b0f318c849ba5eab1fef895ee4',
'duration': 1457.04,
'thumbnail': r're:^https?://.*\.jpg$',
- 'season': '1',
+ 'season': 'Season 1',
'season_number': 1,
'episode_number': 1,
},
- 'skip': 'This video is only available for registered users'
+ 'skip': 'This video is only available for registered users',
+ 'params': {
+ 'username': '<snip>',
+ 'password': '<snip>',
+ },
+ 'expected_warnings': ['is not a supported codec'],
+ }, {
+ # Only available via new API endpoint
+ 'url': 'https://www.vrt.be/vrtnu/a-z/kamp-waes/1/kamp-waes-s1a5/',
+ 'info_dict': {
+ 'id': 'pbs-pub-0763b56c-64fb-4d38-b95b-af60bf433c71$vid-ad36a73c-4735-4f1f-b2c0-a38e6e6aa7e1',
+ 'ext': 'mp4',
+ 'title': 'Aflevering 5',
+ 'description': 'Wie valt door de mand tijdens een missie?',
+ 'duration': 2967.06,
+ 'season': 'Season 1',
+ 'season_number': 1,
+ 'episode_number': 5,
+ },
+ 'skip': 'This video is only available for registered users',
+ 'params': {
+ 'username': '<snip>',
+ 'password': '<snip>',
+ },
+ 'expected_warnings': ['Unable to download asset JSON', 'is not a supported codec', 'Unknown MIME type'],
}]
_NETRC_MACHINE = 'vrtnu'
_APIKEY = '3_0Z2HujMtiWq_pkAjgnS2Md2E11a1AwZjYiBETtwNE-EoEHDINgtnvcAOpNgmrVGy'
# coding: utf-8
from __future__ import unicode_literals
+import hashlib
import json
import re
+from xml.sax.saxutils import escape
from .common import InfoExtractor
from ..compat import (
'clearleap': 'http://www.clearleap.com/namespace/clearleap/1.0/',
}
_GEO_COUNTRIES = ['CA']
+ _LOGIN_URL = 'https://api.loginradius.com/identity/v2/auth/login'
+ _TOKEN_URL = 'https://cloud-api.loginradius.com/sso/jwt/api/token'
+ _API_KEY = '3f4beddd-2061-49b0-ae80-6f1f2ed65b37'
+ _NETRC_MACHINE = 'cbcwatch'
+
+ def _signature(self, email, password):
+ data = json.dumps({
+ 'email': email,
+ 'password': password,
+ }).encode()
+ headers = {'content-type': 'application/json'}
+ query = {'apikey': self._API_KEY}
+ resp = self._download_json(self._LOGIN_URL, None, data=data, headers=headers, query=query)
+ access_token = resp['access_token']
+
+ # Exchange the access token for the JWT signature used to log the device in
+ query = {
+ 'access_token': access_token,
+ 'apikey': self._API_KEY,
+ 'jwtapp': 'jwt',
+ }
+ resp = self._download_json(self._TOKEN_URL, None, headers=headers, query=query)
+ return resp['signature']
def _call_api(self, path, video_id):
url = path if path.startswith('http') else self._API_BASE_URL + path
def _real_initialize(self):
if self._valid_device_token():
return
- device = self._downloader.cache.load('cbcwatch', 'device') or {}
+ device = self._downloader.cache.load(
+ 'cbcwatch', self._cache_device_key()) or {}
self._device_id, self._device_token = device.get('id'), device.get('token')
if self._valid_device_token():
return
def _valid_device_token(self):
return self._device_id and self._device_token
+ def _cache_device_key(self):
+ email, _ = self._get_login_info()
+ return '%s_device' % hashlib.sha256(email.encode()).hexdigest() if email else 'device'
+
def _register_device(self):
- self._device_id = self._device_token = None
result = self._download_xml(
self._API_BASE_URL + 'device/register',
None, 'Acquiring device token',
data=b'<device><type>web</type></device>')
self._device_id = xpath_text(result, 'deviceId', fatal=True)
- self._device_token = xpath_text(result, 'deviceToken', fatal=True)
+ email, password = self._get_login_info()
+ if email and password:
+ signature = self._signature(email, password)
+ data = '<login><token>{0}</token><device><deviceId>{1}</deviceId><type>web</type></device></login>'.format(
+ escape(signature), escape(self._device_id)).encode()
+ url = self._API_BASE_URL + 'device/login'
+ result = self._download_xml(
+ url, None, data=data,
+ headers={'content-type': 'application/xml'})
+ self._device_token = xpath_text(result, 'token', fatal=True)
+ else:
+ self._device_token = xpath_text(result, 'deviceToken', fatal=True)
self._downloader.cache.store(
- 'cbcwatch', 'device', {
+ 'cbcwatch', self._cache_device_key(), {
'id': self._device_id,
'token': self._device_token,
})
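The per-account cache key used above can be illustrated in isolation. This is a minimal sketch rather than the extractor's code: `cache_device_key` is a hypothetical free function standing in for `_cache_device_key`, and the e-mail address is made up.

```python
import hashlib


def cache_device_key(email=None):
    """Return a cache key that differs per account without storing the e-mail."""
    if not email:
        # Anonymous sessions keep using the shared key
        return 'device'
    digest = hashlib.sha256(email.encode('utf-8')).hexdigest()
    return '%s_device' % digest


anonymous_key = cache_device_key()
account_key = cache_device_key('user@example.com')
print(anonymous_key)
print(len(account_key))
```

Hashing the e-mail keeps the device token of one account from being reused for another, while the cache never holds the address in clear text.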
# coding: utf-8
from __future__ import unicode_literals
+import base64
import re
from .common import InfoExtractor
class CloudflareStreamIE(InfoExtractor):
+ _DOMAIN_RE = r'(?:cloudflarestream\.com|(?:videodelivery|bytehighway)\.net)'
+ _EMBED_RE = r'embed\.%s/embed/[^/]+\.js\?.*?\bvideo=' % _DOMAIN_RE
+ _ID_RE = r'[\da-f]{32}|[\w-]+\.[\w-]+\.[\w-]+'
_VALID_URL = r'''(?x)
https?://
(?:
- (?:watch\.)?(?:cloudflarestream\.com|videodelivery\.net)/|
- embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=
+ (?:watch\.)?%s/|
+ %s
)
- (?P<id>[\da-f]+)
- '''
+ (?P<id>%s)
+ ''' % (_DOMAIN_RE, _EMBED_RE, _ID_RE)
_TESTS = [{
'url': 'https://embed.cloudflarestream.com/embed/we4g.fla9.latest.js?video=31c9291ab41fac05471db4e73aa11717',
'info_dict': {
return [
mobj.group('url')
for mobj in re.finditer(
- r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//embed\.(?:cloudflarestream\.com|videodelivery\.net)/embed/[^/]+\.js\?.*?\bvideo=[\da-f]+?.*?)\1',
+ r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//%s(?:%s).*?)\1' % (CloudflareStreamIE._EMBED_RE, CloudflareStreamIE._ID_RE),
webpage)]
def _real_extract(self, url):
video_id = self._match_id(url)
+ domain = 'bytehighway.net' if 'bytehighway.net/' in url else 'videodelivery.net'
+ base_url = 'https://%s/%s/' % (domain, video_id)
+ if '.' in video_id:
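+ # Signed IDs are JWTs, whose base64url segments omit '=' padding; restore it before decoding
+ video_id = self._parse_json(base64.urlsafe_b64decode(
+ video_id.split('.')[1] + '==='), video_id)['sub']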
+ manifest_base_url = base_url + 'manifest/video.'
formats = self._extract_m3u8_formats(
- 'https://cloudflarestream.com/%s/manifest/video.m3u8' % video_id,
- video_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls',
- fatal=False)
+ manifest_base_url + 'm3u8', video_id, 'mp4',
+ 'm3u8_native', m3u8_id='hls', fatal=False)
formats.extend(self._extract_mpd_formats(
- 'https://cloudflarestream.com/%s/manifest/video.mpd' % video_id,
- video_id, mpd_id='dash', fatal=False))
+ manifest_base_url + 'mpd', video_id, mpd_id='dash', fatal=False))
self._sort_formats(formats)
return {
'id': video_id,
'title': video_id,
+ 'thumbnail': base_url + 'thumbnails/thumbnail.jpg',
'formats': formats,
}
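The signed-ID branch in `_real_extract` treats a dotted ID as a JWT-like token and reads the real video ID from the `sub` claim of its middle segment. The sketch below shows that decoding step on its own, assuming the token follows the usual JWT convention of unpadded base64url segments. `subject_from_signed_id` and the demonstration token are hypothetical, and the header and signature parts are placeholders.

```python
import base64
import json


def subject_from_signed_id(signed_id):
    """Return the `sub` claim of a JWT-like ID, or the ID itself if unsigned."""
    if '.' not in signed_id:
        return signed_id
    payload = signed_id.split('.')[1]
    # JWT segments omit '=' padding; restore it so the decoder accepts them
    payload += '=' * (-len(payload) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload).decode('utf-8'))
    return claims['sub']


# Build a fake token for demonstration (header and signature are placeholders)
claims = json.dumps({'sub': '31c9291ab41fac05471db4e73aa11717'}).encode('utf-8')
fake_payload = base64.urlsafe_b64encode(claims).decode('ascii').rstrip('=')
fake_token = 'header.%s.signature' % fake_payload
print(subject_from_signed_id(fake_token))
```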
import math
from ..compat import (
- compat_cookiejar,
+ compat_cookiejar_Cookie,
compat_cookies,
compat_etree_Element,
compat_etree_fromstring,
Set to "root" to indicate that this is a
comment to the original video.
age_limit: Age restriction for the video, as an integer (years)
- webpage_url: The URL to the video webpage, if given to youtube-dl it
+ webpage_url: The URL to the video webpage, if given to youtube-dlc it
should allow to get the same result again. (It will be set
by YoutubeDL if it's missing)
categories: A list of categories that the video falls in, for example
'twitter card player')
def _search_json_ld(self, html, video_id, expected_type=None, **kwargs):
- json_ld = self._search_regex(
- JSON_LD_RE, html, 'JSON-LD', group='json_ld', **kwargs)
+ json_ld_list = list(re.finditer(JSON_LD_RE, html))
default = kwargs.get('default', NO_DEFAULT)
- if not json_ld:
- return default if default is not NO_DEFAULT else {}
# JSON-LD may be malformed and thus `fatal` should be respected.
# At the same time `default` may be passed that assumes `fatal=False`
# for _search_regex. Let's simulate the same behavior here as well.
fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
- return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+ json_ld = []
+ for mobj in json_ld_list:
+ json_ld_item = self._parse_json(
+ mobj.group('json_ld'), video_id, fatal=fatal)
+ if not json_ld_item:
+ continue
+ if isinstance(json_ld_item, dict):
+ json_ld.append(json_ld_item)
+ elif isinstance(json_ld_item, (list, tuple)):
+ json_ld.extend(json_ld_item)
+ if json_ld:
+ json_ld = self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
+ if json_ld:
+ return json_ld
+ if default is not NO_DEFAULT:
+ return default
+ elif fatal:
+ raise RegexNotFoundError('Unable to extract JSON-LD')
+ else:
+ self._downloader.report_warning('unable to extract JSON-LD %s' % bug_reports_message())
+ return {}
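The reworked `_search_json_ld` no longer stops at the first JSON-LD block: it parses every block on the page and flattens dicts and lists into one list before handing it to `_json_ld`. A standalone sketch of that collection step follows. `JSON_LD_PATTERN` is a simplified stand-in for `JSON_LD_RE` (the real pattern may differ), and `collect_json_ld` and the sample page are hypothetical.

```python
import json
import re

# Simplified stand-in for JSON_LD_RE; the real pattern may differ
JSON_LD_PATTERN = re.compile(
    r'(?is)<script[^>]+type=(["\']?)application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>')


def collect_json_ld(html):
    """Gather every JSON-LD object on a page into one flat list."""
    items = []
    for mobj in JSON_LD_PATTERN.finditer(html):
        try:
            parsed = json.loads(mobj.group('json_ld'))
        except ValueError:
            # Skip malformed blocks instead of aborting (non-fatal behaviour)
            continue
        if isinstance(parsed, dict):
            items.append(parsed)
        elif isinstance(parsed, (list, tuple)):
            items.extend(parsed)
    return items


page = '''
<script type="application/ld+json">{"@type": "BreadcrumbList"}</script>
<script type="application/ld+json">[{"@type": "VideoObject", "name": "Demo"}]</script>
<script type="application/ld+json">{not valid json}</script>
'''
print([item['@type'] for item in collect_json_ld(page)])
```

Collecting everything first matters on pages where the video metadata sits in a second or third block, after breadcrumb or organisation markup.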
def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
if isinstance(json_ld, compat_str):
extract_interaction_statistic(e)
for e in json_ld:
- if isinstance(e.get('@context'), compat_str) and re.match(r'^https?://schema.org/?$', e.get('@context')):
+ if '@context' in e:
item_type = e.get('@type')
if expected_type is not None and expected_type != item_type:
- return info
+ continue
if item_type in ('TVEpisode', 'Episode'):
episode_name = unescapeHTML(e.get('name'))
info.update({
})
elif item_type == 'VideoObject':
extract_video_object(e)
- continue
+ if expected_type is None:
+ continue
+ else:
+ break
video = e.get('video')
if isinstance(video, dict) and video.get('@type') == 'VideoObject':
extract_video_object(video)
- break
+ if expected_type is None:
+ continue
+ else:
+ break
return dict((k, v) for k, v in info.items() if v is not None)
@staticmethod
if not isinstance(manifest, compat_etree_Element) and not fatal:
return []
- # currently youtube-dl cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
+ # currently youtube-dlc cannot decode the playerVerificationChallenge as Akamai uses Adobe Alchemy
akamai_pv = manifest.find('{http://ns.adobe.com/f4m/1.0}pv-2.0')
if akamai_pv is not None and ';' in akamai_pv.text:
playerVerificationChallenge = akamai_pv.text.split(';')[0]
if res is False:
return []
ism_doc, urlh = res
+ if ism_doc is None:
+ return []
return self._parse_ism_formats(ism_doc, urlh.geturl(), ism_id)
def _set_cookie(self, domain, name, value, expire_time=None, port=None,
path='/', secure=False, discard=False, rest={}, **kwargs):
- cookie = compat_cookiejar.Cookie(
+ cookie = compat_cookiejar_Cookie(
0, name, value, port, port is not None, domain, True,
domain.startswith('.'), path, True, secure, expire_time,
discard, None, None, rest)
def _real_extract(self, url):
msg = (
- 'You\'ve asked youtube-dl to download the URL "%s". '
+ 'You\'ve asked youtube-dlc to download the URL "%s". '
'That doesn\'t make any sense. '
'Simply remove the parameter in your command or configuration.'
) % url
if not self._downloader.params.get('verbose'):
- msg += ' Add -v to the command line to see what arguments and configuration youtube-dl got.'
+ msg += ' Add -v to the command line to see what arguments and configuration youtube-dlc got.'
raise ExtractorError(msg, expected=True)
compat_b64decode,
compat_etree_Element,
compat_etree_fromstring,
+ compat_str,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
intlist_to_bytes,
int_or_none,
lowercase_escape,
+ merge_dicts,
remove_end,
sanitized_Request,
- unified_strdate,
urlencode_postdata,
xpath_text,
)
# rtmp
'skip_download': True,
},
+ 'skip': 'Video gone',
}, {
'url': 'http://www.crunchyroll.com/media-589804/culture-japan-1',
'info_dict': {
'info_dict': {
'id': '702409',
'ext': 'mp4',
- 'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant',
- 'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
+ 'title': compat_str,
+ 'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
- 'uploader': 'TV TOKYO',
- 'upload_date': '20160508',
+ 'uploader': 'Re:Zero Partners',
+ 'timestamp': 1462098900,
+ 'upload_date': '20160501',
},
'params': {
# m3u8 download
'info_dict': {
'id': '727589',
'ext': 'mp4',
- 'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 – Give Me Deliverance From This Judicial Injustice!",
- 'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
+ 'title': compat_str,
+ 'description': compat_str,
'thumbnail': r're:^https?://.*\.jpg$',
'uploader': 'Kadokawa Pictures Inc.',
- 'upload_date': '20170118',
- 'series': "KONOSUBA -God's blessing on this wonderful world!",
+ 'timestamp': 1484130900,
+ 'upload_date': '20170111',
+ 'series': compat_str,
'season': "KONOSUBA -God's blessing on this wonderful world! 2",
'season_number': 2,
'episode': 'Give Me Deliverance From This Judicial Injustice!',
'info_dict': {
'id': '535080',
'ext': 'mp4',
- 'title': '11eyes Episode 1 – Red Night ~ Piros éjszaka',
- 'description': 'Kakeru and Yuka are thrown into an alternate nightmarish world they call "Red Night".',
+ 'title': compat_str,
+ 'description': compat_str,
'uploader': 'Marvelous AQL Inc.',
- 'upload_date': '20091021',
+ 'timestamp': 1255512600,
+ 'upload_date': '20091014',
},
'params': {
# Just test metadata extraction
# just test metadata extraction
'skip_download': True,
},
+ 'skip': 'Video gone',
}, {
# A video with a vastly different season name compared to the series name
'url': 'http://www.crunchyroll.com/nyarko-san-another-crawling-chaos/episode-1-test-590532',
'info_dict': {
'id': '590532',
'ext': 'mp4',
- 'title': 'Haiyoru! Nyaruani (ONA) Episode 1 – Test',
- 'description': 'Mahiro and Nyaruko talk about official certification.',
+ 'title': compat_str,
+ 'description': compat_str,
'uploader': 'TV TOKYO',
+ 'timestamp': 1330956000,
'upload_date': '20120305',
'series': 'Nyarko-san: Another Crawling Chaos',
'season': 'Haiyoru! Nyaruani (ONA)',
webpage, 'language', default=None, group='lang')
video_title = self._html_search_regex(
- r'(?s)<h1[^>]*>((?:(?!<h1).)*?<span[^>]+itemprop=["\']title["\'][^>]*>(?:(?!<h1).)+?)</h1>',
- webpage, 'video_title')
+ (r'(?s)<h1[^>]*>((?:(?!<h1).)*?<(?:span[^>]+itemprop=["\']title["\']|meta[^>]+itemprop=["\']position["\'])[^>]*>(?:(?!<h1).)+?)</h1>',
+ r'<title>(.+?),\s+-\s+.+? Crunchyroll'),
+ webpage, 'video_title', default=None)
+ if not video_title:
+ video_title = re.sub(r'^Watch\s+', '', self._og_search_description(webpage))
video_title = re.sub(r' {2,}', ' ', video_title)
video_description = (self._parse_json(self._html_search_regex(
r'<script[^>]*>\s*.+?\[media_id=%s\].+?({.+?"description"\s*:.+?})\);' % video_id,
webpage, 'description', default='{}'), video_id) or media_metadata).get('description')
if video_description:
video_description = lowercase_escape(video_description.replace(r'\r\n', '\n'))
- video_upload_date = self._html_search_regex(
- [r'<div>Availability for free users:(.+?)</div>', r'<div>[^<>]+<span>\s*(.+?\d{4})\s*</span></div>'],
- webpage, 'video_upload_date', fatal=False, flags=re.DOTALL)
- if video_upload_date:
- video_upload_date = unified_strdate(video_upload_date)
video_uploader = self._html_search_regex(
# try looking for both an uploader that's a link and one that's not
[r'<a[^>]+href="/publisher/[^"]+"[^>]*>([^<]+)</a>', r'<div>\s*Publisher:\s*<span>\s*(.+?)\s*</span>\s*</div>'],
- webpage, 'video_uploader', fatal=False)
+ webpage, 'video_uploader', default=False)
formats = []
for stream in media.get('streams', []):
r'(?s)<h\d[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h\d>\s*<h4>\s*Season (\d+)',
webpage, 'season number', default=None))
- return {
+ info = self._search_json_ld(webpage, video_id, default={})
+
+ return merge_dicts({
'id': video_id,
'title': video_title,
'description': video_description,
'duration': duration,
'thumbnail': thumbnail,
'uploader': video_uploader,
- 'upload_date': video_upload_date,
'series': series,
'season': season,
'season_number': season_number,
'episode_number': episode_number,
'subtitles': subtitles,
'formats': formats,
- }
+ }, info)
class CrunchyrollShowPlaylistIE(CrunchyrollBaseIE):
@staticmethod
def _get_cookie_value(cookies, name):
- cookie = cookies.get('name')
+ cookie = cookies.get(name)
if cookie:
return cookie.value
_TESTS = [{
# 4x3
'url': 'http://www.dctp.tv/filme/videoinstallation-fuer-eine-kaufhausfassade/',
+ 'md5': '3ffbd1556c3fe210724d7088fad723e3',
'info_dict': {
'id': '95eaa4f33dad413aa17b4ee613cccc6c',
'display_id': 'videoinstallation-fuer-eine-kaufhausfassade',
- 'ext': 'flv',
+ 'ext': 'm4v',
'title': 'Videoinstallation für eine Kaufhausfassade',
'description': 'Kurzfilm',
'thumbnail': r're:^https?://.*\.jpg$',
'timestamp': 1302172322,
'upload_date': '20110407',
},
- 'params': {
- # rtmp download
- 'skip_download': True,
- },
}, {
# 16x9
'url': 'http://www.dctp.tv/filme/sind-youtuber-die-besseren-lehrer/',
uuid = media['uuid']
title = media['title']
- ratio = '16x9' if media.get('is_wide') else '4x3'
- play_path = 'mp4:%s_dctp_0500_%s.m4v' % (uuid, ratio)
-
- servers = self._download_json(
- 'http://www.dctp.tv/streaming_servers/', display_id,
- note='Downloading server list JSON', fatal=False)
-
- if servers:
- endpoint = next(
- server['endpoint']
- for server in servers
- if url_or_none(server.get('endpoint'))
- and 'cloudfront' in server['endpoint'])
- else:
- endpoint = 'rtmpe://s2pqqn4u96e4j8.cloudfront.net/cfx/st/'
-
- app = self._search_regex(
- r'^rtmpe?://[^/]+/(?P<app>.*)$', endpoint, 'app')
-
- formats = [{
- 'url': endpoint,
- 'app': app,
- 'play_path': play_path,
- 'page_url': url,
- 'player_url': 'http://svm-prod-dctptv-static.s3.amazonaws.com/dctptv-relaunch2012-110.swf',
- 'ext': 'flv',
- }]
+ is_wide = media.get('is_wide')
+ formats = []
+
+ def add_formats(suffix):
+ templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
+ formats.extend([{
+ 'format_id': 'hls-' + suffix,
+ 'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
+ 'protocol': 'm3u8_native',
+ }, {
+ 'format_id': 's3-' + suffix,
+ 'url': templ % 'completed-media.s3.amazonaws.com',
+ }, {
+ 'format_id': 'http-' + suffix,
+ 'url': templ % 'cdn-media.dctp.tv',
+ }])
+
+ add_formats('0500_' + ('16x9' if is_wide else '4x3'))
+ if is_wide:
+ add_formats('720p')
thumbnails = []
images = media.get('images')
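The new `add_formats` helper relies on a two-stage `%` template: `%%s` survives the first substitution as `%s`, leaving a slot for the host name. The sketch below reproduces that pattern outside the extractor with two of the three hosts. `build_formats` and the `abc123` UUID are hypothetical.

```python
def build_formats(uuid, is_wide):
    formats = []

    def add_formats(suffix):
        # '%%s' survives the first substitution as '%s', leaving a host slot
        templ = 'https://%%s/%s_dctp_%s.m4v' % (uuid, suffix)
        formats.extend([{
            'format_id': 'hls-' + suffix,
            # % binds tighter than +, so the host is filled in before appending
            'url': templ % 'cdn-segments.dctp.tv' + '/playlist.m3u8',
            'protocol': 'm3u8_native',
        }, {
            'format_id': 'http-' + suffix,
            'url': templ % 'cdn-media.dctp.tv',
        }])

    add_formats('0500_' + ('16x9' if is_wide else '4x3'))
    if is_wide:
        add_formats('720p')
    return formats


for fmt in build_formats('abc123', is_wide=True):
    print(fmt['format_id'], fmt['url'])
```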
--- /dev/null
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+ orderedSet,
+)
+
+
+class DeezerBaseInfoExtractor(InfoExtractor):
+ def get_data(self, url):
+ if not self._downloader.params.get('test'):
+ self._downloader.report_warning('For now, this extractor only supports the 30 second previews. Patches welcome!')
+
+ mobj = re.match(self._VALID_URL, url)
+ data_id = mobj.group('id')
+
+ webpage = self._download_webpage(url, data_id)
+ geoblocking_msg = self._html_search_regex(
+ r'<p class="soon-txt">(.*?)</p>', webpage, 'geoblocking message',
+ default=None)
+ if geoblocking_msg is not None:
+ raise ExtractorError(
+ 'Deezer said: %s' % geoblocking_msg, expected=True)
+
+ data_json = self._search_regex(
+ (r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
+ r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
+ webpage, 'data JSON')
+ data = json.loads(data_json)
+ return data_id, webpage, data
+
+
+class DeezerPlaylistIE(DeezerBaseInfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?deezer\.com/(../)?playlist/(?P<id>[0-9]+)'
+ _TEST = {
+ 'url': 'http://www.deezer.com/playlist/176747451',
+ 'info_dict': {
+ 'id': '176747451',
+ 'title': 'Best!',
+ 'uploader': 'anonymous',
+ 'thumbnail': r're:^https?://(e-)?cdns-images\.dzcdn\.net/images/cover/.*\.jpg$',
+ },
+ 'playlist_count': 29,
+ }
+
+ def _real_extract(self, url):
+ playlist_id, webpage, data = self.get_data(url)
+
+ playlist_title = data.get('DATA', {}).get('TITLE')
+ playlist_uploader = data.get('DATA', {}).get('PARENT_USERNAME')
+ playlist_thumbnail = self._search_regex(
+ r'<img id="naboo_playlist_image".*?src="([^"]+)"', webpage,
+ 'playlist thumbnail')
+
+ entries = []
+ for s in data.get('SONGS', {}).get('data') or []:
+ formats = [{
+ 'format_id': 'preview',
+ 'url': s.get('MEDIA', [{}])[0].get('HREF'),
+ 'preference': -100, # Only the first 30 seconds
+ 'ext': 'mp3',
+ }]
+ self._sort_formats(formats)
+ artists = ', '.join(
+ orderedSet(a.get('ART_NAME') for a in s.get('ARTISTS')))
+ entries.append({
+ 'id': s.get('SNG_ID'),
+ 'duration': int_or_none(s.get('DURATION')),
+ 'title': '%s - %s' % (artists, s.get('SNG_TITLE')),
+ 'uploader': s.get('ART_NAME'),
+ 'uploader_id': s.get('ART_ID'),
+ 'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
+ 'formats': formats,
+ })
+
+ return {
+ '_type': 'playlist',
+ 'id': playlist_id,
+ 'title': playlist_title,
+ 'uploader': playlist_uploader,
+ 'thumbnail': playlist_thumbnail,
+ 'entries': entries,
+ }
+
+
+class DeezerAlbumIE(DeezerBaseInfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?deezer\.com/(../)?album/(?P<id>[0-9]+)'
+ _TEST = {
+ 'url': 'https://www.deezer.com/fr/album/67505622',
+ 'info_dict': {
+ 'id': '67505622',
+ 'title': 'Last Week',
+ 'uploader': 'Home Brew',
+ 'thumbnail': r're:^https?://(e-)?cdns-images\.dzcdn\.net/images/cover/.*\.jpg$',
+ },
+ 'playlist_count': 7,
+ }
+
+ def _real_extract(self, url):
+ album_id, webpage, data = self.get_data(url)
+
+ album_title = data.get('DATA', {}).get('ALB_TITLE')
+ album_uploader = data.get('DATA', {}).get('ART_NAME')
+ album_thumbnail = self._search_regex(
+ r'<img id="naboo_album_image".*?src="([^"]+)"', webpage,
+ 'album thumbnail')
+
+ entries = []
+ for s in data.get('SONGS', {}).get('data') or []:
+ formats = [{
+ 'format_id': 'preview',
+ 'url': s.get('MEDIA', [{}])[0].get('HREF'),
+ 'preference': -100, # Only the first 30 seconds
+ 'ext': 'mp3',
+ }]
+ self._sort_formats(formats)
+ artists = ', '.join(
+ orderedSet(a.get('ART_NAME') for a in s.get('ARTISTS')))
+ entries.append({
+ 'id': s.get('SNG_ID'),
+ 'duration': int_or_none(s.get('DURATION')),
+ 'title': '%s - %s' % (artists, s.get('SNG_TITLE')),
+ 'uploader': s.get('ART_NAME'),
+ 'uploader_id': s.get('ART_ID'),
+ 'age_limit': 16 if s.get('EXPLICIT_LYRICS') == '1' else 0,
+ 'formats': formats,
+ 'track': s.get('SNG_TITLE'),
+ 'track_number': int_or_none(s.get('TRACK_NUMBER')),
+ 'track_id': s.get('SNG_ID'),
+ 'artist': album_uploader,
+ 'album': album_title,
+ 'album_artist': album_uploader,
+ })
+
+ return {
+ '_type': 'playlist',
+ 'id': album_id,
+ 'title': album_title,
+ 'uploader': album_uploader,
+ 'thumbnail': album_thumbnail,
+ 'entries': entries,
+ }
class DiscoveryIE(DiscoveryGoBaseIE):
_VALID_URL = r'''(?x)https?://
(?P<site>
- (?:(?:www|go)\.)?discovery|
- (?:www\.)?
+ go\.discovery|
+ www\.
(?:
investigationdiscovery|
discoverylife|
ahctv|
destinationamerica|
sciencechannel|
- tlc|
- velocity
+ tlc
)|
watch\.
(?:
'authRel': 'authorization',
'client_id': '3020a40c2356a645b4b4',
'nonce': ''.join([random.choice(string.ascii_letters) for _ in range(32)]),
- 'redirectUri': 'https://fusion.ddmcdn.com/app/mercury-sdk/180/redirectHandler.html?https://www.%s.com' % site,
+ 'redirectUri': 'https://www.discovery.com/',
})['access_token']
headers = self.geo_verification_headers()
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import string
+import random
+import time
+
+from .common import InfoExtractor
+
+
+class DoodStreamIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?dood\.(?:to|watch)/[ed]/(?P<id>[a-z0-9]+)'
+ _TESTS = [{
+ 'url': 'http://dood.to/e/5s1wmbdacezb',
+ 'md5': '4568b83b31e13242b3f1ff96c55f0595',
+ 'info_dict': {
+ 'id': '5s1wmbdacezb',
+ 'ext': 'mp4',
+ 'title': 'Kat Wonders - Monthly May 2020',
+ 'description': 'Kat Wonders - Monthly May 2020 | DoodStream.com',
+ 'thumbnail': 'https://img.doodcdn.com/snaps/flyus84qgl2fsk4g.jpg',
+ }
+ }, {
+ 'url': 'https://dood.to/d/jzrxn12t2s7n',
+ 'md5': '3207e199426eca7c2aa23c2872e6728a',
+ 'info_dict': {
+ 'id': 'jzrxn12t2s7n',
+ 'ext': 'mp4',
+ 'title': 'Stacy Cruz Cute ALLWAYSWELL',
+ 'description': 'Stacy Cruz Cute ALLWAYSWELL | DoodStream.com',
+ 'thumbnail': 'https://img.doodcdn.com/snaps/8edqd5nppkac3x8u.jpg',
+ }
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+
+ if '/d/' in url:
+ url = "https://dood.to" + self._html_search_regex(
+ r'<iframe src="(/e/[a-z0-9]+)"', webpage, 'embed')
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+
+ title = self._html_search_meta(['og:title', 'twitter:title'],
+ webpage, default=None)
+ thumb = self._html_search_meta(['og:image', 'twitter:image'],
+ webpage, default=None)
+ token = self._html_search_regex(r'[?&]token=([a-z0-9]+)[&\']', webpage, 'token')
+ description = self._html_search_meta(
+ ['og:description', 'description', 'twitter:description'],
+ webpage, default=None)
+ auth_url = 'https://dood.to' + self._html_search_regex(
+ r'(/pass_md5.*?)\'', webpage, 'pass_md5')
+ headers = {
+ 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/66.0',
+ 'referer': url
+ }
+
+ webpage = self._download_webpage(auth_url, video_id, headers=headers)
+ random_suffix = ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(10))
+ final_url = '%s%s?token=%s&expiry=%d' % (webpage, random_suffix, token, int(time.time() * 1000))
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'url': final_url,
+ 'http_headers': headers,
+ 'ext': 'mp4',
+ 'description': description,
+ 'thumbnail': thumb,
+ }
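The final media URL in `DoodStreamIE` is assembled from four parts: the base URL returned by the `pass_md5` request, a random 10-character suffix, the page token, and a millisecond timestamp. A small sketch of that assembly, assuming nothing beyond what the code above shows. `build_final_url`, its `now` parameter (added here only to make the result reproducible) and the example host are hypothetical.

```python
import random
import string
import time


def build_final_url(base_url, token, now=None):
    """Append a random 10-character suffix, the page token and a ms timestamp."""
    alphabet = string.ascii_letters + string.digits
    suffix = ''.join(random.choice(alphabet) for _ in range(10))
    expiry = int((time.time() if now is None else now) * 1000)
    return '%s%s?token=%s&expiry=%d' % (base_url, suffix, token, expiry)


print(build_final_url('https://example.invalid/media/', 'abc123', now=1600000000))
```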
_VALID_URL = r'https?://(?:www\.)?dropbox[.]com/sh?/(?P<id>[a-zA-Z0-9]{15})/.*'
_TESTS = [
{
- 'url': 'https://www.dropbox.com/s/nelirfsxnmcfbfh/youtube-dl%20test%20video%20%27%C3%A4%22BaW_jenozKc.mp4?dl=0',
+ 'url': 'https://www.dropbox.com/s/nelirfsxnmcfbfh/youtube-dlc%20test%20video%20%27%C3%A4%22BaW_jenozKc.mp4?dl=0',
'info_dict': {
'id': 'nelirfsxnmcfbfh',
'ext': 'mp4',
- 'title': 'youtube-dl test video \'ä"BaW_jenozKc'
+ 'title': 'youtube-dlc test video \'ä"BaW_jenozKc'
}
}, {
'url': 'https://www.dropbox.com/sh/662glsejgzoj9sr/AAByil3FGH9KFNZ13e08eSa1a/Pregame%20Ceremony%20Program%20PA%2020140518.m4v',
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_urlparse
+from ..utils import (
+ clean_html,
+ extract_attributes,
+ ExtractorError,
+ get_elements_by_class,
+ int_or_none,
+ js_to_json,
+ smuggle_url,
+ unescapeHTML,
+)
+
+
+def _get_elements_by_tag_and_attrib(html, tag=None, attribute=None, value=None, escape_value=True):
+ """Return a list of match objects (with `tag` and `content` groups) for the tags with the specified attribute in the passed HTML document"""
+
+ if tag is None:
+ tag = '[a-zA-Z0-9:._-]+'
+ if attribute is None:
+ attribute = ''
+ else:
+ attribute = r'\s+(?P<attribute>%s)' % re.escape(attribute)
+ if value is None:
+ value = ''
+ else:
+ value = re.escape(value) if escape_value else value
+ value = '=[\'"]?(?P<value>%s)[\'"]?' % value
+
+ retlist = []
+ for m in re.finditer(r'''(?xs)
+ <(?P<tag>%s)
+ (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
+ %s%s
+ (?:\s+[a-zA-Z0-9:._-]+(?:=[a-zA-Z0-9:._-]*|="[^"]*"|='[^']*'|))*?
+ \s*>
+ (?P<content>.*?)
+ </\1>
+ ''' % (tag, attribute, value), html):
+ retlist.append(m)
+
+ return retlist
+
+
+def _get_element_by_tag_and_attrib(html, tag=None, attribute=None, value=None, escape_value=True):
+ retval = _get_elements_by_tag_and_attrib(html, tag, attribute, value, escape_value)
+ return retval[0] if retval else None
+
+
+class DubokuIE(InfoExtractor):
+ IE_NAME = 'duboku'
+ IE_DESC = 'www.duboku.co'
+
+ _VALID_URL = r'(?:https?://[^/]+\.duboku\.co/vodplay/)(?P<id>[0-9]+-[0-9-]+)\.html.*'
+ _TESTS = [{
+ 'url': 'https://www.duboku.co/vodplay/1575-1-1.html',
+ 'info_dict': {
+ 'id': '1575-1-1',
+ 'ext': 'ts',
+ 'series': '白色月光',
+ 'title': 'contains:白色月光',
+ 'season_number': 1,
+ 'episode_number': 1,
+ },
+ 'params': {
+ 'skip_download': 'm3u8 download',
+ },
+ }, {
+ 'url': 'https://www.duboku.co/vodplay/1588-1-1.html',
+ 'info_dict': {
+ 'id': '1588-1-1',
+ 'ext': 'ts',
+ 'series': '亲爱的自己',
+ 'title': 'contains:预告片',
+ 'season_number': 1,
+ 'episode_number': 1,
+ },
+ 'params': {
+ 'skip_download': 'm3u8 download',
+ },
+ }]
+
+ _PLAYER_DATA_PATTERN = r'player_data\s*=\s*(\{\s*(.*)})\s*;?\s*</script'
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ temp = video_id.split('-')
+ series_id = temp[0]
+ season_id = temp[1]
+ episode_id = temp[2]
+
+ webpage_url = 'https://www.duboku.co/vodplay/%s.html' % video_id
+ webpage_html = self._download_webpage(webpage_url, video_id)
+
+ # extract video url
+
+ player_data = self._search_regex(
+ self._PLAYER_DATA_PATTERN, webpage_html, 'player_data')
+ player_data = self._parse_json(player_data, video_id, js_to_json)
+
+ # extract title
+
+ temp = get_elements_by_class('title', webpage_html)
+ series_title = None
+ title = None
+ for html in temp:
+ mobj = re.search(r'<a\s+.*>(.*)</a>', html)
+ if mobj:
+ href = extract_attributes(mobj.group(0)).get('href')
+ if href:
+ mobj1 = re.search(r'/(\d+)\.html', href)
+ if mobj1 and mobj1.group(1) == series_id:
+ series_title = clean_html(mobj.group(0))
+ series_title = re.sub(r'[\s\r\n\t]+', ' ', series_title)
+ title = clean_html(html)
+ title = re.sub(r'[\s\r\n\t]+', ' ', title)
+ break
+
+ data_url = player_data.get('url')
+ if not data_url:
+ raise ExtractorError('Cannot find url in player_data')
+ data_from = player_data.get('from')
+
+ # if it is an embedded iframe, maybe it's an external source
+ if data_from == 'iframe':
+ # use _type url_transparent to retain the meaningful details
+ # of the video.
+ return {
+ '_type': 'url_transparent',
+ 'url': smuggle_url(data_url, {'http_headers': {'Referer': webpage_url}}),
+ 'id': video_id,
+ 'title': title,
+ 'series': series_title,
+ 'season_number': int_or_none(season_id),
+ 'season_id': season_id,
+ 'episode_number': int_or_none(episode_id),
+ 'episode_id': episode_id,
+ }
+
+ formats = self._extract_m3u8_formats(data_url, video_id, 'mp4')
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'series': series_title,
+ 'season_number': int_or_none(season_id),
+ 'season_id': season_id,
+ 'episode_number': int_or_none(episode_id),
+ 'episode_id': episode_id,
+ 'formats': formats,
+ 'http_headers': {'Referer': 'https://www.duboku.co/static/player/videojs.html'}
+ }
+
+
+class DubokuPlaylistIE(InfoExtractor):
+ IE_NAME = 'duboku:list'
+ IE_DESC = 'www.duboku.co entire series'
+
+ _VALID_URL = r'(?:https?://[^/]+\.duboku\.co/voddetail/)(?P<id>[0-9]+)\.html.*'
+ _TESTS = [{
+ 'url': 'https://www.duboku.co/voddetail/1575.html',
+ 'info_dict': {
+ 'id': 'startswith:1575',
+ 'title': '白色月光',
+ },
+ 'playlist_count': 12,
+ }, {
+ 'url': 'https://www.duboku.co/voddetail/1554.html',
+ 'info_dict': {
+ 'id': 'startswith:1554',
+ 'title': '以家人之名',
+ },
+ 'playlist_mincount': 30,
+ }, {
+ 'url': 'https://www.duboku.co/voddetail/1554.html#playlist2',
+ 'info_dict': {
+ 'id': '1554#playlist2',
+ 'title': '以家人之名',
+ },
+ 'playlist_mincount': 27,
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ if mobj is None:
+ raise ExtractorError('Invalid URL: %s' % url)
+ series_id = mobj.group('id')
+ fragment = compat_urlparse.urlparse(url).fragment
+
+ webpage_url = 'https://www.duboku.co/voddetail/%s.html' % series_id
+ webpage_html = self._download_webpage(webpage_url, series_id)
+
+ # extract title
+
+ title = _get_element_by_tag_and_attrib(webpage_html, 'h1', 'class', 'title')
+ title = unescapeHTML(title.group('content')) if title else None
+ if not title:
+ title = self._html_search_meta('keywords', webpage_html)
+ if not title:
+ title = _get_element_by_tag_and_attrib(webpage_html, 'title')
+ title = unescapeHTML(title.group('content')) if title else None
+
+ # extract playlists
+
+ playlists = {}
+ for div in _get_elements_by_tag_and_attrib(
+ webpage_html, attribute='id', value='playlist\\d+', escape_value=False):
+ playlist_id = div.group('value')
+ playlist = []
+ for a in _get_elements_by_tag_and_attrib(
+ div.group('content'), 'a', 'href', value='[^\'"]+?', escape_value=False):
+ playlist.append({
+ 'href': unescapeHTML(a.group('value')),
+ 'title': unescapeHTML(a.group('content'))
+ })
+ playlists[playlist_id] = playlist
+
+ # select the specified playlist if url fragment exists
+ playlist = None
+ playlist_id = None
+ if fragment:
+ playlist = playlists.get(fragment)
+ playlist_id = fragment
+ else:
+ first = next(iter(playlists.items()), None)
+ if first:
+ (playlist_id, playlist) = first
+ if not playlist:
+ raise ExtractorError(
+ ('Cannot find %s' % fragment) if fragment else 'Cannot extract playlist')
+
+ # return url results
+ return self.playlist_result([
+ self.url_result(
+ compat_urlparse.urljoin('https://www.duboku.co', x['href']),
+ ie=DubokuIE.ie_key(), video_title=x.get('title'))
+ for x in playlist], series_id + '#' + playlist_id, title)
_VALID_URL = r'https?://8tracks\.com/(?P<user>[^/]+)/(?P<id>[^/#]+)(?:#.*)?$'
_TEST = {
'name': 'EightTracks',
- 'url': 'http://8tracks.com/ytdl/youtube-dl-test-tracks-a',
+ 'url': 'http://8tracks.com/ytdl/youtube-dlc-test-tracks-a',
'info_dict': {
'id': '1336550',
- 'display_id': 'youtube-dl-test-tracks-a',
+ 'display_id': 'youtube-dlc-test-tracks-a',
'description': "test chars: \"'/\\ä↭",
- 'title': "youtube-dl test tracks \"'/\\ä↭<>",
+ 'title': "youtube-dlc test tracks \"'/\\ä↭<>",
},
'playlist': [
{
'info_dict': {
'id': '11885610',
'ext': 'm4a',
- 'title': "youtue-dl project<>\"' - youtube-dl test track 1 \"'/\\\u00e4\u21ad",
+ 'title': "youtue-dl project<>\"' - youtube-dlc test track 1 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885608',
'ext': 'm4a',
- 'title': "youtube-dl project - youtube-dl test track 2 \"'/\\\u00e4\u21ad",
+ 'title': "youtube-dlc project - youtube-dlc test track 2 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885679',
'ext': 'm4a',
- 'title': "youtube-dl project as well - youtube-dl test track 3 \"'/\\\u00e4\u21ad",
+ 'title': "youtube-dlc project as well - youtube-dlc test track 3 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885680',
'ext': 'm4a',
- 'title': "youtube-dl project as well - youtube-dl test track 4 \"'/\\\u00e4\u21ad",
+ 'title': "youtube-dlc project as well - youtube-dlc test track 4 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885682',
'ext': 'm4a',
- 'title': "PH - youtube-dl test track 5 \"'/\\\u00e4\u21ad",
+ 'title': "PH - youtube-dlc test track 5 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885683',
'ext': 'm4a',
- 'title': "PH - youtube-dl test track 6 \"'/\\\u00e4\u21ad",
+ 'title': "PH - youtube-dlc test track 6 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885684',
'ext': 'm4a',
- 'title': "phihag - youtube-dl test track 7 \"'/\\\u00e4\u21ad",
+ 'title': "phihag - youtube-dlc test track 7 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
},
'info_dict': {
'id': '11885685',
'ext': 'm4a',
- 'title': "phihag - youtube-dl test track 8 \"'/\\\u00e4\u21ad",
+ 'title': "phihag - youtube-dlc test track 8 \"'/\\\u00e4\u21ad",
'uploader_id': 'ytdl'
}
}
import re
from .common import InfoExtractor
-from ..compat import compat_str
from ..utils import (
encode_base_n,
ExtractorError,
webpage, urlh = self._download_webpage_handle(url, display_id)
- video_id = self._match_id(compat_str(urlh.geturl()))
+ video_id = self._match_id(urlh.geturl())
hash = self._search_regex(
r'hash\s*:\s*["\']([\da-f]{32})', webpage, 'hash')
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
+from .alura import (
+ AluraIE,
+ AluraCourseIE
+)
from .amcnetworks import AMCNetworksIE
from .americastestkitchen import AmericasTestKitchenIE
from .animeondemand import AnimeOnDemandIE
BiliBiliBangumiIE,
BilibiliAudioIE,
BilibiliAudioAlbumIE,
+ BiliBiliPlayerIE,
)
from .biobiochiletv import BioBioChileTVIE
from .bitchute import (
)
from .dbtv import DBTVIE
from .dctp import DctpTvIE
-from .deezer import DeezerPlaylistIE
+from .deezer import (
+ DeezerPlaylistIE,
+ DeezerAlbumIE,
+)
from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
DouyuTVIE,
)
from .dplay import DPlayIE
-from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
from .drtv import (
)
from .dtube import DTubeIE
from .dvtv import DVTVIE
+from .duboku import (
+ DubokuIE,
+ DubokuPlaylistIE
+)
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .discoveryvr import DiscoveryVRIE
from .disney import DisneyIE
from .dispeak import DigitallySpeakingIE
+from .doodstream import DoodStreamIE
from .dropbox import DropboxIE
from .dw import (
DWIE,
)
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
+from .hrfensehen import HRFernsehenIE
from .hrti import (
HRTiIE,
HRTiPlaylistIE,
from .jove import JoveIE
from .joj import JojIE
from .jwplatform import JWPlatformIE
-from .jpopsukitv import JpopsukiIE
from .kakao import KakaoIE
from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
LyndaCourseIE
)
from .m6 import M6IE
+from .magentamusik360 import MagentaMusik360IE
from .mailru import (
MailRuIE,
MailRuMusicIE,
from .mlb import MLBIE
from .mnet import MnetIE
from .moevideo import MoeVideoIE
-from .mofosex import MofosexIE
+from .mofosex import (
+ MofosexIE,
+ MofosexEmbedIE,
+)
from .mojvideo import MojvideoIE
from .morningstar import MorningstarIE
from .motherless import (
MyviIE,
MyviEmbedIE,
)
+from .myvideoge import MyVideoGeIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicVideoIE,
ORFFM4IE,
ORFFM4StoryIE,
ORFOE1IE,
+ ORFOE3IE,
+ ORFNOEIE,
+ ORFWIEIE,
+ ORFBGLIE,
+ ORFOOEIE,
+ ORFSTMIE,
+ ORFKTNIE,
+ ORFSBGIE,
+ ORFTIRIE,
+ ORFVBGIE,
ORFIPTVIE,
)
from .outsidetv import OutsideTVIE
PacktPubIE,
PacktPubCourseIE,
)
-from .pandatv import PandaTVIE
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
-from .pokemon import PokemonIE
+from .pokemon import (
+ PokemonIE,
+ PokemonWatchIE,
+)
from .polskieradio import (
PolskieRadioIE,
PolskieRadioCategoryIE,
)
+from .popcorntimes import PopcorntimesIE
from .popcorntv import PopcornTVIE
from .porn91 import Porn91IE
from .porncom import PornComIE
from .sbs import SBSIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
-from .scrippsnetworks import ScrippsNetworksWatchIE
+from .scrippsnetworks import (
+ ScrippsNetworksWatchIE,
+ ScrippsNetworksIE,
+)
from .scte import (
SCTEIE,
SCTECourseIE,
BellatorIE,
ParamountNetworkIE,
)
+from .storyfire import (
+ StoryFireIE,
+ StoryFireUserIE,
+ StoryFireSeriesIE,
+)
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import SportBoxIE
from .tvnoe import TVNoeIE
from .tvnow import (
TVNowIE,
+ TVNowFilmIE,
TVNowNewIE,
TVNowSeasonIE,
TVNowAnnualIE,
from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE
from .twitch import (
- TwitchVideoIE,
- TwitchChapterIE,
TwitchVodIE,
- TwitchProfileIE,
- TwitchAllVideosIE,
- TwitchUploadsIE,
- TwitchPastBroadcastsIE,
- TwitchHighlightsIE,
+ TwitchCollectionIE,
+ TwitchVideosIE,
+ TwitchVideosClipsIE,
+ TwitchVideosCollectionsIE,
TwitchStreamIE,
TwitchClipsIE,
)
return info_dict
if '/posts/' in url:
- entries = [
- self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
- for vid in self._parse_json(
- self._search_regex(
- r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])',
- webpage, 'video ids', group='ids'),
- video_id)]
-
- return self.playlist_result(entries, video_id)
+ video_id_json = self._search_regex(
+ r'(["\'])video_ids\1\s*:\s*(?P<ids>\[.+?\])', webpage, 'video ids', group='ids',
+ default='')
+ if video_id_json:
+ entries = [
+ self.url_result('facebook:%s' % vid, FacebookIE.ie_key())
+ for vid in self._parse_json(video_id_json, video_id)]
+ return self.playlist_result(entries, video_id)
+
+ # Single Video?
+ video_id = self._search_regex(r'video_id:\s*"([0-9]+)"', webpage, 'single video id')
+ return self.url_result('facebook:%s' % video_id, FacebookIE.ie_key())
else:
_, info_dict = self._extract_from_url(
self._VIDEO_PAGE_TEMPLATE % video_id,
webpage = self._download_webpage(url, display_id)
video_data = extract_attributes(self._search_regex(
- r'(?s)<div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>.*?(<button[^>]+data-asset-source="[^"]+"[^>]+>)',
+ r'''(?sx)
+ (?:
+ </h1>|
+ <div[^>]+class="[^"]*?(?:title-zone-diffusion|heading-zone-(?:wrapper|player-button))[^"]*?"[^>]*>
+ ).*?
+ (<button[^>]+data-asset-source="[^"]+"[^>]+>)
+ ''',
webpage, 'video data'))
video_url = video_data['data-asset-source']
_VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<id>[^/?#&.]+)'
_TESTS = [{
- 'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
+ 'url': 'https://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-jeudi-22-aout-2019_3561461.html',
'info_dict': {
- 'id': '84981923',
+ 'id': 'd12458ee-5062-48fe-bfdd-a30d6a01b793',
'ext': 'mp4',
'title': 'Soir 3',
- 'upload_date': '20130826',
- 'timestamp': 1377548400,
+ 'upload_date': '20190822',
+ 'timestamp': 1566510900,
+ 'description': 'md5:72d167097237701d6e8452ff03b83c00',
'subtitles': {
'fr': 'mincount:2',
},
video_id = self._search_regex(
(r'player\.load[^;]+src:\s*["\']([^"\']+)',
r'id-video=([^@]+@[^"]+)',
- r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"'),
+ r'<a[^>]+href="(?:https?:)?//videos\.francetv\.fr/video/([^@]+@[^"]+)"',
+ r'data-id="([^"]+)"'),
webpage, 'video id')
return self._make_url_result(video_id)
from .drtuber import DrTuberIE
from .redtube import RedTubeIE
from .tube8 import Tube8IE
+from .mofosex import MofosexEmbedIE
+from .spankwire import SpankwireIE
+from .youporn import YouPornIE
from .vimeo import VimeoIE
from .dailymotion import DailymotionIE
from .dailymail import DailyMailIE
},
'add_ie': ['Kaltura'],
},
+ {
+ # multiple kaltura embeds, nsfw
+ 'url': 'https://www.quartier-rouge.be/prive/femmes/kamila-avec-video-jaime-sadomie.html',
+ 'info_dict': {
+ 'id': 'kamila-avec-video-jaime-sadomie',
+ 'title': "Kamila avec vídeo “J'aime sadomie”",
+ },
+ 'playlist_count': 8,
+ },
{
# Non-standard Vimeo embed
'url': 'https://openclassrooms.com/courses/understanding-the-web',
},
{
# vshare embed
- 'url': 'https://youtube-dl-demo.neocities.org/vshare.html',
+ 'url': 'https://youtube-dlc-demo.neocities.org/vshare.html',
'md5': '17b39f55b5497ae8b59f5fbce8e35886',
'info_dict': {
'id': '0f64ce6',
'ext': 'mp4',
'title': 'Smoky Barbecue Favorites',
'thumbnail': r're:^https?://.*\.jpe?g',
+ 'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+ 'upload_date': '20170909',
+ 'timestamp': 1504915200,
},
'add_ie': [ZypeIE.ie_key()],
'params': {
if default_search == 'auto_warning':
if re.match(r'^(?:url|URL)$', url):
raise ExtractorError(
- 'Invalid URL: %r . Call youtube-dl like this: youtube-dl -v "https://www.youtube.com/watch?v=BaW_jenozKc" ' % url,
+ 'Invalid URL: %r . Call youtube-dlc like this: youtube-dlc -v "https://www.youtube.com/watch?v=BaW_jenozKc" ' % url,
expected=True)
else:
self._downloader.report_warning(
if default_search in ('error', 'fixup_error'):
raise ExtractorError(
'%r is not a valid URL. '
- 'Set --default-search "ytsearch" (or run youtube-dl "ytsearch:%s" ) to search YouTube'
+ 'Set --default-search "ytsearch" (or run youtube-dlc "ytsearch:%s" ) to search YouTube'
% (url, url), expected=True)
else:
if ':' not in default_search:
if head_response is not False:
# Check for redirect
- new_url = compat_str(head_response.geturl())
+ new_url = head_response.geturl()
if url != new_url:
self.report_following_redirect(new_url)
if force_videoid:
request = sanitized_Request(url)
# Some webservers may serve compressed content of rather big size (e.g. gzipped flac)
# making it impossible to download only chunk of the file (yet we need only 512kB to
- # test whether it's HTML or not). According to youtube-dl default Accept-Encoding
+ # test whether it's HTML or not). According to youtube-dlc default Accept-Encoding
# that will always result in downloading the whole file that is not desirable.
# Therefore for extraction pass we have to override Accept-Encoding to any in order
# to accept raw bytes and being able to download only a chunk.
return self.playlist_result(
self._parse_xspf(
doc, video_id, xspf_url=url,
- xspf_base_url=compat_str(full_response.geturl())),
+ xspf_base_url=full_response.geturl()),
video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc,
- mpd_base_url=compat_str(full_response.geturl()).rpartition('/')[0],
+ mpd_base_url=full_response.geturl().rpartition('/')[0],
mpd_url=url)
self._sort_formats(info_dict['formats'])
return info_dict
return self.playlist_from_matches(
dailymail_urls, video_id, video_title, ie=DailyMailIE.ie_key())
+ # Look for Teachable embeds, must be before Wistia
+ teachable_url = TeachableIE._extract_url(webpage, url)
+ if teachable_url:
+ return self.url_result(teachable_url)
+
# Look for embedded Wistia player
- wistia_url = WistiaIE._extract_url(webpage)
- if wistia_url:
- return {
- '_type': 'url_transparent',
- 'url': self._proto_relative_url(wistia_url),
- 'ie_key': WistiaIE.ie_key(),
- 'uploader': video_uploader,
- }
+ wistia_urls = WistiaIE._extract_urls(webpage)
+ if wistia_urls:
+ playlist = self.playlist_from_matches(wistia_urls, video_id, video_title, ie=WistiaIE.ie_key())
+ for entry in playlist['entries']:
+ entry.update({
+ '_type': 'url_transparent',
+ 'uploader': video_uploader,
+ })
+ return playlist
# Look for SVT player
svt_url = SVTIE._extract_url(webpage)
if tube8_urls:
return self.playlist_from_matches(tube8_urls, video_id, video_title, ie=Tube8IE.ie_key())
+ # Look for embedded Mofosex player
+ mofosex_urls = MofosexEmbedIE._extract_urls(webpage)
+ if mofosex_urls:
+ return self.playlist_from_matches(mofosex_urls, video_id, video_title, ie=MofosexEmbedIE.ie_key())
+
+ # Look for embedded Spankwire player
+ spankwire_urls = SpankwireIE._extract_urls(webpage)
+ if spankwire_urls:
+ return self.playlist_from_matches(spankwire_urls, video_id, video_title, ie=SpankwireIE.ie_key())
+
+ # Look for embedded YouPorn player
+ youporn_urls = YouPornIE._extract_urls(webpage)
+ if youporn_urls:
+ return self.playlist_from_matches(youporn_urls, video_id, video_title, ie=YouPornIE.ie_key())
+
# Look for embedded Tvigle player
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//cloud\.tvigle\.ru/video/.+?)\1', webpage)
return self.url_result(mobj.group('url'), 'Zapiks')
# Look for Kaltura embeds
- kaltura_url = KalturaIE._extract_url(webpage)
- if kaltura_url:
- return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())
+ kaltura_urls = KalturaIE._extract_urls(webpage)
+ if kaltura_urls:
+ return self.playlist_from_matches(
+ kaltura_urls, video_id, video_title,
+ getter=lambda x: smuggle_url(x, {'source_url': url}),
+ ie=KalturaIE.ie_key())
# Look for EaglePlatform embeds
eagleplatform_url = EaglePlatformIE._extract_url(webpage)
# Look for VODPlatform embeds
mobj = re.search(
- r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:www\.)?vod-platform\.net/[eE]mbed/.+?)\1',
+ r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/.+?)\1',
webpage)
if mobj is not None:
return self.url_result(
return self.playlist_from_matches(
peertube_urls, video_id, video_title, ie=PeerTubeIE.ie_key())
- teachable_url = TeachableIE._extract_url(webpage, url)
- if teachable_url:
- return self.url_result(teachable_url)
-
indavideo_urls = IndavideoEmbedIE._extract_urls(webpage)
if indavideo_urls:
return self.playlist_from_matches(
if not found:
# twitter:player is a https URL to iframe player that may or may not
- # be supported by youtube-dl thus this is checked the very last (see
+ # be supported by youtube-dlc thus this is checked the very last (see
# https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser)
embed_url = self._html_search_meta('twitter:player', webpage, default=None)
if embed_url and embed_url != url:
class GiantBombIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/videos/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?giantbomb\.com/(?:videos|shows)/(?P<display_id>[^/]+)/(?P<id>\d+-\d+)'
+ _TESTS = [{
'url': 'http://www.giantbomb.com/videos/quick-look-destiny-the-dark-below/2300-9782/',
- 'md5': 'c8ea694254a59246a42831155dec57ac',
+ 'md5': '132f5a803e7e0ab0e274d84bda1e77ae',
'info_dict': {
'id': '2300-9782',
'display_id': 'quick-look-destiny-the-dark-below',
'duration': 2399,
'thumbnail': r're:^https?://.*\.jpg$',
}
- }
+ }, {
+ 'url': 'https://www.giantbomb.com/shows/ben-stranding/2970-20212',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
subtitles_id = ttsurl.encode('utf-8').decode(
'unicode_escape').split('=')[-1]
+ self._downloader.cookiejar.clear(domain='.google.com', path='/', name='NID')
+
return {
'id': video_id,
'title': title,
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+ int_or_none,
+ merge_dicts,
+ remove_end,
+ unified_timestamp,
+)
+
+
+class HellPornoIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
+ _TESTS = [{
+ 'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
+ 'md5': 'f0a46ebc0bed0c72ae8fe4629f7de5f3',
+ 'info_dict': {
+ 'id': '149116',
+ 'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
+ 'ext': 'mp4',
+ 'title': 'Dixie is posing with naked ass very erotic',
+ 'description': 'md5:9a72922749354edb1c4b6e540ad3d215',
+ 'categories': list,
+ 'thumbnail': r're:https?://.*\.jpg$',
+ 'duration': 240,
+ 'timestamp': 1398762720,
+ 'upload_date': '20140429',
+ 'view_count': int,
+ 'age_limit': 18,
+ },
+ }, {
+ 'url': 'http://hellporno.net/v/186271/',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ title = remove_end(self._html_search_regex(
+ r'<title>([^<]+)</title>', webpage, 'title'), ' - Hell Porno')
+
+ info = self._parse_html5_media_entries(url, webpage, display_id)[0]
+ self._sort_formats(info['formats'])
+
+ video_id = self._search_regex(
+ (r'chs_object\s*=\s*["\'](\d+)',
+ r'params\[["\']video_id["\']\]\s*=\s*(\d+)'), webpage, 'video id',
+ default=display_id)
+ description = self._search_regex(
+ r'class=["\']desc_video_view_v2[^>]+>([^<]+)', webpage,
+ 'description', fatal=False)
+ categories = [
+ c.strip()
+ for c in self._html_search_meta(
+ 'keywords', webpage, 'categories', default='').split(',')
+ if c.strip()]
+ duration = int_or_none(self._og_search_property(
+ 'video:duration', webpage, fatal=False))
+ timestamp = unified_timestamp(self._og_search_property(
+ 'video:release_date', webpage, fatal=False))
+ view_count = int_or_none(self._search_regex(
+ r'>Views\s+(\d+)', webpage, 'view count', fatal=False))
+
+ return merge_dicts(info, {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': title,
+ 'description': description,
+ 'categories': categories,
+ 'duration': duration,
+ 'timestamp': timestamp,
+ 'view_count': view_count,
+ 'age_limit': 18,
+ })
import re
import time
import uuid
+import json
from .common import InfoExtractor
from ..compat import (
exp = st + 6000
auth = 'st=%d~exp=%d~acl=/*' % (st, exp)
auth += '~hmac=' + hmac.new(self._AKAMAI_ENCRYPTION_KEY, auth.encode(), hashlib.sha256).hexdigest()
+ token = self._download_json(
+ 'https://api.hotstar.com/in/aadhar/v2/web/in/user/guest-signup',
+ video_id, note='Downloading token',
+ data=json.dumps({"idType": "device", "id": compat_str(uuid.uuid4())}).encode('utf-8'),
+ headers={
+ 'hotstarauth': auth,
+ 'Content-Type': 'application/json',
+ })['description']['userIdentity']
response = self._download_json(
'https://api.hotstar.com/' + path, video_id, headers={
'hotstarauth': auth,
- 'x-country-code': 'IN',
- 'x-platform-code': 'JIO',
+ 'x-hs-appversion': '6.72.2',
+ 'x-hs-platform': 'web',
+ 'x-hs-usertoken': token,
}, query=query)
- if response['statusCode'] != 'OK':
+ if response['message'] != "Playback URL's fetched successfully":
raise ExtractorError(
- response['body']['message'], expected=True)
- return response['body']['results']
+ response['message'], expected=True)
+ return response['data']
def _call_api(self, path, video_id, query_name='contentId'):
return self._call_api_impl(path, video_id, {
def _call_api_v2(self, path, video_id):
return self._call_api_impl(
- '%s/in/contents/%s' % (path, video_id), video_id, {
- 'desiredConfig': 'encryption:plain;ladder:phone,tv;package:hls,dash',
- 'client': 'mweb',
- 'clientVersion': '6.18.0',
- 'deviceId': compat_str(uuid.uuid4()),
- 'osName': 'Windows',
- 'osVersion': '10',
+ '%s/content/%s' % (path, video_id), video_id, {
+ 'desired-config': 'encryption:plain;ladder:phone,tv;package:hls,dash',
+ 'device-id': compat_str(uuid.uuid4()),
+ 'os-name': 'Windows',
+ 'os-version': '10',
})
headers = {'Referer': url}
formats = []
geo_restricted = False
- playback_sets = self._call_api_v2('h/v2/play', video_id)['playBackSets']
+ playback_sets = self._call_api_v2('play/v1/playback', video_id)['playBackSets']
for playback_set in playback_sets:
if not isinstance(playback_set, dict):
continue
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ int_or_none,
+ unescapeHTML,
+ unified_timestamp,
+)
+
+
+class HRFernsehenIE(InfoExtractor):
+ IE_NAME = 'hrfernsehen'
+ _VALID_URL = r'^https?://www\.(?:hr-fernsehen|hessenschau)\.de/.*,video-(?P<id>[0-9]{6})\.html'
+
+ _TESTS = [{
+ 'url': 'https://www.hessenschau.de/tv-sendung/hessenschau-vom-26082020,video-130546.html',
+ 'md5': '5c4e0ba94677c516a2f65a84110fc536',
+ 'info_dict': {
+ 'id': '130546',
+ 'ext': 'mp4',
+ 'description': 'Sturmtief Kirsten fegt über Hessen / Die Corona-Pandemie – eine Chronologie / '
+ 'Sterbehilfe: Die Lage in Hessen / Miss Hessen leitet zwei eigene Unternehmen / '
+ 'Pop-Up Museum zeigt Schwarze Unterhaltung und Black Music',
+ 'subtitles': {'de': [{
+ 'url': 'https://hr-a.akamaihd.net/video/as/hessenschau/2020_08/hrLogo_200826200407_L385592_512x288-25p-500kbit.vtt'
+ }]},
+ 'timestamp': 1598470200,
+ 'upload_date': '20200826',
+ 'thumbnails': [{
+ 'url': 'https://www.hessenschau.de/tv-sendung/hs_ganz-1554~_t-1598465545029_v-16to9.jpg',
+ 'id': '0'
+ }, {
+ 'url': 'https://www.hessenschau.de/tv-sendung/hs_ganz-1554~_t-1598465545029_v-16to9__medium.jpg',
+ 'id': '1'
+ }],
+ 'title': 'hessenschau vom 26.08.2020'
+ }
+ }, {
+ 'url': 'https://www.hr-fernsehen.de/sendungen-a-z/mex/sendungen/fair-und-gut---was-hinter-aldis-eigenem-guetesiegel-steckt,video-130544.html',
+ 'only_matching': True
+ }]
+
+ _GEO_COUNTRIES = ['DE']
+
+ def extract_airdate(self, loader_data):
+ airdate_str = loader_data.get('mediaMetadata', {}).get('agf', {}).get('airdate')
+
+ if airdate_str is None:
+ return None
+
+ return unified_timestamp(airdate_str)
+
+ def extract_formats(self, loader_data):
+ stream_formats = []
+ for stream_obj in loader_data["videoResolutionLevels"]:
+ stream_format = {
+ 'format_id': str(stream_obj['verticalResolution']) + "p",
+ 'height': stream_obj['verticalResolution'],
+ 'url': stream_obj['url'],
+ }
+
+ quality_information = re.search(r'([0-9]{3,4})x([0-9]{3,4})-([0-9]{2})p-([0-9]{3,4})kbit',
+ stream_obj['url'])
+ if quality_information:
+ stream_format['width'] = int_or_none(quality_information.group(1))
+ stream_format['height'] = int_or_none(quality_information.group(2))
+ stream_format['fps'] = int_or_none(quality_information.group(3))
+ stream_format['tbr'] = int_or_none(quality_information.group(4))
+
+ stream_formats.append(stream_format)
+
+ self._sort_formats(stream_formats)
+ return stream_formats
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+
+ title = self._html_search_meta(
+ ['og:title', 'twitter:title', 'name'], webpage)
+ description = self._html_search_meta(
+ ['description'], webpage)
+
+ loader_str = unescapeHTML(self._search_regex(r"data-hr-mediaplayer-loader='([^']*)'", webpage, "ardloader"))
+ loader_data = json.loads(loader_str)
+
+ info = {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'formats': self.extract_formats(loader_data),
+ 'timestamp': self.extract_airdate(loader_data)
+ }
+
+ if "subtitle" in loader_data:
+ info["subtitles"] = {"de": [{"url": loader_data["subtitle"]}]}
+
+ thumbnails = list(set(loader_data.get("previewImageUrl", {}).values()))
+ if thumbnails:
+ info["thumbnails"] = [{"url": t} for t in thumbnails]
+
+ return info
from __future__ import unicode_literals
+import base64
+import json
import re
from .common import InfoExtractor
mimetype2ext,
parse_duration,
qualities,
+ try_get,
url_or_none,
)
class ImdbIE(InfoExtractor):
IE_NAME = 'imdb'
IE_DESC = 'Internet Movie Database trailers'
- _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).+?[/-]vi(?P<id>\d+)'
+ _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video|title|list).*?[/-]vi(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.imdb.com/video/imdb/vi2524815897',
'info_dict': {
'id': '2524815897',
'ext': 'mp4',
- 'title': 'No. 2 from Ice Age: Continental Drift (2012)',
+ 'title': 'No. 2',
'description': 'md5:87bd0bdc61e351f21f20d2d7441cb4e7',
+ 'duration': 152,
}
}, {
'url': 'http://www.imdb.com/video/_/vi2524815897',
def _real_extract(self, url):
video_id = self._match_id(url)
- webpage = self._download_webpage(
- 'https://www.imdb.com/videoplayer/vi' + video_id, video_id)
- video_metadata = self._parse_json(self._search_regex(
- r'window\.IMDbReactInitialState\.push\(({.+?})\);', webpage,
- 'video metadata'), video_id)['videos']['videoMetadata']['vi' + video_id]
- title = self._html_search_meta(
- ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
- r'<title>(.+?)</title>', webpage, 'title', fatal=False) or video_metadata['title']
+
+ data = self._download_json(
+ 'https://www.imdb.com/ve/data/VIDEO_PLAYBACK_DATA', video_id,
+ query={
+ 'key': base64.b64encode(json.dumps({
+ 'type': 'VIDEO_PLAYER',
+ 'subType': 'FORCE_LEGACY',
+ 'id': 'vi%s' % video_id,
+ }).encode()).decode(),
+ })[0]
quality = qualities(('SD', '480p', '720p', '1080p'))
formats = []
- for encoding in video_metadata.get('encodings', []):
+ for encoding in data['videoLegacyEncodings']:
if not encoding or not isinstance(encoding, dict):
continue
- video_url = url_or_none(encoding.get('videoUrl'))
+ video_url = url_or_none(encoding.get('url'))
if not video_url:
continue
ext = mimetype2ext(encoding.get(
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', entry_protocol='m3u8_native',
- m3u8_id='hls', fatal=False))
+ preference=1, m3u8_id='hls', fatal=False))
continue
format_id = encoding.get('definition')
formats.append({
})
self._sort_formats(formats)
+ webpage = self._download_webpage(
+ 'https://www.imdb.com/video/vi' + video_id, video_id)
+ video_metadata = self._parse_json(self._search_regex(
+ r'args\.push\(\s*({.+?})\s*\)\s*;', webpage,
+ 'video metadata'), video_id)
+
+ video_info = video_metadata.get('VIDEO_INFO')
+ if video_info and isinstance(video_info, dict):
+ info = try_get(
+ video_info, lambda x: x[list(video_info.keys())[0]][0], dict) or {}
+ else:
+ info = {}
+
+ title = self._html_search_meta(
+ ['og:title', 'twitter:title'], webpage) or self._html_search_regex(
+ r'<title>(.+?)</title>', webpage, 'title',
+ default=None) or info['videoTitle']
+
return {
'id': video_id,
'title': title,
+ 'alt_title': info.get('videoSubTitle'),
'formats': formats,
- 'description': video_metadata.get('description'),
- 'thumbnail': video_metadata.get('slate', {}).get('url'),
- 'duration': parse_duration(video_metadata.get('duration')),
+ 'description': info.get('videoDescription'),
+ 'thumbnail': url_or_none(try_get(
+ video_metadata, lambda x: x['videoSlate']['source'])),
+ 'duration': parse_duration(info.get('videoRuntime')),
}
'width': width,
'height': height,
'http_headers': {
- 'User-Agent': 'youtube-dl (like wget)',
+ 'User-Agent': 'youtube-dlc (like wget)',
},
})
'url': self._proto_relative_url(gifd['gifUrl']),
'filesize': gifd.get('size'),
'http_headers': {
- 'User-Agent': 'youtube-dl (like wget)',
+ 'User-Agent': 'youtube-dlc (like wget)',
},
})
video_id = self._match_id(url)
video = self._download_json(
- 'http://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
+ 'https://amfphp.indavideo.hu/SYm0json.php/player.playerHandler.getVideoData/%s' % video_id,
video_id)['data']
title = video['title']
_GEO_BYPASS = False
_TESTS = [{
- 'url': 'http://play.iprima.cz/gondici-s-r-o-33',
+ 'url': 'https://prima.iprima.cz/particka/92-epizoda',
'info_dict': {
- 'id': 'p136534',
+ 'id': 'p51388',
'ext': 'mp4',
- 'title': 'Gondíci s. r. o. (34)',
- 'description': 'md5:16577c629d006aa91f59ca8d8e7f99bd',
+ 'title': 'Partička (92)',
+ 'description': 'md5:859d53beae4609e6dd7796413f1b6cac',
+ },
+ 'params': {
+ 'skip_download': True, # m3u8 download
+ },
+ }, {
+ 'url': 'https://cnn.iprima.cz/videa/70-epizoda',
+ 'info_dict': {
+ 'id': 'p681554',
+ 'ext': 'mp4',
+ 'title': 'HLAVNÍ ZPRÁVY 3.5.2020',
},
'params': {
'skip_download': True, # m3u8 download
webpage = self._download_webpage(url, video_id)
+ title = self._og_search_title(
+ webpage, default=None) or self._search_regex(
+ r'<h1>([^<]+)', webpage, 'title')
+
video_id = self._search_regex(
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
- r'data-product="([^"]+)">'),
+ r'data-product="([^"]+)">',
+ r'id=["\']player-(p\d+)"',
+ r'playerId\s*:\s*["\']player-(p\d+)'),
webpage, 'real id')
playerpage = self._download_webpage(
return {
'id': video_id,
- 'title': self._og_search_title(webpage),
- 'thumbnail': self._og_search_thumbnail(webpage),
+ 'title': title,
+ 'thumbnail': self._og_search_thumbnail(webpage, default=None),
'formats': formats,
- 'description': self._og_search_description(webpage),
+ 'description': self._og_search_description(webpage, default=None),
}
continue
elif bundled:
raise ExtractorError(
- 'This feature does not work from bundled exe. Run youtube-dl from sources.',
+ 'This feature does not work from bundled exe. Run youtube-dlc from sources.',
expected=True)
elif not pycryptodomex_found:
raise ExtractorError(
self.url_result(
'http://www.ivi.ru/watch/%s/%s' % (compilation_id, serie), IviIE.ie_key())
for serie in re.findall(
- r'<a href="/watch/%s/(\d+)"[^>]+data-id="\1"' % compilation_id, html)]
+ r'<a\b[^>]+\bhref=["\']/watch/%s/(\d+)["\']' % compilation_id, html)]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
-# coding: utf-8\r
-from __future__ import unicode_literals\r
-\r
-import re\r
-\r
-from .common import InfoExtractor\r
-from ..compat import compat_str\r
-from ..utils import (\r
- int_or_none,\r
- js_to_json,\r
- try_get,\r
-)\r
-\r
-\r
-class JojIE(InfoExtractor):\r
- _VALID_URL = r'''(?x)\r
- (?:\r
- joj:|\r
- https?://media\.joj\.sk/embed/\r
- )\r
- (?P<id>[^/?#^]+)\r
- '''\r
- _TESTS = [{\r
- 'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932',\r
- 'info_dict': {\r
- 'id': 'a388ec4c-6019-4a4a-9312-b1bee194e932',\r
- 'ext': 'mp4',\r
- 'title': 'NOVÉ BÝVANIE',\r
- 'thumbnail': r're:^https?://.*\.jpg$',\r
- 'duration': 3118,\r
- }\r
- }, {\r
- 'url': 'https://media.joj.sk/embed/9i1cxv',\r
- 'only_matching': True,\r
- }, {\r
- 'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932',\r
- 'only_matching': True,\r
- }, {\r
- 'url': 'joj:9i1cxv',\r
- 'only_matching': True,\r
- }]\r
-\r
- @staticmethod\r
- def _extract_urls(webpage):\r
- return [\r
- mobj.group('url')\r
- for mobj in re.finditer(\r
- r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//media\.joj\.sk/embed/(?:(?!\1).)+)\1',\r
- webpage)]\r
-\r
- def _real_extract(self, url):\r
- video_id = self._match_id(url)\r
-\r
- webpage = self._download_webpage(\r
- 'https://media.joj.sk/embed/%s' % video_id, video_id)\r
-\r
- title = self._search_regex(\r
- (r'videoTitle\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',\r
- r'<title>(?P<title>[^<]+)'), webpage, 'title',\r
- default=None, group='title') or self._og_search_title(webpage)\r
-\r
- bitrates = self._parse_json(\r
- self._search_regex(\r
- r'(?s)(?:src|bitrates)\s*=\s*({.+?});', webpage, 'bitrates',\r
- default='{}'),\r
- video_id, transform_source=js_to_json, fatal=False)\r
-\r
- formats = []\r
- for format_url in try_get(bitrates, lambda x: x['mp4'], list) or []:\r
- if isinstance(format_url, compat_str):\r
- height = self._search_regex(\r
- r'(\d+)[pP]\.', format_url, 'height', default=None)\r
- formats.append({\r
- 'url': format_url,\r
- 'format_id': '%sp' % height if height else None,\r
- 'height': int(height),\r
- })\r
- if not formats:\r
- playlist = self._download_xml(\r
- 'https://media.joj.sk/services/Video.php?clip=%s' % video_id,\r
- video_id)\r
- for file_el in playlist.findall('./files/file'):\r
- path = file_el.get('path')\r
- if not path:\r
- continue\r
- format_id = file_el.get('id') or file_el.get('label')\r
- formats.append({\r
- 'url': 'http://n16.joj.sk/storage/%s' % path.replace(\r
- 'dat/', '', 1),\r
- 'format_id': format_id,\r
- 'height': int_or_none(self._search_regex(\r
- r'(\d+)[pP]', format_id or path, 'height',\r
- default=None)),\r
- })\r
- self._sort_formats(formats)\r
-\r
- thumbnail = self._og_search_thumbnail(webpage)\r
-\r
- duration = int_or_none(self._search_regex(\r
- r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))\r
-\r
- return {\r
- 'id': video_id,\r
- 'title': title,\r
- 'thumbnail': thumbnail,\r
- 'duration': duration,\r
- 'formats': formats,\r
- }\r
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+ int_or_none,
+ js_to_json,
+ try_get,
+)
+
+
+class JojIE(InfoExtractor):
+ _VALID_URL = r'''(?x)
+ (?:
+ joj:|
+ https?://media\.joj\.sk/embed/
+ )
+ (?P<id>[^/?#^]+)
+ '''
+ _TESTS = [{
+ 'url': 'https://media.joj.sk/embed/a388ec4c-6019-4a4a-9312-b1bee194e932',
+ 'info_dict': {
+ 'id': 'a388ec4c-6019-4a4a-9312-b1bee194e932',
+ 'ext': 'mp4',
+ 'title': 'NOVÉ BÝVANIE',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ 'duration': 3118,
+ }
+ }, {
+ 'url': 'https://media.joj.sk/embed/9i1cxv',
+ 'only_matching': True,
+ }, {
+ 'url': 'joj:a388ec4c-6019-4a4a-9312-b1bee194e932',
+ 'only_matching': True,
+ }, {
+ 'url': 'joj:9i1cxv',
+ 'only_matching': True,
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return [
+ mobj.group('url')
+ for mobj in re.finditer(
+ r'<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//media\.joj\.sk/embed/(?:(?!\1).)+)\1',
+ webpage)]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ webpage = self._download_webpage(
+ 'https://media.joj.sk/embed/%s' % video_id, video_id)
+
+ title = self._search_regex(
+ (r'videoTitle\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
+ r'<title>(?P<title>[^<]+)'), webpage, 'title',
+ default=None, group='title') or self._og_search_title(webpage)
+
+ bitrates = self._parse_json(
+ self._search_regex(
+ r'(?s)(?:src|bitrates)\s*=\s*({.+?});', webpage, 'bitrates',
+ default='{}'),
+ video_id, transform_source=js_to_json, fatal=False)
+
+ formats = []
+ for format_url in try_get(bitrates, lambda x: x['mp4'], list) or []:
+ if isinstance(format_url, compat_str):
+ height = self._search_regex(
+ r'(\d+)[pP]\.', format_url, 'height', default=None)
+ formats.append({
+ 'url': format_url,
+ 'format_id': '%sp' % height if height else None,
+ 'height': int_or_none(height),
+ })
+ if not formats:
+ playlist = self._download_xml(
+ 'https://media.joj.sk/services/Video.php?clip=%s' % video_id,
+ video_id)
+ for file_el in playlist.findall('./files/file'):
+ path = file_el.get('path')
+ if not path:
+ continue
+ format_id = file_el.get('id') or file_el.get('label')
+ formats.append({
+ 'url': 'http://n16.joj.sk/storage/%s' % path.replace(
+ 'dat/', '', 1),
+ 'format_id': format_id,
+ 'height': int_or_none(self._search_regex(
+ r'(\d+)[pP]', format_id or path, 'height',
+ default=None)),
+ })
+ self._sort_formats(formats)
+
+ thumbnail = self._og_search_thumbnail(webpage)
+
+ duration = int_or_none(self._search_regex(
+ r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'thumbnail': thumbnail,
+ 'duration': duration,
+ 'formats': formats,
+ }
import re
from .common import InfoExtractor
+from ..utils import unsmuggle_url
class JWPlatformIE(InfoExtractor):
@staticmethod
def _extract_urls(webpage):
return re.findall(
- r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//content\.jwplatform\.com/players/[a-zA-Z0-9]{8})',
+ r'<(?:script|iframe)[^>]+?src=["\']((?:https?:)?//(?:content\.jwplatform|cdn\.jwplayer)\.com/players/[a-zA-Z0-9]{8})',
webpage)
def _real_extract(self, url):
+ url, smuggled_data = unsmuggle_url(url, {})
+ self._initialize_geo_bypass({
+ 'countries': smuggled_data.get('geo_countries'),
+ })
video_id = self._match_id(url)
json_data = self._download_json('https://cdn.jwplayer.com/v2/media/' + video_id, video_id)
return self._parse_jwplayer_data(json_data, video_id)
@staticmethod
def _extract_url(webpage):
+ urls = KalturaIE._extract_urls(webpage)
+ return urls[0] if urls else None
+
+ @staticmethod
+ def _extract_urls(webpage):
# Embed codes: https://knowledge.kaltura.com/embedding-kaltura-media-players-your-site
- mobj = (
- re.search(
+ finditer = (
+ # re.finditer() returns an iterator, which is always truthy;
+ # list() lets the `or` fallbacks below actually run
+ list(re.finditer(
r"""(?xs)
kWidget\.(?:thumb)?[Ee]mbed\(
\{.*?
(?P<q3>['"])entry_?[Ii]d(?P=q3)\s*:\s*
(?P<q4>['"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
- """, webpage)
- or re.search(
+ """, webpage))
+ or list(re.finditer(
r'''(?xs)
(?P<q1>["'])
(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com(?::\d+)?/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)(?:(?!(?P=q1)).)*
)
(?P<q3>["'])(?P<id>(?:(?!(?P=q3)).)+)(?P=q3)
- ''', webpage)
- or re.search(
+ ''', webpage))
+ or list(re.finditer(
r'''(?xs)
<(?:iframe[^>]+src|meta[^>]+\bcontent)=(?P<q1>["'])
(?:https?:)?//(?:(?:www|cdnapi(?:sec)?)\.)?kaltura\.com/(?:(?!(?P=q1)).)*\b(?:p|partner_id)/(?P<partner_id>\d+)
(?P=q1)
- ''', webpage)
+ ''', webpage))
)
- if mobj:
+ urls = []
+ for mobj in finditer:
embed_info = mobj.groupdict()
for k, v in embed_info.items():
if v:
webpage)
if service_mobj:
url = smuggle_url(url, {'service_url': service_mobj.group('id')})
- return url
+ urls.append(url)
+ return urls
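A note on chaining several `re.finditer` calls with `or`: `re.finditer` returns an iterator object, and an iterator is truthy even when it yields nothing, so the later patterns are only consulted if each result is materialised first. The sketch below illustrates this with two hypothetical patterns and a hypothetical page string (none of them come from the extractor); it is an illustration of the Python behaviour, not the extractor's code.

```python
import re

# Hypothetical patterns and page, for illustration only.
PATTERNS = (r'data-a="(\w+)"', r'data-b="(\w+)"')
page = '<div data-b="x1"></div><div data-b="x2"></div>'

# Lazy chain: the first finditer() result is an iterator and therefore
# truthy, so the second pattern is never tried.
lazy = re.finditer(PATTERNS[0], page) or re.finditer(PATTERNS[1], page)
lazy_ids = [m.group(1) for m in lazy]

# Eager chain: an empty list is falsy, so `or` falls through to the
# second pattern when the first one finds nothing.
eager = (
    list(re.finditer(PATTERNS[0], page))
    or list(re.finditer(PATTERNS[1], page)))
eager_ids = [m.group(1) for m in eager]

print(lazy_ids)
print(eager_ids)
```

Only the eager chain reaches the second pattern, which is why wrapping each call in `list()` matters when the patterns are meant as fallbacks.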
def _kaltura_api_call(self, video_id, actions, service_url=None, *args, **kwargs):
params = actions[0]
import re
from .common import InfoExtractor
-from ..compat import compat_str
from ..utils import (
clean_html,
determine_ext,
self._LOGIN_URL, None, 'Downloading login popup')
def is_logged(url_handle):
- return self._LOGIN_URL not in compat_str(url_handle.geturl())
+ return self._LOGIN_URL not in url_handle.geturl()
# Already logged in
if is_logged(urlh):
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+import uuid
+
+from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+ qualities,
+)
+
+
+class LEGOIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})'
+ _TESTS = [{
+ 'url': 'http://www.lego.com/en-us/videos/themes/club/blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1',
+ 'md5': 'f34468f176cfd76488767fc162c405fa',
+ 'info_dict': {
+ 'id': '55492d82-3b1b-4d5e-9857-87fa8c2973b1_en-US',
+ 'ext': 'mp4',
+ 'title': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
+ 'description': 'Blocumentary Great Creations: Akiyuki Kawaguchi',
+ },
+ }, {
+ # geo-restricted, but the contentUrl contains a valid URL
+ 'url': 'http://www.lego.com/nl-nl/videos/themes/nexoknights/episode-20-kingdom-of-heroes-13bdc2299ab24d9685701a915b3d71e7##sp=399',
+ 'md5': 'c7420221f7ffd03ff056f9db7f8d807c',
+ 'info_dict': {
+ 'id': '13bdc229-9ab2-4d96-8570-1a915b3d71e7_nl-NL',
+ 'ext': 'mp4',
+ 'title': 'Aflevering 20: Helden van het koninkrijk',
+ 'description': 'md5:8ee499aac26d7fa8bcb0cedb7f9c3941',
+ 'age_limit': 5,
+ },
+ }, {
+ # with subtitle
+ 'url': 'https://www.lego.com/nl-nl/kids/videos/classic/creative-storytelling-the-little-puppy-aa24f27c7d5242bc86102ebdc0f24cba',
+ 'info_dict': {
+ 'id': 'aa24f27c-7d52-42bc-8610-2ebdc0f24cba_nl-NL',
+ 'ext': 'mp4',
+ 'title': 'De kleine puppy',
+ 'description': 'md5:5b725471f849348ac73f2e12cfb4be06',
+ 'age_limit': 1,
+ 'subtitles': {
+ 'nl': [{
+ 'ext': 'srt',
+ 'url': r're:^https://.+\.srt$',
+ }],
+ },
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }]
+ _QUALITIES = {
+ 'Lowest': (64, 180, 320),
+ 'Low': (64, 270, 480),
+ 'Medium': (96, 360, 640),
+ 'High': (128, 540, 960),
+ 'Highest': (128, 720, 1280),
+ }
+
+ def _real_extract(self, url):
+ locale, video_id = re.match(self._VALID_URL, url).groups()
+ countries = [locale.split('-')[1].upper()]
+ self._initialize_geo_bypass({
+ 'countries': countries,
+ })
+
+ try:
+ item = self._download_json(
+ # https://contentfeed.services.lego.com/api/v2/item/[VIDEO_ID]?culture=[LOCALE]&contentType=Video
+ 'https://services.slingshot.lego.com/mediaplayer/v2',
+ video_id, query={
+ 'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
+ }, headers=self.geo_verification_headers())
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 451:
+ self.raise_geo_restricted(countries=countries)
+ raise
+
+ video = item['Video']
+ video_id = video['Id']
+ title = video['Title']
+
+ q = qualities(['Lowest', 'Low', 'Medium', 'High', 'Highest'])
+ formats = []
+ for video_source in item.get('VideoFormats', []):
+ video_source_url = video_source.get('Url')
+ if not video_source_url:
+ continue
+ video_source_format = video_source.get('Format')
+ if video_source_format == 'F4M':
+ formats.extend(self._extract_f4m_formats(
+ video_source_url, video_id,
+ f4m_id=video_source_format, fatal=False))
+ elif video_source_format == 'M3U8':
+ formats.extend(self._extract_m3u8_formats(
+ video_source_url, video_id, 'mp4', 'm3u8_native',
+ m3u8_id=video_source_format, fatal=False))
+ else:
+ video_source_quality = video_source.get('Quality')
+ format_id = []
+ for v in (video_source_format, video_source_quality):
+ if v:
+ format_id.append(v)
+ f = {
+ 'format_id': '-'.join(format_id),
+ 'quality': q(video_source_quality),
+ 'url': video_source_url,
+ }
+ quality = self._QUALITIES.get(video_source_quality)
+ if quality:
+ f.update({
+ 'abr': quality[0],
+ 'height': quality[1],
+ 'width': quality[2],
+ })
+ formats.append(f)
+ self._sort_formats(formats)
+
+ subtitles = {}
+ sub_file_id = video.get('SubFileId')
+ if sub_file_id and sub_file_id != '00000000-0000-0000-0000-000000000000':
+ net_storage_path = video.get('NetstoragePath')
+ invariant_id = video.get('InvariantId')
+ video_file_id = video.get('VideoFileId')
+ video_version = video.get('VideoVersion')
+ if net_storage_path and invariant_id and video_file_id and video_version:
+ subtitles.setdefault(locale[:2], []).append({
+ 'url': 'https://lc-mediaplayerns-live-s.legocdn.com/public/%s/%s_%s_%s_%s_sub.srt' % (net_storage_path, invariant_id, video_file_id, locale, video_version),
+ })
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': video.get('Description'),
+ 'thumbnail': video.get('GeneratedCoverImage') or video.get('GeneratedThumbnail'),
+ 'duration': int_or_none(video.get('Length')),
+ 'formats': formats,
+ 'subtitles': subtitles,
+ 'age_limit': int_or_none(video.get('AgeFrom')),
+ 'season': video.get('SeasonTitle'),
+ 'season_number': int_or_none(video.get('Season')) or None,
+ 'episode_number': int_or_none(video.get('Episode')) or None,
+ }
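The way `_real_extract` turns a page URL into the `videoId` query value and the geo-bypass country can be sketched in isolation. This is a minimal standalone sketch using only `re` and `uuid`; the pattern is copied from `_VALID_URL` above, while `build_query` is a hypothetical helper name that does not exist in the extractor.

```python
import re
import uuid

# Pattern copied from LEGOIE._VALID_URL.
LEGO_URL_RE = (
    r'https?://(?:www\.)?lego\.com/(?P<locale>[a-z]{2}-[a-z]{2})/'
    r'(?:[^/]+/)*videos/(?:[^/]+/)*[^/?#]+-(?P<id>[0-9a-f]{32})')


def build_query(url):
    """Hypothetical helper: derive the country list and videoId value."""
    locale, video_id = re.match(LEGO_URL_RE, url).groups()
    return {
        # 'en-us' -> ['US'], used for geo bypass
        'countries': [locale.split('-')[1].upper()],
        # uuid.UUID() accepts 32 hex digits; str() renders the dashed form
        'videoId': '%s_%s' % (uuid.UUID(video_id), locale),
    }


query = build_query(
    'http://www.lego.com/en-us/videos/themes/club/'
    'blocumentary-kawaguchi-55492d823b1b4d5e985787fa8c2973b1')
print(query)
```

The request therefore carries the locale exactly as it appears in the URL (`en-us`), whereas the `id` values in `_TESTS` use the casing returned by the service in `video['Id']` (`en-US`).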
class LimelightBaseIE(InfoExtractor):
_PLAYLIST_SERVICE_URL = 'http://production-ps.lvp.llnw.net/r/PlaylistService/%s/%s/%s'
- _API_URL = 'http://api.video.limelight.com/rest/organizations/%s/%s/%s/%s.json'
@classmethod
def _extract_urls(cls, webpage, source_url):
try:
return self._download_json(
self._PLAYLIST_SERVICE_URL % (self._PLAYLIST_SERVICE_PATH, item_id, method),
- item_id, 'Downloading PlaylistService %s JSON' % method, fatal=fatal, headers=headers)
+ item_id, 'Downloading PlaylistService %s JSON' % method,
+ fatal=fatal, headers=headers)
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
error = self._parse_json(e.cause.read().decode(), item_id)['detail']['contentAccessPermission']
raise ExtractorError(error, expected=True)
raise
- def _call_api(self, organization_id, item_id, method):
- return self._download_json(
- self._API_URL % (organization_id, self._API_PATH, item_id, method),
- item_id, 'Downloading API %s JSON' % method)
-
- def _extract(self, item_id, pc_method, mobile_method, meta_method, referer=None):
+ def _extract(self, item_id, pc_method, mobile_method, referer=None):
pc = self._call_playlist_service(item_id, pc_method, referer=referer)
- metadata = self._call_api(pc['orgId'], item_id, meta_method)
- mobile = self._call_playlist_service(item_id, mobile_method, fatal=False, referer=referer)
- return pc, mobile, metadata
+ mobile = self._call_playlist_service(
+ item_id, mobile_method, fatal=False, referer=referer)
+ return pc, mobile
+
+ def _extract_info(self, pc, mobile, i, referer):
+ get_item = lambda x, y: try_get(x, lambda x: x[y][i], dict) or {}
+ pc_item = get_item(pc, 'playlistItems')
+ mobile_item = get_item(mobile, 'mediaList')
+ video_id = pc_item.get('mediaId') or mobile_item['mediaId']
+ title = pc_item.get('title') or mobile_item['title']
- def _extract_info(self, streams, mobile_urls, properties):
- video_id = properties['media_id']
formats = []
urls = []
- for stream in streams:
+ for stream in pc_item.get('streams', []):
stream_url = stream.get('url')
if not stream_url or stream.get('drmProtected') or stream_url in urls:
continue
})
formats.append(fmt)
- for mobile_url in mobile_urls:
+ for mobile_url in mobile_item.get('mobileUrls', []):
media_url = mobile_url.get('mobileUrl')
format_id = mobile_url.get('targetMediaPlatform')
if not media_url or format_id in ('Widevine', 'SmoothStreaming') or media_url in urls:
self._sort_formats(formats)
- title = properties['title']
- description = properties.get('description')
- timestamp = int_or_none(properties.get('publish_date') or properties.get('create_date'))
- duration = float_or_none(properties.get('duration_in_milliseconds'), 1000)
- filesize = int_or_none(properties.get('total_storage_in_bytes'))
- categories = [properties.get('category')]
- tags = properties.get('tags', [])
- thumbnails = [{
- 'url': thumbnail['url'],
- 'width': int_or_none(thumbnail.get('width')),
- 'height': int_or_none(thumbnail.get('height')),
- } for thumbnail in properties.get('thumbnails', []) if thumbnail.get('url')]
-
subtitles = {}
- for caption in properties.get('captions', []):
- lang = caption.get('language_code')
- subtitles_url = caption.get('url')
- if lang and subtitles_url:
- subtitles.setdefault(lang, []).append({
- 'url': subtitles_url,
- })
- closed_captions_url = properties.get('closed_captions_url')
- if closed_captions_url:
- subtitles.setdefault('en', []).append({
- 'url': closed_captions_url,
- 'ext': 'ttml',
- })
+ for flag in mobile_item.get('flags') or []:
+ if flag == 'ClosedCaptions':
+ closed_captions = self._call_playlist_service(
+ video_id, 'getClosedCaptionsDetailsByMediaId',
+ False, referer) or []
+ for cc in closed_captions:
+ cc_url = cc.get('webvttFileUrl')
+ if not cc_url:
+ continue
+ lang = cc.get('languageCode') or self._search_regex(r'/([a-z]{2})\.vtt', cc_url, 'lang', default='en')
+ subtitles.setdefault(lang, []).append({
+ 'url': cc_url,
+ })
+ break
+
+ get_meta = lambda x: pc_item.get(x) or mobile_item.get(x)
return {
'id': video_id,
'title': title,
- 'description': description,
+ 'description': get_meta('description'),
'formats': formats,
- 'timestamp': timestamp,
- 'duration': duration,
- 'filesize': filesize,
- 'categories': categories,
- 'tags': tags,
- 'thumbnails': thumbnails,
+ 'duration': float_or_none(get_meta('durationInMilliseconds'), 1000),
+ 'thumbnail': get_meta('previewImageUrl') or get_meta('thumbnailImageUrl'),
'subtitles': subtitles,
}
- def _extract_info_helper(self, pc, mobile, i, metadata):
- return self._extract_info(
- try_get(pc, lambda x: x['playlistItems'][i]['streams'], list) or [],
- try_get(mobile, lambda x: x['mediaList'][i]['mobileUrls'], list) or [],
- metadata)
-
class LimelightMediaIE(LimelightBaseIE):
IE_NAME = 'limelight'
'description': 'md5:8005b944181778e313d95c1237ddb640',
'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 144.23,
- 'timestamp': 1244136834,
- 'upload_date': '20090604',
},
'params': {
# m3u8 download
'title': '3Play Media Overview Video',
'thumbnail': r're:^https?://.*\.jpeg$',
'duration': 78.101,
- 'timestamp': 1338929955,
- 'upload_date': '20120605',
- 'subtitles': 'mincount:9',
+ # TODO: extract all languages that were accessible via API
+ # 'subtitles': 'mincount:9',
+ 'subtitles': 'mincount:1',
},
}, {
'url': 'https://assets.delvenetworks.com/player/loader.swf?mediaId=8018a574f08d416e95ceaccae4ba0452',
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'media'
- _API_PATH = 'media'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
video_id = self._match_id(url)
+ source_url = smuggled_data.get('source_url')
self._initialize_geo_bypass({
'countries': smuggled_data.get('geo_countries'),
})
- pc, mobile, metadata = self._extract(
+ pc, mobile = self._extract(
video_id, 'getPlaylistByMediaId',
- 'getMobilePlaylistByMediaId', 'properties',
- smuggled_data.get('source_url'))
+ 'getMobilePlaylistByMediaId', source_url)
- return self._extract_info_helper(pc, mobile, 0, metadata)
+ return self._extract_info(pc, mobile, 0, source_url)
class LimelightChannelIE(LimelightBaseIE):
'info_dict': {
'id': 'ab6a524c379342f9b23642917020c082',
'title': 'Javascript Sample Code',
+ 'description': 'Javascript Sample Code - http://www.delvenetworks.com/sample-code/playerCode-demo.html',
},
'playlist_mincount': 3,
}, {
'only_matching': True,
}]
_PLAYLIST_SERVICE_PATH = 'channel'
- _API_PATH = 'channels'
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
channel_id = self._match_id(url)
+ source_url = smuggled_data.get('source_url')
- pc, mobile, medias = self._extract(
+ pc, mobile = self._extract(
channel_id, 'getPlaylistByChannelId',
'getMobilePlaylistWithNItemsByChannelId?begin=0&count=-1',
- 'media', smuggled_data.get('source_url'))
+ source_url)
entries = [
- self._extract_info_helper(pc, mobile, i, medias['media_list'][i])
- for i in range(len(medias['media_list']))]
+ self._extract_info(pc, mobile, i, source_url)
+ for i in range(len(pc['playlistItems']))]
- return self.playlist_result(entries, channel_id, pc['title'])
+ return self.playlist_result(
+ entries, channel_id, pc.get('title'), (mobile or {}).get('description'))
class LimelightChannelListIE(LimelightBaseIE):
def _real_extract(self, url):
channel_list_id = self._match_id(url)
- channel_list = self._call_playlist_service(channel_list_id, 'getMobileChannelListById')
+ channel_list = self._call_playlist_service(
+ channel_list_id, 'getMobileChannelListById')
entries = [
self.url_result('limelight:channel:%s' % channel['id'], 'LimelightChannel')
for channel in channel_list['channelList']]
- return self.playlist_result(entries, channel_list_id, channel_list['title'])
+ return self.playlist_result(
+ entries, channel_list_id, channel_list['title'])
from ..compat import (
compat_b64decode,
compat_HTTPError,
- compat_str,
)
from ..utils import (
ExtractorError,
'sso': 'true',
})
- login_state_url = compat_str(urlh.geturl())
+ login_state_url = urlh.geturl()
try:
login_page = self._download_webpage(
})
access_token = self._search_regex(
- r'access_token=([^=&]+)', compat_str(urlh.geturl()),
+ r'access_token=([^=&]+)', urlh.geturl(),
'access token')
self._download_webpage(
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class MagentaMusik360IE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?magenta-musik-360\.de/([a-z0-9-]+-(?P<id>[0-9]+)|festivals/.+)'
+ _TESTS = [{
+ 'url': 'https://www.magenta-musik-360.de/within-temptation-wacken-2019-1-9208205928595185932',
+ 'md5': '65b6f060b40d90276ec6fb9b992c1216',
+ 'info_dict': {
+ 'id': '9208205928595185932',
+ 'ext': 'm3u8',
+ 'title': 'WITHIN TEMPTATION',
+ 'description': 'Robert Westerholt und Sharon Janny den Adel gründeten die Symphonic Metal-Band. Privat sind die Niederländer ein Paar und haben zwei Kinder. Die Single Ice Queen brachte ihnen Platin und Gold und verhalf 2002 zum internationalen Durchbruch. Charakteristisch für die Band war Anfangs der hohe Gesang von Frontfrau Sharon. Stilistisch fing die Band im Gothic Metal an. Mit neuem Sound, schnellen Gitarrenriffs und Gitarrensoli, avancierte Within Temptation zur erfolgreichen Rockband. Auch dieses Jahr wird die Band ihre Fangemeinde wieder mitreißen.',
+ }
+ }, {
+ 'url': 'https://www.magenta-musik-360.de/festivals/wacken-world-wide-2020-body-count-feat-ice-t',
+ 'md5': '81010d27d7cab3f7da0b0f681b983b7e',
+ 'info_dict': {
+ 'id': '9208205928595231363',
+ 'ext': 'm3u8',
+ 'title': 'Body Count feat. Ice-T',
+ 'description': 'Body Count feat. Ice-T konnten bereits im vergangenen Jahr auf dem „Holy Ground“ in Wacken überzeugen. 2020 gehen die Crossover-Metaller aus einem Club in Los Angeles auf Sendung und bringen mit ihrer Mischung aus Metal und Hip-Hop Abwechslung und ordentlich Alarm zum WWW. Bereits seit 1990 stehen die beiden Gründer Ice-T (Gesang) und Ernie C (Gitarre) auf der Bühne. Sieben Studioalben hat die Gruppe bis jetzt veröffentlicht, darunter das Debüt „Body Count“ (1992) mit dem kontroversen Track „Cop Killer“.',
+ }
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ # _match_id casts the group to a string, so a missing id becomes "None";
+ # "None" is not a valid video_id for magenta, so there is no risk of confusion
+ if video_id == "None":
+ webpage = self._download_webpage(url, video_id)
+ video_id = self._html_search_regex(r'data-asset-id="([^"]+)"', webpage, 'video_id')
+ json = self._download_json("https://wcps.t-online.de/cvss/magentamusic/vodplayer/v3/player/58935/%s/Main%%20Movie" % video_id, video_id)
+ xml_url = json['content']['feature']['representations'][0]['contentPackages'][0]['media']['href']
+ metadata = json['content']['feature'].get('metadata')
+ title = None
+ description = None
+ duration = None
+ thumbnails = []
+ if metadata:
+ title = metadata.get('title')
+ description = metadata.get('fullDescription')
+ duration = metadata.get('runtimeInSeconds')
+ for img_key in ('teaserImageWide', 'smallCoverImage'):
+ if img_key in metadata:
+ thumbnails.append({'url': metadata[img_key].get('href')})
+
+ xml = self._download_xml(xml_url, video_id)
+ final_url = xml[0][0][0].attrib['src']
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'url': final_url,
+ 'duration': duration,
+ 'thumbnails': thumbnails
+ }
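The `"None"` check in `_real_extract` follows from the optional `id` group in `_VALID_URL`: festival URLs match the second alternative and leave the group unset. The sketch below reproduces that with plain `re`; the pattern is copied from the extractor, while `match_id` is a hypothetical stand-in that mimics the string cast described in the comment, not the real `_match_id`.

```python
import re

# Pattern copied from MagentaMusik360IE._VALID_URL.
VALID_URL = (
    r'https?://(?:www\.)?magenta-musik-360\.de/'
    r'([a-z0-9-]+-(?P<id>[0-9]+)|festivals/.+)')


def match_id(url):
    """Hypothetical stand-in: cast the (possibly unset) id group to str."""
    return str(re.match(VALID_URL, url).group('id'))


asset_id = match_id(
    'https://www.magenta-musik-360.de/'
    'within-temptation-wacken-2019-1-9208205928595185932')
festival_id = match_id(
    'https://www.magenta-musik-360.de/festivals/'
    'wacken-world-wide-2020-body-count-feat-ice-t')

print(asset_id)     # numeric id taken from the URL
print(festival_id)  # the string 'None', which triggers the webpage lookup
```

For festival pages the extractor then falls back to the `data-asset-id` attribute in the downloaded page.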
IE_DESC = 'Видео@Mail.Ru'
_VALID_URL = r'''(?x)
https?://
- (?:(?:www|m)\.)?my\.mail\.ru/
+ (?:(?:www|m|videoapi)\.)?my\.mail\.ru/+
(?:
video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
- (?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html|
+ (?:videos/embed/)?(?:(?P<idv2prefix>(?:[^/]+/+){2})(?:video/(?:embed/)?)?(?P<idv2suffix>[^/]+/\d+))(?:\.html)?|
(?:video/embed|\+/video/meta)/(?P<metaid>\d+)
)
'''
{
'url': 'http://my.mail.ru/+/video/meta/7949340477499637815',
'only_matching': True,
+ },
+ {
+ 'url': 'https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html',
+ 'only_matching': True,
+ },
+ {
+ 'url': 'https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html',
+ 'only_matching': True,
}
]
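The relaxed `_VALID_URL` tolerates repeated slashes, as exercised by the two new `only_matching` tests. This can be checked outside the extractor with a minimal sketch: the pattern is copied from the hunk above, and `extract_id` is a hypothetical helper that mirrors the `idv2prefix + idv2suffix` concatenation used in `_real_extract`.

```python
import re

# Pattern copied from the updated _VALID_URL (verbose mode).
VALID_URL = r'''(?x)
    https?://
        (?:(?:www|m|videoapi)\.)?my\.mail\.ru/+
        (?:
            video/.*\#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|
            (?:videos/embed/)?(?:(?P<idv2prefix>(?:[^/]+/+){2})(?:video/(?:embed/)?)?(?P<idv2suffix>[^/]+/\d+))(?:\.html)?|
            (?:video/embed|\+/video/meta)/(?P<metaid>\d+)
        )
'''


def extract_id(url):
    """Hypothetical helper: join prefix and suffix as the extractor does."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('idv2prefix') + mobj.group('idv2suffix')


single = extract_id('https://my.mail.ru//list/sinyutin10/video/_myvideo/4.html')
doubled = extract_id('https://my.mail.ru//list//sinyutin10/video/_myvideo/4.html')

print(single)
print(doubled)
```

Note that a doubled slash inside the path is preserved in the resulting id, because `idv2prefix` captures the separators as they appear in the URL.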
if not video_id:
video_id = mobj.group('idv2prefix') + mobj.group('idv2suffix')
webpage = self._download_webpage(url, video_id)
- page_config = self._parse_json(self._search_regex(
+ page_config = self._parse_json(self._search_regex([
r'(?s)<script[^>]+class="sp-video__page-config"[^>]*>(.+?)</script>',
+ r'(?s)"video":\s*(\{.+?\}),'],
webpage, 'page config', default='{}'), video_id, fatal=False)
if page_config:
- meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl')
+ meta_url = page_config.get('metaUrl') or page_config.get('video', {}).get('metaUrl') or page_config.get('metadataUrl')
else:
meta_url = None
video_data = None
+
+ # fix meta_url if missing the host address
+ if meta_url and re.match(r'^\/\+\/', meta_url):
+ meta_url = 'https://my.mail.ru' + meta_url
+
if meta_url:
video_data = self._download_json(
meta_url, video_id or meta_id, 'Downloading video meta JSON',
'http://api.video.mail.ru/videos/%s.json?new=1' % video_id,
video_id, 'Downloading video JSON')
+ headers = {}
+
+ video_key = self._get_cookies('https://my.mail.ru').get('video_key')
+ if video_key:
+ headers['Cookie'] = 'video_key=%s' % video_key.value
+
formats = []
for f in video_data['videos']:
video_url = f.get('url')
'url': video_url,
'format_id': format_id,
'height': height,
+ 'http_headers': headers,
})
self._sort_formats(formats)
class MailRuMusicIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music'
IE_DESC = 'Музыка@Mail.Ru'
- _VALID_URL = r'https?://my\.mail\.ru/music/songs/[^/?#&]+-(?P<id>[\da-f]+)'
+ _VALID_URL = r'https?://my\.mail\.ru/+music/+songs/+[^/?#&]+-(?P<id>[\da-f]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/songs/%D0%BC8%D0%BB8%D1%82%D1%85-l-a-h-luciferian-aesthetics-of-herrschaft-single-2017-4e31f7125d0dfaef505d947642366893',
'md5': '0f8c22ef8c5d665b13ac709e63025610',
class MailRuMusicSearchIE(MailRuMusicSearchBaseIE):
IE_NAME = 'mailru:music:search'
IE_DESC = 'Музыка@Mail.Ru'
- _VALID_URL = r'https?://my\.mail\.ru/music/search/(?P<id>[^/?#&]+)'
+ _VALID_URL = r'https?://my\.mail\.ru/+music/+search/+(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://my.mail.ru/music/search/black%20shadow',
'info_dict': {
class MallTVIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+ _VALID_URL = r'https?://(?:(?:www|sk)\.)?mall\.tv/(?:[^/]+/)*(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.mall.tv/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
'md5': '1c4a37f080e1f3023103a7b43458e518',
}, {
'url': 'https://www.mall.tv/kdo-to-plati/18-miliard-pro-neziskovky-opravdu-jsou-sportovci-nebo-clovek-v-tisni-pijavice',
'only_matching': True,
+ }, {
+ 'url': 'https://sk.mall.tv/gejmhaus/reklamacia-nehreje-vyrobnik-tepla-alebo-spekacka',
+ 'only_matching': True,
}]
def _real_extract(self, url):
from .theplatform import ThePlatformBaseIE
from ..compat import (
compat_parse_qs,
- compat_str,
compat_urllib_parse_urlparse,
)
from ..utils import (
continue
urlh = ie._request_webpage(
embed_url, video_id, note='Following embed URL redirect')
- embed_url = compat_str(urlh.geturl())
+ embed_url = urlh.geturl()
program_guid = _program_guid(_qs(embed_url))
if program_guid:
entries.append(embed_url)
def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
for video in smil.findall(self._xpath_ns('.//video', namespace)):
video.attrib['src'] = re.sub(r'(https?://vod05)t(-mediaset-it\.akamaized\.net/.+?.mpd)\?.+', r'\1\2', video.attrib['src'])
- return super()._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
+ return super(MediasetIE, self)._parse_smil_formats(smil, smil_url, video_id, namespace, f4m_params, transform_rtmp_url)
def _real_extract(self, url):
guid = self._match_id(url)
query = mobj.group('query')
webpage, urlh = self._download_webpage_handle(url, resource_id) # XXX: add UrlReferrer?
- redirect_url = compat_str(urlh.geturl())
+ redirect_url = urlh.geturl()
# XXX: might have also extracted UrlReferrer and QueryString from the html
service_path = compat_urlparse.urljoin(redirect_url, self._html_search_regex(
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+ int_or_none,
+ parse_iso8601,
+ smuggle_url,
+)
+
+
+class MiTeleIE(InfoExtractor):
+ IE_DESC = 'mitele.es'
+ _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
+
+ _TESTS = [{
+ 'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
+ 'info_dict': {
+ 'id': 'FhYW1iNTE6J6H7NkQRIEzfne6t2quqPg',
+ 'ext': 'mp4',
+ 'title': 'Diario de La redacción Programa 144',
+ 'description': 'md5:07c35a7b11abb05876a6a79185b58d27',
+ 'series': 'Diario de',
+ 'season': 'Season 14',
+ 'season_number': 14,
+ 'episode': 'Tor, la web invisible',
+ 'episode_number': 3,
+ 'thumbnail': r're:(?i)^https?://.*\.jpg$',
+ 'duration': 2913,
+ 'age_limit': 16,
+ 'timestamp': 1471209401,
+ 'upload_date': '20160814',
+ },
+ 'add_ie': ['Ooyala'],
+ }, {
+ # no explicit title
+ 'url': 'http://www.mitele.es/programas-tv/cuarto-milenio/57b0de3dc915da14058b4876/player',
+ 'info_dict': {
+ 'id': 'oyNG1iNTE6TAPP-JmCjbwfwJqqMMX3Vq',
+ 'ext': 'mp4',
+ 'title': 'Cuarto Milenio Temporada 6 Programa 226',
+ 'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
+ 'series': 'Cuarto Milenio',
+ 'season': 'Season 6',
+ 'season_number': 6,
+ 'episode': 'Episode 24',
+ 'episode_number': 24,
+ 'thumbnail': r're:(?i)^https?://.*\.jpg$',
+ 'duration': 7313,
+ 'age_limit': 12,
+ 'timestamp': 1471209021,
+ 'upload_date': '20160814',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ 'add_ie': ['Ooyala'],
+ }, {
+ 'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+ pre_player = self._parse_json(self._search_regex(
+ r'window\.\$REACTBASE_STATE\.prePlayer_mtweb\s*=\s*({.+})',
+ webpage, 'Pre Player'), display_id)['prePlayer']
+ title = pre_player['title']
+ video = pre_player['video']
+ video_id = video['dataMediaId']
+ content = pre_player.get('content') or {}
+ info = content.get('info') or {}
+
+ return {
+ '_type': 'url_transparent',
+ # for some reason only HLS is supported
+ 'url': smuggle_url('ooyala:' + video_id, {'supportedformats': 'm3u8,dash'}),
+ 'id': video_id,
+ 'title': title,
+ 'description': info.get('synopsis'),
+ 'series': content.get('title'),
+ 'season_number': int_or_none(info.get('season_number')),
+ 'episode': content.get('subtitle'),
+ 'episode_number': int_or_none(info.get('episode_number')),
+ 'duration': int_or_none(info.get('duration')),
+ 'thumbnail': video.get('dataPoster'),
+ 'age_limit': int_or_none(info.get('rating')),
+ 'timestamp': parse_iso8601(pre_player.get('publishedTime')),
+ }
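As a rough check of how the `_VALID_URL` pattern above picks the display id, the sketch below applies the same regular expression to two of the test URLs from `_TESTS`. The `match_id` helper is a hypothetical stand-in for `InfoExtractor._match_id`, written only for this illustration.

```python
import re

# Pattern copied from MiTeleIE._VALID_URL above.
VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'


def match_id(url):
    """Return the path segment directly before /player, or None (hypothetical helper)."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


# Older URL layout: the id is a hexadecimal identifier.
old_style = match_id(
    'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player')
# Newer URL layout: the id is a slug, and the URL ends with a trailing slash.
new_style = match_id(
    'https://www.mitele.es/programas-tv/diario-de/la-redaccion/programa-144-40_1006364575251/player/')
# At least one path segment must precede the id, so this does not match.
no_match = match_id('https://www.mitele.es/player')

print(old_style, new_style, no_match)
```

Because `(?:[^/]+/)+` is greedy and backtracks, the `id` group always ends up being the last segment before `/player`, however deep the path is.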
from __future__ import unicode_literals
+import re
+
+from .common import InfoExtractor
from ..utils import (
int_or_none,
str_to_int,
})
return info
+
+
+class MofosexEmbedIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=(?P<id>\d+)'
+ _TESTS = [{
+ 'url': 'https://www.mofosex.com/embed/?videoid=318131&referrer=KM',
+ 'only_matching': True,
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return re.findall(
+ r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?mofosex\.com/embed/?\?.*?\bvideoid=\d+)',
+ webpage)
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ return self.url_result(
+ 'http://www.mofosex.com/videos/{0}/{0}.html'.format(video_id),
+ ie=MofosexIE.ie_key(), video_id=video_id)
'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
'upload_date': '20100913',
'uploader_id': 'famouslyfuckedup',
- 'thumbnail': r're:http://.*\.jpg',
+ 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
}
}, {
'game', 'hairy'],
'upload_date': '20140622',
'uploader_id': 'Sulivana7x',
- 'thumbnail': r're:http://.*\.jpg',
+ 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
},
'skip': '404',
'categories': ['superheroine heroine superher'],
'upload_date': '20140827',
'uploader_id': 'shade0230',
- 'thumbnail': r're:http://.*\.jpg',
+ 'thumbnail': r're:https?://.*\.jpg',
'age_limit': 18,
}
}, {
raise ExtractorError('Video %s is for friends only' % video_id, expected=True)
title = self._html_search_regex(
- r'id="view-upload-title">\s+([^<]+)<', webpage, 'title')
+ (r'(?s)<div[^>]+\bclass=["\']media-meta-title[^>]+>(.+?)</div>',
+ r'id="view-upload-title">\s+([^<]+)<'), webpage, 'title')
video_url = (self._html_search_regex(
(r'setup\(\{\s*["\']file["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
r'fileurl\s*=\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
or 'http://cdn4.videos.motherlessmedia.com/videos/%s.mp4?fs=opencloud' % video_id)
age_limit = self._rta_search(webpage)
view_count = str_to_int(self._html_search_regex(
- r'<strong>Views</strong>\s+([^<]+)<',
+ (r'>(\d+)\s+Views<', r'<strong>Views</strong>\s+([^<]+)<'),
webpage, 'view count', fatal=False))
like_count = str_to_int(self._html_search_regex(
- r'<strong>Favorited</strong>\s+([^<]+)<',
+ (r'>(\d+)\s+Favorites<', r'<strong>Favorited</strong>\s+([^<]+)<'),
webpage, 'like count', fatal=False))
upload_date = self._html_search_regex(
- r'<strong>Uploaded</strong>\s+([^<]+)<', webpage, 'upload date')
+ (r'class=["\']count[^>]+>(\d+\s+[a-zA-Z]{3}\s+\d{4})<',
+ r'<strong>Uploaded</strong>\s+([^<]+)<'), webpage, 'upload date')
if 'Ago' in upload_date:
days = int(re.search(r'([0-9]+)', upload_date).group(1))
upload_date = (datetime.datetime.now() - datetime.timedelta(days=days)).strftime('%Y%m%d')
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import js_to_json
+
+
+class MyVideoGeIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?myvideo\.ge/v/(?P<id>[0-9]+)'
+ _TEST = {
+ 'url': 'https://www.myvideo.ge/v/3941048',
+ 'md5': '8c192a7d2b15454ba4f29dc9c9a52ea9',
+ 'info_dict': {
+ 'id': '3941048',
+ 'ext': 'mp4',
+ 'title': 'The best prikol',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ 'uploader': 'md5:d72addd357b0dd914e704781f7f777d8',
+ 'description': 'md5:5c0371f540f5888d603ebfedd46b6df3'
+ }
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+
+ title = self._html_search_regex(r'<h1[^>]*>([^<]+)</h1>', webpage, 'title')
+ description = self._og_search_description(webpage)
+ thumbnail = self._html_search_meta(['og:image'], webpage)
+ uploader = self._search_regex(r'<a[^>]+class="mv_user_name"[^>]*>([^<]+)<', webpage, 'uploader', fatal=False)
+
+ jwplayer_sources = self._parse_json(
+ self._search_regex(
+ r"(?s)jwplayer\(\"mvplayer\"\)\.setup\(.*?sources: (.*?])", webpage, 'jwplayer sources'),
+ video_id, transform_source=js_to_json)
+
+ def _formats_key(f):
+ if f['label'] == 'SD':
+ return -1
+ elif f['label'] == 'HD':
+ return 1
+ else:
+ return 0
+
+ jwplayer_sources = sorted(jwplayer_sources, key=_formats_key)
+
+ formats = self._parse_jwplayer_formats(jwplayer_sources, video_id)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'uploader': uploader,
+ 'formats': formats,
+ 'thumbnail': thumbnail
+ }
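The `_formats_key` ordering used above can be tried in isolation. The following is a minimal sketch, assuming jwplayer sources are plain dicts with an optional `label`. The sample entries are invented, and the sketch uses `.get('label')` rather than `f['label']` so that unlabeled sources do not raise.

```python
def formats_key(source):
    """Rank a source by label: SD first, anything else in the middle, HD last."""
    label = source.get('label')
    if label == 'SD':
        return -1
    if label == 'HD':
        return 1
    return 0


# Hypothetical jwplayer sources, for illustration only.
sources = [
    {'file': 'https://example.com/hd.mp4', 'label': 'HD'},
    {'file': 'https://example.com/sd.mp4', 'label': 'SD'},
    {'file': 'https://example.com/other.mp4', 'label': '360p'},
    {'file': 'https://example.com/plain.mp4'},
]

# sorted() is stable, so sources with equal rank keep their original order.
ordered = sorted(sources, key=formats_key)
print([s.get('label') for s in ordered])
```

Sorting worst-to-best matches the youtube-dl convention that later entries in a format list are preferred.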
# coding: utf-8
from __future__ import unicode_literals
+import re
+
from .common import InfoExtractor
from ..utils import (
+ clean_html,
+ dict_get,
ExtractorError,
int_or_none,
+ parse_duration,
+ try_get,
update_url_query,
)
-class NaverIE(InfoExtractor):
- _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
+class NaverBaseIE(InfoExtractor):
+ _CAPTION_EXT_RE = r'\.(?:ttml|vtt)'
- _TESTS = [{
- 'url': 'http://tv.naver.com/v/81652',
- 'info_dict': {
- 'id': '81652',
- 'ext': 'mp4',
- 'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
- 'description': '합격불변의 법칙 메가스터디 | 메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
- 'upload_date': '20130903',
- },
- }, {
- 'url': 'http://tv.naver.com/v/395837',
- 'md5': '638ed4c12012c458fefcddfd01f173cd',
- 'info_dict': {
- 'id': '395837',
- 'ext': 'mp4',
- 'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
- 'description': 'md5:5bf200dcbf4b66eb1b350d1eb9c753f7',
- 'upload_date': '20150519',
- },
- 'skip': 'Georestricted',
- }, {
- 'url': 'http://tvcast.naver.com/v/81652',
- 'only_matching': True,
- }]
-
- def _real_extract(self, url):
- video_id = self._match_id(url)
- webpage = self._download_webpage(url, video_id)
-
- vid = self._search_regex(
- r'videoId["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
- 'video id', fatal=None, group='value')
- in_key = self._search_regex(
- r'inKey["\']\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
- 'key', default=None, group='value')
-
- if not vid or not in_key:
- error = self._html_search_regex(
- r'(?s)<div class="(?:nation_error|nation_box|error_box)">\s*(?:<!--.*?-->)?\s*<p class="[^"]+">(?P<msg>.+?)</p>\s*</div>',
- webpage, 'error', default=None)
- if error:
- raise ExtractorError(error, expected=True)
- raise ExtractorError('couldn\'t extract vid and key')
+ def _extract_video_info(self, video_id, vid, key):
video_data = self._download_json(
'http://play.rmcnmv.naver.com/vod/play/v2.0/' + vid,
video_id, query={
- 'key': in_key,
+ 'key': key,
})
meta = video_data['meta']
title = meta['subject']
formats = []
+ get_list = lambda x: try_get(video_data, lambda y: y[x + 's']['list'], list) or []
def extract_formats(streams, stream_type, query={}):
for stream in streams:
encoding_option = stream.get('encodingOption', {})
bitrate = stream.get('bitrate', {})
formats.append({
- 'format_id': '%s_%s' % (stream.get('type') or stream_type, encoding_option.get('id') or encoding_option.get('name')),
+ 'format_id': '%s_%s' % (stream.get('type') or stream_type, dict_get(encoding_option, ('name', 'id'))),
'url': stream_url,
'width': int_or_none(encoding_option.get('width')),
'height': int_or_none(encoding_option.get('height')),
'protocol': 'm3u8_native' if stream_type == 'HLS' else None,
})
- extract_formats(video_data.get('videos', {}).get('list', []), 'H264')
+ extract_formats(get_list('video'), 'H264')
for stream_set in video_data.get('streams', []):
query = {}
for param in stream_set.get('keys', []):
'mp4', 'm3u8_native', m3u8_id=stream_type, fatal=False))
self._sort_formats(formats)
+ replace_ext = lambda x, y: re.sub(self._CAPTION_EXT_RE, '.' + y, x)
+
+ def get_subs(caption_url):
+ if re.search(self._CAPTION_EXT_RE, caption_url):
+ return [{
+ 'url': replace_ext(caption_url, 'ttml'),
+ }, {
+ 'url': replace_ext(caption_url, 'vtt'),
+ }]
+ else:
+ return [{'url': caption_url}]
+
+ automatic_captions = {}
subtitles = {}
- for caption in video_data.get('captions', {}).get('list', []):
+ for caption in get_list('caption'):
caption_url = caption.get('source')
if not caption_url:
continue
- subtitles.setdefault(caption.get('language') or caption.get('locale'), []).append({
- 'url': caption_url,
- })
+ sub_dict = automatic_captions if caption.get('type') == 'auto' else subtitles
+ sub_dict.setdefault(dict_get(caption, ('locale', 'language')), []).extend(get_subs(caption_url))
- upload_date = self._search_regex(
- r'<span[^>]+class="date".*?(\d{4}\.\d{2}\.\d{2})',
- webpage, 'upload date', fatal=False)
- if upload_date:
- upload_date = upload_date.replace('.', '')
+ user = meta.get('user', {})
return {
'id': video_id,
'title': title,
'formats': formats,
'subtitles': subtitles,
- 'description': self._og_search_description(webpage),
- 'thumbnail': meta.get('cover', {}).get('source') or self._og_search_thumbnail(webpage),
+ 'automatic_captions': automatic_captions,
+ 'thumbnail': try_get(meta, lambda x: x['cover']['source']),
'view_count': int_or_none(meta.get('count')),
- 'upload_date': upload_date,
+ 'uploader_id': user.get('id'),
+ 'uploader': user.get('name'),
+ 'uploader_url': user.get('url'),
}
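The caption handling in `_extract_video_info` offers each caption in both TTML and WebVTT by rewriting the file extension. Below is a standalone sketch of that idea. The module-level names and the example URLs are assumptions for illustration; the regular expression is the one from `_CAPTION_EXT_RE`.

```python
import re

# Same expression as NaverBaseIE._CAPTION_EXT_RE above.
CAPTION_EXT_RE = r'\.(?:ttml|vtt)'


def replace_ext(caption_url, ext):
    """Swap a .ttml/.vtt extension in the URL for the requested one."""
    return re.sub(CAPTION_EXT_RE, '.' + ext, caption_url)


def get_subs(caption_url):
    """Return subtitle entries: both variants if the extension is known, else the URL as-is."""
    if re.search(CAPTION_EXT_RE, caption_url):
        return [
            {'url': replace_ext(caption_url, 'ttml')},
            {'url': replace_ext(caption_url, 'vtt')},
        ]
    return [{'url': caption_url}]


# Hypothetical caption URLs.
both = get_subs('https://example.com/caps/ko.vtt?x=1')
single = get_subs('https://example.com/caps/ko.srt')
print(both, single)
```

Note that the pattern is not anchored to the end of the path, so it would also rewrite a `.vtt` or `.ttml` occurring elsewhere in the URL.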
+
+
+class NaverIE(NaverBaseIE):
+ _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/(?:v|embed)/(?P<id>\d+)'
+ _GEO_BYPASS = False
+ _TESTS = [{
+ 'url': 'http://tv.naver.com/v/81652',
+ 'info_dict': {
+ 'id': '81652',
+ 'ext': 'mp4',
+ 'title': '[9월 모의고사 해설강의][수학_김상희] 수학 A형 16~20번',
+ 'description': '메가스터디 수학 김상희 선생님이 9월 모의고사 수학A형 16번에서 20번까지 해설강의를 공개합니다.',
+ 'timestamp': 1378200754,
+ 'upload_date': '20130903',
+ 'uploader': '메가스터디, 합격불변의 법칙',
+ 'uploader_id': 'megastudy',
+ },
+ }, {
+ 'url': 'http://tv.naver.com/v/395837',
+ 'md5': '8a38e35354d26a17f73f4e90094febd3',
+ 'info_dict': {
+ 'id': '395837',
+ 'ext': 'mp4',
+ 'title': '9년이 지나도 아픈 기억, 전효성의 아버지',
+ 'description': 'md5:eb6aca9d457b922e43860a2a2b1984d3',
+ 'timestamp': 1432030253,
+ 'upload_date': '20150519',
+ 'uploader': '4가지쇼 시즌2',
+ 'uploader_id': 'wrappinguser29',
+ },
+ 'skip': 'Georestricted',
+ }, {
+ 'url': 'http://tvcast.naver.com/v/81652',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ content = self._download_json(
+ 'https://tv.naver.com/api/json/v/' + video_id,
+ video_id, headers=self.geo_verification_headers())
+ player_info_json = content.get('playerInfoJson') or {}
+ current_clip = player_info_json.get('currentClip') or {}
+
+ vid = current_clip.get('videoId')
+ in_key = current_clip.get('inKey')
+
+ if not vid or not in_key:
+ player_auth = try_get(player_info_json, lambda x: x['playerOption']['auth'])
+ if player_auth == 'notCountry':
+ self.raise_geo_restricted(countries=['KR'])
+ elif player_auth == 'notLogin':
+ self.raise_login_required()
+ raise ExtractorError('couldn\'t extract vid and key')
+ info = self._extract_video_info(video_id, vid, in_key)
+ info.update({
+ 'description': clean_html(current_clip.get('description')),
+ 'timestamp': int_or_none(current_clip.get('firstExposureTime'), 1000),
+ 'duration': parse_duration(current_clip.get('displayPlayTime')),
+ 'like_count': int_or_none(current_clip.get('recommendPoint')),
+ 'age_limit': 19 if current_clip.get('adult') else None,
+ })
+ return info
def _real_extract(self, url):
permalink, video_id = re.match(self._VALID_URL, url).groups()
permalink = 'http' + compat_urllib_parse_unquote(permalink)
- response = self._download_json(
+ video_data = self._download_json(
'https://friendship.nbc.co/v2/graphql', video_id, query={
- 'query': '''{
- page(name: "%s", platform: web, type: VIDEO, userId: "0") {
- data {
+ 'query': '''query bonanzaPage(
+ $app: NBCUBrands! = nbc
+ $name: String!
+ $oneApp: Boolean
+ $platform: SupportedPlatforms! = web
+ $type: EntityPageType! = VIDEO
+ $userId: String!
+) {
+ bonanzaPage(
+ app: $app
+ name: $name
+ oneApp: $oneApp
+ platform: $platform
+ type: $type
+ userId: $userId
+ ) {
+ metadata {
... on VideoPageData {
description
episodeNumber
mpxAccountId
mpxGuid
rating
+ resourceId
seasonNumber
secondaryTitle
seriesShortTitle
}
}
}
-}''' % permalink,
- })
- video_data = response['data']['page']['data']
+}''',
+ 'variables': json.dumps({
+ 'name': permalink,
+ 'oneApp': True,
+ 'userId': '0',
+ }),
+ })['data']['bonanzaPage']['metadata']
query = {
'mbr': 'true',
'manifest': 'm3u',
title = video_data['secondaryTitle']
if video_data.get('locked'):
resource = self._get_mvpd_resource(
- 'nbcentertainment', title, video_id,
- video_data.get('rating'))
+ video_data.get('resourceId') or 'nbcentertainment',
+ title, video_id, video_data.get('rating'))
query['auth'] = self._extract_mvpd_auth(
url, video_id, 'nbcentertainment', resource)
theplatform_url = smuggle_url(update_url_query(
from ..utils import (
determine_ext,
int_or_none,
+ merge_dicts,
parse_iso8601,
qualities,
+ try_get,
+ urljoin,
)
def _extract_embed(self, webpage, display_id):
embed_url = self._html_search_meta(
- 'embedURL', webpage, 'embed URL', fatal=True)
+ 'embedURL', webpage, 'embed URL',
+ default=None) or self._search_regex(
+ r'\bembedUrl["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+ 'embed URL', group='url')
description = self._search_regex(
r'<p[^>]+itemprop="description">([^<]+)</p>',
webpage, 'description', default=None) or self._og_search_description(webpage)
timestamp = parse_iso8601(
self._search_regex(
r'<span[^>]+itemprop="(?:datePublished|uploadDate)"[^>]+content="([^"]+)"',
- webpage, 'upload date', fatal=False))
- return {
+ webpage, 'upload date', default=None))
+ info = self._search_json_ld(webpage, display_id, default={})
+ return merge_dicts({
'_type': 'url_transparent',
'url': embed_url,
'display_id': display_id,
'description': description,
'timestamp': timestamp,
- }
+ }, info)
class NJoyIE(NDRBaseIE):
upload_date = ppjson.get('config', {}).get('publicationDate')
duration = int_or_none(config.get('duration'))
- thumbnails = [{
- 'id': thumbnail.get('quality') or thumbnail_id,
- 'url': thumbnail['src'],
- 'preference': quality_key(thumbnail.get('quality')),
- } for thumbnail_id, thumbnail in config.get('poster', {}).items() if thumbnail.get('src')]
+ thumbnails = []
+ poster = try_get(config, lambda x: x['poster'], dict) or {}
+ for thumbnail_id, thumbnail in poster.items():
+ thumbnail_url = urljoin(url, thumbnail.get('src'))
+ if not thumbnail_url:
+ continue
+ thumbnails.append({
+ 'id': thumbnail.get('quality') or thumbnail_id,
+ 'url': thumbnail_url,
+ 'preference': quality_key(thumbnail.get('quality')),
+ })
return {
'id': video_id,
class NhkVodIE(InfoExtractor):
- _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
+ _VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'
# Content available only for a limited period of time. Visit
# https://www3.nhk.or.jp/nhkworld/en/ondemand/ for working samples.
_TESTS = [{
}, {
'url': 'https://www3.nhk.or.jp/nhkworld/fr/ondemand/audio/plugin-20190404-1/',
'only_matching': True,
+ }, {
+ 'url': 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/',
+ 'only_matching': True,
}]
- _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7/episode/%s/%s/all%s.json'
+ _API_URL_TEMPLATE = 'https://api.nhk.or.jp/nhkworld/%sod%slist/v7a/episode/%s/%s/all%s.json'
def _real_extract(self, url):
lang, m_type, episode_id = re.match(self._VALID_URL, url).groups()
audio = episode['audio']
audio_path = audio['audio']
info['formats'] = self._extract_m3u8_formats(
- 'https://nhks-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
- episode_id, 'm4a', m3u8_id='hls', fatal=False)
- for proto in ('rtmpt', 'rtmp'):
- info['formats'].append({
- 'ext': 'flv',
- 'format_id': proto,
- 'url': '%s://flv.nhk.or.jp/ondemand/mp4:flv%s' % (proto, audio_path),
- 'vcodec': 'none',
- })
+ 'https://nhkworld-vh.akamaihd.net/i%s/master.m3u8' % audio_path,
+ episode_id, 'm4a', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False)
for f in info['formats']:
f['language'] = lang
return info
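To see why the `_VALID_URL` id pattern was relaxed from `[a-z]+` to `[^/]+?`, the sketch below runs the old and new expressions against the newly added `j_art-20150903-1` test URL. The constant names are illustrative only; both patterns are copied from the diff above.

```python
import re

OLD_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[a-z]+-\d{8}-\d+)'
NEW_VALID_URL = r'https?://www3\.nhk\.or\.jp/nhkworld/(?P<lang>[a-z]{2})/ondemand/(?P<type>video|audio)/(?P<id>\d{7}|[^/]+?-\d{8}-\d+)'

url = 'https://www3.nhk.or.jp/nhkworld/en/ondemand/audio/j_art-20150903-1/'

# The underscore in "j_art" is outside [a-z], so the old pattern rejects the URL.
old = re.match(OLD_VALID_URL, url)
# The new pattern accepts any non-slash characters before the date part.
new = re.match(NEW_VALID_URL, url)

print(old, new.groupdict())
```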
from .common import InfoExtractor
from ..utils import (
clean_html,
+ determine_ext,
int_or_none,
js_to_json,
qualities,
_VALID_URL = r'https?://media\.cms\.nova\.cz/embed/(?P<id>[^/?#&]+)'
_TEST = {
'url': 'https://media.cms.nova.cz/embed/8o0n0r?autoplay=1',
- 'md5': 'b3834f6de5401baabf31ed57456463f7',
+ 'md5': 'ee009bafcc794541570edd44b71cbea3',
'info_dict': {
'id': '8o0n0r',
'ext': 'mp4',
webpage = self._download_webpage(url, video_id)
- bitrates = self._parse_json(
+ duration = None
+ formats = []
+
+ player = self._parse_json(
self._search_regex(
- r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
- video_id, transform_source=js_to_json)
+ r'Player\.init\s*\([^,]+,\s*({.+?})\s*,\s*{.+?}\s*\)\s*;',
+ webpage, 'player', default='{}'), video_id, fatal=False)
+ if player:
+ for format_id, format_list in player['tracks'].items():
+ if not isinstance(format_list, list):
+ format_list = [format_list]
+ for format_dict in format_list:
+ if not isinstance(format_dict, dict):
+ continue
+ format_url = url_or_none(format_dict.get('src'))
+ format_type = format_dict.get('type')
+ ext = determine_ext(format_url)
+ if (format_type == 'application/x-mpegURL'
+ or format_id == 'HLS' or ext == 'm3u8'):
+ formats.extend(self._extract_m3u8_formats(
+ format_url, video_id, 'mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls',
+ fatal=False))
+ elif (format_type == 'application/dash+xml'
+ or format_id == 'DASH' or ext == 'mpd'):
+ formats.extend(self._extract_mpd_formats(
+ format_url, video_id, mpd_id='dash', fatal=False))
+ else:
+ formats.append({
+ 'url': format_url,
+ })
+ duration = int_or_none(player.get('duration'))
+ else:
+ # Old path, no longer current as of 08.04.2020
+ bitrates = self._parse_json(
+ self._search_regex(
+ r'(?s)(?:src|bitrates)\s*=\s*({.+?})\s*;', webpage, 'formats'),
+ video_id, transform_source=js_to_json)
- QUALITIES = ('lq', 'mq', 'hq', 'hd')
- quality_key = qualities(QUALITIES)
+ QUALITIES = ('lq', 'mq', 'hq', 'hd')
+ quality_key = qualities(QUALITIES)
+
+ for format_id, format_list in bitrates.items():
+ if not isinstance(format_list, list):
+ format_list = [format_list]
+ for format_url in format_list:
+ format_url = url_or_none(format_url)
+ if not format_url:
+ continue
+ if format_id == 'hls':
+ formats.extend(self._extract_m3u8_formats(
+ format_url, video_id, ext='mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls',
+ fatal=False))
+ continue
+ f = {
+ 'url': format_url,
+ }
+ f_id = format_id
+ for quality in QUALITIES:
+ if '%s.mp4' % quality in format_url:
+ f_id += '-%s' % quality
+ f.update({
+ 'quality': quality_key(quality),
+ 'format_note': quality.upper(),
+ })
+ break
+ f['format_id'] = f_id
+ formats.append(f)
- formats = []
- for format_id, format_list in bitrates.items():
- if not isinstance(format_list, list):
- continue
- for format_url in format_list:
- format_url = url_or_none(format_url)
- if not format_url:
- continue
- f = {
- 'url': format_url,
- }
- f_id = format_id
- for quality in QUALITIES:
- if '%s.mp4' % quality in format_url:
- f_id += '-%s' % quality
- f.update({
- 'quality': quality_key(quality),
- 'format_note': quality.upper(),
- })
- break
- f['format_id'] = f_id
- formats.append(f)
self._sort_formats(formats)
title = self._og_search_title(
r'poster\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage,
'thumbnail', fatal=False, group='value')
duration = int_or_none(self._search_regex(
- r'videoDuration\s*:\s*(\d+)', webpage, 'duration', fatal=False))
+ r'videoDuration\s*:\s*(\d+)', webpage, 'duration',
+ default=duration))
return {
'id': video_id,
_VALID_URL = r'https?://(?:[^.]+\.)?(?P<site>tv(?:noviny)?|tn|novaplus|vymena|fanda|krasna|doma|prask)\.nova\.cz/(?:[^/]+/)+(?P<id>[^/]+?)(?:\.html|/|$)'
_TESTS = [{
'url': 'http://tn.nova.cz/clanek/tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci.html#player_13260',
- 'md5': '1dd7b9d5ea27bc361f110cd855a19bd3',
+ 'md5': '249baab7d0104e186e78b0899c7d5f28',
'info_dict': {
'id': '1757139',
'display_id': 'tajemstvi-ukryte-v-podzemi-specialni-nemocnice-v-prazske-krci',
'params': {
# rtmp download
'skip_download': True,
- }
+ },
+ 'skip': 'gone',
}, {
# media.cms.nova.cz embed
'url': 'https://novaplus.nova.cz/porad/ulice/epizoda/18760-2180-dil',
'skip_download': True,
},
'add_ie': [NovaEmbedIE.ie_key()],
+ 'skip': 'CHYBA 404: STRÁNKA NENALEZENA',
}, {
'url': 'http://sport.tn.nova.cz/clanek/sport/hokej/nhl/zivot-jde-dal-hodnotil-po-vyrazeni-z-playoff-jiri-sekac.html',
'only_matching': True,
webpage = self._download_webpage(url, display_id)
+ description = clean_html(self._og_search_description(webpage, default=None))
+ if site == 'novaplus':
+ upload_date = unified_strdate(self._search_regex(
+ r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
+ elif site == 'fanda':
+ upload_date = unified_strdate(self._search_regex(
+ r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
+ else:
+ upload_date = None
+
# novaplus
embed_id = self._search_regex(
r'<iframe[^>]+\bsrc=["\'](?:https?:)?//media\.cms\.nova\.cz/embed/([^/?#&]+)',
webpage, 'embed url', default=None)
if embed_id:
- return self.url_result(
- 'https://media.cms.nova.cz/embed/%s' % embed_id,
- ie=NovaEmbedIE.ie_key(), video_id=embed_id)
+ return {
+ '_type': 'url_transparent',
+ 'url': 'https://media.cms.nova.cz/embed/%s' % embed_id,
+ 'ie_key': NovaEmbedIE.ie_key(),
+ 'id': embed_id,
+ 'description': description,
+ 'upload_date': upload_date
+ }
video_id = self._search_regex(
[r"(?:media|video_id)\s*:\s*'(\d+)'",
self._sort_formats(formats)
title = mediafile.get('meta', {}).get('title') or self._og_search_title(webpage)
- description = clean_html(self._og_search_description(webpage, default=None))
thumbnail = config.get('poster')
- if site == 'novaplus':
- upload_date = unified_strdate(self._search_regex(
- r'(\d{1,2}-\d{1,2}-\d{4})$', display_id, 'upload date', default=None))
- elif site == 'fanda':
- upload_date = unified_strdate(self._search_regex(
- r'<span class="date_time">(\d{1,2}\.\d{1,2}\.\d{4})', webpage, 'upload date', default=None))
- else:
- upload_date = None
-
return {
'id': video_id,
'display_id': display_id,
elif source == 'youtube':
return self.url_result(video_id, 'Youtube')
elif source == 'cinematique':
- # youtube-dl currently doesn't support cinematique
+ # youtube-dlc currently doesn't support cinematique
# return self.url_result('http://cinematique.com/embed/%s' % video_id, 'Cinematique')
pass
from ..utils import (
int_or_none,
qualities,
+ url_or_none,
)
},
}],
'expected_warnings': ['Failed to download m3u8 information'],
+ }, {
+ # multimedia, no formats, stream
+ 'url': 'https://www.npr.org/2020/02/14/805476846/laura-stevenson-tiny-desk-concert',
+ 'only_matching': True,
}]
def _real_extract(self, url):
'format_id': format_id,
'quality': quality(format_id),
})
+ for stream_id, stream_entry in media.get('stream', {}).items():
+ if not isinstance(stream_entry, dict):
+ continue
+ if stream_id != 'hlsUrl':
+ continue
+ stream_url = url_or_none(stream_entry.get('$text'))
+ if not stream_url:
+ continue
+ formats.extend(self._extract_m3u8_formats(
+ stream_url, stream_id, 'mp4', 'm3u8_native',
+ m3u8_id='hls', fatal=False))
self._sort_formats(formats)
entries.append({
from ..utils import (
ExtractorError,
int_or_none,
- JSON_LD_RE,
+ js_to_json,
NO_DEFAULT,
parse_age_limit,
parse_duration,
MESSAGES = {
'ProgramRightsAreNotReady': 'Du kan dessverre ikke se eller høre programmet',
'ProgramRightsHasExpired': 'Programmet har gått ut',
+ 'NoProgramRights': 'Ikke tilgjengelig',
'ProgramIsGeoBlocked': 'NRK har ikke rettigheter til å vise dette programmet utenfor Norge',
}
message_type = data.get('messageType', '')
''' % _EPISODE_RE
_API_HOSTS = ('psapi-ne.nrk.no', 'psapi-we.nrk.no')
_TESTS = [{
+ 'url': 'https://tv.nrk.no/program/MDDP12000117',
+ 'md5': '8270824df46ec629b66aeaa5796b36fb',
+ 'info_dict': {
+ 'id': 'MDDP12000117AA',
+ 'ext': 'mp4',
+ 'title': 'Alarm Trolltunga',
+ 'description': 'md5:46923a6e6510eefcce23d5ef2a58f2ce',
+ 'duration': 2223,
+ 'age_limit': 6,
+ },
+ }, {
'url': 'https://tv.nrk.no/serie/20-spoersmaal-tv/MUHH48000314/23-05-2014',
'md5': '9a167e54d04671eb6317a37b7bc8a280',
'info_dict': {
'series': '20 spørsmål',
'episode': '23.05.2014',
},
+ 'skip': 'NoProgramRights',
}, {
'url': 'https://tv.nrk.no/program/mdfp15000514',
'info_dict': {
class NRKTVEpisodeIE(InfoExtractor):
_VALID_URL = r'https?://tv\.nrk\.no/serie/(?P<id>[^/]+/sesong/\d+/episode/\d+)'
- _TEST = {
+ _TESTS = [{
+ 'url': 'https://tv.nrk.no/serie/hellums-kro/sesong/1/episode/2',
+ 'info_dict': {
+ 'id': 'MUHH36005220BA',
+ 'ext': 'mp4',
+ 'title': 'Kro, krig og kjærlighet 2:6',
+ 'description': 'md5:b32a7dc0b1ed27c8064f58b97bda4350',
+ 'duration': 1563,
+ 'series': 'Hellums kro',
+ 'season_number': 1,
+ 'episode_number': 2,
+ 'episode': '2:6',
+ 'age_limit': 6,
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
'url': 'https://tv.nrk.no/serie/backstage/sesong/1/episode/8',
'info_dict': {
'id': 'MSUI14000816AA',
'params': {
'skip_download': True,
},
- }
+ 'skip': 'ProgramRightsHasExpired',
+ }]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
- nrk_id = self._parse_json(
- self._search_regex(JSON_LD_RE, webpage, 'JSON-LD', group='json_ld'),
- display_id)['@id']
-
+ info = self._search_json_ld(webpage, display_id, default={})
+ nrk_id = info.get('@id') or self._html_search_meta(
+ 'nrk:program-id', webpage, default=None) or self._search_regex(
+ r'data-program-id=["\'](%s)' % NRKTVIE._EPISODE_RE, webpage,
+ 'nrk id')
assert re.match(NRKTVIE._EPISODE_RE, nrk_id)
- return self.url_result(
- 'nrk:%s' % nrk_id, ie=NRKIE.ie_key(), video_id=nrk_id)
+
+ info.update({
+ '_type': 'url_transparent',
+ 'id': nrk_id,
+ 'url': 'nrk:%s' % nrk_id,
+ 'ie_key': NRKIE.ie_key(),
+ })
+ return info
class NRKTVSerieBaseIE(InfoExtractor):
(r'INITIAL_DATA(?:_V\d)?_*\s*=\s*({.+?})\s*;',
r'({.+?})\s*,\s*"[^"]+"\s*\)\s*</script>'),
webpage, 'config', default='{}' if not fatal else NO_DEFAULT),
- display_id, fatal=False)
+ display_id, fatal=False, transform_source=js_to_json)
if not config:
return
return try_get(
_VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
_ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
_TESTS = [{
+ 'url': 'https://tv.nrk.no/serie/blank',
+ 'info_dict': {
+ 'id': 'blank',
+ 'title': 'Blank',
+ 'description': 'md5:7664b4e7e77dc6810cd3bca367c25b6e',
+ },
+ 'playlist_mincount': 30,
+ }, {
# new layout, seasons
'url': 'https://tv.nrk.no/serie/backstage',
'info_dict': {
_TESTS = [{
'url': 'https://www.nrk.no/skole/?page=search&q=&mediaId=14099',
- 'md5': '6bc936b01f9dd8ed45bc58b252b2d9b6',
+ 'md5': '18c12c3d071953c3bf8d54ef6b2587b7',
'info_dict': {
'id': '6021',
'ext': 'mp4',
'width': int_or_none(video.get('width')),
'height': int_or_none(video.get('height')),
'filesize': get_file_size(video.get('file_size') or video.get('fileSize')),
- 'tbr': int_or_none(video.get('bitrate'), 1000),
+ 'tbr': int_or_none(video.get('bitrate'), 1000) or None,
'ext': ext,
})
- self._sort_formats(formats)
+ self._sort_formats(formats, ('height', 'width', 'filesize', 'tbr', 'fps', 'format_id'))
thumbnails = []
for image in video_data.get('images', []):
class OnDemandKoreaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
_GEO_COUNTRIES = ['US', 'CA']
- _TEST = {
- 'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
+ _TESTS = [{
+ 'url': 'https://www.ondemandkorea.com/ask-us-anything-e43.html',
'info_dict': {
'id': 'ask-us-anything-e43',
'ext': 'mp4',
- 'title': 'Ask Us Anything : E43',
+ 'title': 'Ask Us Anything : Gain, Ji Soo - 09/24/2016',
+ 'description': 'A talk show/game show with a school theme where celebrity guests appear as “transfer students.”',
'thumbnail': r're:^https?://.*\.jpg$',
},
'params': {
'skip_download': 'm3u8 download'
}
- }
+ }, {
+ 'url': 'https://www.ondemandkorea.com/confession-e01-1.html',
+ 'info_dict': {
+ 'id': 'confession-e01-1',
+ 'ext': 'mp4',
+ 'title': 'Confession : E01',
+ 'description': 'Choi Do-hyun, a criminal attorney, is the son of a death row convict. Ever since Choi Pil-su got arrested for murder, Do-hyun has wanted to solve his ',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ 'subtitles': {
+ 'English': 'mincount:1',
+ },
+ },
+ 'params': {
+ 'skip_download': 'm3u8 download'
+ }
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
'This video is only available to ODK PLUS members.',
expected=True)
- title = self._og_search_title(webpage)
+ if 'ODK PREMIUM Members Only' in webpage:
+ raise ExtractorError(
+ 'This video is only available to ODK PREMIUM members.',
+ expected=True)
+
+ title = self._search_regex(
+ r'class=["\']episode_title["\'][^>]*>([^<]+)',
+ webpage, 'episode_title', fatal=False) or self._og_search_title(webpage)
jw_config = self._parse_json(
self._search_regex(
- r'(?s)jwplayer\(([\'"])(?:(?!\1).)+\1\)\.setup\s*\((?P<options>.+?)\);',
+ r'(?s)odkPlayer\.init.*?(?P<options>{[^;]+}).*?;',
webpage, 'jw config', group='options'),
video_id, transform_source=js_to_json)
info = self._parse_jwplayer_data(
info.update({
'title': title,
- 'thumbnail': self._og_search_thumbnail(webpage),
+ 'description': self._og_search_description(webpage),
+ 'thumbnail': self._og_search_thumbnail(webpage)
})
return info
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import (
+ clean_html,
determine_ext,
float_or_none,
HEADRequest,
int_or_none,
orderedSet,
remove_end,
+ str_or_none,
strip_jsonp,
unescapeHTML,
unified_strdate,
format_id = '-'.join(format_id_list)
ext = determine_ext(src)
if ext == 'm3u8':
- formats.extend(self._extract_m3u8_formats(
- src, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+ m3u8_formats = self._extract_m3u8_formats(
+ src, video_id, 'mp4', m3u8_id=format_id, fatal=False)
+ if any('/geoprotection' in f['url'] for f in m3u8_formats):
+ self.raise_geo_restricted()
+ formats.extend(m3u8_formats)
elif ext == 'f4m':
formats.extend(self._extract_f4m_formats(
src, video_id, f4m_id=format_id, fatal=False))
class ORFRadioIE(InfoExtractor):
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
- station = mobj.group('station')
show_date = mobj.group('date')
show_id = mobj.group('show')
- if station == 'fm4':
- show_id = '4%s' % show_id
-
data = self._download_json(
- 'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s' % (station, show_id, show_date),
- show_id
- )
-
- def extract_entry_dict(info, title, subtitle):
- return {
- 'id': info['loopStreamId'].replace('.mp3', ''),
- 'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (station, info['loopStreamId']),
+ 'http://audioapi.orf.at/%s/api/json/current/broadcast/%s/%s'
+ % (self._API_STATION, show_id, show_date), show_id)
+
+ entries = []
+ for info in data['streams']:
+ loop_stream_id = str_or_none(info.get('loopStreamId'))
+ if not loop_stream_id:
+ continue
+ title = str_or_none(data.get('title'))
+ if not title:
+ continue
+ start = int_or_none(info.get('start'), scale=1000)
+ end = int_or_none(info.get('end'), scale=1000)
+ duration = end - start if end and start else None
+ entries.append({
+ 'id': loop_stream_id.replace('.mp3', ''),
+ 'url': 'http://loopstream01.apa.at/?channel=%s&id=%s' % (self._LOOP_STATION, loop_stream_id),
'title': title,
- 'description': subtitle,
- 'duration': (info['end'] - info['start']) / 1000,
- 'timestamp': info['start'] / 1000,
+ 'description': clean_html(data.get('subtitle')),
+ 'duration': duration,
+ 'timestamp': start,
'ext': 'mp3',
- 'series': data.get('programTitle')
- }
-
- entries = [extract_entry_dict(t, data['title'], data['subtitle']) for t in data['streams']]
+ 'series': data.get('programTitle'),
+ })
return {
'_type': 'playlist',
'id': show_id,
- 'title': data['title'],
- 'description': data['subtitle'],
- 'entries': entries
+ 'title': data.get('title'),
+ 'description': clean_html(data.get('subtitle')),
+ 'entries': entries,
}
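Review note: the reworked ORF radio loop converts the API's millisecond `start`/`end` values to seconds and only computes a duration when both are present. A minimal standalone sketch of that arithmetic follows; `to_seconds` and `stream_times` are hypothetical names standing in for `int_or_none(..., scale=1000)` and the inline expression, and the sketch tests `is not None` rather than truthiness, which is a deliberate deviation from the patch.

```python
def to_seconds(value_ms):
    """Convert a millisecond value to whole seconds, or None if unusable."""
    try:
        return int(value_ms) // 1000
    except (TypeError, ValueError):
        return None


def stream_times(info):
    """Return (timestamp, duration) in seconds for one stream entry."""
    start = to_seconds(info.get('start'))
    end = to_seconds(info.get('end'))
    # Only subtract when both ends of the interval are known.
    duration = end - start if start is not None and end is not None else None
    return start, duration


# Hypothetical stream entry: one hour starting at the test's timestamp.
print(stream_times({'start': 1483819257000, 'end': 1483822857000}))
```

An entry with a missing `end` yields a timestamp but no duration instead of raising, which is the behaviour the patch aims for.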
class ORFFM4IE(ORFRadioIE):
IE_NAME = 'orf:fm4'
IE_DESC = 'radio FM4'
- _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _VALID_URL = r'https?://(?P<station>fm4)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>4\w+)'
+ _API_STATION = 'fm4'
+ _LOOP_STATION = 'fm4'
_TEST = {
- 'url': 'http://fm4.orf.at/player/20170107/CC',
+ 'url': 'http://fm4.orf.at/player/20170107/4CC',
'md5': '2b0be47375432a7ef104453432a19212',
'info_dict': {
'id': '2017-01-07_2100_tl_54_7DaysSat18_31295',
'timestamp': 1483819257,
'upload_date': '20170107',
},
- 'skip': 'Shows from ORF radios are only available for 7 days.'
+ 'skip': 'Shows from ORF radios are only available for 7 days.',
+ 'only_matching': True,
+ }
+
+
+class ORFNOEIE(ORFRadioIE):
+ IE_NAME = 'orf:noe'
+ IE_DESC = 'Radio Niederösterreich'
+ _VALID_URL = r'https?://(?P<station>noe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'noe'
+ _LOOP_STATION = 'oe2n'
+
+ _TEST = {
+ 'url': 'https://noe.orf.at/player/20200423/NGM',
+ 'only_matching': True,
+ }
+
+
+class ORFWIEIE(ORFRadioIE):
+ IE_NAME = 'orf:wien'
+ IE_DESC = 'Radio Wien'
+ _VALID_URL = r'https?://(?P<station>wien)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'wie'
+ _LOOP_STATION = 'oe2w'
+
+ _TEST = {
+ 'url': 'https://wien.orf.at/player/20200423/WGUM',
+ 'only_matching': True,
+ }
+
+
+class ORFBGLIE(ORFRadioIE):
+ IE_NAME = 'orf:burgenland'
+ IE_DESC = 'Radio Burgenland'
+ _VALID_URL = r'https?://(?P<station>burgenland)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'bgl'
+ _LOOP_STATION = 'oe2b'
+
+ _TEST = {
+ 'url': 'https://burgenland.orf.at/player/20200423/BGM',
+ 'only_matching': True,
+ }
+
+
+class ORFOOEIE(ORFRadioIE):
+ IE_NAME = 'orf:oberoesterreich'
+ IE_DESC = 'Radio Oberösterreich'
+ _VALID_URL = r'https?://(?P<station>ooe)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'ooe'
+ _LOOP_STATION = 'oe2o'
+
+ _TEST = {
+ 'url': 'https://ooe.orf.at/player/20200423/OGMO',
+ 'only_matching': True,
+ }
+
+
+class ORFSTMIE(ORFRadioIE):
+ IE_NAME = 'orf:steiermark'
+ IE_DESC = 'Radio Steiermark'
+ _VALID_URL = r'https?://(?P<station>steiermark)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'stm'
+ _LOOP_STATION = 'oe2st'
+
+ _TEST = {
+ 'url': 'https://steiermark.orf.at/player/20200423/STGMS',
+ 'only_matching': True,
+ }
+
+
+class ORFKTNIE(ORFRadioIE):
+ IE_NAME = 'orf:kaernten'
+ IE_DESC = 'Radio Kärnten'
+ _VALID_URL = r'https?://(?P<station>kaernten)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'ktn'
+ _LOOP_STATION = 'oe2k'
+
+ _TEST = {
+ 'url': 'https://kaernten.orf.at/player/20200423/KGUMO',
+ 'only_matching': True,
+ }
+
+
+class ORFSBGIE(ORFRadioIE):
+ IE_NAME = 'orf:salzburg'
+ IE_DESC = 'Radio Salzburg'
+ _VALID_URL = r'https?://(?P<station>salzburg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'sbg'
+ _LOOP_STATION = 'oe2s'
+
+ _TEST = {
+ 'url': 'https://salzburg.orf.at/player/20200423/SGUM',
+ 'only_matching': True,
+ }
+
+
+class ORFTIRIE(ORFRadioIE):
+ IE_NAME = 'orf:tirol'
+ IE_DESC = 'Radio Tirol'
+ _VALID_URL = r'https?://(?P<station>tirol)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'tir'
+ _LOOP_STATION = 'oe2t'
+
+ _TEST = {
+ 'url': 'https://tirol.orf.at/player/20200423/TGUMO',
+ 'only_matching': True,
+ }
+
+
+class ORFVBGIE(ORFRadioIE):
+ IE_NAME = 'orf:vorarlberg'
+ IE_DESC = 'Radio Vorarlberg'
+ _VALID_URL = r'https?://(?P<station>vorarlberg)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'vbg'
+ _LOOP_STATION = 'oe2v'
+
+ _TEST = {
+ 'url': 'https://vorarlberg.orf.at/player/20200423/VGUM',
+ 'only_matching': True,
+ }
+
+
+class ORFOE3IE(ORFRadioIE):
+ IE_NAME = 'orf:oe3'
+ IE_DESC = 'Radio Österreich 3'
+ _VALID_URL = r'https?://(?P<station>oe3)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'oe3'
+ _LOOP_STATION = 'oe3'
+
+ _TEST = {
+ 'url': 'https://oe3.orf.at/player/20200424/3WEK',
+ 'only_matching': True,
}
IE_NAME = 'orf:oe1'
IE_DESC = 'Radio Österreich 1'
_VALID_URL = r'https?://(?P<station>oe1)\.orf\.at/player/(?P<date>[0-9]+)/(?P<show>\w+)'
+ _API_STATION = 'oe1'
+ _LOOP_STATION = 'oe1'
_TEST = {
'url': 'http://oe1.orf.at/player/20170108/456544',
from ..utils import (
int_or_none,
parse_resolution,
+ str_or_none,
try_get,
unified_timestamp,
url_or_none,
peertube\.cpy\.re
)'''
_UUID_RE = r'[\da-fA-F]{8}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{4}-[\da-fA-F]{12}'
+ _API_BASE = 'https://%s/api/v1/videos/%s/%s'
_VALID_URL = r'''(?x)
(?:
peertube:(?P<host>[^:]+):|
(?P<id>%s)
''' % (_INSTANCES_RE, _UUID_RE)
_TESTS = [{
- 'url': 'https://peertube.cpy.re/videos/watch/2790feb0-8120-4e63-9af3-c943c69f5e6c',
- 'md5': '80f24ff364cc9d333529506a263e7feb',
+ 'url': 'https://framatube.org/videos/watch/9c9de5e8-0a1e-484a-b099-e80766180a6d',
+ 'md5': '9bed8c0137913e17b86334e5885aacff',
'info_dict': {
- 'id': '2790feb0-8120-4e63-9af3-c943c69f5e6c',
+ 'id': '9c9de5e8-0a1e-484a-b099-e80766180a6d',
'ext': 'mp4',
- 'title': 'wow',
- 'description': 'wow such video, so gif',
+ 'title': 'What is PeerTube?',
+ 'description': 'md5:3fefb8dde2b189186ce0719fda6f7b10',
'thumbnail': r're:https?://.*\.(?:jpg|png)',
- 'timestamp': 1519297480,
- 'upload_date': '20180222',
- 'uploader': 'Luclu7',
- 'uploader_id': '7fc42640-efdb-4505-a45d-a15b1a5496f1',
- 'uploder_url': 'https://peertube.nsa.ovh/accounts/luclu7',
- 'license': 'Unknown',
- 'duration': 3,
+ 'timestamp': 1538391166,
+ 'upload_date': '20181001',
+ 'uploader': 'Framasoft',
+ 'uploader_id': '3',
+ 'uploader_url': 'https://framatube.org/accounts/framasoft',
+ 'channel': 'Les vidéos de Framasoft',
+ 'channel_id': '2',
+ 'channel_url': 'https://framatube.org/video-channels/bf54d359-cfad-4935-9d45-9d6be93f63e8',
+ 'language': 'en',
+ 'license': 'Attribution - Share Alike',
+ 'duration': 113,
'view_count': int,
'like_count': int,
'dislike_count': int,
- 'tags': list,
- 'categories': list,
+ 'tags': ['framasoft', 'peertube'],
+ 'categories': ['Science & Technology'],
}
}, {
'url': 'https://peertube.tamanoir.foucry.net/videos/watch/0b04f13d-1e18-4f1d-814e-4979aa7c9c44',
entries = [peertube_url]
return entries
+ def _call_api(self, host, video_id, path, note=None, errnote=None, fatal=True):
+ return self._download_json(
+ self._API_BASE % (host, video_id, path), video_id,
+ note=note, errnote=errnote, fatal=fatal)
+
+ def _get_subtitles(self, host, video_id):
+ captions = self._call_api(
+ host, video_id, 'captions', note='Downloading captions JSON',
+ fatal=False)
+ if not isinstance(captions, dict):
+ return
+ data = captions.get('data')
+ if not isinstance(data, list):
+ return
+ subtitles = {}
+ for e in data:
+ language_id = try_get(e, lambda x: x['language']['id'], compat_str)
+ caption_url = urljoin('https://%s' % host, e.get('captionPath'))
+ if not caption_url:
+ continue
+ subtitles.setdefault(language_id or 'en', []).append({
+ 'url': caption_url,
+ })
+ return subtitles
+
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
host = mobj.group('host') or mobj.group('host_2')
video_id = mobj.group('id')
- video = self._download_json(
- 'https://%s/api/v1/videos/%s' % (host, video_id), video_id)
+ video = self._call_api(
+ host, video_id, '', note='Downloading video JSON')
title = video['name']
formats.append(f)
self._sort_formats(formats)
- def account_data(field):
- return try_get(video, lambda x: x['account'][field], compat_str)
+ full_description = self._call_api(
+ host, video_id, 'description', note='Downloading description JSON',
+ fatal=False)
+
+ description = None
+ if isinstance(full_description, dict):
+ description = str_or_none(full_description.get('description'))
+ if not description:
+ description = video.get('description')
+
+ subtitles = self.extract_subtitles(host, video_id)
+
+ def data(section, field, type_):
+ return try_get(video, lambda x: x[section][field], type_)
+
+ def account_data(field, type_):
+ return data('account', field, type_)
+
+ def channel_data(field, type_):
+ return data('channel', field, type_)
- category = try_get(video, lambda x: x['category']['label'], compat_str)
+ category = data('category', 'label', compat_str)
categories = [category] if category else None
nsfw = video.get('nsfw')
return {
'id': video_id,
'title': title,
- 'description': video.get('description'),
+ 'description': description,
'thumbnail': urljoin(url, video.get('thumbnailPath')),
'timestamp': unified_timestamp(video.get('publishedAt')),
- 'uploader': account_data('displayName'),
- 'uploader_id': account_data('uuid'),
- 'uploder_url': account_data('url'),
- 'license': try_get(
- video, lambda x: x['licence']['label'], compat_str),
+ 'uploader': account_data('displayName', compat_str),
+ 'uploader_id': str_or_none(account_data('id', int)),
+ 'uploader_url': url_or_none(account_data('url', compat_str)),
+ 'channel': channel_data('displayName', compat_str),
+ 'channel_id': str_or_none(channel_data('id', int)),
+ 'channel_url': url_or_none(channel_data('url', compat_str)),
+ 'language': data('language', 'id', compat_str),
+ 'license': data('licence', 'label', compat_str),
'duration': int_or_none(video.get('duration')),
'view_count': int_or_none(video.get('views')),
'like_count': int_or_none(video.get('likes')),
'tags': try_get(video, lambda x: x['tags'], list),
'categories': categories,
'formats': formats,
+ 'subtitles': subtitles
}
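Review note: `_get_subtitles` groups caption URLs by language and falls back to `'en'` when a caption carries no language id. The grouping can be sketched on its own as below, assuming a response of the shape `{'data': [{'language': {'id': ...}, 'captionPath': ...}]}`. `build_subtitles` is a hypothetical helper, and the standard library `urljoin` stands in for youtube-dl's own `urljoin` utility.

```python
from urllib.parse import urljoin


def build_subtitles(host, captions):
    """Group caption URLs by language id, defaulting to 'en'."""
    data = captions.get('data') if isinstance(captions, dict) else None
    if not isinstance(data, list):
        return {}
    subtitles = {}
    for entry in data:
        path = entry.get('captionPath')
        if not path:
            # Without a path there is nothing to download.
            continue
        language = (entry.get('language') or {}).get('id') or 'en'
        subtitles.setdefault(language, []).append({
            'url': urljoin('https://%s' % host, path),
        })
    return subtitles


# Hypothetical captions payload; the paths are made up for illustration.
sample = {'data': [
    {'language': {'id': 'fr'}, 'captionPath': '/static/captions/a-fr.vtt'},
    {'captionPath': '/static/captions/a.vtt'},
    {'language': {'id': 'de'}},
]}
print(build_subtitles('framatube.org', sample))
```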
item_id, query=query)
def _parse_broadcast_data(self, broadcast, video_id):
- title = broadcast['status']
+ title = broadcast.get('status') or 'Periscope Broadcast'
uploader = broadcast.get('user_display_name') or broadcast.get('username')
title = '%s - %s' % (uploader, title) if uploader else title
is_live = broadcast.get('state').lower() == 'running'
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import ExtractorError
+
+
+class PhoenixIE(InfoExtractor):
+ IE_NAME = 'phoenix.de'
+ _VALID_URL = r'''https?://(?:www\.)?phoenix\.de/\D+(?P<id>\d+)\.html'''
+ _TESTS = [
+ {
+ 'url': 'https://www.phoenix.de/sendungen/dokumentationen/unsere-welt-in-zukunft---stadt-a-1283620.html',
+ 'md5': '5e765e838aa3531c745a4f5b249ee3e3',
+ 'info_dict': {
+ 'id': '0OB4HFc43Ns',
+ 'ext': 'mp4',
+ 'title': 'Unsere Welt in Zukunft - Stadt',
+ 'description': 'md5:9bfb6fd498814538f953b2dcad7ce044',
+ 'upload_date': '20190912',
+ 'uploader': 'phoenix',
+ 'uploader_id': 'phoenix',
+ }
+ },
+ {
+ 'url': 'https://www.phoenix.de/drohnenangriffe-in-saudi-arabien-a-1286995.html?ref=aktuelles',
+ 'only_matching': True,
+ },
+ # an older page: https://www.phoenix.de/sendungen/gespraeche/phoenix-persoenlich/im-dialog-a-177727.html
+ # seems to not have an embedded video, even though it's uploaded on youtube: https://www.youtube.com/watch?v=4GxnoUHvOkM
+ ]
+
+ def extract_from_json_api(self, video_id, api_url):
+ doc = self._download_json(
+ api_url, video_id,
+ note="Downloading webpage metadata",
+ errnote="Failed to load webpage metadata")
+
+ for a in doc["absaetze"]:
+ if a["typ"] == "video-youtube":
+ return {
+ '_type': 'url_transparent',
+ 'id': a["id"],
+ 'title': doc["titel"],
+ 'url': "https://www.youtube.com/watch?v=%s" % a["id"],
+ 'ie_key': 'Youtube',
+ }
+ raise ExtractorError("No downloadable video found", expected=True)
+
+ def _real_extract(self, url):
+ page_id = self._match_id(url)
+ api_url = 'https://www.phoenix.de/response/id/%s' % page_id
+ return self.extract_from_json_api(page_id, api_url)
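Review note: `extract_from_json_api` walks the `absaetze` list and hands the first `video-youtube` entry over to the YouTube extractor. A defensive variant of that lookup might look like the sketch below. `find_youtube_paragraph` and `sample_doc` are hypothetical, and the response shape is reduced to the keys the patch reads.

```python
def find_youtube_paragraph(doc):
    """Return the first paragraph of type 'video-youtube', or None."""
    # Tolerate a missing or null 'absaetze' key instead of raising KeyError.
    for paragraph in doc.get('absaetze') or []:
        if paragraph.get('typ') == 'video-youtube':
            return paragraph
    return None


# Hypothetical response, reduced to the fields used by the extractor.
sample_doc = {
    'titel': 'Unsere Welt in Zukunft - Stadt',
    'absaetze': [
        {'typ': 'text'},
        {'typ': 'video-youtube', 'id': '0OB4HFc43Ns'},
    ],
}

match = find_youtube_paragraph(sample_doc)
youtube_url = 'https://www.youtube.com/watch?v=%s' % match['id']
print(youtube_url)
```

Returning `None` lets the caller raise the expected "No downloadable video found" error in one place rather than failing with a `KeyError` on pages that lack the list.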
headers={'Referer': self._LOGIN_URL})
# login succeeded
- if 'platzi.com/login' not in compat_str(urlh.geturl()):
+ if 'platzi.com/login' not in urlh.geturl():
return
login_error = self._webpage_read_content(
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ extract_attributes,
+ int_or_none,
+ js_to_json,
+ merge_dicts,
+)
+
+
+class PokemonIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/(?:[^/]+/)+(?P<display_id>[^/?#&]+))'
+ _TESTS = [{
+ 'url': 'https://www.pokemon.com/us/pokemon-episodes/20_30-the-ol-raise-and-switch/',
+ 'md5': '2fe8eaec69768b25ef898cda9c43062e',
+ 'info_dict': {
+ 'id': 'afe22e30f01c41f49d4f1d9eab5cd9a4',
+ 'ext': 'mp4',
+ 'title': 'The Ol’ Raise and Switch!',
+ 'description': 'md5:7db77f7107f98ba88401d3adc80ff7af',
+ },
+ 'add_id': ['LimelightMedia'],
+ }, {
+ # no data-video-title
+ 'url': 'https://www.pokemon.com/fr/episodes-pokemon/films-pokemon/pokemon-lascension-de-darkrai-2008',
+ 'info_dict': {
+ 'id': 'dfbaf830d7e54e179837c50c0c6cc0e1',
+ 'ext': 'mp4',
+ 'title': "Pokémon : L'ascension de Darkrai",
+ 'description': 'md5:d1dbc9e206070c3e14a06ff557659fb5',
+ },
+ 'add_id': ['LimelightMedia'],
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ 'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ video_id, display_id = re.match(self._VALID_URL, url).groups()
+ webpage = self._download_webpage(url, video_id or display_id)
+ video_data = extract_attributes(self._search_regex(
+ r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
+ webpage, 'video data element'))
+ video_id = video_data['data-video-id']
+ title = video_data.get('data-video-title') or self._html_search_meta(
+ 'pkm-title', webpage, 'title', default=None) or self._search_regex(
+ r'<h1[^>]+\bclass=["\']us-title[^>]+>([^<]+)', webpage, 'title')
+ return {
+ '_type': 'url_transparent',
+ 'id': video_id,
+ 'url': 'limelight:media:%s' % video_id,
+ 'title': title,
+ 'description': video_data.get('data-video-summary'),
+ 'thumbnail': video_data.get('data-video-poster'),
+ 'series': 'Pokémon',
+ 'season_number': int_or_none(video_data.get('data-video-season')),
+ 'episode': title,
+ 'episode_number': int_or_none(video_data.get('data-video-episode')),
+ 'ie_key': 'LimelightMedia',
+ }
+
+
+class PokemonWatchIE(InfoExtractor):
+ _VALID_URL = r'https?://watch\.pokemon\.com/[a-z]{2}-[a-z]{2}/player\.html\?id=(?P<id>[a-z0-9]{32})'
+ _API_URL = 'https://www.pokemon.com/api/pokemontv/v2/channels/{0:}'
+ _TESTS = [{
+ 'url': 'https://watch.pokemon.com/en-us/player.html?id=8309a40969894a8e8d5bc1311e9c5667',
+ 'md5': '62833938a31e61ab49ada92f524c42ff',
+ 'info_dict': {
+ 'id': '8309a40969894a8e8d5bc1311e9c5667',
+ 'ext': 'mp4',
+ 'title': 'Lillier and the Staff!',
+ 'description': 'md5:338841b8c21b283d24bdc9b568849f04',
+ }
+ }, {
+ 'url': 'https://watch.pokemon.com/de-de/player.html?id=b3c402e111a4459eb47e12160ab0ba07',
+ 'only_matching': True
+ }]
+
+ def _extract_media(self, channel_array, video_id):
+ for channel in channel_array:
+ for media in channel.get('media') or []:
+ if media.get('id') == video_id:
+ return media
+ return None
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ info = {
+ '_type': 'url',
+ 'id': video_id,
+ 'url': 'limelight:media:%s' % video_id,
+ 'ie_key': 'LimelightMedia',
+ }
+
+ # API call can be avoided entirely if we are listing formats
+ if self._downloader.params.get('listformats', False):
+ return info
+
+ webpage = self._download_webpage(url, video_id)
+ build_vars = self._parse_json(self._search_regex(
+ r'(?s)buildVars\s*=\s*({.*?})', webpage, 'build vars'),
+ video_id, transform_source=js_to_json)
+ region = build_vars.get('region')
+ channel_array = self._download_json(self._API_URL.format(region), video_id)
+ video_data = self._extract_media(channel_array, video_id)
+
+ if video_data is None:
+ raise ExtractorError(
+ 'Video %s does not exist' % video_id, expected=True)
+
+ info['_type'] = 'url_transparent'
+ images = video_data.get('images') or {}
+
+ return merge_dicts(info, {
+ 'title': video_data.get('title'),
+ 'description': video_data.get('description'),
+ 'thumbnail': images.get('medium') or images.get('small'),
+ 'series': 'Pokémon',
+ 'season_number': int_or_none(video_data.get('season')),
+ 'episode': video_data.get('title'),
+ 'episode_number': int_or_none(video_data.get('episode')),
+ })
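Review note: `PokemonWatchIE` searches every channel's `media` list for the requested id and then reads a thumbnail from `images`. Both lookups can be made tolerant of missing keys, as in this sketch. `find_media` and `pick_thumbnail` are hypothetical names, and the channel data is invented for illustration.

```python
def find_media(channels, video_id):
    """Return the media dict whose 'id' equals video_id, or None."""
    for channel in channels or []:
        # A channel without a 'media' list is skipped rather than crashing.
        for media in channel.get('media') or []:
            if media.get('id') == video_id:
                return media
    return None


def pick_thumbnail(media):
    """Prefer the medium image and fall back to the small one."""
    images = media.get('images') or {}
    return images.get('medium') or images.get('small')


# Hypothetical API response with one empty channel and one match.
channels = [
    {'name': 'empty'},
    {'media': [{'id': 'abc', 'images': {'small': 'https://example.com/s.jpg'}}]},
]
media = find_media(channels, 'abc')
print(pick_thumbnail(media))
```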
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import (
+ compat_b64decode,
+ compat_chr,
+)
+from ..utils import int_or_none
+
+
+class PopcorntimesIE(InfoExtractor):
+ _VALID_URL = r'https?://popcorntimes\.tv/[^/]+/m/(?P<id>[^/]+)/(?P<display_id>[^/?#&]+)'
+ _TEST = {
+ 'url': 'https://popcorntimes.tv/de/m/A1XCFvz/haensel-und-gretel-opera-fantasy',
+ 'md5': '93f210991ad94ba8c3485950a2453257',
+ 'info_dict': {
+ 'id': 'A1XCFvz',
+ 'display_id': 'haensel-und-gretel-opera-fantasy',
+ 'ext': 'mp4',
+ 'title': 'Hänsel und Gretel',
+ 'description': 'md5:1b8146791726342e7b22ce8125cf6945',
+ 'thumbnail': r're:^https?://.*\.jpg$',
+ 'creator': 'John Paul',
+ 'release_date': '19541009',
+ 'duration': 4260,
+ 'tbr': 5380,
+ 'width': 720,
+ 'height': 540,
+ },
+ }
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ video_id, display_id = mobj.group('id', 'display_id')
+
+ webpage = self._download_webpage(url, display_id)
+
+ title = self._search_regex(
+ r'<h1>([^<]+)', webpage, 'title',
+ default=None) or self._html_search_meta(
+ 'ya:ovs:original_name', webpage, 'title', fatal=True)
+
+ loc = self._search_regex(
+ r'PCTMLOC\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1', webpage, 'loc',
+ group='value')
+
+ loc_b64 = ''
+ for c in loc:
+ c_ord = ord(c)
+ if ord('a') <= c_ord <= ord('z') or ord('A') <= c_ord <= ord('Z'):
+ upper = ord('Z') if c_ord <= ord('Z') else ord('z')
+ c_ord += 13
+ if upper < c_ord:
+ c_ord -= 26
+ loc_b64 += compat_chr(c_ord)
+
+ video_url = compat_b64decode(loc_b64).decode('utf-8')
+
+ description = self._html_search_regex(
+ r'(?s)<div[^>]+class=["\']pt-movie-desc[^>]+>(.+?)</div>', webpage,
+ 'description', fatal=False)
+
+ thumbnail = self._search_regex(
+ r'<img[^>]+class=["\']video-preview[^>]+\bsrc=(["\'])(?P<value>(?:(?!\1).)+)\1',
+ webpage, 'thumbnail', default=None,
+ group='value') or self._og_search_thumbnail(webpage)
+
+ creator = self._html_search_meta(
+ 'video:director', webpage, 'creator', default=None)
+
+ release_date = self._html_search_meta(
+ 'video:release_date', webpage, default=None)
+ if release_date:
+ release_date = release_date.replace('-', '')
+
+ def int_meta(name):
+ return int_or_none(self._html_search_meta(
+ name, webpage, default=None))
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'url': video_url,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': thumbnail,
+ 'creator': creator,
+ 'release_date': release_date,
+ 'duration': int_meta('video:duration'),
+ 'tbr': int_meta('ya:ovs:bitrate'),
+ 'width': int_meta('og:video:width'),
+ 'height': int_meta('og:video:height'),
+ 'http_headers': {
+ 'Referer': url,
+ },
+ }
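Review note: the character loop in `PopcorntimesIE` is a ROT13 rotation over ASCII letters, followed by a base64 decode. On Python 3 the same transformation can be sketched with the standard library alone, as below. `decode_loc` and `encode_loc` are hypothetical helpers and the sample URL is made up; the extractor itself keeps the manual loop and `compat_b64decode`, presumably for Python 2 compatibility.

```python
import base64
import codecs


def decode_loc(loc):
    """Undo ROT13 on the letters, then base64-decode the result."""
    # ROT13 only rotates A-Z and a-z; digits, '+', '/' and '=' pass through.
    rotated = codecs.decode(loc, 'rot_13')
    return base64.b64decode(rotated).decode('utf-8')


def encode_loc(url):
    """Inverse of decode_loc, used here only to build a sample value."""
    b64 = base64.b64encode(url.encode('utf-8')).decode('ascii')
    return codecs.encode(b64, 'rot_13')


sample = encode_loc('https://example.com/video.mp4')
print(decode_loc(sample))
```

Because ROT13 is its own inverse, the same codec call serves for both directions; only the order of the base64 step changes.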
ExtractorError,
int_or_none,
js_to_json,
+ merge_dicts,
urljoin,
)
'view_count': int,
'like_count': int,
'age_limit': 18,
- }
+ },
+ 'skip': 'HTTP Error 404: Not Found',
}, {
- # removed video
'url': 'http://www.pornhd.com/videos/1962/sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
- 'md5': '956b8ca569f7f4d8ec563e2c41598441',
+ 'md5': '1b7b3a40b9d65a8e5b25f7ab9ee6d6de',
'info_dict': {
'id': '1962',
'display_id': 'sierra-day-gets-his-cum-all-over-herself-hd-porn-video',
'ext': 'mp4',
- 'title': 'Sierra loves doing laundry',
+ 'title': 'md5:98c6f8b2d9c229d0f0fde47f61a1a759',
'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
'thumbnail': r're:^https?://.*\.jpg',
'view_count': int,
'like_count': int,
'age_limit': 18,
},
- 'skip': 'Not available anymore',
}]
def _real_extract(self, url):
r"(?s)sources'?\s*[:=]\s*(\{.+?\})",
webpage, 'sources', default='{}')), video_id)
+ info = {}
if not sources:
+ entries = self._parse_html5_media_entries(url, webpage, video_id)
+ if entries:
+ info = entries[0]
+
+ if not sources and not info:
message = self._html_search_regex(
r'(?s)<(div|p)[^>]+class="no-video"[^>]*>(?P<value>.+?)</\1',
webpage, 'error message', group='value')
'format_id': format_id,
'height': height,
})
- self._sort_formats(formats)
+ if formats:
+ info['formats'] = formats
+ self._sort_formats(info['formats'])
description = self._html_search_regex(
- r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1',
- webpage, 'description', fatal=False, group='value')
+ (r'(?s)<section[^>]+class=["\']video-description[^>]+>(?P<value>.+?)</section>',
+ r'<(div|p)[^>]+class="description"[^>]*>(?P<value>[^<]+)</\1'),
+ webpage, 'description', fatal=False,
+ group='value') or self._html_search_meta(
+ 'description', webpage, default=None) or self._og_search_description(webpage)
view_count = int_or_none(self._html_search_regex(
r'(\d+) views\s*<', webpage, 'view count', fatal=False))
thumbnail = self._search_regex(
r"poster'?\s*:\s*([\"'])(?P<url>(?:(?!\1).)+)\1", webpage,
- 'thumbnail', fatal=False, group='url')
+ 'thumbnail', default=None, group='url')
like_count = int_or_none(self._search_regex(
- (r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
+ (r'(\d+)</span>\s*likes',
+ r'(\d+)\s*</11[^>]+>(?:&nbsp;|\s)*\blikes',
r'class=["\']save-count["\'][^>]*>\s*(\d+)'),
webpage, 'like count', fatal=False))
- return {
+ return merge_dicts(info, {
'id': video_id,
'display_id': display_id,
'title': title,
'like_count': like_count,
'formats': formats,
'age_limit': 18,
- }
+ })
determine_ext,
ExtractorError,
int_or_none,
+ NO_DEFAULT,
orderedSet,
remove_quotes,
str_to_int,
_VALID_URL = r'''(?x)
https?://
(?:
- (?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
+ (?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:view_video\.php|video/show)\?viewkey=|embed/)|
(?:www\.)?thumbzilla\.com/video/
)
(?P<id>[\da-z]+)
}, {
'url': 'https://www.pornhub.net/view_video.php?viewkey=203640933',
'only_matching': True,
+ }, {
+ 'url': 'https://www.pornhubpremium.com/view_video.php?viewkey=ph5e4acdae54a82',
+ 'only_matching': True,
}]
@staticmethod
host = mobj.group('host') or 'pornhub.com'
video_id = mobj.group('id')
+ if 'premium' in host:
+ if not self._downloader.params.get('cookiefile'):
+ raise ExtractorError(
+ 'PornHub Premium requires authentication.'
+ ' You may want to use --cookies.',
+ expected=True)
+
self._set_cookie(host, 'age_verified', '1')
def dl_webpage(platform):
# http://www.pornhub.com/view_video.php?viewkey=1331683002), not relying
# on that anymore.
title = self._html_search_meta(
- 'twitter:title', webpage, default=None) or self._search_regex(
- (r'<h1[^>]+class=["\']title["\'][^>]*>(?P<title>[^<]+)',
- r'<div[^>]+data-video-title=(["\'])(?P<title>.+?)\1',
- r'shareTitle\s*=\s*(["\'])(?P<title>.+?)\1'),
+ 'twitter:title', webpage, default=None) or self._html_search_regex(
+ (r'(?s)<h1[^>]+class=["\']title["\'][^>]*>(?P<title>.+?)</h1>',
+ r'<div[^>]+data-video-title=(["\'])(?P<title>(?:(?!\1).)+)\1',
+ r'shareTitle["\']\s*[=:]\s*(["\'])(?P<title>(?:(?!\1).)+)\1'),
webpage, 'title', group='title')
video_urls = []
else:
thumbnail, duration = [None] * 2
- if not video_urls:
- tv_webpage = dl_webpage('tv')
-
+ def extract_js_vars(webpage, pattern, default=NO_DEFAULT):
assignments = self._search_regex(
- r'(var.+?mediastring.+?)</script>', tv_webpage,
- 'encoded url').split(';')
+ pattern, webpage, 'encoded url', default=default)
+ if not assignments:
+ return {}
+
+ assignments = assignments.split(';')
js_vars = {}
assn = re.sub(r'var\s+', '', assn)
vname, value = assn.split('=', 1)
js_vars[vname] = parse_js_value(value)
+ return js_vars
- video_url = js_vars['mediastring']
- if video_url not in video_urls_set:
- video_urls.append((video_url, None))
- video_urls_set.add(video_url)
+ def add_video_url(video_url):
+ v_url = url_or_none(video_url)
+ if not v_url:
+ return
+ if v_url in video_urls_set:
+ return
+ video_urls.append((v_url, None))
+ video_urls_set.add(v_url)
+
+ if not video_urls:
+ FORMAT_PREFIXES = ('media', 'quality')
+ js_vars = extract_js_vars(
+ webpage, r'(var\s+(?:%s)_.+)' % '|'.join(FORMAT_PREFIXES),
+ default=None)
+ if js_vars:
+ for key, format_url in js_vars.items():
+ if any(key.startswith(p) for p in FORMAT_PREFIXES):
+ add_video_url(format_url)
+ if not video_urls and re.search(
+ r'<[^>]+\bid=["\']lockedPlayer', webpage):
+ raise ExtractorError(
+ 'Video %s is locked' % video_id, expected=True)
+
+ if not video_urls:
+ js_vars = extract_js_vars(
+ dl_webpage('tv'), r'(var.+?mediastring.+?)</script>')
+ add_video_url(js_vars['mediastring'])
for mobj in re.finditer(
r'<a[^>]+\bclass=["\']downloadBtn\b[^>]+\bhref=(["\'])(?P<url>(?:(?!\1).)+)\1',
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
if upload_date:
upload_date = upload_date.replace('/', '')
- if determine_ext(video_url) == 'mpd':
+ ext = determine_ext(video_url)
+ if ext == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
continue
+ elif ext == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ video_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+ continue
tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj:
class PornHubUserIE(PornHubPlaylistBaseIE):
- _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?pornhub\.(?:com|net)/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
+ _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/?#&]+))(?:[?#&]|/(?!videos)|$)'
_TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph',
'playlist_mincount': 118,
class PornHubPagedVideoListIE(PornHubPagedPlaylistBaseIE):
- _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
+ _VALID_URL = r'https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?P<id>(?:[^/]+/)*[^/?#&]+)'
_TESTS = [{
'url': 'https://www.pornhub.com/model/zoe_ph/videos',
'only_matching': True,
class PornHubUserVideosUploadIE(PornHubPagedPlaylistBaseIE):
- _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
+ _VALID_URL = r'(?P<url>https?://(?:[^/]+\.)?(?P<host>pornhub(?:premium)?\.(?:com|net))/(?:(?:user|channel)s|model|pornstar)/(?P<id>[^/]+)/videos/upload)'
_TESTS = [{
'url': 'https://www.pornhub.com/pornstar/jenny-blighe/videos/upload',
'info_dict': {
determine_ext,
float_or_none,
int_or_none,
+ merge_dicts,
unified_strdate,
)
class ProSiebenSat1BaseIE(InfoExtractor):
- _GEO_COUNTRIES = ['DE']
+ _GEO_BYPASS = False
_ACCESS_ID = None
_SUPPORTED_PROTOCOLS = 'dash:clear,hls:clear,progressive:clear'
_V4_BASE_URL = 'https://vas-v4.p7s1video.net/4.0/get'
formats = []
if self._ACCESS_ID:
raw_ct = self._ENCRYPTION_KEY + clip_id + self._IV + self._ACCESS_ID
- server_token = (self._download_json(
+ protocols = self._download_json(
self._V4_BASE_URL + 'protocols', clip_id,
'Downloading protocols JSON',
headers=self.geo_verification_headers(), query={
'access_id': self._ACCESS_ID,
'client_token': sha1((raw_ct).encode()).hexdigest(),
'video_id': clip_id,
- }, fatal=False) or {}).get('server_token')
+ }, fatal=False, expected_status=(403,)) or {}
+ error = protocols.get('error') or {}
+ if error.get('title') == 'Geo check failed':
+ self.raise_geo_restricted(countries=['AT', 'CH', 'DE'])
+ server_token = protocols.get('server_token')
if server_token:
urls = (self._download_json(
self._V4_BASE_URL + 'urls', clip_id, 'Downloading urls JSON', query={
(?:
(?:beta\.)?
(?:
- prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|7tv|advopedia
+ prosieben(?:maxx)?|sixx|sat1(?:gold)?|kabeleins(?:doku)?|the-voice-of-germany|advopedia
)\.(?:de|at|ch)|
ran\.de|fem\.com|advopedia\.de|galileo\.tv/video
)
'info_dict': {
'id': '2104602',
'ext': 'mp4',
- 'title': 'Episode 18 - Staffel 2',
+ 'title': 'CIRCUS HALLIGALLI - Episode 18 - Staffel 2',
'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
'upload_date': '20131231',
'duration': 5845.04,
+ 'series': 'CIRCUS HALLIGALLI',
+ 'season_number': 2,
+ 'episode': 'Episode 18 - Staffel 2',
+ 'episode_number': 18,
},
},
{
'info_dict': {
'id': '2572814',
'ext': 'mp4',
- 'title': 'Andreas Kümmert: Rocket Man',
+ 'title': 'The Voice of Germany - Andreas Kümmert: Rocket Man',
'description': 'md5:6ddb02b0781c6adf778afea606652e38',
+ 'timestamp': 1382041620,
'upload_date': '20131017',
'duration': 469.88,
},
},
},
{
- 'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
+ 'url': 'http://www.fem.com/videos/beauty-lifestyle/kurztrips-zum-valentinstag',
'info_dict': {
'id': '2156342',
'ext': 'mp4',
'playlist_count': 2,
'skip': 'This video is unavailable',
},
- {
- 'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
- 'info_dict': {
- 'id': '4187506',
- 'ext': 'mp4',
- 'title': 'Best of Circus HalliGalli',
- 'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
- 'upload_date': '20151229',
- },
- 'params': {
- 'skip_download': True,
- },
- },
{
# title in <h2 class="subtitle">
'url': 'http://www.prosieben.de/stars/oscar-award/videos/jetzt-erst-enthuellt-das-geheimnis-von-emma-stones-oscar-robe-clip',
r'<div[^>]+id="veeseoDescription"[^>]*>(.+?)</div>',
]
_UPLOAD_DATE_REGEXES = [
- r'<meta property="og:published_time" content="(.+?)">',
r'<span>\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}) \|\s*<span itemprop="duration"',
r'<footer>\s*(\d{2}\.\d{2}\.\d{4}) \d{2}:\d{2} Uhr',
r'<span style="padding-left: 4px;line-height:20px; color:#404040">(\d{2}\.\d{2}\.\d{4})</span>',
if description is None:
description = self._og_search_description(webpage)
thumbnail = self._og_search_thumbnail(webpage)
- upload_date = unified_strdate(self._html_search_regex(
- self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
+ upload_date = unified_strdate(
+ self._html_search_meta('og:published_time', webpage,
+ 'upload date', default=None)
+ or self._html_search_regex(self._UPLOAD_DATE_REGEXES,
+ webpage, 'upload date', default=None))
+
+ json_ld = self._search_json_ld(webpage, clip_id, default={})
- info.update({
+ return merge_dicts(info, {
'id': clip_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'upload_date': upload_date,
- })
- return info
+ }, json_ld)
def _extract_playlist(self, url, webpage):
playlist_id = self._html_search_regex(
urls = []
formats = []
- def add_http_from_hls(m3u8_f):
- http_url = m3u8_f['url'].replace('/hls/', '/mp4/').replace('/chunklist.m3u8', '.mp4')
- if http_url != m3u8_f['url']:
- f = m3u8_f.copy()
- f.update({
- 'format_id': f['format_id'].replace('hls', 'http'),
- 'protocol': 'http',
- 'url': http_url,
- })
- formats.append(f)
-
for video in videos['data']['videos']:
media_url = url_or_none(video.get('url'))
if not media_url or media_url in urls:
playlist = video.get('is_playlist')
if (video.get('stream_type') == 'hls' and playlist is True) or 'playlist.m3u8' in media_url:
- m3u8_formats = self._extract_m3u8_formats(
+ formats.extend(self._extract_m3u8_formats(
media_url, video_id, 'mp4', entry_protocol='m3u8_native',
- m3u8_id='hls', fatal=False)
- for m3u8_f in m3u8_formats:
- formats.append(m3u8_f)
- add_http_from_hls(m3u8_f)
+ m3u8_id='hls', fatal=False))
continue
quality = int_or_none(video.get('quality'))
format_id += '-%sp' % quality
f['format_id'] = format_id
formats.append(f)
- if is_hls:
- add_http_from_hls(f)
self._sort_formats(formats)
creator = try_get(
+# coding: utf-8
from __future__ import unicode_literals
import re
parse_duration,
strip_or_none,
try_get,
- unescapeHTML,
unified_strdate,
unified_timestamp,
update_url_query,
_UUID_RE = r'[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}'
_GEO_COUNTRIES = ['IT']
_GEO_BYPASS = False
+ _BASE_URL = 'https://www.raiplay.it'
def _extract_relinker_info(self, relinker_url, video_id):
if not re.match(r'https?://', relinker_url):
class RaiPlayIE(RaiBaseIE):
- _VALID_URL = r'(?P<url>https?://(?:www\.)?raiplay\.it/.+?-(?P<id>%s)\.html)' % RaiBaseIE._UUID_RE
+ _VALID_URL = r'(?P<url>(?P<base>https?://(?:www\.)?raiplay\.it/.+?-)(?P<id>%s)(?P<ext>\.(?:html|json)))' % RaiBaseIE._UUID_RE
_TESTS = [{
- 'url': 'http://www.raiplay.it/video/2016/10/La-Casa-Bianca-e06118bb-59a9-4636-b914-498e4cfd2c66.html?source=twitter',
- 'md5': '340aa3b7afb54bfd14a8c11786450d76',
- 'info_dict': {
- 'id': 'e06118bb-59a9-4636-b914-498e4cfd2c66',
- 'ext': 'mp4',
- 'title': 'La Casa Bianca',
- 'alt_title': 'S2016 - Puntata del 23/10/2016',
- 'description': 'md5:a09d45890850458077d1f68bb036e0a5',
- 'thumbnail': r're:^https?://.*\.jpg$',
- 'uploader': 'Rai 3',
- 'creator': 'Rai 3',
- 'duration': 3278,
- 'timestamp': 1477764300,
- 'upload_date': '20161029',
- 'series': 'La Casa Bianca',
- 'season': '2016',
- },
- }, {
'url': 'http://www.raiplay.it/video/2014/04/Report-del-07042014-cb27157f-9dd0-4aee-b788-b1f67643a391.html',
'md5': '8970abf8caf8aef4696e7b1f2adfc696',
'info_dict': {
'id': 'cb27157f-9dd0-4aee-b788-b1f67643a391',
'ext': 'mp4',
'title': 'Report del 07/04/2014',
- 'alt_title': 'S2013/14 - Puntata del 07/04/2014',
- 'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
+ 'alt_title': 'St 2013/14 - Espresso nel caffè - 07/04/2014 ',
+ 'description': 'md5:d730c168a58f4bb35600fc2f881ec04e',
'thumbnail': r're:^https?://.*\.jpg$',
- 'uploader': 'Rai 5',
- 'creator': 'Rai 5',
+ 'uploader': 'Rai Gulp',
'duration': 6160,
- 'series': 'Report',
- 'season_number': 5,
- 'season': '2013/14',
},
'params': {
'skip_download': True,
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
- url, video_id = mobj.group('url', 'id')
+ url, base, video_id, ext = mobj.group('url', 'base', 'id', 'ext')
media = self._download_json(
- '%s?json' % url, video_id, 'Downloading video JSON')
+ '%s%s.json' % (base, video_id), video_id, 'Downloading video JSON')
title = media['name']
-
video = media['video']
- relinker_info = self._extract_relinker_info(video['contentUrl'], video_id)
+ relinker_info = self._extract_relinker_info(video['content_url'], video_id)
self._sort_formats(relinker_info['formats'])
thumbnails = []
for _, value in media.get('images').items():
if value:
thumbnails.append({
- 'url': value.replace('[RESOLUTION]', '600x400')
+ 'url': urljoin(RaiBaseIE._BASE_URL, value.replace('[RESOLUTION]', '600x400'))
})
timestamp = unified_timestamp(try_get(
'display_id': 'rainews24',
'ext': 'mp4',
'title': 're:^Diretta di Rai News 24 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
- 'description': 'md5:6eca31500550f9376819f174e5644754',
+ 'description': 'md5:4d00bcf6dc98b27c6ec480de329d1497',
'uploader': 'Rai News 24',
'creator': 'Rai News 24',
'is_live': True,
def _real_extract(self, url):
display_id = self._match_id(url)
- webpage = self._download_webpage(url, display_id)
+ media = self._download_json(
+ '%s.json' % urljoin(RaiBaseIE._BASE_URL, 'dirette/' + display_id),
+ display_id, 'Downloading channel JSON')
+
+ title = media['name']
+ video = media['video']
+ video_id = media['id'].replace('ContentItem-', '')
- video_id = self._search_regex(
- r'data-uniquename=["\']ContentItem-(%s)' % RaiBaseIE._UUID_RE,
- webpage, 'content id')
+ relinker_info = self._extract_relinker_info(video['content_url'], video_id)
+ self._sort_formats(relinker_info['formats'])
- return {
- '_type': 'url_transparent',
- 'ie_key': RaiPlayIE.ie_key(),
- 'url': 'http://www.raiplay.it/dirette/ContentItem-%s.html' % video_id,
+ info = {
'id': video_id,
'display_id': display_id,
+ 'title': self._live_title(title) if relinker_info.get(
+ 'is_live') else title,
+ 'alt_title': media.get('subtitle'),
+ 'description': media.get('description'),
+ 'uploader': strip_or_none(media.get('channel')),
+ 'creator': strip_or_none(media.get('editor')),
+ 'duration': parse_duration(video.get('duration')),
}
+ info.update(relinker_info)
+ return info
+
class RaiPlayPlaylistIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?raiplay\.it/programmi/(?P<id>[^/?#&]+)'
'info_dict': {
'id': 'nondirloalmiocapo',
'title': 'Non dirlo al mio capo',
- 'description': 'md5:9f3d603b2947c1c7abb098f3b14fac86',
+ 'description': 'md5:98ab6b98f7f44c2843fd7d6f045f153b',
},
'playlist_mincount': 12,
}]
def _real_extract(self, url):
playlist_id = self._match_id(url)
- webpage = self._download_webpage(url, playlist_id)
+ media = self._download_json(
+ '%s.json' % urljoin(RaiBaseIE._BASE_URL, 'programmi/' + playlist_id),
+ playlist_id, 'Downloading program JSON')
- title = self._html_search_meta(
- ('programma', 'nomeProgramma'), webpage, 'title')
- description = unescapeHTML(self._html_search_meta(
- ('description', 'og:description'), webpage, 'description'))
+ title = media['name']
+ description = media['program_info']['description']
+
+ content_sets = [s['id'] for b in media['blocks'] for s in b['sets']]
entries = []
- for mobj in re.finditer(
- r'<a\b[^>]+\bhref=(["\'])(?P<path>/raiplay/video/.+?)\1',
- webpage):
- video_url = urljoin(url, mobj.group('path'))
- entries.append(self.url_result(
- video_url, ie=RaiPlayIE.ie_key(),
- video_id=RaiPlayIE._match_id(video_url)))
+ for cs in content_sets:
+ medias = self._download_json(
+ '%s/%s.json' % (urljoin(RaiBaseIE._BASE_URL, 'programmi/' + playlist_id), cs),
+ cs, 'Downloading content set JSON')
+ for m in medias['items']:
+ video_url = urljoin(url, m['path_id'])
+ entries.append(self.url_result(
+ video_url, ie=RaiPlayIE.ie_key(),
+ video_id=RaiPlayIE._match_id(video_url)))
return self.playlist_result(entries, playlist_id, title, description)
}, {
# with ContentItem in og:url
'url': 'http://www.rai.it/dl/RaiTV/programmi/media/ContentItem-efb17665-691c-45d5-a60c-5301333cbb0c.html',
- 'md5': '11959b4e44fa74de47011b5799490adf',
+ 'md5': '6865dd00cf0bbf5772fdd89d59bd768a',
'info_dict': {
'id': 'efb17665-691c-45d5-a60c-5301333cbb0c',
'ext': 'mp4',
'duration': 2214,
'upload_date': '20161103',
}
- }, {
- # drawMediaRaiTV(...)
- 'url': 'http://www.report.rai.it/dl/Report/puntata/ContentItem-0c7a664b-d0f4-4b2c-8835-3f82e46f433e.html',
- 'md5': '2dd727e61114e1ee9c47f0da6914e178',
- 'info_dict': {
- 'id': '59d69d28-6bb6-409d-a4b5-ed44096560af',
- 'ext': 'mp4',
- 'title': 'Il pacco',
- 'description': 'md5:4b1afae1364115ce5d78ed83cd2e5b3a',
- 'thumbnail': r're:^https?://.*\.jpg$',
- 'upload_date': '20141221',
- },
}, {
# initEdizione('ContentItem-...'
'url': 'http://www.tg1.rai.it/dl/tg1/2010/edizioni/ContentSet-9b6e0cba-4bef-4aef-8cf0-9f7f665b7dfb-tg1.html?item=undefined',
'upload_date': '20170401',
},
'skip': 'Changes daily',
- }, {
- # HDS live stream with only relinker URL
- 'url': 'http://www.rai.tv/dl/RaiTV/dirette/PublishingBlock-1912dbbf-3f96-44c3-b4cf-523681fbacbc.html?channel=EuroNews',
- 'info_dict': {
- 'id': '1912dbbf-3f96-44c3-b4cf-523681fbacbc',
- 'ext': 'flv',
- 'title': 'EuroNews',
- },
- 'params': {
- 'skip_download': True,
- },
}, {
# HLS live stream with ContentItem in og:url
'url': 'http://www.rainews.it/dl/rainews/live/ContentItem-3156f2f2-dc70-4953-8e2f-70d7489d4ce9.html',
from .common import InfoExtractor
from ..utils import (
+ determine_ext,
ExtractorError,
int_or_none,
merge_dicts,
webpage = self._download_webpage(
'http://www.redtube.com/%s' % video_id, video_id)
- if any(s in webpage for s in ['video-deleted-info', '>This video has been removed']):
- raise ExtractorError('Video %s has been removed' % video_id, expected=True)
+ ERRORS = (
+ (('video-deleted-info', '>This video has been removed'), 'has been removed'),
+ (('private_video_text', '>This video is private', '>Send a friend request to its owner to be able to view it'), 'is private'),
+ )
+
+ for patterns, message in ERRORS:
+ if any(p in webpage for p in patterns):
+ raise ExtractorError(
+ 'Video %s %s' % (video_id, message), expected=True)
info = self._search_json_ld(webpage, video_id, default={})
if not info.get('title'):
info['title'] = self._html_search_regex(
- (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
+ (r'<h(\d)[^>]+class="(?:video_title_text|videoTitle|video_title)[^"]*">(?P<title>(?:(?!\1).)+)</h\1>',
r'(?:videoTitle|title)\s*:\s*(["\'])(?P<title>(?:(?!\1).)+)\1',),
webpage, 'title', group='title',
default=None) or self._og_search_title(webpage)
})
medias = self._parse_json(
self._search_regex(
- r'mediaDefinition\s*:\s*(\[.+?\])', webpage,
+ r'mediaDefinition["\']?\s*:\s*(\[.+?}\s*\])', webpage,
'media definitions', default='{}'),
video_id, fatal=False)
if medias and isinstance(medias, list):
format_url = url_or_none(media.get('videoUrl'))
if not format_url:
continue
+ if media.get('format') == 'hls' or determine_ext(format_url) == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ format_url, video_id, 'mp4',
+ entry_protocol='m3u8_native', m3u8_id='hls',
+ fatal=False))
+ continue
format_id = media.get('quality')
formats.append({
'url': format_url,
https?://(?:(?:www|static)\.)?
(?:
rtlxl\.nl/[^\#]*\#!/[^/]+/|
+ rtlxl\.nl/programma/[^/]+/|
rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/)
)
(?P<id>[0-9a-f-]+)'''
_TESTS = [{
+ 'url': 'https://www.rtlxl.nl/programma/rtl-nieuws/0bd1384d-d970-3086-98bb-5c104e10c26f',
+ 'md5': '490428f1187b60d714f34e1f2e3af0b6',
+ 'info_dict': {
+ 'id': '0bd1384d-d970-3086-98bb-5c104e10c26f',
+ 'ext': 'mp4',
+ 'title': 'RTL Nieuws',
+ 'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
+ 'timestamp': 1593293400,
+ 'upload_date': '20200627',
+ 'duration': 661.08,
+ },
+ }, {
+ # old URL pattern; this test no longer passes
'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/82b1aad1-4a14-3d7b-b554-b0aed1b2c416',
'md5': '473d1946c1fdd050b2c0161a4b13c373',
'info_dict': {
is_live = video_type == 'live'
json_data = self._download_json(
- 'http://player.rutv.ru/iframe/data%s/id/%s' % ('live' if is_live else 'video', video_id),
+ 'http://player.vgtrk.com/iframe/data%s/id/%s' % ('live' if is_live else 'video', video_id),
video_id, 'Downloading JSON')
if json_data['errors']:
from ..compat import (
compat_parse_qs,
- compat_str,
compat_urlparse,
)
from ..utils import (
'Downloading login page')
def is_logged(urlh):
- return 'learning.oreilly.com/home/' in compat_str(urlh.geturl())
+ return 'learning.oreilly.com/home/' in urlh.geturl()
if is_logged(urlh):
self.LOGGED_IN = True
return
- redirect_url = compat_str(urlh.geturl())
+ redirect_url = urlh.geturl()
parsed_url = compat_urlparse.urlparse(redirect_url)
qs = compat_parse_qs(parsed_url.query)
next_uri = compat_urlparse.urljoin(
kaltura_session = self._download_json(
'%s/player/kaltura_session/?reference_id=%s' % (self._API_BASE, reference_id),
video_id, 'Downloading kaltura session JSON',
- 'Unable to download kaltura session JSON', fatal=False)
+ 'Unable to download kaltura session JSON', fatal=False,
+ headers={'Accept': 'application/json'})
if kaltura_session:
session = kaltura_session.get('session')
if session:
from .aws import AWSIE
from .anvato import AnvatoIE
+from .common import InfoExtractor
from ..utils import (
smuggle_url,
urlencode_postdata,
'anvato:anvato_scripps_app_web_prod_0837996dbe373629133857ae9eb72e740424d80a:%s' % mcp_id,
{'geo_countries': ['US']}),
AnvatoIE.ie_key(), video_id=mcp_id)
+
+
+class ScrippsNetworksIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?(?P<site>cookingchanneltv|discovery|(?:diy|food)network|hgtv|travelchannel)\.com/videos/[0-9a-z-]+-(?P<id>\d+)'
+ _TESTS = [{
+ 'url': 'https://www.cookingchanneltv.com/videos/the-best-of-the-best-0260338',
+ 'info_dict': {
+ 'id': '0260338',
+ 'ext': 'mp4',
+ 'title': 'The Best of the Best',
+ 'description': 'Catch a new episode of MasterChef Canada Tuedsay at 9/8c.',
+ 'timestamp': 1475678834,
+ 'upload_date': '20161005',
+ 'uploader': 'SCNI-SCND',
+ },
+ 'add_ie': ['ThePlatform'],
+ }, {
+ 'url': 'https://www.diynetwork.com/videos/diy-barnwood-tablet-stand-0265790',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.foodnetwork.com/videos/chocolate-strawberry-cake-roll-7524591',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.hgtv.com/videos/cookie-decorating-101-0301929',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.travelchannel.com/videos/two-climates-one-bag-5302184',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.discovery.com/videos/guardians-of-the-glades-cooking-with-tom-cobb-5578368',
+ 'only_matching': True,
+ }]
+ _ACCOUNT_MAP = {
+ 'cookingchanneltv': 2433005105,
+ 'discovery': 2706091867,
+ 'diynetwork': 2433004575,
+ 'foodnetwork': 2433005105,
+ 'hgtv': 2433004575,
+ 'travelchannel': 2433005739,
+ }
+ _TP_TEMPL = 'https://link.theplatform.com/s/ip77QC/media/guid/%d/%s?mbr=true'
+
+ def _real_extract(self, url):
+ site, guid = re.match(self._VALID_URL, url).groups()
+ return self.url_result(smuggle_url(
+ self._TP_TEMPL % (self._ACCOUNT_MAP[site], guid),
+ {'force_smil_url': True}), 'ThePlatform', guid)
class ServusIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)/(?P<id>[aA]{2}-\w+|\d+-\d+)'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:www\.)?
+ (?:
+ servus\.com/(?:(?:at|de)/p/[^/]+|tv/videos)|
+ servustv\.com/videos
+ )
+ /(?P<id>[aA]{2}-\w+|\d+-\d+)
+ '''
_TESTS = [{
- 'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
+ # new URL schema
+ 'url': 'https://www.servustv.com/videos/aa-1t6vbu5pw1w12/',
'md5': '3e1dd16775aa8d5cbef23628cfffc1f4',
'info_dict': {
'id': 'AA-1T6VBU5PW1W12',
'description': 'md5:1247204d85783afe3682644398ff2ec4',
'thumbnail': r're:^https?://.*\.jpg',
}
+ }, {
+ # old URL schema
+ 'url': 'https://www.servus.com/de/p/Die-Gr%C3%BCnen-aus-Sicht-des-Volkes/AA-1T6VBU5PW1W12/',
+ 'only_matching': True,
}, {
'url': 'https://www.servus.com/at/p/Wie-das-Leben-beginnt/1309984137314-381415152/',
'only_matching': True,
'info_dict': {
'id': '78932792',
'ext': 'mp4',
- 'title': 'youtube-dl testing video',
+ 'title': 'youtube-dlc testing video',
},
'params': {
'skip_download': True
import itertools
import re
+import json
+import random
from .common import (
InfoExtractor,
SearchInfoExtractor
)
from ..compat import (
+ compat_HTTPError,
+ compat_kwargs,
compat_str,
compat_urlparse,
)
from ..utils import (
+ error_to_compat_str,
ExtractorError,
float_or_none,
HEADRequest,
unified_timestamp,
update_url_query,
url_or_none,
+ urlhandle_detect_ext,
+ sanitized_Request,
)
'repost_count': int,
}
},
- # not streamable song
+ # geo-restricted
{
'url': 'https://soundcloud.com/the-concept-band/goldrushed-mastered?in=the-concept-band/sets/the-royal-concept-ep',
'info_dict': {
'uploader_id': '9615865',
'timestamp': 1337635207,
'upload_date': '20120521',
- 'duration': 30,
+ 'duration': 227.155,
'license': 'all-rights-reserved',
'view_count': int,
'like_count': int,
'comment_count': int,
'repost_count': int,
},
- 'params': {
- # rtmp
- 'skip_download': True,
- },
- 'skip': 'Preview',
},
# private link
{
- 'url': 'https://soundcloud.com/jaimemf/youtube-dl-test-video-a-y-baw/s-8Pjrp',
+ 'url': 'https://soundcloud.com/jaimemf/youtube-dlc-test-video-a-y-baw/s-8Pjrp',
'md5': 'aa0dd32bfea9b0c5ef4f02aacd080604',
'info_dict': {
'id': '123998367',
'skip_download': True,
},
},
- # not available via api.soundcloud.com/i1/tracks/id/streams
{
'url': 'https://soundcloud.com/giovannisarani/mezzo-valzer',
'md5': 'e22aecd2bc88e0e4e432d7dcc0a1abf7',
'ext': 'mp3',
'title': 'Mezzo Valzer',
'description': 'md5:4138d582f81866a530317bae316e8b61',
- 'uploader': 'Giovanni Sarani',
+ 'uploader': 'Micronie',
'uploader_id': '3352531',
'timestamp': 1551394171,
'upload_date': '20190228',
'comment_count': int,
'repost_count': int,
},
- 'expected_warnings': ['Unable to download JSON metadata'],
- }
+ },
+ {
+ # AAC HQ format available (account with active subscription needed)
+ 'url': 'https://soundcloud.com/wandw/the-chainsmokers-ft-daya-dont-let-me-down-ww-remix-1',
+ 'only_matching': True,
+ },
+ {
+ # Go+ (account with active subscription needed)
+ 'url': 'https://soundcloud.com/taylorswiftofficial/look-what-you-made-me-do',
+ 'only_matching': True,
+ },
]
- _API_BASE = 'https://api.soundcloud.com/'
_API_V2_BASE = 'https://api-v2.soundcloud.com/'
_BASE_URL = 'https://soundcloud.com/'
- _CLIENT_ID = 'UW9ajvMgVdMMW3cdeBi8lPfN6dvOVGji'
_IMAGE_REPL_RE = r'-([0-9a-z]+)\.jpg'
_ARTWORK_MAP = {
'original': 0,
}
+ def _store_client_id(self, client_id):
+ self._downloader.cache.store('soundcloud', 'client_id', client_id)
+
+ def _update_client_id(self):
+ webpage = self._download_webpage('https://soundcloud.com/', None)
+ for src in reversed(re.findall(r'<script[^>]+src="([^"]+)"', webpage)):
+ script = self._download_webpage(src, None, fatal=False)
+ if script:
+ client_id = self._search_regex(
+ r'client_id\s*:\s*"([0-9a-zA-Z]{32})"',
+ script, 'client id', default=None)
+ if client_id:
+ self._CLIENT_ID = client_id
+ self._store_client_id(client_id)
+ return
+ raise ExtractorError('Unable to extract client id')
+
+ def _download_json(self, *args, **kwargs):
+ non_fatal = kwargs.get('fatal') is False
+ if non_fatal:
+ del kwargs['fatal']
+ query = kwargs.get('query', {}).copy()
+ for _ in range(2):
+ query['client_id'] = self._CLIENT_ID
+ kwargs['query'] = query
+ try:
+ return super(SoundcloudIE, self)._download_json(*args, **compat_kwargs(kwargs))
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+ self._store_client_id(None)
+ self._update_client_id()
+ continue
+ elif non_fatal:
+ self._downloader.report_warning(error_to_compat_str(e))
+ return False
+ raise
+
+ def _real_initialize(self):
+ self._CLIENT_ID = self._downloader.cache.load('soundcloud', 'client_id') or "T5R4kgWS2PRf6lzLyIravUMnKlbIxQag" # 'EXLwg5lHTO2dslU5EePe3xkw0m1h86Cd' # 'YUKXoArFcqrlQn9tfNHvvyfnDISj04zk'
+ self._login()
+
+ _USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
+ _API_AUTH_QUERY_TEMPLATE = '?client_id=%s'
+ _API_AUTH_URL_PW = 'https://api-auth.soundcloud.com/web-auth/sign-in/password%s'
+ _access_token = None
+ _HEADERS = {}
+ _NETRC_MACHINE = 'soundcloud'
+
+ def _login(self):
+ username, password = self._get_login_info()
+ if username is None:
+ return
+
+ def genDevId():
+ def genNumBlock():
+ return ''.join([str(random.randrange(10)) for i in range(6)])
+ return '-'.join([genNumBlock() for i in range(4)])
+
+ payload = {
+ 'client_id': self._CLIENT_ID,
+ 'recaptcha_pubkey': 'null',
+ 'recaptcha_response': 'null',
+ 'credentials': {
+ 'identifier': username,
+ 'password': password
+ },
+ 'signature': self.sign(username, password, self._CLIENT_ID),
+ 'device_id': genDevId(),
+ 'user_agent': self._USER_AGENT
+ }
+
+ query = self._API_AUTH_QUERY_TEMPLATE % self._CLIENT_ID
+ login = sanitized_Request(self._API_AUTH_URL_PW % query, json.dumps(payload).encode('utf-8'))
+ response = self._download_json(login, None)
+ self._access_token = (response.get('session') or {}).get('access_token')
+ if not self._access_token:
+ self.report_warning('Unable to get access token, login may have failed')
+ else:
+ self._HEADERS = {'Authorization': 'OAuth ' + self._access_token}
+
+ # signature generation
+ def sign(self, user, pw, clid):
+ a = 33
+ i = 1
+ s = 440123
+ w = 117
+ u = 1800000
+ l = 1042
+ b = 37
+ k = 37
+ c = 5
+ n = "0763ed7314c69015fd4a0dc16bbf4b90" # _KEY
+ y = "8" # _REV
+ r = self._USER_AGENT # _USER_AGENT
+ e = user # _USERNAME
+ t = clid # _CLIENT_ID
+
+ d = '-'.join([str(mInt) for mInt in [a, i, s, w, u, l, b, k]])
+ p = n + y + d + r + e + t + d + n
+ h = p
+
+ m = 8011470
+
+ # 24-bit rotate-right-by-one checksum over every character of h
+ for ch in h:
+ m = (m >> 1) + ((1 & m) << 23)
+ m += ord(ch)
+ m &= 16777215
+
+ # c does not enter the checksum; it is only appended to the output
+ out = str(y) + ':' + str(d) + ':' + format(m, 'x') + ':' + str(c)
+
+ return out
+
@classmethod
def _resolv_url(cls, url):
- return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url + '&client_id=' + cls._CLIENT_ID
+ return SoundcloudIE._API_V2_BASE + 'resolve?url=' + url
- def _extract_info_dict(self, info, full_title=None, secret_token=None, version=2):
+ def _extract_info_dict(self, info, full_title=None, secret_token=None):
track_id = compat_str(info['id'])
title = info['title']
- track_base_url = self._API_BASE + 'tracks/%s' % track_id
format_urls = set()
formats = []
query['secret_token'] = secret_token
if info.get('downloadable') and info.get('has_downloads_left'):
- format_url = update_url_query(
- info.get('download_url') or track_base_url + '/download', query)
- format_urls.add(format_url)
- if version == 2:
- v1_info = self._download_json(
- track_base_url, track_id, query=query, fatal=False) or {}
- else:
- v1_info = info
- formats.append({
- 'format_id': 'download',
- 'ext': v1_info.get('original_format') or 'mp3',
- 'filesize': int_or_none(v1_info.get('original_content_size')),
- 'url': format_url,
- 'preference': 10,
- })
+ download_url = update_url_query(
+ self._API_V2_BASE + 'tracks/' + track_id + '/download', query)
+ redirect_url = (self._download_json(download_url, track_id, fatal=False) or {}).get('redirectUri')
+ if redirect_url:
+ urlh = self._request_webpage(
+ HEADRequest(redirect_url), track_id, fatal=False)
+ if urlh:
+ format_url = urlh.geturl()
+ format_urls.add(format_url)
+ formats.append({
+ 'format_id': 'download',
+ 'ext': urlhandle_detect_ext(urlh) or 'mp3',
+ 'filesize': int_or_none(urlh.headers.get('Content-Length')),
+ 'url': format_url,
+ 'preference': 10,
+ })
def invalid_url(url):
- return not url or url in format_urls or re.search(r'/(?:preview|playlist)/0/30/', url)
+ return not url or url in format_urls
- def add_format(f, protocol):
+ def add_format(f, protocol, is_preview=False):
mobj = re.search(r'\.(?P<abr>\d+)\.(?P<ext>[0-9a-z]{3,4})(?=[/?])', stream_url)
if mobj:
for k, v in mobj.groupdict().items():
format_id_list = []
if protocol:
format_id_list.append(protocol)
+ ext = f.get('ext')
+ if ext == 'aac':
+ f['abr'] = '256'
for k in ('ext', 'abr'):
v = f.get(k)
if v:
format_id_list.append(v)
+ preview = is_preview or re.search(r'/(?:preview|playlist)/0/30/', f['url'])
+ if preview:
+ format_id_list.append('preview')
abr = f.get('abr')
if abr:
f['abr'] = int(abr)
+ if protocol == 'hls':
+ protocol = 'm3u8' if ext == 'aac' else 'm3u8_native'
+ else:
+ protocol = 'http'
f.update({
'format_id': '_'.join(format_id_list),
- 'protocol': 'm3u8_native' if protocol == 'hls' else 'http',
+ 'protocol': protocol,
+ 'preference': -10 if preview else None,
})
formats.append(f)
if not isinstance(t, dict):
continue
format_url = url_or_none(t.get('url'))
- if not format_url or t.get('snipped') or '/preview/' in format_url:
+ if not format_url:
continue
stream = self._download_json(
- format_url, track_id, query=query, fatal=False)
+ format_url, track_id, query=query, fatal=False, headers=self._HEADERS)
if not isinstance(stream, dict):
continue
stream_url = url_or_none(stream.get('url'))
add_format({
'url': stream_url,
'ext': ext,
- }, 'http' if protocol == 'progressive' else protocol)
-
- if not formats:
- # Old API, does not work for some tracks (e.g.
- # https://soundcloud.com/giovannisarani/mezzo-valzer)
- # and might serve preview URLs (e.g.
- # http://www.soundcloud.com/snbrn/ele)
- format_dict = self._download_json(
- track_base_url + '/streams', track_id,
- 'Downloading track url', query=query, fatal=False) or {}
-
- for key, stream_url in format_dict.items():
- if invalid_url(stream_url):
- continue
- format_urls.add(stream_url)
- mobj = re.search(r'(http|hls)_([^_]+)_(\d+)_url', key)
- if mobj:
- protocol, ext, abr = mobj.groups()
- add_format({
- 'abr': abr,
- 'ext': ext,
- 'url': stream_url,
- }, protocol)
-
- if not formats:
- # We fallback to the stream_url in the original info, this
- # cannot be always used, sometimes it can give an HTTP 404 error
- urlh = self._request_webpage(
- HEADRequest(info.get('stream_url') or track_base_url + '/stream'),
- track_id, query=query, fatal=False)
- if urlh:
- stream_url = urlh.geturl()
- if not invalid_url(stream_url):
- add_format({'url': stream_url}, 'http')
+ }, 'http' if protocol == 'progressive' else protocol,
+ t.get('snipped') or '/preview/' in format_url)
for f in formats:
f['vcodec'] = 'none'
+ if not formats and info.get('policy') == 'BLOCK':
+ self.raise_geo_restricted()
self._sort_formats(formats)
user = info.get('user') or {}
track_id = mobj.group('track_id')
- query = {
- 'client_id': self._CLIENT_ID,
- }
+ query = {}
if track_id:
info_json_url = self._API_V2_BASE + 'tracks/' + track_id
full_title = track_id
resolve_title += '/%s' % token
info_json_url = self._resolv_url(self._BASE_URL + resolve_title)
- version = 2
info = self._download_json(
- info_json_url, full_title, 'Downloading info JSON', query=query, fatal=False)
- if not info:
- info = self._download_json(
- info_json_url.replace(self._API_V2_BASE, self._API_BASE),
- full_title, 'Downloading info JSON', query=query)
- version = 1
+ info_json_url, full_title, 'Downloading info JSON', query=query, headers=self._HEADERS)
- return self._extract_info_dict(info, full_title, token, version)
+ return self._extract_info_dict(info, full_title, token)
class SoundcloudPlaylistBaseIE(SoundcloudIE):
- def _extract_track_entries(self, tracks, token=None):
+ def _extract_set(self, playlist, token=None):
+ playlist_id = compat_str(playlist['id'])
+ tracks = playlist.get('tracks') or []
+ if not all([t.get('permalink_url') for t in tracks]) and token:
+ tracks = self._download_json(
+ self._API_V2_BASE + 'tracks', playlist_id,
+ 'Downloading tracks', query={
+ 'ids': ','.join([compat_str(t['id']) for t in tracks]),
+ 'playlistId': playlist_id,
+ 'playlistSecretToken': token,
+ }, headers=self._HEADERS)
entries = []
for track in tracks:
track_id = str_or_none(track.get('id'))
url += '?secret_token=' + token
entries.append(self.url_result(
url, SoundcloudIE.ie_key(), track_id))
- return entries
+ return self.playlist_result(
+ entries, playlist_id,
+ playlist.get('title'),
+ playlist.get('description'))
class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
- _VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<uploader>[\w\d-]+)/sets/(?P<slug_title>[\w\d-]+)(?:/(?P<token>[^?/]+))?'
+ _VALID_URL = r'https?://(?:(?:www|m)\.)?soundcloud\.com/(?P<uploader>[\w\d-]+)/sets/(?P<slug_title>[:\w\d-]+)(?:/(?P<token>[^?/]+))?'
IE_NAME = 'soundcloud:set'
_TESTS = [{
'url': 'https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep',
'info_dict': {
'id': '2284613',
'title': 'The Royal Concept EP',
+ 'description': 'md5:71d07087c7a449e8941a70a29e34671e',
},
'playlist_mincount': 5,
}, {
'url': 'https://soundcloud.com/the-concept-band/sets/the-royal-concept-ep/token',
'only_matching': True,
+ }, {
+ 'url': 'https://soundcloud.com/discover/sets/weekly::flacmatic',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://soundcloud.com/discover/sets/charts-top:all-music:de',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://soundcloud.com/discover/sets/charts-top:hiphoprap:kr',
+ 'only_matching': True,
}]
def _real_extract(self, url):
full_title += '/' + token
info = self._download_json(self._resolv_url(
- self._BASE_URL + full_title), full_title)
+ self._BASE_URL + full_title), full_title, headers=self._HEADERS)
if 'errors' in info:
msgs = (compat_str(err['error_message']) for err in info['errors'])
raise ExtractorError('unable to download video webpage: %s' % ','.join(msgs))
- entries = self._extract_track_entries(info['tracks'], token)
-
- return self.playlist_result(
- entries, str_or_none(info.get('id')), info.get('title'))
+ return self._extract_set(info, token)
-class SoundcloudPagedPlaylistBaseIE(SoundcloudPlaylistBaseIE):
+class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
def _extract_playlist(self, base_url, playlist_id, playlist_title):
COMMON_QUERY = {
- 'limit': 2000000000,
- 'client_id': self._CLIENT_ID,
+ 'limit': 200,
'linked_partitioning': '1',
}
for i in itertools.count():
response = self._download_json(
next_href, playlist_id,
- 'Downloading track page %s' % (i + 1), query=query)
+ 'Downloading track page %s' % (i + 1), query=query, headers=self._HEADERS)
collection = response['collection']
user = self._download_json(
self._resolv_url(self._BASE_URL + uploader),
- uploader, 'Downloading user info')
+ uploader, 'Downloading user info', headers=self._HEADERS)
resource = mobj.group('rsrc') or 'all'
def _real_extract(self, url):
track_name = self._match_id(url)
- track = self._download_json(self._resolv_url(url), track_name)
+ track = self._download_json(self._resolv_url(url), track_name, headers=self._HEADERS)
track_id = self._search_regex(
r'soundcloud:track-stations:(\d+)', track['id'], 'track id')
mobj = re.match(self._VALID_URL, url)
playlist_id = mobj.group('id')
- query = {
- 'client_id': self._CLIENT_ID,
- }
+ query = {}
token = mobj.group('token')
if token:
query['secret_token'] = token
data = self._download_json(
self._API_V2_BASE + 'playlists/' + playlist_id,
- playlist_id, 'Downloading playlist', query=query)
-
- entries = self._extract_track_entries(data['tracks'], token)
+ playlist_id, 'Downloading playlist', query=query, headers=self._HEADERS)
- return self.playlist_result(
- entries, playlist_id, data.get('title'), data.get('description'))
+ return self._extract_set(data, token)
class SoundcloudSearchIE(SearchInfoExtractor, SoundcloudIE):
self._MAX_RESULTS_PER_PAGE)
query.update({
'limit': limit,
- 'client_id': self._CLIENT_ID,
'linked_partitioning': 1,
'offset': 0,
})
for i in itertools.count(1):
response = self._download_json(
next_url, collection_id, 'Downloading page {0}'.format(i),
- 'Unable to download API page')
+ 'Unable to download API page', headers=self._HEADERS)
collection = response.get('collection', [])
if not collection:
from .common import InfoExtractor
from ..utils import (
+ determine_ext,
ExtractorError,
merge_dicts,
orderedSet,
url.replace('/%s/embed' % video_id, '/%s/video' % video_id),
video_id, headers={'Cookie': 'country=US'})
- if re.search(r'<[^>]+\bid=["\']video_removed', webpage):
+ if re.search(r'<[^>]+\b(?:id|class)=["\']video_removed', webpage):
raise ExtractorError(
'Video %s is not available' % video_id, expected=True)
if not f_url:
return
f = parse_resolution(format_id)
- f.update({
- 'url': f_url,
- 'format_id': format_id,
- })
- formats.append(f)
+ ext = determine_ext(f_url)
+ if format_id.startswith('m3u8') or ext == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ f_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+ elif format_id.startswith('mpd') or ext == 'mpd':
+ formats.extend(self._extract_mpd_formats(
+ f_url, video_id, mpd_id='dash', fatal=False))
+ elif ext == 'mp4' or f.get('width') or f.get('height'):
+ f.update({
+ 'url': f_url,
+ 'format_id': format_id,
+ })
+ formats.append(f)
STREAM_URL_PREFIX = 'stream_url_'
r'data-streamkey\s*=\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'stream key', group='value')
- sb_csrf_session = self._get_cookies(
- 'https://spankbang.com')['sb_csrf_session'].value
-
stream = self._download_json(
'https://spankbang.com/api/videos/stream', video_id,
'Downloading stream JSON', data=urlencode_postdata({
'id': stream_key,
'data': 0,
- 'sb_csrf_session': sb_csrf_session,
}), headers={
'Referer': url,
- 'X-CSRFToken': sb_csrf_session,
+ 'X-Requested-With': 'XMLHttpRequest',
})
for format_id, format_url in stream.items():
- if format_id.startswith(STREAM_URL_PREFIX):
- if format_url and isinstance(format_url, list):
- format_url = format_url[0]
- extract_format(
- format_id[len(STREAM_URL_PREFIX):], format_url)
+ if format_url and isinstance(format_url, list):
+ format_url = format_url[0]
+ extract_format(format_id, format_url)
- self._sort_formats(formats)
+ self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
info = self._search_json_ld(webpage, video_id, default={})
--- /dev/null
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ float_or_none,
+ int_or_none,
+ merge_dicts,
+ str_or_none,
+ str_to_int,
+ url_or_none,
+)
+
+
+class SpankwireIE(InfoExtractor):
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:www\.)?spankwire\.com/
+ (?:
+ [^/]+/video|
+ EmbedPlayer\.aspx/?\?.*?\bArticleId=
+ )
+ (?P<id>\d+)
+ '''
+ _TESTS = [{
+ # download URL pattern: */<height>P_<tbr>K_<video_id>.mp4
+ 'url': 'http://www.spankwire.com/Buckcherry-s-X-Rated-Music-Video-Crazy-Bitch/video103545/',
+ 'md5': '5aa0e4feef20aad82cbcae3aed7ab7cd',
+ 'info_dict': {
+ 'id': '103545',
+ 'ext': 'mp4',
+ 'title': 'Buckcherry`s X Rated Music Video Crazy Bitch',
+ 'description': 'Crazy Bitch X rated music video.',
+ 'duration': 222,
+ 'uploader': 'oreusz',
+ 'uploader_id': '124697',
+ 'timestamp': 1178587885,
+ 'upload_date': '20070508',
+ 'average_rating': float,
+ 'view_count': int,
+ 'comment_count': int,
+ 'age_limit': 18,
+ 'categories': list,
+ 'tags': list,
+ },
+ }, {
+ # download URL pattern: */mp4_<format_id>_<video_id>.mp4
+ 'url': 'http://www.spankwire.com/Titcums-Compiloation-I/video1921551/',
+ 'md5': '09b3c20833308b736ae8902db2f8d7e6',
+ 'info_dict': {
+ 'id': '1921551',
+ 'ext': 'mp4',
+ 'title': 'Titcums Compiloation I',
+ 'description': 'cum on tits',
+ 'uploader': 'dannyh78999',
+ 'uploader_id': '3056053',
+ 'upload_date': '20150822',
+ 'age_limit': 18,
+ },
+ 'params': {
+ 'proxy': '127.0.0.1:8118'
+ },
+ 'skip': 'removed',
+ }, {
+ 'url': 'https://www.spankwire.com/EmbedPlayer.aspx/?ArticleId=156156&autostart=true',
+ 'only_matching': True,
+ }]
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return re.findall(
+ r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?spankwire\.com/EmbedPlayer\.aspx/?\?.*?\bArticleId=\d+)',
+ webpage)
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ video = self._download_json(
+ 'https://www.spankwire.com/api/video/%s.json' % video_id, video_id)
+
+ title = video['title']
+
+ formats = []
+ videos = video.get('videos')
+ if isinstance(videos, dict):
+ for format_id, format_url in videos.items():
+ video_url = url_or_none(format_url)
+ if not video_url:
+ continue
+ height = int_or_none(self._search_regex(
+ r'(\d+)[pP]', format_id, 'height', default=None))
+ m = re.search(
+ r'/(?P<height>\d+)[pP]_(?P<tbr>\d+)[kK]', video_url)
+ if m:
+ tbr = int(m.group('tbr'))
+ height = height or int(m.group('height'))
+ else:
+ tbr = None
+ formats.append({
+ 'url': video_url,
+ 'format_id': '%dp' % height if height else format_id,
+ 'height': height,
+ 'tbr': tbr,
+ })
+ m3u8_url = url_or_none(video.get('HLS'))
+ if m3u8_url:
+ formats.extend(self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+ m3u8_id='hls', fatal=False))
+ self._sort_formats(formats, ('height', 'tbr', 'width', 'format_id'))
+
+ view_count = str_to_int(video.get('viewed'))
+
+ thumbnails = []
+ for preference, t in enumerate(('', '2x'), start=0):
+ thumbnail_url = url_or_none(video.get('poster%s' % t))
+ if not thumbnail_url:
+ continue
+ thumbnails.append({
+ 'url': thumbnail_url,
+ 'preference': preference,
+ })
+
+ def extract_names(key):
+ entries_list = video.get(key)
+ if not isinstance(entries_list, list):
+ return
+ entries = []
+ for entry in entries_list:
+ name = str_or_none(entry.get('name'))
+ if name:
+ entries.append(name)
+ return entries
+
+ categories = extract_names('categories')
+ tags = extract_names('tags')
+
+ uploader = None
+ info = {}
+
+ webpage = self._download_webpage(
+ 'https://www.spankwire.com/_/video%s/' % video_id, video_id,
+ fatal=False)
+ if webpage:
+ info = self._search_json_ld(webpage, video_id, default={})
+ thumbnail_url = None
+ if 'thumbnail' in info:
+ thumbnail_url = url_or_none(info['thumbnail'])
+ del info['thumbnail']
+ if not thumbnail_url:
+ thumbnail_url = self._og_search_thumbnail(webpage)
+ if thumbnail_url:
+ thumbnails.append({
+ 'url': thumbnail_url,
+ 'preference': 10,
+ })
+ uploader = self._html_search_regex(
+ r'(?s)by\s*<a[^>]+\bclass=["\']uploaded__by[^>]*>(.+?)</a>',
+ webpage, 'uploader', fatal=False)
+ if not view_count:
+ view_count = str_to_int(self._search_regex(
+ r'data-views=["\']([\d,.]+)', webpage, 'view count',
+ fatal=False))
+
+ return merge_dicts({
+ 'id': video_id,
+ 'title': title,
+ 'description': video.get('description'),
+ 'duration': int_or_none(video.get('duration')),
+ 'thumbnails': thumbnails,
+ 'uploader': uploader,
+ 'uploader_id': str_or_none(video.get('userId')),
+ 'timestamp': int_or_none(video.get('time_approved_on')),
+ 'average_rating': float_or_none(video.get('rating')),
+ 'view_count': view_count,
+ 'comment_count': int_or_none(video.get('comments')),
+ 'age_limit': 18,
+ 'categories': categories,
+ 'tags': tags,
+ 'formats': formats,
+ }, info)
_TESTS = [{
'url': 'http://www.bellator.com/fight/atwr7k/bellator-158-michael-page-vs-evangelista-cyborg',
'info_dict': {
- 'id': 'b55e434e-fde1-4a98-b7cc-92003a034de4',
- 'ext': 'mp4',
- 'title': 'Douglas Lima vs. Paul Daley - Round 1',
- 'description': 'md5:805a8dd29310fd611d32baba2f767885',
- },
- 'params': {
- # m3u8 download
- 'skip_download': True,
+ 'title': 'Michael Page vs. Evangelista Cyborg',
+ 'description': 'md5:0d917fc00ffd72dd92814963fc6cbb05',
},
+ 'playlist_count': 3,
}, {
'url': 'http://www.bellator.com/video-clips/bw6k7n/bellator-158-foundations-michael-venom-page',
'only_matching': True,
_FEED_URL = 'http://www.bellator.com/feeds/mrss/'
_GEO_COUNTRIES = ['US']
+ def _extract_mgid(self, webpage):
+ return self._extract_triforce_mgid(webpage)
+
class ParamountNetworkIE(MTVServicesInfoExtractor):
_VALID_URL = r'https?://(?:www\.)?paramountnetwork\.com/[^/]+/[\da-z]{6}(?:[/?#&]|$)'
class SportDeutschlandIE(InfoExtractor):
_VALID_URL = r'https?://sportdeutschland\.tv/(?P<sport>[^/?#]+)/(?P<id>[^?#/]+)(?:$|[?#])'
_TESTS = [{
- 'url': 'http://sportdeutschland.tv/badminton/live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
+ 'url': 'https://sportdeutschland.tv/badminton/re-live-deutsche-meisterschaften-2020-halbfinals?playlistId=0',
'info_dict': {
- 'id': 'live-li-ning-badminton-weltmeisterschaft-2014-kopenhagen',
+ 'id': 're-live-deutsche-meisterschaften-2020-halbfinals',
'ext': 'mp4',
- 'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
- 'categories': ['Badminton'],
+ 'title': 're:Re-live: Deutsche Meisterschaften 2020.*Halbfinals',
+ 'categories': ['Badminton-Deutschland'],
'view_count': int,
- 'thumbnail': r're:^https?://.*\.jpg$',
- 'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
+ 'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
'timestamp': int,
- 'upload_date': 're:^201408[23][0-9]$',
+ 'upload_date': '20200201',
+ 'description': 're:.*', # meaningless description for THIS video
},
- 'params': {
- 'skip_download': 'Live stream',
- },
- }, {
- 'url': 'http://sportdeutschland.tv/li-ning-badminton-wm-2014/lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
- 'info_dict': {
- 'id': 'lee-li-ning-badminton-weltmeisterschaft-2014-kopenhagen-herren-einzel-wei-vs',
- 'ext': 'mp4',
- 'upload_date': '20140825',
- 'description': 'md5:60a20536b57cee7d9a4ec005e8687504',
- 'timestamp': 1408976060,
- 'duration': 2732,
- 'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
- 'thumbnail': r're:^https?://.*\.jpg$',
- 'view_count': int,
- 'categories': ['Li-Ning Badminton WM 2014'],
-
- }
}]
def _real_extract(self, url):
video_id = mobj.group('id')
sport_id = mobj.group('sport')
- api_url = 'http://proxy.vidibusdynamic.net/sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
+ api_url = 'https://proxy.vidibusdynamic.net/ssl/backend.sportdeutschland.tv/api/permalinks/%s/%s?access_token=true' % (
sport_id, video_id)
req = sanitized_Request(api_url, headers={
'Accept': 'application/vnd.vidibus.v2.html+json',
# coding: utf-8
from __future__ import unicode_literals
-from .ard import ARDMediathekIE
+from .ard import ARDMediathekBaseIE
from ..utils import (
ExtractorError,
get_element_by_attribute,
)
-class SRMediathekIE(ARDMediathekIE):
+class SRMediathekIE(ARDMediathekBaseIE):
IE_NAME = 'sr:mediathek'
IE_DESC = 'Saarländischer Rundfunk'
_VALID_URL = r'https?://sr-mediathek(?:\.sr-online)?\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import itertools
+from .common import InfoExtractor
+
+
+class StoryFireIE(InfoExtractor):
+ _VALID_URL = r'(?:(?:https?://(?:www\.)?storyfire\.com/video-details)|(?:https://storyfire\.app\.link))/(?P<id>[^/\s]+)'
+ _TESTS = [{
+ 'url': 'https://storyfire.com/video-details/5df1d132b6378700117f9181',
+ 'md5': '560953bfca81a69003cfa5e53ac8a920',
+ 'info_dict': {
+ 'id': '5df1d132b6378700117f9181',
+ 'ext': 'mp4',
+ 'title': 'Buzzfeed Teaches You About Memes',
+ 'uploader_id': 'ntZAJFECERSgqHSxzonV5K2E89s1',
+ 'timestamp': 1576129028,
+ 'description': 'Mocking Buzzfeed\'s meme lesson. Reuploaded from YouTube because of their new policies',
+ 'uploader': 'whang!',
+ 'upload_date': '20191212',
+ },
+ 'params': {'format': 'bestvideo'} # There are no merged formats in the playlist.
+ }, {
+ 'url': 'https://storyfire.app.link/5GxAvWOQr8', # Alternate URL format, with unrelated short ID
+ 'md5': '7a2dc6d60c4889edfed459c620fe690d',
+ 'info_dict': {
+ 'id': '5f1e11ecd78a57b6c702001d',
+ 'ext': 'm4a',
+ 'title': 'Weird Nintendo Prototype Leaks',
+ 'description': 'A stream taking a look at some weird Nintendo Prototypes with Luigi in Mario 64 and weird Yoshis',
+ 'timestamp': 1595808576,
+ 'upload_date': '20200727',
+ 'uploader': 'whang!',
+ 'uploader_id': 'ntZAJFECERSgqHSxzonV5K2E89s1',
+ },
+ 'params': {'format': 'bestaudio'} # Verifying audio extraction
+
+ }]
+
+ _aformats = {
+ 'audio-medium-audio': {'acodec': 'aac', 'abr': 125, 'preference': -10},
+ 'audio-high-audio': {'acodec': 'aac', 'abr': 254, 'preference': -1},
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id)
+
+ # Extracting the json blob is mandatory to proceed with extraction.
+ jsontext = self._html_search_regex(
+ r'<script id="__NEXT_DATA__" type="application/json">(.+?)</script>',
+ webpage, 'json_data')
+
+ json = self._parse_json(jsontext, video_id)
+
+ # The currentVideo field in the json is mandatory
+ # because it contains the only link to the m3u8 playlist
+ video = json['props']['initialState']['video']['currentVideo']
+ videourl = video['vimeoVideoURL'] # Video URL is mandatory
+
+ # Extract other fields from the json in an error tolerant fashion
+ # ID may be incorrect (on short URL format), correct it.
+ parsed_id = video.get('_id')
+ if parsed_id:
+ video_id = parsed_id
+
+ title = video.get('title')
+ description = video.get('description')
+
+ thumbnail = video.get('storyImage')
+ views = video.get('views')
+ likes = video.get('likesCount')
+ comments = video.get('commentsCount')
+ duration = video.get('videoDuration')
+ publishdate = video.get('publishDate') # Apparently epoch time, day only
+
+ uploader = video.get('username')
+ uploader_id = video.get('hostID')
+ # Construct an uploader URL
+ uploader_url = None
+ if uploader_id:
+ uploader_url = "https://storyfire.com/user/%s/video" % uploader_id
+
+ # Collect root playlist to determine formats
+ formats = self._extract_m3u8_formats(
+ videourl, video_id, 'mp4', 'm3u8_native')
+
+ # Modify formats to fill in missing information about audio codecs
+ for fmt in formats:
+ aformat = self._aformats.get(fmt['format_id'])
+ if aformat:
+ fmt['acodec'] = aformat['acodec']
+ fmt['abr'] = aformat['abr']
+ fmt['preference'] = aformat['preference']
+ fmt['ext'] = 'm4a'
+
+ self._sort_formats(formats)
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': description,
+ 'ext': "mp4",
+ 'url': videourl,
+ 'formats': formats,
+
+ 'thumbnail': thumbnail,
+ 'view_count': views,
+ 'like_count': likes,
+ 'comment_count': comments,
+ 'duration': duration,
+ 'timestamp': publishdate,
+
+ 'uploader': uploader,
+ 'uploader_id': uploader_id,
+ 'uploader_url': uploader_url,
+
+ }
+
+
+class StoryFireUserIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?storyfire\.com/user/(?P<id>[^/\s]+)/video'
+ _TESTS = [{
+ 'url': 'https://storyfire.com/user/ntZAJFECERSgqHSxzonV5K2E89s1/video',
+ 'info_dict': {
+ 'id': 'ntZAJFECERSgqHSxzonV5K2E89s1',
+ 'title': 'whang!',
+ },
+ 'playlist_mincount': 18
+ }, {
+ 'url': 'https://storyfire.com/user/UQ986nFxmAWIgnkZQ0ftVhq4nOk2/video',
+ 'info_dict': {
+ 'id': 'UQ986nFxmAWIgnkZQ0ftVhq4nOk2',
+ 'title': 'McJuggerNuggets',
+ },
+ 'playlist_mincount': 143
+
+ }]
+
+ # Generator for fetching playlist items
+ def _enum_videos(self, baseurl, user_id, firstjson):
+ totalVideos = int(firstjson['videosCount'])
+ haveVideos = 0
+ json = firstjson
+
+ for page in itertools.count(1):
+ for video in json['videos']:
+ id = video['_id']
+ url = "https://storyfire.com/video-details/%s" % id
+ haveVideos += 1
+ yield {
+ '_type': 'url',
+ 'id': id,
+ 'url': url,
+ 'ie_key': 'StoryFire',
+
+ 'title': video.get('title'),
+ 'description': video.get('description'),
+ 'view_count': video.get('views'),
+ 'comment_count': video.get('commentsCount'),
+ 'duration': video.get('videoDuration'),
+ 'timestamp': video.get('publishDate'),
+ }
+ # Are there more pages we could fetch?
+ if haveVideos < totalVideos:
+ pageurl = baseurl + ("%i" % haveVideos)
+ json = self._download_json(pageurl, user_id,
+ note='Downloading page %s' % page)
+
+ # Are there any videos in the new json?
+ videos = json.get('videos')
+ if not videos:
+ break # no videos
+
+ else:
+ break # We have fetched all the videos, stop
+
+ def _real_extract(self, url):
+ user_id = self._match_id(url)
+
+ baseurl = "https://storyfire.com/app/publicVideos/%s?skip=" % user_id
+
+ # Download first page to ensure it can be downloaded, and get user information if available.
+ firstpage = baseurl + "0"
+ firstjson = self._download_json(firstpage, user_id)
+
+ title = None
+ videos = firstjson.get('videos')
+ if videos:
+ title = videos[0].get('username')
+
+ return {
+ '_type': 'playlist',
+ 'entries': self._enum_videos(baseurl, user_id, firstjson),
+ 'id': user_id,
+ 'title': title,
+ }
+
+
+class StoryFireSeriesIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?storyfire\.com/write/series/stories/(?P<id>[^/\s]+)'
+ _TESTS = [{
+ 'url': 'https://storyfire.com/write/series/stories/-Lq6MsuIHLODO6d2dDkr/',
+ 'info_dict': {
+ 'id': '-Lq6MsuIHLODO6d2dDkr',
+ },
+ 'playlist_mincount': 13
+ }, {
+ 'url': 'https://storyfire.com/write/series/stories/the_mortal_one/',
+ 'info_dict': {
+ 'id': 'the_mortal_one',
+ },
+ 'playlist_count': 0 # This playlist has entries, but no videos.
+ }, {
+ 'url': 'https://storyfire.com/write/series/stories/story_time',
+ 'info_dict': {
+ 'id': 'story_time',
+ },
+ 'playlist_mincount': 10
+ }]
+
+ # Generator for returning playlist items
+ # This object is substantially different from the one in the user videos page above
+ def _enum_videos(self, jsonlist):
+ for video in jsonlist:
+ id = video['_id']
+ if video.get('hasVideo'): # Boolean element
+ url = "https://storyfire.com/video-details/%s" % id
+ yield {
+ '_type': 'url',
+ 'id': id,
+ 'url': url,
+ 'ie_key': 'StoryFire',
+
+ 'title': video.get('title'),
+ 'description': video.get('description'),
+ 'view_count': video.get('views'),
+ 'like_count': video.get('likesCount'),
+ 'comment_count': video.get('commentsCount'),
+ 'duration': video.get('videoDuration'),
+ 'timestamp': video.get('publishDate'),
+ }
+
+ def _real_extract(self, url):
+ list_id = self._match_id(url)
+
+ listurl = "https://storyfire.com/app/seriesStories/%s/list" % list_id
+ json = self._download_json(listurl, list_id)
+
+ return {
+ '_type': 'playlist',
+ 'entries': self._enum_videos(json),
+ 'id': list_id
+ }
_VALID_URL = r'https?://streamcloud\.eu/(?P<id>[a-zA-Z0-9_-]+)(?:/(?P<fname>[^#?]*)\.html)?'
_TESTS = [{
- 'url': 'http://streamcloud.eu/skp9j99s4bpz/youtube-dl_test_video_____________-BaW_jenozKc.mp4.html',
+ 'url': 'http://streamcloud.eu/skp9j99s4bpz/youtube-dlc_test_video_____________-BaW_jenozKc.mp4.html',
'md5': '6bea4c7fa5daaacc2a946b7146286686',
'info_dict': {
'id': 'skp9j99s4bpz',
'ext': 'mp4',
- 'title': 'youtube-dl test video \'/\\ ä ↭',
+ 'title': 'youtube-dlc test video \'/\\ ä ↭',
},
'skip': 'Only available from the EU'
}, {
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import int_or_none
+
+
+class StretchInternetIE(InfoExtractor):
+ _VALID_URL = r'https?://portal\.stretchinternet\.com/[^/]+/(?:portal|full)\.htm\?.*?\beventId=(?P<id>\d+)'
+ _TEST = {
+ 'url': 'https://portal.stretchinternet.com/umary/portal.htm?eventId=573272&streamType=video',
+ 'info_dict': {
+ 'id': '573272',
+ 'ext': 'mp4',
+ 'title': 'University of Mary Wrestling vs. Upper Iowa',
+ 'timestamp': 1575668361,
+ 'upload_date': '20191206',
+ }
+ }
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ event = self._download_json(
+ 'https://api.stretchinternet.com/trinity/event/tcg/' + video_id,
+ video_id)[0]
+
+ return {
+ 'id': video_id,
+ 'title': event['title'],
+ 'timestamp': int_or_none(event.get('dateCreated'), 1000),
+ 'url': 'https://' + event['media'][0]['url'],
+ }
import re
from .common import InfoExtractor
-from ..compat import (
- compat_parse_qs,
- compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
from ..utils import (
determine_ext,
dict_get,
int_or_none,
- orderedSet,
+ str_or_none,
strip_or_none,
try_get,
- urljoin,
- compat_str,
)
self._adjust_title(info_dict)
return info_dict
- svt_id = self._search_regex(
- r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
- webpage, 'video id')
+ svt_id = try_get(
+ data, lambda x: x['statistics']['dataLake']['content']['id'],
+ compat_str)
+
+ if not svt_id:
+ svt_id = self._search_regex(
+ (r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
+ r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"'),
+ webpage, 'video id')
return self._extract_by_video_id(svt_id, webpage)
class SVTSeriesIE(SVTPlayBaseIE):
- _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)'
+ _VALID_URL = r'https?://(?:www\.)?svtplay\.se/(?P<id>[^/?&#]+)(?:.+?\btab=(?P<season_slug>[^&#]+))?'
_TESTS = [{
'url': 'https://www.svtplay.se/rederiet',
'info_dict': {
- 'id': 'rederiet',
+ 'id': '14445680',
'title': 'Rederiet',
- 'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+ 'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
},
'playlist_mincount': 318,
}, {
- 'url': 'https://www.svtplay.se/rederiet?tab=sasong2',
+ 'url': 'https://www.svtplay.se/rederiet?tab=season-2-14445680',
'info_dict': {
- 'id': 'rederiet-sasong2',
+ 'id': 'season-2-14445680',
'title': 'Rederiet - Säsong 2',
- 'description': 'md5:505d491a58f4fcf6eb418ecab947e69e',
+ 'description': 'md5:d9fdfff17f5d8f73468176ecd2836039',
},
- 'playlist_count': 12,
+ 'playlist_mincount': 12,
}]
@classmethod
return False if SVTIE.suitable(url) or SVTPlayIE.suitable(url) else super(SVTSeriesIE, cls).suitable(url)
def _real_extract(self, url):
- series_id = self._match_id(url)
-
- qs = compat_parse_qs(compat_urllib_parse_urlparse(url).query)
- season_slug = qs.get('tab', [None])[0]
-
- if season_slug:
- series_id += '-%s' % season_slug
-
- webpage = self._download_webpage(
- url, series_id, 'Downloading series page')
-
- root = self._parse_json(
- self._search_regex(
- self._SVTPLAY_RE, webpage, 'content', group='json'),
- series_id)
+ series_slug, season_id = re.match(self._VALID_URL, url).groups()
+
+ series = self._download_json(
+ 'https://api.svt.se/contento/graphql', series_slug,
+ 'Downloading series page', query={
+ 'query': '''{
+ listablesBySlug(slugs: ["%s"]) {
+ associatedContent(include: [productionPeriod, season]) {
+ items {
+ item {
+ ... on Episode {
+ videoSvtId
+ }
+ }
+ }
+ id
+ name
+ }
+ id
+ longDescription
+ name
+ shortDescription
+ }
+}''' % series_slug,
+ })['data']['listablesBySlug'][0]
season_name = None
entries = []
- for season in root['relatedVideoContent']['relatedVideosAccordion']:
+ for season in series['associatedContent']:
if not isinstance(season, dict):
continue
- if season_slug:
- if season.get('slug') != season_slug:
+ if season_id:
+ if season.get('id') != season_id:
continue
season_name = season.get('name')
- videos = season.get('videos')
- if not isinstance(videos, list):
+ items = season.get('items')
+ if not isinstance(items, list):
continue
- for video in videos:
- content_url = video.get('contentUrl')
- if not content_url or not isinstance(content_url, compat_str):
+ for item in items:
+ video = item.get('item') or {}
+ content_id = video.get('videoSvtId')
+ if not content_id or not isinstance(content_id, compat_str):
continue
- entries.append(
- self.url_result(
- urljoin(url, content_url),
- ie=SVTPlayIE.ie_key(),
- video_title=video.get('title')
- ))
-
- metadata = root.get('metaData')
- if not isinstance(metadata, dict):
- metadata = {}
+ entries.append(self.url_result(
+ 'svt:' + content_id, SVTPlayIE.ie_key(), content_id))
- title = metadata.get('title')
- season_name = season_name or season_slug
+ title = series.get('name')
+ season_name = season_name or season_id
if title and season_name:
title = '%s - %s' % (title, season_name)
- elif season_slug:
- title = season_slug
+ elif season_id:
+ title = season_id
return self.playlist_result(
- entries, series_id, title, metadata.get('description'))
+ entries, season_id or series.get('id'), title,
+ dict_get(series, ('longDescription', 'shortDescription')))
class SVTPageIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?svt\.se/(?:[^/]+/)*(?P<id>[^/?&#]+)'
+ _VALID_URL = r'https?://(?:www\.)?svt\.se/(?P<path>(?:[^/]+/)*(?P<id>[^/?&#]+))'
_TESTS = [{
- 'url': 'https://www.svt.se/sport/oseedat/guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
+ 'url': 'https://www.svt.se/sport/ishockey/bakom-masken-lehners-kamp-mot-mental-ohalsa',
'info_dict': {
- 'id': 'guide-sommartraningen-du-kan-gora-var-och-nar-du-vill',
- 'title': 'GUIDE: Sommarträning du kan göra var och när du vill',
+ 'id': '25298267',
+ 'title': 'Bakom masken – Lehners kamp mot mental ohälsa',
},
- 'playlist_count': 7,
+ 'playlist_count': 4,
}, {
- 'url': 'https://www.svt.se/nyheter/inrikes/ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
+ 'url': 'https://www.svt.se/nyheter/utrikes/svenska-andrea-ar-en-mil-fran-branderna-i-kalifornien',
'info_dict': {
- 'id': 'ebba-busch-thor-kd-har-delvis-ratt-om-no-go-zoner',
- 'title': 'Ebba Busch Thor har bara delvis rätt om ”no-go-zoner”',
+ 'id': '24243746',
+ 'title': 'Svenska Andrea redo att fly sitt hem i Kalifornien',
},
- 'playlist_count': 1,
+ 'playlist_count': 2,
}, {
# only programTitle
'url': 'http://www.svt.se/sport/ishockey/jagr-tacklar-giroux-under-intervjun',
'info_dict': {
- 'id': '2900353',
+ 'id': '8439V2K',
'ext': 'mp4',
'title': 'Stjärnorna skojar till det - under SVT-intervjun',
'duration': 27,
return False if SVTIE.suitable(url) else super(SVTPageIE, cls).suitable(url)
def _real_extract(self, url):
- playlist_id = self._match_id(url)
+ path, display_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(url, playlist_id)
+ article = self._download_json(
+ 'https://api.svt.se/nss-api/page/' + path, display_id,
+ query={'q': 'articles'})['articles']['content'][0]
- entries = [
- self.url_result(
- 'svt:%s' % video_id, ie=SVTPlayIE.ie_key(), video_id=video_id)
- for video_id in orderedSet(re.findall(
- r'data-video-id=["\'](\d+)', webpage))]
+ entries = []
- title = strip_or_none(self._og_search_title(webpage, default=None))
+ def _process_content(content):
+ if content.get('_type') in ('VIDEOCLIP', 'VIDEOEPISODE'):
+ video_id = compat_str(content['image']['svtId'])
+ entries.append(self.url_result(
+ 'svt:' + video_id, SVTPlayIE.ie_key(), video_id))
- return self.playlist_result(entries, playlist_id, title)
+ for media in article.get('media', []):
+ _process_content(media)
+
+ for obj in article.get('structuredBody', []):
+ _process_content(obj.get('content') or {})
+
+ return self.playlist_result(
+ entries, str_or_none(article.get('id')),
+ strip_or_none(article.get('title')))
from .common import InfoExtractor
from .wistia import WistiaIE
-from ..compat import compat_str
from ..utils import (
clean_html,
ExtractorError,
+ int_or_none,
get_element_by_class,
+ strip_or_none,
urlencode_postdata,
urljoin,
)
_SITES = {
# Only notable ones here
- 'upskillcourses.com': 'upskill',
- 'academy.gns3.com': 'gns3',
+ 'v1.upskillcourses.com': 'upskill',
+ 'gns3.teachable.com': 'gns3',
'academyhacker.com': 'academyhacker',
'stackskills.com': 'stackskills',
'market.saleshacker.com': 'saleshacker',
self._logged_in = True
return
- login_url = compat_str(urlh.geturl())
+ login_url = urlh.geturl()
login_form = self._hidden_inputs(login_page)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
- 'url': 'http://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+ 'url': 'https://gns3.teachable.com/courses/gns3-certified-associate/lectures/6842364',
'info_dict': {
- 'id': 'uzw6zw58or',
- 'ext': 'mp4',
- 'title': 'Welcome to the Course!',
- 'description': 'md5:65edb0affa582974de4625b9cdea1107',
- 'duration': 138.763,
- 'timestamp': 1479846621,
- 'upload_date': '20161122',
+ 'id': 'untlgzk1v7',
+ 'ext': 'bin',
+ 'title': 'Overview',
+ 'description': 'md5:071463ff08b86c208811130ea1c2464c',
+ 'duration': 736.4,
+ 'timestamp': 1542315762,
+ 'upload_date': '20181115',
+ 'chapter': 'Welcome',
+ 'chapter_number': 1,
},
'params': {
'skip_download': True,
},
}, {
- 'url': 'http://upskillcourses.com/courses/119763/lectures/1747100',
+ 'url': 'http://v1.upskillcourses.com/courses/119763/lectures/1747100',
'only_matching': True,
}, {
- 'url': 'https://academy.gns3.com/courses/423415/lectures/6885939',
+ 'url': 'https://gns3.teachable.com/courses/423415/lectures/6885939',
'only_matching': True,
}, {
- 'url': 'teachable:https://upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
+ 'url': 'teachable:https://v1.upskillcourses.com/courses/essential-web-developer-course/lectures/1747100',
'only_matching': True,
}]
webpage = self._download_webpage(url, video_id)
- wistia_url = WistiaIE._extract_url(webpage)
- if not wistia_url:
+ wistia_urls = WistiaIE._extract_urls(webpage)
+ if not wistia_urls:
if any(re.search(p, webpage) for p in (
r'class=["\']lecture-contents-locked',
r'>\s*Lecture contents locked',
- r'id=["\']lecture-locked')):
+ r'id=["\']lecture-locked',
+ # https://academy.tailoredtutors.co.uk/courses/108779/lectures/1955313
+ r'class=["\'](?:inner-)?lesson-locked',
+ r'>LESSON LOCKED<')):
self.raise_login_required('Lecture contents locked')
+ raise ExtractorError('Unable to find video URL')
title = self._og_search_title(webpage, default=None)
- return {
+ chapter = None
+ chapter_number = None
+ section_item = self._search_regex(
+ r'(?s)(?P<li><li[^>]+\bdata-lecture-id=["\']%s[^>]+>.+?</li>)' % video_id,
+ webpage, 'section item', default=None, group='li')
+ if section_item:
+ chapter_number = int_or_none(self._search_regex(
+ r'data-ss-position=["\'](\d+)', section_item, 'section id',
+ default=None))
+ if chapter_number is not None:
+ sections = []
+ for s in re.findall(
+ r'(?s)<div[^>]+\bclass=["\']section-title[^>]+>(.+?)</div>', webpage):
+ section = strip_or_none(clean_html(s))
+ if not section:
+ sections = []
+ break
+ sections.append(section)
+ if chapter_number <= len(sections):
+ chapter = sections[chapter_number - 1]
+
+ entries = [{
'_type': 'url_transparent',
'url': wistia_url,
'ie_key': WistiaIE.ie_key(),
'title': title,
- }
+ 'chapter': chapter,
+ 'chapter_number': chapter_number,
+ } for wistia_url in wistia_urls]
+
+ return self.playlist_result(entries, video_id, title)
class TeachableCourseIE(TeachableBaseIE):
/(?:courses|p)/(?:enrolled/)?(?P<id>[^/?#&]+)
''' % TeachableBaseIE._VALID_URL_SUB_TUPLE
_TESTS = [{
- 'url': 'http://upskillcourses.com/courses/essential-web-developer-course/',
+ 'url': 'http://v1.upskillcourses.com/courses/essential-web-developer-course/',
'info_dict': {
'id': 'essential-web-developer-course',
'title': 'The Essential Web Developer Course (Free)',
},
'playlist_count': 192,
}, {
- 'url': 'http://upskillcourses.com/courses/119763/',
+ 'url': 'http://v1.upskillcourses.com/courses/119763/',
'only_matching': True,
}, {
- 'url': 'http://upskillcourses.com/courses/enrolled/119763',
+ 'url': 'http://v1.upskillcourses.com/courses/enrolled/119763',
'only_matching': True,
}, {
- 'url': 'https://academy.gns3.com/courses/enrolled/423415',
+ 'url': 'https://gns3.teachable.com/courses/enrolled/423415',
'only_matching': True,
}, {
'url': 'teachable:https://learn.vrdev.school/p/gear-vr-developer-mini',
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
+from .nexx import NexxIE
+from ..compat import compat_urlparse
+from ..utils import (
+ NO_DEFAULT,
+ smuggle_url,
+)
+
+
+class Tele5IE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?tele5\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
+ _GEO_COUNTRIES = ['DE']
+ _TESTS = [{
+ 'url': 'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416',
+ 'info_dict': {
+ 'id': '1549416',
+ 'ext': 'mp4',
+ 'upload_date': '20180814',
+ 'timestamp': 1534290623,
+ 'title': 'Pandorum',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ # jwplatform, nexx unavailable
+ 'url': 'https://www.tele5.de/filme/ghoul-das-geheimnis-des-friedhofmonsters/',
+ 'info_dict': {
+ 'id': 'WJuiOlUp',
+ 'ext': 'mp4',
+ 'upload_date': '20200603',
+ 'timestamp': 1591214400,
+ 'title': 'Ghoul - Das Geheimnis des Friedhofmonsters',
+ 'description': 'md5:42002af1d887ff3d5b2b3ca1f8137d97',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ 'add_ie': [JWPlatformIE.ie_key()],
+ }, {
+ 'url': 'https://www.tele5.de/kalkofes-mattscheibe/video-clips/politik-und-gesellschaft?ve_id=1551191',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.tele5.de/video-clip/?ve_id=1609440',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.tele5.de/filme/schlefaz-dragon-crusaders/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.tele5.de/filme/making-of/avengers-endgame/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.tele5.de/star-trek/raumschiff-voyager/ganze-folge/das-vinculum/',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.tele5.de/anders-ist-sevda/',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+ video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
+
+ NEXX_ID_RE = r'\d{6,}'
+ JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'
+
+ def nexx_result(nexx_id):
+ return self.url_result(
+ 'https://api.nexx.cloud/v3/759/videos/byid/%s' % nexx_id,
+ ie=NexxIE.ie_key(), video_id=nexx_id)
+
+ nexx_id = jwplatform_id = None
+
+ if video_id:
+ if re.match(NEXX_ID_RE, video_id):
+ return nexx_result(video_id)
+ elif re.match(JWPLATFORM_ID_RE, video_id):
+ jwplatform_id = video_id
+
+ if not nexx_id:
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+
+ def extract_id(pattern, name, default=NO_DEFAULT):
+ return self._html_search_regex(
+ (r'id\s*=\s*["\']video-player["\'][^>]+data-id\s*=\s*["\'](%s)' % pattern,
+ r'\s+id\s*=\s*["\']player_(%s)' % pattern,
+ r'\bdata-id\s*=\s*["\'](%s)' % pattern), webpage, name,
+ default=default)
+
+ nexx_id = extract_id(NEXX_ID_RE, 'nexx id', default=None)
+ if nexx_id:
+ return nexx_result(nexx_id)
+
+ if not jwplatform_id:
+ jwplatform_id = extract_id(JWPLATFORM_ID_RE, 'jwplatform id')
+
+ return self.url_result(
+ smuggle_url(
+ 'jwplatform:%s' % jwplatform_id,
+ {'geo_countries': self._GEO_COUNTRIES}),
+ ie=JWPlatformIE.ie_key(), video_id=jwplatform_id)
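
Review note: the query-string dispatch in `Tele5IE._real_extract` can be tried in isolation. The following is a minimal sketch, not the extractor itself. It assumes Python 3's `urllib.parse` in place of `compat_urlparse`, and the helper name `classify_query_id` and the `'nexx'`/`'jwplatform'` labels are made up for illustration. It returns `(None, None)` where the extractor would fall back to scraping the page.

```python
import re
from urllib.parse import parse_qs, urlparse

# Same patterns as in the patch above.
NEXX_ID_RE = r'\d{6,}'
JWPLATFORM_ID_RE = r'[a-zA-Z0-9]{8}'


def classify_query_id(url):
    """Hypothetical helper: return (backend, video_id) from ?vid= / ?ve_id=.

    Returns (None, None) when the URL carries no usable ID, which is the
    case where the extractor downloads the page and searches it instead.
    """
    qs = parse_qs(urlparse(url).query)
    video_id = (qs.get('vid') or qs.get('ve_id') or [None])[0]
    if not video_id:
        return None, None
    # re.match only anchors at the start, so both checks are prefix tests,
    # and the numeric (nexx) check takes priority, as in the patch.
    if re.match(NEXX_ID_RE, video_id):
        return 'nexx', video_id
    if re.match(JWPLATFORM_ID_RE, video_id):
        return 'jwplatform', video_id
    return None, None


print(classify_query_id(
    'https://www.tele5.de/mediathek/filme-online/videos?vid=1549416'))
print(classify_query_id('https://www.tele5.de/anders-ist-sevda/'))
```

Because the patterns are not anchored at the end, an ID is classified by its prefix; whether that matters for real tele5.de IDs is not something the patch states.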
determine_ext,
int_or_none,
str_or_none,
+ try_get,
urljoin,
)
'info_dict': {
'id': '1876350223',
'title': 'Bacalao con kokotxas al pil-pil',
- 'description': 'md5:1382dacd32dd4592d478cbdca458e5bb',
+ 'description': 'md5:716caf5601e25c3c5ab6605b1ae71529',
},
'playlist': [{
'md5': 'adb28c37238b675dad0f042292f209a7',
'description': 'md5:2771356ff7bfad9179c5f5cd954f1477',
'duration': 50,
},
+ }, {
+ # video in opening's content
+ 'url': 'https://www.telecinco.es/vivalavida/fiorella-sobrina-edmundo-arrocet-entrevista_18_2907195140.html',
+ 'info_dict': {
+ 'id': '2907195140',
+ 'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+ 'description': 'md5:73f340a7320143d37ab895375b2bf13a',
+ },
+ 'playlist': [{
+ 'md5': 'adb28c37238b675dad0f042292f209a7',
+ 'info_dict': {
+ 'id': 'TpI2EttSDAReWpJ1o0NVh2',
+ 'ext': 'mp4',
+ 'title': 'La surrealista entrevista a la sobrina de Edmundo Arrocet: "No puedes venir aquí y tomarnos por tontos"',
+ 'duration': 1015,
+ },
+ }],
+ 'params': {
+ 'skip_download': True,
+ },
}, {
'url': 'http://www.telecinco.es/informativos/nacional/Pablo_Iglesias-Informativos_Telecinco-entrevista-Pedro_Piqueras_2_1945155182.html',
'only_matching': True,
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
article = self._parse_json(self._search_regex(
- r'window\.\$REACTBASE_STATE\.article\s*=\s*({.+})',
+ r'window\.\$REACTBASE_STATE\.article(?:_multisite)?\s*=\s*({.+})',
webpage, 'article'), display_id)['article']
title = article.get('title')
- description = clean_html(article.get('leadParagraph'))
+ description = clean_html(article.get('leadParagraph')) or ''
if article.get('editorialType') != 'VID':
entries = []
- for p in article.get('body', []):
+ body = [article.get('opening')]
+ body.extend(try_get(article, lambda x: x['body'], list) or [])
+ for p in body:
+ if not isinstance(p, dict):
+ continue
content = p.get('content')
- if p.get('type') != 'video' or not content:
+ if not content:
+ continue
+ type_ = p.get('type')
+ if type_ == 'paragraph':
+ content_str = str_or_none(content)
+ if content_str:
+ description += content_str
continue
- entries.append(self._parse_content(content, url))
+ if type_ == 'video' and isinstance(content, dict):
+ entries.append(self._parse_content(content, url))
return self.playlist_result(
entries, str_or_none(article.get('id')), title, description)
content = article['opening']['content']
'ext': 'mp4',
'title': 'Un petit choc et puis repart!',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
- 'upload_date': '20180222',
- 'timestamp': 1519326631,
},
'params': {
'skip_download': True,
class TenPlayIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/[^/]+/episodes/[^/]+/[^/]+/(?P<id>tpv\d{6}[a-z]{5})'
- _TEST = {
+ _VALID_URL = r'https?://(?:www\.)?10play\.com\.au/(?:[^/]+/)+(?P<id>tpv\d{6}[a-z]{5})'
+ _TESTS = [{
'url': 'https://10play.com.au/masterchef/episodes/season-1/masterchef-s1-ep-1/tpv190718kwzga',
'info_dict': {
'id': '6060533435001',
'format': 'bestvideo',
'skip_download': True,
}
- }
+ }, {
+ 'url': 'https://10play.com.au/how-to-stay-married/web-extras/season-1/terrys-talks-ep-1-embracing-change/tpv190915ylupc',
+ 'only_matching': True,
+ }]
BRIGHTCOVE_URL_TEMPLATE = 'https://players.brightcove.net/2199827728001/cN6vRtRQt_default/index.html?videoId=%s'
def _real_extract(self, url):
_VALID_URL = r'https?://(?:www\.)?tfo\.org/(?:en|fr)/(?:[^/]+/){2}(?P<id>\d+)'
_TEST = {
'url': 'http://www.tfo.org/en/universe/tfo-247/100463871/video-game-hackathon',
- 'md5': '47c987d0515561114cf03d1226a9d4c7',
+ 'md5': 'cafbe4f47a8dae0ca0159937878100d6',
'info_dict': {
- 'id': '100463871',
+ 'id': '7da3d50e495c406b8fc0b997659cc075',
'ext': 'mp4',
'title': 'Video Game Hackathon',
'description': 'md5:558afeba217c6c8d96c60e5421795c07',
- 'upload_date': '20160212',
- 'timestamp': 1455310233,
}
}
from __future__ import unicode_literals
from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import try_get
class ThisOldHouseIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
+ _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode|(?:[^/]+/)?\d+)/(?P<id>[^/?#]+)'
_TESTS = [{
'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
- 'md5': '568acf9ca25a639f0c4ff905826b662f',
'info_dict': {
- 'id': '2REGtUDQ',
+ 'id': '5dcdddf673c3f956ef5db202',
'ext': 'mp4',
'title': 'How to Build a Storage Bench',
'description': 'In the workshop, Tom Silva and Kevin O\'Connor build a storage bench for an entryway.',
'timestamp': 1442548800,
'upload_date': '20150918',
- }
+ },
+ 'params': {
+ 'skip_download': True,
+ },
}, {
'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
'only_matching': True,
}, {
'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
'only_matching': True,
+ }, {
+ 'url': 'https://www.thisoldhouse.com/furniture/21017078/how-to-build-a-storage-bench',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.thisoldhouse.com/21113884/s41-e13-paradise-lost',
+ 'only_matching': True,
+ }, {
+ # iframe www.thisoldhouse.com
+ 'url': 'https://www.thisoldhouse.com/21083431/seaside-transformation-the-westerly-project',
+ 'only_matching': True,
}]
+ _ZYPE_TMPL = 'https://player.zype.com/embed/%s.html?api_key=hsOk_yMSPYNrT22e9pu8hihLXjaZf0JW5jsOWv4ZqyHJFvkJn6rtToHl09tbbsbe'
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
video_id = self._search_regex(
- (r'data-mid=(["\'])(?P<id>(?:(?!\1).)+)\1',
- r'id=(["\'])inline-video-player-(?P<id>(?:(?!\1).)+)\1'),
- webpage, 'video id', default=None, group='id')
- if not video_id:
- drupal_settings = self._parse_json(self._search_regex(
- r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);',
- webpage, 'drupal settings'), display_id)
- video_id = try_get(
- drupal_settings, lambda x: x['jwplatform']['video_id'],
- compat_str) or list(drupal_settings['comScore'])[0]
- return self.url_result('jwplatform:' + video_id, 'JWPlatform', video_id)
+ r'<iframe[^>]+src=[\'"](?:https?:)?//(?:www\.)?thisoldhouse\.(?:chorus\.build|com)/videos/zype/([0-9a-f]{24})',
+ webpage, 'video id')
+ return self.url_result(self._ZYPE_TMPL % video_id, 'Zype', video_id)
class ToggleIE(InfoExtractor):
IE_NAME = 'toggle'
- _VALID_URL = r'https?://video\.toggle\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
+ _VALID_URL = r'https?://(?:(?:www\.)?mewatch|video\.toggle)\.sg/(?:en|zh)/(?:[^/]+/){2,}(?P<id>[0-9]+)'
_TESTS = [{
- 'url': 'http://video.toggle.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
+ 'url': 'http://www.mewatch.sg/en/series/lion-moms-tif/trailers/lion-moms-premier/343115',
'info_dict': {
'id': '343115',
'ext': 'mp4',
}
}, {
'note': 'DRM-protected video',
- 'url': 'http://video.toggle.sg/en/movies/dug-s-special-mission/341413',
+ 'url': 'http://www.mewatch.sg/en/movies/dug-s-special-mission/341413',
'info_dict': {
'id': '341413',
'ext': 'wvm',
}, {
# this also tests correct video id extraction
'note': 'm3u8 links are geo-restricted, but Android/mp4 is okay',
- 'url': 'http://video.toggle.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
+ 'url': 'http://www.mewatch.sg/en/series/28th-sea-games-5-show/28th-sea-games-5-show-ep11/332861',
'info_dict': {
'id': '332861',
'ext': 'mp4',
'url': 'http://video.toggle.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
'only_matching': True,
}, {
- 'url': 'http://video.toggle.sg/zh/series/zero-calling-s2-hd/ep13/336367',
+ 'url': 'http://www.mewatch.sg/en/clips/seraph-sun-aloysius-will-suddenly-sing-some-old-songs-in-high-pitch-on-set/343331',
'only_matching': True,
}, {
- 'url': 'http://video.toggle.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
+ 'url': 'http://www.mewatch.sg/zh/series/zero-calling-s2-hd/ep13/336367',
'only_matching': True,
}, {
- 'url': 'http://video.toggle.sg/en/movies/seven-days/321936',
+ 'url': 'http://www.mewatch.sg/en/series/vetri-s2/webisodes/jeeva-is-an-orphan-vetri-s2-webisode-7/342302',
'only_matching': True,
}, {
- 'url': 'https://video.toggle.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
+ 'url': 'http://www.mewatch.sg/en/movies/seven-days/321936',
'only_matching': True,
}, {
- 'url': 'http://video.toggle.sg/en/channels/eleven-plus/401585',
+ 'url': 'https://www.mewatch.sg/en/tv-show/news/may-2017-cna-singapore-tonight/fri-19-may-2017/512456',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.mewatch.sg/en/channels/eleven-plus/401585',
'only_matching': True,
}]
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class TruNewsIE(InfoExtractor):
+ _VALID_URL = r'https?://(?:www\.)?trunews\.com/stream/(?P<id>[^/?#&]+)'
+ _TEST = {
+ 'url': 'https://www.trunews.com/stream/will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
+ 'info_dict': {
+ 'id': '5c5a21e65d3c196e1c0020cc',
+ 'display_id': 'will-democrats-stage-a-circus-during-president-trump-s-state-of-the-union-speech',
+ 'ext': 'mp4',
+ 'title': "Will Democrats Stage a Circus During President Trump's State of the Union Speech?",
+ 'description': 'md5:c583b72147cc92cf21f56a31aff7a670',
+ 'duration': 3685,
+ 'timestamp': 1549411440,
+ 'upload_date': '20190206',
+ },
+ 'add_ie': ['Zype'],
+ }
+ _ZYPE_TEMPL = 'https://player.zype.com/embed/%s.js?api_key=X5XnahkjCwJrT_l5zUqypnaLEObotyvtUKJWWlONxDoHVjP8vqxlArLV8llxMbyt'
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ zype_id = self._download_json(
+ 'https://api.zype.com/videos', display_id, query={
+ 'app_key': 'PUVKp9WgGUb3-JUw6EqafLx8tFVP6VKZTWbUOR-HOm__g4fNDt1bCsm_LgYf_k9H',
+ 'per_page': 1,
+ 'active': 'true',
+ 'friendly_title': display_id,
+ })['response'][0]['_id']
+ return self.url_result(self._ZYPE_TEMPL % zype_id, 'Zype', zype_id)
import re
from .common import InfoExtractor
-from ..compat import compat_str
from ..utils import (
ExtractorError,
int_or_none,
url = 'http://%s.tumblr.com/post/%s/' % (blog, video_id)
webpage, urlh = self._download_webpage_handle(url, video_id)
- redirect_url = compat_str(urlh.geturl())
+ redirect_url = urlh.geturl()
if 'tumblr.com/safe-mode' in redirect_url or redirect_url.startswith('/safe-mode'):
raise ExtractorError(
'This Tumblr may contain sensitive media. '
video_id = self._match_id(url)
video = self._download_json(
- 'http://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
+ 'https://play.tv2bornholm.dk/controls/AJAX.aspx/specifikVideo', video_id,
data=json.dumps({
'playlist_id': video_id,
'serienavn': '',
manifest_url.replace('.m3u8', '.f4m'),
video_id, f4m_id='hds', fatal=False))
formats.extend(self._extract_ism_formats(
- re.sub(r'\.ism/.+?\.m3u8', r'.ism/Manifest', manifest_url),
+ re.sub(r'\.ism/.*?\.m3u8', r'.ism/Manifest', manifest_url),
video_id, ism_id='mss', fatal=False))
if not formats and info.get('is_geo_restricted'):
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+ determine_ext,
+ extract_attributes,
+ int_or_none,
+ parse_duration,
+)
+
+
+class TV5MondePlusIE(InfoExtractor):
+ IE_DESC = 'TV5MONDE+'
+ _VALID_URL = r'https?://(?:www\.)?(?:tv5mondeplus|revoir\.tv5monde)\.com/toutes-les-videos/[^/]+/(?P<id>[^/?#]+)'
+ _TESTS = [{
+ # movie
+ 'url': 'https://revoir.tv5monde.com/toutes-les-videos/cinema/rendez-vous-a-atlit',
+ 'md5': '8cbde5ea7b296cf635073e27895e227f',
+ 'info_dict': {
+ 'id': '822a4756-0712-7329-1859-a13ac7fd1407',
+ 'display_id': 'rendez-vous-a-atlit',
+ 'ext': 'mp4',
+ 'title': 'Rendez-vous à Atlit',
+ 'description': 'md5:2893a4c5e1dbac3eedff2d87956e4efb',
+ 'upload_date': '20200130',
+ },
+ }, {
+ # series episode
+ 'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/c-est-la-vie-ennemie-juree',
+ 'info_dict': {
+ 'id': '0df7007c-4900-3936-c601-87a13a93a068',
+ 'display_id': 'c-est-la-vie-ennemie-juree',
+ 'ext': 'mp4',
+ 'title': "C'est la vie - Ennemie jurée",
+ 'description': 'md5:dfb5c63087b6f35fe0cc0af4fe44287e',
+ 'upload_date': '20200130',
+ 'series': "C'est la vie",
+ 'episode': 'Ennemie jurée',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ 'url': 'https://revoir.tv5monde.com/toutes-les-videos/series-fictions/neuf-jours-en-hiver-neuf-jours-en-hiver',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://revoir.tv5monde.com/toutes-les-videos/info-societe/le-journal-de-la-rts-edition-du-30-01-20-19h30',
+ 'only_matching': True,
+ }]
+ _GEO_BYPASS = False
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+ webpage = self._download_webpage(url, display_id)
+
+ if ">Ce programme n'est malheureusement pas disponible pour votre zone géographique.<" in webpage:
+ self.raise_geo_restricted(countries=['FR'])
+
+ title = episode = self._html_search_regex(r'<h1>([^<]+)', webpage, 'title')
+ vpl_data = extract_attributes(self._search_regex(
+ r'(<[^>]+class="video_player_loader"[^>]+>)',
+ webpage, 'video player loader'))
+
+ video_files = self._parse_json(
+ vpl_data['data-broadcast'], display_id).get('files', [])
+ formats = []
+ for video_file in video_files:
+ v_url = video_file.get('url')
+ if not v_url:
+ continue
+ video_format = video_file.get('format') or determine_ext(v_url)
+ if video_format == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ v_url, display_id, 'mp4', 'm3u8_native',
+ m3u8_id='hls', fatal=False))
+ else:
+ formats.append({
+ 'url': v_url,
+ 'format_id': video_format,
+ })
+ self._sort_formats(formats)
+
+ description = self._html_search_regex(
+ r'(?s)<div[^>]+class=["\']episode-texte[^>]+>(.+?)</div>', webpage,
+ 'description', fatal=False)
+
+ series = self._html_search_regex(
+ r'<p[^>]+class=["\']episode-emission[^>]+>([^<]+)', webpage,
+ 'series', default=None)
+
+ if series and series != title:
+ title = '%s - %s' % (series, title)
+
+ upload_date = self._search_regex(
+ r'(?:date_publication|publish_date)["\']\s*:\s*["\'](\d{4}_\d{2}_\d{2})',
+ webpage, 'upload date', default=None)
+ if upload_date:
+ upload_date = upload_date.replace('_', '')
+
+ video_id = self._search_regex(
+ (r'data-guid=["\']([\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})',
+ r'id_contenu["\']\s*:\s*(\d+)'), webpage, 'video id',
+ default=display_id)
+
+ return {
+ 'id': video_id,
+ 'display_id': display_id,
+ 'title': title,
+ 'description': description,
+ 'thumbnail': vpl_data.get('data-image'),
+ 'duration': int_or_none(vpl_data.get('data-duration')) or parse_duration(self._html_search_meta('duration', webpage)),
+ 'upload_date': upload_date,
+ 'formats': formats,
+ 'series': series,
+ 'episode': episode,
+ }
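
Review note: two small transformations in `TV5MondePlusIE._real_extract` are easy to check on their own: the `upload_date` normalisation and the `series - title` composition. The sketch below is an illustration under assumptions. It uses plain `re.search` rather than `_search_regex`, the function names are invented, and the sample markup is hypothetical rather than copied from the site.

```python
import re

# Same pattern as in the patch above.
UPLOAD_DATE_RE = (
    r'(?:date_publication|publish_date)["\']\s*:\s*["\'](\d{4}_\d{2}_\d{2})')


def extract_upload_date(webpage):
    """Hypothetical helper: return YYYYMMDD or None when no date is found."""
    m = re.search(UPLOAD_DATE_RE, webpage)
    return m.group(1).replace('_', '') if m else None


def compose_title(title, series):
    """Hypothetical helper: prefix the series name unless it equals the title."""
    if series and series != title:
        return '%s - %s' % (series, title)
    return title


# Hypothetical page fragment, for illustration only.
sample = '{"publish_date": "2020_01_30"}'
print(extract_upload_date(sample))
print(compose_title('Ennemie jurée', "C'est la vie"))
```

The `series != title` guard appears to cover pages such as the `neuf-jours-en-hiver-neuf-jours-en-hiver` test URL, where both fields would otherwise repeat the same name.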
class TVAIE(InfoExtractor):
- _VALID_URL = r'https?://videos\.tva\.ca/details/_(?P<id>\d+)'
- _TEST = {
+ _VALID_URL = r'https?://videos?\.tva\.ca/details/_(?P<id>\d+)'
+ _TESTS = [{
'url': 'https://videos.tva.ca/details/_5596811470001',
'info_dict': {
'id': '5596811470001',
# m3u8 download
'skip_download': True,
}
- }
+ }, {
+ 'url': 'https://video.tva.ca/details/_5596811470001',
+ 'only_matching': True,
+ }]
BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/5481942443001/default_default/index.html?videoId=%s'
def _real_extract(self, url):
from ..compat import compat_str
from ..utils import (
ExtractorError,
+ get_element_by_id,
int_or_none,
parse_iso8601,
parse_duration,
str_or_none,
+ try_get,
update_url_query,
urljoin,
)
ie=TVNowIE.ie_key(), video_id=mobj.group('id'))
+class TVNowFilmIE(TVNowBaseIE):
+ _VALID_URL = r'''(?x)
+ (?P<base_url>https?://
+ (?:www\.)?tvnow\.(?:de|at|ch)/
+ (?:filme))/
+ (?P<title>[^/?$&]+)-(?P<id>\d+)
+ '''
+ _TESTS = [{
+ 'url': 'https://www.tvnow.de/filme/lord-of-war-haendler-des-todes-7959',
+ 'info_dict': {
+ 'id': '1426690',
+ 'display_id': 'lord-of-war-haendler-des-todes',
+ 'ext': 'mp4',
+ 'title': 'Lord of War',
+ 'description': 'md5:5eda15c0d5b8cb70dac724c8a0ff89a9',
+ 'timestamp': 1550010000,
+ 'upload_date': '20190212',
+ 'duration': 7016,
+ },
+ }, {
+ 'url': 'https://www.tvnow.de/filme/the-machinist-12157',
+ 'info_dict': {
+ 'id': '328160',
+ 'display_id': 'the-machinist',
+ 'ext': 'mp4',
+ 'title': 'The Machinist',
+ 'description': 'md5:9a0e363fdd74b3a9e1cdd9e21d0ecc28',
+ 'timestamp': 1496469720,
+ 'upload_date': '20170603',
+ 'duration': 5836,
+ },
+ }, {
+ 'url': 'https://www.tvnow.de/filme/horst-schlaemmer-isch-kandidiere-17777',
+ 'only_matching': True, # DRM protected
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ display_id = mobj.group('title')
+
+ webpage = self._download_webpage(url, display_id, fatal=False)
+ if not webpage:
+ raise ExtractorError('Cannot download "%s"' % url, expected=True)
+
+ json_text = get_element_by_id('now-web-state', webpage)
+ if not json_text:
+ raise ExtractorError('Cannot read video data', expected=True)
+
+ json_data = self._parse_json(
+ json_text,
+ display_id,
+ transform_source=lambda x: x.replace('&q;', '"'),
+ fatal=False)
+ if not json_data:
+ raise ExtractorError('Cannot read video data', expected=True)
+
+ player_key = next(
+ (key for key in json_data.keys() if 'module/player' in key),
+ None)
+ page_key = next(
+ (key for key in json_data.keys() if 'page/filme' in key),
+ None)
+ movie_id = try_get(
+ json_data,
+ [
+ lambda x: x[player_key]['body']['id'],
+ lambda x: x[page_key]['body']['modules'][0]['id'],
+ lambda x: x[page_key]['body']['modules'][1]['id']],
+ int)
+ if not movie_id:
+ raise ExtractorError('Cannot extract movie ID', expected=True)
+
+ info = self._call_api(
+ 'movies/%d' % movie_id,
+ display_id,
+ query={'fields': ','.join(self._VIDEO_FIELDS)})
+
+ return self._extract_video(info, display_id)
+
+
class TVNowNewBaseIE(InfoExtractor):
def _call_api(self, path, video_id, query={}):
result = self._download_json(
display_id, video_id = re.match(self._VALID_URL, url).groups()
info = self._call_api('player/' + video_id, video_id)
return self._extract_video(info, video_id, display_id)
+
+
+class TVNowFilmIE(TVNowIE):
+ _VALID_URL = r'''(?x)
+ (?P<base_url>https?://
+ (?:www\.)?tvnow\.(?:de|at|ch)/
+ (?:filme))/
+ (?P<title>[^/?$&]+)-(?P<id>\d+)
+ '''
+ _TESTS = [{
+ 'url': 'https://www.tvnow.de/filme/lord-of-war-haendler-des-todes-7959',
+ 'info_dict': {
+ 'id': '1426690',
+ 'display_id': 'lord-of-war-haendler-des-todes',
+ 'ext': 'mp4',
+ 'title': 'Lord of War',
+ 'description': 'md5:5eda15c0d5b8cb70dac724c8a0ff89a9',
+ 'timestamp': 1550010000,
+ 'upload_date': '20190212',
+ 'duration': 7016,
+ },
+ }, {
+ 'url': 'https://www.tvnow.de/filme/the-machinist-12157',
+ 'info_dict': {
+ 'id': '328160',
+ 'display_id': 'the-machinist',
+ 'ext': 'mp4',
+ 'title': 'The Machinist',
+ 'description': 'md5:9a0e363fdd74b3a9e1cdd9e21d0ecc28',
+ 'timestamp': 1496469720,
+ 'upload_date': '20170603',
+ 'duration': 5836,
+ },
+ }, {
+ 'url': 'https://www.tvnow.de/filme/horst-schlaemmer-isch-kandidiere-17777',
+ 'only_matching': True, # DRM protected
+ }]
+
+ def _real_extract(self, url):
+ mobj = re.match(self._VALID_URL, url)
+ display_id = mobj.group('title')
+
+ webpage = self._download_webpage(url, display_id, fatal=False)
+ if not webpage:
+ raise ExtractorError('Cannot download "%s"' % url, expected=True)
+
+ json_text = get_element_by_id('now-web-state', webpage)
+ if not json_text:
+ raise ExtractorError('Cannot read video data', expected=True)
+
+ json_data = self._parse_json(
+ json_text,
+ display_id,
+ transform_source=lambda x: x.replace('&q;', '"'),
+ fatal=False)
+ if not json_data:
+ raise ExtractorError('Cannot read video data', expected=True)
+
+ player_key = next(
+ (key for key in json_data.keys() if 'module/player' in key),
+ None)
+ page_key = next(
+ (key for key in json_data.keys() if 'page/filme' in key),
+ None)
+ movie_id = try_get(
+ json_data,
+ [
+ lambda x: x[player_key]['body']['id'],
+ lambda x: x[page_key]['body']['modules'][0]['id'],
+ lambda x: x[page_key]['body']['modules'][1]['id']],
+ int)
+ if not movie_id:
+ raise ExtractorError('Cannot extract movie ID', expected=True)
+
+ info = self._call_api('player/%d' % movie_id, display_id)
+ return self._extract_video(info, url, display_id)
"""
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
- compat_str,
compat_urlparse,
)
from ..utils import (
int_or_none,
parse_iso8601,
qualities,
- smuggle_url,
try_get,
- unsmuggle_url,
update_url_query,
url_or_none,
)
]
def _real_extract(self, url):
- url, smuggled_data = unsmuggle_url(url, {})
- self._initialize_geo_bypass({
- 'countries': smuggled_data.get('geo_countries'),
- })
-
video_id = self._match_id(url)
geo_country = self._search_regex(
r'https?://[^/]+\.([a-z]{2})', url,
'ext': ext,
}
if video_url.startswith('rtmp'):
- if smuggled_data.get('skip_rtmp'):
- continue
m = re.search(
r'^(?P<url>rtmp://[^/]+/(?P<app>[^/]+))/(?P<playpath>.+)$', video_url)
if not m:
_VALID_URL = r'''(?x)
https?://
(?:www\.)?
- viafree\.
- (?:
- (?:dk|no)/programmer|
- se/program
- )
- /(?:[^/]+/)+(?P<id>[^/?#&]+)
+ viafree\.(?P<country>dk|no|se)
+ /(?P<id>program(?:mer)?/(?:[^/]+/)+[^/?#&]+)
'''
_TESTS = [{
- 'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
+ 'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
'info_dict': {
- 'id': '395375',
+ 'id': '757786',
'ext': 'mp4',
- 'title': 'Husräddarna S02E02',
- 'description': 'md5:4db5c933e37db629b5a2f75dfb34829e',
- 'series': 'Husräddarna',
- 'season': 'Säsong 2',
+ 'title': 'Det beste vorspielet - Sesong 2 - Episode 1',
+ 'description': 'md5:b632cb848331404ccacd8cd03e83b4c3',
+ 'series': 'Det beste vorspielet',
'season_number': 2,
- 'duration': 2576,
- 'timestamp': 1400596321,
- 'upload_date': '20140520',
+ 'duration': 1116,
+ 'timestamp': 1471200600,
+ 'upload_date': '20160814',
},
'params': {
'skip_download': True,
},
- 'add_ie': [TVPlayIE.ie_key()],
}, {
# with relatedClips
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-1',
- 'info_dict': {
- 'id': '758770',
- 'ext': 'mp4',
- 'title': 'Sommaren med YouTube-stjärnorna S01E01',
- 'description': 'md5:2bc69dce2c4bb48391e858539bbb0e3f',
- 'series': 'Sommaren med YouTube-stjärnorna',
- 'season': 'Säsong 1',
- 'season_number': 1,
- 'duration': 1326,
- 'timestamp': 1470905572,
- 'upload_date': '20160811',
- },
- 'params': {
- 'skip_download': True,
- },
- 'add_ie': [TVPlayIE.ie_key()],
+ 'only_matching': True,
}, {
# Different og:image URL schema
'url': 'http://www.viafree.se/program/reality/sommaren-med-youtube-stjarnorna/sasong-1/avsnitt-2',
'only_matching': True,
}, {
- 'url': 'http://www.viafree.no/programmer/underholdning/det-beste-vorspielet/sesong-2/episode-1',
+ 'url': 'http://www.viafree.se/program/livsstil/husraddarna/sasong-2/avsnitt-2',
'only_matching': True,
}, {
'url': 'http://www.viafree.dk/programmer/reality/paradise-hotel/saeson-7/episode-5',
'only_matching': True,
}]
+ _GEO_BYPASS = False
@classmethod
def suitable(cls, url):
return False if TVPlayIE.suitable(url) else super(ViafreeIE, cls).suitable(url)
def _real_extract(self, url):
- video_id = self._match_id(url)
+ country, path = re.match(self._VALID_URL, url).groups()
+ content = self._download_json(
+ 'https://viafree-content.mtg-api.com/viafree-content/v1/%s/path/%s' % (country, path), path)
+ program = content['_embedded']['viafreeBlocks'][0]['_embedded']['program']
+ guid = program['guid']
+ meta = content['meta']
+ title = meta['title']
- webpage = self._download_webpage(url, video_id)
+ try:
+ stream_href = self._download_json(
+ program['_links']['streamLink']['href'], guid,
+ headers=self.geo_verification_headers())['embedded']['prioritizedStreams'][0]['links']['stream']['href']
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+ self.raise_geo_restricted(countries=[country])
+ raise
+
+ formats = self._extract_m3u8_formats(stream_href, guid, 'mp4')
+ self._sort_formats(formats)
+ episode = program.get('episode') or {}
- data = self._parse_json(
- self._search_regex(
- r'(?s)window\.App\s*=\s*({.+?})\s*;\s*</script',
- webpage, 'data', default='{}'),
- video_id, transform_source=lambda x: re.sub(
- r'(?s)function\s+[a-zA-Z_][\da-zA-Z_]*\s*\([^)]*\)\s*{[^}]*}\s*',
- 'null', x), fatal=False)
-
- video_id = None
-
- if data:
- video_id = try_get(
- data, lambda x: x['context']['dispatcher']['stores'][
- 'ContentPageProgramStore']['currentVideo']['id'],
- compat_str)
-
- # Fallback #1 (extract from og:image URL schema)
- if not video_id:
- thumbnail = self._og_search_thumbnail(webpage, default=None)
- if thumbnail:
- video_id = self._search_regex(
- # Patterns seen:
- # http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/inbox/765166/a2e95e5f1d735bab9f309fa345cc3f25.jpg
- # http://cdn.playapi.mtgx.tv/imagecache/600x315/cloud/content-images/seasons/15204/758770/4a5ba509ca8bc043e1ebd1a76131cdf2.jpg
- r'https?://[^/]+/imagecache/(?:[^/]+/)+(\d{6,})/',
- thumbnail, 'video id', default=None)
-
- # Fallback #2. Extract from raw JSON string.
- # May extract wrong video id if relatedClips is present.
- if not video_id:
- video_id = self._search_regex(
- r'currentVideo["\']\s*:\s*.+?["\']id["\']\s*:\s*["\'](\d{6,})',
- webpage, 'video id')
-
- return self.url_result(
- smuggle_url(
- 'mtg:%s' % video_id,
- {
- 'geo_countries': [
- compat_urlparse.urlparse(url).netloc.rsplit('.', 1)[-1]],
- # rtmp host mtgfs.fplive.net for viafree is unresolvable
- 'skip_rtmp': True,
- }),
- ie=TVPlayIE.ie_key(), video_id=video_id)
+ return {
+ 'id': guid,
+ 'title': title,
+ 'thumbnail': meta.get('image'),
+ 'description': meta.get('description'),
+ 'series': episode.get('seriesTitle'),
+ 'episode_number': int_or_none(episode.get('episodeNumber')),
+ 'season_number': int_or_none(episode.get('seasonNumber')),
+ 'duration': int_or_none(try_get(program, lambda x: x['video']['duration']['milliseconds']), 1000),
+ 'timestamp': parse_iso8601(try_get(program, lambda x: x['availability']['start'])),
+ 'formats': formats,
+ }
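
Review note: the `duration` field in the new `ViafreeIE` return dict combines a guarded nested lookup with a millisecond-to-second conversion. The sketch below shows that pattern with simplified stand-ins for `try_get` and `int_or_none`. These stand-ins are assumptions written for this note and are not the `youtube_dl.utils` implementations. `duration_seconds` is likewise an invented name.

```python
def try_get(src, getter, expected_type=None):
    """Simplified stand-in: run getter, swallow lookup errors, type-check."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def int_or_none(value, scale=1):
    """Simplified stand-in: int(value) // scale, or None if not convertible."""
    if value is None or value == '':
        return None
    try:
        return int(value) // scale
    except (ValueError, TypeError):
        return None


def duration_seconds(program):
    """Hypothetical helper mirroring the 'duration' expression in the patch."""
    return int_or_none(
        try_get(program, lambda x: x['video']['duration']['milliseconds']),
        1000)


print(duration_seconds({'video': {'duration': {'milliseconds': 1116000}}}))
print(duration_seconds({'video': None}))
```

A missing or malformed `video` object therefore yields `None` for the duration instead of raising, which matches how the other optional fields in the dict are handled.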
class TVPlayHomeIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?P<host>
- (?:(?:www|porno)\.)?24video\.
- (?:net|me|xxx|sexy?|tube|adult|site)
+ (?:(?:www|porno?)\.)?24video\.
+ (?:net|me|xxx|sexy?|tube|adult|site|vip)
)/
(?:
video/(?:(?:view|xml)/)?|
}, {
'url': 'https://porno.24video.net/video/2640421-vsya-takaya-gibkaya-i-v-masle',
'only_matching': True,
+ }, {
+ 'url': 'https://www.24video.vip/video/view/1044982',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://porn.24video.net/video/2640421-vsya-takay',
+ 'only_matching': True,
}]
def _real_extract(self, url):
# coding: utf-8
from __future__ import unicode_literals
+import collections
import itertools
-import re
-import random
import json
+import random
+import re
from .common import InfoExtractor
from ..compat import (
compat_kwargs,
compat_parse_qs,
compat_str,
+ compat_urlparse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
+ float_or_none,
int_or_none,
- orderedSet,
parse_duration,
parse_iso8601,
+ qualities,
+ str_or_none,
try_get,
unified_timestamp,
update_url_query,
def _call_api(self, path, item_id, *args, **kwargs):
headers = kwargs.get('headers', {}).copy()
- headers['Client-ID'] = self._CLIENT_ID
- kwargs['headers'] = headers
+ headers.update({
+ 'Accept': 'application/vnd.twitchtv.v5+json; charset=UTF-8',
+ 'Client-ID': self._CLIENT_ID,
+ })
+ kwargs.update({
+ 'headers': headers,
+ 'expected_status': (400, 410),
+ })
response = self._download_json(
'%s/%s' % (self._API_BASE, path), item_id,
*args, **compat_kwargs(kwargs))
})
self._sort_formats(formats)
+ def _download_access_token(self, channel_name):
+ return self._call_api(
+ 'api/channels/%s/access_token' % channel_name, channel_name,
+ 'Downloading access token JSON')
-class TwitchItemBaseIE(TwitchBaseIE):
- def _download_info(self, item, item_id):
- return self._extract_info(self._call_api(
- 'kraken/videos/%s%s' % (item, item_id), item_id,
- 'Downloading %s info JSON' % self._ITEM_TYPE))
-
- def _extract_media(self, item_id):
- info = self._download_info(self._ITEM_SHORTCUT, item_id)
- response = self._call_api(
- 'api/videos/%s%s' % (self._ITEM_SHORTCUT, item_id), item_id,
- 'Downloading %s playlist JSON' % self._ITEM_TYPE)
- entries = []
- chunks = response['chunks']
- qualities = list(chunks.keys())
- for num, fragment in enumerate(zip(*chunks.values()), start=1):
- formats = []
- for fmt_num, fragment_fmt in enumerate(fragment):
- format_id = qualities[fmt_num]
- fmt = {
- 'url': fragment_fmt['url'],
- 'format_id': format_id,
- 'quality': 1 if format_id == 'live' else 0,
- }
- m = re.search(r'^(?P<height>\d+)[Pp]', format_id)
- if m:
- fmt['height'] = int(m.group('height'))
- formats.append(fmt)
- self._sort_formats(formats)
- entry = dict(info)
- entry['id'] = '%s_%d' % (entry['id'], num)
- entry['title'] = '%s part %d' % (entry['title'], num)
- entry['formats'] = formats
- entries.append(entry)
- return self.playlist_result(entries, info['id'], info['title'])
-
- def _extract_info(self, info):
- status = info.get('status')
- if status == 'recording':
- is_live = True
- elif status == 'recorded':
- is_live = False
- else:
- is_live = None
- return {
- 'id': info['_id'],
- 'title': info.get('title') or 'Untitled Broadcast',
- 'description': info.get('description'),
- 'duration': int_or_none(info.get('length')),
- 'thumbnail': info.get('preview'),
- 'uploader': info.get('channel', {}).get('display_name'),
- 'uploader_id': info.get('channel', {}).get('name'),
- 'timestamp': parse_iso8601(info.get('recorded_at')),
- 'view_count': int_or_none(info.get('views')),
- 'is_live': is_live,
- }
-
- def _real_extract(self, url):
- return self._extract_media(self._match_id(url))
-
+ def _extract_channel_id(self, token, channel_name):
+ return compat_str(self._parse_json(token, channel_name)['channel_id'])
-class TwitchVideoIE(TwitchItemBaseIE):
- IE_NAME = 'twitch:video'
- _VALID_URL = r'%s/[^/]+/b/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
- _ITEM_TYPE = 'video'
- _ITEM_SHORTCUT = 'a'
- _TEST = {
- 'url': 'http://www.twitch.tv/riotgames/b/577357806',
- 'info_dict': {
- 'id': 'a577357806',
- 'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
- },
- 'playlist_mincount': 12,
- 'skip': 'HTTP Error 404: Not Found',
- }
-
-
-class TwitchChapterIE(TwitchItemBaseIE):
- IE_NAME = 'twitch:chapter'
- _VALID_URL = r'%s/[^/]+/c/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
- _ITEM_TYPE = 'chapter'
- _ITEM_SHORTCUT = 'c'
-
- _TESTS = [{
- 'url': 'http://www.twitch.tv/acracingleague/c/5285812',
- 'info_dict': {
- 'id': 'c5285812',
- 'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
- },
- 'playlist_mincount': 3,
- 'skip': 'HTTP Error 404: Not Found',
- }, {
- 'url': 'http://www.twitch.tv/tsm_theoddone/c/2349361',
- 'only_matching': True,
- }]
-
-
-class TwitchVodIE(TwitchItemBaseIE):
+class TwitchVodIE(TwitchBaseIE):
IE_NAME = 'twitch:vod'
_VALID_URL = r'''(?x)
https?://
'only_matching': True,
}]
+ def _download_info(self, item_id):
+ return self._extract_info(
+ self._call_api(
+ 'kraken/videos/%s' % item_id, item_id,
+ 'Downloading video info JSON'))
+
+ @staticmethod
+ def _extract_info(info):
+ status = info.get('status')
+ if status == 'recording':
+ is_live = True
+ elif status == 'recorded':
+ is_live = False
+ else:
+ is_live = None
+ _QUALITIES = ('small', 'medium', 'large')
+ quality_key = qualities(_QUALITIES)
+ thumbnails = []
+ preview = info.get('preview')
+ if isinstance(preview, dict):
+ for thumbnail_id, thumbnail_url in preview.items():
+ thumbnail_url = url_or_none(thumbnail_url)
+ if not thumbnail_url:
+ continue
+ if thumbnail_id not in _QUALITIES:
+ continue
+ thumbnails.append({
+ 'url': thumbnail_url,
+ 'preference': quality_key(thumbnail_id),
+ })
+ return {
+ 'id': info['_id'],
+ 'title': info.get('title') or 'Untitled Broadcast',
+ 'description': info.get('description'),
+ 'duration': int_or_none(info.get('length')),
+ 'thumbnails': thumbnails,
+ 'uploader': info.get('channel', {}).get('display_name'),
+ 'uploader_id': info.get('channel', {}).get('name'),
+ 'timestamp': parse_iso8601(info.get('recorded_at')),
+ 'view_count': int_or_none(info.get('views')),
+ 'is_live': is_live,
+ }
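The thumbnail ranking in `_extract_info` above can be tried in isolation. The following is a minimal sketch, assuming that `qualities` maps an id to its index in the given sequence and returns -1 for unknown ids; the local `qualities` function and the `preview` payload are illustrative stand-ins, not the project's own code or a real API response.

```python
def qualities(quality_ids):
    """Return a function mapping a quality id to its rank, or -1 if unknown."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


QUALITIES = ('small', 'medium', 'large')
quality_key = qualities(QUALITIES)

# Hypothetical `preview` payload; a real response may contain other keys.
preview = {
    'large': 'https://example.com/large.jpg',
    'small': 'https://example.com/small.jpg',
    'template': 'https://example.com/{width}x{height}.jpg',
    'medium': None,
}

thumbnails = []
for thumbnail_id, thumbnail_url in preview.items():
    # Skip missing URLs and ids that are not one of the known qualities.
    if not thumbnail_url or thumbnail_id not in QUALITIES:
        continue
    thumbnails.append({
        'url': thumbnail_url,
        'preference': quality_key(thumbnail_id),
    })

# The entry with the highest preference is the largest known thumbnail.
best = max(thumbnails, key=lambda t: t['preference'])
```

Ranking by position keeps the ordering in one place: reordering or extending `QUALITIES` changes the preferences without touching the loop.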
+
def _real_extract(self, url):
- item_id = self._match_id(url)
+ vod_id = self._match_id(url)
- info = self._download_info(self._ITEM_SHORTCUT, item_id)
+ info = self._download_info(vod_id)
access_token = self._call_api(
- 'api/vods/%s/access_token' % item_id, item_id,
+ 'api/vods/%s/access_token' % vod_id, vod_id,
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
'%s/vod/%s.m3u8?%s' % (
- self._USHER_BASE, item_id,
+ self._USHER_BASE, vod_id,
compat_urllib_parse_urlencode({
'allow_source': 'true',
'allow_audio_only': 'true',
'nauth': access_token['token'],
'nauthsig': access_token['sig'],
})),
- item_id, 'mp4', entry_protocol='m3u8_native')
+ vod_id, 'mp4', entry_protocol='m3u8_native')
self._prefer_source(formats)
info['formats'] = formats
info['subtitles'] = {
'rechat': [{
'url': update_url_query(
- 'https://api.twitch.tv/v5/videos/%s/comments' % item_id, {
+ 'https://api.twitch.tv/v5/videos/%s/comments' % vod_id, {
'client_id': self._CLIENT_ID,
}),
'ext': 'json',
return info
-class TwitchPlaylistBaseIE(TwitchBaseIE):
- _PLAYLIST_PATH = 'kraken/channels/%s/videos/?offset=%d&limit=%d'
+def _make_video_result(node):
+ assert isinstance(node, dict)
+ video_id = node.get('id')
+ if not video_id:
+ return
+ return {
+ '_type': 'url_transparent',
+ 'ie_key': TwitchVodIE.ie_key(),
+ 'id': video_id,
+ 'url': 'https://www.twitch.tv/videos/%s' % video_id,
+ 'title': node.get('title'),
+ 'thumbnail': node.get('previewThumbnailURL'),
+ 'duration': float_or_none(node.get('lengthSeconds')),
+ 'view_count': int_or_none(node.get('viewCount')),
+ }
+
+
+class TwitchGraphQLBaseIE(TwitchBaseIE):
_PAGE_LIMIT = 100
- def _extract_playlist(self, channel_id):
- info = self._call_api(
- 'kraken/channels/%s' % channel_id,
- channel_id, 'Downloading channel info JSON')
- channel_name = info.get('display_name') or info.get('name')
+ def _download_gql(self, video_id, op, variables, sha256_hash, note, fatal=True):
+ return self._download_json(
+ 'https://gql.twitch.tv/gql', video_id, note,
+ data=json.dumps({
+ 'operationName': op,
+ 'variables': variables,
+ 'extensions': {
+ 'persistedQuery': {
+ 'version': 1,
+ 'sha256Hash': sha256_hash,
+ }
+ }
+ }).encode(),
+ headers={
+ 'Content-Type': 'text/plain;charset=UTF-8',
+ 'Client-ID': self._CLIENT_ID,
+ }, fatal=fatal)
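The request body that `_download_gql` sends can be inspected without any network access. This is a sketch that only reproduces the JSON layout shown above; `build_persisted_query` is a hypothetical helper name, and the operation name, variables, and hash are copied from `TwitchCollectionIE` in this diff.

```python
import json


def build_persisted_query(op, variables, sha256_hash):
    """Serialize a persisted-query request body as UTF-8 bytes."""
    payload = {
        'operationName': op,
        'variables': variables,
        'extensions': {
            'persistedQuery': {
                'version': 1,
                'sha256Hash': sha256_hash,
            },
        },
    }
    return json.dumps(payload).encode('utf-8')


body = build_persisted_query(
    'CollectionSideBar',
    {'collectionID': 'wlDCoH0zEBZZbQ'},
    '27111f1b382effad0b6def325caef1909c733fe6a4fbabf54f8d491ef2cf2f14')

# Round-trip the bytes to confirm the structure survives serialization.
decoded = json.loads(body.decode('utf-8'))
```

Note that the body carries only the operation name and a hash of the query, not the query text itself, so a changed hash on the server side would presumably require updating `_SHA256_HASH`.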
+
+
+class TwitchCollectionIE(TwitchGraphQLBaseIE):
+ _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/collections/(?P<id>[^/]+)'
+
+ _TESTS = [{
+ 'url': 'https://www.twitch.tv/collections/wlDCoH0zEBZZbQ',
+ 'info_dict': {
+ 'id': 'wlDCoH0zEBZZbQ',
+ 'title': 'Overthrow Nook, capitalism for children',
+ },
+ 'playlist_mincount': 13,
+ }]
+
+ _OPERATION_NAME = 'CollectionSideBar'
+ _SHA256_HASH = '27111f1b382effad0b6def325caef1909c733fe6a4fbabf54f8d491ef2cf2f14'
+
+ def _real_extract(self, url):
+ collection_id = self._match_id(url)
+ collection = self._download_gql(
+ collection_id, self._OPERATION_NAME,
+ {'collectionID': collection_id}, self._SHA256_HASH,
+ 'Downloading collection GraphQL')['data']['collection']
+ title = collection.get('title')
entries = []
+ for edge in collection['items']['edges']:
+ if not isinstance(edge, dict):
+ continue
+ node = edge.get('node')
+ if not isinstance(node, dict):
+ continue
+ video = _make_video_result(node)
+ if video:
+ entries.append(video)
+ return self.playlist_result(
+ entries, playlist_id=collection_id, playlist_title=title)
+
+
+class TwitchPlaylistBaseIE(TwitchGraphQLBaseIE):
+ def _entries(self, channel_name, *args):
+ cursor = None
+ variables_common = self._make_variables(channel_name, *args)
+ entries_key = '%ss' % self._ENTRY_KIND
+ for page_num in itertools.count(1):
+ variables = variables_common.copy()
+ variables['limit'] = self._PAGE_LIMIT
+ if cursor:
+ variables['cursor'] = cursor
+ page = self._download_gql(
+ channel_name, self._OPERATION_NAME, variables,
+ self._SHA256_HASH,
+ 'Downloading %ss GraphQL page %s' % (self._NODE_KIND, page_num),
+ fatal=False)
+ if not page:
+ break
+ edges = try_get(
+ page, lambda x: x['data']['user'][entries_key]['edges'], list)
+ if not edges:
+ break
+ for edge in edges:
+ if not isinstance(edge, dict):
+ continue
+ if edge.get('__typename') != self._EDGE_KIND:
+ continue
+ node = edge.get('node')
+ if not isinstance(node, dict):
+ continue
+ if node.get('__typename') != self._NODE_KIND:
+ continue
+ entry = self._extract_entry(node)
+ if entry:
+ cursor = edge.get('cursor')
+ yield entry
+ if not cursor or not isinstance(cursor, compat_str):
+ break
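The cursor-based paging in `_entries` can be sketched against an in-memory source. Everything below is an assumption made for illustration: `fetch_page` and `ITEMS` stand in for the remote API, and unlike `_entries` this sketch records the cursor of every edge rather than only of edges that produced an entry.

```python
# Hypothetical in-memory data standing in for the remote API.
ITEMS = ['v%d' % i for i in range(1, 8)]


def fetch_page(cursor, limit):
    """Return up to `limit` edges after `cursor`; each edge has its own cursor."""
    start = int(cursor) if cursor else 0
    return [
        {'cursor': str(i + 1), 'node': {'id': ITEMS[i]}}
        for i in range(start, min(start + limit, len(ITEMS)))
    ]


def iter_entries(limit=3):
    """Yield ids page by page, resuming each request from the last cursor seen."""
    cursor = None
    while True:
        edges = fetch_page(cursor, limit)
        if not edges:
            break  # an empty page ends the listing
        for edge in edges:
            cursor = edge.get('cursor')
            yield edge['node']['id']
        if not cursor:
            break  # without a cursor there is no way to request the next page


entries = list(iter_entries())
```

Because the function is a generator, a caller that stops iterating early never triggers the remaining page requests.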
+
+ # Deprecated kraken v5 API
+ def _entries_kraken(self, channel_name, broadcast_type, sort):
+ access_token = self._download_access_token(channel_name)
+ channel_id = self._extract_channel_id(access_token['token'], channel_name)
offset = 0
- limit = self._PAGE_LIMIT
- broken_paging_detected = False
counter_override = None
for counter in itertools.count(1):
response = self._call_api(
- self._PLAYLIST_PATH % (channel_id, offset, limit),
+ 'kraken/channels/%s/videos/' % channel_id,
channel_id,
- 'Downloading %s JSON page %s'
- % (self._PLAYLIST_TYPE, counter_override or counter))
- page_entries = self._extract_playlist_page(response)
- if not page_entries:
+ 'Downloading video JSON page %s' % (counter_override or counter),
+ query={
+ 'offset': offset,
+ 'limit': self._PAGE_LIMIT,
+ 'broadcast_type': broadcast_type,
+ 'sort': sort,
+ })
+ videos = response.get('videos')
+ if not isinstance(videos, list):
break
+ for video in videos:
+ if not isinstance(video, dict):
+ continue
+ video_url = url_or_none(video.get('url'))
+ if not video_url:
+ continue
+ yield {
+ '_type': 'url_transparent',
+ 'ie_key': TwitchVodIE.ie_key(),
+ 'id': video.get('_id'),
+ 'url': video_url,
+ 'title': video.get('title'),
+ 'description': video.get('description'),
+ 'timestamp': unified_timestamp(video.get('published_at')),
+ 'duration': float_or_none(video.get('length')),
+ 'view_count': int_or_none(video.get('views')),
+ 'language': video.get('language'),
+ }
+ offset += self._PAGE_LIMIT
total = int_or_none(response.get('_total'))
- # Since the beginning of March 2016 twitch's paging mechanism
- # is completely broken on the twitch side. It simply ignores
- # a limit and returns the whole offset number of videos.
- # Working around by just requesting all videos at once.
- # Upd: pagination bug was fixed by twitch on 15.03.2016.
- if not broken_paging_detected and total and len(page_entries) > limit:
- self.report_warning(
- 'Twitch pagination is broken on twitch side, requesting all videos at once',
- channel_id)
- broken_paging_detected = True
- offset = total
- counter_override = '(all at once)'
- continue
- entries.extend(page_entries)
- if broken_paging_detected or total and len(page_entries) >= total:
+ if total and offset >= total:
break
- offset += limit
- return self.playlist_result(
- [self._make_url_result(entry) for entry in orderedSet(entries)],
- channel_id, channel_name)
-
- def _make_url_result(self, url):
- try:
- video_id = 'v%s' % TwitchVodIE._match_id(url)
- return self.url_result(url, TwitchVodIE.ie_key(), video_id=video_id)
- except AssertionError:
- return self.url_result(url)
-
- def _extract_playlist_page(self, response):
- videos = response.get('videos')
- return [video['url'] for video in videos] if videos else []
- def _real_extract(self, url):
- return self._extract_playlist(self._match_id(url))
-
-class TwitchProfileIE(TwitchPlaylistBaseIE):
- IE_NAME = 'twitch:profile'
- _VALID_URL = r'%s/(?P<id>[^/]+)/profile/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
- _PLAYLIST_TYPE = 'profile'
+class TwitchVideosIE(TwitchPlaylistBaseIE):
+ _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:videos|profile)'
_TESTS = [{
- 'url': 'http://www.twitch.tv/vanillatv/profile',
+ # All Videos sorted by Date
+ 'url': 'https://www.twitch.tv/spamfish/videos?filter=all',
'info_dict': {
- 'id': 'vanillatv',
- 'title': 'VanillaTV',
+ 'id': 'spamfish',
+ 'title': 'spamfish - All Videos sorted by Date',
},
- 'playlist_mincount': 412,
+ 'playlist_mincount': 924,
}, {
- 'url': 'http://m.twitch.tv/vanillatv/profile',
- 'only_matching': True,
- }]
-
-
-class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
- _VALID_URL_VIDEOS_BASE = r'%s/(?P<id>[^/]+)/videos' % TwitchBaseIE._VALID_URL_BASE
- _PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcast_type='
-
-
-class TwitchAllVideosIE(TwitchVideosBaseIE):
- IE_NAME = 'twitch:videos:all'
- _VALID_URL = r'%s/all' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
- _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive,upload,highlight'
- _PLAYLIST_TYPE = 'all videos'
-
- _TESTS = [{
- 'url': 'https://www.twitch.tv/spamfish/videos/all',
+ # All Videos sorted by Popular
+ 'url': 'https://www.twitch.tv/spamfish/videos?filter=all&sort=views',
'info_dict': {
'id': 'spamfish',
- 'title': 'Spamfish',
+ 'title': 'spamfish - All Videos sorted by Popular',
},
- 'playlist_mincount': 869,
+ 'playlist_mincount': 931,
}, {
- 'url': 'https://m.twitch.tv/spamfish/videos/all',
- 'only_matching': True,
- }]
-
-
-class TwitchUploadsIE(TwitchVideosBaseIE):
- IE_NAME = 'twitch:videos:uploads'
- _VALID_URL = r'%s/uploads' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
- _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'upload'
- _PLAYLIST_TYPE = 'uploads'
-
- _TESTS = [{
- 'url': 'https://www.twitch.tv/spamfish/videos/uploads',
+ # Past Broadcasts sorted by Date
+ 'url': 'https://www.twitch.tv/spamfish/videos?filter=archives',
'info_dict': {
'id': 'spamfish',
- 'title': 'Spamfish',
+ 'title': 'spamfish - Past Broadcasts sorted by Date',
},
- 'playlist_mincount': 0,
+ 'playlist_mincount': 27,
}, {
- 'url': 'https://m.twitch.tv/spamfish/videos/uploads',
+ # Highlights sorted by Date
+ 'url': 'https://www.twitch.tv/spamfish/videos?filter=highlights',
+ 'info_dict': {
+ 'id': 'spamfish',
+ 'title': 'spamfish - Highlights sorted by Date',
+ },
+ 'playlist_mincount': 901,
+ }, {
+ # Uploads sorted by Date
+ 'url': 'https://www.twitch.tv/esl_csgo/videos?filter=uploads&sort=time',
+ 'info_dict': {
+ 'id': 'esl_csgo',
+ 'title': 'esl_csgo - Uploads sorted by Date',
+ },
+ 'playlist_mincount': 5,
+ }, {
+ # Past Premieres sorted by Date
+ 'url': 'https://www.twitch.tv/spamfish/videos?filter=past_premieres',
+ 'info_dict': {
+ 'id': 'spamfish',
+ 'title': 'spamfish - Past Premieres sorted by Date',
+ },
+ 'playlist_mincount': 1,
+ }, {
+ 'url': 'https://www.twitch.tv/spamfish/videos/all',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://m.twitch.tv/spamfish/videos/all',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.twitch.tv/spamfish/videos',
'only_matching': True,
}]
+ Broadcast = collections.namedtuple('Broadcast', ['type', 'label'])
+
+ _DEFAULT_BROADCAST = Broadcast(None, 'All Videos')
+ _BROADCASTS = {
+ 'archives': Broadcast('ARCHIVE', 'Past Broadcasts'),
+ 'highlights': Broadcast('HIGHLIGHT', 'Highlights'),
+ 'uploads': Broadcast('UPLOAD', 'Uploads'),
+ 'past_premieres': Broadcast('PAST_PREMIERE', 'Past Premieres'),
+ 'all': _DEFAULT_BROADCAST,
+ }
+
+ _DEFAULT_SORTED_BY = 'Date'
+ _SORTED_BY = {
+ 'time': _DEFAULT_SORTED_BY,
+ 'views': 'Popular',
+ }
+
+ _SHA256_HASH = 'a937f1d22e269e39a03b509f65a7490f9fc247d7f83d6ac1421523e3b68042cb'
+ _OPERATION_NAME = 'FilterableVideoTower_Videos'
+ _ENTRY_KIND = 'video'
+ _EDGE_KIND = 'VideoEdge'
+ _NODE_KIND = 'Video'
+
+ @classmethod
+ def suitable(cls, url):
+ return (False
+ if any(ie.suitable(url) for ie in (
+ TwitchVideosClipsIE,
+ TwitchVideosCollectionsIE))
+ else super(TwitchVideosIE, cls).suitable(url))
+
+ @staticmethod
+ def _make_variables(channel_name, broadcast_type, sort):
+ return {
+ 'channelOwnerLogin': channel_name,
+ 'broadcastType': broadcast_type,
+ 'videoSort': sort.upper(),
+ }
+
+ @staticmethod
+ def _extract_entry(node):
+ return _make_video_result(node)
+
+ def _real_extract(self, url):
+ channel_name = self._match_id(url)
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+ filter = qs.get('filter', ['all'])[0]
+ sort = qs.get('sort', ['time'])[0]
+ broadcast = self._BROADCASTS.get(filter, self._DEFAULT_BROADCAST)
+ return self.playlist_result(
+ self._entries(channel_name, broadcast.type, sort),
+ playlist_id=channel_name,
+ playlist_title='%s - %s sorted by %s'
+ % (channel_name, broadcast.label,
+ self._SORTED_BY.get(sort, self._DEFAULT_SORTED_BY)))
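The mapping from the `filter` and `sort` query parameters to a playlist title can be checked standalone. This sketch copies the tables from `TwitchVideosIE` but uses Python 3's `urllib.parse` directly instead of the `compat_urlparse` wrapper; `playlist_title` is a hypothetical helper name.

```python
import collections
from urllib.parse import parse_qs, urlparse

Broadcast = collections.namedtuple('Broadcast', ['type', 'label'])

DEFAULT_BROADCAST = Broadcast(None, 'All Videos')
BROADCASTS = {
    'archives': Broadcast('ARCHIVE', 'Past Broadcasts'),
    'highlights': Broadcast('HIGHLIGHT', 'Highlights'),
    'uploads': Broadcast('UPLOAD', 'Uploads'),
    'past_premieres': Broadcast('PAST_PREMIERE', 'Past Premieres'),
    'all': DEFAULT_BROADCAST,
}
DEFAULT_SORTED_BY = 'Date'
SORTED_BY = {'time': DEFAULT_SORTED_BY, 'views': 'Popular'}


def playlist_title(url, channel_name):
    """Build the playlist title from the URL's filter and sort parameters."""
    qs = parse_qs(urlparse(url).query)
    # parse_qs returns lists, so fall back to a one-element default list.
    video_filter = qs.get('filter', ['all'])[0]
    sort = qs.get('sort', ['time'])[0]
    # Unknown filter or sort values fall back to the defaults.
    broadcast = BROADCASTS.get(video_filter, DEFAULT_BROADCAST)
    return '%s - %s sorted by %s' % (
        channel_name, broadcast.label,
        SORTED_BY.get(sort, DEFAULT_SORTED_BY))
```

The titles produced this way match the `info_dict` titles listed in `_TESTS` above, for example `spamfish - All Videos sorted by Popular` for `?filter=all&sort=views`.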
-class TwitchPastBroadcastsIE(TwitchVideosBaseIE):
- IE_NAME = 'twitch:videos:past-broadcasts'
- _VALID_URL = r'%s/past-broadcasts' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
- _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive'
- _PLAYLIST_TYPE = 'past broadcasts'
+
+class TwitchVideosClipsIE(TwitchPlaylistBaseIE):
+ _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:clips|videos/*?\?.*?\bfilter=clips)'
_TESTS = [{
- 'url': 'https://www.twitch.tv/spamfish/videos/past-broadcasts',
+ # Clips
+ 'url': 'https://www.twitch.tv/vanillatv/clips?filter=clips&range=all',
'info_dict': {
- 'id': 'spamfish',
- 'title': 'Spamfish',
+ 'id': 'vanillatv',
+ 'title': 'vanillatv - Clips Top All',
},
- 'playlist_mincount': 0,
+ 'playlist_mincount': 1,
}, {
- 'url': 'https://m.twitch.tv/spamfish/videos/past-broadcasts',
+ 'url': 'https://www.twitch.tv/dota2ruhub/videos?filter=clips&range=7d',
'only_matching': True,
}]
+ Clip = collections.namedtuple('Clip', ['filter', 'label'])
-class TwitchHighlightsIE(TwitchVideosBaseIE):
- IE_NAME = 'twitch:videos:highlights'
- _VALID_URL = r'%s/highlights' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
- _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'highlight'
- _PLAYLIST_TYPE = 'highlights'
+ _DEFAULT_CLIP = Clip('LAST_WEEK', 'Top 7D')
+ _RANGE = {
+ '24hr': Clip('LAST_DAY', 'Top 24H'),
+ '7d': _DEFAULT_CLIP,
+ '30d': Clip('LAST_MONTH', 'Top 30D'),
+ 'all': Clip('ALL_TIME', 'Top All'),
+ }
+
+ # NB: values other than 20 result in skipped videos
+ _PAGE_LIMIT = 20
+
+ _SHA256_HASH = 'b73ad2bfaecfd30a9e6c28fada15bd97032c83ec77a0440766a56fe0bd632777'
+ _OPERATION_NAME = 'ClipsCards__User'
+ _ENTRY_KIND = 'clip'
+ _EDGE_KIND = 'ClipEdge'
+ _NODE_KIND = 'Clip'
+
+ @staticmethod
+ def _make_variables(channel_name, filter):
+ return {
+ 'login': channel_name,
+ 'criteria': {
+ 'filter': filter,
+ },
+ }
+
+ @staticmethod
+ def _extract_entry(node):
+ assert isinstance(node, dict)
+ clip_url = url_or_none(node.get('url'))
+ if not clip_url:
+ return
+ return {
+ '_type': 'url_transparent',
+ 'ie_key': TwitchClipsIE.ie_key(),
+ 'id': node.get('id'),
+ 'url': clip_url,
+ 'title': node.get('title'),
+ 'thumbnail': node.get('thumbnailURL'),
+ 'duration': float_or_none(node.get('durationSeconds')),
+ 'timestamp': unified_timestamp(node.get('createdAt')),
+ 'view_count': int_or_none(node.get('viewCount')),
+ 'language': node.get('language'),
+ }
+
+ def _real_extract(self, url):
+ channel_name = self._match_id(url)
+ qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+ range = qs.get('range', ['7d'])[0]
+ clip = self._RANGE.get(range, self._DEFAULT_CLIP)
+ return self.playlist_result(
+ self._entries(channel_name, clip.filter),
+ playlist_id=channel_name,
+ playlist_title='%s - Clips %s' % (channel_name, clip.label))
+
+
+class TwitchVideosCollectionsIE(TwitchPlaylistBaseIE):
+ _VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/videos/*?\?.*?\bfilter=collections'
_TESTS = [{
- 'url': 'https://www.twitch.tv/spamfish/videos/highlights',
+ # Collections
+ 'url': 'https://www.twitch.tv/spamfish/videos?filter=collections',
'info_dict': {
'id': 'spamfish',
- 'title': 'Spamfish',
+ 'title': 'spamfish - Collections',
},
- 'playlist_mincount': 805,
- }, {
- 'url': 'https://m.twitch.tv/spamfish/videos/highlights',
- 'only_matching': True,
+ 'playlist_mincount': 3,
}]
+ _SHA256_HASH = '07e3691a1bad77a36aba590c351180439a40baefc1c275356f40fc7082419a84'
+ _OPERATION_NAME = 'ChannelCollectionsContent'
+ _ENTRY_KIND = 'collection'
+ _EDGE_KIND = 'CollectionsItemEdge'
+ _NODE_KIND = 'Collection'
+
+ @staticmethod
+ def _make_variables(channel_name):
+ return {
+ 'ownerLogin': channel_name,
+ }
+
+ @staticmethod
+ def _extract_entry(node):
+ assert isinstance(node, dict)
+ collection_id = node.get('id')
+ if not collection_id:
+ return
+ return {
+ '_type': 'url_transparent',
+ 'ie_key': TwitchCollectionIE.ie_key(),
+ 'id': collection_id,
+ 'url': 'https://www.twitch.tv/collections/%s' % collection_id,
+ 'title': node.get('title'),
+ 'thumbnail': node.get('thumbnailURL'),
+ 'duration': float_or_none(node.get('lengthSeconds')),
+ 'timestamp': unified_timestamp(node.get('updatedAt')),
+ 'view_count': int_or_none(node.get('viewCount')),
+ }
+
+ def _real_extract(self, url):
+ channel_name = self._match_id(url)
+ return self.playlist_result(
+ self._entries(channel_name), playlist_id=channel_name,
+ playlist_title='%s - Collections' % channel_name)
+
class TwitchStreamIE(TwitchBaseIE):
IE_NAME = 'twitch:stream'
def suitable(cls, url):
return (False
if any(ie.suitable(url) for ie in (
- TwitchVideoIE,
- TwitchChapterIE,
TwitchVodIE,
- TwitchProfileIE,
- TwitchAllVideosIE,
- TwitchUploadsIE,
- TwitchPastBroadcastsIE,
- TwitchHighlightsIE,
+ TwitchCollectionIE,
+ TwitchVideosIE,
+ TwitchVideosClipsIE,
+ TwitchVideosCollectionsIE,
TwitchClipsIE))
else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url):
- channel_id = self._match_id(url)
+ channel_name = self._match_id(url)
+
+ access_token = self._download_access_token(channel_name)
+
+ token = access_token['token']
+ channel_id = self._extract_channel_id(token, channel_name)
stream = self._call_api(
- 'kraken/streams/%s?stream_type=all' % channel_id, channel_id,
- 'Downloading stream JSON').get('stream')
+ 'kraken/streams/%s?stream_type=all' % channel_id,
+ channel_id, 'Downloading stream JSON').get('stream')
if not stream:
raise ExtractorError('%s is offline' % channel_id, expected=True)
# (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing
# an invalid m3u8 URL. Working around by use of original channel name from stream
# JSON and fallback to lowercase if it's not available.
- channel_id = stream.get('channel', {}).get('name') or channel_id.lower()
-
- access_token = self._call_api(
- 'api/channels/%s/access_token' % channel_id, channel_id,
- 'Downloading channel access token')
+ channel_name = try_get(
+ stream, lambda x: x['channel']['name'],
+ compat_str) or channel_name.lower()
query = {
'allow_source': 'true',
'playlist_include_framerate': 'true',
'segment_preference': '4',
'sig': access_token['sig'].encode('utf-8'),
- 'token': access_token['token'].encode('utf-8'),
+ 'token': token.encode('utf-8'),
}
formats = self._extract_m3u8_formats(
'%s/api/channel/hls/%s.m3u8?%s'
- % (self._USHER_BASE, channel_id, compat_urllib_parse_urlencode(query)),
+ % (self._USHER_BASE, channel_name, compat_urllib_parse_urlencode(query)),
channel_id, 'mp4')
self._prefer_source(formats)
})
return {
- 'id': compat_str(stream['_id']),
- 'display_id': channel_id,
+ 'id': str_or_none(stream.get('_id')) or channel_id,
+ 'display_id': channel_name,
'title': title,
'description': description,
'thumbnails': thumbnails,
class TwitchClipsIE(TwitchBaseIE):
IE_NAME = 'twitch:clips'
- _VALID_URL = r'https?://(?:clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|(?:www\.)?twitch\.tv/[^/]+/clip/)(?P<id>[^/?#&]+)'
+ _VALID_URL = r'''(?x)
+ https?://
+ (?:
+ clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|
+ (?:(?:www|go|m)\.)?twitch\.tv/[^/]+/clip/
+ )
+ (?P<id>[^/?#&]+)
+ '''
_TESTS = [{
'url': 'https://clips.twitch.tv/FaintLightGullWholeWheat',
}, {
'url': 'https://clips.twitch.tv/embed?clip=InquisitiveBreakableYogurtJebaited',
'only_matching': True,
+ }, {
+ 'url': 'https://m.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://go.twitch.tv/rossbroadcast/clip/ConfidentBraveHumanChefFrank',
+ 'only_matching': True,
}]
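The rewritten verbose `_VALID_URL` can be exercised against the test URLs listed above. This is a sketch using plain `re.match`; `match_id` is a hypothetical stand-in for the extractor's own matching helper.

```python
import re

VALID_URL = r'''(?x)
    https?://
    (?:
        clips\.twitch\.tv/(?:embed\?.*?\bclip=|(?:[^/]+/)*)|
        (?:(?:www|go|m)\.)?twitch\.tv/[^/]+/clip/
    )
    (?P<id>[^/?#&]+)
'''


def match_id(url):
    """Return the clip id captured by VALID_URL, or None if the URL does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None
```

In verbose mode whitespace outside character classes is ignored, and the `#` inside `[^/?#&]` is not treated as a comment start because it sits inside a character class.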
def _real_extract(self, url):
'info_dict': {
'id': '700207533655363584',
'ext': 'mp4',
- 'title': 'Simon Vertugo - BEAT PROD: @suhmeduh #Damndaniel',
+ 'title': 'simon vetugo - BEAT PROD: @suhmeduh #Damndaniel',
'description': 'BEAT PROD: @suhmeduh https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ',
'thumbnail': r're:^https?://.*\.jpg',
- 'uploader': 'Simon Vertugo',
+ 'uploader': 'simon vetugo',
'uploader_id': 'simonvertugo',
'duration': 30.0,
'timestamp': 1455777459,
# Twitch Clip Embed
'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
'only_matching': True,
+ }, {
+ # promo_video_website card
+ 'url': 'https://twitter.com/GunB1g/status/1163218564784017422',
+ 'only_matching': True,
}]
def _real_extract(self, url):
return try_get(o, lambda x: x[x['type'].lower() + '_value'])
card_name = card['name'].split(':')[-1]
- if card_name == 'amplify':
- formats = self._extract_formats_from_vmap_url(
- get_binding_value('amplify_url_vmap'),
- get_binding_value('amplify_content_id') or twid)
+ if card_name in ('amplify', 'promo_video_website'):
+ is_amplify = card_name == 'amplify'
+ vmap_url = get_binding_value('amplify_url_vmap') if is_amplify else get_binding_value('player_stream_url')
+ content_id = get_binding_value('%s_content_id' % (card_name if is_amplify else 'player'))
+ formats = self._extract_formats_from_vmap_url(vmap_url, content_id or twid)
self._sort_formats(formats)
thumbnails = []
IE_NAME = 'twitter:broadcast'
_VALID_URL = TwitterBaseIE._BASE_REGEX + r'i/broadcasts/(?P<id>[0-9a-zA-Z]{13})'
+ _TEST = {
+ # untitled Periscope video
+ 'url': 'https://twitter.com/i/broadcasts/1yNGaQLWpejGj',
+ 'info_dict': {
+ 'id': '1yNGaQLWpejGj',
+ 'ext': 'mp4',
+ 'title': 'Andrea May Sahouri - Periscope Broadcast',
+ 'uploader': 'Andrea May Sahouri',
+ 'uploader_id': '1PXEdBZWpGwKe',
+ },
+ }
+
def _real_extract(self, url):
broadcast_id = self._match_id(url)
broadcast = self._call_api(
raise ExtractorError(
'Udemy asks you to solve a CAPTCHA. Login with browser, '
'solve CAPTCHA, then export cookies and pass cookie file to '
- 'youtube-dl with --cookies.', expected=True)
+ 'youtube-dlc with --cookies.', expected=True)
return ret
def _download_json(self, url_or_request, *args, **kwargs):
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..compat import (
+ compat_str,
+ compat_urllib_parse_urlencode,
+)
from ..utils import (
clean_html,
int_or_none,
parse_duration,
+ parse_iso8601,
+ qualities,
update_url_query,
- str_or_none,
)
_VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
_TESTS = [{
'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
- 'md5': '25291da27dc45e0afb5718a8603d3816',
+ 'md5': '4f1e26683979715ff64e4e29099cf020',
'info_dict': {
'id': '15951931',
'ext': 'mp4',
'title': 'Miss simpatia é encontrada morta',
'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
+ 'timestamp': 1470421860,
+ 'upload_date': '20160805',
}
}, {
'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
- 'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
+ 'md5': '2850a0e8dfa0a7307e04a96c5bdc5bc2',
'info_dict': {
'id': '15954259',
'ext': 'mp4',
'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
+ 'timestamp': 1470674520,
+ 'upload_date': '20160808',
}
}, {
'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
'only_matching': True,
}]
- _FORMATS = {
- '2': {
- 'width': 640,
- 'height': 360,
- },
- '5': {
- 'width': 1280,
- 'height': 720,
- },
- '6': {
- 'width': 426,
- 'height': 240,
- },
- '7': {
- 'width': 1920,
- 'height': 1080,
- },
- '8': {
- 'width': 192,
- 'height': 144,
- },
- '9': {
- 'width': 568,
- 'height': 320,
- },
- '11': {
- 'width': 640,
- 'height': 360,
- }
- }
-
def _real_extract(self, url):
video_id = self._match_id(url)
- media_id = None
-
- if video_id.isdigit():
- media_id = video_id
-
- if not media_id:
- embed_page = self._download_webpage(
- 'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
- video_id, 'Downloading embed page', fatal=False)
- if embed_page:
- media_id = self._search_regex(
- (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
- embed_page, 'media id', default=None)
-
- if not media_id:
- webpage = self._download_webpage(url, video_id)
- media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
video_data = self._download_json(
- 'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
- media_id)['item']
+ # https://api.mais.uol.com.br/apiuol/v4/player/data/[MEDIA_ID]
+ 'https://api.mais.uol.com.br/apiuol/v3/media/detail/' + video_id,
+ video_id)['item']
+ media_id = compat_str(video_data['mediaId'])
title = video_data['title']
+ ver = video_data.get('revision', 2)
- query = {
- 'ver': video_data.get('numRevision', 2),
- 'r': 'http://mais.uol.com.br',
- }
- for k in ('token', 'sign'):
- v = video_data.get(k)
- if v:
- query[k] = v
-
+ uol_formats = self._download_json(
+ 'https://croupier.mais.uol.com.br/v3/formats/%s/jsonp' % media_id,
+ media_id)
+ quality = qualities(['mobile', 'WEBM', '360p', '720p', '1080p'])
formats = []
- for f in video_data.get('formats', []):
+ for format_id, f in uol_formats.items():
+ if not isinstance(f, dict):
+ continue
f_url = f.get('url') or f.get('secureUrl')
if not f_url:
continue
+ query = {
+ 'ver': ver,
+ 'r': 'http://mais.uol.com.br',
+ }
+ for k in ('token', 'sign'):
+ v = f.get(k)
+ if v:
+ query[k] = v
f_url = update_url_query(f_url, query)
- format_id = str_or_none(f.get('id'))
- if format_id == '10':
- formats.extend(self._extract_m3u8_formats(
- f_url, video_id, 'mp4', 'm3u8_native',
- m3u8_id='hls', fatal=False))
+ if format_id == 'HLS':
+ m3u8_formats = self._extract_m3u8_formats(
+ f_url, media_id, 'mp4', 'm3u8_native',
+ m3u8_id='hls', fatal=False)
+ encoded_query = compat_urllib_parse_urlencode(query)
+ for m3u8_f in m3u8_formats:
+ m3u8_f['extra_param_to_segment_url'] = encoded_query
+ m3u8_f['url'] = update_url_query(m3u8_f['url'], query)
+ formats.extend(m3u8_formats)
continue
- fmt = {
+ formats.append({
'format_id': format_id,
'url': f_url,
- 'source_preference': 1,
- }
- fmt.update(self._FORMATS.get(format_id, {}))
- formats.append(fmt)
- self._sort_formats(formats, ('height', 'width', 'source_preference', 'tbr', 'ext'))
+ 'quality': quality(format_id),
+ 'preference': -1,
+ })
+ self._sort_formats(formats)
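The per-format query handling above appends `ver`, `r`, and any `token`/`sign` values to each format URL. The sketch below shows the general idea with Python 3's `urllib.parse`; the local `update_url_query` is an illustrative re-implementation, not the project's utility, and the format dictionary is made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_url_query(url, query):
    """Merge `query` into the URL's existing query string."""
    if not query:
        return url
    parsed = urlparse(url)
    qs = parse_qs(parsed.query)
    qs.update(query)
    # doseq=True expands the list values that parse_qs produced.
    return urlunparse(parsed._replace(query=urlencode(qs, True)))


# Hypothetical format entry; real responses may carry other fields.
f = {'url': 'https://example.com/video.mp4?a=1', 'token': 'abc', 'sign': None}

query = {'ver': 2, 'r': 'http://mais.uol.com.br'}
for k in ('token', 'sign'):
    v = f.get(k)
    if v:  # only forward credentials that are actually present
        query[k] = v

signed_url = update_url_query(f['url'], query)
params = parse_qs(urlparse(signed_url).query)
```

Building `query` inside the loop over formats, as the diff does, lets each format carry its own `token` and `sign` instead of sharing one pair for the whole video.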
tags = []
for tag in video_data.get('tags', []):
continue
tags.append(tag_description)
+ thumbnails = []
+ for q in ('Small', 'Medium', 'Wmedium', 'Large', 'Wlarge', 'Xlarge'):
+ q_url = video_data.get('thumb' + q)
+ if not q_url:
+ continue
+ thumbnails.append({
+ 'id': q,
+ 'url': q_url,
+ })
+
return {
'id': media_id,
'title': title,
- 'description': clean_html(video_data.get('desMedia')),
- 'thumbnail': video_data.get('thumbnail'),
- 'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
+ 'description': clean_html(video_data.get('description')),
+ 'thumbnails': thumbnails,
+ 'duration': parse_duration(video_data.get('duration')),
'tags': tags,
'formats': formats,
+ 'timestamp': parse_iso8601(video_data.get('publishDate'), ' '),
+ 'view_count': int_or_none(video_data.get('viewsQtty')),
}
# coding: utf-8
from __future__ import unicode_literals
-import re
-import time
+import functools
import hashlib
import json
import random
+import re
+import time
from .adobepass import AdobePassIE
-from .youtube import YoutubeIE
from .common import InfoExtractor
+from .youtube import YoutubeIE
from ..compat import (
compat_HTTPError,
compat_str,
)
from ..utils import (
+ clean_html,
ExtractorError,
int_or_none,
+ OnDemandPagedList,
parse_age_limit,
str_or_none,
try_get,
)
-class ViceIE(AdobePassIE):
+class ViceBaseIE(InfoExtractor):
+ def _call_api(self, resource, resource_key, resource_id, locale, fields, args=''):
+ return self._download_json(
+ 'https://video.vice.com/api/v1/graphql', resource_id, query={
+ 'query': '''{
+ %s(locale: "%s", %s: "%s"%s) {
+ %s
+ }
+}''' % (resource, locale, resource_key, resource_id, args, fields),
+ })['data'][resource]
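The query text that `_call_api` assembles can be previewed without sending a request. This sketch only repeats the `%` formatting shown above; `build_query` is a hypothetical helper, the indentation inside the template is chosen for readability, and the field list is shortened for illustration.

```python
def build_query(resource, locale, resource_key, resource_id, fields, args=''):
    """Format a GraphQL query string for one resource lookup."""
    return '''{
  %s(locale: "%s", %s: "%s"%s) {
    %s
  }
}''' % (resource, locale, resource_key, resource_id, args, fields)


query = build_query(
    'videos', 'en_us', 'id', '58c69e38a55424f1227dc3f7',
    'locked\n    title')
```

Values are interpolated without escaping, so this approach is only safe for inputs such as ids already constrained by `_VALID_URL`.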
+
+
+class ViceIE(ViceBaseIE, AdobePassIE):
IE_NAME = 'vice'
- _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?viceland)\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]+)'
+ _VALID_URL = r'https?://(?:(?:video|vms)\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/(?:video/[^/]+|embed)/(?P<id>[\da-f]{24})'
_TESTS = [{
'url': 'https://video.vice.com/en_us/video/pet-cremator/58c69e38a55424f1227dc3f7',
'info_dict': {
- 'id': '5e647f0125e145c9aef2069412c0cbde',
+ 'id': '58c69e38a55424f1227dc3f7',
'ext': 'mp4',
'title': '10 Questions You Always Wanted To Ask: Pet Cremator',
'description': 'md5:fe856caacf61fe0e74fab15ce2b07ca5',
# m3u8 download
'skip_download': True,
},
- 'add_ie': ['UplynkPreplay'],
}, {
# geo restricted to US
'url': 'https://video.vice.com/en_us/video/the-signal-from-tolva/5816510690b70e6c5fd39a56',
'info_dict': {
- 'id': '930c0ad1f47141cc955087eecaddb0e2',
+ 'id': '5816510690b70e6c5fd39a56',
'ext': 'mp4',
- 'uploader': 'waypoint',
+ 'uploader': 'vice',
'title': 'The Signal From Tölva',
'description': 'md5:3927e3c79f9e8094606a2b3c5b5e55d5',
- 'uploader_id': '57f7d621e05ca860fa9ccaf9',
+ 'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1477941983,
'upload_date': '20161031',
},
# m3u8 download
'skip_download': True,
},
- 'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://video.vice.com/alps/video/ulfs-wien-beruchtigste-grafitti-crew-part-1/581b12b60a0e1f4c0fb6ea2f',
'info_dict': {
'id': '581b12b60a0e1f4c0fb6ea2f',
'ext': 'mp4',
'title': 'ULFs - Wien berüchtigste Grafitti Crew - Part 1',
- 'description': '<p>Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.</p>',
- 'uploader': 'VICE',
+ 'description': 'Zwischen Hinterzimmer-Tattoos und U-Bahnschächten erzählen uns die Ulfs, wie es ist, "süchtig nach Sachbeschädigung" zu sein.',
+ 'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1485368119,
'upload_date': '20170125',
'params': {
# AES-encrypted m3u8
'skip_download': True,
- 'proxy': '127.0.0.1:8118',
},
- 'add_ie': ['UplynkPreplay'],
}, {
'url': 'https://video.vice.com/en_us/video/pizza-show-trailer/56d8c9a54d286ed92f7f30e4',
'only_matching': True,
@staticmethod
def _extract_urls(webpage):
return re.findall(
- r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]+)',
+ r'<iframe\b[^>]+\bsrc=["\']((?:https?:)?//video\.vice\.com/[^/]+/embed/[\da-f]{24})',
webpage)
@staticmethod
def _real_extract(self, url):
locale, video_id = re.match(self._VALID_URL, url).groups()
- webpage = self._download_webpage(
- 'https://video.vice.com/%s/embed/%s' % (locale, video_id),
- video_id)
-
- video = self._parse_json(
- self._search_regex(
- r'PREFETCH_DATA\s*=\s*({.+?})\s*;\s*\n', webpage,
- 'app state'), video_id)['video']
- video_id = video.get('vms_id') or video.get('id') or video_id
- title = video['title']
- is_locked = video.get('locked')
+ video = self._call_api('videos', 'id', video_id, locale, '''body
+ locked
+ rating
+ thumbnail_url
+ title''')[0]
+ title = video['title'].strip()
rating = video.get('rating')
- thumbnail = video.get('thumbnail_url')
- duration = int_or_none(video.get('duration'))
- series = try_get(
- video, lambda x: x['episode']['season']['show']['title'],
- compat_str)
- episode_number = try_get(
- video, lambda x: x['episode']['episode_number'])
- season_number = try_get(
- video, lambda x: x['episode']['season']['season_number'])
- uploader = None
query = {}
- if is_locked:
+ if video.get('locked'):
resource = self._get_mvpd_resource(
'VICELAND', title, video_id, rating)
query['tvetoken'] = self._extract_mvpd_auth(
query.update({
'exp': exp,
'sign': hashlib.sha512(('%s:GET:%d' % (video_id, exp)).encode()).hexdigest(),
- '_ad_blocked': None,
- '_ad_unit': '',
- '_debug': '',
+ 'skipadstitching': 1,
'platform': 'desktop',
'rn': random.randint(10000, 100000),
- 'fbprebidtoken': '',
})
try:
raise
video_data = preplay['video']
- base = video_data['base']
- uplynk_preplay_url = preplay['preplayURL']
- episode = video_data.get('episode', {})
- channel = video_data.get('channel', {})
+ formats = self._extract_m3u8_formats(
+ preplay['playURL'], video_id, 'mp4', 'm3u8_native')
+ self._sort_formats(formats)
+ episode = video_data.get('episode') or {}
+ channel = video_data.get('channel') or {}
+ season = video_data.get('season') or {}
subtitles = {}
- cc_url = preplay.get('ccURL')
- if cc_url:
- subtitles['en'] = [{
+ for subtitle in preplay.get('subtitleURLs', []):
+ cc_url = subtitle.get('url')
+ if not cc_url:
+ continue
+ language_code = try_get(subtitle, lambda x: x['languages'][0]['language_code'], compat_str) or 'en'
+ subtitles.setdefault(language_code, []).append({
'url': cc_url,
- }]
+ })
return {
- '_type': 'url_transparent',
- 'url': uplynk_preplay_url,
+ 'formats': formats,
'id': video_id,
'title': title,
- 'description': base.get('body') or base.get('display_body'),
- 'thumbnail': thumbnail,
- 'duration': int_or_none(video_data.get('video_duration')) or duration,
+ 'description': clean_html(video.get('body')),
+ 'thumbnail': video.get('thumbnail_url'),
+ 'duration': int_or_none(video_data.get('video_duration')),
'timestamp': int_or_none(video_data.get('created_at'), 1000),
- 'age_limit': parse_age_limit(video_data.get('video_rating')),
- 'series': video_data.get('show_title') or series,
- 'episode_number': int_or_none(episode.get('episode_number') or episode_number),
+ 'age_limit': parse_age_limit(video_data.get('video_rating') or rating),
+ 'series': try_get(video_data, lambda x: x['show']['base']['display_title'], compat_str),
+ 'episode_number': int_or_none(episode.get('episode_number')),
'episode_id': str_or_none(episode.get('id') or video_data.get('episode_id')),
- 'season_number': int_or_none(season_number),
- 'season_id': str_or_none(episode.get('season_id')),
- 'uploader': channel.get('base', {}).get('title') or channel.get('name') or uploader,
+ 'season_number': int_or_none(season.get('season_number')),
+ 'season_id': str_or_none(season.get('id') or video_data.get('season_id')),
+ 'uploader': channel.get('name'),
'uploader_id': str_or_none(channel.get('id')),
'subtitles': subtitles,
- 'ie_key': 'UplynkPreplay',
}
-class ViceShowIE(InfoExtractor):
+class ViceShowIE(ViceBaseIE):
IE_NAME = 'vice:show'
- _VALID_URL = r'https?://(?:.+?\.)?vice\.com/(?:[^/]+/)?show/(?P<id>[^/?#&]+)'
-
- _TEST = {
- 'url': 'https://munchies.vice.com/en/show/fuck-thats-delicious-2',
+ _VALID_URL = r'https?://(?:video\.vice|(?:www\.)?vice(?:land|tv))\.com/(?P<locale>[^/]+)/show/(?P<id>[^/?#&]+)'
+ _PAGE_SIZE = 25
+ _TESTS = [{
+ 'url': 'https://video.vice.com/en_us/show/fck-thats-delicious',
'info_dict': {
- 'id': 'fuck-thats-delicious-2',
- 'title': "Fuck, That's Delicious",
- 'description': 'Follow the culinary adventures of rapper Action Bronson during his ongoing world tour.',
+ 'id': '57a2040c8cb727dec794c901',
+ 'title': 'F*ck, That’s Delicious',
+ 'description': 'The life and eating habits of rap’s greatest bon vivant, Action Bronson.',
},
- 'playlist_count': 17,
- }
+ 'playlist_mincount': 64,
+ }, {
+ 'url': 'https://www.vicetv.com/en_us/show/fck-thats-delicious',
+ 'only_matching': True,
+ }]
- def _real_extract(self, url):
- show_id = self._match_id(url)
- webpage = self._download_webpage(url, show_id)
+ def _fetch_page(self, locale, show_id, page):
+ videos = self._call_api('videos', 'show_id', show_id, locale, '''body
+ id
+ url''', ', page: %d, per_page: %d' % (page + 1, self._PAGE_SIZE))
+ for video in videos:
+ yield self.url_result(
+ video['url'], ViceIE.ie_key(), video.get('id'))
- entries = [
- self.url_result(video_url, ViceIE.ie_key())
- for video_url, _ in re.findall(
- r'<h2[^>]+class="article-title"[^>]+data-id="\d+"[^>]*>\s*<a[^>]+href="(%s.*?)"'
- % ViceIE._VALID_URL, webpage)]
+ def _real_extract(self, url):
+ locale, display_id = re.match(self._VALID_URL, url).groups()
+ show = self._call_api('shows', 'slug', display_id, locale, '''dek
+ id
+ title''')[0]
+ show_id = show['id']
- title = self._search_regex(
- r'<title>(.+?)</title>', webpage, 'title', default=None)
- if title:
- title = re.sub(r'(.+)\s*\|\s*.+$', r'\1', title).strip()
- description = self._html_search_meta(
- 'description', webpage, 'description')
+ entries = OnDemandPagedList(
+ functools.partial(self._fetch_page, locale, show_id),
+ self._PAGE_SIZE)
- return self.playlist_result(entries, show_id, title, description)
+ return self.playlist_result(
+ entries, show_id, show.get('title'), show.get('dek'))
-class ViceArticleIE(InfoExtractor):
+class ViceArticleIE(ViceBaseIE):
IE_NAME = 'vice:article'
- _VALID_URL = r'https://www\.vice\.com/[^/]+/article/(?P<id>[^?#]+)'
+ _VALID_URL = r'https://(?:www\.)?vice\.com/(?P<locale>[^/]+)/article/(?:[0-9a-z]{6}/)?(?P<id>[^?#]+)'
_TESTS = [{
'url': 'https://www.vice.com/en_us/article/on-set-with-the-woman-making-mormon-porn-in-utah',
'info_dict': {
- 'id': '41eae2a47b174a1398357cec55f1f6fc',
+ 'id': '58dc0a3dee202d2a0ccfcbd8',
'ext': 'mp4',
- 'title': 'Mormon War on Porn ',
- 'description': 'md5:6394a8398506581d0346b9ab89093fef',
+ 'title': 'Mormon War on Porn',
+ 'description': 'md5:1c5d91fe25fa8aa304f9def118b92dbf',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1491883129,
# AES-encrypted m3u8
'skip_download': True,
},
- 'add_ie': ['UplynkPreplay'],
+ 'add_ie': [ViceIE.ie_key()],
}, {
'url': 'https://www.vice.com/en_us/article/how-to-hack-a-car',
- 'md5': '7fe8ebc4fa3323efafc127b82bd821d9',
+ 'md5': '13010ee0bc694ea87ec40724397c2349',
'info_dict': {
'id': '3jstaBeXgAs',
'ext': 'mp4',
'uploader_id': 'MotherboardTV',
'upload_date': '20140529',
},
- 'add_ie': ['Youtube'],
+ 'add_ie': [YoutubeIE.ie_key()],
}, {
'url': 'https://www.vice.com/en_us/article/znm9dx/karley-sciortino-slutever-reloaded',
'md5': 'a7ecf64ee4fa19b916c16f4b56184ae2',
'info_dict': {
- 'id': 'e2ed435eb67e43efb66e6ef9a6930a88',
+ 'id': '57f41d3556a0a80f54726060',
'ext': 'mp4',
'title': "Making The World's First Male Sex Doll",
- 'description': 'md5:916078ef0e032d76343116208b6cc2c4',
+ 'description': 'md5:19b00b215b99961cf869c40fbe9df755',
'uploader': 'vice',
'uploader_id': '57a204088cb727dec794c67b',
'timestamp': 1476919911,
},
'params': {
'skip_download': True,
+ 'format': 'bestvideo',
},
'add_ie': [ViceIE.ie_key()],
}, {
}]
def _real_extract(self, url):
- display_id = self._match_id(url)
-
- webpage = self._download_webpage(url, display_id)
+ locale, display_id = re.match(self._VALID_URL, url).groups()
- prefetch_data = self._parse_json(self._search_regex(
- r'__APP_STATE\s*=\s*({.+?})(?:\s*\|\|\s*{}\s*)?;\s*\n',
- webpage, 'app state'), display_id)['pageData']
- body = prefetch_data['body']
+ article = self._call_api('articles', 'slug', display_id, locale, '''body
+ embed_code''')[0]
+ body = article['body']
def _url_res(video_url, ie_key):
return {
'ie_key': ie_key,
}
- vice_url = ViceIE._extract_url(webpage)
+ vice_url = ViceIE._extract_url(body)
if vice_url:
return _url_res(vice_url, ViceIE.ie_key())
video_url = self._html_search_regex(
r'data-video-url="([^"]+)"',
- prefetch_data['embed_code'], 'video URL')
+ article['embed_code'], 'video URL')
return _url_res(video_url, ViceIE.ie_key())
from __future__ import unicode_literals
import re
+import random
+import string
+import struct
from .common import InfoExtractor
from ..utils import (
+ ExtractorError,
int_or_none,
mimetype2ext,
parse_codecs,
xpath_element,
xpath_text,
)
+from ..compat import (
+ compat_b64decode,
+ compat_ord,
+ compat_parse_qs,
+)
class VideaIE(InfoExtractor):
r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1',
webpage)]
+ def rc4(self, ciphertext, key):
+ res = b''
+
+ keyLen = len(key)
+ S = list(range(256))
+
+ j = 0
+ for i in range(256):
+ j = (j + S[i] + ord(key[i % keyLen])) % 256
+ S[i], S[j] = S[j], S[i]
+
+ i = 0
+ j = 0
+ for m in range(len(ciphertext)):
+ i = (i + 1) % 256
+ j = (j + S[i]) % 256
+ S[i], S[j] = S[j], S[i]
+ k = S[(S[i] + S[j]) % 256]
+ res += struct.pack("B", k ^ compat_ord(ciphertext[m]))
+
+ return res
+
def _real_extract(self, url):
video_id = self._match_id(url)
+ webpage = self._download_webpage(url, video_id, fatal=True)
+ error = self._search_regex(r'<p class="error-text">([^<]+)</p>', webpage, 'error', default=None)
+ if error:
+ raise ExtractorError(error, expected=True)
+
+ video_src_params_raw = self._search_regex(r'<iframe[^>]+id="videa_player_iframe"[^>]+src="/player\?([^"]+)"', webpage, 'video_src_params')
+ video_src_params = compat_parse_qs(video_src_params_raw)
+ player_page = self._download_webpage("https://videa.hu/videojs_player?%s" % video_src_params_raw, video_id, fatal=True)
+ nonce = self._search_regex(r'_xt\s*=\s*"([^"]+)"', player_page, 'nonce')
+ random_seed = ''.join(random.choice(string.ascii_uppercase + string.ascii_lowercase + string.digits) for _ in range(8))
+ static_secret = 'xHb0ZvME5q8CBcoQi6AngerDu3FGO9fkUlwPmLVY_RTzj2hJIS4NasXWKy1td7p'
+ l = nonce[:32]
+ s = nonce[32:]
+ result = ''
+ for i in range(0, 32):
+ result += s[i - (static_secret.index(l[i]) - 31)]
- info = self._download_xml(
+ video_src_params['_s'] = random_seed
+ video_src_params['_t'] = result[:16]
+ encryption_key_stem = result[16:] + random_seed
+
+ [b64_info, handle] = self._download_webpage_handle(
'http://videa.hu/videaplayer_get_xml.php', video_id,
- query={'v': video_id})
+ query=video_src_params, fatal=True)
+
+ encrypted_info = compat_b64decode(b64_info)
+ key = encryption_key_stem + handle.info()['x-videa-xs']
+ info_str = self.rc4(encrypted_info, key).decode('utf8')
+ info = self._parse_xml(info_str, video_id)
video = xpath_element(info, './/video', 'video', fatal=True)
sources = xpath_element(info, './/video_sources', 'sources', fatal=True)
+ hash_values = xpath_element(info, './/hash_values', 'hash_values', fatal=True)
title = xpath_text(video, './title', fatal=True)
source_url = source.text
if not source_url:
continue
+ source_url += '?md5=%s&expires=%s' % (hash_values.find('hash_value_%s' % source.get('name')).text, source.get('exp'))
f = parse_codecs(source.get('codecs'))
f.update({
'url': source_url,
'info_dict': {
'id': 'cghql9yq6emu',
'ext': 'mp4',
- 'title': 'youtube-dl test video 1\\\\2\'3/4<5\\\\6ä7↭',
+ 'title': 'youtube-dlc test video 1\\\\2\'3/4<5\\\\6ä7↭',
},
'params': {
# m3u8 download
--- /dev/null
+from __future__ import unicode_literals
+
+import json
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+ parse_age_limit,
+)
+
+
+class ViewLiftBaseIE(InfoExtractor):
+ _API_BASE = 'https://prod-api.viewlift.com/'
+ _DOMAINS_REGEX = r'(?:(?:main\.)?snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|(?:monumental|lax)sportsnetwork|vayafilm|failarmy|ftfnext|lnppass\.legapallacanestro|moviespree|app\.myoutdoortv|neoufitness|pflmma|theidentitytb)\.com|(?:hoichoi|app\.horseandcountry|kronon|marquee|supercrosslive)\.tv'
+ _SITE_MAP = {
+ 'ftfnext': 'lax',
+ 'funnyforfree': 'snagfilms',
+ 'hoichoi': 'hoichoitv',
+ 'kiddovid': 'snagfilms',
+ 'laxsportsnetwork': 'lax',
+ 'legapallacanestro': 'lnp',
+ 'marquee': 'marquee-tv',
+ 'monumentalsportsnetwork': 'monumental-network',
+ 'moviespree': 'bingeflix',
+ 'pflmma': 'pfl',
+ 'snagxtreme': 'snagfilms',
+ 'theidentitytb': 'tampabay',
+ 'vayafilm': 'snagfilms',
+ }
+ _TOKENS = {}
+
+ def _call_api(self, site, path, video_id, query):
+ token = self._TOKENS.get(site)
+ if not token:
+ token_query = {'site': site}
+ email, password = self._get_login_info(netrc_machine=site)
+ if email:
+ resp = self._download_json(
+ self._API_BASE + 'identity/signin', video_id,
+ 'Logging in', query=token_query, data=json.dumps({
+ 'email': email,
+ 'password': password,
+ }).encode())
+ else:
+ resp = self._download_json(
+ self._API_BASE + 'identity/anonymous-token', video_id,
+ 'Downloading authorization token', query=token_query)
+ self._TOKENS[site] = token = resp['authorizationToken']
+ return self._download_json(
+ self._API_BASE + path, video_id,
+ headers={'Authorization': token}, query=query)
+
+
+class ViewLiftEmbedIE(ViewLiftBaseIE):
+ IE_NAME = 'viewlift:embed'
+ _VALID_URL = r'https?://(?:(?:www|embed)\.)?(?P<domain>%s)/embed/player\?.*\bfilmId=(?P<id>[\da-f]{8}-(?:[\da-f]{4}-){3}[\da-f]{12})' % ViewLiftBaseIE._DOMAINS_REGEX
+ _TESTS = [{
+ 'url': 'http://embed.snagfilms.com/embed/player?filmId=74849a00-85a9-11e1-9660-123139220831&w=500',
+ 'md5': '2924e9215c6eff7a55ed35b72276bd93',
+ 'info_dict': {
+ 'id': '74849a00-85a9-11e1-9660-123139220831',
+ 'ext': 'mp4',
+ 'title': '#whilewewatch',
+ 'description': 'md5:b542bef32a6f657dadd0df06e26fb0c8',
+ 'timestamp': 1334350096,
+ 'upload_date': '20120413',
+ }
+ }, {
+ # invalid labels, 360p is better than 480p
+ 'url': 'http://www.snagfilms.com/embed/player?filmId=17ca0950-a74a-11e0-a92a-0026bb61d036',
+ 'md5': '882fca19b9eb27ef865efeeaed376a48',
+ 'info_dict': {
+ 'id': '17ca0950-a74a-11e0-a92a-0026bb61d036',
+ 'ext': 'mp4',
+ 'title': 'Life in Limbo',
+ },
+ 'skip': 'The video does not exist',
+ }, {
+ 'url': 'http://www.snagfilms.com/embed/player?filmId=0000014c-de2f-d5d6-abcf-ffef58af0017',
+ 'only_matching': True,
+ }]
+
+ @staticmethod
+ def _extract_url(webpage):
+ mobj = re.search(
+ r'<iframe[^>]+?src=(["\'])(?P<url>(?:https?:)?//(?:embed\.)?(?:%s)/embed/player.+?)\1' % ViewLiftBaseIE._DOMAINS_REGEX,
+ webpage)
+ if mobj:
+ return mobj.group('url')
+
+ def _real_extract(self, url):
+ domain, film_id = re.match(self._VALID_URL, url).groups()
+ site = domain.split('.')[-2]
+ if site in self._SITE_MAP:
+ site = self._SITE_MAP[site]
+ try:
+ content_data = self._call_api(
+ site, 'entitlement/video/status', film_id, {
+ 'id': film_id
+ })['video']
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 403:
+ error_message = self._parse_json(e.cause.read().decode(), film_id).get('errorMessage')
+ if error_message == 'User does not have a valid subscription or has not purchased this content.':
+ self.raise_login_required()
+ raise ExtractorError(error_message, expected=True)
+ raise
+ gist = content_data['gist']
+ title = gist['title']
+ video_assets = content_data['streamingInfo']['videoAssets']
+
+ formats = []
+ mpeg_video_assets = video_assets.get('mpeg') or []
+ for video_asset in mpeg_video_assets:
+ video_asset_url = video_asset.get('url')
+ if not video_asset_url:
+ continue
+ bitrate = int_or_none(video_asset.get('bitrate'))
+ height = int_or_none(self._search_regex(
+ r'^_?(\d+)[pP]$', video_asset.get('renditionValue'),
+ 'height', default=None))
+ formats.append({
+ 'url': video_asset_url,
+ 'format_id': 'http%s' % ('-%d' % bitrate if bitrate else ''),
+ 'tbr': bitrate,
+ 'height': height,
+ 'vcodec': video_asset.get('codec'),
+ })
+
+ hls_url = video_assets.get('hls')
+ if hls_url:
+ formats.extend(self._extract_m3u8_formats(
+ hls_url, film_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+ self._sort_formats(formats, ('height', 'tbr', 'format_id'))
+
+ info = {
+ 'id': film_id,
+ 'title': title,
+ 'description': gist.get('description'),
+ 'thumbnail': gist.get('videoImageUrl'),
+ 'duration': int_or_none(gist.get('runtime')),
+ 'age_limit': parse_age_limit(content_data.get('parentalRating')),
+ 'timestamp': int_or_none(gist.get('publishDate'), 1000),
+ 'formats': formats,
+ }
+ for k in ('categories', 'tags'):
+ info[k] = [v['title'] for v in content_data.get(k, []) if v.get('title')]
+ return info
+
+
+class ViewLiftIE(ViewLiftBaseIE):
+ IE_NAME = 'viewlift'
+ _VALID_URL = r'https?://(?:www\.)?(?P<domain>%s)(?P<path>(?:/(?:films/title|show|(?:news/)?videos?|watch))?/(?P<id>[^?#]+))' % ViewLiftBaseIE._DOMAINS_REGEX
+ _TESTS = [{
+ 'url': 'http://www.snagfilms.com/films/title/lost_for_life',
+ 'md5': '19844f897b35af219773fd63bdec2942',
+ 'info_dict': {
+ 'id': '0000014c-de2f-d5d6-abcf-ffef58af0017',
+ 'display_id': 'lost_for_life',
+ 'ext': 'mp4',
+ 'title': 'Lost for Life',
+ 'description': 'md5:ea10b5a50405ae1f7b5269a6ec594102',
+ 'thumbnail': r're:^https?://.*\.jpg',
+ 'duration': 4489,
+ 'categories': 'mincount:3',
+ 'age_limit': 14,
+ 'upload_date': '20150421',
+ 'timestamp': 1429656820,
+ }
+ }, {
+ 'url': 'http://www.snagfilms.com/show/the_world_cut_project/india',
+ 'md5': 'e6292e5b837642bbda82d7f8bf3fbdfd',
+ 'info_dict': {
+ 'id': '00000145-d75c-d96e-a9c7-ff5c67b20000',
+ 'display_id': 'the_world_cut_project/india',
+ 'ext': 'mp4',
+ 'title': 'India',
+ 'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
+ 'thumbnail': r're:^https?://.*\.jpg',
+ 'duration': 979,
+ 'timestamp': 1399478279,
+ 'upload_date': '20140507',
+ }
+ }, {
+ 'url': 'http://main.snagfilms.com/augie_alone/s_2_ep_12_love',
+ 'info_dict': {
+ 'id': '00000148-7b53-de26-a9fb-fbf306f70020',
+ 'display_id': 'augie_alone/s_2_ep_12_love',
+ 'ext': 'mp4',
+ 'title': 'S. 2 Ep. 12 - Love',
+ 'description': 'Augie finds love.',
+ 'thumbnail': r're:^https?://.*\.jpg',
+ 'duration': 107,
+ 'upload_date': '20141012',
+ 'timestamp': 1413129540,
+ 'age_limit': 17,
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }, {
+ 'url': 'http://main.snagfilms.com/films/title/the_freebie',
+ 'only_matching': True,
+ }, {
+ # Film is not playable in your area.
+ 'url': 'http://www.snagfilms.com/films/title/inside_mecca',
+ 'only_matching': True,
+ }, {
+ # Film is not available.
+ 'url': 'http://www.snagfilms.com/show/augie_alone/flirting',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.winnersview.com/videos/the-good-son',
+ 'only_matching': True,
+ }, {
+ # Was once Kaltura embed
+ 'url': 'https://www.monumentalsportsnetwork.com/videos/john-carlson-postgame-2-25-15',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://www.marquee.tv/watch/sadlerswells-sacredmonsters',
+ 'only_matching': True,
+ }]
+
+ @classmethod
+ def suitable(cls, url):
+ return False if ViewLiftEmbedIE.suitable(url) else super(ViewLiftIE, cls).suitable(url)
+
+ def _real_extract(self, url):
+ domain, path, display_id = re.match(self._VALID_URL, url).groups()
+ site = domain.split('.')[-2]
+ if site in self._SITE_MAP:
+ site = self._SITE_MAP[site]
+ modules = self._call_api(
+ site, 'content/pages', display_id, {
+ 'includeContent': 'true',
+ 'moduleOffset': 1,
+ 'path': path,
+ 'site': site,
+ })['modules']
+ film_id = next(m['contentData'][0]['gist']['id'] for m in modules if m.get('moduleType') == 'VideoDetailModule')
+ return {
+ '_type': 'url_transparent',
+ 'url': 'http://%s/embed/player?filmId=%s' % (domain, film_id),
+ 'id': film_id,
+ 'display_id': display_id,
+ 'ie_key': 'ViewLiftEmbed',
+ }
from ..utils import (
ExtractorError,
int_or_none,
+ HEADRequest,
parse_age_limit,
parse_iso8601,
sanitized_Request,
def _call_api(self, path, video_id, note, timestamp=None, post_data=None):
resp = self._download_json(
- self._prepare_call(path, timestamp, post_data), video_id, note)
+ self._prepare_call(path, timestamp, post_data), video_id, note, headers={'x-viki-app-ver': '2.2.5.1428709186'}, expected_status=[200, 400, 404])
error = resp.get('error')
if error:
if error == 'invalid timestamp':
resp = self._download_json(
self._prepare_call(path, int(resp['current_timestamp']), post_data),
- video_id, '%s (retry)' % note)
+ video_id, '%s (retry)' % note, headers={'x-viki-app-ver': '2.2.5.1428709186'}, expected_status=[200, 400, 404])
error = resp.get('error')
if error:
self._raise_error(resp['error'])
video = self._call_api(
'videos/%s.json' % video_id, video_id, 'Downloading video JSON')
+ streams = self._call_api(
+ 'videos/%s/streams.json' % video_id, video_id,
+ 'Downloading video streams JSON')
+
+ formats = []
+ for format_id, stream_dict in streams.items():
+ height = int_or_none(self._search_regex(
+ r'^(\d+)[pP]$', format_id, 'height', default=None))
+ for protocol, format_dict in stream_dict.items():
+ # rtmps URLs do not seem to work
+ if protocol == 'rtmps':
+ continue
+ format_url = format_dict.get('url')
+ format_drms = format_dict.get('drms')
+ format_stream_id = format_dict.get('id')
+ if format_id == 'm3u8':
+ m3u8_formats = self._extract_m3u8_formats(
+ format_url, video_id, 'mp4',
+ entry_protocol='m3u8_native',
+ m3u8_id='m3u8-%s' % protocol, fatal=False)
+ # Despite CODECS metadata in m3u8 all video-only formats
+ # are actually video+audio
+ for f in m3u8_formats:
+ if f.get('acodec') == 'none' and f.get('vcodec') != 'none':
+ f['acodec'] = None
+ formats.extend(m3u8_formats)
+ elif format_id == 'mpd':
+ mpd_formats = self._extract_mpd_formats(
+ format_url, video_id,
+ mpd_id='mpd-%s' % protocol, fatal=False)
+ formats.extend(mpd_formats)
+ elif format_url.startswith('rtmp'):
+ mobj = re.search(
+ r'^(?P<url>rtmp://[^/]+/(?P<app>.+?))/(?P<playpath>mp4:.+)$',
+ format_url)
+ if not mobj:
+ continue
+ formats.append({
+ 'format_id': 'rtmp-%s' % format_id,
+ 'ext': 'flv',
+ 'url': mobj.group('url'),
+ 'play_path': mobj.group('playpath'),
+ 'app': mobj.group('app'),
+ 'page_url': url,
+ 'drms': format_drms,
+ 'stream_id': format_stream_id,
+ })
+ else:
+ urlh = self._request_webpage(
+ HEADRequest(format_url), video_id, 'Checking file size', fatal=False)
+ formats.append({
+ 'url': format_url,
+ 'format_id': '%s-%s' % (format_id, protocol),
+ 'height': height,
+ 'drms': format_drms,
+ 'stream_id': format_stream_id,
+ 'filesize': int_or_none(urlh.headers.get('Content-Length')) if urlh else None,
+ })
+ self._sort_formats(formats)
+
self._check_errors(video)
title = self.dict_selection(video.get('titles', {}), 'en', allow_fallback=False)
'url': thumbnail.get('url'),
})
+ stream_ids = []
+ for f in formats:
+ s_id = f.get('stream_id')
+ if s_id is not None:
+ stream_ids.append(s_id)
+
subtitles = {}
for subtitle_lang, _ in video.get('subtitle_completions', {}).items():
subtitles[subtitle_lang] = [{
'ext': subtitles_format,
'url': self._prepare_call(
- 'videos/%s/subtitles/%s.%s' % (video_id, subtitle_lang, subtitles_format)),
+ 'videos/%s/subtitles/%s.%s?stream_id=%s' % (video_id, subtitle_lang, subtitles_format, stream_ids[0] if stream_ids else '')),
} for subtitles_format in ('srt', 'vtt')]
result = {
'subtitles': subtitles,
}
- streams = self._call_api(
- 'videos/%s/streams.json' % video_id, video_id,
- 'Downloading video streams JSON')
-
if 'external' in streams:
result.update({
'_type': 'url_transparent',
})
return result
- formats = []
- for format_id, stream_dict in streams.items():
- height = int_or_none(self._search_regex(
- r'^(\d+)[pP]$', format_id, 'height', default=None))
- for protocol, format_dict in stream_dict.items():
- # rtmps URLs does not seem to work
- if protocol == 'rtmps':
- continue
- format_url = format_dict['url']
- if format_id == 'm3u8':
- m3u8_formats = self._extract_m3u8_formats(
- format_url, video_id, 'mp4',
- entry_protocol='m3u8_native',
- m3u8_id='m3u8-%s' % protocol, fatal=False)
- # Despite CODECS metadata in m3u8 all video-only formats
- # are actually video+audio
- for f in m3u8_formats:
- if f.get('acodec') == 'none' and f.get('vcodec') != 'none':
- f['acodec'] = None
- formats.extend(m3u8_formats)
- elif format_url.startswith('rtmp'):
- mobj = re.search(
- r'^(?P<url>rtmp://[^/]+/(?P<app>.+?))/(?P<playpath>mp4:.+)$',
- format_url)
- if not mobj:
- continue
- formats.append({
- 'format_id': 'rtmp-%s' % format_id,
- 'ext': 'flv',
- 'url': mobj.group('url'),
- 'play_path': mobj.group('playpath'),
- 'app': mobj.group('app'),
- 'page_url': url,
- })
- else:
- formats.append({
- 'url': format_url,
- 'format_id': '%s-%s' % (format_id, protocol),
- 'height': height,
- })
- self._sort_formats(formats)
-
result['formats'] = formats
return result
unified_timestamp,
unsmuggle_url,
urlencode_postdata,
+ urljoin,
unescapeHTML,
)
})
# TODO: fix handling of 308 status code returned for live archive manifest requests
+ sep_pattern = r'/sep/video/'
for files_type in ('hls', 'dash'):
for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
manifest_url = cdn_data.get('url')
if not manifest_url:
continue
format_id = '%s-%s' % (files_type, cdn_name)
- if files_type == 'hls':
- formats.extend(self._extract_m3u8_formats(
- manifest_url, video_id, 'mp4',
- 'm3u8' if is_live else 'm3u8_native', m3u8_id=format_id,
- note='Downloading %s m3u8 information' % cdn_name,
- fatal=False))
- elif files_type == 'dash':
- mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
- mpd_manifest_urls = []
- if re.search(mpd_pattern, manifest_url):
- for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
- mpd_manifest_urls.append((format_id + suffix, re.sub(
- mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url)))
- else:
- mpd_manifest_urls = [(format_id, manifest_url)]
- for f_id, m_url in mpd_manifest_urls:
+ sep_manifest_urls = []
+ if re.search(sep_pattern, manifest_url):
+ for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
+ sep_manifest_urls.append((format_id + suffix, re.sub(
+ sep_pattern, '/%s/' % repl, manifest_url)))
+ else:
+ sep_manifest_urls = [(format_id, manifest_url)]
+ for f_id, m_url in sep_manifest_urls:
+ if files_type == 'hls':
+ formats.extend(self._extract_m3u8_formats(
+ m_url, video_id, 'mp4',
+ 'm3u8' if is_live else 'm3u8_native', m3u8_id=f_id,
+ note='Downloading %s m3u8 information' % cdn_name,
+ fatal=False))
+ elif files_type == 'dash':
if 'json=1' in m_url:
real_m_url = (self._download_json(m_url, video_id, fatal=False) or {}).get('url')
if real_m_url:
m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
'Downloading %s MPD information' % cdn_name,
fatal=False)
- for f in mpd_formats:
- if f.get('vcodec') == 'none':
- f['preference'] = -50
- elif f.get('acodec') == 'none':
- f['preference'] = -40
formats.extend(mpd_formats)
live_archive = live_event.get('archive') or {}
'preference': 1,
})
+ for f in formats:
+ if f.get('vcodec') == 'none':
+ f['preference'] = -50
+ elif f.get('acodec') == 'none':
+ f['preference'] = -40
+
subtitles = {}
text_tracks = config['request'].get('text_tracks')
if text_tracks:
for tt in text_tracks:
subtitles[tt['lang']] = [{
'ext': 'vtt',
- 'url': 'https://vimeo.com' + tt['url'],
+ 'url': urljoin('https://vimeo.com', tt['url']),
}]
thumbnails = []
# Retrieve video webpage to extract further information
webpage, urlh = self._download_webpage_handle(
url, video_id, headers=headers)
- redirect_url = compat_str(urlh.geturl())
+ redirect_url = urlh.geturl()
except ExtractorError as ee:
if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 403:
errmsg = ee.cause.read()
if b'Because of its privacy settings, this video cannot be played here' in errmsg:
raise ExtractorError(
'Cannot download embed-only video without embedding '
- 'URL. Please call youtube-dl with the URL of the page '
+ 'URL. Please call youtube-dlc with the URL of the page '
'that embeds this video.',
expected=True)
raise
return self._TITLE or self._html_search_regex(
self._TITLE_RE, webpage, 'list title', fatal=False)
- def _login_list_password(self, page_url, list_id, webpage):
- login_form = self._search_regex(
- r'(?s)<form[^>]+?id="pw_form"(.*?)</form>',
- webpage, 'login form', default=None)
- if not login_form:
- return webpage
-
- password = self._downloader.params.get('videopassword')
- if password is None:
- raise ExtractorError('This album is protected by a password, use the --video-password option', expected=True)
- fields = self._hidden_inputs(login_form)
- token, vuid = self._extract_xsrft_and_vuid(webpage)
- fields['token'] = token
- fields['password'] = password
- post = urlencode_postdata(fields)
- password_path = self._search_regex(
- r'action="([^"]+)"', login_form, 'password URL')
- password_url = compat_urlparse.urljoin(page_url, password_path)
- password_request = sanitized_Request(password_url, post)
- password_request.add_header('Content-type', 'application/x-www-form-urlencoded')
- self._set_vimeo_cookie('vuid', vuid)
- self._set_vimeo_cookie('xsrft', token)
-
- return self._download_webpage(
- password_request, list_id,
- 'Verifying the password', 'Wrong password')
-
def _title_and_entries(self, list_id, base_url):
for pagenum in itertools.count(1):
page_url = self._page_url(base_url, pagenum)
'Downloading page %s' % pagenum)
if pagenum == 1:
- webpage = self._login_list_password(page_url, list_id, webpage)
yield self._extract_list_title(webpage)
# Try extracting href first since not all videos are available via
_BASE_URL_TEMPL = 'https://vimeo.com/%s'
-class VimeoAlbumIE(VimeoChannelIE):
+class VimeoAlbumIE(VimeoBaseInfoExtractor):
IE_NAME = 'vimeo:album'
_VALID_URL = r'https://vimeo\.com/(?:album|showcase)/(?P<id>\d+)(?:$|[?#]|/(?!video))'
_TITLE_RE = r'<header id="page_header">\n\s*<h1>(.*?)</h1>'
def _real_extract(self, url):
album_id = self._match_id(url)
webpage = self._download_webpage(url, album_id)
- webpage = self._login_list_password(url, album_id, webpage)
- api_config = self._extract_vimeo_config(webpage, album_id)['api']
+ viewer = self._parse_json(self._search_regex(
+ r'bootstrap_data\s*=\s*({.+?})</script>',
+ webpage, 'bootstrap data'), album_id)['viewer']
+ jwt = viewer['jwt']
+ album = self._download_json(
+ 'https://api.vimeo.com/albums/' + album_id,
+ album_id, headers={'Authorization': 'jwt ' + jwt},
+ query={'fields': 'description,name,privacy'})
+ hashed_pass = None
+ if try_get(album, lambda x: x['privacy']['view']) == 'password':
+ password = self._downloader.params.get('videopassword')
+ if not password:
+ raise ExtractorError(
+ 'This album is protected by a password, use the --video-password option',
+ expected=True)
+ self._set_vimeo_cookie('vuid', viewer['vuid'])
+ try:
+ hashed_pass = self._download_json(
+ 'https://vimeo.com/showcase/%s/auth' % album_id,
+ album_id, 'Verifying the password', data=urlencode_postdata({
+ 'password': password,
+ 'token': viewer['xsrft'],
+ }), headers={
+ 'X-Requested-With': 'XMLHttpRequest',
+ })['hashed_pass']
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code == 401:
+ raise ExtractorError('Wrong password', expected=True)
+ raise
entries = OnDemandPagedList(functools.partial(
- self._fetch_page, album_id, api_config['jwt'],
- api_config.get('hashed_pass')), self._PAGE_SIZE)
- return self.playlist_result(entries, album_id, self._html_search_regex(
- r'<title>\s*(.+?)(?:\s+on Vimeo)?</title>', webpage, 'title', fatal=False))
+ self._fetch_page, album_id, jwt, hashed_pass), self._PAGE_SIZE)
+ return self.playlist_result(
+ entries, album_id, album.get('name'), album.get('description'))
class VimeoGroupsIE(VimeoChannelIE):
import itertools
from .common import InfoExtractor
-from ..compat import (
- compat_urllib_parse_urlencode,
- compat_str,
-)
+from .naver import NaverBaseIE
+from ..compat import compat_str
from ..utils import (
- dict_get,
ExtractorError,
- float_or_none,
- int_or_none,
+ merge_dicts,
remove_start,
try_get,
urlencode_postdata,
)
-class VLiveIE(InfoExtractor):
+class VLiveIE(NaverBaseIE):
IE_NAME = 'vlive'
_VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
_NETRC_MACHINE = 'vlive'
'title': "[V LIVE] Girl's Day's Broadcast",
'creator': "Girl's Day",
'view_count': int,
+ 'uploader_id': 'muploader_a',
},
}, {
'url': 'http://www.vlive.tv/video/16937',
'creator': 'EXO',
'view_count': int,
'subtitles': 'mincount:12',
+ 'uploader_id': 'muploader_j',
},
'params': {
'skip_download': True,
'This video is only available for CH+ subscribers')
long_video_id, key = video_info['vid'], video_info['inkey']
- playinfo = self._download_json(
- 'http://global.apis.naver.com/rmcnmv/rmcnmv/vod_play_videoInfo.json?%s'
- % compat_urllib_parse_urlencode({
- 'videoId': long_video_id,
- 'key': key,
- 'ptc': 'http',
- 'doct': 'json', # document type (xml or json)
- 'cpt': 'vtt', # captions type (vtt or ttml)
- }), video_id)
-
- formats = [{
- 'url': vid['source'],
- 'format_id': vid.get('encodingOption', {}).get('name'),
- 'abr': float_or_none(vid.get('bitrate', {}).get('audio')),
- 'vbr': float_or_none(vid.get('bitrate', {}).get('video')),
- 'width': int_or_none(vid.get('encodingOption', {}).get('width')),
- 'height': int_or_none(vid.get('encodingOption', {}).get('height')),
- 'filesize': int_or_none(vid.get('size')),
- } for vid in playinfo.get('videos', {}).get('list', []) if vid.get('source')]
- self._sort_formats(formats)
-
- view_count = int_or_none(playinfo.get('meta', {}).get('count'))
-
- subtitles = {}
- for caption in playinfo.get('captions', {}).get('list', []):
- lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
- if lang and caption.get('source'):
- subtitles[lang] = [{
- 'ext': 'vtt',
- 'url': caption['source']}]
-
- info = self._get_common_fields(webpage)
- info.update({
- 'id': video_id,
- 'formats': formats,
- 'view_count': view_count,
- 'subtitles': subtitles,
- })
- return info
+ return merge_dicts(
+ self._get_common_fields(webpage),
+ self._extract_video_info(video_id, long_video_id, key))
def _download_init_page(self, video_id):
return self._download_webpage(
class VODPlatformIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?vod-platform\.net/[eE]mbed/(?P<id>[^/?#]+)'
- _TEST = {
+ _VALID_URL = r'https?://(?:(?:www\.)?vod-platform\.net|embed\.kwikmotion\.com)/[eE]mbed/(?P<id>[^/?#]+)'
+ _TESTS = [{
# from http://www.lbcgroup.tv/watch/chapter/29143/52844/%D8%A7%D9%84%D9%86%D8%B5%D8%B1%D8%A9-%D9%81%D9%8A-%D8%B6%D9%8A%D8%A7%D9%81%D8%A9-%D8%A7%D9%84%D9%80-cnn/ar
'url': 'http://vod-platform.net/embed/RufMcytHDolTH1MuKHY9Fw',
'md5': '1db2b7249ce383d6be96499006e951fc',
'ext': 'mp4',
'title': 'LBCi News_ النصرة في ضيافة الـ "سي.أن.أن"',
}
- }
+ }, {
+ 'url': 'http://embed.kwikmotion.com/embed/RufMcytHDolTH1MuKHY9Fw',
+ 'only_matching': True,
+ }]
def _real_extract(self, url):
video_id = self._match_id(url)
--- /dev/null
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+ ExtractorError,
+ determine_ext,
+ int_or_none,
+ urljoin,
+)
+
+
+class VoiceRepublicIE(InfoExtractor):
+ _VALID_URL = r'https?://voicerepublic\.com/(?:talks|embed)/(?P<id>[0-9a-z-]+)'
+ _TESTS = [{
+ 'url': 'http://voicerepublic.com/talks/watching-the-watchers-building-a-sousveillance-state',
+ 'md5': 'b9174d651323f17783000876347116e3',
+ 'info_dict': {
+ 'id': '2296',
+ 'display_id': 'watching-the-watchers-building-a-sousveillance-state',
+ 'ext': 'm4a',
+ 'title': 'Watching the Watchers: Building a Sousveillance State',
+ 'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
+ 'duration': 1556,
+ 'view_count': int,
+ }
+ }, {
+ 'url': 'http://voicerepublic.com/embed/watching-the-watchers-building-a-sousveillance-state',
+ 'only_matching': True,
+ }]
+
+ def _real_extract(self, url):
+ display_id = self._match_id(url)
+
+ webpage = self._download_webpage(url, display_id)
+
+ if '>Queued for processing, please stand by...<' in webpage:
+ raise ExtractorError(
+ 'Audio is still queued for processing', expected=True)
+
+ talk = self._parse_json(self._search_regex(
+ r'initialSnapshot\s*=\s*({.+?});',
+ webpage, 'talk'), display_id)['talk']
+ title = talk['title']
+ formats = [{
+ 'url': urljoin(url, talk_url),
+ 'format_id': format_id,
+ 'ext': determine_ext(talk_url) or format_id,
+ 'vcodec': 'none',
+ } for format_id, talk_url in talk['media_links'].items()]
+ self._sort_formats(formats)
+
+ return {
+ 'id': compat_str(talk.get('id') or display_id),
+ 'display_id': display_id,
+ 'title': title,
+ 'description': talk.get('teaser'),
+ 'thumbnail': talk.get('image_url'),
+ 'duration': int_or_none(talk.get('archived_duration')),
+ 'view_count': int_or_none(talk.get('play_count')),
+ 'formats': formats,
+ }
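
Reviewer note: the format list built from `talk['media_links']` in the new extractor can be tried in isolation. This is a minimal sketch, not the extractor itself. It assumes a hypothetical `media_links` mapping and uses a simplified stand-in (`guess_ext`) for youtube-dl's `determine_ext`, so the real helper may behave differently on edge cases.

```python
import os.path
from urllib.parse import urljoin, urlparse


def guess_ext(url, default=None):
    """Simplified, hypothetical stand-in for youtube-dl's determine_ext."""
    ext = os.path.splitext(urlparse(url).path)[1].lstrip('.')
    return ext if ext.isalnum() else default


def build_formats(page_url, media_links):
    # One audio-only format per (format_id, relative URL) pair,
    # falling back to the format_id when the URL has no extension.
    return [{
        'url': urljoin(page_url, talk_url),
        'format_id': format_id,
        'ext': guess_ext(talk_url) or format_id,
        'vcodec': 'none',
    } for format_id, talk_url in media_links.items()]


# Hypothetical sample data, not taken from the real site.
sample_links = {
    'm4a': '/vrmedia/2296-clean.m4a',
    'mp3': '/vrmedia/2296-clean',
}
formats = build_formats('http://voicerepublic.com/talks/example', sample_links)
```

The `or format_id` fallback matters when a media link carries no file extension, as in the second sample entry.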
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
attrs = extract_attributes(self._search_regex(
- r'(<[^>]+class="vrtvideo"[^>]*>)', webpage, 'vrt video'))
+ r'(<[^>]+class="vrtvideo( [^"]*)?"[^>]*>)', webpage, 'vrt video'))
- asset_id = attrs['data-videoid']
- publication_id = attrs.get('data-publicationid')
+ asset_id = attrs['data-video-id']
+ publication_id = attrs.get('data-publication-id')
if publication_id:
asset_id = publication_id + '$' + asset_id
- client = attrs.get('data-client') or self._CLIENT_MAP[site]
+ client = attrs.get('data-client-code') or self._CLIENT_MAP[site]
title = strip_or_none(get_element_by_class(
'vrt-title', webpage) or self._html_search_meta(
media_resource = metadata['mediaResource']
formats = []
+ subtitles = {}
# check if the metadata contains a direct URL to a file
for kind, media_resource in media_resource.items():
+ if kind == 'captionsHash':
+ for ext, url in media_resource.items():
+ subtitles.setdefault('de', []).append({
+ 'url': url,
+ 'ext': ext,
+ })
+ continue
+
if kind not in ('dflt', 'alt'):
continue
self._sort_formats(formats)
- subtitles = {}
- caption_url = media_resource.get('captionURL')
- if caption_url:
- subtitles['de'] = [{
- 'url': caption_url,
- 'ext': 'ttml',
- }]
-
title = tracker_data['trackerClipTitle']
return {
--- /dev/null
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+ ExtractorError,
+ int_or_none,
+ float_or_none,
+ unescapeHTML,
+)
+
+
+class WistiaIE(InfoExtractor):
+ _VALID_URL = r'(?:wistia:|https?://(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/)(?P<id>[a-z0-9]{10})'
+ _EMBED_BASE_URL = 'http://fast.wistia.com/embed/'
+
+ _TESTS = [{
+ 'url': 'http://fast.wistia.net/embed/iframe/sh7fpupwlt',
+ 'md5': 'cafeb56ec0c53c18c97405eecb3133df',
+ 'info_dict': {
+ 'id': 'sh7fpupwlt',
+ 'ext': 'mov',
+ 'title': 'Being Resourceful',
+ 'description': 'a Clients From Hell Video Series video from worldwidewebhosting',
+ 'upload_date': '20131204',
+ 'timestamp': 1386185018,
+ 'duration': 117,
+ },
+ }, {
+ 'url': 'wistia:sh7fpupwlt',
+ 'only_matching': True,
+ }, {
+ # with hls video
+ 'url': 'wistia:807fafadvk',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://fast.wistia.com/embed/iframe/sh7fpupwlt',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://fast.wistia.net/embed/medias/sh7fpupwlt.json',
+ 'only_matching': True,
+ }]
+
+ # https://wistia.com/support/embed-and-share/video-on-your-website
+ @staticmethod
+ def _extract_url(webpage):
+ urls = WistiaIE._extract_urls(webpage)
+ return urls[0] if urls else None
+
+ @staticmethod
+ def _extract_urls(webpage):
+ urls = []
+ for match in re.finditer(
+ r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})', webpage):
+ urls.append(unescapeHTML(match.group('url')))
+ for match in re.finditer(
+ r'''(?sx)
+ <div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
+ ''', webpage):
+ urls.append('wistia:%s' % match.group('id'))
+ for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):
+ urls.append('wistia:%s' % match.group('id'))
+ return urls
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ data_json = self._download_json(
+ self._EMBED_BASE_URL + 'medias/%s.json' % video_id, video_id,
+ # Some videos require this.
+ headers={
+ 'Referer': url if url.startswith('http') else self._EMBED_BASE_URL + 'iframe/' + video_id,
+ })
+
+ if data_json.get('error'):
+ raise ExtractorError(
+ 'Error while getting the playlist', expected=True)
+
+ data = data_json['media']
+ title = data['name']
+
+ formats = []
+ thumbnails = []
+ for a in data['assets']:
+ aurl = a.get('url')
+ if not aurl:
+ continue
+ astatus = a.get('status')
+ atype = a.get('type')
+ if (astatus is not None and astatus != 2) or atype in ('preview', 'storyboard'):
+ continue
+ elif atype in ('still', 'still_image'):
+ thumbnails.append({
+ 'url': aurl,
+ 'width': int_or_none(a.get('width')),
+ 'height': int_or_none(a.get('height')),
+ 'filesize': int_or_none(a.get('size')),
+ })
+ else:
+ aext = a.get('ext')
+ display_name = a.get('display_name')
+ format_id = atype
+ if atype and atype.endswith('_video') and display_name:
+ format_id = '%s-%s' % (atype[:-6], display_name)
+ f = {
+ 'format_id': format_id,
+ 'url': aurl,
+ 'tbr': int_or_none(a.get('bitrate')) or None,
+ 'preference': 1 if atype == 'original' else None,
+ }
+ if display_name == 'Audio':
+ f.update({
+ 'vcodec': 'none',
+ })
+ else:
+ f.update({
+ 'width': int_or_none(a.get('width')),
+ 'height': int_or_none(a.get('height')),
+ 'vcodec': a.get('codec'),
+ })
+ if a.get('container') == 'm3u8' or aext == 'm3u8':
+ ts_f = f.copy()
+ ts_f.update({
+ 'ext': 'ts',
+ 'format_id': f['format_id'].replace('hls-', 'ts-'),
+ 'url': f['url'].replace('.bin', '.ts'),
+ })
+ formats.append(ts_f)
+ f.update({
+ 'ext': 'mp4',
+ 'protocol': 'm3u8_native',
+ })
+ else:
+ f.update({
+ 'container': a.get('container'),
+ 'ext': aext,
+ 'filesize': int_or_none(a.get('size')),
+ })
+ formats.append(f)
+
+ self._sort_formats(formats)
+
+ subtitles = {}
+ for caption in data.get('captions', []):
+ language = caption.get('language')
+ if not language:
+ continue
+ subtitles[language] = [{
+ 'url': self._EMBED_BASE_URL + 'captions/' + video_id + '.vtt?language=' + language,
+ }]
+
+ return {
+ 'id': video_id,
+ 'title': title,
+ 'description': data.get('seoDescription'),
+ 'formats': formats,
+ 'thumbnails': thumbnails,
+ 'duration': float_or_none(data.get('duration')),
+ 'timestamp': int_or_none(data.get('createdAt')),
+ 'subtitles': subtitles,
+ }
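
Reviewer note: the embed detection in `_extract_urls` can be exercised on a small HTML sample. The two patterns below are copied from the hunk above. The sample markup and the helper name `find_wistia_urls` are assumptions for illustration, and `html.unescape` stands in for youtube-dl's `unescapeHTML`.

```python
import html
import re

# Patterns copied from WistiaIE._extract_urls.
IFRAME_RE = r'<(?:meta[^>]+?content|(?:iframe|script)[^>]+?src)=["\'](?P<url>(?:https?:)?//(?:fast\.)?wistia\.(?:net|com)/embed/(?:iframe|medias)/[a-z0-9]{10})'
ASYNC_RE = r'''(?sx)
    <div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
'''


def find_wistia_urls(webpage):
    # Direct iframe/script/meta embeds yield full URLs.
    urls = [html.unescape(m.group('url'))
            for m in re.finditer(IFRAME_RE, webpage)]
    # Async embeds only expose the media id, so use the wistia: scheme.
    urls += ['wistia:%s' % m.group('id')
             for m in re.finditer(ASYNC_RE, webpage)]
    return urls


# Hypothetical page fragment reusing the ids from the test cases.
sample_html = '''
<iframe src="//fast.wistia.net/embed/iframe/sh7fpupwlt"></iframe>
<div class="wistia_embed wistia_async_807fafadvk" style="height:360px"></div>
'''
found = find_wistia_urls(sample_html)
```

Both result shapes are accepted by `_VALID_URL`, which is why the extractor can mix full URLs and `wistia:` pseudo-URLs in one list.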
class XHamsterIE(InfoExtractor):
- _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
+ _DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com)'
_VALID_URL = r'''(?x)
https?://
(?:.+?\.)?%s/
(?:
- movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
- videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
+ movies/(?P<id>[\dA-Za-z]+)/(?P<display_id>[^/]*)\.html|
+ videos/(?P<display_id_2>[^/]*)-(?P<id_2>[\dA-Za-z]+)
)
''' % _DOMAINS
_TESTS = [{
}, {
'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
+ }, {
+ 'url': 'https://xhamster11.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+ 'only_matching': True,
+ }, {
+ 'url': 'https://xhamster26.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
+ 'only_matching': True,
}, {
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
'only_matching': True,
}, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
'only_matching': True,
+ }, {
+ 'url': 'http://de.xhamster.com/videos/skinny-girl-fucks-herself-hard-in-the-forest-xhnBJZx',
+ 'only_matching': True,
}]
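
Reviewer note: the effect of widening `xhamster[27]\.com` to `xhamster\d+\.com` can be checked directly. This sketch compares the old and new `_DOMAINS` values against the hostnames from the added test URLs. The helper `host_matches` is hypothetical and only mirrors the optional subdomain prefix used in `_VALID_URL`.

```python
import re

OLD_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
NEW_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com)'


def host_matches(domains, host):
    # Optional subdomain prefix followed by one of the known domains.
    return re.match(r'(?:.+?\.)?%s$' % domains, host) is not None


hosts = ['xhamster2.com', 'xhamster11.com', 'xhamster26.com', 'de.xhamster.com']
old_results = [host_matches(OLD_DOMAINS, h) for h in hosts]
new_results = [host_matches(NEW_DOMAINS, h) for h in hosts]
```

Only the numbered mirrors other than 2 and 7 change behaviour. The language subdomain already matched through the optional prefix.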
def _real_extract(self, url):
display_id = mobj.group('display_id') or mobj.group('display_id_2')
desktop_url = re.sub(r'^(https?://(?:.+?\.)?)m\.', r'\1', url)
- webpage = self._download_webpage(desktop_url, video_id)
+ webpage, urlh = self._download_webpage_handle(desktop_url, video_id)
error = self._html_search_regex(
r'<div[^>]+id=["\']videoClosed["\'][^>]*>(.+?)</div>',
initials = self._parse_json(
self._search_regex(
- r'window\.initials\s*=\s*({.+?})\s*;\s*\n', webpage, 'initials',
+ (r'window\.initials\s*=\s*({.+?})\s*;\s*</script>',
+ r'window\.initials\s*=\s*({.+?})\s*;'), webpage, 'initials',
default='{}'),
video_id, fatal=False)
if initials:
'ext': determine_ext(format_url, 'mp4'),
'height': get_height(quality),
'filesize': filesize,
+ 'http_headers': {
+ 'Referer': urlh.geturl(),
+ },
})
self._sort_formats(formats)
'display_id': 'A-Super-Run-Part-1-YT',
'ext': 'flv',
'title': 'A Super Run - Part 1 (YT)',
- 'description': 'md5:ca0d47afff4a9b2942e4b41aa970fd93',
+ 'description': 'md5:4cc3af1aa1b0413289babc88f0d4f616',
'uploader': 'tshirtguy59',
'duration': 579,
'view_count': int,
'Cookie': 'age_verified=1; cookiesAccepted=1',
})
- sources = self._parse_json(self._search_regex(
- r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
- webpage, 'sources', group='sources'), video_id,
- transform_source=js_to_json)
+ title, thumbnail, duration = [None] * 3
+
+ config = self._parse_json(self._search_regex(
+ r'playerConf\s*=\s*({.+?})\s*,\s*\n', webpage, 'config',
+ default='{}'), video_id, transform_source=js_to_json, fatal=False)
+ if config:
+ config = config.get('mainRoll')
+ if isinstance(config, dict):
+ title = config.get('title')
+ thumbnail = config.get('poster')
+ duration = int_or_none(config.get('duration'))
+ sources = config.get('sources') or config.get('format')
+
+ if not isinstance(sources, dict):
+ sources = self._parse_json(self._search_regex(
+ r'(["\'])?sources\1?\s*:\s*(?P<sources>{.+?}),',
+ webpage, 'sources', group='sources'), video_id,
+ transform_source=js_to_json)
formats = []
for format_id, format_url in sources.items():
self._remove_duplicate_formats(formats)
self._sort_formats(formats)
- title = self._search_regex(
- (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
- webpage, 'title', group='title')
- description = self._search_regex(
+ if not title:
+ title = self._search_regex(
+ (r'<h1>\s*(?P<title>[^<]+?)\s*</h1>', r'videoTitle\s*:\s*(["\'])(?P<title>.+?)\1'),
+ webpage, 'title', group='title')
+ description = self._og_search_description(
+ webpage, default=None) or self._html_search_meta(
+ 'twitter:description', webpage, default=None) or self._search_regex(
r'</h1>\s*<p>([^<]+)', webpage, 'description', fatal=False)
uploader = self._search_regex(
(r'<input[^>]+name="contentOwnerId"[^>]+value="([^"]+)"',
r'<span[^>]+class="nickname"[^>]*>([^<]+)'),
webpage, 'uploader', fatal=False)
- duration = parse_duration(self._search_regex(
- r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
- webpage, 'duration', fatal=False))
+ if not duration:
+ duration = parse_duration(self._search_regex(
+ r'<dt>Runtime:?</dt>\s*<dd>([^<]+)</dd>',
+ webpage, 'duration', fatal=False))
view_count = str_to_int(self._search_regex(
- r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>',
+ (r'["\']viewsCount["\'][^>]*>(\d+)\s+views',
+ r'<dt>Views:?</dt>\s*<dd>([\d,\.]+)</dd>'),
webpage, 'view count', fatal=False))
comment_count = str_to_int(self._html_search_regex(
r'>Comments? \(([\d,\.]+)\)<',
'display_id': display_id,
'title': title,
'description': description,
+ 'thumbnail': thumbnail,
'uploader': uploader,
'duration': duration,
'view_count': view_count,
'id': 'greenshowers-4056496',
'age_limit': 18,
},
- 'playlist_mincount': 155,
+ 'playlist_mincount': 154,
}
def _real_extract(self, url):
)
from ..utils import (
clean_html,
+ ExtractorError,
int_or_none,
mimetype2ext,
parse_iso8601,
'url': 'https://gyao.yahoo.co.jp/episode/%E3%81%8D%E3%81%AE%E3%81%86%E4%BD%95%E9%A3%9F%E3%81%B9%E3%81%9F%EF%BC%9F%20%E7%AC%AC2%E8%A9%B1%202019%2F4%2F12%E6%94%BE%E9%80%81%E5%88%86/5cb02352-b725-409e-9f8d-88f947a9f682',
'only_matching': True,
}]
+ _GEO_BYPASS = False
def _real_extract(self, url):
video_id = self._match_id(url).replace('/', ':')
- video = self._download_json(
- 'https://gyao.yahoo.co.jp/dam/v1/videos/' + video_id,
- video_id, query={
- 'fields': 'longDescription,title,videoId',
- }, headers={
- 'X-User-Agent': 'Unknown Pc GYAO!/2.0.0 Web',
- })
+ headers = self.geo_verification_headers()
+ headers['Accept'] = 'application/json'
+ resp = self._download_json(
+ 'https://gyao.yahoo.co.jp/apis/playback/graphql', video_id, query={
+ 'appId': 'dj00aiZpPUNJeDh2cU1RazU3UCZzPWNvbnN1bWVyc2VjcmV0Jng9NTk-',
+ 'query': '''{
+ content(parameter: {contentId: "%s", logicaAgent: PC_WEB}) {
+ video {
+ delivery {
+ id
+ }
+ title
+ }
+ }
+}''' % video_id,
+ }, headers=headers)
+ content = resp['data']['content']
+ if not content:
+ msg = resp['errors'][0]['message']
+ if msg == 'not in japan':
+ self.raise_geo_restricted(countries=['JP'])
+ raise ExtractorError(msg)
+ video = content['video']
return {
'_type': 'url_transparent',
'id': video_id,
'title': video['title'],
'url': smuggle_url(
- 'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['videoId'],
+ 'http://players.brightcove.net/4235717419001/SyG5P0gjb_default/index.html?videoId=' + video['delivery']['id'],
{'geo_countries': ['JP']}),
- 'description': video.get('longDescription'),
'ie_key': BrightcoveNewIE.ie_key(),
}
class YahooGyaOIE(InfoExtractor):
IE_NAME = 'yahoo:gyao'
- _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title/[^/]+)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
+ _VALID_URL = r'https?://(?:gyao\.yahoo\.co\.jp/(?:p|title(?:/[^/]+)?)|streaming\.yahoo\.co\.jp/p/y)/(?P<id>\d+/v\d+|[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'https://gyao.yahoo.co.jp/p/00449/v03102/',
'info_dict': {
}, {
'url': 'https://gyao.yahoo.co.jp/title/%E3%81%97%E3%82%83%E3%81%B9%E3%81%8F%E3%82%8A007/5b025a49-b2e5-4dc7-945c-09c6634afacf',
'only_matching': True,
+ }, {
+ 'url': 'https://gyao.yahoo.co.jp/title/5b025a49-b2e5-4dc7-945c-09c6634afacf',
+ 'only_matching': True,
}]
def _real_extract(self, url):
@staticmethod
def _raise_captcha():
raise ExtractorError(
- 'YandexMusic has considered youtube-dl requests automated and '
+ 'YandexMusic has considered youtube-dlc requests automated and '
'asks you to solve a CAPTCHA. You can either wait for some '
'time until unblocked and optionally use --sleep-interval '
'in future or alternatively you can go to https://music.yandex.ru/ '
'solve CAPTCHA, then export cookies and pass cookie file to '
- 'youtube-dl with --cookies',
+ 'youtube-dlc with --cookies',
expected=True)
def _download_webpage_handle(self, *args, **kwargs):
encodings = self._parse_json(
self._search_regex(
- r'encodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
+ r'[Ee]ncodings\s*=\s*(\[.+?\]);\n', webpage, 'encodings',
default='[]'),
video_id, fatal=False)
for encoding in encodings:
from .common import InfoExtractor
from ..utils import (
int_or_none,
- sanitized_Request,
str_to_int,
unescapeHTML,
unified_strdate,
class YouPornIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?youporn\.com/watch/(?P<id>\d+)/(?P<display_id>[^/?#&]+)'
+ _VALID_URL = r'https?://(?:www\.)?youporn\.com/(?:watch|embed)/(?P<id>\d+)(?:/(?P<display_id>[^/?#&]+))?'
_TESTS = [{
'url': 'http://www.youporn.com/watch/505835/sex-ed-is-it-safe-to-masturbate-daily/',
'md5': '3744d24c50438cf5b6f6d59feb5055c2',
'params': {
'skip_download': True,
},
+ }, {
+ 'url': 'https://www.youporn.com/embed/505835/sex-ed-is-it-safe-to-masturbate-daily/',
+ 'only_matching': True,
+ }, {
+ 'url': 'http://www.youporn.com/watch/505835',
+ 'only_matching': True,
}]
+ @staticmethod
+ def _extract_urls(webpage):
+ return re.findall(
+ r'<iframe[^>]+\bsrc=["\']((?:https?:)?//(?:www\.)?youporn\.com/embed/\d+)',
+ webpage)
+
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
- display_id = mobj.group('display_id')
+ display_id = mobj.group('display_id') or video_id
- request = sanitized_Request(url)
- request.add_header('Cookie', 'age_verified=1')
- webpage = self._download_webpage(request, display_id)
+ webpage = self._download_webpage(
+ 'http://www.youporn.com/watch/%s' % video_id, display_id,
+ headers={'Cookie': 'age_verified=1'})
title = self._html_search_regex(
r'(?s)<div[^>]+class=["\']watchVideoTitle[^>]+>(.+?)</div>',
from __future__ import unicode_literals
from .common import InfoExtractor
+from ..compat import compat_str
from ..utils import (
parse_duration,
urljoin,
class YourPornIE(InfoExtractor):
- _VALID_URL = r'https?://(?:www\.)?(?:yourporn\.sexy|sxyprn\.com)/post/(?P<id>[^/?#&.]+)'
+ _VALID_URL = r'https?://(?:www\.)?sxyprn\.com/post/(?P<id>[^/?#&.]+)'
_TESTS = [{
- 'url': 'https://yourporn.sexy/post/57ffcb2e1179b.html',
+ 'url': 'https://sxyprn.com/post/57ffcb2e1179b.html',
'md5': '6f8682b6464033d87acaa7a8ff0c092e',
'info_dict': {
'id': '57ffcb2e1179b',
webpage = self._download_webpage(url, video_id)
- video_url = urljoin(url, self._parse_json(
+ parts = self._parse_json(
self._search_regex(
r'data-vnfo=(["\'])(?P<data>{.+?})\1', webpage, 'data info',
group='data'),
- video_id)[video_id]).replace('/cdn/', '/cdn5/')
+ video_id)[video_id].split('/')
+
+ num = 0
+ for c in parts[6] + parts[7]:
+ if c.isnumeric():
+ num += int(c)
+ parts[5] = compat_str(int(parts[5]) - num)
+ parts[1] += '8'
+ video_url = urljoin(url, '/'.join(parts))
title = (self._search_regex(
r'<[^>]+\bclass=["\']PostEditTA[^>]+>([^<]+)', webpage, 'title',
'thumbnail': thumbnail,
'duration': duration,
'age_limit': 18,
+ 'ext': 'mp4',
}
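
Reviewer note: the path rewriting added to `YourPornIE` is easier to follow with a worked example. This sketch repeats the same arithmetic on a made-up path. The sample value is hypothetical and does not reflect any real `data-vnfo` payload, and plain `str` replaces `compat_str`.

```python
def rewrite_video_path(path):
    parts = path.split('/')
    # Sum every digit found in the 7th and 8th path components.
    num = 0
    for c in parts[6] + parts[7]:
        if c.isnumeric():
            num += int(c)
    # Subtract that sum from the numeric 6th component.
    parts[5] = str(int(parts[5]) - num)
    # Append '8' to the first real path component.
    parts[1] += '8'
    return '/'.join(parts)


# Hypothetical input: digits 1, 2, 3 and 4 sum to 10, so 1000 becomes 990.
sample_path = '/cdn/c9/abc/def/1000/k1x2/file34.vid'
rewritten = rewrite_video_path(sample_path)
```

Because the path starts with `/`, index 0 of `parts` is the empty string, which is why the indices in the patch look shifted by one.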
from ..utils import (
bool_or_none,
clean_html,
- dict_get,
error_to_compat_str,
extract_attributes,
ExtractorError,
_PLAYLIST_ID_RE = r'(?:PL|LL|EC|UU|FL|RD|UL|TL|PU|OLAK5uy_)[0-9A-Za-z-_]{10,}'
+ _YOUTUBE_CLIENT_HEADERS = {
+ 'x-youtube-client-name': '1',
+ 'x-youtube-client-version': '1.20200609.04.02',
+ }
+
def _set_language(self):
self._set_cookie(
- '.youtube.com', 'PREF', 'f1=50000000&hl=en',
+ '.youtube.com', 'PREF', 'f1=50000000&f6=8&hl=en',
# YouTube sets the expire time to about two months
expire_time=time.time() + 2 * 30 * 24 * 3600)
# Downloading page may result in intermittent 5xx HTTP error
# that is usually worked around with a retry
more = self._download_json(
- 'https://youtube.com/%s' % mobj.group('more'), playlist_id,
+ 'https://www.youtube.com/%s' % mobj.group('more'), playlist_id,
'Downloading page #%s%s'
% (page_num, ' (retry #%d)' % count if count else ''),
- transform_source=uppercase_escape)
+ transform_source=uppercase_escape,
+ headers=self._YOUTUBE_CLIENT_HEADERS)
break
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code in (500, 503):
(?:www\.)?invidious\.drycat\.fr/|
(?:www\.)?tube\.poal\.co/|
(?:www\.)?vid\.wxzm\.sx/|
+ (?:www\.)?yewtu\.be/|
(?:www\.)?yt\.elukerio\.org/|
(?:www\.)?yt\.lelux\.fi/|
+ (?:www\.)?invidious\.ggc-project\.de/|
+ (?:www\.)?yt\.maisputain\.ovh/|
+ (?:www\.)?invidious\.13ad\.de/|
+ (?:www\.)?invidious\.toot\.koeln/|
+ (?:www\.)?invidious\.fdn\.fr/|
+ (?:www\.)?watch\.nettohikari\.com/|
(?:www\.)?kgg2m7yk5aybusll\.onion/|
(?:www\.)?qklhadlycap4cnod\.onion/|
(?:www\.)?axqzx4s6s54s32yentfqojs3x5i7faxza6xo3ehd4bzzsg2ii4fv2iid\.onion/|
(?:www\.)?fz253lmuao3strwbfbmx46yu7acac2jz27iwtorgmbqlkurlclmancad\.onion/|
(?:www\.)?invidious\.l4qlywnpwqsluw65ts7md3khrivpirse744un3x7mlskqauz5pyuzgqd\.onion/|
(?:www\.)?owxfohz4kjyv25fvlqilyxast7inivgiktls3th44jhk3ej3i7ya\.b32\.i2p/|
+ (?:www\.)?4l2dgddgsrkf2ous66i6seeyi6etzfgrue332grh2n7madpwopotugyd\.onion/|
youtube\.googleapis\.com/) # the various hostnames, with wildcard subdomains
(?:.*?\#/)? # handle anchor (#/) redirect urls
(?: # the various things that can precede the ID:
(?(1).+)? # if we found the ID, everything can follow
$""" % {'playlist_id': YoutubeBaseInfoExtractor._PLAYLIST_ID_RE}
_NEXT_URL_RE = r'[\?&]next_url=([^&]+)'
+ _PLAYER_INFO_RE = (
+ r'/(?P<id>[a-zA-Z0-9_-]{8,})/player_ias\.vflset(?:/[a-zA-Z]{2,3}_[a-zA-Z]{2,3})?/base\.(?P<ext>[a-z]+)$',
+ r'\b(?P<id>vfl[a-zA-Z0-9_-]+)\b.*?\.(?P<ext>[a-z]+)$',
+ )
_formats = {
'5': {'ext': 'flv', 'width': 400, 'height': 240, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
'6': {'ext': 'flv', 'width': 450, 'height': 270, 'acodec': 'mp3', 'abr': 64, 'vcodec': 'h263'},
'396': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
'397': {'acodec': 'none', 'vcodec': 'av01.0.05M.08'},
}
- _SUBTITLE_FORMATS = ('srv1', 'srv2', 'srv3', 'ttml', 'vtt')
+ _SUBTITLE_FORMATS = ('srv1', 'srv2', 'srv3', 'ttml', 'vtt', 'json3')
_GEO_BYPASS = False
'upload_date': '20120506',
'title': 'Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]',
'alt_title': 'I Love It (feat. Charli XCX)',
- 'description': 'md5:f3ceb5ef83a08d95b9d146f973157cc8',
+ 'description': 'md5:19a2f98d9032b9311e686ed039564f63',
'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
'iconic ep', 'iconic', 'love', 'it'],
'id': 'nfWlot6h_JM',
'ext': 'm4a',
'title': 'Taylor Swift - Shake It Off',
- 'description': 'md5:bec2185232c05479482cb5a9b82719bf',
+ 'description': 'md5:307195cd21ff7fa352270fe884570ef0',
'duration': 242,
'uploader': 'TaylorSwiftVEVO',
'uploader_id': 'TaylorSwiftVEVO',
'upload_date': '20140818',
- 'creator': 'Taylor Swift',
},
'params': {
'youtube_include_dash_manifest': True,
'upload_date': '20100430',
'uploader_id': 'deadmau5',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
- 'creator': 'deadmau5',
+ 'creator': 'Dada Life, deadmau5',
'description': 'md5:12c56784b8032162bb936a5f76d55360',
'uploader': 'deadmau5',
'title': 'Deadmau5 - Some Chords (HD)',
- 'alt_title': 'Some Chords',
+ 'alt_title': 'This Machine Kills Some Chords',
},
'expected_warnings': [
'DASH manifest missing',
'skip_download': True,
'youtube_include_dash_manifest': False,
},
+ 'skip': 'not actual anymore',
},
{
# Youtube Music Auto-generated description
'title': 'Voyeur Girl',
'description': 'md5:7ae382a65843d6df2685993e90a8628f',
'upload_date': '20190312',
- 'uploader': 'Various Artists - Topic',
- 'uploader_id': 'UCVWKBi1ELZn0QX2CBLSkiyw',
+ 'uploader': 'Stephen - Topic',
+ 'uploader_id': 'UC-pWHpBjdGG69N9mM2auIAA',
'artist': 'Stephen',
'track': 'Voyeur Girl',
'album': 'it\'s too much love to know my dear',
'id': '-hcAI0g-f5M',
'ext': 'mp4',
'title': 'Put It On Me',
- 'description': 'md5:93c55acc682ae7b0c668f2e34e1c069e',
+ 'description': 'md5:f6422397c07c4c907c6638e1fee380a5',
'upload_date': '20180426',
'uploader': 'Matt Maeson - Topic',
'uploader_id': 'UCnEkIGqtGcQMLk73Kp-Q5LQ',
'url': 'https://www.youtubekids.com/watch?v=3b8nCWDgZ6Q',
'only_matching': True,
},
+ {
+ # invalid -> valid video id redirection
+ 'url': 'DJztXj2GPfl',
+ 'info_dict': {
+ 'id': 'DJztXj2GPfk',
+ 'ext': 'mp4',
+ 'title': 'Panjabi MC - Mundian To Bach Ke (The Dictator Soundtrack)',
+ 'description': 'md5:bf577a41da97918e94fa9798d9228825',
+ 'upload_date': '20090125',
+ 'uploader': 'Prochorowka',
+ 'uploader_id': 'Prochorowka',
+ 'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Prochorowka',
+ 'artist': 'Panjabi MC',
+ 'track': 'Beware of the Boys (Mundian to Bach Ke) - Motivo Hi-Lectro Remix',
+ 'album': 'Beware of the Boys (Mundian To Bach Ke)',
+ },
+ 'params': {
+ 'skip_download': True,
+ },
+ }
]
def __init__(self, *args, **kwargs):
""" Return a string representation of a signature """
return '.'.join(compat_str(len(part)) for part in example_sig.split('.'))
- def _extract_signature_function(self, video_id, player_url, example_sig):
- id_m = re.match(
- r'.*?-(?P<id>[a-zA-Z0-9_-]+)(?:/watch_as3|/html5player(?:-new)?|(?:/[a-z]{2,3}_[A-Z]{2})?/base)?\.(?P<ext>[a-z]+)$',
- player_url)
- if not id_m:
+ @classmethod
+ def _extract_player_info(cls, player_url):
+ for player_re in cls._PLAYER_INFO_RE:
+ id_m = re.search(player_re, player_url)
+ if id_m:
+ break
+ else:
raise ExtractorError('Cannot identify player %r' % player_url)
- player_type = id_m.group('ext')
- player_id = id_m.group('id')
+ return id_m.group('ext'), id_m.group('id')
+
+ def _extract_signature_function(self, video_id, player_url, example_sig):
+ player_type, player_id = self._extract_player_info(player_url)
# Read from filesystem cache
func_id = '%s_%s_%s' % (
funcname = self._search_regex(
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+ r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
# Obsolete patterns
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
raise ExtractorError(
'Signature extraction failed: ' + tb, cause=e)
- def _get_subtitles(self, video_id, webpage):
+ def _get_subtitles(self, video_id, webpage, has_live_chat_replay):
try:
subs_doc = self._download_xml(
'https://video.google.com/timedtext?hl=en&type=list&v=%s' % video_id,
'ext': ext,
})
sub_lang_list[lang] = sub_formats
+ if has_live_chat_replay:
+ sub_lang_list['live_chat'] = [
+ {
+ 'video_id': video_id,
+ 'ext': 'json',
+ 'protocol': 'youtube_live_chat_replay',
+ },
+ ]
if not sub_lang_list:
self._downloader.report_warning('video doesn\'t have subtitles')
return {}
return self._parse_json(
uppercase_escape(config), video_id, fatal=False)
+ def _get_yt_initial_data(self, video_id, webpage):
+ config = self._search_regex(
+ (r'window\["ytInitialData"\]\s*=\s*(.*?)(?<=});',
+ r'var\s+ytInitialData\s*=\s*(.*?)(?<=});'),
+ webpage, 'ytInitialData', default=None)
+ if config:
+ return self._parse_json(
+ uppercase_escape(config), video_id, fatal=False)
+
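
Reviewer note: the two patterns in `_get_yt_initial_data` rely on a lookbehind so that the lazy match only stops at a `;` preceded by `}`. This sketch tries them in order on made-up page fragments. The helper name and sample strings are assumptions, and `json.loads` stands in for `_parse_json`.

```python
import json
import re

# Patterns copied from _get_yt_initial_data.
INITIAL_DATA_RES = (
    r'window\["ytInitialData"\]\s*=\s*(.*?)(?<=});',
    r'var\s+ytInitialData\s*=\s*(.*?)(?<=});',
)


def find_initial_data(webpage):
    # Try each pattern in turn, like _search_regex does with a tuple.
    for pattern in INITIAL_DATA_RES:
        mobj = re.search(pattern, webpage)
        if mobj:
            return json.loads(mobj.group(1))
    return None


# Hypothetical fragments; a ';' inside a string value must not end the match.
page_a = 'window["ytInitialData"] = {"a": {"b": "x;y"}};\n'
page_b = '<script>var ytInitialData = {"c": [1, 2]};</script>'
data_a = find_initial_data(page_a)
data_b = find_initial_data(page_b)
```

A string value ending in `};` would still cut the match short, so the lookbehind is a heuristic and not a full JavaScript parser.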
def _get_automatic_captions(self, video_id, webpage):
"""We need the webpage for getting the captions url, pass it as an
argument to speed up the process."""
player_response, video_id, fatal=False)
if player_response:
renderer = player_response['captions']['playerCaptionsTracklistRenderer']
- base_url = renderer['captionTracks'][0]['baseUrl']
- sub_lang_list = []
- for lang in renderer['translationLanguages']:
- lang_code = lang.get('languageCode')
- if lang_code:
- sub_lang_list.append(lang_code)
- return make_captions(base_url, sub_lang_list)
-
+ caption_tracks = renderer['captionTracks']
+ for caption_track in caption_tracks:
+ if 'kind' not in caption_track:
+ # not an automatic transcription
+ continue
+ base_url = caption_track['baseUrl']
+ sub_lang_list = []
+ for lang in renderer['translationLanguages']:
+ lang_code = lang.get('languageCode')
+ if lang_code:
+ sub_lang_list.append(lang_code)
+ return make_captions(base_url, sub_lang_list)
+
+ self._downloader.report_warning("Couldn't find automatic captions for %s" % video_id)
+ return {}
# Some videos don't provide ttsurl but rather caption_tracks and
# caption_translation_languages (e.g. 20LmZk1hakA)
# Not used anymore as of 22.06.2017
video_id = mobj.group(2)
return video_id
+ def _extract_chapters_from_json(self, webpage, video_id, duration):
+ if not webpage:
+ return
+ initial_data = self._parse_json(
+ self._search_regex(
+ r'window\["ytInitialData"\] = (.+);\n', webpage,
+ 'yt initial data', default='{}'),
+ video_id, fatal=False)
+ if not initial_data or not isinstance(initial_data, dict):
+ return
+ chapters_list = try_get(
+ initial_data,
+ lambda x: x['playerOverlays']
+ ['playerOverlayRenderer']
+ ['decoratedPlayerBarRenderer']
+ ['decoratedPlayerBarRenderer']
+ ['playerBar']
+ ['chapteredPlayerBarRenderer']
+ ['chapters'],
+ list)
+ if not chapters_list:
+ return
+
+ def chapter_time(chapter):
+ return float_or_none(
+ try_get(
+ chapter,
+ lambda x: x['chapterRenderer']['timeRangeStartMillis'],
+ int),
+ scale=1000)
+ chapters = []
+ for next_num, chapter in enumerate(chapters_list, start=1):
+ start_time = chapter_time(chapter)
+ if start_time is None:
+ continue
+ end_time = (chapter_time(chapters_list[next_num])
+ if next_num < len(chapters_list) else duration)
+ if end_time is None:
+ continue
+ title = try_get(
+ chapter, lambda x: x['chapterRenderer']['title']['simpleText'],
+ compat_str)
+ chapters.append({
+ 'start_time': start_time,
+ 'end_time': end_time,
+ 'title': title,
+ })
+ return chapters
+
@staticmethod
- def _extract_chapters(description, duration):
+ def _extract_chapters_from_description(description, duration):
if not description:
return None
chapter_lines = re.findall(
})
return chapters
+ def _extract_chapters(self, webpage, description, video_id, duration):
+ return (self._extract_chapters_from_json(webpage, video_id, duration)
+ or self._extract_chapters_from_description(description, duration))
+
def _real_extract(self, url):
url, smuggled_data = unsmuggle_url(url, {})
# Get video webpage
url = proto + '://www.youtube.com/watch?v=%s&gl=US&hl=en&has_verified=1&bpctr=9999999999' % video_id
- video_webpage = self._download_webpage(url, video_id)
+ video_webpage, urlh = self._download_webpage_handle(url, video_id)
+
+ qs = compat_parse_qs(compat_urllib_parse_urlparse(urlh.geturl()).query)
+ video_id = qs.get('v', [None])[0] or video_id
# Attempt to extract SWF player URL
mobj = re.search(r'swfConfig.*?"(https?:\\/\\/.*?watch.*?-.*?\.swf)"', video_webpage)
def extract_view_count(v_info):
return int_or_none(try_get(v_info, lambda x: x['view_count'][0]))
- def extract_token(v_info):
- return dict_get(v_info, ('account_playback_token', 'accountPlaybackToken', 'token'))
-
def extract_player_response(player_response, video_id):
pl_response = str_or_none(player_response)
if not pl_response:
player_response = {}
# Get video info
+ video_info = {}
embed_webpage = None
- if re.search(r'player-age-gate-content">', video_webpage) is not None:
+ if (self._og_search_property('restrictions:age', video_webpage, default=None) == '18+'
+ or re.search(r'player-age-gate-content">', video_webpage) is not None):
age_gate = True
# We simulate the access to the video from www.youtube.com/v/{video_id}
# this can be viewed without login into Youtube
r'"sts"\s*:\s*(\d+)', embed_webpage, 'sts', default=''),
})
video_info_url = proto + '://www.youtube.com/get_video_info?' + data
- video_info_webpage = self._download_webpage(
- video_info_url, video_id,
- note='Refetching age-gated info webpage',
- errnote='unable to download video info webpage')
- video_info = compat_parse_qs(video_info_webpage)
- pl_response = video_info.get('player_response', [None])[0]
- player_response = extract_player_response(pl_response, video_id)
- add_dash_mpd(video_info)
- view_count = extract_view_count(video_info)
+ try:
+ video_info_webpage = self._download_webpage(
+ video_info_url, video_id,
+ note='Refetching age-gated info webpage',
+ errnote='unable to download video info webpage')
+ except ExtractorError:
+ video_info_webpage = None
+ if video_info_webpage:
+ video_info = compat_parse_qs(video_info_webpage)
+ pl_response = video_info.get('player_response', [None])[0]
+ player_response = extract_player_response(pl_response, video_id)
+ add_dash_mpd(video_info)
+ view_count = extract_view_count(video_info)
else:
age_gate = False
- video_info = None
- sts = None
# Try looking directly into the video webpage
ytplayer_config = self._get_ytplayer_config(video_id, video_webpage)
if ytplayer_config:
args['ypc_vid'], YoutubeIE.ie_key(), video_id=args['ypc_vid'])
if args.get('livestream') == '1' or args.get('live_playback') == 1:
is_live = True
- sts = ytplayer_config.get('sts')
if not player_response:
player_response = extract_player_response(args.get('player_response'), video_id)
if not video_info or self._downloader.params.get('youtube_include_dash_manifest', True):
add_dash_mpd_pr(player_response)
- # We also try looking in get_video_info since it may contain different dashmpd
- # URL that points to a DASH manifest with possibly different itag set (some itags
- # are missing from DASH manifest pointed by webpage's dashmpd, some - from DASH
- # manifest pointed by get_video_info's dashmpd).
- # The general idea is to take a union of itags of both DASH manifests (for example
- # video with such 'manifest behavior' see https://github.com/ytdl-org/youtube-dl/issues/6093)
- self.report_video_info_webpage_download(video_id)
- for el in ('embedded', 'detailpage', 'vevo', ''):
- query = {
- 'video_id': video_id,
- 'ps': 'default',
- 'eurl': '',
- 'gl': 'US',
- 'hl': 'en',
- }
- if el:
- query['el'] = el
- if sts:
- query['sts'] = sts
- video_info_webpage = self._download_webpage(
- '%s://www.youtube.com/get_video_info' % proto,
- video_id, note=False,
- errnote='unable to download video info webpage',
- fatal=False, query=query)
- if not video_info_webpage:
- continue
- get_video_info = compat_parse_qs(video_info_webpage)
- if not player_response:
- pl_response = get_video_info.get('player_response', [None])[0]
- player_response = extract_player_response(pl_response, video_id)
- add_dash_mpd(get_video_info)
- if view_count is None:
- view_count = extract_view_count(get_video_info)
- if not video_info:
- video_info = get_video_info
- get_token = extract_token(get_video_info)
- if get_token:
- # Different get_video_info requests may report different results, e.g.
- # some may report video unavailability, but some may serve it without
- # any complaint (see https://github.com/ytdl-org/youtube-dl/issues/7362,
- # the original webpage as well as el=info and el=embedded get_video_info
- # requests report video unavailability due to geo restriction while
- # el=detailpage succeeds and returns valid data). This is probably
- # due to YouTube measures against IP ranges of hosting providers.
- # Working around by preferring the first succeeded video_info containing
- # the token if no such video_info yet was found.
- token = extract_token(video_info)
- if not token:
- video_info = get_video_info
- break
def extract_unavailable_message():
messages = []
if messages:
return '\n'.join(messages)
- if not video_info:
+ if not video_info and not player_response:
unavailable_message = extract_unavailable_message()
if not unavailable_message:
unavailable_message = 'Unable to extract video data'
raise ExtractorError(
'YouTube said: %s' % unavailable_message, expected=True, video_id=video_id)
+ if not isinstance(video_info, dict):
+ video_info = {}
+
video_details = try_get(
player_response, lambda x: x['videoDetails'], dict) or {}
+ microformat = try_get(
+ player_response, lambda x: x['microformat']['playerMicroformatRenderer'], dict) or {}
+
video_title = video_info.get('title', [None])[0] or video_details.get('title')
if not video_title:
self._downloader.report_warning('Unable to extract video title')
''', replace_url, video_description)
video_description = clean_html(video_description)
else:
- video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
+ video_description = video_details.get('shortDescription') or self._html_search_meta('description', video_webpage)
if not smuggled_data.get('force_singlefeed', False):
if not self._downloader.params.get('noplaylist'):
# fields may contain comma as well (see
# https://github.com/ytdl-org/youtube-dl/issues/8536)
feed_data = compat_parse_qs(compat_urllib_parse_unquote_plus(feed))
+
+ def feed_entry(name):
+ return try_get(feed_data, lambda x: x[name][0], compat_str)
+
+ feed_id = feed_entry('id')
+ if not feed_id:
+ continue
+ feed_title = feed_entry('title')
+ title = video_title
+ if feed_title:
+ title += ' (%s)' % feed_title
entries.append({
'_type': 'url_transparent',
'ie_key': 'Youtube',
'url': smuggle_url(
'%s://www.youtube.com/watch?v=%s' % (proto, feed_data['id'][0]),
{'force_singlefeed': True}),
- 'title': '%s (%s)' % (video_title, feed_data['title'][0]),
+ 'title': title,
})
- feed_ids.append(feed_data['id'][0])
+ feed_ids.append(feed_id)
self.to_screen(
'Downloading multifeed video (%s) - add --no-playlist to just download video %s'
% (', '.join(feed_ids), video_id))
view_count = extract_view_count(video_info)
if view_count is None and video_details:
view_count = int_or_none(video_details.get('viewCount'))
+ if view_count is None and microformat:
+ view_count = int_or_none(microformat.get('viewCount'))
if is_live is None:
is_live = bool_or_none(video_details.get('isLive'))
+ has_live_chat_replay = False
+ if not is_live:
+ yt_initial_data = self._get_yt_initial_data(video_id, video_webpage)
+ try:
+ yt_initial_data['contents']['twoColumnWatchNextResults']['conversationBar']['liveChatRenderer']['continuations'][0]['reloadContinuationData']['continuation']
+ has_live_chat_replay = True
+ except (KeyError, IndexError, TypeError):
+ pass
+
# Check for "rental" videos
if 'ypc_video_rental_bar_text' in video_info and 'author' not in video_info:
raise ExtractorError('"rental" videos not supported. See https://github.com/ytdl-org/youtube-dl/issues/359 for more information.', expected=True)
}
for fmt in streaming_formats:
- if fmt.get('drm_families'):
+ if fmt.get('drmFamilies') or fmt.get('drm_families'):
continue
url = url_or_none(fmt.get('url'))
if not url:
- cipher = fmt.get('cipher')
+ cipher = fmt.get('cipher') or fmt.get('signatureCipher')
if not cipher:
continue
url_data = compat_parse_qs(cipher)
if self._downloader.params.get('verbose'):
if player_url is None:
- player_version = 'unknown'
player_desc = 'unknown'
else:
- if player_url.endswith('swf'):
- player_version = self._search_regex(
- r'-(.+?)(?:/watch_as3)?\.swf$', player_url,
- 'flash player', fatal=False)
- player_desc = 'flash player %s' % player_version
- else:
- player_version = self._search_regex(
- [r'html5player-([^/]+?)(?:/html5player(?:-new)?)?\.js',
- r'(?:www|player(?:_ias)?)-([^/]+)(?:/[a-z]{2,3}_[A-Z]{2})?/base\.js'],
- player_url,
- 'html5 player', fatal=False)
- player_desc = 'html5 player %s' % player_version
-
+ player_type, player_version = self._extract_player_info(player_url)
+ player_desc = '%s player %s' % ('flash' if player_type == 'swf' else 'html5', player_version)
parts_sizes = self._signature_cache_id(encrypted_sig)
self.to_screen('{%s} signature length %s, %s' %
(format_id, parts_sizes, player_desc))
video_uploader_id = mobj.group('uploader_id')
video_uploader_url = mobj.group('uploader_url')
else:
- self._downloader.report_warning('unable to extract uploader nickname')
+ owner_profile_url = url_or_none(microformat.get('ownerProfileUrl'))
+ if owner_profile_url:
+ video_uploader_id = self._search_regex(
+ r'(?:user|channel)/([^/]+)', owner_profile_url, 'uploader id',
+ default=None)
+ video_uploader_url = owner_profile_url
channel_id = (
str_or_none(video_details.get('channelId'))
video_webpage, 'channel id', default=None, group='id'))
channel_url = 'http://www.youtube.com/channel/%s' % channel_id if channel_id else None
- # thumbnail image
- # We try first to get a high quality image:
- m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
- video_webpage, re.DOTALL)
- if m_thumb is not None:
- video_thumbnail = m_thumb.group(1)
- elif 'thumbnail_url' not in video_info:
- self._downloader.report_warning('unable to extract video thumbnail')
+ thumbnails = []
+ thumbnails_list = try_get(
+ video_details, lambda x: x['thumbnail']['thumbnails'], list) or []
+ for t in thumbnails_list:
+ if not isinstance(t, dict):
+ continue
+ thumbnail_url = url_or_none(t.get('url'))
+ if not thumbnail_url:
+ continue
+ thumbnails.append({
+ 'url': thumbnail_url,
+ 'width': int_or_none(t.get('width')),
+ 'height': int_or_none(t.get('height')),
+ })
+
+ if not thumbnails:
video_thumbnail = None
- else: # don't panic if we can't find it
- video_thumbnail = compat_urllib_parse_unquote_plus(video_info['thumbnail_url'][0])
+ # We try first to get a high quality image:
+ m_thumb = re.search(r'<span itemprop="thumbnail".*?href="(.*?)">',
+ video_webpage, re.DOTALL)
+ if m_thumb is not None:
+ video_thumbnail = m_thumb.group(1)
+ thumbnail_url = try_get(video_info, lambda x: x['thumbnail_url'][0], compat_str)
+ if thumbnail_url:
+ video_thumbnail = compat_urllib_parse_unquote_plus(thumbnail_url)
+ if video_thumbnail:
+ thumbnails.append({'url': video_thumbnail})
# upload date
upload_date = self._html_search_meta(
[r'(?s)id="eow-date.*?>(.*?)</span>',
r'(?:id="watch-uploader-info".*?>.*?|["\']simpleText["\']\s*:\s*["\'])(?:Published|Uploaded|Streamed live|Started) on (.+?)[<"\']'],
video_webpage, 'upload date', default=None)
+ if not upload_date:
+ upload_date = microformat.get('publishDate') or microformat.get('uploadDate')
upload_date = unified_strdate(upload_date)
video_license = self._html_search_regex(
m_cat_container = self._search_regex(
r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
video_webpage, 'categories', default=None)
+ category = None
if m_cat_container:
category = self._html_search_regex(
r'(?s)<a[^<]+>(.*?)</a>', m_cat_container, 'category',
default=None)
- video_categories = None if category is None else [category]
- else:
- video_categories = None
+ if not category:
+ category = try_get(
+ microformat, lambda x: x['category'], compat_str)
+ video_categories = None if category is None else [category]
video_tags = [
unescapeHTML(m.group('content'))
for m in re.finditer(self._meta_regex('og:video:tag'), video_webpage)]
+ if not video_tags:
+ video_tags = try_get(video_details, lambda x: x['keywords'], list)
def _extract_count(count_name):
return str_to_int(self._search_regex(
or try_get(video_info, lambda x: float_or_none(x['avg_rating'][0])))
# subtitles
- video_subtitles = self.extract_subtitles(video_id, video_webpage)
+ video_subtitles = self.extract_subtitles(
+ video_id, video_webpage, has_live_chat_replay)
automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
video_duration = try_get(
errnote='Unable to download video annotations', fatal=False,
data=urlencode_postdata({xsrf_field_name: xsrf_token}))
- chapters = self._extract_chapters(description_original, video_duration)
+ chapters = self._extract_chapters(video_webpage, description_original, video_id, video_duration)
# Look for the DASH manifest
if self._downloader.params.get('youtube_include_dash_manifest', True):
f['stretched_ratio'] = ratio
if not formats:
- token = extract_token(video_info)
- if not token:
- if 'reason' in video_info:
- if 'The uploader has not made this video available in your country.' in video_info['reason']:
- regions_allowed = self._html_search_meta(
- 'regionsAllowed', video_webpage, default=None)
- countries = regions_allowed.split(',') if regions_allowed else None
- self.raise_geo_restricted(
- msg=video_info['reason'][0], countries=countries)
- reason = video_info['reason'][0]
- if 'Invalid parameters' in reason:
- unavailable_message = extract_unavailable_message()
- if unavailable_message:
- reason = unavailable_message
- raise ExtractorError(
- 'YouTube said: %s' % reason,
- expected=True, video_id=video_id)
- else:
- raise ExtractorError(
- '"token" parameter not in video info for unknown reason',
- video_id=video_id)
-
- if not formats and (video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos'])):
- raise ExtractorError('This video is DRM protected.', expected=True)
+ if 'reason' in video_info:
+ if 'The uploader has not made this video available in your country.' in video_info['reason']:
+ regions_allowed = self._html_search_meta(
+ 'regionsAllowed', video_webpage, default=None)
+ countries = regions_allowed.split(',') if regions_allowed else None
+ self.raise_geo_restricted(
+ msg=video_info['reason'][0], countries=countries)
+ reason = video_info['reason'][0]
+ if 'Invalid parameters' in reason:
+ unavailable_message = extract_unavailable_message()
+ if unavailable_message:
+ reason = unavailable_message
+ raise ExtractorError(
+ 'YouTube said: %s' % reason,
+ expected=True, video_id=video_id)
+ if video_info.get('license_info') or try_get(player_response, lambda x: x['streamingData']['licenseInfos']):
+ raise ExtractorError('This video is DRM protected.', expected=True)
self._sort_formats(formats)
'creator': video_creator or artist,
'title': video_title,
'alt_title': video_alt_title or track,
- 'thumbnail': video_thumbnail,
+ 'thumbnails': thumbnails,
'description': video_description,
'categories': video_categories,
'tags': video_tags,
_VIDEO_RE = _VIDEO_RE_TPL % r'(?P<id>[0-9A-Za-z_-]{11})'
IE_NAME = 'youtube:playlist'
_TESTS = [{
- 'url': 'https://www.youtube.com/playlist?list=PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
+ 'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
'info_dict': {
- 'title': 'ytdl test PL',
- 'id': 'PLwiyx1dc3P2JR9N8gQaQN_BCvlSlap7re',
+ 'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
+ 'uploader': 'Sergey M.',
+ 'id': 'PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc',
+ 'title': 'youtube-dl public playlist',
},
- 'playlist_count': 3,
+ 'playlist_count': 1,
}, {
- 'url': 'https://www.youtube.com/playlist?list=PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx',
+ 'url': 'https://www.youtube.com/playlist?list=PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
'info_dict': {
- 'id': 'PLtPgu7CB4gbZDA7i_euNxn75ISqxwZPYx',
- 'title': 'YDL_Empty_List',
+ 'uploader_id': 'UCmlqkdCBesrv2Lak1mF_MxA',
+ 'uploader': 'Sergey M.',
+ 'id': 'PL4lCao7KL_QFodcLWhDpGCYnngnHtQ-Xf',
+ 'title': 'youtube-dl empty playlist',
},
'playlist_count': 0,
- 'skip': 'This playlist is private',
}, {
'note': 'Playlist with deleted videos (#651). As a bonus, the video #51 is also twice in this list.',
'url': 'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
'uploader': 'Christiaan008',
'uploader_id': 'ChRiStIaAn008',
},
- 'playlist_count': 95,
+ 'playlist_count': 96,
}, {
'note': 'issue #673',
'url': 'PLBB231211A4F62143',
ids = []
last_id = playlist_id[-11:]
for n in itertools.count(1):
- url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
+ url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
webpage = self._download_webpage(
url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
new_ids = orderedSet(re.findall(
class YoutubeUserIE(YoutubeChannelIE):
IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
- _VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
+ _VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9%-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_%-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
IE_NAME = 'youtube:user'
}, {
'url': 'https://www.youtube.com/c/gametrailers',
'only_matching': True,
+ }, {
+ 'url': 'https://www.youtube.com/c/Pawe%C5%82Zadro%C5%BCniak',
+ 'only_matching': True,
}, {
'url': 'https://www.youtube.com/gametrailers',
'only_matching': True,
class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
IE_DESC = 'YouTube.com user/channel playlists'
- _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+)/playlists'
+ _VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel|c)/(?P<id>[^/]+)/playlists'
IE_NAME = 'youtube:playlists'
_TESTS = [{
'title': 'Chem Player',
},
'skip': 'Blocked',
+ }, {
+ 'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
+ 'only_matching': True,
}]
break
more = self._download_json(
- 'https://youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
+ 'https://www.youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
'Downloading page #%s' % page_num,
- transform_source=uppercase_escape)
+ transform_source=uppercase_escape,
+ headers=self._YOUTUBE_CLIENT_HEADERS)
content_html = more['content_html']
more_widget_html = more['load_more_widget_html']
'timestamp': 1359044972,
'upload_date': '20130124',
'view_count': int,
- 'comment_count': int,
},
},
{
class ZDFIE(ZDFBaseIE):
- _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
+ IE_NAME = 'ZDF-3sat'
+ _VALID_URL = r'https?://www\.(?:zdf|3sat)\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
_QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh')
_GEO_COUNTRIES = ['DE']
_TESTS = [{
+ 'url': 'https://www.3sat.de/wissen/wissenschaftsdoku/luxusgut-lebensraum-100.html',
+ 'info_dict': {
+ 'id': 'luxusgut-lebensraum-100',
+ 'ext': 'mp4',
+ 'title': 'Luxusgut Lebensraum',
+ 'description': 'md5:5c09b2f45ac3bc5233d1b50fc543d061',
+ 'duration': 2601,
+ 'timestamp': 1566497700,
+ 'upload_date': '20190822',
+ }
+ }, {
'url': 'https://www.zdf.de/dokumentation/terra-x/die-magie-der-farben-von-koenigspurpur-und-jeansblau-100.html',
'info_dict': {
'id': 'die-magie-der-farben-von-koenigspurpur-und-jeansblau-100',
'id': 'das-aktuelle-sportstudio',
'title': 'das aktuelle sportstudio | ZDF',
},
- 'playlist_count': 21,
+ 'playlist_mincount': 23,
}, {
'url': 'https://www.zdf.de/dokumentation/planet-e',
'info_dict': {
'id': 'planet-e',
'title': 'planet e.',
},
- 'playlist_count': 4,
+ 'playlist_mincount': 50,
}, {
'url': 'https://www.zdf.de/filme/taunuskrimi/',
'only_matching': True,
--- /dev/null
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_HTTPError
+from ..utils import (
+ dict_get,
+ ExtractorError,
+ int_or_none,
+ js_to_json,
+ parse_iso8601,
+)
+
+
+class ZypeIE(InfoExtractor):
+ _ID_RE = r'[\da-fA-F]+'
+ _COMMON_RE = r'//player\.zype\.com/embed/%s\.(?:js|json|html)\?.*?(?:access_token|(?:ap[ip]|player)_key)='
+ _VALID_URL = r'https?:%s[^&]+' % (_COMMON_RE % ('(?P<id>%s)' % _ID_RE))
+ _TEST = {
+ 'url': 'https://player.zype.com/embed/5b400b834b32992a310622b9.js?api_key=jZ9GUhRmxcPvX7M3SlfejB6Hle9jyHTdk2jVxG7wOHPLODgncEKVdPYBhuz9iWXQ&autoplay=false&controls=true&da=false',
+ 'md5': 'eaee31d474c76a955bdaba02a505c595',
+ 'info_dict': {
+ 'id': '5b400b834b32992a310622b9',
+ 'ext': 'mp4',
+ 'title': 'Smoky Barbecue Favorites',
+ 'thumbnail': r're:^https?://.*\.jpe?g',
+ 'description': 'md5:5ff01e76316bd8d46508af26dc86023b',
+ 'timestamp': 1504915200,
+ 'upload_date': '20170909',
+ },
+ }
+
+ @staticmethod
+ def _extract_urls(webpage):
+ return [
+ mobj.group('url')
+ for mobj in re.finditer(
+ r'<script[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?%s.+?)\1' % (ZypeIE._COMMON_RE % ZypeIE._ID_RE),
+ webpage)]
+
+ def _real_extract(self, url):
+ video_id = self._match_id(url)
+
+ try:
+ response = self._download_json(re.sub(
+ r'\.(?:js|html)\?', '.json?', url), video_id)['response']
+ except ExtractorError as e:
+ if isinstance(e.cause, compat_HTTPError) and e.cause.code in (400, 401, 403):
+ raise ExtractorError(self._parse_json(
+ e.cause.read().decode(), video_id)['message'], expected=True)
+ raise
+
+ body = response['body']
+ video = response['video']
+ title = video['title']
+
+ if isinstance(body, dict):
+ formats = []
+ for output in body.get('outputs', []):
+ output_url = output.get('url')
+ if not output_url:
+ continue
+ name = output.get('name')
+ if name == 'm3u8':
+ formats.extend(self._extract_m3u8_formats(
+ output_url, video_id, 'mp4',
+ 'm3u8_native', m3u8_id='hls', fatal=False))
+ else:
+ f = {
+ 'format_id': name,
+ 'tbr': int_or_none(output.get('bitrate')),
+ 'url': output_url,
+ }
+ if name in ('m4a', 'mp3'):
+ f['vcodec'] = 'none'
+ else:
+ f.update({
+ 'height': int_or_none(output.get('height')),
+ 'width': int_or_none(output.get('width')),
+ })
+ formats.append(f)
+ text_tracks = body.get('subtitles') or []
+ else:
+ m3u8_url = self._search_regex(
+ r'(["\'])(?P<url>(?:(?!\1).)+\.m3u8(?:(?!\1).)*)\1',
+ body, 'm3u8 url', group='url')
+ formats = self._extract_m3u8_formats(
+ m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
+ text_tracks = self._search_regex(
+ r'textTracks\s*:\s*(\[[^]]+\])',
+ body, 'text tracks', default=None)
+ if text_tracks:
+ text_tracks = self._parse_json(
+ text_tracks, video_id, js_to_json, False)
+ self._sort_formats(formats)
+
+ subtitles = {}
+ if text_tracks:
+ for text_track in text_tracks:
+ tt_url = dict_get(text_track, ('file', 'src'))
+ if not tt_url:
+ continue
+ subtitles.setdefault(text_track.get('label') or 'English', []).append({
+ 'url': tt_url,
+ })
+
+ thumbnails = []
+ for thumbnail in video.get('thumbnails', []):
+ thumbnail_url = thumbnail.get('url')
+ if not thumbnail_url:
+ continue
+ thumbnails.append({
+ 'url': thumbnail_url,
+ 'width': int_or_none(thumbnail.get('width')),
+ 'height': int_or_none(thumbnail.get('height')),
+ })
+
+ return {
+ 'id': video_id,
+ 'display_id': video.get('friendly_title'),
+ 'title': title,
+ 'thumbnails': thumbnails,
+ 'description': dict_get(video, ('description', 'ott_description', 'short_description')),
+ 'timestamp': parse_iso8601(video.get('published_at')),
+ 'duration': int_or_none(video.get('duration')),
+ 'view_count': int_or_none(video.get('request_count')),
+ 'average_rating': int_or_none(video.get('rating')),
+ 'season_number': int_or_none(video.get('season')),
+ 'episode_number': int_or_none(video.get('episode')),
+ 'formats': formats,
+ 'subtitles': subtitles,
+ }
def _readUserConf():
xdg_config_home = compat_getenv('XDG_CONFIG_HOME')
if xdg_config_home:
- userConfFile = os.path.join(xdg_config_home, 'youtube-dl', 'config')
+ userConfFile = os.path.join(xdg_config_home, 'youtube-dlc', 'config')
if not os.path.isfile(userConfFile):
- userConfFile = os.path.join(xdg_config_home, 'youtube-dl.conf')
+ userConfFile = os.path.join(xdg_config_home, 'youtube-dlc.conf')
else:
- userConfFile = os.path.join(compat_expanduser('~'), '.config', 'youtube-dl', 'config')
+ userConfFile = os.path.join(compat_expanduser('~'), '.config', 'youtube-dlc', 'config')
if not os.path.isfile(userConfFile):
- userConfFile = os.path.join(compat_expanduser('~'), '.config', 'youtube-dl.conf')
+ userConfFile = os.path.join(compat_expanduser('~'), '.config', 'youtube-dlc.conf')
userConf = _readOptions(userConfFile, None)
if userConf is None:
appdata_dir = compat_getenv('appdata')
if appdata_dir:
userConf = _readOptions(
- os.path.join(appdata_dir, 'youtube-dl', 'config'),
+ os.path.join(appdata_dir, 'youtube-dlc', 'config'),
default=None)
if userConf is None:
userConf = _readOptions(
- os.path.join(appdata_dir, 'youtube-dl', 'config.txt'),
+ os.path.join(appdata_dir, 'youtube-dlc', 'config.txt'),
default=None)
if userConf is None:
userConf = _readOptions(
- os.path.join(compat_expanduser('~'), 'youtube-dl.conf'),
+ os.path.join(compat_expanduser('~'), 'youtube-dlc.conf'),
default=None)
if userConf is None:
userConf = _readOptions(
- os.path.join(compat_expanduser('~'), 'youtube-dl.conf.txt'),
+ os.path.join(compat_expanduser('~'), 'youtube-dlc.conf.txt'),
default=None)
if userConf is None:
action='help',
help='Print this help text and exit')
general.add_option(
- '-v', '--version',
+ '--version',
action='version',
help='Print program version and exit')
general.add_option(
general.add_option(
'--default-search',
dest='default_search', metavar='PREFIX',
- help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dl "large apple". Use the value "auto" to let youtube-dl guess ("auto_warning" to emit a warning when guessing). "error" just throws an error. The default value "fixup_error" repairs broken URLs, but emits an error if this is not possible instead of searching.')
+ help='Use this prefix for unqualified URLs. For example "gvsearch2:" downloads two videos from google videos for youtube-dlc "large apple". Use the value "auto" to let youtube-dlc guess ("auto_warning" to emit a warning when guessing). "error" just throws an error. The default value "fixup_error" repairs broken URLs, but emits an error if this is not possible instead of searching.')
general.add_option(
'--ignore-config',
action='store_true',
help='Do not read configuration files. '
- 'When given in the global configuration file /etc/youtube-dl.conf: '
- 'Do not read the user configuration in ~/.config/youtube-dl/config '
- '(%APPDATA%/youtube-dl/config.txt on Windows)')
+ 'When given in the global configuration file /etc/youtube-dlc.conf: '
+ 'Do not read the user configuration in ~/.config/youtube-dlc/config '
+ '(%APPDATA%/youtube-dlc/config.txt on Windows)')
general.add_option(
'--config-location',
dest='config_location', metavar='PATH',
authentication.add_option(
'-p', '--password',
dest='password', metavar='PASSWORD',
- help='Account password. If this option is left out, youtube-dl will ask interactively.')
+ help='Account password. If this option is left out, youtube-dlc will ask interactively.')
authentication.add_option(
'-2', '--twofactor',
dest='twofactor', metavar='TWOFACTOR',
adobe_pass.add_option(
'--ap-password',
dest='ap_password', metavar='PASSWORD',
- help='Multiple-system operator account password. If this option is left out, youtube-dl will ask interactively.')
+ help='Multiple-system operator account password. If this option is left out, youtube-dlc will ask interactively.')
adobe_pass.add_option(
'--ap-list-mso',
action='store_true', dest='ap_list_mso', default=False,
verbosity.add_option(
'-C', '--call-home',
dest='call_home', action='store_true', default=False,
- help='Contact the youtube-dl server for debugging')
+ help='Contact the youtube-dlc server for debugging')
verbosity.add_option(
'--no-call-home',
dest='call_home', action='store_false', default=False,
- help='Do NOT contact the youtube-dl server for debugging')
+ help='Do NOT contact the youtube-dlc server for debugging')
filesystem = optparse.OptionGroup(parser, 'Filesystem Options')
filesystem.add_option(
filesystem.add_option(
'-c', '--continue',
action='store_true', dest='continue_dl', default=True,
- help='Force resume of partially downloaded files. By default, youtube-dl will resume downloads if possible.')
+ help='Force resume of partially downloaded files. By default, youtube-dlc will resume downloads if possible.')
filesystem.add_option(
'--no-continue',
action='store_false', dest='continue_dl',
help='File to read cookies from and dump cookie jar in')
filesystem.add_option(
'--cache-dir', dest='cachedir', default=None, metavar='DIR',
- help='Location in the filesystem where youtube-dl can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dl or ~/.cache/youtube-dl . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.')
+ help='Location in the filesystem where youtube-dlc can store some downloaded information permanently. By default $XDG_CACHE_HOME/youtube-dlc or ~/.cache/youtube-dlc . At the moment, only YouTube player files (for videos with obfuscated signatures) are cached, but that may change.')
filesystem.add_option(
'--no-cache-dir', action='store_const', const=False, dest='cachedir',
help='Disable filesystem caching')
postproc.add_option(
'--exec',
metavar='CMD', dest='exec_cmd',
- help='Execute a command on the file after downloading, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
+ help='Execute a command on the file after downloading and post-processing, similar to find\'s -exec syntax. Example: --exec \'adb push {} /sdcard/Music/ && rm {}\'')
postproc.add_option(
'--convert-subs', '--convert-subtitles',
metavar='FORMAT', dest='convertsubtitles', default=None,
if '--config-location' in command_line_conf:
location = compat_expanduser(opts.config_location)
if os.path.isdir(location):
- location = os.path.join(location, 'youtube-dl.conf')
+ location = os.path.join(location, 'youtube-dlc.conf')
if not os.path.exists(location):
parser.error('config-location %s does not exist.' % location)
custom_conf = _readOptions(location)
elif '--ignore-config' in command_line_conf:
pass
else:
- system_conf = _readOptions('/etc/youtube-dl.conf')
+ system_conf = _readOptions('/etc/youtube-dlc.conf')
if '--ignore-config' not in system_conf:
user_conf = _readUserConf()
'Skipping embedding the thumbnail because the file is missing.')
return [], info
+ # Check for mislabeled webp file
+ with open(encodeFilename(thumbnail_filename), "rb") as f:
+ b = f.read(16)
+ if b'\x57\x45\x42\x50' in b: # ASCII bytes of "WEBP"
+ [thumbnail_filename_path, thumbnail_filename_extension] = os.path.splitext(thumbnail_filename)
+ if thumbnail_filename_extension != ".webp":
+ webp_thumbnail_filename = thumbnail_filename_path + ".webp"
+ os.rename(encodeFilename(thumbnail_filename), encodeFilename(webp_thumbnail_filename))
+ thumbnail_filename = webp_thumbnail_filename
+
+ # If not a jpg or png thumbnail, convert it to jpg using ffmpeg
+ if os.path.splitext(thumbnail_filename)[1].lower() not in ['.jpg', '.png']:
+ jpg_thumbnail_filename = os.path.splitext(thumbnail_filename)[0] + ".jpg"
+ jpg_thumbnail_filename = os.path.join(os.path.dirname(jpg_thumbnail_filename), os.path.basename(jpg_thumbnail_filename).replace('%', '_')) # ffmpeg interprets % as image sequence
+
+ self._downloader.to_screen('[ffmpeg] Converting thumbnail "%s" to JPEG' % thumbnail_filename)
+
+ self.run_ffmpeg(thumbnail_filename, jpg_thumbnail_filename, ['-bsf:v', 'mjpeg2jpeg'])
+
+ os.remove(encodeFilename(thumbnail_filename))
+ thumbnail_filename = jpg_thumbnail_filename
+
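The mislabeled-WebP check above looks for the bytes `WEBP` anywhere in the first 16 bytes of the file. A slightly stricter variant, sketched below, checks the RIFF container layout explicitly. This is only an illustration: `looks_like_webp` and `corrected_name` are hypothetical helper names that do not appear in the patch.

```python
import os
import tempfile


def looks_like_webp(path):
    """Return True if the file starts with a RIFF header whose form type is WEBP."""
    with open(path, 'rb') as f:
        header = f.read(12)
    # RIFF container: bytes 0-3 are 'RIFF', bytes 4-7 hold the chunk size,
    # bytes 8-11 hold the form type ('WEBP' for WebP images).
    return len(header) == 12 and header[:4] == b'RIFF' and header[8:12] == b'WEBP'


def corrected_name(path):
    """Return the name with a .webp extension if the content is WebP, else unchanged."""
    root, ext = os.path.splitext(path)
    if looks_like_webp(path) and ext.lower() != '.webp':
        return root + '.webp'
    return path
```

The helper only computes the corrected name; renaming the file on disk (as the patch does with `os.rename`) is left to the caller.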
if info['ext'] == 'mp3':
options = [
'-c', 'copy', '-map', '0', '-map', '1',
os.remove(encodeFilename(filename))
os.rename(encodeFilename(temp_filename), encodeFilename(filename))
+ elif info['ext'] == 'mkv':
+ os.rename(encodeFilename(thumbnail_filename), encodeFilename('cover.jpg'))
+ old_thumbnail_filename = thumbnail_filename
+ thumbnail_filename = 'cover.jpg'
+
+ options = [
+ '-c', 'copy', '-attach', thumbnail_filename, '-metadata:s:t', 'mimetype=image/jpeg']
+
+ self._downloader.to_screen('[ffmpeg] Adding thumbnail to "%s"' % filename)
+
+ self.run_ffmpeg_multiple_files([filename], temp_filename, options)
+
+ if not self._already_have_thumbnail:
+ os.remove(encodeFilename(thumbnail_filename))
+ else:
+ os.rename(encodeFilename(thumbnail_filename), encodeFilename(old_thumbnail_filename))
+ os.remove(encodeFilename(filename))
+ os.rename(encodeFilename(temp_filename), encodeFilename(filename))
+
elif info['ext'] in ['m4a', 'mp4']:
if not check_executable('AtomicParsley', ['-v']):
raise EmbedThumbnailPPError('AtomicParsley was not found. Please install.')
metadata[meta_f] = info[info_f]
break
+ # See [1-4] for information on media metadata and the metadata keys
+ # supported by ffmpeg.
+ # 1. https://kdenlive.org/en/project/adding-meta-data-to-mp4-video/
+ # 2. https://wiki.multimedia.cx/index.php/FFmpeg_Metadata
+ # 3. https://kodi.wiki/view/Video_file_tagging
+ # 4. http://atomicparsley.sourceforge.net/mpeg-4files.html
+
add('title', ('track', 'title'))
add('date', 'upload_date')
add(('description', 'comment'), 'description')
add('album')
add('album_artist')
add('disc', 'disc_number')
+ add('show', 'series')
+ add('season_number')
+ add('episode_id', ('episode', 'episode_id'))
+ add('episode_sort', 'episode_number')
if not metadata:
self._downloader.to_screen('[ffmpeg] There isn\'t any metadata to add')
if is_outdated_version(
self._versions[self.basename], required_version):
warning = ('Your copy of %s is outdated and unable to properly mux separate video and audio files, '
- 'youtube-dl will download single file media. '
+ 'youtube-dlc will download single file media. '
'Update %s to version %s or newer to fix this.') % (
self.basename, self.basename, required_version)
if self._downloader:
import sys
from zipimport import zipimporter
+from .compat import compat_realpath
from .utils import encode_compat_str
from .version import __version__
UPDATES_RSA_KEY = (0x9d60ee4d8f805312fdb15a62f87b95bd66177b91df176765d13514a0f1754bcd2057295c5b6f1d35daa6742c3ffc9a82d3e118861c207995a8031e151d863c9927e304576bc80692bc8e094896fcf11b66f3e29e04e3a71e9a11558558acea1840aec37fc396fb6b65dc81a1c4144e03bd1c011de62e3f1357b327d08426fe93, 65537)
if not isinstance(globals().get('__loader__'), zipimporter) and not hasattr(sys, 'frozen'):
- to_screen('It looks like you installed youtube-dl with a package manager, pip, setup.py or a tarball. Please use that to update.')
+ to_screen('It looks like you installed youtube-dlc with a package manager, pip, setup.py or a tarball. Please use that to update.')
return
# Check if there is a new version
to_screen('ERROR: can\'t find the current version. Please try again later.')
return
if newversion == __version__:
- to_screen('youtube-dl is up-to-date (' + __version__ + ')')
+ to_screen('youtube-dlc is up-to-date (' + __version__ + ')')
return
# Download and check versions info
def version_tuple(version_str):
return tuple(map(int, version_str.split('.')))
if version_tuple(__version__) >= version_tuple(version_id):
- to_screen('youtube-dl is up to date (%s)' % __version__)
+ to_screen('youtube-dlc is up to date (%s)' % __version__)
return
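The `version_tuple` comparison above matters because date-style version strings do not always sort correctly as plain strings. A minimal sketch of the idea follows; `needs_update` is a hypothetical wrapper name, not part of the patch.

```python
def version_tuple(version_str):
    """Convert '2020.09.06' into (2020, 9, 6) so versions compare numerically."""
    return tuple(map(int, version_str.split('.')))


def needs_update(current, latest):
    """Return True when `latest` is strictly newer than `current`."""
    return version_tuple(current) < version_tuple(latest)
```

For instance, `needs_update('2020.9.6', '2020.10.1')` is true, whereas comparing the raw strings would order them the other way around because `'9' > '1'`.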
to_screen('Updating to version ' + version_id + ' ...')
print_notes(to_screen, versions_info['versions'])
# sys.executable is set to the full pathname of the exe-file for py2exe
- filename = sys.executable if hasattr(sys, 'frozen') else sys.argv[0]
+ # but symlinks are not followed, so we need to resolve them manually
+ # with the help of realpath
+ filename = compat_realpath(sys.executable if hasattr(sys, 'frozen') else sys.argv[0])
if not os.access(filename, os.W_OK):
to_screen('ERROR: no write permissions on %s' % filename)
return
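The reason for resolving the path first is that the write-permission check and the later overwrite should target the real file rather than a symlink pointing at it. A rough sketch, assuming plain `os.path.realpath` in place of the patch's `compat_realpath` helper, with hypothetical function names:

```python
import os
import tempfile


def resolve_update_target(path):
    """Return the real file behind `path`, following any symlinks."""
    return os.path.realpath(path)


def can_overwrite(path):
    """Check write permission on the resolved file, not on the symlink itself."""
    return os.access(resolve_update_target(path), os.W_OK)
```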
try:
- bat = os.path.join(directory, 'youtube-dl-updater.bat')
+ bat = os.path.join(directory, 'youtube-dlc-updater.bat')
with io.open(bat, 'w') as batfile:
batfile.write('''
@echo off
echo Waiting for file handle to be closed ...
ping 127.0.0.1 -n 5 -w 1000 > NUL
move /Y "%s.new" "%s" > NUL
-echo Updated youtube-dl to version %s.
+echo Updated youtube-dlc to version %s.
start /b "" cmd /c del "%%~f0"&exit /b"
\n''' % (exe, exe, version_id))
to_screen('ERROR: unable to overwrite current version')
return
- to_screen('Updated youtube-dl. Restart youtube-dl to use the new version.')
+ to_screen('Updated youtube-dlc. Restart youtube-dlc to use the new version.')
def get_notes(versions, fromVersion):
import binascii
import calendar
import codecs
+import collections
import contextlib
import ctypes
import datetime
import subprocess
import sys
import tempfile
+import time
import traceback
import xml.etree.ElementTree
import zlib
os.unlink(fn)
except OSError:
pass
+ try:
+ mask = os.umask(0)
+ os.umask(mask)
+ os.chmod(tf.name, 0o666 & ~mask)
+ except OSError:
+ pass
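Temporary files are typically created with restrictive permissions (often `0600`), so the added block above resets the mode to what a normally created file would get under the current umask. The same trick in isolation might look like this; `apply_default_permissions` is a hypothetical name used only for illustration.

```python
import os
import stat
import tempfile


def apply_default_permissions(path):
    """Set `path` to mode 0o666 & ~umask and return the mode that was applied."""
    # os.umask() sets a new mask and returns the previous one, so the only
    # way to read the mask is to set a dummy value and restore it right away.
    mask = os.umask(0)
    os.umask(mask)
    mode = 0o666 & ~mask
    os.chmod(path, mode)
    return mode
```

Note that the process umask is briefly 0 between the two `os.umask` calls, so this approach is not safe if other threads create files at the same moment.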
os.rename(tf.name, fn)
except Exception:
try:
def bug_reports_message():
if ytdl_is_updateable():
- update_cmd = 'type youtube-dl -U to update'
+ update_cmd = 'type youtube-dlc -U to update'
else:
update_cmd = 'see https://yt-dl.org/update on how to update'
msg = '; please report this issue on https://yt-dl.org/bug .'
msg += ' Make sure you are using the latest version; %s.' % update_cmd
- msg += ' Be sure to call youtube-dl with the --verbose flag and include its complete output.'
+ msg += ' Be sure to call youtube-dlc with the --verbose flag and include its complete output.'
return msg
def __init__(self, msg, tb=None, expected=False, cause=None, video_id=None):
""" tb, if given, is the original traceback (so that it can be printed out).
- If expected is set, this is a normal error message and most likely not a bug in youtube-dl.
+ If expected is set, this is a normal error message and most likely not a bug in youtube-dlc.
"""
if sys.exc_info()[0] in (compat_urllib_error.URLError, socket.timeout, UnavailableVideoError):
class YoutubeDLCookieJar(compat_cookiejar.MozillaCookieJar):
+ """
+ See [1] for cookie file format.
+
+ 1. https://curl.haxx.se/docs/http-cookies.html
+ """
_HTTPONLY_PREFIX = '#HttpOnly_'
+ _ENTRY_LEN = 7
+ _HEADER = '''# Netscape HTTP Cookie File
+# This file is generated by youtube-dlc. Do not edit.
+
+'''
+ _CookieFileEntry = collections.namedtuple(
+ 'CookieFileEntry',
+ ('domain_name', 'include_subdomains', 'path', 'https_only', 'expires_at', 'name', 'value'))
def save(self, filename=None, ignore_discard=False, ignore_expires=False):
+ """
+ Save cookies to a file.
+
+ Most of the code is taken from CPython 3.8 and slightly adapted
+ to support UTF-8 cookie files in both Python 2 and 3.
+ """
+ if filename is None:
+ if self.filename is not None:
+ filename = self.filename
+ else:
+ raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
+
# Store session cookies with `expires` set to 0 instead of an empty
# string
for cookie in self:
if cookie.expires is None:
cookie.expires = 0
- compat_cookiejar.MozillaCookieJar.save(self, filename, ignore_discard, ignore_expires)
+
+ with io.open(filename, 'w', encoding='utf-8') as f:
+ f.write(self._HEADER)
+ now = time.time()
+ for cookie in self:
+ if not ignore_discard and cookie.discard:
+ continue
+ if not ignore_expires and cookie.is_expired(now):
+ continue
+ if cookie.secure:
+ secure = 'TRUE'
+ else:
+ secure = 'FALSE'
+ if cookie.domain.startswith('.'):
+ initial_dot = 'TRUE'
+ else:
+ initial_dot = 'FALSE'
+ if cookie.expires is not None:
+ expires = compat_str(cookie.expires)
+ else:
+ expires = ''
+ if cookie.value is None:
+ # cookies.txt regards 'Set-Cookie: foo' as a cookie
+ # with no name, whereas http.cookiejar regards it as a
+ # cookie with no value.
+ name = ''
+ value = cookie.name
+ else:
+ name = cookie.name
+ value = cookie.value
+ f.write(
+ '\t'.join([cookie.domain, initial_dot, cookie.path,
+ secure, expires, name, value]) + '\n')
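Each entry written by the save logic above is a single line of seven tab-separated fields, in the order given by `_CookieFileEntry`. The sketch below round-trips one such line outside the cookie jar class. `format_entry` and `parse_entry` are hypothetical helpers, and they raise `ValueError` where the patch uses `compat_cookiejar.LoadError`.

```python
import collections

# Field order mirrors the namedtuple defined in the patch.
CookieFileEntry = collections.namedtuple(
    'CookieFileEntry',
    ('domain_name', 'include_subdomains', 'path', 'https_only',
     'expires_at', 'name', 'value'))


def format_entry(domain, path, secure, expires, name, value):
    """Serialize one cookie as a tab-separated cookies.txt line."""
    return '\t'.join([
        domain,
        'TRUE' if domain.startswith('.') else 'FALSE',
        path,
        'TRUE' if secure else 'FALSE',
        str(expires) if expires is not None else '',
        name,
        value,
    ]) + '\n'


def parse_entry(line):
    """Parse one cookies.txt line; raise ValueError on malformed input."""
    fields = line.rstrip('\n').split('\t')
    if len(fields) != 7:
        raise ValueError('invalid length %d' % len(fields))
    entry = CookieFileEntry(*fields)
    if entry.expires_at and not entry.expires_at.isdigit():
        raise ValueError('invalid expires at %s' % entry.expires_at)
    return entry
```

Unlike the patch, this sketch strips the trailing newline before splitting, so the `value` field comes back without it.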
def load(self, filename=None, ignore_discard=False, ignore_expires=False):
"""Load cookies from a file."""
else:
raise ValueError(compat_cookiejar.MISSING_FILENAME_TEXT)
+ def prepare_line(line):
+ if line.startswith(self._HTTPONLY_PREFIX):
+ line = line[len(self._HTTPONLY_PREFIX):]
+ # comments and empty lines are fine
+ if line.startswith('#') or not line.strip():
+ return line
+ cookie_list = line.split('\t')
+ if len(cookie_list) != self._ENTRY_LEN:
+ raise compat_cookiejar.LoadError('invalid length %d' % len(cookie_list))
+ cookie = self._CookieFileEntry(*cookie_list)
+ if cookie.expires_at and not cookie.expires_at.isdigit():
+ raise compat_cookiejar.LoadError('invalid expires at %s' % cookie.expires_at)
+ return line
+
cf = io.StringIO()
- with open(filename) as f:
+ with io.open(filename, encoding='utf-8') as f:
for line in f:
- if line.startswith(self._HTTPONLY_PREFIX):
- line = line[len(self._HTTPONLY_PREFIX):]
- cf.write(compat_str(line))
+ try:
+ cf.write(prepare_line(line))
+ except compat_cookiejar.LoadError as e:
+ write_string(
+ 'WARNING: skipping cookie file entry due to %s: %r\n'
+ % (e, line), sys.stderr)
+ continue
cf.seek(0)
self._really_load(cf, filename, ignore_discard, ignore_expires)
# Session cookies are denoted by either `expires` field set to
https_response = http_response
+class YoutubeDLRedirectHandler(compat_urllib_request.HTTPRedirectHandler):
+ if sys.version_info[0] < 3:
+ def redirect_request(self, req, fp, code, msg, headers, newurl):
+ # On Python 2, urlh.geturl() may sometimes return the redirect URL
+ # as a byte string instead of unicode. This workaround forces it
+ # to always return unicode.
+ return compat_urllib_request.HTTPRedirectHandler.redirect_request(self, req, fp, code, msg, headers, compat_str(newurl))
+
+
def extract_timezone(date_str):
m = re.search(
r'^.{8,}?(?P<tz>Z$| ?(?P<sign>\+|-)(?P<hours>[0-9]{2}):?(?P<minutes>[0-9]{2})$)',
or False if the executable is not present """
try:
# STDIN should be redirected too. On UNIX-like systems, ffmpeg triggers
- # SIGTTOU if youtube-dl is run in the background.
+ # SIGTTOU if youtube-dlc is run in the background.
# See https://github.com/ytdl-org/youtube-dl/issues/955#issuecomment-209789656
out, _ = subprocess.Popen(
[encodeArgument(exe)] + args,
def ytdl_is_updateable():
- """ Returns if youtube-dl can be updated with -U """
+ """ Returns if youtube-dlc can be updated with -U """
from zipimport import zipimporter
return isinstance(globals().get('__loader__'), zipimporter) or hasattr(sys, 'frozen')
# Per RFC 3003, audio/mpeg can be .mp1, .mp2 or .mp3. Here use .mp3 as
# it's the most popular one
'audio/mpeg': 'mp3',
+ 'audio/x-wav': 'wav',
}.get(mt)
if ext is not None:
return ext
'vnd.ms-sstr+xml': 'ism',
'quicktime': 'mov',
'mp2t': 'ts',
+ 'x-wav': 'wav',
}.get(res, res)
return None # No Proxy
if compat_urlparse.urlparse(proxy).scheme.lower() in ('socks', 'socks4', 'socks4a', 'socks5'):
req.add_header('Ytdl-socks-proxy', proxy)
- # youtube-dl's http/https handlers do wrapping the socket with socks
+ # youtube-dlc's http/https handlers wrap the socket with SOCKS themselves
return None
return compat_urllib_request.ProxyHandler.proxy_open(
self, req, proxy, type)
# TODO: fallback to CLI tools
raise XAttrUnavailableError(
'python-pyxattr is detected but is too old. '
- 'youtube-dl requires %s or above while your version is %s. '
+ 'youtube-dlc requires %s or above while your version is %s. '
'Falling back to other xattr implementations' % (
pyxattr_required_version, xattr.__version__))
from __future__ import unicode_literals
-__version__ = '2019.11.28'
+__version__ = '2020.09.06'