Merge branch 'fstirlitz-filmon'

author Remita Amine <redacted>

Fri, 3 Feb 2017 09:15:52 +0000 (10:15 +0100)

committer Remita Amine <redacted>

Fri, 3 Feb 2017 09:15:52 +0000 (10:15 +0100)
diff --git a/.github/ISSUE_TEMPLATE.md b/.github/ISSUE_TEMPLATE.md
index bfae97dddee163cfca367638be2a62a6c4236707..8914569b64d4514d3a80a357e073f9bfddfcdcdf 100644
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@ ## Please follow the guide below
  
  ---
  
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.11.08.1*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.11.08.1**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2017.02.01*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2017.02.01**
  
  ### Before submitting an *issue* make sure you have:
  - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ ### If the purpose of this *issue* is a *bug report*, *site support request* or
  [debug] User config: []
  [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
  [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.11.08.1
+[debug] youtube-dl version 2017.02.01
  [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
  [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
  [debug] Proxy map: {}
@@ -50,6 +50,8 @@ ### If the purpose of this *issue* is a *site support request* please provide al
  - Single video: https://youtu.be/BaW_jenozKc
  - Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
  
+Note that **youtube-dl does not support sites dedicated to [copyright infringement](https://github.com/rg3/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
+
  ---
  
  ### Description of your *issue*, suggested solution and other information
diff --git a/.github/ISSUE_TEMPLATE_tmpl.md b/.github/ISSUE_TEMPLATE_tmpl.md
index ab9968129f33790aaf6471f0f41f6b21164fe0a7..df79503d3ec8fe02e76b6f2c529a60959037934e 100644
--- a/.github/ISSUE_TEMPLATE_tmpl.md
+++ b/.github/ISSUE_TEMPLATE_tmpl.md
@@ -50,6 +50,8 @@ ### If the purpose of this *issue* is a *site support request* please provide al
  - Single video: https://youtu.be/BaW_jenozKc
  - Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
  
+Note that **youtube-dl does not support sites dedicated to [copyright infringement](https://github.com/rg3/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
+
  ---
  
  ### Description of your *issue*, suggested solution and other information
diff --git a/.gitignore b/.gitignore
index 354505d66d61d6416b3700da1587ea4d4970fbc3..9ce4b5e2d5d78771b718313ff942afe77db92f02 100644
--- a/.gitignore
+++ b/.gitignore
@@ -31,6 +31,9 @@ updates_key.pem
  *.mp3
  *.3gp
  *.wav
+*.ape
+*.mkv
+*.swf
  *.part
  *.swp
  test/testdata
diff --git a/AUTHORS b/AUTHORS
index 4a6f7e13f45fd72ae3da87c475fadf892d2f7a4f..f2875d5049cf065b231a8676d6e6a928d431f6c5 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -190,3 +190,14 @@ John Hawkinson
  Rich Leeper
  Zhong Jianxin
  Thor77
+Mattias Wadman
+Arjan Verwer
+Costy Petrisor
+Logan B
+Alex Seiler
+Vijay Singh
+Paul Hartmann
+Stephen Chen
+Fabian Stahl
+Bagira
+Odd Stråbø
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 0b5a5c1f81b791ac7d37361845e400617c9e084e..d606eab0edb0af07bf112f1e7f05e2f7668d5b8a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -58,7 +58,7 @@ ###  Does the issue involve one problem, and one problem only?
  
  Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
  
-In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
+In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, White house podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
  
  ###  Is anyone going to need the feature?
  
@@ -92,9 +92,9 @@ # DEVELOPER INSTRUCTIONS
  
  ### Adding support for a new site
  
-If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
+If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
  
-After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
+After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
  
  1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
  2. Check out the source code with:
@@ -124,7 +124,7 @@ ### Adding support for a new site
                  'id': '42',
                  'ext': 'mp4',
                  'title': 'Video title goes here',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  # TODO more properties, either as:
                  # * A value
                  # * MD5 checksum; start the string with md5:
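The `r` prefix added to the `thumbnail` expectation in the hunk above matters because `\.` is not a recognized escape sequence in an ordinary Python string literal, and Python 3.6 and later warn about it (the ChangeLog entry for 2017.01.08 in this same commit refers to these "invalid escape sequence" errors). The sketch below shows how a `re:`-prefixed expectation could be checked. `matches_expected` is a hypothetical helper written for illustration, not the project's own test code.

```python
import re


def matches_expected(expected, actual):
    """Hypothetical helper: compare a test expectation with an extracted value.

    A string starting with 're:' is treated as a regular expression that must
    match the actual value; anything else is compared for equality.
    """
    if isinstance(expected, str) and expected.startswith('re:'):
        pattern = expected[len('re:'):]
        return re.match(pattern, actual) is not None
    return expected == actual


# The raw-string prefix keeps the backslash literal, so the regex engine
# receives '\.' (a literal dot) without Python warning about the escape.
THUMBNAIL = r're:^https?://.*\.jpg$'

print(matches_expected(THUMBNAIL, 'https://example.com/thumb.jpg'))
print(matches_expected(THUMBNAIL, 'https://example.com/thumb.png'))
```

The URLs are placeholders chosen for the example.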
@@ -199,7 +199,7 @@ #### Example
  }
  ```
  
-Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
+Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
  
  ```python
  description = meta.get('summary')  # correct
diff --git a/ChangeLog b/ChangeLog
index d97156e20a787c8cba8ef984a7f088cf1d6eeee9..487ed3f0f56d0433a91376821902f98d9ba7bd73 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,6 +1,397 @@
  version <unreleased>
  
  Extractors
++ [infoq] Add audio only formats (#11565)
+* [youtube] Fix ytsearch when cookies are provided (#11924)
++ [bilibili] Support new Bangumi URLs (#11845)
+
+version 2017.02.01
+
+Extractors
++ [facebook] Add another fallback extraction scenario (#11926)
+* [prosiebensat1] Fix extraction of descriptions (#11810, #11929)
+- [crunchyroll] Remove ScaledBorderAndShadow settings (#9028)
++ [vimeo] Extract upload timestamp
++ [vimeo] Extract license (#8726, #11880)
++ [nrk:series] Add support for series (#11571, #11711)
++ [elpais] Fix extraction for some URLs (#11765)
+
+
+version 2017.01.31
+
+Core
++ [compat] Add compat_etree_register_namespace
+
+Extractors
+* [youtube] Fix extraction for domainless player URLs (#11890, #11891, #11892,
+  #11894, #11895, #11897, #11900, #11903, #11904, #11906, #11907, #11909,
+  #11913, #11914, #11915, #11916, #11917, #11918, #11919)
++ [vimeo] Extract both mixed and separated DASH formats
++ [ruutu] Extract DASH formats
+* [itv] Fix extraction for python 2.6
+
+
+version 2017.01.29
+
+Core
+* [extractor/common] Fix initialization template (#11605, #11825)
++ [extractor/common] Document fragment_base_url and fragment's path fields
+* [extractor/common] Fix duration per DASH segment (#11868)
++ Introduce --autonumber-start option for initial value of %(autonumber)s
+  template (#727, #2702, #9362, #10457, #10529, #11862)
+
+Extractors
++ [azmedien:playlist] Add support for topic and themen playlists (#11817)
+* [npo] Fix subtitles extraction
++ [itv] Extract subtitles
++ [itv] Add support for itv.com (#9240)
++ [mtv81] Add support for mtv81.com (#7619)
++ [vlive] Add support for channels (#11826)
++ [kaltura] Add fallback for fileExt
++ [kaltura] Improve uploader_id extraction
++ [konserthusetplay] Add support for rspoplay.se (#11828)
+
+
+version 2017.01.28
+
+Core
+* [utils] Improve parse_duration
+
+Extractors
+* [crunchyroll] Improve series and season metadata extraction (#11832)
+* [soundcloud] Improve formats extraction and extract audio bitrate
++ [soundcloud] Extract HLS formats
+* [soundcloud] Fix track URL extraction (#11852)
++ [twitch:vod] Expand URL regular expressions (#11846)
+* [aenetworks] Fix season episodes extraction (#11669)
++ [tva] Add support for videos.tva.ca (#11842)
+* [jamendo] Improve and extract more metadata (#11836)
++ [disney] Add support for Disney sites (#7409, #11801, #4975, #11000)
+* [vevo] Remove request to old API and catch API v2 errors
++ [cmt,mtv,southpark] Add support for episode URLs (#11837)
++ [youtube] Add fallback for duration extraction (#11841)
+
+
+version 2017.01.25
+
+Extractors
++ [openload] Fallback video extension to mp4
++ [extractor/generic] Add support for Openload embeds (#11536, #11812)
+* [srgssr] Fix rts video extraction (#11831)
++ [afreecatv:global] Add support for afreeca.tv (#11807)
++ [crackle] Extract vtt subtitles
++ [crackle] Extract multiple resolutions for thumbnails
++ [crackle] Add support for mobile URLs
++ [konserthusetplay] Extract subtitles (#11823)
++ [konserthusetplay] Add support for HLS videos (#11823)
+* [vimeo:review] Fix config URL extraction (#11821)
+
+
+version 2017.01.24
+
+Extractors
+* [pluralsight] Fix extraction (#11820)
++ [nextmedia] Add support for NextTV (壹電視)
+* [24video] Fix extraction (#11811)
+* [youtube:playlist] Fix nonexistent and private playlist detection (#11604)
++ [chirbit] Extract uploader (#11809)
+
+
+version 2017.01.22
+
+Extractors
++ [pornflip] Add support for pornflip.com (#11556, #11795)
+* [chaturbate] Fix extraction (#11797, #11802)
++ [azmedien] Add support for AZ Medien sites (#11784, #11785)
++ [nextmedia] Support redirected URLs
++ [vimeo:channel] Extract videos' titles for playlist entries (#11796)
++ [youtube] Extract episode metadata (#9695, #11774)
++ [cspan] Support Ustream embedded videos (#11547)
++ [1tv] Add support for HLS videos (#11786)
+* [uol] Fix extraction (#11770)
+* [mtv] Relax triforce feed regular expression (#11766)
+
+
+version 2017.01.18
+
+Extractors
+* [bilibili] Fix extraction (#11077)
++ [canalplus] Add fallback for video id (#11764)
+* [20min] Fix extraction (#11683, #11751)
+* [imdb] Extend URL regular expression (#11744)
++ [naver] Add support for tv.naver.com links (#11743)
+
+
+version 2017.01.16
+
+Core
+* [options] Apply custom config to final composite configuration (#11741)
+* [YoutubeDL] Improve protocol auto determining (#11720)
+
+Extractors
+* [xiami] Relax URL regular expressions
+* [xiami] Improve track metadata extraction (#11699)
++ [limelight] Check hand-make direct HTTP links
++ [limelight] Add support for direct HTTP links at video.llnw.net (#11737)
++ [brightcove] Recognize another player ID pattern (#11688)
++ [niconico] Support login via cookies (#7968)
+* [yourupload] Fix extraction (#11601)
++ [beam:live] Add support for beam.pro live streams (#10702, #11596)
+* [vevo] Improve geo restriction detection
++ [dramafever] Add support for URLs with language code (#11714)
+* [cbc] Improve playlist support (#11704)
+
+
+version 2017.01.14
+
+Core
++ [common] Add ability to customize akamai manifest host
++ [utils] Add more date formats
+
+Extractors
+- [mtv] Eliminate _transform_rtmp_url
+* [mtv] Generalize triforce mgid extraction
++ [cmt] Add support for full episodes and video clips (#11623)
++ [mitele] Extract DASH formats
++ [ooyala] Add support for videos with embedToken (#11684)
+* [mixcloud] Fix extraction (#11674)
+* [openload] Fix extraction (#10408)
+* [tv4] Improve extraction (#11698)
+* [freesound] Fix and improve extraction (#11602)
++ [nick] Add support for beta.nick.com (#11655)
+* [mtv,cc] Use HLS by default with native HLS downloader (#11641)
+* [mtv] Fix non-HLS extraction
+
+
+version 2017.01.10
+
+Extractors
+* [youtube] Fix extraction (#11663, #11664)
++ [inc] Add support for inc.com (#11277, #11647)
++ [youtube] Add itag 212 (#11575)
++ [egghead:course] Add support for egghead.io courses
+
+
+version 2017.01.08
+
+Core
+* Fix "invalid escape sequence" errors under Python 3.6 (#11581)
+
+Extractors
++ [hitrecord] Add support for hitrecord.org (#10867, #11626)
+- [videott] Remove extractor
+* [swrmediathek] Improve extraction
+- [sharesix] Remove extractor
+- [aol:features] Remove extractor
+* [sendtonews] Improve info extraction
+* [3sat,phoenix] Fix extraction (#11619)
+* [comedycentral/mtv] Add support for HLS videos (#11600)
+* [discoverygo] Fix JSON data parsing (#11219, #11522)
+
+
+version 2017.01.05
+
+Extractors
++ [zdf] Fix extraction (#11055, #11063)
+* [pornhub:playlist] Improve extraction (#11594)
++ [cctv] Add support for ncpa-classic.com (#11591)
++ [tunein] Add support for embeds (#11579)
+
+
+version 2017.01.02
+
+Extractors
+* [cctv] Improve extraction (#879, #6753, #8541)
++ [nrktv:episodes] Add support for episodes (#11571)
++ [arkena] Add support for video.arkena.com (#11568)
+
+
+version 2016.12.31
+
+Core
++ Introduce --config-location option for custom configuration files (#6745,
+  #10648)
+
+Extractors
++ [twitch] Add support for player.twitch.tv (#11535, #11537)
++ [videa] Add support for videa.hu (#8181, #11133)
+* [vk] Fix postlive videos extraction
+* [vk] Extract from playerParams (#11555)
+- [freevideo] Remove extractor (#11515)
++ [showroomlive] Add support for showroom-live.com (#11458)
+* [xhamster] Fix duration extraction (#11549)
+* [rtve:live] Fix extraction (#11529)
+* [brightcove:legacy] Improve embeds detection (#11523)
++ [twitch] Add support for rechat messages (#11524)
+* [acast] Fix audio and timestamp extraction (#11521)
+
+
+version 2016.12.22
+
+Core
+* [extractor/common] Improve detection of video-only formats in m3u8
+  manifests (#11507)
+
+Extractors
++ [theplatform] Pass geo verification headers to SMIL request (#10146)
++ [viu] Pass geo verification headers to auth request
+* [rtl2] Extract more formats and metadata
+* [vbox7] Skip malformed JSON-LD (#11501)
+* [uplynk] Force downloading using native HLS downloader (#11496)
++ [laola1] Add support for another extraction scenario (#11460)
+
+
+version 2016.12.20
+
+Core
+* [extractor/common] Improve fragment URL construction for DASH media
+* [extractor/common] Fix codec information extraction for mixed audio/video
+  DASH media (#11490)
+
+Extractors
+* [vbox7] Fix extraction (#11494)
++ [uktvplay] Add support for uktvplay.uktv.co.uk (#11027)
++ [piksel] Add support for player.piksel.com (#11246)
++ [vimeo] Add support for DASH formats
+* [vimeo] Fix extraction for HLS formats (#11490)
+* [kaltura] Fix wrong widget ID in some cases (#11480)
++ [nrktv:direkte] Add support for live streams (#11488)
+* [pbs] Fix extraction for geo restricted videos (#7095)
+* [brightcove:new] Skip widevine classic videos
++ [viu] Add support for viu.com (#10607, #11329)
+
+
+version 2016.12.18
+
+Core
++ [extractor/common] Recognize DASH formats in html5 media entries
+
+Extractors
++ [ccma] Add support for ccma.cat (#11359)
+* [laola1tv] Improve extraction
++ [laola1tv] Add support embed URLs (#11460)
+* [nbc] Fix extraction for MSNBC videos (#11466)
+* [twitch] Adapt to new videos pages URL schema (#11469)
++ [meipai] Add support for meipai.com (#10718)
+* [jwplatform] Improve subtitles and duration extraction
++ [ondemandkorea] Add support for ondemandkorea.com (#10772)
++ [vvvvid] Add support for vvvvid.it (#5915)
+
+
+version 2016.12.15
+
+Core
++ [utils] Add convenience urljoin
+
+Extractors
++ [openload] Recognize oload.tv URLs (#10408)
++ [facebook] Recognize .onion URLs (#11443)
+* [vlive] Fix extraction (#11375, #11383)
++ [canvas] Extract DASH formats
++ [melonvod] Add support for vod.melon.com (#11419)
+
+
+version 2016.12.12
+
+Core
++ [utils] Add common user agents map
++ [common] Recognize HLS manifests that contain video only formats (#11394)
+
+Extractors
++ [dplay] Use Safari user agent for HLS (#11418)
++ [facebook] Detect login required error message
+* [facebook] Improve video selection (#11390)
++ [canalplus] Add another video id pattern (#11399)
+* [mixcloud] Relax URL regular expression (#11406)
+* [ctvnews] Relax URL regular expression (#11394)
++ [rte] Capture and output error message (#7746, #10498)
++ [prosiebensat1] Add support for DASH formats
+* [srgssr] Improve extraction for geo restricted videos (#11089)
+* [rts] Improve extraction for geo restricted videos (#4989)
+
+
+version 2016.12.09
+
+Core
+* [socks] Fix error reporting (#11355)
+
+Extractors
+* [openload] Fix extraction (#10408)
+* [pandoratv] Fix extraction (#11023)
++ [telebruxelles] Add support for emission URLs
+* [telebruxelles] Extract all formats
++ [bloomberg] Add another video id regular expression (#11371)
+* [fusion] Update ooyala id regular expression (#11364)
++ [1tv] Add support for playlists (#11335)
+* [1tv] Improve extraction (#11335)
++ [aenetworks] Extract more formats (#11321)
++ [thisoldhouse] Recognize /tv-episode/ URLs (#11271)
+
+
+version 2016.12.01
+
+Extractors
+* [soundcloud] Update client id (#11327)
+* [ruutu] Detect DRM protected videos
++ [liveleak] Add support for youtube embeds (#10688)
+* [spike] Fix full episodes support (#11312)
+* [comedycentral] Fix full episodes support
+* [normalboots] Rewrite in terms of JWPlatform (#11184)
+* [teamfourstar] Rewrite in terms of JWPlatform (#11184)
+- [screenwavemedia] Remove extractor (#11184)
+
+
+version 2016.11.27
+
+Extractors
++ [webcaster] Add support for webcaster.pro
++ [azubu] Add support for azubu.uol.com.br (#11305)
+* [viki] Prefer hls formats
+* [viki] Fix rtmp formats extraction (#11255)
+* [puls4] Relax URL regular expression (#11267)
+* [vevo] Improve artist extraction (#10911)
+* [mitele] Relax URL regular expression and extract more metadata (#11244)
++ [cbslocal] Recognize New York site (#11285)
++ [youtube:playlist] Pass disable_polymer in URL query (#11193)
+
+
+version 2016.11.22
+
+Extractors
+* [hellporno] Fix video extension extraction (#11247)
++ [hellporno] Add support for hellporno.net (#11247)
++ [amcnetworks] Recognize more BBC America URLs (#11263)
+* [funnyordie] Improve extraction (#11208)
+* [extractor/generic] Improve limelight embeds support
+- [crunchyroll] Remove ScaledBorderAndShadow from ASS subtitles (#8207, #9028)
+* [bandcamp] Fix free downloads extraction and extract all formats (#11067)
+* [twitter:card] Relax URL regular expression (#11225)
++ [tvanouvelles] Add support for tvanouvelles.ca (#10616)
+
+
+version 2016.11.18
+
+Extractors
+* [youtube:live] Relax URL regular expression (#11164)
+* [openload] Fix extraction (#10408, #11122)
+* [vlive] Prefer locale over language for subtitles id (#11203)
+
+
+version 2016.11.14.1
+
+Core
++ [downoader/fragment,f4m,hls] Respect HTTP headers from info dict
+* [extractor/common] Fix media templates with Bandwidth substitution pattern in
+  MPD manifests (#11175)
+* [extractor/common] Improve thumbnail extraction from JSON-LD
+
+Extractors
++ [nrk] Workaround geo restriction
++ [nrk] Improve error detection and messages
++ [afreecatv] Add support for vod.afreecatv.com (#11174)
+* [cda] Fix and improve extraction (#10929, #10936)
+* [plays] Fix extraction (#11165)
+* [eagleplatform] Fix extraction (#11160)
  + [audioboom] Recognize /posts/ URLs (#11149)
  
  
diff --git a/Makefile b/Makefile
index b7cec16669fff13abde79db73501853ac155cf5b..9d1ddc9d1b106cb16d5a1c879d0b6e7e7150d584 100644
--- a/Makefile
+++ b/Makefile
@@ -1,7 +1,7 @@
  all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
  
  clean:
-       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
+       rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part* *.info.json *.mp4 *.m4a *.flv *.mp3 *.avi *.mkv *.webm *.3gp *.wav *.ape *.swf *.jpg *.png CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
         find . -name "*.pyc" -delete
         find . -name "*.class" -delete
  
diff --git a/README.md b/README.md
index 98e37442070128aefdf0109beec8d69e0fbffaf7..2ee00f515ef571ee922c353aaeb624f59ca06693 100644
--- a/README.md
+++ b/README.md
@@ -29,7 +29,7 @@ # INSTALLATION
  
  You can also use pip:
  
-    sudo pip install --upgrade youtube-dl
+    sudo -H pip install --upgrade youtube-dl
      
  This command will update youtube-dl if you have already installed it. See the [pypi page](https://pypi.python.org/pypi/youtube_dl) for more information.
  
@@ -44,11 +44,7 @@ # INSTALLATION
  Alternatively, refer to the [developer instructions](#developer-instructions) for how to check out and work with the git repository. For further options, including PGP signatures, see the [youtube-dl Download Page](https://rg3.github.io/youtube-dl/download.html).
  
  # DESCRIPTION
-**youtube-dl** is a command-line program to download videos from
-YouTube.com and a few more sites. It requires the Python interpreter, version
-2.6, 2.7, or 3.2+, and it is not platform specific. It should work on
-your Unix box, on Windows or on Mac OS X. It is released to the public domain,
-which means you can modify it, redistribute it or use it however you like.
+**youtube-dl** is a command-line program to download videos from YouTube.com and a few more sites. It requires the Python interpreter, version 2.6, 2.7, or 3.2+, and it is not platform specific. It should work on your Unix box, on Windows or on Mac OS X. It is released to the public domain, which means you can modify it, redistribute it or use it however you like.
  
      youtube-dl [OPTIONS] URL [URL...]
  
@@ -84,13 +80,14 @@ # OPTIONS
                                       configuration in ~/.config/youtube-
                                       dl/config (%APPDATA%/youtube-dl/config.txt
                                       on Windows)
+    --config-location PATH           Location of the configuration file; either
+                                     the path to the config or its containing
+                                     directory.
      --flat-playlist                  Do not extract the videos of a playlist,
                                       only list them.
      --mark-watched                   Mark videos watched (YouTube only)
      --no-mark-watched                Do not mark videos watched (YouTube only)
      --no-color                       Do not emit color codes in output
-    --abort-on-unavailable-fragment  Abort downloading when some fragment is not
-                                     available
  
  ## Network Options:
      --proxy URL                      Use the specified HTTP/HTTPS/SOCKS proxy.
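The `--config-location PATH` text added in the hunk above says the path may point either at the configuration file or at its containing directory. A minimal sketch of that resolution follows. The function name and the default file name `youtube-dl.conf` are assumptions made for illustration; the README text in this diff does not name the file looked up inside a directory.

```python
import os

# Assumed file name used when PATH is a directory (not stated in the diff).
DEFAULT_CONFIG_NAME = 'youtube-dl.conf'


def resolve_config_location(path):
    """Hypothetical helper: turn a --config-location value into a file path."""
    path = os.path.expanduser(path)
    if os.path.isdir(path):
        # A directory was given: look for the config file inside it.
        return os.path.join(path, DEFAULT_CONFIG_NAME)
    # Otherwise treat the value as the path to the config file itself.
    return path


print(resolve_config_location(os.getcwd()))
```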
@@ -100,16 +97,13 @@ ## Network Options:
                                       string (--proxy "") for direct connection
      --socket-timeout SECONDS         Time to wait before giving up, in seconds
      --source-address IP              Client-side IP address to bind to
-                                     (experimental)
      -4, --force-ipv4                 Make all connections via IPv4
-                                     (experimental)
      -6, --force-ipv6                 Make all connections via IPv6
-                                     (experimental)
      --geo-verification-proxy URL     Use this proxy to verify the IP address for
                                       some geo-restricted sites. The default
                                       proxy specified by --proxy (or none, if the
                                       options is not present) is used for the
-                                     actual downloading. (experimental)
+                                     actual downloading.
  
  ## Video Selection:
      --playlist-start NUMBER          Playlist video to start at (default is 1)
@@ -140,23 +134,23 @@ ## Video Selection:
                                       COUNT views
      --max-views COUNT                Do not download any videos with more than
                                       COUNT views
-    --match-filter FILTER            Generic video filter (experimental).
-                                     Specify any key (see help for -o for a list
-                                     of available keys) to match if the key is
-                                     present, !key to check if the key is not
-                                     present,key > NUMBER (like "comment_count >
-                                     12", also works with >=, <, <=, !=, =) to
-                                     compare against a number, and & to require
-                                     multiple matches. Values which are not
-                                     known are excluded unless you put a
-                                     question mark (?) after the operator.For
-                                     example, to only match videos that have
-                                     been liked more than 100 times and disliked
-                                     less than 50 times (or the dislike
-                                     functionality is not available at the given
-                                     service), but who also have a description,
-                                     use --match-filter "like_count > 100 &
-                                     dislike_count <? 50 & description" .
+    --match-filter FILTER            Generic video filter. Specify any key (see
+                                     help for -o for a list of available keys)
+                                     to match if the key is present, !key to
+                                     check if the key is not present,key >
+                                     NUMBER (like "comment_count > 12", also
+                                     works with >=, <, <=, !=, =) to compare
+                                     against a number, and & to require multiple
+                                     matches. Values which are not known are
+                                     excluded unless you put a question mark (?)
+                                     after the operator.For example, to only
+                                     match videos that have been liked more than
+                                     100 times and disliked less than 50 times
+                                     (or the dislike functionality is not
+                                     available at the given service), but who
+                                     also have a description, use --match-filter
+                                     "like_count > 100 & dislike_count <? 50 &
+                                     description" .
      --no-playlist                    Download only the video, if the URL refers
                                       to a video and a playlist.
      --yes-playlist                   Download the playlist, if the URL refers to
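The `--match-filter` grammar described in the hunk above (key presence, `!key`, numeric comparisons, `?` after the operator to accept unknown values, `&` to combine clauses) can be illustrated with a small evaluator. This is a simplified sketch, not youtube-dl's implementation: `match_filter` and `match_clause` are hypothetical names, and only integer comparisons are handled.

```python
import operator
import re

# Comparison operators named in the option's help text.
OPERATORS = {
    '<=': operator.le, '>=': operator.ge, '!=': operator.ne,
    '<': operator.lt, '>': operator.gt, '=': operator.eq,
}

# key OP[?] NUMBER, e.g. "dislike_count <? 50"
CLAUSE_RE = re.compile(
    r'^\s*(?P<key>[a-z_]+)\s*(?P<op><=|>=|!=|<|>|=)(?P<unknown_ok>\?)?'
    r'\s*(?P<value>[0-9]+)\s*$')


def match_clause(clause, info):
    """Evaluate one clause of the filter against an info dict."""
    m = CLAUSE_RE.match(clause)
    if m:
        actual = info.get(m.group('key'))
        if actual is None:
            # Unknown values are excluded unless '?' follows the operator.
            return bool(m.group('unknown_ok'))
        return OPERATORS[m.group('op')](actual, int(m.group('value')))
    clause = clause.strip()
    if clause.startswith('!'):
        return info.get(clause[1:]) is None   # "!key": key must be absent
    return info.get(clause) is not None       # "key": key must be present


def match_filter(filter_str, info):
    """All '&'-separated clauses must match."""
    return all(match_clause(c, info) for c in filter_str.split('&'))


FILTER = 'like_count > 100 & dislike_count <? 50 & description'
print(match_filter(FILTER, {'like_count': 150, 'description': 'text'}))
```

With the example filter from the help text, a video with no `dislike_count` still passes because of the `?`, while one lacking a `description` does not.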
@@ -179,6 +173,8 @@ ## Download Options:
                                       only)
      --skip-unavailable-fragments     Skip unavailable fragments (DASH and
                                       hlsnative only)
+    --abort-on-unavailable-fragment  Abort downloading when some fragment is not
+                                     available
      --buffer-size SIZE               Size of download buffer (e.g. 1024 or 16K)
                                       (default is 1024)
      --no-resize-buffer               Do not automatically adjust the buffer
@@ -187,7 +183,7 @@ ## Download Options:
                                       of SIZE.
      --playlist-reverse               Download playlist videos in reverse order
      --xattr-set-filesize             Set file xattribute ytdl.filesize with
-                                     expected filesize (experimental)
+                                     expected file size (experimental)
      --hls-prefer-native              Use the native HLS downloader instead of
                                       ffmpeg
      --hls-prefer-ffmpeg              Use ffmpeg instead of the native HLS
@@ -211,7 +207,9 @@ ## Filesystem Options:
      --autonumber-size NUMBER         Specify the number of digits in
                                       %(autonumber)s when it is present in output
                                       filename template or --auto-number option
-                                     is given
+                                     is given (default is 5)
+    --autonumber-start NUMBER        Specify the start value for %(autonumber)s
+                                     (default is 1)
      --restrict-filenames             Restrict filenames to only ASCII
                                       characters, and avoid "&" and spaces in
                                       filenames
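The two defaults documented in the hunk above (`--autonumber-size` 5 and `--autonumber-start` 1) interact as sketched below. This assumes the counter is zero-padded to the requested number of digits, which the option text implies but does not spell out; `autonumber` is a hypothetical function name.

```python
def autonumber(index, start=1, size=5):
    """Format %(autonumber)s for the download at zero-based position `index`.

    `start` mirrors --autonumber-start and `size` mirrors --autonumber-size.
    """
    # '%0*d' takes the field width from the first tuple element.
    return '%0*d' % (size, start + index)


for position in range(3):
    print(autonumber(position, start=10, size=3))
```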
@@ -354,7 +352,7 @@ ## Authentication Options:
      -u, --username USERNAME          Login with this account ID
      -p, --password PASSWORD          Account password. If this option is left
                                       out, youtube-dl will ask interactively.
-    -2, --twofactor TWOFACTOR        Two-factor auth code
+    -2, --twofactor TWOFACTOR        Two-factor authentication code
      -n, --netrc                      Use .netrc authentication data
      --video-password PASSWORD        Video password (vimeo, smotri, youku)
  
@@ -375,7 +373,7 @@ ## Post-processing Options:
                                       avprobe)
      --audio-format FORMAT            Specify audio format: "best", "aac",
                                       "vorbis", "mp3", "m4a", "opus", or "wav";
-                                     "best" by default
+                                     "best" by default; No effect without -x
      --audio-quality QUALITY          Specify ffmpeg/avconv audio quality, insert
                                       a value between 0 (better) and 9 (worse)
                                       for VBR or a specific bitrate like 128K
@@ -447,6 +445,8 @@ # Save all videos under Movies directory in your home directory
  
  You can use `--ignore-config` if you want to disable the configuration file for a particular youtube-dl run.
  
+You can also use `--config-location` if you want to use custom configuration file for a particular youtube-dl run.
+
  ### Authentication with `.netrc` file
  
  You may also want to configure automatic credentials storage for extractors that support authentication (by providing login and password with `--username` and `--password`) in order not to pass credentials as command line arguments on every youtube-dl execution and prevent tracking plain text passwords in the shell command history. You can achieve this using a [`.netrc` file](http://stackoverflow.com/tags/.netrc/info) on a per extractor basis. For that you will need to create a `.netrc` file in your `$HOME` and restrict permissions to read/write by only you:
@@ -638,7 +638,7 @@ # FORMAT SELECTION
   - `acodec`: Name of the audio codec in use
   - `vcodec`: Name of the video codec in use
   - `container`: Name of the container format
- - `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
+ - `protocol`: The protocol that will be used for the actual download, lower-case (`http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `mms`, `f4m`, `ism`, `m3u8`, or `m3u8_native`)
   - `format_id`: A short description of the format
  
  Note that none of the aforementioned meta fields are guaranteed to be present since this solely depends on the metadata obtained by particular extractor, i.e. the metadata offered by the video hoster.
@@ -664,7 +664,7 @@ # Download best mp4 format available or any other best if no mp4 available
  # Download best format available but not better that 480p
  $ youtube-dl -f 'bestvideo[height<=480]+bestaudio/best[height<=480]'
  
-# Download best video only format but no bigger that 50 MB
+# Download best video only format but no bigger than 50 MB
  $ youtube-dl -f 'best[filesize<50M]'
  
  # Download best format available via direct link over HTTP/HTTPS protocol
@@ -744,7 +744,7 @@ ### Can you please put the `-b` option back?
  
  ### I get HTTP error 402 when trying to download a video. What's this?
  
-Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a webbrowser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
+Apparently YouTube requires you to pass a CAPTCHA test if you download too much. We're [considering to provide a way to let you solve the CAPTCHA](https://github.com/rg3/youtube-dl/issues/154), but at the moment, your best course of action is pointing a web browser to the youtube URL, solving the CAPTCHA, and restart youtube-dl.
  
  ### Do I need any other programs?
  
@@ -756,7 +756,7 @@ ### I have downloaded a video but how can I play it?
  
  Once the video is fully downloaded, use any video player, such as [mpv](https://mpv.io/), [vlc](http://www.videolan.org/) or [mplayer](http://www.mplayerhq.hu/).
  
-### I extracted a video URL with `-g`, but it does not play on another machine / in my webbrowser.
+### I extracted a video URL with `-g`, but it does not play on another machine / in my web browser.
  
  It depends a lot on the service. In many cases, requests for the video (to download/play it) must come from the same IP address and with the same cookies and/or HTTP headers. Use the `--cookies` option to write the required cookies into a file, and advise your downloader to read cookies from that file. Some sites also require a common user agent to be used, use `--dump-user-agent` to see the one in use by youtube-dl. You can also get necessary cookies and HTTP headers from JSON output obtained with `--dump-json`.
  
@@ -840,7 +840,7 @@ ### How do I pass cookies to youtube-dl?
  
  In order to extract cookies from browser use any conforming browser extension for exporting cookies. For example, [cookies.txt](https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg) (for Chrome) or [Export Cookies](https://addons.mozilla.org/en-US/firefox/addon/export-cookies/) (for Firefox).
  
-Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows, `LF` (`\n`) for Linux and `CR` (`\r`) for Mac OS. `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
+Note that the cookies file must be in Mozilla/Netscape format and the first line of the cookies file must be either `# HTTP Cookie File` or `# Netscape HTTP Cookie File`. Make sure you have correct [newline format](https://en.wikipedia.org/wiki/Newline) in the cookies file and convert newlines if necessary to correspond with your OS, namely `CRLF` (`\r\n`) for Windows and `LF` (`\n`) for Unix and Unix-like systems (Linux, Mac OS, etc.). `HTTP Error 400: Bad Request` when using `--cookies` is a good sign of invalid newline format.
  
  Passing cookies to youtube-dl is a good way to workaround login when a particular extractor does not implement it explicitly. Another use case is working around [CAPTCHA](https://en.wikipedia.org/wiki/CAPTCHA) some websites require you to solve in particular cases in order to get access (e.g. YouTube, CloudFlare).
  
@@ -930,9 +930,9 @@ # DEVELOPER INSTRUCTIONS
  
  ### Adding support for a new site
  
-If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
+If you want to add support for a new site, first of all **make sure** this site is **not dedicated to [copyright infringement](README.md#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free)**. youtube-dl does **not support** such sites thus pull requests adding support for them **will be rejected**.
  
-After you have ensured this site is distributing it's content legally, you can follow this quick list (assuming your service is called `yourextractor`):
+After you have ensured this site is distributing its content legally, you can follow this quick list (assuming your service is called `yourextractor`):
  
  1. [Fork this repository](https://github.com/rg3/youtube-dl/fork)
  2. Check out the source code with:
@@ -962,7 +962,7 @@ ### Adding support for a new site
                  'id': '42',
                  'ext': 'mp4',
                  'title': 'Video title goes here',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  # TODO more properties, either as:
                  # * A value
                  # * MD5 checksum; start the string with md5:
@@ -1037,7 +1037,7 @@ #### Example
  }
  ```
  
-Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional metafield you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
+Assume you want to extract `summary` and put it into the resulting info dict as `description`. Since `description` is an optional meta field you should be ready that this key may be missing from the `meta` dict, so that you should extract it like:
  
  ```python
  description = meta.get('summary')  # correct
@@ -1149,7 +1149,7 @@ # EMBEDDING YOUTUBE-DL
      ydl.download(['http://www.youtube.com/watch?v=BaW_jenozKc'])
  ```
  
-Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L128-L278). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
+Most likely, you'll want to use various options. For a list of options available, have a look at [`youtube_dl/YoutubeDL.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py#L129-L279). For a start, if you want to intercept youtube-dl's output, set a `logger` object.
  
  Here's a more complete example of a program that outputs only errors (and a short message after the download is finished), and downloads/converts the video to an mp3 file:
  
@@ -1252,7 +1252,7 @@ ###  Does the issue involve one problem, and one problem only?
  
  Some of our users seem to think there is a limit of issues they can or should open. There is no limit of issues they can or should open. While it may seem appealing to be able to dump all your issues into one ticket, that means that someone who solves one of your issues cannot mark the issue as closed. Typically, reporting a bunch of issues leads to the ticket lingering since nobody wants to attack that behemoth, until someone mercifully splits the issue into multiple ones.
  
-In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, Whitehouse podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
+In particular, every site support request issue should only pertain to services at one site (generally under a common domain, but always using the same backend technology). Do not request support for vimeo user videos, White house podcasts, and Google Plus pages in the same issue. Also, make sure that you don't post bug reports alongside feature requests. As a rule of thumb, a feature request does not include outputs of youtube-dl that are not immediately related to the feature at hand. Do not post reports of a network error alongside the request for a new video service.
  
  ###  Is anyone going to need the feature?
  
diff --git a/devscripts/bash-completion.py b/devscripts/bash-completion.py

index ce68f26f9ca39bd298f5d4149346af686257e042..3d1391334bd38a23c7024192c6c36522acaa5613 100755 (executable)
--- a/devscripts/bash-completion.py
+++ b/devscripts/bash-completion.py
@@ -25,5 +25,6 @@ def build_completion(opt_parser):
          filled_template = template.replace("{{flags}}", " ".join(opts_flag))
          f.write(filled_template)
  
+
  parser = youtube_dl.parseOpts()[0]
  build_completion(parser)
diff --git a/devscripts/buildserver.py b/devscripts/buildserver.py

index fc99c3213dddf985cfcf4fe74584cc09eeaf3175..1344b4d87b554b690fa8d5f0fab5462b7397aaea 100644 (file)
--- a/devscripts/buildserver.py
+++ b/devscripts/buildserver.py
@@ -424,8 +424,6 @@ def do_GET(self):
                      self.send_header('Content-Length', len(msg))
                      self.end_headers()
                      self.wfile.write(msg)
-                except HTTPError as e:
-                    self.send_response(e.code, str(e))
              else:
                  self.send_response(500, 'Unknown build method "%s"' % action)
          else:
diff --git a/devscripts/create-github-release.py b/devscripts/create-github-release.py

index 3b8021e74a8149b33753be5df590d2a9115a8305..30716ad8edc917da616a23753db30367458011d8 100644 (file)
--- a/devscripts/create-github-release.py
+++ b/devscripts/create-github-release.py
@@ -2,11 +2,13 @@
  from __future__ import unicode_literals
  
  import base64
+import io
  import json
  import mimetypes
  import netrc
  import optparse
  import os
+import re
  import sys
  
  sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
@@ -90,16 +92,23 @@ def create_asset(self, release_id, asset):
  
  
  def main():
-    parser = optparse.OptionParser(usage='%prog VERSION BUILDPATH')
+    parser = optparse.OptionParser(usage='%prog CHANGELOG VERSION BUILDPATH')
      options, args = parser.parse_args()
-    if len(args) != 2:
+    if len(args) != 3:
          parser.error('Expected a version and a build directory')
  
-    version, build_path = args
+    changelog_file, version, build_path = args
+
+    with io.open(changelog_file, encoding='utf-8') as inf:
+        changelog = inf.read()
+
+    mobj = re.search(r'(?s)version %s\n{2}(.+?)\n{3}' % version, changelog)
+    body = mobj.group(1) if mobj else ''
  
      releaser = GitHubReleaser()
  
-    new_release = releaser.create_release(version, name='youtube-dl %s' % version)
+    new_release = releaser.create_release(
+        version, name='youtube-dl %s' % version, body=body)
      release_id = new_release['id']
  
      for asset in os.listdir(build_path):
diff --git a/devscripts/fish-completion.py b/devscripts/fish-completion.py

index 41629d87d006fbaf4ba90cbb87bf60388fb7f7e5..51d19dd33d3bf5c05fc86f3c63e23c00871fda90 100755 (executable)
--- a/devscripts/fish-completion.py
+++ b/devscripts/fish-completion.py
@@ -44,5 +44,6 @@ def build_completion(opt_parser):
      with open(FISH_COMPLETION_FILE, 'w') as f:
          f.write(filled_template)
  
+
  parser = youtube_dl.parseOpts()[0]
  build_completion(parser)
diff --git a/devscripts/generate_aes_testdata.py b/devscripts/generate_aes_testdata.py

index 2e389fc8e742e26b0985f3492835ccb6790cef3e..e3df42cc2da6c99d9104c9bd2bac776af5a61c46 100644 (file)
--- a/devscripts/generate_aes_testdata.py
+++ b/devscripts/generate_aes_testdata.py
@@ -23,6 +23,7 @@ def openssl_encode(algo, key, iv):
      out, _ = prog.communicate(secret_msg)
      return out
  
+
  iv = key = [0x20, 0x15] + 14 * [0]
  
  r = openssl_encode('aes-128-cbc', key, iv)
diff --git a/devscripts/gh-pages/update-sites.py b/devscripts/gh-pages/update-sites.py

index 503c1372fd3589f45a207d043999a5286f6c5e1e..531c93c7089c1847a7e9018fcda5ca177f68547e 100755 (executable)
--- a/devscripts/gh-pages/update-sites.py
+++ b/devscripts/gh-pages/update-sites.py
@@ -32,5 +32,6 @@ def main():
      with open('supportedsites.html', 'w', encoding='utf-8') as sitesf:
          sitesf.write(template)
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/make_contributing.py b/devscripts/make_contributing.py

index 5e454a429e46eeb108612690ae2b523ee98f30d5..226d1a5d6644953982db6346a00a21ec45f9b089 100755 (executable)
--- a/devscripts/make_contributing.py
+++ b/devscripts/make_contributing.py
@@ -28,5 +28,6 @@ def main():
      with io.open(outfile, 'w', encoding='utf-8') as outf:
          outf.write(out)
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/make_lazy_extractors.py b/devscripts/make_lazy_extractors.py

index 9a79c2bc5a6d57f6de31be45b29807e36bd8e12f..19114d30d1aa59e394e0c35e7ec9446eb4969c56 100644 (file)
--- a/devscripts/make_lazy_extractors.py
+++ b/devscripts/make_lazy_extractors.py
@@ -59,6 +59,7 @@ def build_lazy_ie(ie, name):
          s += make_valid_template.format(valid_url=ie._make_valid_url())
      return s
  
+
  # find the correct sorting and add the required base classes so that sublcasses
  # can be correctly created
  classes = _ALL_CLASSES[:-1]
diff --git a/devscripts/make_supportedsites.py b/devscripts/make_supportedsites.py

index 8cb4a46380253643e6df2370058c433094cf159b..764795bc5b1e560b033c2e9a0c395cecb10b1242 100644 (file)
--- a/devscripts/make_supportedsites.py
+++ b/devscripts/make_supportedsites.py
@@ -41,5 +41,6 @@ def gen_ies_md(ies):
      with io.open(outfile, 'w', encoding='utf-8') as outf:
          outf.write(out)
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/prepare_manpage.py b/devscripts/prepare_manpage.py

index ce548739f57f5c52e6c73d46f2095e894c8f940d..f9fe63f1ffd5073b312f22e8f08fb7798fa3f7a4 100644 (file)
--- a/devscripts/prepare_manpage.py
+++ b/devscripts/prepare_manpage.py
@@ -74,5 +74,6 @@ def filter_options(readme):
  
      return ret
  
+
  if __name__ == '__main__':
      main()
diff --git a/devscripts/release.sh b/devscripts/release.sh

index 1af61aa0bef9c402e1a3cb04990e953762a9ad69..4db5def5d8534ef73664fc90d00433d90d363bbc 100755 (executable)
--- a/devscripts/release.sh
+++ b/devscripts/release.sh
@@ -110,7 +110,7 @@ RELEASE_FILES="youtube-dl youtube-dl.exe youtube-dl-$version.tar.gz"
  for f in $RELEASE_FILES; do gpg --passphrase-repeat 5 --detach-sig "build/$version/$f"; done
  
  ROOT=$(pwd)
-python devscripts/create-github-release.py $version "$ROOT/build/$version"
+python devscripts/create-github-release.py ChangeLog $version "$ROOT/build/$version"
  
  ssh ytdl@yt-dl.org "sh html/update_latest.sh $version"
  
diff --git a/devscripts/zsh-completion.py b/devscripts/zsh-completion.py

index 04728e8e2ce763ca886853061875c59e4f645921..60aaf76cc3297adc6e80984890e33e4267b95c2b 100755 (executable)
--- a/devscripts/zsh-completion.py
+++ b/devscripts/zsh-completion.py
@@ -44,5 +44,6 @@ def build_completion(opt_parser):
      with open(ZSH_COMPLETION_FILE, "w") as f:
          f.write(template)
  
+
  parser = youtube_dl.parseOpts()[0]
  build_completion(parser)
diff --git a/docs/supportedsites.md b/docs/supportedsites.md

index 77832504a885a06c1f3a2dd459a2594d27aea470..d900f5e12662a4a05f71645017996dc46fc22b62 100644 (file)
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -33,7 +33,8 @@ # Supported sites
   - **AdobeTVVideo**
   - **AdultSwim**
   - **aenetworks**: A+E Networks: A&E, Lifetime, History.com, FYI Network
- - **AfreecaTV**: afreecatv.com
+ - **afreecatv**: afreecatv.com
+ - **afreecatv:global**: afreecatv.com
   - **AirMozilla**
   - **AlJazeera**
   - **Allocine**
@@ -74,6 +75,8 @@ # Supported sites
   - **awaan:live**
   - **awaan:season**
   - **awaan:video**
+ - **AZMedien**: AZ Medien videos
+ - **AZMedienPlaylist**: AZ Medien playlists
   - **Azubu**
   - **AzubuLive**
   - **BaiduVideo**: 百度视频
@@ -86,6 +89,7 @@ # Supported sites
   - **bbc.co.uk:article**: BBC articles
   - **bbc.co.uk:iplayer:playlist**
   - **bbc.co.uk:playlist**
+ - **Beam:live**
   - **Beatport**
   - **Beeg**
   - **BehindKink**
@@ -131,7 +135,8 @@ # Supported sites
   - **cbsnews**: CBS News
   - **cbsnews:livevideo**: CBS News Live Videos
   - **CBSSports**
- - **CCTV**
+ - **CCMA**
+ - **CCTV**: 央视网
   - **CDA**
   - **CeskaTelevize**
   - **channel9**: Channel 9
@@ -158,6 +163,7 @@ # Supported sites
   - **CollegeRama**
   - **ComCarCoff**
   - **ComedyCentral**
+ - **ComedyCentralFullEpisodes**
   - **ComedyCentralShortname**
   - **ComedyCentralTV**
   - **CondeNast**: Condé Nast media group: Allure, Architectural Digest, Ars Technica, Bon Appétit, Brides, Condé Nast, Condé Nast Traveler, Details, Epicurious, GQ, Glamour, Golf Digest, SELF, Teen Vogue, The New Yorker, Vanity Fair, Vogue, W Magazine, WIRED
@@ -196,6 +202,7 @@ # Supported sites
   - **Digiteka**
   - **Discovery**
   - **DiscoveryGo**
+ - **Disney**
   - **Dotsub**
   - **DouyuTV**: 斗鱼
   - **DPlay**
@@ -212,6 +219,7 @@ # Supported sites
   - **EaglePlatform**
   - **EbaumsWorld**
   - **EchoMsk**
+ - **egghead:course**: egghead.io course
   - **eHow**
   - **Einthusan**
   - **eitb.tv**
@@ -238,7 +246,6 @@ # Supported sites
   - **fc2**
   - **fc2:embed**
   - **Fczenit**
- - **features.aol.com**
   - **fernsehkritik.tv**
   - **Firstpost**
   - **FiveTV**
@@ -261,7 +268,6 @@ # Supported sites
   - **francetvinfo.fr**
   - **Freesound**
   - **freespeech.org**
- - **FreeVideo**
   - **Funimation**
   - **FunnyOrDie**
   - **Fusion**
@@ -303,6 +309,7 @@ # Supported sites
   - **history:topic**: History.com Topic
   - **hitbox**
   - **hitbox:live**
+ - **HitRecord**
   - **HornBunny**
   - **HotNewHipHop**
   - **HotStar**
@@ -320,6 +327,7 @@ # Supported sites
   - **Imgur**
   - **ImgurAlbum**
   - **Ina**
+ - **Inc**
   - **Indavideo**
   - **IndavideoEmbed**
   - **InfoQ**
@@ -329,6 +337,7 @@ # Supported sites
   - **IPrima**
   - **iqiyi**: 爱奇艺
   - **Ir90Tv**
+ - **ITV**
   - **ivi**: ivi.ru
   - **ivi:compilation**: ivi.ru compilations
   - **ivideon**: Ivideon TV
@@ -363,7 +372,8 @@ # Supported sites
   - **kuwo:singer**: 酷我音乐 - 歌手
   - **kuwo:song**: 酷我音乐
   - **la7.it**
- - **Laola1Tv**
+ - **laola1tv**
+ - **laola1tv:embed**
   - **LCI**
   - **Lcp**
   - **LcpPlay**
@@ -401,6 +411,8 @@ # Supported sites
   - **MatchTV**
   - **MDR**: MDR.DE and KiKA
   - **media.ccc.de**
+ - **Meipai**: 美拍
+ - **MelonVOD**
   - **META**
   - **metacafe**
   - **Metacritic**
@@ -434,6 +446,7 @@ # Supported sites
   - **mtg**: MTG services
   - **mtv**
   - **mtv.de**
+ - **mtv81**
   - **mtv:video**
   - **mtvservices:embedded**
   - **MuenchenTV**: münchen.tv
@@ -476,6 +489,7 @@ # Supported sites
   - **Newstube**
   - **NextMedia**: 蘋果日報
   - **NextMediaActionNews**: 蘋果日報 - 動新聞
+ - **NextTV**: 壹電視
   - **nfb**: National Film Board of Canada
   - **nfl.com**
   - **NhkVod**
@@ -512,6 +526,9 @@ # Supported sites
   - **NRKPlaylist**
   - **NRKSkole**: NRK Skole
   - **NRKTV**: NRK TV and NRK Radio
+ - **NRKTVDirekte**: NRK TV Direkte and NRK Radio Direkte
+ - **NRKTVEpisodes**
+ - **NRKTVSeries**
   - **ntv.ru**
   - **Nuvid**
   - **NYTimes**
@@ -522,6 +539,7 @@ # Supported sites
   - **Odnoklassniki**
   - **OktoberfestTV**
   - **on.aol.com**
+ - **OnDemandKorea**
   - **onet.tv**
   - **onet.tv:channel**
   - **OnionStudios**
@@ -545,6 +563,7 @@ # Supported sites
   - **PhilharmonieDeParis**: Philharmonie de Paris
   - **phoenix.de**
   - **Photobucket**
+ - **Piksel**
   - **Pinkbike**
   - **Pladform**
   - **play.fm**
@@ -561,6 +580,7 @@ # Supported sites
   - **PolskieRadio**
   - **PolskieRadioCategory**
   - **PornCom**
+ - **PornFlip**
   - **PornHd**
   - **PornHub**: PornHub and Thumbzilla
   - **PornHubPlaylist**
@@ -642,8 +662,6 @@ # Supported sites
   - **screen.yahoo:search**: Yahoo screen search
   - **Screencast**
   - **ScreencastOMatic**
- - **ScreenJunkies**
- - **ScreenwaveMedia**
   - **Seeker**
   - **SenateISVP**
   - **SendtoNews**
@@ -651,7 +669,7 @@ # Supported sites
   - **Sexu**
   - **Shahid**
   - **Shared**: shared.sx
- - **ShareSix**
+ - **ShowRoomLive**
   - **Sina**
   - **SixPlay**
   - **skynewsarabia:article**
@@ -715,7 +733,7 @@ # Supported sites
   - **teachertube:user:collection**: teachertube.com user and collection videos
   - **TeachingChannel**
   - **Teamcoco**
- - **TeamFour**
+ - **TeamFourStar**
   - **TechTalks**
   - **techtv.mit.edu**
   - **ted**
@@ -771,6 +789,9 @@ # Supported sites
   - **TV2Article**
   - **TV3**
   - **TV4**: tv4.se and tv4play.se
+ - **TVA**
+ - **TVANouvelles**
+ - **TVANouvellesArticle**
   - **TVC**
   - **TVCArticle**
   - **tvigle**: Интернет-телевидение Tvigle.ru
@@ -782,10 +803,13 @@ # Supported sites
   - **Tweakers**
   - **twitch:chapter**
   - **twitch:clips**
- - **twitch:past_broadcasts**
   - **twitch:profile**
   - **twitch:stream**
   - **twitch:video**
+ - **twitch:videos:all**
+ - **twitch:videos:highlights**
+ - **twitch:videos:past-broadcasts**
+ - **twitch:videos:uploads**
   - **twitch:vod**
   - **twitter**
   - **twitter:amplify**
@@ -793,6 +817,7 @@ # Supported sites
   - **udemy**
   - **udemy:course**
   - **UDNEmbed**: 聯合影音
+ - **UKTVPlay**
   - **Unistra**
   - **uol.com.br**
   - **uplynk**
@@ -821,6 +846,7 @@ # Supported sites
   - **ViceShow**
   - **Vidbit**
   - **Viddler**
+ - **Videa**
   - **video.google:search**: Google Video search
   - **video.mit.edu**
   - **VideoDetective**
@@ -830,7 +856,6 @@ # Supported sites
   - **videomore:season**
   - **videomore:video**
   - **VideoPremium**
- - **VideoTt**: video.tt - Your True Tube (Currently broken)
   - **videoweed**: VideoWeed
   - **Vidio**
   - **vidme**
@@ -857,11 +882,15 @@ # Supported sites
   - **Vimple**: Vimple - one-click video hosting
   - **Vine**
   - **vine:user**
+ - **Viu**
+ - **viu:ott**
+ - **viu:playlist**
   - **Vivo**: vivo.sx
   - **vk**: VK
   - **vk:uservideos**: VK - User's Videos
   - **vk:wallpost**
   - **vlive**
+ - **vlive:channel**
   - **Vodlocker**
   - **VODPlatform**
   - **VoiceRepublic**
@@ -871,6 +900,7 @@ # Supported sites
   - **VRT**
   - **vube**: Vube.com
   - **VuClip**
+ - **VVVVID**
   - **VyboryMos**
   - **Vzaar**
   - **Walla**
@@ -880,6 +910,8 @@ # Supported sites
   - **WatchIndianPorn**: Watch Indian Porn
   - **WDR**
   - **wdr:mobile**
+ - **Webcaster**
+ - **WebcasterFeed**
   - **WebOfStories**
   - **WebOfStoriesPlaylist**
   - **WeiqiTV**: WQTV
diff --git a/test/test_InfoExtractor.py b/test/test_InfoExtractor.py

index a98305c747635c1b1638f761d7bdf9bead353d19..437c7270ee6aeaa8eba588badfb3bf26d79ea37d 100644 (file)
--- a/test/test_InfoExtractor.py
+++ b/test/test_InfoExtractor.py
@@ -84,5 +84,6 @@ def test_download_json(self):
          self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
          self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_aes.py b/test/test_aes.py

index 315a3f5ae6a597662d05f56e97672b4ff93aff10..54078a66d61ad49a05600e9efca48472194f0fa5 100644 (file)
--- a/test/test_aes.py
+++ b/test/test_aes.py
@@ -51,5 +51,6 @@ def test_decrypt_text(self):
          decrypted = (aes_decrypt_text(encrypted, password, 32))
          self.assertEqual(decrypted, self.secret_msg)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_download.py b/test/test_download.py

index a3f1c0644f32b180a2b177e76dbea44854b0983e..4639529897967ebc49883e488f5624a038c70c44 100644 (file)
--- a/test/test_download.py
+++ b/test/test_download.py
@@ -60,6 +60,7 @@ def _file_md5(fn):
      with open(fn, 'rb') as f:
          return hashlib.md5(f.read()).hexdigest()
  
+
  defs = gettestcases()
  
  
@@ -217,6 +218,7 @@ def try_rm_tcs_files(tcs=None):
  
      return test_template
  
+
  # And add them to TestDownload
  for n, test_case in enumerate(defs):
      test_method = generator(test_case)
diff --git a/test/test_execution.py b/test/test_execution.py

index 620db080e9bd836c7239a93e86e0944b95f793e0..11661bb68148f4eb229b50c37f67dc744491c7df 100644 (file)
--- a/test/test_execution.py
+++ b/test/test_execution.py
@@ -39,5 +39,6 @@ def test_cmdline_umlauts(self):
          _, stderr = p.communicate()
          self.assertFalse(stderr)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_http.py b/test/test_http.py

index bb0a098e48f5cae93cade9b4bc99c4a4d1a545ed..7a7a3510ffb46e2791153dff5e4157bb21433056 100644 (file)
--- a/test/test_http.py
+++ b/test/test_http.py
@@ -169,5 +169,6 @@ def test_proxy_with_idn(self):
          # b'xn--fiq228c' is '中文'.encode('idna')
          self.assertEqual(response, 'normal: http://xn--fiq228c.tw/')
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_iqiyi_sdk_interpreter.py b/test/test_iqiyi_sdk_interpreter.py

index 9d95cb60618ae4ee122b46884a2eae6233dffbec..789059dbea38026362caea2be08f9d36796a7b1d 100644 (file)
--- a/test/test_iqiyi_sdk_interpreter.py
+++ b/test/test_iqiyi_sdk_interpreter.py
@@ -43,5 +43,6 @@ def test_iqiyi_sdk_interpreter(self):
          ie._login()
          self.assertTrue('unable to log in:' in logger.messages[0])
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_jsinterp.py b/test/test_jsinterp.py

index 63c350b8fa986fc63d70af43a6a0fdcaf5958eed..c24b8ca742acc308ca9c455378564bbac053765d 100644 (file)
--- a/test/test_jsinterp.py
+++ b/test/test_jsinterp.py
@@ -104,6 +104,14 @@ def test_precedence(self):
          }''')
          self.assertEqual(jsi.call_function('x'), [20, 20, 30, 40, 50])
  
+    def test_call(self):
+        jsi = JSInterpreter('''
+        function x() { return 2; }
+        function y(a) { return x() + a; }
+        function z() { return y(3); }
+        ''')
+        self.assertEqual(jsi.call_function('z'), 5)
+
  
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_utils.py b/test/test_utils.py

index cb75ca53e5decc102a2978cc09e8fc395e2bdcdf..edc712f0741576c852be2b528f95dcf81f309bfc 100644 (file)
--- a/test/test_utils.py
+++ b/test/test_utils.py
@@ -70,6 +70,7 @@
      lowercase_escape,
      url_basename,
      base_url,
+    urljoin,
      urlencode_postdata,
      urshift,
      update_url_query,
@@ -294,6 +295,9 @@ def test_unified_dates(self):
          self.assertEqual(unified_strdate('27.02.2016 17:30'), '20160227')
          self.assertEqual(unified_strdate('UNKNOWN DATE FORMAT'), None)
          self.assertEqual(unified_strdate('Feb 7, 2016 at 6:35 pm'), '20160207')
+        self.assertEqual(unified_strdate('July 15th, 2013'), '20130715')
+        self.assertEqual(unified_strdate('September 1st, 2013'), '20130901')
+        self.assertEqual(unified_strdate('Sep 2nd, 2013'), '20130902')
  
      def test_unified_timestamps(self):
          self.assertEqual(unified_timestamp('December 21, 2010'), 1292889600)
@@ -445,6 +449,23 @@ def test_base_url(self):
          self.assertEqual(base_url('http://foo.de/bar/baz'), 'http://foo.de/bar/')
          self.assertEqual(base_url('http://foo.de/bar/baz?x=z/x/c'), 'http://foo.de/bar/')
  
+    def test_urljoin(self):
+        self.assertEqual(urljoin('http://foo.de/', '/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('//foo.de/', '/a/b/c.txt'), '//foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', 'a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de', '/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de', 'a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', '//foo.de/a/b/c.txt'), '//foo.de/a/b/c.txt')
+        self.assertEqual(urljoin(None, 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin(None, '//foo.de/a/b/c.txt'), '//foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('', 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin(['foobar'], 'http://foo.de/a/b/c.txt'), 'http://foo.de/a/b/c.txt')
+        self.assertEqual(urljoin('http://foo.de/', None), None)
+        self.assertEqual(urljoin('http://foo.de/', ''), None)
+        self.assertEqual(urljoin('http://foo.de/', ['foobar']), None)
+        self.assertEqual(urljoin('http://foo.de/a/b/c.txt', '.././../d.txt'), 'http://foo.de/d.txt')
+
      def test_parse_age_limit(self):
          self.assertEqual(parse_age_limit(None), None)
          self.assertEqual(parse_age_limit(False), None)
@@ -489,6 +510,7 @@ def test_parse_duration(self):
          self.assertEqual(parse_duration('1 hour 3 minutes'), 3780)
          self.assertEqual(parse_duration('87 Min.'), 5220)
          self.assertEqual(parse_duration('PT1H0.040S'), 3600.04)
+        self.assertEqual(parse_duration('PT00H03M30SZ'), 210)
  
      def test_fix_xml_ampersands(self):
          self.assertEqual(
@@ -763,12 +785,27 @@ def test_js_to_json_edgecases(self):
          on = js_to_json('["abc", "def",]')
          self.assertEqual(json.loads(on), ['abc', 'def'])
  
+        on = js_to_json('[/*comment\n*/"abc"/*comment\n*/,/*comment\n*/"def",/*comment\n*/]')
+        self.assertEqual(json.loads(on), ['abc', 'def'])
+
+        on = js_to_json('[//comment\n"abc" //comment\n,//comment\n"def",//comment\n]')
+        self.assertEqual(json.loads(on), ['abc', 'def'])
+
          on = js_to_json('{"abc": "def",}')
          self.assertEqual(json.loads(on), {'abc': 'def'})
  
+        on = js_to_json('{/*comment\n*/"abc"/*comment\n*/:/*comment\n*/"def"/*comment\n*/,/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'abc': 'def'})
+
          on = js_to_json('{ 0: /* " \n */ ",]" , }')
          self.assertEqual(json.loads(on), {'0': ',]'})
  
+        on = js_to_json('{ /*comment\n*/0/*comment\n*/: /* " \n */ ",]" , }')
+        self.assertEqual(json.loads(on), {'0': ',]'})
+
+        on = js_to_json('{ 0: // comment\n1 }')
+        self.assertEqual(json.loads(on), {'0': 1})
+
          on = js_to_json(r'["<p>x<\/p>"]')
          self.assertEqual(json.loads(on), ['<p>x</p>'])
  
@@ -778,15 +815,27 @@ def test_js_to_json_edgecases(self):
          on = js_to_json("['a\\\nb']")
          self.assertEqual(json.loads(on), ['ab'])
  
+        on = js_to_json("/*comment\n*/[/*comment\n*/'a\\\nb'/*comment\n*/]/*comment\n*/")
+        self.assertEqual(json.loads(on), ['ab'])
+
          on = js_to_json('{0xff:0xff}')
          self.assertEqual(json.loads(on), {'255': 255})
  
+        on = js_to_json('{/*comment\n*/0xff/*comment\n*/:/*comment\n*/0xff/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'255': 255})
+
          on = js_to_json('{077:077}')
          self.assertEqual(json.loads(on), {'63': 63})
  
+        on = js_to_json('{/*comment\n*/077/*comment\n*/:/*comment\n*/077/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'63': 63})
+
          on = js_to_json('{42:42}')
          self.assertEqual(json.loads(on), {'42': 42})
  
+        on = js_to_json('{/*comment\n*/42/*comment\n*/:/*comment\n*/42/*comment\n*/}')
+        self.assertEqual(json.loads(on), {'42': 42})
+
      def test_extract_attributes(self):
          self.assertEqual(extract_attributes('<e x="y">'), {'x': 'y'})
          self.assertEqual(extract_attributes("<e x='y'>"), {'x': 'y'})
@@ -1075,5 +1124,6 @@ def test_get_element_by_class(self):
          self.assertEqual(get_element_by_class('foo', html), 'nice')
          self.assertEqual(get_element_by_class('no-such-class', html), None)
  
+
  if __name__ == '__main__':
      unittest.main()
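
Note on the test_urljoin cases added in this file: they pin down three rules. A path that is not a non-empty string yields None, a path that is already absolute or protocol-relative is returned unchanged whatever the base is, and everything else is resolved against the base. The following is a minimal Python 3 sketch that is consistent with those assertions. The name urljoin_sketch is hypothetical, and this is not necessarily how the helper in youtube_dl/utils.py is written.

```python
import re
from urllib.parse import urljoin as stdlib_urljoin


def urljoin_sketch(base, path):
    """Hypothetical helper mirroring the behaviour the tests describe."""
    # Anything that is not a non-empty string cannot be joined.
    if not isinstance(path, str) or not path:
        return None
    # Absolute and protocol-relative paths are returned unchanged,
    # so the base is never inspected for them.
    if re.match(r'^(?:https?:)?//', path):
        return path
    # A relative path needs a usable base URL.
    if not isinstance(base, str) or not re.match(r'^(?:https?:)?//', base):
        return None
    return stdlib_urljoin(base, path)
```

Because the base is only inspected for relative paths, callers can pass None or an arbitrary object as base when the path is already a full URL.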
diff --git a/test/test_verbose_output.py b/test/test_verbose_output.py

index 96a66f7a09b0f2173cf1b01f20c81f42985ef1b6..c1465fe8c51d8bf3789606fbf6c61da0deabfa90 100644 (file)
--- a/test/test_verbose_output.py
+++ b/test/test_verbose_output.py
@@ -66,5 +66,6 @@ def test_private_info_shortarg_eq(self):
          self.assertTrue(b'-p' in serr)
          self.assertTrue(b'secret' not in serr)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_write_annotations.py b/test/test_write_annotations.py

index 8de08f2d6d3974bd2d28265c323e7ff76d1317a3..41abdfe3b99eaabf562ebabc222fc50fead77631 100644 (file)
--- a/test/test_write_annotations.py
+++ b/test/test_write_annotations.py
@@ -24,6 +24,7 @@ def __init__(self, *args, **kwargs):
          super(YoutubeDL, self).__init__(*args, **kwargs)
          self.to_stderr = self.to_screen
  
+
  params = get_params({
      'writeannotations': True,
      'skip_download': True,
@@ -74,5 +75,6 @@ def test_info_json(self):
      def tearDown(self):
          try_rm(ANNOTATIONS_FILE)
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_youtube_lists.py b/test/test_youtube_lists.py

index af1c454217d0bec66a27a1bdc89c02195bb6274f..7a33dbf88e90f2d901b144759ffa90552787885c 100644 (file)
--- a/test/test_youtube_lists.py
+++ b/test/test_youtube_lists.py
@@ -66,5 +66,6 @@ def test_youtube_flat_playlist_titles(self):
          for entry in result['entries']:
              self.assertTrue(entry.get('title'))
  
+
  if __name__ == '__main__':
      unittest.main()
diff --git a/test/test_youtube_signature.py b/test/test_youtube_signature.py

index 060864434fe2ab81839dcde17475e6e9f61db0f2..f0c370eeedc8942abc0b8cd8c10e57b4361d00c2 100644 (file)
--- a/test/test_youtube_signature.py
+++ b/test/test_youtube_signature.py
@@ -114,6 +114,7 @@ def test_func(self):
      test_func.__name__ = str('test_signature_' + stype + '_' + test_id)
      setattr(TestSignature, test_func.__name__, test_func)
  
+
  for test_spec in _TESTS:
      make_tfunc(*test_spec)
  
diff --git a/youtube_dl/YoutubeDL.py b/youtube_dl/YoutubeDL.py

index 53f20ac2cb1bd16398e160db329004b49d6bf424..c71e94518bd981edf682483c85aa059730368b40 100755 (executable)
--- a/youtube_dl/YoutubeDL.py
+++ b/youtube_dl/YoutubeDL.py
@@ -584,7 +584,7 @@ def prepare_filename(self, info_dict):
              if autonumber_size is None:
                  autonumber_size = 5
              autonumber_templ = '%0' + str(autonumber_size) + 'd'
-            template_dict['autonumber'] = autonumber_templ % self._num_downloads
+            template_dict['autonumber'] = autonumber_templ % (self.params.get('autonumber_start', 1) - 1 + self._num_downloads)
              if template_dict.get('playlist_index') is not None:
                  template_dict['playlist_index'] = '%0*d' % (len(str(template_dict['n_entries'])), template_dict['playlist_index'])
              if template_dict.get('resolution') is None:
@@ -1339,7 +1339,7 @@ def process_video_result(self, info_dict, download=True):
                  format['format_id'] = compat_str(i)
              else:
                  # Sanitize format_id from characters used in format selector expression
-                format['format_id'] = re.sub('[\s,/+\[\]()]', '_', format['format_id'])
+                format['format_id'] = re.sub(r'[\s,/+\[\]()]', '_', format['format_id'])
              format_id = format['format_id']
              if format_id not in formats_dict:
                  formats_dict[format_id] = []
@@ -1363,7 +1363,7 @@ def process_video_result(self, info_dict, download=True):
                  format['ext'] = determine_ext(format['url']).lower()
              # Automatically determine protocol if missing (useful for format
              # selection purposes)
-            if 'protocol' not in format:
+            if format.get('protocol') is None:
                  format['protocol'] = determine_protocol(format)
              # Add HTTP headers, so that external programs can use them from the
              # json output
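
Note on the autonumber change in prepare_filename: the new expression offsets the running download counter by the configured start value. The standalone sketch below reproduces only that arithmetic. The function name and its keyword defaults are assumptions made for illustration, and the real code reads the values from self.params and self._num_downloads.

```python
def autonumber_sketch(num_downloads, autonumber_start=1, autonumber_size=5):
    """Hypothetical stand-in for the template value computed in the diff."""
    # Zero-padded integer template, e.g. '%05d' for a size of 5.
    template = '%0' + str(autonumber_size) + 'd'
    # The first download (num_downloads == 1) gets exactly autonumber_start.
    return template % (autonumber_start - 1 + num_downloads)


first_default = autonumber_sketch(1)
first_from_ten = autonumber_sketch(1, autonumber_start=10)
third_from_ten = autonumber_sketch(3, autonumber_start=10, autonumber_size=3)
```

With the default start of 1 the result equals the old behaviour, which is why existing output templates are unaffected.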
diff --git a/youtube_dl/__init__.py b/youtube_dl/__init__.py

index 643393558b2d1c19c71fcbd25f3c60f5465912aa..2b156342ad3e9b81ed43618dd65b30216ed16b35 100644 (file)
--- a/youtube_dl/__init__.py
+++ b/youtube_dl/__init__.py
@@ -95,8 +95,7 @@ def _real_main(argv=None):
                  write_string('[debug] Batch file urls: ' + repr(batch_urls) + '\n')
          except IOError:
              sys.exit('ERROR: batch file could not be read')
-    all_urls = batch_urls + args
-    all_urls = [url.strip() for url in all_urls]
+    all_urls = batch_urls + [url.strip() for url in args]  # batch_urls are already stripped in read_batch_urls
      _enc = preferredencoding()
      all_urls = [url.decode(_enc, 'ignore') if isinstance(url, bytes) else url for url in all_urls]
  
@@ -134,6 +133,12 @@ def _real_main(argv=None):
          parser.error('TV Provider account username missing\n')
      if opts.outtmpl is not None and (opts.usetitle or opts.autonumber or opts.useid):
          parser.error('using output template conflicts with using title, video ID or auto number')
+    if opts.autonumber_size is not None:
+        if opts.autonumber_size <= 0:
+            parser.error('auto number size must be positive')
+    if opts.autonumber_start is not None:
+        if opts.autonumber_start < 0:
+            parser.error('auto number start must be positive or 0')
      if opts.usetitle and opts.useid:
          parser.error('using title conflicts with using video ID')
      if opts.username is not None and opts.password is None:
@@ -322,6 +327,7 @@ def parse_retries(retries):
          'listformats': opts.listformats,
          'outtmpl': outtmpl,
          'autonumber_size': opts.autonumber_size,
+        'autonumber_start': opts.autonumber_start,
          'restrictfilenames': opts.restrictfilenames,
          'ignoreerrors': opts.ignoreerrors,
          'force_generic_extractor': opts.force_generic_extractor,
@@ -406,7 +412,7 @@ def parse_retries(retries):
          'postprocessor_args': postprocessor_args,
          'cn_verification_proxy': opts.cn_verification_proxy,
          'geo_verification_proxy': opts.geo_verification_proxy,
-
+        'config_location': opts.config_location,
      }
  
      with YoutubeDL(ydl_opts) as ydl:
@@ -450,4 +456,5 @@ def main(argv=None):
      except KeyboardInterrupt:
          sys.exit('\nERROR: Interrupted by user')
  
+
  __all__ = ['main', 'YoutubeDL', 'gen_extractors', 'list_extractors']
diff --git a/youtube_dl/aes.py b/youtube_dl/aes.py

index a01c367de4f6cf5e6f9ce4d9b86de4991fa859dc..b8ff4548116403dc5166825250fedad65c20f665 100644 (file)
--- a/youtube_dl/aes.py
+++ b/youtube_dl/aes.py
@@ -174,6 +174,7 @@ def next_value(self):
  
      return plaintext
  
+
  RCON = (0x8d, 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36)
  SBOX = (0x63, 0x7C, 0x77, 0x7B, 0xF2, 0x6B, 0x6F, 0xC5, 0x30, 0x01, 0x67, 0x2B, 0xFE, 0xD7, 0xAB, 0x76,
          0xCA, 0x82, 0xC9, 0x7D, 0xFA, 0x59, 0x47, 0xF0, 0xAD, 0xD4, 0xA2, 0xAF, 0x9C, 0xA4, 0x72, 0xC0,
@@ -328,4 +329,5 @@ def inc(data):
              break
      return data
  
+
  __all__ = ['aes_encrypt', 'key_expansion', 'aes_ctr_decrypt', 'aes_cbc_decrypt', 'aes_decrypt_text']
diff --git a/youtube_dl/compat.py b/youtube_dl/compat.py

index b8aaf5a461c9e3ca2884c748ebb3225a2fd9fe29..7189020192601c289f47eafbda40feefd14cde6c 100644 (file)
--- a/youtube_dl/compat.py
+++ b/youtube_dl/compat.py
@@ -2344,7 +2344,7 @@
      from urllib.parse import unquote_plus as compat_urllib_parse_unquote_plus
  except ImportError:  # Python 2
      _asciire = (compat_urllib_parse._asciire if hasattr(compat_urllib_parse, '_asciire')
-                else re.compile('([\x00-\x7f]+)'))
+                else re.compile(r'([\x00-\x7f]+)'))
  
      # HACK: The following are the correct unquote_to_bytes, unquote and unquote_plus
      # implementations from cpython 3.4.3's stdlib. Python 2's version
@@ -2491,6 +2491,7 @@ class _TreeBuilder(etree.TreeBuilder):
      def doctype(self, name, pubid, system):
          pass
  
+
  if sys.version_info[0] >= 3:
      def compat_etree_fromstring(text):
          return etree.XML(text, parser=etree.XMLParser(target=_TreeBuilder()))
@@ -2528,6 +2529,24 @@ def compat_etree_fromstring(text):
                  el.text = el.text.decode('utf-8')
          return doc
  
+if hasattr(etree, 'register_namespace'):
+    compat_etree_register_namespace = etree.register_namespace
+else:
+    def compat_etree_register_namespace(prefix, uri):
+        """Register a namespace prefix.
+        The registry is global, and any existing mapping for either the
+        given prefix or the namespace URI will be removed.
+        *prefix* is the namespace prefix, *uri* is a namespace uri. Tags and
+        attributes in this namespace will be serialized with prefix if possible.
+        ValueError is raised if prefix is reserved or is invalid.
+        """
+        if re.match(r"ns\d+$", prefix):
+            raise ValueError("Prefix format reserved for internal use")
+        for k, v in list(etree._namespace_map.items()):
+            if k == uri or v == prefix:
+                del etree._namespace_map[k]
+        etree._namespace_map[uri] = prefix
+
  if sys.version_info < (2, 7):
      # Here comes the crazy part: In 2.6, if the xpath is a unicode,
      # .//node does not match if a node is a direct child of . !
@@ -2787,6 +2806,7 @@ def _compat_add_option(self, *args, **kwargs):
              return real_add_option(self, *bargs, **bkwargs)
          optparse.OptionGroup.add_option = _compat_add_option
  
+
  if hasattr(shutil, 'get_terminal_size'):  # Python >= 3.3
      compat_get_terminal_size = shutil.get_terminal_size
  else:
@@ -2863,6 +2883,7 @@ def compat_struct_unpack(spec, *args):
      'compat_cookiejar',
      'compat_cookies',
      'compat_etree_fromstring',
+    'compat_etree_register_namespace',
      'compat_expanduser',
      'compat_get_terminal_size',
      'compat_getenv',
diff --git a/youtube_dl/downloader/external.py b/youtube_dl/downloader/external.py

index 0aeae3b8f4f0f2fc153f7b3900f828618b224be0..138f353efcde9481ac2b04d01973522cbaedf756 100644 (file)
--- a/youtube_dl/downloader/external.py
+++ b/youtube_dl/downloader/external.py
@@ -17,6 +17,7 @@
      encodeArgument,
      handle_youtubedl_headers,
      check_executable,
+    is_outdated_version,
  )
  
  
@@ -264,7 +265,9 @@ def _call_downloader(self, tmpfilename, info_dict):
              if self.params.get('hls_use_mpegts', False) or tmpfilename == '-':
                  args += ['-f', 'mpegts']
              else:
-                args += ['-f', 'mp4', '-bsf:a', 'aac_adtstoasc']
+                args += ['-f', 'mp4']
+                if (ffpp.basename == 'ffmpeg' and is_outdated_version(ffpp._versions['ffmpeg'], '3.2')) and (not info_dict.get('acodec') or info_dict['acodec'].split('.')[0] in ('aac', 'mp4a')):
+                    args += ['-bsf:a', 'aac_adtstoasc']
          elif protocol == 'rtmp':
              args += ['-f', 'flv']
          else:
@@ -293,6 +296,7 @@ def _call_downloader(self, tmpfilename, info_dict):
  class AVconvFD(FFmpegFD):
      pass
  
+
  _BY_NAME = dict(
      (klass.get_basename(), klass)
      for name, klass in globals().items()
diff --git a/youtube_dl/downloader/f4m.py b/youtube_dl/downloader/f4m.py

index 80c21d40bc88382a64634b0eeb9daa3eaaccc303..688e086eb0536c55ef184ae68fa09a6ffb41462d 100644 (file)
--- a/youtube_dl/downloader/f4m.py
+++ b/youtube_dl/downloader/f4m.py
@@ -314,7 +314,8 @@ def real_download(self, filename, info_dict):
          man_url = info_dict['url']
          requested_bitrate = info_dict.get('tbr')
          self.to_screen('[%s] Downloading f4m manifest' % self.FD_NAME)
-        urlh = self.ydl.urlopen(man_url)
+
+        urlh = self.ydl.urlopen(self._prepare_url(info_dict, man_url))
          man_url = urlh.geturl()
          # Some manifests may be malformed, e.g. prosiebensat1 generated manifests
          # (see https://github.com/rg3/youtube-dl/issues/6215#issuecomment-121704244
@@ -387,7 +388,10 @@ def real_download(self, filename, info_dict):
              url_parsed = base_url_parsed._replace(path=base_url_parsed.path + name, query='&'.join(query))
              frag_filename = '%s-%s' % (ctx['tmpfilename'], name)
              try:
-                success = ctx['dl'].download(frag_filename, {'url': url_parsed.geturl()})
+                success = ctx['dl'].download(frag_filename, {
+                    'url': url_parsed.geturl(),
+                    'http_headers': info_dict.get('http_headers'),
+                })
                  if not success:
                      return False
                  (down, frag_sanitized) = sanitize_open(frag_filename, 'rb')
diff --git a/youtube_dl/downloader/fragment.py b/youtube_dl/downloader/fragment.py

index 84aacf7db6b839d6bf52f6254b58f1323822290b..60df627a65dfc589899f009fa5df9ce76a441ae5 100644 (file)
--- a/youtube_dl/downloader/fragment.py
+++ b/youtube_dl/downloader/fragment.py
@@ -9,6 +9,7 @@
      error_to_compat_str,
      encodeFilename,
      sanitize_open,
+    sanitized_Request,
  )
  
  
@@ -37,6 +38,10 @@ def report_retry_fragment(self, err, fragment_name, count, retries):
      def report_skip_fragment(self, fragment_name):
          self.to_screen('[download] Skipping fragment %s...' % fragment_name)
  
+    def _prepare_url(self, info_dict, url):
+        headers = info_dict.get('http_headers')
+        return sanitized_Request(url, None, headers) if headers else url
+
      def _prepare_and_start_frag_download(self, ctx):
          self._prepare_frag_download(ctx)
          self._start_frag_download(ctx)
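
Note on _prepare_url: the helper wraps the URL in a request object only when the info dict carries HTTP headers, and otherwise passes the plain string through. The sketch below shows the same pattern with the standard library. urllib.request.Request stands in for youtube-dl's sanitized_Request here, which is an assumption for illustration, and the example URL and header are made up.

```python
from urllib.request import Request


def prepare_url_sketch(info_dict, url):
    """Hypothetical stdlib-only version of the pattern used in the diff."""
    headers = info_dict.get('http_headers')
    # Without headers the plain URL string is enough for urlopen().
    return Request(url, None, headers) if headers else url


plain = prepare_url_sketch({}, 'http://example.com/manifest.m3u8')
wrapped = prepare_url_sketch(
    {'http_headers': {'Referer': 'http://example.com/'}},
    'http://example.com/manifest.m3u8')
```

Both return types are accepted by urlopen-style callers, so the manifest download call sites do not need to branch.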
diff --git a/youtube_dl/downloader/hls.py b/youtube_dl/downloader/hls.py

index 541b92ee122261f8230ede54e57c07b68dc40cac..4989abce12ee236e5c528778e5b95f67d92e165e 100644 (file)
--- a/youtube_dl/downloader/hls.py
+++ b/youtube_dl/downloader/hls.py
@@ -59,11 +59,15 @@ def can_download(manifest, info_dict):
      def real_download(self, filename, info_dict):
          man_url = info_dict['url']
          self.to_screen('[%s] Downloading m3u8 manifest' % self.FD_NAME)
-        manifest = self.ydl.urlopen(man_url).read()
+
+        manifest = self.ydl.urlopen(self._prepare_url(info_dict, man_url)).read()
  
          s = manifest.decode('utf-8', 'ignore')
  
          if not self.can_download(s, info_dict):
+            if info_dict.get('extra_param_to_segment_url'):
+                self.report_error('pycrypto not found. Please install it.')
+                return False
              self.report_warning(
                  'hlsnative has detected features it does not support, '
                  'extraction will be delegated to ffmpeg')
@@ -112,7 +116,10 @@ def real_download(self, filename, info_dict):
                      count = 0
                      while count <= fragment_retries:
                          try:
-                            success = ctx['dl'].download(frag_filename, {'url': frag_url})
+                            success = ctx['dl'].download(frag_filename, {
+                                'url': frag_url,
+                                'http_headers': info_dict.get('http_headers'),
+                            })
                              if not success:
                                  return False
                              down, frag_sanitized = sanitize_open(frag_filename, 'rb')
diff --git a/youtube_dl/extractor/abcnews.py b/youtube_dl/extractor/abcnews.py

index 6ae5d9a96ac6919ab1ea1ae906bf510018d5578b..4f56c4c11935ee85a9412a39de20138bb83cc33d 100644 (file)
--- a/youtube_dl/extractor/abcnews.py
+++ b/youtube_dl/extractor/abcnews.py
@@ -23,7 +23,7 @@ class AbcNewsVideoIE(AMPIE):
              'title': '\'This Week\' Exclusive: Iran\'s Foreign Minister Zarif',
              'description': 'George Stephanopoulos goes one-on-one with Iranian Foreign Minister Dr. Javad Zarif.',
              'duration': 180,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
@@ -59,7 +59,7 @@ class AbcNewsIE(InfoExtractor):
              'display_id': 'dramatic-video-rare-death-job-america',
              'title': 'Occupational Hazards',
              'description': 'Nightline investigates the dangers that lurk at various jobs.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20100428',
              'timestamp': 1272412800,
          },
diff --git a/youtube_dl/extractor/abcotvs.py b/youtube_dl/extractor/abcotvs.py

index 054bb05964910c3d521eb6615661c1843239290f..76e98132b9d18514e54ed37e11df61089a75678c 100644 (file)
--- a/youtube_dl/extractor/abcotvs.py
+++ b/youtube_dl/extractor/abcotvs.py
@@ -23,7 +23,7 @@ class ABCOTVSIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'East Bay museum celebrates vintage synthesizers',
                  'description': 'md5:a4f10fb2f2a02565c1749d4adbab4b10',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': 1421123075,
                  'upload_date': '20150113',
                  'uploader': 'Jonathan Bloom',
diff --git a/youtube_dl/extractor/acast.py b/youtube_dl/extractor/acast.py

index 94ce88c834f5ce1575b36f839f02fdf43f96e046..6dace305141423ec35d25abe3c4283476a762b2f 100644 (file)
--- a/youtube_dl/extractor/acast.py
+++ b/youtube_dl/extractor/acast.py
@@ -8,6 +8,7 @@
  from ..compat import compat_str
  from ..utils import (
      int_or_none,
+    parse_iso8601,
      OnDemandPagedList,
  )
  
@@ -15,18 +16,33 @@
  class ACastIE(InfoExtractor):
      IE_NAME = 'acast'
      _VALID_URL = r'https?://(?:www\.)?acast\.com/(?P<channel>[^/]+)/(?P<id>[^/#?]+)'
-    _TEST = {
+    _TESTS = [{
+        # test with one bling
          'url': 'https://www.acast.com/condenasttraveler/-where-are-you-taipei-101-taiwan',
          'md5': 'ada3de5a1e3a2a381327d749854788bb',
          'info_dict': {
              'id': '57de3baa-4bb0-487e-9418-2692c1277a34',
              'ext': 'mp3',
              'title': '"Where Are You?": Taipei 101, Taiwan',
-            'timestamp': 1196172000000,
+            'timestamp': 1196172000,
+            'upload_date': '20071127',
              'description': 'md5:a0b4ef3634e63866b542e5b1199a1a0e',
              'duration': 211,
          }
-    }
+    }, {
+        # test with multiple blings
+        'url': 'https://www.acast.com/sparpodcast/2.raggarmordet-rosterurdetforflutna',
+        'md5': '55c0097badd7095f494c99a172f86501',
+        'info_dict': {
+            'id': '2a92b283-1a75-4ad8-8396-499c641de0d9',
+            'ext': 'mp3',
+            'title': '2. Raggarmordet - Röster ur det förflutna',
+            'timestamp': 1477346700,
+            'upload_date': '20161024',
+            'description': 'md5:4f81f6d8cf2e12ee21a321d8bca32db4',
+            'duration': 2797,
+        }
+    }]
  
      def _real_extract(self, url):
          channel, display_id = re.match(self._VALID_URL, url).groups()
@@ -35,11 +51,11 @@ def _real_extract(self, url):
          return {
              'id': compat_str(cast_data['id']),
              'display_id': display_id,
-            'url': cast_data['blings'][0]['audio'],
+            'url': [b['audio'] for b in cast_data['blings'] if b['type'] == 'BlingAudio'][0],
              'title': cast_data['name'],
              'description': cast_data.get('description'),
              'thumbnail': cast_data.get('image'),
-            'timestamp': int_or_none(cast_data.get('publishingDate')),
+            'timestamp': parse_iso8601(cast_data.get('publishingDate')),
              'duration': int_or_none(cast_data.get('duration')),
          }
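
Note on the ACast changes: the old test expected a timestamp of 1196172000000, which is a millisecond value, while the info dict timestamp is in seconds and upload_date is derived from it. The sketch below checks the two new test values with the standard library and shows one way to pick the first audio bling. Both function names and the sample bling data are hypothetical. Unlike the list-comprehension-plus-[0] form in the diff, this variant returns None instead of raising when no audio bling exists.

```python
from datetime import datetime, timezone


def upload_date_sketch(timestamp):
    """Format a Unix timestamp in seconds as a UTC YYYYMMDD string."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime('%Y%m%d')


def first_audio_url_sketch(blings):
    """Return the audio URL of the first 'BlingAudio' entry, or None."""
    for bling in blings:
        if bling.get('type') == 'BlingAudio':
            return bling.get('audio')
    return None


# Made-up sample data; only the 'BlingAudio' type name comes from the diff.
sample_blings = [
    {'type': 'SomeOtherBling'},
    {'type': 'BlingAudio', 'audio': 'https://example.com/episode.mp3'},
]
audio_url = first_audio_url_sketch(sample_blings)
```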
  
diff --git a/youtube_dl/extractor/adobetv.py b/youtube_dl/extractor/adobetv.py

index 5ae16fa16809b557e74e133a4a7811d396b1c2c2..008c98e51ead3ffcad7bb350fcf928a945b91e35 100644 (file)
--- a/youtube_dl/extractor/adobetv.py
+++ b/youtube_dl/extractor/adobetv.py
@@ -30,7 +30,7 @@ class AdobeTVIE(AdobeTVBaseIE):
              'ext': 'mp4',
              'title': 'Quick Tip - How to Draw a Circle Around an Object in Photoshop',
              'description': 'md5:99ec318dc909d7ba2a1f2b038f7d2311',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'upload_date': '20110914',
              'duration': 60,
              'view_count': int,
diff --git a/youtube_dl/extractor/aenetworks.py b/youtube_dl/extractor/aenetworks.py

index 6adb6d824c00ec733afaf1bbe1b243f7d623b647..c97317400ea1f660674d34ca22f4177365198d59 100644 (file)
--- a/youtube_dl/extractor/aenetworks.py
+++ b/youtube_dl/extractor/aenetworks.py
@@ -26,7 +26,7 @@ class AENetworksIE(AENetworksBaseIE):
      _VALID_URL = r'https?://(?:www\.)?(?P<domain>(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
      _TESTS = [{
          'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
-        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
+        'md5': 'a97a65f7e823ae10e9244bc5433d5fe6',
          'info_dict': {
              'id': '22253814',
              'ext': 'mp4',
@@ -87,7 +87,7 @@ def _real_extract(self, url):
                      self._html_search_meta('aetn:SeriesTitle', webpage))
              elif url_parts_len == 2:
                  entries = []
-                for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
+                for episode_item in re.findall(r'(?s)<[^>]+class="[^"]*(?:episode|program)-item[^"]*"[^>]*>', webpage):
                      episode_attributes = extract_attributes(episode_item)
                      episode_url = compat_urlparse.urljoin(
                          url, episode_attributes['data-canonical'])
@@ -99,7 +99,7 @@ def _real_extract(self, url):
  
          query = {
              'mbr': 'true',
-            'assetTypes': 'medium_video_s3'
+            'assetTypes': 'high_video_s3'
          }
          video_id = self._html_search_meta('aetn:VideoID', webpage)
          media_url = self._search_regex(
@@ -155,7 +155,7 @@ class HistoryTopicIE(AENetworksBaseIE):
              'id': 'world-war-i-history',
              'title': 'World War I History',
          },
-        'playlist_mincount': 24,
+        'playlist_mincount': 23,
      }, {
          'url': 'http://www.history.com/topics/world-war-i-history/videos',
          'only_matching': True,
@@ -193,7 +193,8 @@ def _real_extract(self, url):
              return self.theplatform_url_result(
                  release_url, video_id, {
                      'mbr': 'true',
-                    'switch': 'hls'
+                    'switch': 'hls',
+                    'assetTypes': 'high_video_ak',
                  })
          else:
              webpage = self._download_webpage(url, topic_id)
@@ -203,6 +204,7 @@ def _real_extract(self, url):
                  entries.append(self.theplatform_url_result(
                      video_attributes['data-release-url'], video_attributes['data-id'], {
                          'mbr': 'true',
-                        'switch': 'hls'
+                        'switch': 'hls',
+                        'assetTypes': 'high_video_ak',
                      }))
              return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))
diff --git a/youtube_dl/extractor/afreecatv.py b/youtube_dl/extractor/afreecatv.py

index 75b36699363609876c755d4c120ec195aa81ec3a..4f6cdb8a2a66a3941da6ce4dafa4b656d6628495 100644 (file)
--- a/youtube_dl/extractor/afreecatv.py
+++ b/youtube_dl/extractor/afreecatv.py
@@ -18,6 +18,7 @@
  
  
  class AfreecaTVIE(InfoExtractor):
+    IE_NAME = 'afreecatv'
      IE_DESC = 'afreecatv.com'
      _VALID_URL = r'''(?x)
                      https?://
@@ -143,3 +144,94 @@ def _real_extract(self, url):
                  expected=True)
  
          return info
+
+
+class AfreecaTVGlobalIE(AfreecaTVIE):
+    IE_NAME = 'afreecatv:global'
+    _VALID_URL = r'https?://(?:www\.)?afreeca\.tv/(?P<channel_id>\d+)(?:/v/(?P<video_id>\d+))?'
+    _TESTS = [{
+        'url': 'http://afreeca.tv/36853014/v/58301',
+        'info_dict': {
+            'id': '58301',
+            'title': 'tryhard top100',
+            'uploader_id': '36853014',
+            'uploader': 'makgi Hearthstone Live!',
+        },
+        'playlist_count': 3,
+    }]
+
+    def _real_extract(self, url):
+        channel_id, video_id = re.match(self._VALID_URL, url).groups()
+        video_type = 'video' if video_id else 'live'
+        query = {
+            'pt': 'view',
+            'bid': channel_id,
+        }
+        if video_id:
+            query['vno'] = video_id
+        video_data = self._download_json(
+            'http://api.afreeca.tv/%s/view_%s.php' % (video_type, video_type),
+            video_id or channel_id, query=query)['channel']
+
+        if video_data.get('result') != 1:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, video_data['remsg']))
+
+        title = video_data['title']
+
+        info = {
+            'thumbnail': video_data.get('thumb'),
+            'view_count': int_or_none(video_data.get('vcnt')),
+            'age_limit': int_or_none(video_data.get('grade')),
+            'uploader_id': channel_id,
+            'uploader': video_data.get('cname'),
+        }
+
+        if video_id:
+            entries = []
+            for i, f in enumerate(video_data.get('flist', [])):
+                video_key = self.parse_video_key(f.get('key', ''))
+                f_url = f.get('file')
+                if not video_key or not f_url:
+                    continue
+                entries.append({
+                    'id': '%s_%s' % (video_id, video_key.get('part', i + 1)),
+                    'title': title,
+                    'upload_date': video_key.get('upload_date'),
+                    'duration': int_or_none(f.get('length')),
+                    'url': f_url,
+                    'protocol': 'm3u8_native',
+                    'ext': 'mp4',
+                })
+
+            info.update({
+                'id': video_id,
+                'title': title,
+                'duration': int_or_none(video_data.get('length')),
+            })
+            if len(entries) > 1:
+                info['_type'] = 'multi_video'
+                info['entries'] = entries
+            elif len(entries) == 1:
+                i = entries[0].copy()
+                i.update(info)
+                info = i
+        else:
+            formats = []
+            for s in video_data.get('strm', []):
+                s_url = s.get('purl')
+                if not s_url:
+                    continue
+                # TODO: extract rtmp formats
+                if s.get('stype') == 'HLS':
+                    formats.extend(self._extract_m3u8_formats(
+                        s_url, channel_id, 'mp4', fatal=False))
+            self._sort_formats(formats)
+
+            info.update({
+                'id': channel_id,
+                'title': self._live_title(title),
+                'is_live': True,
+                'formats': formats,
+            })
+
+        return info
diff --git a/youtube_dl/extractor/airmozilla.py b/youtube_dl/extractor/airmozilla.py

index f8e70f4e580746093d97e3d2d596d008ed3e6c15..0e069187994d0b9d25463d2d2f3cdb6c74ce5406 100644 (file)
--- a/youtube_dl/extractor/airmozilla.py
+++ b/youtube_dl/extractor/airmozilla.py
@@ -20,7 +20,7 @@ class AirMozillaIE(InfoExtractor):
              'id': '6x4q2w',
              'ext': 'mp4',
              'title': 'Privacy Lab - a meetup for privacy minded people in San Francisco',
-            'thumbnail': 're:https?://vid\.ly/(?P<id>[0-9a-z-]+)/poster',
+            'thumbnail': r're:https?://vid\.ly/(?P<id>[0-9a-z-]+)/poster',
              'description': 'Brings together privacy professionals and others interested in privacy at for-profits, non-profits, and NGOs in an effort to contribute to the state of the ecosystem...',
              'timestamp': 1422487800,
              'upload_date': '20150128',
diff --git a/youtube_dl/extractor/allocine.py b/youtube_dl/extractor/allocine.py

index 517b06def4d2ff690628eece4b1e85e647aea267..90f11d39f5393528e75ff641342cfda8c8706e0b 100644 (file)
--- a/youtube_dl/extractor/allocine.py
+++ b/youtube_dl/extractor/allocine.py
@@ -21,7 +21,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Astérix - Le Domaine des Dieux Teaser VF',
              'description': 'md5:4a754271d9c6f16c72629a8a993ee884',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/player_gen_cmedia=19540403&cfilm=222257.html',
@@ -32,7 +32,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Planes 2 Bande-annonce VF',
              'description': 'Regardez la bande annonce du film Planes 2 (Planes 2 Bande-annonce VF). Planes 2, un film de Roberts Gannaway',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/player_gen_cmedia=19544709&cfilm=181290.html',
@@ -43,7 +43,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Dragons 2 - Bande annonce finale VF',
              'description': 'md5:6cdd2d7c2687d4c6aafe80a35e17267a',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.allocine.fr/video/video-19550147/',
@@ -53,7 +53,7 @@ class AllocineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Faux Raccord N°123 - Les gaffes de Cliffhanger',
              'description': 'md5:bc734b83ffa2d8a12188d9eb48bb6354',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/alphaporno.py b/youtube_dl/extractor/alphaporno.py
index c34719d1fefb6afd45e8775c3683913799380714..3a6d99f6bfd050e7c204a08fe32fbf230a8ff694 100644
--- a/youtube_dl/extractor/alphaporno.py
+++ b/youtube_dl/extractor/alphaporno.py
@@ -19,7 +19,7 @@ class AlphaPornoIE(InfoExtractor):
              'display_id': 'sensual-striptease-porn-with-samantha-alexandra',
              'ext': 'mp4',
              'title': 'Sensual striptease porn with Samantha Alexandra',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'timestamp': 1418694611,
              'upload_date': '20141216',
              'duration': 387,
diff --git a/youtube_dl/extractor/amcnetworks.py b/youtube_dl/extractor/amcnetworks.py
index d2b03b177c1fd46c88552d0355365d2fae7772c9..87c803e948fd2e04cde6b0b43251d3f804b952a0 100644
--- a/youtube_dl/extractor/amcnetworks.py
+++ b/youtube_dl/extractor/amcnetworks.py
@@ -10,7 +10,7 @@
  
  
  class AMCNetworksIE(ThePlatformIE):
-    _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?season-\d+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:amc|bbcamerica|ifc|wetv)\.com/(?:movies/|shows/[^/]+/(?:full-episodes/)?[^/]+/episode-\d+(?:-(?:[^/]+/)?|/))(?P<id>[^/?#]+)'
      _TESTS = [{
          'url': 'http://www.ifc.com/shows/maron/season-04/episode-01/step-1',
          'md5': '',
@@ -41,6 +41,9 @@ class AMCNetworksIE(ThePlatformIE):
      }, {
          'url': 'http://www.ifc.com/movies/chaos',
          'only_matching': True,
+    }, {
+        'url': 'http://www.bbcamerica.com/shows/doctor-who/full-episodes/the-power-of-the-daleks/episode-01-episode-1-color-version',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/aol.py b/youtube_dl/extractor/aol.py
index 2cdee33200232dc69c1755213fc2f8298c6c8fa3..b50f454ee0ca661aa3cf93ab05121bf8857eeca8 100644
--- a/youtube_dl/extractor/aol.py
+++ b/youtube_dl/extractor/aol.py
@@ -12,7 +12,7 @@
  
  class AolIE(InfoExtractor):
      IE_NAME = 'on.aol.com'
-    _VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
+    _VALID_URL = r'(?:aol-video:|https?://(?:(?:www|on)\.)?aol\.com/(?:[^/]+/)*(?:[^/?#&]+-)?)(?P<id>[^/?#&]+)'
  
      _TESTS = [{
          # video with 5min ID
@@ -33,7 +33,7 @@ class AolIE(InfoExtractor):
          }
      }, {
          # video with vidible ID
-        'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
+        'url': 'http://www.aol.com/video/view/netflix-is-raising-rates/5707d6b8e4b090497b04f706/',
          'info_dict': {
              'id': '5707d6b8e4b090497b04f706',
              'ext': 'mp4',
@@ -108,30 +108,3 @@ def _real_extract(self, url):
              'uploader': video_data.get('videoOwner'),
              'formats': formats,
          }
-
-
-class AolFeaturesIE(InfoExtractor):
-    IE_NAME = 'features.aol.com'
-    _VALID_URL = r'https?://features\.aol\.com/video/(?P<id>[^/?#]+)'
-
-    _TESTS = [{
-        'url': 'http://features.aol.com/video/behind-secret-second-careers-late-night-talk-show-hosts',
-        'md5': '7db483bb0c09c85e241f84a34238cc75',
-        'info_dict': {
-            'id': '519507715',
-            'ext': 'mp4',
-            'title': 'What To Watch - February 17, 2016',
-        },
-        'add_ie': ['FiveMin'],
-        'params': {
-            # encrypted m3u8 download
-            'skip_download': True,
-        },
-    }]
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-        return self.url_result(self._search_regex(
-            r'<script type="text/javascript" src="(https?://[^/]*?5min\.com/Scripts/PlayerSeed\.js[^"]+)"',
-            webpage, '5min embed url'), 'FiveMin')
diff --git a/youtube_dl/extractor/ard.py b/youtube_dl/extractor/ard.py
index 35f3656f11d7579a1f67cd0ac6e9c06a37c44917..2d5599456688eba9756e28c2ffe9dbae48decb2c 100644
--- a/youtube_dl/extractor/ard.py
+++ b/youtube_dl/extractor/ard.py
@@ -253,7 +253,7 @@ class ARDIE(InfoExtractor):
              'duration': 2600,
              'title': 'Die Story im Ersten: Mission unter falscher Flagge',
              'upload_date': '20140804',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'skip': 'HTTP Error 404: Not Found',
      }
diff --git a/youtube_dl/extractor/arkena.py b/youtube_dl/extractor/arkena.py
index d45cae301df005f455aad3ec6aeda5ed87d5b50e..50ffb442dd051be347e2c79c2d4a11dacb9f574b 100644
--- a/youtube_dl/extractor/arkena.py
+++ b/youtube_dl/extractor/arkena.py
@@ -4,8 +4,10 @@
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urlparse
  from ..utils import (
      determine_ext,
+    ExtractorError,
      float_or_none,
      int_or_none,
      mimetype2ext,
@@ -15,7 +17,13 @@
  
  
  class ArkenaIE(InfoExtractor):
-    _VALID_URL = r'https?://play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:
+                                video\.arkena\.com/play2/embed/player\?|
+                                play\.arkena\.com/(?:config|embed)/avp/v\d/player/media/(?P<id>[^/]+)/[^/]+/(?P<account_id>\d+)
+                            )
+                        '''
      _TESTS = [{
          'url': 'https://play.arkena.com/embed/avp/v2/player/media/b41dda37-d8e7-4d3f-b1b5-9a9db578bdfe/1/129411',
          'md5': 'b96f2f71b359a8ecd05ce4e1daa72365',
@@ -37,6 +45,9 @@ class ArkenaIE(InfoExtractor):
      }, {
          'url': 'http://play.arkena.com/embed/avp/v1/player/media/327336/darkmatter/131064/',
          'only_matching': True,
+    }, {
+        'url': 'http://video.arkena.com/play2/embed/player?accountId=472718&mediaId=35763b3b-00090078-bf604299&pageStyling=styled',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -53,6 +64,14 @@ def _real_extract(self, url):
          video_id = mobj.group('id')
          account_id = mobj.group('account_id')
  
+        # Handle http://video.arkena.com/play2/embed/player URL
+        if not video_id:
+            qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
+            video_id = qs.get('mediaId', [None])[0]
+            account_id = qs.get('accountId', [None])[0]
+            if not video_id or not account_id:
+                raise ExtractorError('Invalid URL', expected=True)
+
          playlist = self._download_json(
              'https://play.arkena.com/config/avp/v2/player/media/%s/0/%s/?callbackMethod=_'
              % (video_id, account_id),
diff --git a/youtube_dl/extractor/atresplayer.py b/youtube_dl/extractor/atresplayer.py
index d2f3889645f9b9324deb0eda00d4f6b67ab32dc1..e3c669830343bb4f698dc342adebbd764877fd4b 100644
--- a/youtube_dl/extractor/atresplayer.py
+++ b/youtube_dl/extractor/atresplayer.py
@@ -30,7 +30,7 @@ class AtresPlayerIE(InfoExtractor):
                  'title': 'Especial Solidario de Nochebuena',
                  'description': 'md5:e2d52ff12214fa937107d21064075bf1',
                  'duration': 5527.6,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'This video is only available for registered users'
          },
@@ -43,7 +43,7 @@ class AtresPlayerIE(InfoExtractor):
                  'title': 'David Bustamante',
                  'description': 'md5:f33f1c0a05be57f6708d4dd83a3b81c6',
                  'duration': 1439.0,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/atttechchannel.py b/youtube_dl/extractor/atttechchannel.py
index b01d35bb24bb45ee76741a672926372720b81ede..8f93fb353471dbd6516a540b0621d7c292720e05 100644
--- a/youtube_dl/extractor/atttechchannel.py
+++ b/youtube_dl/extractor/atttechchannel.py
@@ -14,7 +14,7 @@ class ATTTechChannelIE(InfoExtractor):
              'ext': 'flv',
              'title': 'AT&T Archives : The UNIX System: Making Computers Easier to Use',
              'description': 'A 1982 film about UNIX is the foundation for software in use around Bell Labs and AT&T.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140127',
          },
          'params': {
diff --git a/youtube_dl/extractor/audioboom.py b/youtube_dl/extractor/audioboom.py
index d7d1c6306443b77dd7161b3c07480ad16c14ffa5..8fc5f65c67a94417498fd4480e87d50ba49c4766 100644
--- a/youtube_dl/extractor/audioboom.py
+++ b/youtube_dl/extractor/audioboom.py
@@ -17,7 +17,7 @@ class AudioBoomIE(InfoExtractor):
              'description': 'Guest:   Nate Davis - NFL free agency,   Guest:   Stan Gans',
              'duration': 2245.72,
              'uploader': 'Steve Czaban',
-            'uploader_url': 're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
+            'uploader_url': r're:https?://(?:www\.)?audioboom\.com/channel/steveczabanyahoosportsradio',
          }
      }, {
          'url': 'https://audioboom.com/posts/4279833-3-09-2016-czaban-hour-3?t=0',
diff --git a/youtube_dl/extractor/azmedien.py b/youtube_dl/extractor/azmedien.py
new file mode 100644
index 0000000..cbc3ed5
--- /dev/null
+++ b/youtube_dl/extractor/azmedien.py
@@ -0,0 +1,172 @@
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .kaltura import KalturaIE
+from ..utils import (
+    get_element_by_id,
+    strip_or_none,
+    urljoin,
+)
+
+
+class AZMedienBaseIE(InfoExtractor):
+    def _kaltura_video(self, partner_id, entry_id):
+        return self.url_result(
+            'kaltura:%s:%s' % (partner_id, entry_id), ie=KalturaIE.ie_key(),
+            video_id=entry_id)
+
+
+class AZMedienIE(AZMedienBaseIE):
+    IE_DESC = 'AZ Medien videos'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            telezueri\.ch|
+                            telebaern\.tv|
+                            telem1\.ch
+                        )/
+                        [0-9]+-show-[^/\#]+
+                        (?:
+                            /[0-9]+-episode-[^/\#]+
+                            (?:
+                                /[0-9]+-segment-(?:[^/\#]+\#)?|
+                                \#
+                            )|
+                            \#
+                        )
+                        (?P<id>[^\#]+)
+                    '''
+
+    _TESTS = [{
+        # URL with 'segment'
+        'url': 'http://www.telezueri.ch/62-show-zuerinews/13772-episode-sonntag-18-dezember-2016/32419-segment-massenabweisungen-beim-hiltl-club-wegen-pelzboom',
+        'info_dict': {
+            'id': '1_2444peh4',
+            'ext': 'mov',
+            'title': 'Massenabweisungen beim Hiltl Club wegen Pelzboom',
+            'description': 'md5:9ea9dd1b159ad65b36ddcf7f0d7c76a8',
+            'uploader_id': 'TeleZ?ri',
+            'upload_date': '20161218',
+            'timestamp': 1482084490,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # URL with 'segment' and fragment:
+        'url': 'http://www.telebaern.tv/118-show-news/14240-episode-dienstag-17-januar-2017/33666-segment-achtung-gefahr#zu-wenig-pflegerinnen-und-pfleger',
+        'only_matching': True
+    }, {
+        # URL with 'episode' and fragment:
+        'url': 'http://www.telem1.ch/47-show-sonntalk/13986-episode-soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz#soldaten-fuer-grenzschutz-energiestrategie-obama-bilanz',
+        'only_matching': True
+    }, {
+        # URL with 'show' and fragment:
+        'url': 'http://www.telezueri.ch/66-show-sonntalk#burka-plakate-trump-putin-china-besuch',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        partner_id = self._search_regex(
+            r'<script[^>]+src=["\'](?:https?:)?//(?:[^/]+\.)?kaltura\.com(?:/[^/]+)*/(?:p|partner_id)/([0-9]+)',
+            webpage, 'kaltura partner id')
+        entry_id = self._html_search_regex(
+            r'<a[^>]+data-id=(["\'])(?P<id>(?:(?!\1).)+)\1[^>]+data-slug=["\']%s'
+            % re.escape(video_id), webpage, 'kaltura entry id', group='id')
+
+        return self._kaltura_video(partner_id, entry_id)
+
+
+class AZMedienPlaylistIE(AZMedienBaseIE):
+    IE_DESC = 'AZ Medien playlists'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?
+                        (?:
+                            telezueri\.ch|
+                            telebaern\.tv|
+                            telem1\.ch
+                        )/
+                        (?P<id>[0-9]+-
+                            (?:
+                                show|
+                                topic|
+                                themen
+                            )-[^/\#]+
+                            (?:
+                                /[0-9]+-episode-[^/\#]+
+                            )?
+                        )$
+                    '''
+
+    _TESTS = [{
+        # URL with 'episode'
+        'url': 'http://www.telebaern.tv/118-show-news/13735-episode-donnerstag-15-dezember-2016',
+        'info_dict': {
+            'id': '118-show-news/13735-episode-donnerstag-15-dezember-2016',
+            'title': 'News - Donnerstag, 15. Dezember 2016',
+        },
+        'playlist_count': 9,
+    }, {
+        # URL with 'themen'
+        'url': 'http://www.telem1.ch/258-themen-tele-m1-classics',
+        'info_dict': {
+            'id': '258-themen-tele-m1-classics',
+            'title': 'Tele M1 Classics',
+        },
+        'playlist_mincount': 15,
+    }, {
+        # URL with 'topic', contains nested playlists
+        'url': 'http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen',
+        'only_matching': True,
+    }, {
+        # URL with 'show' only
+        'url': 'http://www.telezueri.ch/86-show-talktaeglich',
+        'only_matching': True
+    }]
+
+    def _real_extract(self, url):
+        show_id = self._match_id(url)
+        webpage = self._download_webpage(url, show_id)
+
+        entries = []
+
+        partner_id = self._search_regex(
+            r'src=["\'](?:https?:)?//(?:[^/]+\.)kaltura\.com/(?:[^/]+/)*(?:p|partner_id)/(\d+)',
+            webpage, 'kaltura partner id', default=None)
+
+        if partner_id:
+            entries = [
+                self._kaltura_video(partner_id, m.group('id'))
+                for m in re.finditer(
+                    r'data-id=(["\'])(?P<id>(?:(?!\1).)+)\1', webpage)]
+
+        if not entries:
+            entries = [
+                self.url_result(m.group('url'), ie=AZMedienIE.ie_key())
+                for m in re.finditer(
+                    r'<a[^>]+data-real=(["\'])(?P<url>http.+?)\1', webpage)]
+
+        if not entries:
+            entries = [
+                # May contain nested playlists (e.g. [1]) thus no explicit
+                # ie_key
+                # 1. http://www.telezueri.ch/219-topic-aera-trump-hat-offiziell-begonnen
+                self.url_result(urljoin(url, m.group('url')))
+                for m in re.finditer(
+                    r'<a[^>]+name=[^>]+href=(["\'])(?P<url>/.+?)\1', webpage)]
+
+        title = self._search_regex(
+            r'episodeShareTitle\s*=\s*(["\'])(?P<title>(?:(?!\1).)+)\1',
+            webpage, 'title',
+            default=strip_or_none(get_element_by_id(
+                'video-title', webpage)), group='title')
+
+        return self.playlist_result(entries, show_id, title)
diff --git a/youtube_dl/extractor/azubu.py b/youtube_dl/extractor/azubu.py
index 72e1bd59d28fcd4bceaa6c1453fe80d65e9ccc96..3ba2f00d39dc7836a3bfeddb8ca1da3e929e6e88 100644
--- a/youtube_dl/extractor/azubu.py
+++ b/youtube_dl/extractor/azubu.py
@@ -11,7 +11,7 @@
  
  
  class AzubuIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?azubu\.tv/[^/]+#!/play/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?azubu\.(?:tv|uol.com.br)/[^/]+#!/play/(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'http://www.azubu.tv/GSL#!/play/15575/2014-hot6-cup-last-big-match-ro8-day-1',
@@ -21,7 +21,7 @@ class AzubuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': '2014 HOT6 CUP LAST BIG MATCH Ro8 Day 1',
                  'description': 'md5:d06bdea27b8cc4388a90ad35b5c66c01',
-                'thumbnail': 're:^https?://.*\.jpe?g',
+                'thumbnail': r're:^https?://.*\.jpe?g',
                  'timestamp': 1417523507.334,
                  'upload_date': '20141202',
                  'duration': 9988.7,
@@ -38,7 +38,7 @@ class AzubuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Fnatic at Worlds 2014: Toyz - "I love Rekkles, he has amazing mechanics"',
                  'description': 'md5:4a649737b5f6c8b5c5be543e88dc62af',
-                'thumbnail': 're:^https?://.*\.jpe?g',
+                'thumbnail': r're:^https?://.*\.jpe?g',
                  'timestamp': 1410530893.320,
                  'upload_date': '20140912',
                  'duration': 172.385,
@@ -103,12 +103,15 @@ def _real_extract(self, url):
  
  
  class AzubuLiveIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?azubu\.tv/(?P<id>[^/]+)$'
+    _VALID_URL = r'https?://(?:www\.)?azubu\.(?:tv|uol.com.br)/(?P<id>[^/]+)$'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.azubu.tv/MarsTVMDLen',
          'only_matching': True,
-    }
+    }, {
+        'url': 'http://azubu.uol.com.br/adolfz',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          user = self._match_id(url)
diff --git a/youtube_dl/extractor/bandcamp.py b/youtube_dl/extractor/bandcamp.py
index 249c3d9569c440b057af9f6706109545ace32862..88c590e98388d5f6058dd71ffb97f4f0254f0c5b 100644
--- a/youtube_dl/extractor/bandcamp.py
+++ b/youtube_dl/extractor/bandcamp.py
@@ -1,7 +1,9 @@
  from __future__ import unicode_literals
  
  import json
+import random
  import re
+import time
  
  from .common import InfoExtractor
  from ..compat import (
@@ -12,6 +14,9 @@
      ExtractorError,
      float_or_none,
      int_or_none,
+    parse_filesize,
+    unescapeHTML,
+    update_url_query,
  )
  
  
@@ -81,35 +86,68 @@ def _real_extract(self, url):
              r'(?ms)var TralbumData = .*?[{,]\s*id: (?P<id>\d+),?$',
              webpage, 'video id')
  
-        download_webpage = self._download_webpage(download_link, video_id, 'Downloading free downloads page')
-        # We get the dictionary of the track from some javascript code
-        all_info = self._parse_json(self._search_regex(
-            r'(?sm)items: (.*?),$', download_webpage, 'items'), video_id)
-        info = all_info[0]
-        # We pick mp3-320 for now, until format selection can be easily implemented.
-        mp3_info = info['downloads']['mp3-320']
-        # If we try to use this url it says the link has expired
-        initial_url = mp3_info['url']
-        m_url = re.match(
-            r'(?P<server>http://(.*?)\.bandcamp\.com)/download/track\?enc=mp3-320&fsig=(?P<fsig>.*?)&id=(?P<id>.*?)&ts=(?P<ts>.*)$',
-            initial_url)
-        # We build the url we will use to get the final track url
-        # This url is build in Bandcamp in the script download_bunde_*.js
-        request_url = '%s/statdownload/track?enc=mp3-320&fsig=%s&id=%s&ts=%s&.rand=665028774616&.vrs=1' % (m_url.group('server'), m_url.group('fsig'), video_id, m_url.group('ts'))
-        final_url_webpage = self._download_webpage(request_url, video_id, 'Requesting download url')
-        # If we could correctly generate the .rand field the url would be
-        # in the "download_url" key
-        final_url = self._proto_relative_url(self._search_regex(
-            r'"retry_url":"(.+?)"', final_url_webpage, 'final video URL'), 'http:')
+        download_webpage = self._download_webpage(
+            download_link, video_id, 'Downloading free downloads page')
+
+        blob = self._parse_json(
+            self._search_regex(
+                r'data-blob=(["\'])(?P<blob>{.+?})\1', download_webpage,
+                'blob', group='blob'),
+            video_id, transform_source=unescapeHTML)
+
+        info = blob['digital_items'][0]
+
+        downloads = info['downloads']
+        track = info['title']
+
+        artist = info.get('artist')
+        title = '%s - %s' % (artist, track) if artist else track
+
+        download_formats = {}
+        for f in blob['download_formats']:
+            name, ext = f.get('name'), f.get('file_extension')
+            if all(isinstance(x, compat_str) for x in (name, ext)):
+                download_formats[name] = ext.strip('.')
+
+        formats = []
+        for format_id, f in downloads.items():
+            format_url = f.get('url')
+            if not format_url:
+                continue
+            # Stat URL generation algorithm is reverse engineered from
+            # download_*_bundle_*.js
+            stat_url = update_url_query(
+                format_url.replace('/download/', '/statdownload/'), {
+                    '.rand': int(time.time() * 1000 * random.random()),
+                })
+            format_id = f.get('encoding_name') or format_id
+            stat = self._download_json(
+                stat_url, video_id, 'Downloading %s JSON' % format_id,
+                transform_source=lambda s: s[s.index('{'):s.rindex('}') + 1],
+                fatal=False)
+            if not stat:
+                continue
+            retry_url = stat.get('retry_url')
+            if not isinstance(retry_url, compat_str):
+                continue
+            formats.append({
+                'url': self._proto_relative_url(retry_url, 'http:'),
+                'ext': download_formats.get(format_id),
+                'format_id': format_id,
+                'format_note': f.get('description'),
+                'filesize': parse_filesize(f.get('size_mb')),
+                'vcodec': 'none',
+            })
+        self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': info['title'],
-            'ext': 'mp3',
-            'vcodec': 'none',
-            'url': final_url,
+            'title': title,
              'thumbnail': info.get('thumb_url'),
              'uploader': info.get('artist'),
+            'artist': artist,
+            'track': track,
+            'formats': formats,
          }
  
  
diff --git a/youtube_dl/extractor/beampro.py b/youtube_dl/extractor/beampro.py
new file mode 100644
index 0000000..f3a9e32
--- /dev/null
+++ b/youtube_dl/extractor/beampro.py
@@ -0,0 +1,73 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    clean_html,
+    compat_str,
+    int_or_none,
+    parse_iso8601,
+    try_get,
+)
+
+
+class BeamProLiveIE(InfoExtractor):
+    IE_NAME = 'Beam:live'
+    _VALID_URL = r'https?://(?:\w+\.)?beam\.pro/(?P<id>[^/?#&]+)'
+    _RATINGS = {'family': 0, 'teen': 13, '18+': 18}
+    _TEST = {
+        'url': 'http://www.beam.pro/niterhayven',
+        'info_dict': {
+            'id': '261562',
+            'ext': 'mp4',
+            'title': 'Introducing The Witcher 3 //  The Grind Starts Now!',
+            'description': 'md5:0b161ac080f15fe05d18a07adb44a74d',
+            'thumbnail': r're:https://.*\.jpg$',
+            'timestamp': 1483477281,
+            'upload_date': '20170103',
+            'uploader': 'niterhayven',
+            'uploader_id': '373396',
+            'age_limit': 18,
+            'is_live': True,
+            'view_count': int,
+        },
+        'skip': 'niterhayven is offline',
+        'params': {
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        channel_name = self._match_id(url)
+
+        chan = self._download_json(
+            'https://beam.pro/api/v1/channels/%s' % channel_name, channel_name)
+
+        if chan.get('online') is False:
+            raise ExtractorError(
+                '{0} is offline'.format(channel_name), expected=True)
+
+        channel_id = chan['id']
+
+        formats = self._extract_m3u8_formats(
+            'https://beam.pro/api/v1/channels/%s/manifest.m3u8' % channel_id,
+            channel_name, ext='mp4', m3u8_id='hls', fatal=False)
+        self._sort_formats(formats)
+
+        user_id = chan.get('userId') or try_get(chan, lambda x: x['user']['id'])
+
+        return {
+            'id': compat_str(chan.get('id') or channel_name),
+            'title': self._live_title(chan.get('name') or channel_name),
+            'description': clean_html(chan.get('description')),
+            'thumbnail': try_get(chan, lambda x: x['thumbnail']['url'], compat_str),
+            'timestamp': parse_iso8601(chan.get('updatedAt')),
+            'uploader': chan.get('token') or try_get(
+                chan, lambda x: x['user']['username'], compat_str),
+            'uploader_id': compat_str(user_id) if user_id else None,
+            'age_limit': self._RATINGS.get(chan.get('audience')),
+            'is_live': True,
+            'view_count': int_or_none(chan.get('viewersTotal')),
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/bet.py b/youtube_dl/extractor/bet.py
index 1f8ef030380c5fb548d14cc8e944c8dad1fca900..d7ceaa85e45c7da4ffbf3909b1e00f8ffd9ac75c 100644
--- a/youtube_dl/extractor/bet.py
+++ b/youtube_dl/extractor/bet.py
@@ -17,7 +17,7 @@ class BetIE(MTVServicesInfoExtractor):
                  'description': 'President Obama urges persistence in confronting racism and bias.',
                  'duration': 1534,
                  'upload_date': '20141208',
-                'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'thumbnail': r're:(?i)^https?://.*\.jpg$',
                  'subtitles': {
                      'en': 'mincount:2',
                  }
@@ -37,7 +37,7 @@ class BetIE(MTVServicesInfoExtractor):
                  'description': 'A BET News special.',
                  'duration': 1696,
                  'upload_date': '20141125',
-                'thumbnail': 're:(?i)^https?://.*\.jpg$',
+                'thumbnail': r're:(?i)^https?://.*\.jpg$',
                  'subtitles': {
                      'en': 'mincount:2',
                  }
diff --git a/youtube_dl/extractor/bild.py b/youtube_dl/extractor/bild.py
index 1a0184861d20d7674badc042bfc44fbda6c9718b..b8dfbd42b429beb8f530076b1efdee571ec4d855 100644
--- a/youtube_dl/extractor/bild.py
+++ b/youtube_dl/extractor/bild.py
@@ -19,7 +19,7 @@ class BildIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Das können die  neuen iPads',
              'description': 'md5:a4058c4fa2a804ab59c00d7244bbf62f',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 196,
          }
      }
diff --git a/youtube_dl/extractor/bilibili.py b/youtube_dl/extractor/bilibili.py
index 2d174e6f9a81da7412cd58ac316c7b5924dcde78..80dd8382e4e8758274e3a7ba2418479ee3d2fbbc 100644
--- a/youtube_dl/extractor/bilibili.py
+++ b/youtube_dl/extractor/bilibili.py
@@ -5,19 +5,27 @@
  import re
  
  from .common import InfoExtractor
-from ..compat import compat_parse_qs
+from ..compat import (
+    compat_parse_qs,
+    compat_urlparse,
+)
  from ..utils import (
+    ExtractorError,
      int_or_none,
      float_or_none,
+    parse_iso8601,
+    smuggle_url,
+    strip_jsonp,
      unified_timestamp,
+    unsmuggle_url,
      urlencode_postdata,
  )
  
  
  class BiliBiliIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/v/)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
  
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.bilibili.tv/video/av1074402/',
          'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
          'info_dict': {
@@ -28,29 +36,65 @@ class BiliBiliIE(InfoExtractor):
              'duration': 308.315,
              'timestamp': 1398012660,
              'upload_date': '20140420',
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
              'uploader': '菊子桑',
              'uploader_id': '156160',
          },
-    }
+    }, {
+        # Tested in BiliBiliBangumiIE
+        'url': 'http://bangumi.bilibili.com/anime/1869/play#40062',
+        'only_matching': True,
+    }, {
+        'url': 'http://bangumi.bilibili.com/anime/5802/play#100643',
+        'md5': '3f721ad1e75030cc06faf73587cfec57',
+        'info_dict': {
+            'id': '100643',
+            'ext': 'mp4',
+            'title': 'CHAOS;CHILD',
+            'description': '如果你是神明，并且能够让妄想成为现实。那你会进行怎么样的妄想？是淫靡的世界？独裁社会？毁灭性的制裁？还是……2015年，涩谷。从6年前发生的大灾害“涩谷地震”之后复兴了的这个街区里新设立的私立高中...',
+        },
+        'skip': 'Geo-restricted to China',
+    }]
+
+    _APP_KEY = '84956560bc028eb7'
+    _BILIBILI_KEY = '94aba54af9065f71de72f5508f1cd42e'
  
-    _APP_KEY = '6f90a59ac58a4123'
-    _BILIBILI_KEY = '0bfd84cc3940035173f35e6777508326'
+    def _report_error(self, result):
+        if 'message' in result:
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, result['message']), expected=True)
+        elif 'code' in result:
+            raise ExtractorError('%s returns error %d' % (self.IE_NAME, result['code']), expected=True)
+        else:
+            raise ExtractorError('Can\'t extract Bangumi episode ID')
  
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        url, smuggled_data = unsmuggle_url(url, {})
+
+        mobj = re.match(self._VALID_URL, url)
+        video_id = mobj.group('id')
+        anime_id = mobj.group('anime_id')
          webpage = self._download_webpage(url, video_id)
  
-        if 'anime/v' not in url:
+        if 'anime/' not in url:
              cid = compat_parse_qs(self._search_regex(
                  [r'EmbedPlayer\([^)]+,\s*"([^"]+)"\)',
                   r'<iframe[^>]+src="https://secure\.bilibili\.com/secure,([^"]+)"'],
                  webpage, 'player parameters'))['cid'][0]
          else:
+            if 'no_bangumi_tip' not in smuggled_data:
+                self.to_screen('Downloading episode %s. To download all videos in anime %s, re-run youtube-dl with %s' % (
+                    video_id, anime_id, compat_urlparse.urljoin(url, '//bangumi.bilibili.com/anime/%s' % anime_id)))
+            headers = {
+                'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
+            }
+            headers.update(self.geo_verification_headers())
+
              js = self._download_json(
                  'http://bangumi.bilibili.com/web_api/get_source', video_id,
                  data=urlencode_postdata({'episode_id': video_id}),
-                headers={'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'})
+                headers=headers)
+            if 'result' not in js:
+                self._report_error(js)
              cid = js['result']['cid']
  
          payload = 'appkey=%s&cid=%s&otype=json&quality=2&type=mp4' % (self._APP_KEY, cid)
@@ -58,7 +102,11 @@ def _real_extract(self, url):
  
          video_info = self._download_json(
              'http://interface.bilibili.com/playurl?%s&sign=%s' % (payload, sign),
-            video_id, note='Downloading video info page')
+            video_id, note='Downloading video info page',
+            headers=self.geo_verification_headers())
+
+        if 'durl' not in video_info:
+            self._report_error(video_info)
  
          entries = []
  
@@ -85,7 +133,7 @@ def _real_extract(self, url):
          title = self._html_search_regex('<h1[^>]+title="([^"]+)">', webpage, 'title')
          description = self._html_search_meta('description', webpage)
          timestamp = unified_timestamp(self._html_search_regex(
-            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', fatal=False))
+            r'<time[^>]+datetime="([^"]+)"', webpage, 'upload time', default=None))
          thumbnail = self._html_search_meta(['og:image', 'thumbnailUrl'], webpage)
  
          # TODO 'view_count' requires deobfuscating Javascript
@@ -99,7 +147,7 @@ def _real_extract(self, url):
          }
  
          uploader_mobj = re.search(
-            r'<a[^>]+href="https?://space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
+            r'<a[^>]+href="(?:https?:)?//space\.bilibili\.com/(?P<id>\d+)"[^>]+title="(?P<name>[^"]+)"',
              webpage)
          if uploader_mobj:
              info.update({
@@ -123,3 +171,70 @@ def _real_extract(self, url):
                  'description': description,
                  'entries': entries,
              }
+
+
+class BiliBiliBangumiIE(InfoExtractor):
+    _VALID_URL = r'https?://bangumi\.bilibili\.com/anime/(?P<id>\d+)'
+
+    IE_NAME = 'bangumi.bilibili.com'
+    IE_DESC = 'BiliBili番剧'
+
+    _TESTS = [{
+        'url': 'http://bangumi.bilibili.com/anime/1869',
+        'info_dict': {
+            'id': '1869',
+            'title': '混沌武士',
+            'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
+        },
+        'playlist_count': 26,
+    }, {
+        'url': 'http://bangumi.bilibili.com/anime/1869',
+        'info_dict': {
+            'id': '1869',
+            'title': '混沌武士',
+            'description': 'md5:6a9622b911565794c11f25f81d6a97d2',
+        },
+        'playlist': [{
+            'md5': '91da8621454dd58316851c27c68b0c13',
+            'info_dict': {
+                'id': '40062',
+                'ext': 'mp4',
+                'title': '混沌武士',
+                'description': '故事发生在日本的江户时代。风是一个小酒馆的打工女。一日，酒馆里来了一群恶霸，虽然他们的举动令风十分不满，但是毕竟风只是一届女流，无法对他们采取什么行动，只能在心里嘟哝。这时，酒家里又进来了个“不良份子...',
+                'timestamp': 1414538739,
+                'upload_date': '20141028',
+                'episode': '疾风怒涛 Tempestuous Temperaments',
+                'episode_number': 1,
+            },
+        }],
+        'params': {
+            'playlist_items': '1',
+        },
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if BiliBiliIE.suitable(url) else super(BiliBiliBangumiIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        bangumi_id = self._match_id(url)
+
+        # Sometimes this API returns a JSONP response
+        season_info = self._download_json(
+            'http://bangumi.bilibili.com/jsonp/seasoninfo/%s.ver' % bangumi_id,
+            bangumi_id, transform_source=strip_jsonp)['result']
+
+        entries = [{
+            '_type': 'url_transparent',
+            'url': smuggle_url(episode['webplay_url'], {'no_bangumi_tip': 1}),
+            'ie_key': BiliBiliIE.ie_key(),
+            'timestamp': parse_iso8601(episode.get('update_time'), delimiter=' '),
+            'episode': episode.get('index_title'),
+            'episode_number': int_or_none(episode.get('index')),
+        } for episode in season_info['episodes']]
+
+        entries = sorted(entries, key=lambda entry: entry.get('episode_number'))
+
+        return self.playlist_result(
+            entries, bangumi_id,
+            season_info.get('bangumi_title'), season_info.get('evaluate'))
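As a rough illustration of how URLs are split between the two BiliBili extractors in the diff above, the sketch below applies the new `_VALID_URL` pattern using plain `re`. The helper name `parse_bilibili_url` is hypothetical and not part of youtube-dl; only the pattern is copied from the patch.

```python
import re

# Pattern copied from the updated BiliBiliIE._VALID_URL in the patch.
VALID_URL = (
    r'https?://(?:www\.|bangumi\.|)bilibili\.(?:tv|com)/'
    r'(?:video/av|anime/(?P<anime_id>\d+)/play#)(?P<id>\d+)'
)


def parse_bilibili_url(url):
    """Hypothetical helper: return (anime_id, video_id), or None if no match."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return mobj.group('anime_id'), mobj.group('id')


# A plain video URL has no anime_id; a bangumi episode URL carries both IDs.
print(parse_bilibili_url('http://www.bilibili.tv/video/av1074402/'))
print(parse_bilibili_url('http://bangumi.bilibili.com/anime/1869/play#40062'))
```

A bare season URL such as `http://bangumi.bilibili.com/anime/1869` does not match this pattern, so it is left for `BiliBiliBangumiIE` to handle as a playlist.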
diff --git a/youtube_dl/extractor/biobiochiletv.py b/youtube_dl/extractor/biobiochiletv.py
index 7608c0a085b3c656277b03f61d19c1f60ea8d4f1..b92031c8ab6bac3c5d5135743dda5883de080855 100644
--- a/youtube_dl/extractor/biobiochiletv.py
+++ b/youtube_dl/extractor/biobiochiletv.py
@@ -19,7 +19,7 @@ class BioBioChileTVIE(InfoExtractor):
              'id': 'sobre-camaras-y-camarillas-parlamentarias',
              'ext': 'mp4',
              'title': 'Sobre Cámaras y camarillas parlamentarias',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Fernando Atria',
          },
          'skip': 'URL expired and redirected to http://www.biobiochile.cl/portada/bbtv/index.html',
@@ -31,7 +31,7 @@ class BioBioChileTVIE(InfoExtractor):
              'id': 'natalia-valdebenito-repasa-a-diputado-hasbun-paso-a-la-categoria-de-hablar-brutalidades',
              'ext': 'mp4',
              'title': 'Natalia Valdebenito repasa a diputado Hasbún: Pasó a la categoría de hablar brutalidades',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Piangella Obrador',
          },
          'params': {
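The `'re:...'` to `r're:...'` change that recurs throughout this commit does not alter the pattern text. The non-raw form only works because Python leaves the unrecognized escape `\.` untouched, which newer Python versions warn about. The sketch below checks that equivalence; `thumbnail_matches` is a hypothetical stand-in for the test harness, assuming it strips the `re:` prefix and matches the rest as a regular expression.

```python
import re

# The doubled backslash spells the same characters without an unrecognized escape.
ESCAPED = 're:^https?://.*\\.jpg$'
RAW = r're:^https?://.*\.jpg$'


def thumbnail_matches(expected, actual):
    """Hypothetical check: treat everything after 're:' as a regex for `actual`."""
    if not expected.startswith('re:'):
        return expected == actual
    return re.match(expected[len('re:'):], actual) is not None


print(ESCAPED == RAW)
print(thumbnail_matches(RAW, 'http://example.com/a.jpg'))
```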
diff --git a/youtube_dl/extractor/bloomberg.py b/youtube_dl/extractor/bloomberg.py
index 2a8cd64b99d2551da9777aaa259d356d8cad51ed..c5e11e8eb81151ca8dd07be04c1fe2005a26bfa9 100644
--- a/youtube_dl/extractor/bloomberg.py
+++ b/youtube_dl/extractor/bloomberg.py
@@ -45,7 +45,8 @@ def _real_extract(self, url):
          name = self._match_id(url)
          webpage = self._download_webpage(url, name)
          video_id = self._search_regex(
-            r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>.+?)\1',
+            (r'["\']bmmrId["\']\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1',
+             r'videoId\s*:\s*(["\'])(?P<url>(?:(?!\1).)+)\1'),
              webpage, 'id', group='url', default=None)
          if not video_id:
              bplayer_data = self._parse_json(self._search_regex(
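The Bloomberg hunk above replaces `.+?` with `(?:(?!\1).)+` inside the quoted value. A minimal sketch of the difference, using plain `re` and a made-up JSON fragment (`EMPTY_ID` is an assumption for illustration, not data from the site): when the value is empty, the lazy form swallows the closing quote and runs into the next field, while the tempered form refuses to match.

```python
import re

# Old and new bmmrId patterns from the patch.
LAZY = r'''["']bmmrId["']\s*:\s*(["'])(?P<url>.+?)\1'''
TEMPERED = r'''["']bmmrId["']\s*:\s*(["'])(?P<url>(?:(?!\1).)+)\1'''


def extract_id(pattern, text):
    """Return the captured value, or None when the pattern does not match."""
    mobj = re.search(pattern, text)
    return mobj.group('url') if mobj else None


# Hypothetical page fragment with an empty ID.
EMPTY_ID = '{"bmmrId":"","title":"x"}'

print(repr(extract_id(LAZY, EMPTY_ID)))      # captures across the field boundary
print(repr(extract_id(TEMPERED, EMPTY_ID)))  # no match, so the fallback can run
```

Returning no match for the empty case is what lets the `default=None` fallback in the extractor take over.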
diff --git a/youtube_dl/extractor/breakcom.py b/youtube_dl/extractor/breakcom.py
index 725859b4d2d554df91ff4793a2b3d245f02c8996..5a87c2661910303d638351a0f5155dd20db35793 100644
--- a/youtube_dl/extractor/breakcom.py
+++ b/youtube_dl/extractor/breakcom.py
@@ -1,9 +1,9 @@
  from __future__ import unicode_literals
  
  import re
-import json
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
      int_or_none,
      parse_age_limit,
@@ -11,7 +11,7 @@
  
  
  class BreakIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?break\.com/video/(?:[^/]+/)*.+-(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?(?P<site>break|screenjunkies)\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
      _TESTS = [{
          'url': 'http://www.break.com/video/when-girls-act-like-guys-2468056',
          'info_dict': {
@@ -20,45 +20,124 @@ class BreakIE(InfoExtractor):
              'title': 'When Girls Act Like D-Bags',
              'age_limit': 13,
          }
+    }, {
+        'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
+        'md5': '5c2b686bec3d43de42bde9ec047536b0',
+        'info_dict': {
+            'id': '2841915',
+            'display_id': 'best-quentin-tarantino-movie',
+            'ext': 'mp4',
+            'title': 'Best Quentin Tarantino Movie',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 3671,
+            'age_limit': 13,
+            'tags': list,
+        },
+    }, {
+        'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
+        'info_dict': {
+            'id': '2348808',
+            'display_id': 'honest-trailers-the-dark-knight',
+            'ext': 'mp4',
+            'title': 'Honest Trailers - The Dark Knight',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
+            'age_limit': 10,
+            'tags': list,
+        },
+    }, {
+        # requires subscription but worked around
+        'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
+        'info_dict': {
+            'id': '3003285',
+            'display_id': 'knocking-dead-ep-1-the-show-so-far',
+            'ext': 'mp4',
+            'title': 'State of The Dead Recap: Knocking Dead Pilot',
+            'thumbnail': r're:^https?://.*\.jpg',
+            'duration': 3307,
+            'age_limit': 13,
+            'tags': list,
+        },
      }, {
          'url': 'http://www.break.com/video/ugc/baby-flex-2773063',
          'only_matching': True,
      }]
  
+    _DEFAULT_BITRATES = (48, 150, 320, 496, 864, 2240, 3264)
+
      def _real_extract(self, url):
-        video_id = self._match_id(url)
+        site, display_id, video_id = re.match(self._VALID_URL, url).groups()
+
+        if not video_id:
+            webpage = self._download_webpage(url, display_id)
+            video_id = self._search_regex(
+                (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
+                webpage, 'video id')
+
          webpage = self._download_webpage(
-            'http://www.break.com/embed/%s' % video_id, video_id)
-        info = json.loads(self._search_regex(
-            r'var embedVars = ({.*})\s*?</script>',
-            webpage, 'info json', flags=re.DOTALL))
+            'http://www.%s.com/embed/%s' % (site, video_id),
+            display_id, 'Downloading video embed page')
+        embed_vars = self._parse_json(
+            self._search_regex(
+                r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
+            display_id)
  
-        youtube_id = info.get('youtubeId')
+        youtube_id = embed_vars.get('youtubeId')
          if youtube_id:
              return self.url_result(youtube_id, 'Youtube')
  
-        formats = [{
-            'url': media['uri'] + '?' + info['AuthToken'],
-            'tbr': media['bitRate'],
-            'width': media['width'],
-            'height': media['height'],
-        } for media in info['media'] if media.get('mediaPurpose') == 'play']
+        title = embed_vars['contentName']
  
-        if not formats:
+        formats = []
+        bitrates = []
+        for f in embed_vars.get('media', []):
+            if not f.get('uri') or f.get('mediaPurpose') != 'play':
+                continue
+            bitrate = int_or_none(f.get('bitRate'))
+            if bitrate:
+                bitrates.append(bitrate)
              formats.append({
-                'url': info['videoUri']
+                'url': f['uri'],
+                'format_id': 'http-%d' % bitrate if bitrate else 'http',
+                'width': int_or_none(f.get('width')),
+                'height': int_or_none(f.get('height')),
+                'tbr': bitrate,
+                'format': 'mp4',
              })
  
-        self._sort_formats(formats)
+        if not bitrates:
+            # When subscriptionLevel > 0, i.e. plus subscription is required
+            # media list will be empty. However, hds and hls uris are still
+            # available. We can grab them assuming bitrates to be default.
+            bitrates = self._DEFAULT_BITRATES
+
+        auth_token = embed_vars.get('AuthToken')
  
-        duration = int_or_none(info.get('videoLengthInSeconds'))
-        age_limit = parse_age_limit(info.get('audienceRating'))
+        def construct_manifest_url(base_url, ext):
+            pieces = [base_url]
+            pieces.extend([compat_str(b) for b in bitrates])
+            pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
+            return ','.join(pieces)
+
+        if bitrates and auth_token:
+            hds_url = embed_vars.get('hdsUri')
+            if hds_url:
+                formats.extend(self._extract_f4m_formats(
+                    construct_manifest_url(hds_url, 'f4m'),
+                    display_id, f4m_id='hds', fatal=False))
+            hls_url = embed_vars.get('hlsUri')
+            if hls_url:
+                formats.extend(self._extract_m3u8_formats(
+                    construct_manifest_url(hls_url, 'm3u8'),
+                    display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+        self._sort_formats(formats)
  
          return {
              'id': video_id,
-            'title': info['contentName'],
-            'thumbnail': info['thumbUri'],
-            'duration': duration,
-            'age_limit': age_limit,
+            'display_id': display_id,
+            'title': title,
+            'thumbnail': embed_vars.get('thumbUri'),
+            'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
+            'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
+            'tags': embed_vars.get('tags', '').split(','),
              'formats': formats,
          }
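The manifest URL built by `construct_manifest_url` in the hunk above is a comma-joined list: the base URI, each bitrate, then a suffix carrying the extension and auth token. The standalone sketch below mirrors that logic with the closure variables turned into parameters; the base URL and token in the usage lines are made-up placeholders.

```python
# Default bitrates copied from BreakIE._DEFAULT_BITRATES in the patch.
DEFAULT_BITRATES = (48, 150, 320, 496, 864, 2240, 3264)


def construct_manifest_url(base_url, ext, bitrates, auth_token):
    """Join base URL, bitrates and the '_kbps.mp4.<ext>?<token>' suffix with commas."""
    pieces = [base_url]
    pieces.extend(str(b) for b in bitrates)
    pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
    return ','.join(pieces)


# Placeholder inputs, not real Break/ScreenJunkies values.
print(construct_manifest_url('http://example.com/v/clip_', 'm3u8', (48, 150), 'token=abc'))
print(construct_manifest_url('http://example.com/v/clip_', 'f4m', DEFAULT_BITRATES, 'token=abc'))
```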
diff --git a/youtube_dl/extractor/brightcove.py b/youtube_dl/extractor/brightcove.py
index 945cf19e8bce0f1f9576d26abc455c9795a250d3..5c6e99da134efe150962dd979cbf36e391527d3a 100644
--- a/youtube_dl/extractor/brightcove.py
+++ b/youtube_dl/extractor/brightcove.py
@@ -179,7 +179,7 @@ def find_param(name):
  
          params = {}
  
-        playerID = find_param('playerID')
+        playerID = find_param('playerID') or find_param('playerId')
          if playerID is None:
              raise ExtractorError('Cannot find player ID')
          params['playerID'] = playerID
@@ -204,7 +204,7 @@ def _build_brighcove_url_from_js(cls, object_js):
          #   // build Brightcove <object /> XML
          # }
          m = re.search(
-            r'''(?x)customBC.\createVideo\(
+            r'''(?x)customBC\.createVideo\(
                  .*?                                                  # skipping width and height
                  ["\'](?P<playerID>\d+)["\']\s*,\s*                   # playerID
                  ["\'](?P<playerKey>AQ[^"\']{48})[^"\']*["\']\s*,\s*  # playerKey begins with AQ and is 50 characters
@@ -232,13 +232,16 @@ def _extract_brightcove_urls(cls, webpage):
          """Return a list of all Brightcove URLs from the webpage """
  
          url_m = re.search(
-            r'<meta\s+property=[\'"]og:video[\'"]\s+content=[\'"](https?://(?:secure|c)\.brightcove.com/[^\'"]+)[\'"]',
-            webpage)
+            r'''(?x)
+                <meta\s+
+                    (?:property|itemprop)=([\'"])(?:og:video|embedURL)\1[^>]+
+                    content=([\'"])(?P<url>https?://(?:secure|c)\.brightcove.com/(?:(?!\2).)+)\2
+            ''', webpage)
          if url_m:
-            url = unescapeHTML(url_m.group(1))
+            url = unescapeHTML(url_m.group('url'))
              # Some sites don't add it, we can't download with this url, for example:
              # http://www.ktvu.com/videos/news/raw-video-caltrain-releases-video-of-man-almost/vCTZdY/
-            if 'playerKey' in url or 'videoId' in url:
+            if 'playerKey' in url or 'videoId' in url or 'idVideo' in url:
                  return [url]
  
          matches = re.findall(
@@ -259,7 +262,7 @@ def _real_extract(self, url):
          url, smuggled_data = unsmuggle_url(url, {})
  
          # Change the 'videoId' and others field to '@videoPlayer'
-        url = re.sub(r'(?<=[?&])(videoI(d|D)|bctid)', '%40videoPlayer', url)
+        url = re.sub(r'(?<=[?&])(videoI(d|D)|idVideo|bctid)', '%40videoPlayer', url)
          # Change bckey (used by bcove.me urls) to playerKey
          url = re.sub(r'(?<=[?&])bckey', 'playerKey', url)
          mobj = re.match(self._VALID_URL, url)
@@ -548,7 +551,7 @@ def _real_extract(self, url):
              container = source.get('container')
              ext = mimetype2ext(source.get('type'))
              src = source.get('src')
-            if ext == 'ism':
+            if ext == 'ism' or container == 'WVM':
                  continue
              elif ext == 'm3u8' or container == 'M2TS':
                  if not src:
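The Brightcove hunks above add `idVideo` to the query parameters rewritten to `@videoPlayer`. A small sketch of that substitution, using the same `re.sub` call on a made-up URL (`normalize_brightcove_url` is a hypothetical wrapper name):

```python
import re


def normalize_brightcove_url(url):
    """Apply the two query-parameter rewrites from the patched _real_extract."""
    # 'videoId', 'videoID', 'idVideo' and 'bctid' all become '%40videoPlayer'.
    url = re.sub(r'(?<=[?&])(videoI(d|D)|idVideo|bctid)', '%40videoPlayer', url)
    # 'bckey' (used by bcove.me URLs) becomes 'playerKey'.
    return re.sub(r'(?<=[?&])bckey', 'playerKey', url)


# Placeholder URL, not taken from a real page.
print(normalize_brightcove_url('http://c.brightcove.com/services/viewer/federated_f9?bckey=AQ&idVideo=123'))
```

The lookbehind `(?<=[?&])` restricts the rewrite to parameter names, so the same text appearing inside a value is left alone.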
diff --git a/youtube_dl/extractor/byutv.py b/youtube_dl/extractor/byutv.py
index 4be175d7039dd845f7c961af552bc1153b73598e..8ef089653d80e75bcf9eb89b81a9c82f08e7971d 100644
--- a/youtube_dl/extractor/byutv.py
+++ b/youtube_dl/extractor/byutv.py
@@ -16,7 +16,7 @@ class BYUtvIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Season 5 Episode 5',
              'description': 'md5:e07269172baff037f8e8bf9956bc9747',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1486.486,
          },
          'params': {
diff --git a/youtube_dl/extractor/camdemy.py b/youtube_dl/extractor/camdemy.py
index d4e6fbdce029b8267450b9d50d3b41556a47664d..8f0c6c545c35312813bb461b3218f868a660fdd2 100644
--- a/youtube_dl/extractor/camdemy.py
+++ b/youtube_dl/extractor/camdemy.py
@@ -26,7 +26,7 @@ class CamdemyIE(InfoExtractor):
              'id': '5181',
              'ext': 'mp4',
              'title': 'Ch1-1 Introduction, Signals (02-23-2012)',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'creator': 'ss11spring',
              'duration': 1591,
              'upload_date': '20130114',
@@ -41,7 +41,7 @@ class CamdemyIE(InfoExtractor):
              'id': '13885',
              'ext': 'mp4',
              'title': 'EverCam + Camdemy QuickStart',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:2a9f989c2b153a2342acee579c6e7db6',
              'creator': 'evercam',
              'duration': 318,
diff --git a/youtube_dl/extractor/canalplus.py b/youtube_dl/extractor/canalplus.py
index 1c3c41d26619ec2fa347c4a75093b2a1cf7003a2..b3f76a7b1de2414db91206f091011daabafa24a1 100644
--- a/youtube_dl/extractor/canalplus.py
+++ b/youtube_dl/extractor/canalplus.py
@@ -105,8 +105,9 @@ def _real_extract(self, url):
          webpage = self._download_webpage(url, display_id)
          video_id = self._search_regex(
              [r'<canal:player[^>]+?videoId=(["\'])(?P<id>\d+)',
-             r'id=["\']canal_video_player(?P<id>\d+)'],
-            webpage, 'video id', group='id')
+             r'id=["\']canal_video_player(?P<id>\d+)',
+             r'data-video=["\'](?P<id>\d+)'],
+            webpage, 'video id', default=mobj.group('vid'), group='id')
  
          info_url = self._VIDEO_INFO_TEMPLATE % (site_id, video_id)
          video_data = self._download_json(info_url, video_id, 'Downloading video JSON')
diff --git a/youtube_dl/extractor/canvas.py b/youtube_dl/extractor/canvas.py
index d183d5d527fb8ab4163b16fcaffd0aeedbf0dd0c..544c6657c12e53afee6d0e1a2916a9a47d606415 100644
--- a/youtube_dl/extractor/canvas.py
+++ b/youtube_dl/extractor/canvas.py
@@ -17,7 +17,7 @@ class CanvasIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'De afspraak veilt voor de Warmste Week',
              'description': 'md5:24cb860c320dc2be7358e0e5aa317ba6',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 49.02,
          }
      }, {
@@ -29,7 +29,7 @@ class CanvasIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Pieter 0167',
              'description': 'md5:943cd30f48a5d29ba02c3a104dc4ec4e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 2553.08,
              'subtitles': {
                  'nl': [{
@@ -48,7 +48,7 @@ class CanvasIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Herbekijk Sorry voor alles',
              'description': 'md5:8bb2805df8164e5eb95d6a7a29dc0dd3',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 3788.06,
          },
          'params': {
@@ -89,6 +89,9 @@ def _real_extract(self, url):
              elif format_type == 'HDS':
                  formats.extend(self._extract_f4m_formats(
                      format_url, display_id, f4m_id=format_type, fatal=False))
+            elif format_type == 'MPEG_DASH':
+                formats.extend(self._extract_mpd_formats(
+                    format_url, display_id, mpd_id=format_type, fatal=False))
              else:
                  formats.append({
                      'format_id': format_type,
diff --git a/youtube_dl/extractor/carambatv.py b/youtube_dl/extractor/carambatv.py
index 66c0f900a402664653a846e9b39fc44c1da2853e..9ba909a918755b5a02f8d8fe684a6f73a438ada9 100644
--- a/youtube_dl/extractor/carambatv.py
+++ b/youtube_dl/extractor/carambatv.py
@@ -21,7 +21,7 @@ class CarambaTVIE(InfoExtractor):
              'id': '191910501',
              'ext': 'mp4',
              'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2678.31,
          },
      }, {
@@ -69,7 +69,7 @@ class CarambaTVPageIE(InfoExtractor):
              'id': '475222',
              'ext': 'flv',
              'title': '[BadComedian] - Разборка в Маниле (Абсолютный обзор)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              # duration reported by videomore is incorrect
              'duration': int,
          },
diff --git a/youtube_dl/extractor/cbc.py b/youtube_dl/extractor/cbc.py
index d71fddf58a068461cd2d377b31e4c3981d6c2b3d..cf678e7f843225f00a69546c59ba27a2b9c93c3d 100644
--- a/youtube_dl/extractor/cbc.py
+++ b/youtube_dl/extractor/cbc.py
@@ -90,36 +90,49 @@ class CBCIE(InfoExtractor):
              },
          }],
          'skip': 'Geo-restricted to Canada',
+    }, {
+        # multiple CBC.APP.Caffeine.initInstance(...)
+        'url': 'http://www.cbc.ca/news/canada/calgary/dog-indoor-exercise-winter-1.3928238',
+        'info_dict': {
+            'title': 'Keep Rover active during the deep freeze with doggie pushups and other fun indoor tasks',
+            'id': 'dog-indoor-exercise-winter-1.3928238',
+        },
+        'playlist_mincount': 6,
      }]
  
      @classmethod
      def suitable(cls, url):
          return False if CBCPlayerIE.suitable(url) else super(CBCIE, cls).suitable(url)
  
+    def _extract_player_init(self, player_init, display_id):
+        player_info = self._parse_json(player_init, display_id, js_to_json)
+        media_id = player_info.get('mediaId')
+        if not media_id:
+            clip_id = player_info['clipId']
+            feed = self._download_json(
+                'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
+                clip_id, fatal=False)
+            if feed:
+                media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
+            if not media_id:
+                media_id = self._download_json(
+                    'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
+                    clip_id)['entries'][0]['id'].split('/')[-1]
+        return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        player_init = self._search_regex(
-            r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage, 'player init',
-            default=None)
-        if player_init:
-            player_info = self._parse_json(player_init, display_id, js_to_json)
-            media_id = player_info.get('mediaId')
-            if not media_id:
-                clip_id = player_info['clipId']
-                feed = self._download_json(
-                    'http://tpfeed.cbc.ca/f/ExhSPC/vms_5akSXx4Ng_Zn?byCustomValue={:mpsReleases}{%s}' % clip_id,
-                    clip_id, fatal=False)
-                if feed:
-                    media_id = try_get(feed, lambda x: x['entries'][0]['guid'], compat_str)
-                if not media_id:
-                    media_id = self._download_json(
-                        'http://feed.theplatform.com/f/h9dtGB/punlNGjMlc1F?fields=id&byContent=byReleases%3DbyId%253D' + clip_id,
-                        clip_id)['entries'][0]['id'].split('/')[-1]
-            return self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
-        else:
-            entries = [self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id) for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)]
-            return self.playlist_result(entries)
+        entries = [
+            self._extract_player_init(player_init, display_id)
+            for player_init in re.findall(r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage)]
+        entries.extend([
+            self.url_result('cbcplayer:%s' % media_id, 'CBCPlayer', media_id)
+            for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage)])
+        return self.playlist_result(
+            entries, display_id,
+            self._og_search_title(webpage, fatal=False),
+            self._og_search_description(webpage))
  
  
  class CBCPlayerIE(InfoExtractor):
@@ -283,11 +296,12 @@ def _real_extract(self, url):
          formats = self._extract_m3u8_formats(re.sub(r'/([^/]+)/[^/?]+\.m3u8', r'/\1/\1.m3u8', m3u8_url), video_id, 'mp4', fatal=False)
          if len(formats) < 2:
              formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
-        # Despite metadata in m3u8 all video+audio formats are
-        # actually video-only (no audio)
          for f in formats:
-            if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
-                f['acodec'] = 'none'
+            format_id = f.get('format_id')
+            if format_id.startswith('AAC'):
+                f['acodec'] = 'aac'
+            elif format_id.startswith('AC3'):
+                f['acodec'] = 'ac-3'
          self._sort_formats(formats)
  
          info = {
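The CBC refactor above switches from a single `initInstance` match to collecting every match, then appends the iframe-embedded IDs. The sketch below reproduces only that collection step on a made-up page. It is an approximation: it parses the player arguments with `json.loads` rather than youtube-dl's `js_to_json`, and it skips the `clipId` feed lookup entirely.

```python
import json
import re


def collect_media_urls(webpage):
    """Gather 'cbcplayer:<id>' URLs from all player inits, then from iframes."""
    urls = []
    for player_init in re.findall(
            r'CBC\.APP\.Caffeine\.initInstance\(({.+?})\);', webpage):
        # Assumption: the argument is strict JSON here; the real code is more lenient.
        media_id = json.loads(player_init).get('mediaId')
        if media_id:
            urls.append('cbcplayer:%s' % media_id)
    urls.extend(
        'cbcplayer:%s' % media_id
        for media_id in re.findall(r'<iframe[^>]+src="[^"]+?mediaId=(\d+)"', webpage))
    return urls


# Hypothetical page with two player inits and one iframe embed.
SAMPLE_PAGE = '''
<script>CBC.APP.Caffeine.initInstance({"mediaId": "111"});</script>
<script>CBC.APP.Caffeine.initInstance({"mediaId": "222"});</script>
<iframe src="http://www.cbc.ca/player/?mediaId=333"></iframe>
'''

print(collect_media_urls(SAMPLE_PAGE))
```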
diff --git a/youtube_dl/extractor/cbslocal.py b/youtube_dl/extractor/cbslocal.py
index 289709c97b61b2fd5ab29b82e426d17bb5b4d701..8d5f11dd11de8bb85a9f6a2ddc86710a65c56a94 100644
--- a/youtube_dl/extractor/cbslocal.py
+++ b/youtube_dl/extractor/cbslocal.py
@@ -4,11 +4,14 @@
  from .anvato import AnvatoIE
  from .sendtonews import SendtoNewsIE
  from ..compat import compat_urlparse
-from ..utils import unified_timestamp
+from ..utils import (
+    parse_iso8601,
+    unified_timestamp,
+)
  
  
  class CBSLocalIE(AnvatoIE):
-    _VALID_URL = r'https?://[a-z]+\.cbslocal\.com/\d+/\d+/\d+/(?P<id>[0-9a-z-]+)'
+    _VALID_URL = r'https?://[a-z]+\.cbslocal\.com/(?:\d+/\d+/\d+|video)/(?P<id>[0-9a-z-]+)'
  
      _TESTS = [{
          # Anvato backend
@@ -49,6 +52,31 @@ class CBSLocalIE(AnvatoIE):
              # m3u8 download
              'skip_download': True,
          },
+    }, {
+        'url': 'http://newyork.cbslocal.com/video/3580809-a-very-blue-anniversary/',
+        'info_dict': {
+            'id': '3580809',
+            'ext': 'mp4',
+            'title': 'A Very Blue Anniversary',
+            'description': 'CBS2’s Cindy Hsu has more.',
+            'thumbnail': 're:^https?://.*',
+            'timestamp': 1479962220,
+            'upload_date': '20161124',
+            'uploader': 'CBS',
+            'subtitles': {
+                'en': 'mincount:5',
+            },
+            'categories': [
+                'Stations\\Spoken Word\\WCBSTV',
+                'Syndication\\AOL',
+                'Syndication\\MSN',
+                'Syndication\\NDN',
+                'Syndication\\Yahoo',
+                'Content\\News',
+                'Content\\News\\Local News',
+            ],
+            'tags': ['CBS 2 News Weekends', 'Cindy Hsu', 'Blue Man Group'],
+        },
      }]
  
      def _real_extract(self, url):
@@ -64,8 +92,11 @@ def _real_extract(self, url):
          info_dict = self._extract_anvato_videos(webpage, display_id)
  
          time_str = self._html_search_regex(
-            r'class="entry-date">([^<]+)<', webpage, 'released date', fatal=False)
-        timestamp = unified_timestamp(time_str)
+            r'class="entry-date">([^<]+)<', webpage, 'released date', default=None)
+        if time_str:
+            timestamp = unified_timestamp(time_str)
+        else:
+            timestamp = parse_iso8601(self._html_search_meta('uploadDate', webpage))
  
          info_dict.update({
              'display_id': display_id,
diff --git a/youtube_dl/extractor/cbsnews.py b/youtube_dl/extractor/cbsnews.py
index 91b0f5fa94c7ba919e01fd097cbdfc71fe6992b4..17bb9af4fe2a8a0066611a2bbc2d090ad7cf5e30 100644
--- a/youtube_dl/extractor/cbsnews.py
+++ b/youtube_dl/extractor/cbsnews.py
@@ -39,7 +39,7 @@ class CBSNewsIE(CBSIE):
                  'upload_date': '20140404',
                  'timestamp': 1396650660,
                  'uploader': 'CBSI-NEW',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 205,
                  'subtitles': {
                      'en': [{
diff --git a/youtube_dl/extractor/ccc.py b/youtube_dl/extractor/ccc.py
index 8f7f09e22dad6eda3ca08edfbf9edc118146e893..73470214412b542adad72f1227e66fd341742e82 100644
--- a/youtube_dl/extractor/ccc.py
+++ b/youtube_dl/extractor/ccc.py
@@ -19,7 +19,7 @@ class CCCIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Introduction to Processor Design',
              'description': 'md5:df55f6d073d4ceae55aae6f2fd98a0ac',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20131228',
              'timestamp': 1388188800,
              'duration': 3710,
@@ -32,7 +32,7 @@ class CCCIE(InfoExtractor):
      def _real_extract(self, url):
          display_id = self._match_id(url)
          webpage = self._download_webpage(url, display_id)
-        event_id = self._search_regex("data-id='(\d+)'", webpage, 'event id')
+        event_id = self._search_regex(r"data-id='(\d+)'", webpage, 'event id')
          event_data = self._download_json('https://media.ccc.de/public/events/%s' % event_id, event_id)
  
          formats = []
diff --git a/youtube_dl/extractor/ccma.py b/youtube_dl/extractor/ccma.py

new file mode 100644 (file)

index 0000000..39938c9
--- /dev/null
+++ b/youtube_dl/extractor/ccma.py
@@ -0,0 +1,99 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    parse_iso8601,
+    clean_html,
+)
+
+
+class CCMAIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?ccma\.cat/(?:[^/]+/)*?(?P<type>video|audio)/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.ccma.cat/tv3/alacarta/lespot-de-la-marato-de-tv3/lespot-de-la-marato-de-tv3/video/5630208/',
+        'md5': '7296ca43977c8ea4469e719c609b0871',
+        'info_dict': {
+            'id': '5630208',
+            'ext': 'mp4',
+            'title': 'L\'espot de La Marató de TV3',
+            'description': 'md5:f12987f320e2f6e988e9908e4fe97765',
+            'timestamp': 1470918540,
+            'upload_date': '20160811',
+        }
+    }, {
+        'url': 'http://www.ccma.cat/catradio/alacarta/programa/el-consell-de-savis-analitza-el-derbi/audio/943685/',
+        'md5': 'fa3e38f269329a278271276330261425',
+        'info_dict': {
+            'id': '943685',
+            'ext': 'mp3',
+            'title': 'El Consell de Savis analitza el derbi',
+            'description': 'md5:e2a3648145f3241cb9c6b4b624033e53',
+            'upload_date': '20171205',
+            'timestamp': 1512507300,
+        }
+    }]
+
+    def _real_extract(self, url):
+        media_type, media_id = re.match(self._VALID_URL, url).groups()
+        media_data = {}
+        formats = []
+        profiles = ['pc'] if media_type == 'audio' else ['mobil', 'pc']
+        for i, profile in enumerate(profiles):
+            md = self._download_json('http://dinamics.ccma.cat/pvideo/media.jsp', media_id, query={
+                'media': media_type,
+                'idint': media_id,
+                'profile': profile,
+            }, fatal=False)
+            if md:
+                media_data = md
+                media_url = media_data.get('media', {}).get('url')
+                if media_url:
+                    formats.append({
+                        'format_id': profile,
+                        'url': media_url,
+                        'quality': i,
+                    })
+        self._sort_formats(formats)
+
+        informacio = media_data['informacio']
+        title = informacio['titol']
+        durada = informacio.get('durada', {})
+        duration = int_or_none(durada.get('milisegons'), 1000) or parse_duration(durada.get('text'))
+        timestamp = parse_iso8601(informacio.get('data_emissio', {}).get('utc'))
+
+        subtitles = {}
+        subtitols = media_data.get('subtitols', {})
+        if subtitols:
+            sub_url = subtitols.get('url')
+            if sub_url:
+                subtitles.setdefault(
+                    subtitols.get('iso') or subtitols.get('text') or 'ca', []).append({
+                        'url': sub_url,
+                    })
+
+        thumbnails = []
+        imatges = media_data.get('imatges', {})
+        if imatges:
+            thumbnail_url = imatges.get('url')
+            if thumbnail_url:
+                thumbnails = [{
+                    'url': thumbnail_url,
+                    'width': int_or_none(imatges.get('amplada')),
+                    'height': int_or_none(imatges.get('alcada')),
+                }]
+
+        return {
+            'id': media_id,
+            'title': title,
+            'description': clean_html(informacio.get('descripcio')),
+            'duration': duration,
+            'timestamp': timestamp,
+            'thumbnails': thumbnails,
+            'subtitles': subtitles,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/cctv.py b/youtube_dl/extractor/cctv.py

index 72a72cb73502ad242b150ea897512ba5426207c2..c76f361c684e96e992f5fea9f9ffbcd5a114a6ad 100644 (file)
--- a/youtube_dl/extractor/cctv.py
+++ b/youtube_dl/extractor/cctv.py
@@ -4,50 +4,188 @@
  import re
  
  from .common import InfoExtractor
-from ..utils import float_or_none
+from ..compat import compat_str
+from ..utils import (
+    float_or_none,
+    try_get,
+    unified_timestamp,
+)
  
  
  class CCTVIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:.+?\.)?
-        (?:
-            cctv\.(?:com|cn)|
-            cntv\.cn
-        )/
-        (?:
-            video/[^/]+/(?P<id>[0-9a-f]{32})|
-            \d{4}/\d{2}/\d{2}/(?P<display_id>VID[0-9A-Za-z]+)
-        )'''
+    IE_DESC = '央视网'
+    _VALID_URL = r'https?://(?:(?:[^/]+)\.(?:cntv|cctv)\.(?:com|cn)|(?:www\.)?ncpa-classic\.com)/(?:[^/]+/)*?(?P<id>[^/?#&]+?)(?:/index)?(?:\.s?html|[?#&]|$)'
      _TESTS = [{
-        'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
-        'md5': '819c7b49fc3927d529fb4cd555621823',
+        # fo.addVariable("videoCenterId","id")
+        'url': 'http://sports.cntv.cn/2016/02/12/ARTIaBRxv4rTT1yWf1frW2wi160212.shtml',
+        'md5': 'd61ec00a493e09da810bf406a078f691',
          'info_dict': {
-            'id': '454368eb19ad44a1925bf1eb96140a61',
+            'id': '5ecdbeab623f4973b40ff25f18b174e8',
              'ext': 'mp4',
-            'title': 'Portrait of Real Current Life 09/03/2016 Modern Inventors Part 1',
-        }
+            'title': '[NBA]二少联手砍下46分 雷霆主场击败鹈鹕（快讯）',
+            'description': 'md5:7e14a5328dc5eb3d1cd6afbbe0574e95',
+            'duration': 98,
+            'uploader': 'songjunjie',
+            'timestamp': 1455279956,
+            'upload_date': '20160212',
+        },
+    }, {
+        # var guid = "id"
+        'url': 'http://tv.cctv.com/2016/02/05/VIDEUS7apq3lKrHG9Dncm03B160205.shtml',
+        'info_dict': {
+            'id': 'efc5d49e5b3b4ab2b34f3a502b73d3ae',
+            'ext': 'mp4',
+            'title': '[赛车]“车王”舒马赫恢复情况成谜（快讯）',
+            'description': '2月4日，蒙特泽莫罗透露了关于“车王”舒马赫恢复情况，但情况是否属实遭到了质疑。',
+            'duration': 37,
+            'uploader': 'shujun',
+            'timestamp': 1454677291,
+            'upload_date': '20160205',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # changePlayer('id')
+        'url': 'http://english.cntv.cn/special/four_comprehensives/index.shtml',
+        'info_dict': {
+            'id': '4bb9bb4db7a6471ba85fdeda5af0381e',
+            'ext': 'mp4',
+            'title': 'NHnews008 ANNUAL POLITICAL SEASON',
+            'description': 'Four Comprehensives',
+            'duration': 60,
+            'uploader': 'zhangyunlei',
+            'timestamp': 1425385521,
+            'upload_date': '20150303',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # loadvideo('id')
+        'url': 'http://cctv.cntv.cn/lm/tvseries_russian/yilugesanghua/index.shtml',
+        'info_dict': {
+            'id': 'b15f009ff45c43968b9af583fc2e04b2',
+            'ext': 'mp4',
+            'title': 'Путь，усыпанный космеями Серия 1',
+            'description': 'Путь, усыпанный космеями',
+            'duration': 2645,
+            'uploader': 'renxue',
+            'timestamp': 1477479241,
+            'upload_date': '20161026',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # var initMyAray = 'id'
+        'url': 'http://www.ncpa-classic.com/2013/05/22/VIDE1369219508996867.shtml',
+        'info_dict': {
+            'id': 'a194cfa7f18c426b823d876668325946',
+            'ext': 'mp4',
+            'title': '小泽征尔音乐塾 音乐梦想无国界',
+            'duration': 2173,
+            'timestamp': 1369248264,
+            'upload_date': '20130522',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        # var ids = ["id"]
+        'url': 'http://www.ncpa-classic.com/clt/more/416/index.shtml',
+        'info_dict': {
+            'id': 'a8606119a4884588a79d81c02abecc16',
+            'ext': 'mp3',
+            'title': '来自维也纳的新年贺礼',
+            'description': 'md5:f13764ae8dd484e84dd4b39d5bcba2a7',
+            'duration': 1578,
+            'uploader': 'djy',
+            'timestamp': 1482942419,
+            'upload_date': '20161228',
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to download m3u8 information'],
+    }, {
+        'url': 'http://ent.cntv.cn/2016/01/18/ARTIjprSSJH8DryTVr5Bx8Wb160118.shtml',
+        'only_matching': True,
+    }, {
+        'url': 'http://tv.cntv.cn/video/C39296/e0210d949f113ddfb38d31f00a4e5c44',
+        'only_matching': True,
+    }, {
+        'url': 'http://english.cntv.cn/2016/09/03/VIDEhnkB5y9AgHyIEVphCEz1160903.shtml',
+        'only_matching': True,
      }, {
          'url': 'http://tv.cctv.com/2016/09/07/VIDE5C1FnlX5bUywlrjhxXOV160907.shtml',
          'only_matching': True,
      }, {
          'url': 'http://tv.cntv.cn/video/C39296/95cfac44cabd3ddc4a9438780a4e5c44',
-        'only_matching': True
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
-        video_id, display_id = re.match(self._VALID_URL, url).groups()
-        if not video_id:
-            webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(
-                r'(?:fo\.addVariable\("videoCenterId",\s*|guid\s*=\s*)"([0-9a-f]{32})',
-                webpage, 'video_id')
-        api_data = self._download_json(
-            'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do?pid=' + video_id, video_id)
-        m3u8_url = re.sub(r'maxbr=\d+&?', '', api_data['hls_url'])
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        video_id = self._search_regex(
+            [r'var\s+guid\s*=\s*["\']([\da-fA-F]+)',
+             r'videoCenterId["\']\s*,\s*["\']([\da-fA-F]+)',
+             r'changePlayer\s*\(\s*["\']([\da-fA-F]+)',
+             r'load[Vv]ideo\s*\(\s*["\']([\da-fA-F]+)',
+             r'var\s+initMyAray\s*=\s*["\']([\da-fA-F]+)',
+             r'var\s+ids\s*=\s*\[["\']([\da-fA-F]+)'],
+            webpage, 'video id')
+
+        data = self._download_json(
+            'http://vdn.apps.cntv.cn/api/getHttpVideoInfo.do', video_id,
+            query={
+                'pid': video_id,
+                'url': url,
+                'idl': 32,
+                'idlr': 32,
+                'modifyed': 'false',
+            })
+
+        title = data['title']
+
+        formats = []
+
+        video = data.get('video')
+        if isinstance(video, dict):
+            for quality, chapters_key in enumerate(('lowChapters', 'chapters')):
+                video_url = try_get(
+                    video, lambda x: x[chapters_key][0]['url'], compat_str)
+                if video_url:
+                    formats.append({
+                        'url': video_url,
+                        'format_id': 'http',
+                        'quality': quality,
+                        'preference': -1,
+                    })
+
+        hls_url = try_get(data, lambda x: x['hls_url'], compat_str)
+        if hls_url:
+            hls_url = re.sub(r'maxbr=\d+&?', '', hls_url)
+            formats.extend(self._extract_m3u8_formats(
+                hls_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        self._sort_formats(formats)
+
+        uploader = data.get('editer_name')
+        description = self._html_search_meta(
+            'description', webpage, default=None)
+        timestamp = unified_timestamp(data.get('f_pgmtime'))
+        duration = float_or_none(try_get(video, lambda x: x['totalLength']))
  
          return {
              'id': video_id,
-            'title': api_data['title'],
-            'formats': self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native', fatal=False),
-            'duration': float_or_none(api_data.get('video', {}).get('totalLength')),
+            'title': title,
+            'description': description,
+            'uploader': uploader,
+            'timestamp': timestamp,
+            'duration': duration,
+            'formats': formats,
          }
diff --git a/youtube_dl/extractor/cda.py b/youtube_dl/extractor/cda.py

index e00bdaf66a6d9eb6ac051cc169cabbf02844770b..ae7af2f0e3c432dd5c9f75401d787f0b2b27d083 100755 (executable)
--- a/youtube_dl/extractor/cda.py
+++ b/youtube_dl/extractor/cda.py
@@ -24,7 +24,7 @@ class CDAIE(InfoExtractor):
              'height': 720,
              'title': 'Oto dlaczego przed zakrętem należy zwolnić.',
              'description': 'md5:269ccd135d550da90d1662651fcb9772',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'average_rating': float,
              'duration': 39
          }
@@ -36,7 +36,7 @@ class CDAIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Lądowanie na lotnisku na Maderze',
              'description': 'md5:60d76b71186dcce4e0ba6d4bbdb13e1a',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'crash404',
              'view_count': int,
              'average_rating': float,
diff --git a/youtube_dl/extractor/ceskatelevize.py b/youtube_dl/extractor/ceskatelevize.py

index 4ec79d19dd9db6402752ee65d462631985009cbf..4f88c31ad2af53fe07df449e384137689f65c17d 100644 (file)
--- a/youtube_dl/extractor/ceskatelevize.py
+++ b/youtube_dl/extractor/ceskatelevize.py
@@ -25,7 +25,7 @@ class CeskaTelevizeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Hyde Park Civilizace',
              'description': 'md5:fe93f6eda372d150759d11644ebbfb4a',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 3350,
          },
          'params': {
@@ -39,7 +39,7 @@ class CeskaTelevizeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Hyde Park Civilizace: Bonus 01 - En',
              'description': 'English Subtittles',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 81.3,
          },
          'params': {
@@ -52,7 +52,7 @@ class CeskaTelevizeIE(InfoExtractor):
          'info_dict': {
              'id': 402,
              'ext': 'mp4',
-            'title': 're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^ČT Sport \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
          },
          'params': {
@@ -80,7 +80,7 @@ class CeskaTelevizeIE(InfoExtractor):
                  'id': '61924494877068022',
                  'ext': 'mp4',
                  'title': 'Queer: Bogotart (Queer)',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 1558.3,
              },
          }],
diff --git a/youtube_dl/extractor/channel9.py b/youtube_dl/extractor/channel9.py

index 34d4e61569b110b49998768f13bb81cdda75bd75..865dbcaba5016eb957c41bac43966e2a75044f0b 100644 (file)
--- a/youtube_dl/extractor/channel9.py
+++ b/youtube_dl/extractor/channel9.py
@@ -31,7 +31,7 @@ class Channel9IE(InfoExtractor):
              'title': 'Developer Kick-Off Session: Stuff We Love',
              'description': 'md5:c08d72240b7c87fcecafe2692f80e35f',
              'duration': 4576,
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'session_code': 'KOS002',
              'session_day': 'Day 1',
              'session_room': 'Arena 1A',
@@ -47,7 +47,7 @@ class Channel9IE(InfoExtractor):
              'title': 'Self-service BI with Power BI - nuclear testing',
              'description': 'md5:d1e6ecaafa7fb52a2cacdf9599829f5b',
              'duration': 1540,
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'authors': ['Mike Wilmot'],
          },
      }, {
@@ -59,7 +59,7 @@ class Channel9IE(InfoExtractor):
              'title': 'Ranges for the Standard Library',
              'description': 'md5:2e6b4917677af3728c5f6d63784c4c5d',
              'duration': 5646,
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/charlierose.py b/youtube_dl/extractor/charlierose.py

index 4bf2cf7b0ce7648ec2786952220b452c20201a52..2d517f23194b023a7fcd0fd917a63ac08a6fa80c 100644 (file)
--- a/youtube_dl/extractor/charlierose.py
+++ b/youtube_dl/extractor/charlierose.py
@@ -13,7 +13,7 @@ class CharlieRoseIE(InfoExtractor):
              'id': '27996',
              'ext': 'mp4',
              'title': 'Remembering Zaha Hadid',
-            'thumbnail': 're:^https?://.*\.jpg\?\d+',
+            'thumbnail': r're:^https?://.*\.jpg\?\d+',
              'description': 'We revisit past conversations with Zaha Hadid, in memory of the world renowned Iraqi architect.',
              'subtitles': {
                  'en': [{
diff --git a/youtube_dl/extractor/chaturbate.py b/youtube_dl/extractor/chaturbate.py

index 29a8820d5835b1b3cf7aca3840705a2fb2f2e1e3..8fbc91c1fbae17f8c46adfe1cb947ff64f5d11b4 100644 (file)
--- a/youtube_dl/extractor/chaturbate.py
+++ b/youtube_dl/extractor/chaturbate.py
@@ -1,5 +1,7 @@
  from __future__ import unicode_literals
  
+import re
+
  from .common import InfoExtractor
  from ..utils import ExtractorError
  
@@ -31,30 +33,35 @@ def _real_extract(self, url):
  
          webpage = self._download_webpage(url, video_id)
  
-        m3u8_url = self._search_regex(
-            r'src=(["\'])(?P<url>http.+?\.m3u8.*?)\1', webpage,
-            'playlist', default=None, group='url')
+        m3u8_formats = [(m.group('id').lower(), m.group('url')) for m in re.finditer(
+            r'hlsSource(?P<id>.+?)\s*=\s*(?P<q>["\'])(?P<url>http.+?)(?P=q)', webpage)]
  
-        if not m3u8_url:
+        if not m3u8_formats:
              error = self._search_regex(
                  [r'<span[^>]+class=(["\'])desc_span\1[^>]*>(?P<error>[^<]+)</span>',
                   r'<div[^>]+id=(["\'])defchat\1[^>]*>\s*<p><strong>(?P<error>[^<]+)<'],
                  webpage, 'error', group='error', default=None)
              if not error:
-                if any(p not in webpage for p in (
+                if any(p in webpage for p in (
                          self._ROOM_OFFLINE, 'offline_tipping', 'tip_offline')):
                      error = self._ROOM_OFFLINE
              if error:
                  raise ExtractorError(error, expected=True)
              raise ExtractorError('Unable to find stream URL')
  
-        formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
+        formats = []
+        for m3u8_id, m3u8_url in m3u8_formats:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, ext='mp4',
+                # ffmpeg skips segments for fast m3u8
+                preference=-10 if m3u8_id == 'fast' else None,
+                m3u8_id=m3u8_id, fatal=False, live=True))
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': self._live_title(video_id),
-            'thumbnail': 'https://cdn-s.highwebmedia.com/uHK3McUtGCG3SMFcd4ZJsRv8/roomimage/%s.jpg' % video_id,
+            'thumbnail': 'https://roomimg.stream.highwebmedia.com/ri/%s.jpg' % video_id,
              'age_limit': self._rta_search(webpage),
              'is_live': True,
              'formats': formats,
diff --git a/youtube_dl/extractor/chirbit.py b/youtube_dl/extractor/chirbit.py

index f35df143a604695c0b1fe7b0e33d7384192d1d98..4815b34be7832144075793217de77ba44b7c9471 100644 (file)
--- a/youtube_dl/extractor/chirbit.py
+++ b/youtube_dl/extractor/chirbit.py
@@ -19,6 +19,7 @@ class ChirbitIE(InfoExtractor):
              'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
              'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
              'duration': 306,
+            'uploader': 'Gerryaudio',
          },
          'params': {
              'skip_download': True,
@@ -54,6 +55,9 @@ def _real_extract(self, url):
          duration = parse_duration(self._search_regex(
              r'class=["\']c-length["\'][^>]*>([^<]+)',
              webpage, 'duration', fatal=False))
+        uploader = self._search_regex(
+            r'id=["\']chirbit-username["\'][^>]*>([^<]+)',
+            webpage, 'uploader', fatal=False)
  
          return {
              'id': audio_id,
@@ -61,6 +65,7 @@ def _real_extract(self, url):
              'title': title,
              'description': description,
              'duration': duration,
+            'uploader': uploader,
          }
  
  
diff --git a/youtube_dl/extractor/cliphunter.py b/youtube_dl/extractor/cliphunter.py

index 252c2e846969c96d733911f2f471054286ec0777..ab651d1c8632fe08c29d44334cb4ba4a6ea2fddf 100644 (file)
--- a/youtube_dl/extractor/cliphunter.py
+++ b/youtube_dl/extractor/cliphunter.py
@@ -30,7 +30,7 @@ class CliphunterIE(InfoExtractor):
              'id': '1012420',
              'ext': 'flv',
              'title': 'Fun Jynx Maze solo',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'age_limit': 18,
          },
          'skip': 'Video gone',
@@ -41,7 +41,7 @@ class CliphunterIE(InfoExtractor):
              'id': '2019449',
              'ext': 'mp4',
              'title': 'ShesNew - My booty girlfriend, Victoria Paradice\'s pussy filled with jizz',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'age_limit': 18,
          },
      }]
diff --git a/youtube_dl/extractor/clipsyndicate.py b/youtube_dl/extractor/clipsyndicate.py

index 0b6ad895fd7841e70b7dc0dd136052ff0459dd3c..6cdb42f5a4ae56ab9fd1202716c62cadc8bc456d 100644 (file)
--- a/youtube_dl/extractor/clipsyndicate.py
+++ b/youtube_dl/extractor/clipsyndicate.py
@@ -18,7 +18,7 @@ class ClipsyndicateIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Brick Briscoe',
              'duration': 612,
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
          },
      }, {
          'url': 'http://chic.clipsyndicate.com/video/play/5844117/shark_attack',
diff --git a/youtube_dl/extractor/clubic.py b/youtube_dl/extractor/clubic.py

index f7ee3a8f8ebe4715b2d2a5f4634bc50836cc33f7..98f9cb596955621d458461c42a6e7adb7638b9db 100644 (file)
--- a/youtube_dl/extractor/clubic.py
+++ b/youtube_dl/extractor/clubic.py
@@ -19,7 +19,7 @@ class ClubicIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Clubic Week 2.0 : le FBI se lance dans la photo d\u0092identité',
              'description': 're:Gueule de bois chez Nokia. Le constructeur a indiqué cette.*',
-            'thumbnail': 're:^http://img\.clubic\.com/.*\.jpg$',
+            'thumbnail': r're:^http://img\.clubic\.com/.*\.jpg$',
          }
      }, {
          'url': 'http://www.clubic.com/video/video-clubic-week-2-0-apple-iphone-6s-et-plus-mais-surtout-le-pencil-469792.html',
diff --git a/youtube_dl/extractor/cmt.py b/youtube_dl/extractor/cmt.py

index 7d3e9b0c9ce89fff9b8094f2d86beaa5fb35e7e0..e701fbeab8231cb90ec54a1c439e6bae42ec703d 100644 (file)
--- a/youtube_dl/extractor/cmt.py
+++ b/youtube_dl/extractor/cmt.py
@@ -1,13 +1,11 @@
  from __future__ import unicode_literals
  
  from .mtv import MTVIE
-from ..utils import ExtractorError
  
  
  class CMTIE(MTVIE):
      IE_NAME = 'cmt.com'
-    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows)/(?:[^/]+/)*(?P<videoid>\d+)'
-    _FEED_URL = 'http://www.cmt.com/sitewide/apps/player/embed/rss/'
+    _VALID_URL = r'https?://(?:www\.)?cmt\.com/(?:videos|shows|(?:full-)?episodes|video-clips)/(?P<id>[^/]+)'
  
      _TESTS = [{
          'url': 'http://www.cmt.com/videos/garth-brooks/989124/the-call-featuring-trisha-yearwood.jhtml#artist=30061',
@@ -33,17 +31,24 @@ class CMTIE(MTVIE):
      }, {
          'url': 'http://www.cmt.com/shows/party-down-south/party-down-south-ep-407-gone-girl/1738172/playlist/#id=1738172',
          'only_matching': True,
+    }, {
+        'url': 'http://www.cmt.com/full-episodes/537qb3/nashville-the-wayfaring-stranger-season-5-ep-501',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.cmt.com/video-clips/t9e4ci/nashville-juliette-in-2-minutes',
+        'only_matching': True,
      }]
  
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        if 'error_not_available.swf' in rtmp_video_url:
-            raise ExtractorError(
-                '%s said: video is not available' % cls.IE_NAME, expected=True)
-
-        return super(CMTIE, cls)._transform_rtmp_url(rtmp_video_url)
-
      def _extract_mgid(self, webpage):
-        return self._search_regex(
+        mgid = self._search_regex(
              r'MTVN\.VIDEO\.contentUri\s*=\s*([\'"])(?P<mgid>.+?)\1',
-            webpage, 'mgid', group='mgid')
+            webpage, 'mgid', group='mgid', default=None)
+        if not mgid:
+            mgid = self._extract_triforce_mgid(webpage)
+        return mgid
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        mgid = self._extract_mgid(webpage)
+        return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
diff --git a/youtube_dl/extractor/collegerama.py b/youtube_dl/extractor/collegerama.py

index f9e84193d95a8ebd2e49331a34c91b04ad95c649..18c7347668a55ccfdce4aed30a8052829cbf24a9 100644 (file)
--- a/youtube_dl/extractor/collegerama.py
+++ b/youtube_dl/extractor/collegerama.py
@@ -21,7 +21,7 @@ class CollegeRamaIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Een nieuwe wereld: waarden, bewustzijn en techniek van de mensheid 2.0.',
                  'description': '',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 7713.088,
                  'timestamp': 1413309600,
                  'upload_date': '20141014',
diff --git a/youtube_dl/extractor/comedycentral.py b/youtube_dl/extractor/comedycentral.py

index 88346dde7754a124e2b1d88d5ab8291dca4ca632..4cac294153f166b676f1bdcdb2d47ee9e5fdf693 100644 (file)
--- a/youtube_dl/extractor/comedycentral.py
+++ b/youtube_dl/extractor/comedycentral.py
@@ -6,7 +6,7 @@
  
  class ComedyCentralIE(MTVServicesInfoExtractor):
      _VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
-        (video-clips|episodes|cc-studios|video-collections|full-episodes|shows)
+        (video-clips|episodes|cc-studios|video-collections|shows(?=/[^/]+/(?!full-episodes)))
          /(?P<title>.*)'''
      _FEED_URL = 'http://comedycentral.com/feeds/mrss/'
  
@@ -27,6 +27,32 @@ class ComedyCentralIE(MTVServicesInfoExtractor):
      }]
  
  
+class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
+    _VALID_URL = r'''(?x)https?://(?:www\.)?cc\.com/
+        (?:full-episodes|shows(?=/[^/]+/full-episodes))
+        /(?P<id>[^?]+)'''
+    _FEED_URL = 'http://comedycentral.com/feeds/mrss/'
+
+    _TESTS = [{
+        'url': 'http://www.cc.com/full-episodes/pv391a/the-daily-show-with-trevor-noah-november-28--2016---ryan-speedo-green-season-22-ep-22028',
+        'info_dict': {
+            'description': 'Donald Trump is accused of exploiting his president-elect status for personal gain, Cuban leader Fidel Castro dies, and Ryan Speedo Green discusses "Sing for Your Life."',
+            'title': 'November 28, 2016 - Ryan Speedo Green',
+        },
+        'playlist_count': 4,
+    }, {
+        'url': 'http://www.cc.com/shows/the-daily-show-with-trevor-noah/full-episodes',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        webpage = self._download_webpage(url, playlist_id)
+        mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
+        videos_info = self._get_videos_info(mgid)
+        return videos_info
+
+
  class ToshIE(MTVServicesInfoExtractor):
      IE_DESC = 'Tosh.0'
      _VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
@@ -45,7 +71,7 @@ class ToshIE(MTVServicesInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Tosh.0|June 9, 2077|2|211|Twitter Users Share Summer Plans',
                  'description': 'Tosh asked fans to share their summer plans.',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  # It's really reported to be published on year 2077
                  'upload_date': '20770610',
                  'timestamp': 3390510600,
@@ -59,12 +85,6 @@ class ToshIE(MTVServicesInfoExtractor):
          'only_matching': True,
      }]
  
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        new_urls = super(ToshIE, cls)._transform_rtmp_url(rtmp_video_url)
-        new_urls['rtmp'] = rtmp_video_url.replace('viacomccstrm', 'viacommtvstrm')
-        return new_urls
-
  
  class ComedyCentralTVIE(MTVServicesInfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?comedycentral\.tv/(?:staffeln|shows)/(?P<id>[^/?#&]+)'
diff --git a/youtube_dl/extractor/common.py b/youtube_dl/extractor/common.py

index 05c51fac9b0b4162fb126cb79a79d871b591ead8..2c8ec1417c21cb9397e34efe8dec7e0e5bca9e62 100644 (file)
--- a/youtube_dl/extractor/common.py
+++ b/youtube_dl/extractor/common.py
@@ -59,6 +59,7 @@
      parse_m3u8_attributes,
      extract_attributes,
      parse_codecs,
+    urljoin,
  )
  
  
@@ -120,9 +121,19 @@ class InfoExtractor(object):
                                   download, lower-case.
                                   "http", "https", "rtsp", "rtmp", "rtmpe",
                                   "m3u8", "m3u8_native" or "http_dash_segments".
-                    * fragments  A list of fragments of the fragmented media,
-                                 with the following entries:
-                                 * "url" (mandatory) - fragment's URL
+                    * fragment_base_url
+                                 Base URL for fragments. Each fragment's path
+                                 value (if present) will be relative to
+                                 this URL.
+                    * fragments  A list of fragments of a fragmented media.
+                                 Each fragment entry must contain either a URL
+                                 or a path. If a URL is present it should be
+                                 considered by a client. Otherwise both path and
+                                 fragment_base_url must be present. Here is
+                                 the list of all potential fields:
+                                 * "url" - fragment's URL
+                                 * "path" - fragment's path relative to
+                                            fragment_base_url
                                   * "duration" (optional, int or float)
                                   * "filesize" (optional, int)
                      * preference Order number of this format. If this field is
@@ -188,9 +199,10 @@ class InfoExtractor(object):
      uploader_url:   Full URL to a personal webpage of the video uploader.
      location:       Physical location where the video was filmed.
      subtitles:      The available subtitles as a dictionary in the format
-                    {language: subformats}. "subformats" is a list sorted from
-                    lower to higher preference, each element is a dictionary
-                    with the "ext" entry and one of:
+                    {tag: subformats}. "tag" is usually a language code, and
+                    "subformats" is a list sorted from lower to higher
+                    preference, each element is a dictionary with the "ext"
+                    entry and one of:
                          * "data": The subtitles file contents
                          * "url": A URL pointing to the subtitles file
                      "ext" will be calculated from URL if missing
@@ -1013,13 +1025,13 @@ def _remove_duplicate_formats(formats):
                  unique_formats.append(f)
          formats[:] = unique_formats
  
-    def _is_valid_url(self, url, video_id, item='video'):
+    def _is_valid_url(self, url, video_id, item='video', headers={}):
          url = self._proto_relative_url(url, scheme='http:')
          # For now assume non HTTP(S) URLs always valid
          if not (url.startswith('http://') or url.startswith('https://')):
              return True
          try:
-            self._request_webpage(url, video_id, 'Checking %s URL' % item)
+            self._request_webpage(url, video_id, 'Checking %s URL' % item, headers=headers)
              return True
          except ExtractorError as e:
              if isinstance(e.cause, compat_urllib_error.URLError):
@@ -1224,6 +1236,7 @@ def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
                  'protocol': entry_protocol,
                  'preference': preference,
              }]
+        audio_in_video_stream = {}
          last_info = {}
          last_media = {}
          for line in m3u8_doc.splitlines():
@@ -1233,25 +1246,32 @@ def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
                  media = parse_m3u8_attributes(line)
                  media_type = media.get('TYPE')
                  if media_type in ('VIDEO', 'AUDIO'):
+                    group_id = media.get('GROUP-ID')
                      media_url = media.get('URI')
                      if media_url:
                          format_id = []
-                        for v in (media.get('GROUP-ID'), media.get('NAME')):
+                        for v in (group_id, media.get('NAME')):
                              if v:
                                  format_id.append(v)
-                        formats.append({
+                        f = {
                              'format_id': '-'.join(format_id),
                              'url': format_url(media_url),
                              'language': media.get('LANGUAGE'),
-                            'vcodec': 'none' if media_type == 'AUDIO' else None,
                              'ext': ext,
                              'protocol': entry_protocol,
                              'preference': preference,
-                        })
+                        }
+                        if media_type == 'AUDIO':
+                            f['vcodec'] = 'none'
+                            if group_id and not audio_in_video_stream.get(group_id):
+                                audio_in_video_stream[group_id] = False
+                        formats.append(f)
                      else:
                          # When there is no URI in EXT-X-MEDIA let this tag's
                          # data be used by regular URI lines below
                          last_media = media
+                        if media_type == 'AUDIO' and group_id:
+                            audio_in_video_stream[group_id] = True
              elif line.startswith('#') or not line.strip():
                  continue
              else:
@@ -1295,6 +1315,9 @@ def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
                          'abr': abr,
                      })
                  f.update(parse_codecs(last_info.get('CODECS')))
+                if audio_in_video_stream.get(last_info.get('AUDIO')) is False:
+                    # TODO: update acodec for audio only formats with the same GROUP-ID
+                    f['acodec'] = 'none'
                  formats.append(f)
                  last_info = {}
                  last_media = {}
@@ -1614,21 +1637,16 @@ def extract_Initialization(source):
                  segment_template = element.find(_add_ns('SegmentTemplate'))
                  if segment_template is not None:
                      extract_common(segment_template)
-                    media_template = segment_template.get('media')
-                    if media_template:
-                        ms_info['media_template'] = media_template
+                    media = segment_template.get('media')
+                    if media:
+                        ms_info['media'] = media
                      initialization = segment_template.get('initialization')
                      if initialization:
-                        ms_info['initialization_url'] = initialization
+                        ms_info['initialization'] = initialization
                      else:
                          extract_Initialization(segment_template)
              return ms_info
  
-        def combine_url(base_url, target_url):
-            if re.match(r'^https?://', target_url):
-                return target_url
-            return '%s%s%s' % (base_url, '' if base_url.endswith('/') else '/', target_url)
-
          mpd_duration = parse_duration(mpd_doc.get('mediaPresentationDuration'))
          formats = []
          for period in mpd_doc.findall(_add_ns('Period')):
@@ -1668,6 +1686,7 @@ def combine_url(base_url, target_url):
                          lang = representation_attrib.get('lang')
                          url_el = representation.find(_add_ns('BaseURL'))
                          filesize = int_or_none(url_el.attrib.get('{http://youtube.com/yt/2012/10/10}contentLength') if url_el is not None else None)
+                        bandwidth = int_or_none(representation_attrib.get('bandwidth'))
                          f = {
                              'format_id': '%s-%s' % (mpd_id, representation_id) if mpd_id else representation_id,
                              'url': base_url,
@@ -1675,23 +1694,41 @@ def combine_url(base_url, target_url):
                              'ext': mimetype2ext(mime_type),
                              'width': int_or_none(representation_attrib.get('width')),
                              'height': int_or_none(representation_attrib.get('height')),
-                            'tbr': int_or_none(representation_attrib.get('bandwidth'), 1000),
+                            'tbr': int_or_none(bandwidth, 1000),
                              'asr': int_or_none(representation_attrib.get('audioSamplingRate')),
                              'fps': int_or_none(representation_attrib.get('frameRate')),
-                            'vcodec': 'none' if content_type == 'audio' else representation_attrib.get('codecs'),
-                            'acodec': 'none' if content_type == 'video' else representation_attrib.get('codecs'),
                              'language': lang if lang not in ('mul', 'und', 'zxx', 'mis') else None,
                              'format_note': 'DASH %s' % content_type,
                              'filesize': filesize,
                          }
+                        f.update(parse_codecs(representation_attrib.get('codecs')))
                          representation_ms_info = extract_multisegment_info(representation, adaption_set_ms_info)
-                        if 'segment_urls' not in representation_ms_info and 'media_template' in representation_ms_info:
  
-                            media_template = representation_ms_info['media_template']
-                            media_template = media_template.replace('$RepresentationID$', representation_id)
-                            media_template = re.sub(r'\$(Number|Bandwidth|Time)\$', r'%(\1)d', media_template)
-                            media_template = re.sub(r'\$(Number|Bandwidth|Time)%([^$]+)\$', r'%(\1)\2', media_template)
-                            media_template.replace('$$', '$')
+                        def prepare_template(template_name, identifiers):
+                            t = representation_ms_info[template_name]
+                            t = t.replace('$RepresentationID$', representation_id)
+                            t = re.sub(r'\$(%s)\$' % '|'.join(identifiers), r'%(\1)d', t)
+                            t = re.sub(r'\$(%s)%%([^$]+)\$' % '|'.join(identifiers), r'%(\1)\2', t)
+                            t = t.replace('$$', '$')
+                            return t
+
+                        # @initialization is a regular template like @media one
+                        # so it should be handled just the same way (see
+                        # https://github.com/rg3/youtube-dl/issues/11605)
+                        if 'initialization' in representation_ms_info:
+                            initialization_template = prepare_template(
+                                'initialization',
+                                # As per [1, 5.3.9.4.2, Table 15, page 54] $Number$ and
+                                # $Time$ shall not be included for @initialization thus
+                                # only $Bandwidth$ remains
+                                ('Bandwidth', ))
+                            representation_ms_info['initialization_url'] = initialization_template % {
+                                'Bandwidth': bandwidth,
+                            }
+
+                        if 'segment_urls' not in representation_ms_info and 'media' in representation_ms_info:
+
+                            media_template = prepare_template('media', ('Number', 'Bandwidth', 'Time'))
  
                              # As per [1, 5.3.9.4.4, Table 16, page 55] $Number$ and $Time$
                              # can't be used at the same time
@@ -1703,7 +1740,7 @@ def combine_url(base_url, target_url):
                                  representation_ms_info['fragments'] = [{
                                      'url': media_template % {
                                          'Number': segment_number,
-                                        'Bandwidth': int_or_none(representation_attrib.get('bandwidth')),
+                                        'Bandwidth': bandwidth,
                                      },
                                      'duration': segment_duration,
                                  } for segment_number in range(
@@ -1721,7 +1758,7 @@ def combine_url(base_url, target_url):
                                  def add_segment_url():
                                      segment_url = media_template % {
                                          'Time': segment_time,
-                                        'Bandwidth': int_or_none(representation_attrib.get('bandwidth')),
+                                        'Bandwidth': bandwidth,
                                          'Number': segment_number,
                                      }
                                      representation_ms_info['fragments'].append({
@@ -1744,14 +1781,16 @@ def add_segment_url():
                              # Example: https://www.youtube.com/watch?v=iXZV5uAYMJI
                              # or any YouTube dashsegments video
                              fragments = []
-                            s_num = 0
-                            for segment_url in representation_ms_info['segment_urls']:
-                                s = representation_ms_info['s'][s_num]
+                            segment_index = 0
+                            timescale = representation_ms_info['timescale']
+                            for s in representation_ms_info['s']:
+                                duration = float_or_none(s['d'], timescale)
                                  for r in range(s.get('r', 0) + 1):
                                      fragments.append({
-                                        'url': segment_url,
-                                        'duration': float_or_none(s['d'], representation_ms_info['timescale']),
+                                        'url': representation_ms_info['segment_urls'][segment_index],
+                                        'duration': duration,
                                      })
+                                    segment_index += 1
                              representation_ms_info['fragments'] = fragments
                          # NB: MPD manifest may contain direct URLs to unfragmented media.
                          # No fragments key is present in this case.
@@ -1761,13 +1800,13 @@ def add_segment_url():
                                  'protocol': 'http_dash_segments',
                              })
                              if 'initialization_url' in representation_ms_info:
-                                initialization_url = representation_ms_info['initialization_url'].replace('$RepresentationID$', representation_id)
+                                initialization_url = representation_ms_info['initialization_url']
                                  if not f.get('url'):
                                      f['url'] = initialization_url
                                  f['fragments'].append({'url': initialization_url})
                              f['fragments'].extend(representation_ms_info['fragments'])
                              for fragment in f['fragments']:
-                                fragment['url'] = combine_url(base_url, fragment['url'])
+                                fragment['url'] = urljoin(base_url, fragment['url'])
                          try:
                              existing_format = next(
                                  fo for fo in formats
@@ -1881,7 +1920,7 @@ def _parse_ism_formats(self, ism_doc, ism_url, ism_id=None):
                  })
          return formats
  
-    def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8'):
+    def _parse_html5_media_entries(self, base_url, webpage, video_id, m3u8_id=None, m3u8_entry_protocol='m3u8', mpd_id=None):
          def absolute_url(video_url):
              return compat_urlparse.urljoin(base_url, video_url)
  
@@ -1898,11 +1937,16 @@ def parse_content_type(content_type):
  
          def _media_formats(src, cur_media_type):
              full_url = absolute_url(src)
-            if determine_ext(full_url) == 'm3u8':
+            ext = determine_ext(full_url)
+            if ext == 'm3u8':
                  is_plain_url = False
                  formats = self._extract_m3u8_formats(
                      full_url, video_id, ext='mp4',
                      entry_protocol=m3u8_entry_protocol, m3u8_id=m3u8_id)
+            elif ext == 'mpd':
+                is_plain_url = False
+                formats = self._extract_mpd_formats(
+                    full_url, video_id, mpd_id=mpd_id)
              else:
                  is_plain_url = True
                  formats = [{
@@ -1955,10 +1999,13 @@ def _media_formats(src, cur_media_type):
                  entries.append(media_info)
          return entries
  
-    def _extract_akamai_formats(self, manifest_url, video_id):
+    def _extract_akamai_formats(self, manifest_url, video_id, hosts={}):
          formats = []
          hdcore_sign = 'hdcore=3.7.0'
-        f4m_url = re.sub(r'(https?://.+?)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
+        f4m_url = re.sub(r'(https?://[^/]+)/i/', r'\1/z/', manifest_url).replace('/master.m3u8', '/manifest.f4m')
+        hds_host = hosts.get('hds')
+        if hds_host:
+            f4m_url = re.sub(r'(https?://)[^/]+', r'\1' + hds_host, f4m_url)
          if 'hdcore=' not in f4m_url:
              f4m_url += ('&' if '?' in f4m_url else '?') + hdcore_sign
          f4m_formats = self._extract_f4m_formats(
@@ -1966,7 +2013,10 @@ def _extract_akamai_formats(self, manifest_url, video_id):
          for entry in f4m_formats:
              entry.update({'extra_param_to_segment_url': hdcore_sign})
          formats.extend(f4m_formats)
-        m3u8_url = re.sub(r'(https?://.+?)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
+        m3u8_url = re.sub(r'(https?://[^/]+)/z/', r'\1/i/', manifest_url).replace('/manifest.f4m', '/master.m3u8')
+        hls_host = hosts.get('hls')
+        if hls_host:
+            m3u8_url = re.sub(r'(https?://)[^/]+', r'\1' + hls_host, m3u8_url)
          formats.extend(self._extract_m3u8_formats(
              m3u8_url, video_id, 'mp4', 'm3u8_native',
              m3u8_id='hls', fatal=False))
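
Note on the `prepare_template` helper introduced in the DASH hunk above: the following is a minimal standalone sketch of the same idea, not the extractor code itself. It turns a `SegmentTemplate` string into a Python %-style format string. The function signature (taking the template string and `representation_id` as arguments) and the sample templates in the usage note are assumptions made for illustration; in this sketch the result of the `$$` replacement is returned.

```python
import re


def prepare_template(template, representation_id, identifiers):
    """Convert a DASH template string into a %-style format string.

    Hypothetical standalone variant: the template and representation id
    are passed in explicitly instead of being read from enclosing scope.
    """
    t = template.replace('$RepresentationID$', representation_id)
    names = '|'.join(identifiers)
    # $Number$ -> %(Number)d
    t = re.sub(r'\$(%s)\$' % names, r'%(\1)d', t)
    # $Number%05d$ -> %(Number)05d
    t = re.sub(r'\$(%s)%%([^$]+)\$' % names, r'%(\1)\2', t)
    # $$ is an escaped literal dollar sign
    return t.replace('$$', '$')
```

Under these assumptions, `prepare_template('$RepresentationID$/seg-$Number%05d$.m4s', 'video1', ('Number', 'Bandwidth', 'Time')) % {'Number': 7}` gives `'video1/seg-00007.m4s'`. A template containing a literal `%` outside an identifier would break the final %-formatting step; the sketch does not handle that case.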
diff --git a/youtube_dl/extractor/coub.py b/youtube_dl/extractor/coub.py

index a901b8d2223fb7606538d8dcd98e19905ff3889c..5fa1f006b82675d299d1cef30fbe2108496256d5 100644
--- a/youtube_dl/extractor/coub.py
+++ b/youtube_dl/extractor/coub.py
@@ -20,7 +20,7 @@ class CoubIE(InfoExtractor):
              'id': '5u5n1',
              'ext': 'mp4',
              'title': 'The Matrix Moonwalk',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 4.6,
              'timestamp': 1428527772,
              'upload_date': '20150408',
diff --git a/youtube_dl/extractor/crackle.py b/youtube_dl/extractor/crackle.py

index cc68f1c0082674eaf850c2a0c1e3d6ae0f670d74..377fb45e9d2bcd70c1a6aa6d835331636708c215 100644
--- a/youtube_dl/extractor/crackle.py
+++ b/youtube_dl/extractor/crackle.py
@@ -6,7 +6,7 @@
  
  
  class CrackleIE(InfoExtractor):
-    _VALID_URL = r'(?:crackle:|https?://(?:www\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
+    _VALID_URL = r'(?:crackle:|https?://(?:(?:www|m)\.)?crackle\.com/(?:playlist/\d+/|(?:[^/]+/)+))(?P<id>\d+)'
      _TEST = {
          'url': 'http://www.crackle.com/comedians-in-cars-getting-coffee/2498934',
          'info_dict': {
@@ -14,7 +14,7 @@ class CrackleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Everybody Respects A Bloody Nose',
              'description': 'Jerry is kaffeeklatsching in L.A. with funnyman J.B. Smoove (Saturday Night Live, Real Husbands of Hollywood). They’re headed for brew at 10 Speed Coffee in a 1964 Studebaker Avanti.',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 906,
              'series': 'Comedians In Cars Getting Coffee',
              'season_number': 8,
@@ -31,8 +31,32 @@ class CrackleIE(InfoExtractor):
          }
      }
  
+    _THUMBNAIL_RES = [
+        (120, 90),
+        (208, 156),
+        (220, 124),
+        (220, 220),
+        (240, 180),
+        (250, 141),
+        (315, 236),
+        (320, 180),
+        (360, 203),
+        (400, 300),
+        (421, 316),
+        (460, 330),
+        (460, 460),
+        (462, 260),
+        (480, 270),
+        (587, 330),
+        (640, 480),
+        (700, 330),
+        (700, 394),
+        (854, 480),
+        (1024, 1024),
+        (1920, 1080),
+    ]
+
      # extracted from http://legacyweb-us.crackle.com/flash/ReferrerRedirect.ashx
-    _THUMBNAIL_TEMPLATE = 'http://images-us-am.crackle.com/%stnl_1920x1080.jpg?ts=20140107233116?c=635333335057637614'
      _MEDIA_FILE_SLOTS = {
          'c544.flv': {
              'width': 544,
@@ -61,17 +85,25 @@ def _real_extract(self, url):
  
          item = self._download_xml(
              'http://legacyweb-us.crackle.com/app/revamp/vidwallcache.aspx?flags=-1&fm=%s' % video_id,
-            video_id).find('i')
+            video_id, headers=self.geo_verification_headers()).find('i')
          title = item.attrib['t']
  
          subtitles = {}
          formats = self._extract_m3u8_formats(
              'http://content.uplynk.com/ext/%s/%s.m3u8' % (config_doc.attrib['strUplynkOwnerId'], video_id),
              video_id, 'mp4', m3u8_id='hls', fatal=None)
-        thumbnail = None
+        thumbnails = []
          path = item.attrib.get('p')
          if path:
-            thumbnail = self._THUMBNAIL_TEMPLATE % path
+            for width, height in self._THUMBNAIL_RES:
+                res = '%dx%d' % (width, height)
+                thumbnails.append({
+                    'id': res,
+                    'url': 'http://images-us-am.crackle.com/%stnl_%s.jpg' % (path, res),
+                    'width': width,
+                    'height': height,
+                    'resolution': res,
+                })
              http_base_url = 'http://ahttp.crackle.com/' + path
              for mfs_path, mfs_info in self._MEDIA_FILE_SLOTS.items():
                  formats.append({
@@ -86,10 +118,11 @@ def _real_extract(self, url):
                  if locale and v:
                      if locale not in subtitles:
                          subtitles[locale] = []
-                    subtitles[locale] = [{
-                        'url': '%s/%s%s_%s.xml' % (config_doc.attrib['strSubtitleServer'], path, locale, v),
-                        'ext': 'ttml',
-                    }]
+                    for url_ext, ext in (('vtt', 'vtt'), ('xml', 'tt')):
+                        subtitles.setdefault(locale, []).append({
+                            'url': '%s/%s%s_%s.%s' % (config_doc.attrib['strSubtitleServer'], path, locale, v, url_ext),
+                            'ext': ext,
+                        })
          self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
  
          return {
@@ -100,7 +133,7 @@ def _real_extract(self, url):
              'series': item.attrib.get('sn'),
              'season_number': int_or_none(item.attrib.get('se')),
              'episode_number': int_or_none(item.attrib.get('ep')),
-            'thumbnail': thumbnail,
+            'thumbnails': thumbnails,
              'subtitles': subtitles,
              'formats': formats,
          }
diff --git a/youtube_dl/extractor/criterion.py b/youtube_dl/extractor/criterion.py

index cf6a5d6cbe906443b1db592616cd89926860bbdd..f7815b905d13910e0a931f2609fa015c9ac3f00a 100644
--- a/youtube_dl/extractor/criterion.py
+++ b/youtube_dl/extractor/criterion.py
@@ -14,7 +14,7 @@ class CriterionIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Le Samouraï',
              'description': 'md5:a2b4b116326558149bef81f76dcbb93f',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/crooksandliars.py b/youtube_dl/extractor/crooksandliars.py

index 443eb7691c7b9c2402942db21e521d17550167f9..7fb782db7ce930a47fc9a75730409aec805c18ac 100644
--- a/youtube_dl/extractor/crooksandliars.py
+++ b/youtube_dl/extractor/crooksandliars.py
@@ -16,7 +16,7 @@ class CrooksAndLiarsIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Fox & Friends Says Protecting Atheists From Discrimination Is Anti-Christian!',
              'description': 'md5:e1a46ad1650e3a5ec7196d432799127f',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1428207000,
              'upload_date': '20150405',
              'uploader': 'Heather',
diff --git a/youtube_dl/extractor/crunchyroll.py b/youtube_dl/extractor/crunchyroll.py

index cc141f68ec52f4d3b7f795a099b5b1ccf310fdbb..109d1c5a864f283a01b2b2baaed784384776a5c1 100644
--- a/youtube_dl/extractor/crunchyroll.py
+++ b/youtube_dl/extractor/crunchyroll.py
@@ -142,7 +142,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              'ext': 'flv',
              'title': 'Culture Japan Episode 1 – Rebuilding Japan after the 3.11',
              'description': 'md5:2fbc01f90b87e8e9137296f37b461c12',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Danny Choo Network',
              'upload_date': '20120213',
          },
@@ -158,7 +158,7 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              'ext': 'mp4',
              'title': 'Re:ZERO -Starting Life in Another World- Episode 5 – The Morning of Our Promise Is Still Distant',
              'description': 'md5:97664de1ab24bbf77a9c01918cb7dca9',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'TV TOKYO',
              'upload_date': '20160508',
          },
@@ -166,6 +166,25 @@ class CrunchyrollIE(CrunchyrollBaseIE):
              # m3u8 download
              'skip_download': True,
          },
+    }, {
+        'url': 'http://www.crunchyroll.com/konosuba-gods-blessing-on-this-wonderful-world/episode-1-give-me-deliverance-from-this-judicial-injustice-727589',
+        'info_dict': {
+            'id': '727589',
+            'ext': 'mp4',
+            'title': "KONOSUBA -God's blessing on this wonderful world! 2 Episode 1 – Give Me Deliverance from this Judicial Injustice!",
+            'description': 'md5:cbcf05e528124b0f3a0a419fc805ea7d',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'uploader': 'Kadokawa Pictures Inc.',
+            'upload_date': '20170118',
+            'series': "KONOSUBA -God's blessing on this wonderful world!",
+            'season_number': 2,
+            'episode': 'Give Me Deliverance from this Judicial Injustice!',
+            'episode_number': 1,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
      }, {
          'url': 'http://www.crunchyroll.fr/girl-friend-beta/episode-11-goodbye-la-mode-661697',
          'only_matching': True,
@@ -236,8 +255,7 @@ def ass_bool(strvalue):
          output += 'WrapStyle: %s\n' % sub_root.attrib['wrap_style']
          output += 'PlayResX: %s\n' % sub_root.attrib['play_res_x']
          output += 'PlayResY: %s\n' % sub_root.attrib['play_res_y']
-        output += """ScaledBorderAndShadow: yes
-
+        output += """
  [V4+ Styles]
  Format: Name, Fontname, Fontsize, PrimaryColour, SecondaryColour, OutlineColour, BackColour, Bold, Italic, Underline, StrikeOut, ScaleX, ScaleY, Spacing, Angle, BorderStyle, Outline, Shadow, Alignment, MarginL, MarginR, MarginV, Encoding
  """
@@ -439,6 +457,18 @@ def _real_extract(self, url):
  
          subtitles = self.extract_subtitles(video_id, webpage)
  
+        # the webpage provides more accurate data than series_title from the XML
+        series = self._html_search_regex(
+            r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)',
+            webpage, 'series', default=xpath_text(metadata, 'series_title'))
+
+        episode = xpath_text(metadata, 'episode_title')
+        episode_number = int_or_none(xpath_text(metadata, 'episode_number'))
+
+        season_number = int_or_none(self._search_regex(
+            r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>\s*<h4>\s*Season (\d+)',
+            webpage, 'season number', default=None))
+
          return {
              'id': video_id,
              'title': video_title,
@@ -446,9 +476,10 @@ def _real_extract(self, url):
              'thumbnail': xpath_text(metadata, 'episode_image_url'),
              'uploader': video_uploader,
              'upload_date': video_upload_date,
-            'series': xpath_text(metadata, 'series_title'),
-            'episode': xpath_text(metadata, 'episode_title'),
-            'episode_number': int_or_none(xpath_text(metadata, 'episode_number')),
+            'series': series,
+            'season_number': season_number,
+            'episode': episode,
+            'episode_number': episode_number,
              'subtitles': subtitles,
              'formats': formats,
          }
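
Note on the series and season lookup added in the Crunchyroll hunk above: the sketch below applies the same two regular expressions with the plain `re` module so they can be tried outside the extractor. The function name `extract_series_and_season`, the `fallback_series` parameter, and the `SAMPLE` markup are assumptions for illustration only; the real page layout may differ, and the extractor's `_html_search_regex` additionally cleans HTML from the match, which this sketch does not.

```python
import re

# Patterns copied from the hunk above.
SERIES_RE = r'id=["\']showmedia_about_episode_num[^>]+>\s*<a[^>]+>([^<]+)'
SEASON_RE = (
    r'(?s)<h4[^>]+id=["\']showmedia_about_episode_num[^>]+>.+?</h4>'
    r'\s*<h4>\s*Season (\d+)')


def extract_series_and_season(webpage, fallback_series=None):
    """Return (series, season_number); hypothetical helper."""
    m = re.search(SERIES_RE, webpage)
    # Fall back to a caller-supplied title (e.g. from the XML metadata).
    series = m.group(1).strip() if m else fallback_series
    m = re.search(SEASON_RE, webpage)
    season_number = int(m.group(1)) if m else None
    return series, season_number


# Assumed markup, written only to exercise the patterns.
SAMPLE = (
    '<h4 id="showmedia_about_episode_num">\n'
    '  <a href="/konosuba">KONOSUBA</a>\n'
    '</h4>\n'
    '<h4>\n'
    '  Season 2, Episode 1\n'
    '</h4>')
```

When neither pattern matches, the helper returns the fallback series and `None` for the season, mirroring the `default=` arguments used in the hunk.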
diff --git a/youtube_dl/extractor/cspan.py b/youtube_dl/extractor/cspan.py

index 7e5d4f2276385a363eade175dba78519cea515fe..d4576160b4489e599e4ca7dabc1e18c9d685610f 100644
--- a/youtube_dl/extractor/cspan.py
+++ b/youtube_dl/extractor/cspan.py
@@ -12,6 +12,7 @@
      ExtractorError,
  )
  from .senateisvp import SenateISVPIE
+from .ustream import UstreamIE
  
  
  class CSpanIE(InfoExtractor):
@@ -22,14 +23,13 @@ class CSpanIE(InfoExtractor):
          'md5': '94b29a4f131ff03d23471dd6f60b6a1d',
          'info_dict': {
              'id': '315139',
-            'ext': 'mp4',
              'title': 'Attorney General Eric Holder on Voting Rights Act Decision',
-            'description': 'Attorney General Eric Holder speaks to reporters following the Supreme Court decision in [Shelby County v. Holder], in which the court ruled that the preclearance provisions of the Voting Rights Act could not be enforced.',
          },
+        'playlist_mincount': 2,
          'skip': 'Regularly fails on travis, for unknown reasons',
      }, {
          'url': 'http://www.c-span.org/video/?c4486943/cspan-international-health-care-models',
-        'md5': '8e5fbfabe6ad0f89f3012a7943c1287b',
+        # md5 is unstable
          'info_dict': {
              'id': 'c4486943',
              'ext': 'mp4',
@@ -38,14 +38,11 @@ class CSpanIE(InfoExtractor):
          }
      }, {
          'url': 'http://www.c-span.org/video/?318608-1/gm-ignition-switch-recall',
-        'md5': '2ae5051559169baadba13fc35345ae74',
          'info_dict': {
              'id': '342759',
-            'ext': 'mp4',
              'title': 'General Motors Ignition Switch Recall',
-            'duration': 14848,
-            'description': 'md5:118081aedd24bf1d3b68b3803344e7f3'
          },
+        'playlist_mincount': 6,
      }, {
          # Video from senate.gov
          'url': 'http://www.c-span.org/video/?104517-1/immigration-reforms-needed-protect-skilled-american-workers',
@@ -57,12 +54,30 @@ class CSpanIE(InfoExtractor):
          'params': {
              'skip_download': True,  # m3u8 downloads
          }
+    }, {
+        # Ustream embedded video
+        'url': 'https://www.c-span.org/video/?114917-1/armed-services',
+        'info_dict': {
+            'id': '58428542',
+            'ext': 'flv',
+            'title': 'USHR07 Armed Services Committee',
+            'description': 'hsas00-2118-20150204-1000et-07\n\n\nUSHR07 Armed Services Committee',
+            'timestamp': 1423060374,
+            'upload_date': '20150204',
+            'uploader': 'HouseCommittee',
+            'uploader_id': '12987475',
+        },
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          video_type = None
          webpage = self._download_webpage(url, video_id)
+
+        ustream_url = UstreamIE._extract_url(webpage)
+        if ustream_url:
+            return self.url_result(ustream_url, UstreamIE.ie_key())
+
          # We first look for clipid, because clipprog always appears before
          patterns = [r'id=\'clip(%s)\'\s*value=\'([0-9]+)\'' % t for t in ('id', 'prog')]
          results = list(filter(None, (re.search(p, webpage) for p in patterns)))
diff --git a/youtube_dl/extractor/ctsnews.py b/youtube_dl/extractor/ctsnews.py

index 83ca90c3b68a66c8c612bd29cda89ae6d91f1478..d565335cf6c31a047b8882415afb4ea259578a06 100644
--- a/youtube_dl/extractor/ctsnews.py
+++ b/youtube_dl/extractor/ctsnews.py
@@ -28,7 +28,7 @@ class CtsNewsIE(InfoExtractor):
              'ext': 'mp4',
              'title': '韓國31歲童顏男 貌如十多歲小孩',
              'description': '越有年紀的人，越希望看起來年輕一點，而南韓卻有一位31歲的男子，看起來像是11、12歲的小孩，身...',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1378205880,
              'upload_date': '20130903',
          }
@@ -41,7 +41,7 @@ class CtsNewsIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'iPhone6熱銷 蘋果財報亮眼',
              'description': 'md5:f395d4f485487bb0f992ed2c4b07aa7d',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150128',
              'uploader_id': 'TBSCTS',
              'uploader': '中華電視公司',
diff --git a/youtube_dl/extractor/ctvnews.py b/youtube_dl/extractor/ctvnews.py

index 1023b61300b4d381a0f5019e2a3a04cbc77adc8a..55a127b7696e5d5dbb845709451c1b05b8df7211 100644 (file)
--- a/youtube_dl/extractor/ctvnews.py
+++ b/youtube_dl/extractor/ctvnews.py
@@ -8,7 +8,7 @@
  
  
  class CTVNewsIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
+    _VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
      _TESTS = [{
          'url': 'http://www.ctvnews.ca/video?clipId=901995',
          'md5': '10deb320dc0ccb8d01d34d12fc2ea672',
@@ -40,6 +40,9 @@ class CTVNewsIE(InfoExtractor):
      }, {
          'url': 'http://www.ctvnews.ca/canadiens-send-p-k-subban-to-nashville-in-blockbuster-trade-1.2967231',
          'only_matching': True,
+    }, {
+        'url': 'http://vancouverisland.ctvnews.ca/video?clipId=761241',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
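
Note on the ctvnews.py change above: the new `_VALID_URL` accepts any subdomain instead of only an optional `www.`. A minimal sketch of the difference, using only the standard `re` module with the two patterns and the test URLs copied from this diff. `match_id` is a hypothetical helper for illustration, not the extractor's own method.

```python
import re

# Old and new patterns, copied from the CTVNewsIE hunk above.
OLD_VALID_URL = r'https?://(?:www\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'
NEW_VALID_URL = r'https?://(?:.+?\.)?ctvnews\.ca/(?:video\?(?:clip|playlist|bin)Id=|.*?)(?P<id>[0-9.]+)'


def match_id(pattern, url):
    """Hypothetical helper: return the 'id' group, or None if the URL does not match."""
    m = re.match(pattern, url)
    return m.group('id') if m else None


regional = 'http://vancouverisland.ctvnews.ca/video?clipId=761241'
national = 'http://www.ctvnews.ca/video?clipId=901995'

old_regional = match_id(OLD_VALID_URL, regional)  # regional subdomain is rejected
new_regional = match_id(NEW_VALID_URL, regional)  # regional subdomain is accepted
new_national = match_id(NEW_VALID_URL, national)  # www URLs keep working
```
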
diff --git a/youtube_dl/extractor/cultureunplugged.py b/youtube_dl/extractor/cultureunplugged.py

index 9f26fa5878777d3302383646ad581056f429841a..bcdf27323edc795e75b91488c3989dfd5552d455 100644 (file)
--- a/youtube_dl/extractor/cultureunplugged.py
+++ b/youtube_dl/extractor/cultureunplugged.py
@@ -21,7 +21,7 @@ class CultureUnpluggedIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'The Next, Best West',
              'description': 'md5:0423cd00833dea1519cf014e9d0903b1',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'creator': 'Coldstream Creative',
              'duration': 2203,
              'view_count': int,
diff --git a/youtube_dl/extractor/dailymotion.py b/youtube_dl/extractor/dailymotion.py

index 4a3314ea7d4fc2df95543cda554d32a8caf586ac..31bf5faf6605553cdcd79f670285a554e711364f 100644 (file)
--- a/youtube_dl/extractor/dailymotion.py
+++ b/youtube_dl/extractor/dailymotion.py
@@ -58,7 +58,7 @@ class DailymotionIE(DailymotionBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Steam Machine Models, Pricing Listed on Steam Store - IGN News',
                  'description': 'Several come bundled with the Steam Controller.',
-                'thumbnail': 're:^https?:.*\.(?:jpg|png)$',
+                'thumbnail': r're:^https?:.*\.(?:jpg|png)$',
                  'duration': 74,
                  'timestamp': 1425657362,
                  'upload_date': '20150306',
diff --git a/youtube_dl/extractor/daum.py b/youtube_dl/extractor/daum.py

index 732b4362a96488e67f4b1858f83429a85e877555..76f0218923536b29550c9384ce8348baf05289d5 100644 (file)
--- a/youtube_dl/extractor/daum.py
+++ b/youtube_dl/extractor/daum.py
@@ -32,7 +32,7 @@ class DaumIE(InfoExtractor):
              'title': '마크 헌트 vs 안토니오 실바',
              'description': 'Mark Hunt vs Antonio Silva',
              'upload_date': '20131217',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 2117,
              'view_count': int,
              'comment_count': int,
@@ -45,7 +45,7 @@ class DaumIE(InfoExtractor):
              'title': '1297회, \'아빠 아들로 태어나길 잘 했어\' 민수, 감동의 눈물[아빠 어디가] 20150118',
              'description': 'md5:79794514261164ff27e36a21ad229fc5',
              'upload_date': '20150604',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 154,
              'view_count': int,
              'comment_count': int,
@@ -61,7 +61,7 @@ class DaumIE(InfoExtractor):
              'title': '01-Korean War ( Trouble on the horizon )',
              'description': '\nKorean War 01\nTrouble on the horizon\n전쟁의 먹구름',
              'upload_date': '20080223',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 249,
              'view_count': int,
              'comment_count': int,
@@ -139,7 +139,7 @@ class DaumClipIE(InfoExtractor):
              'title': 'DOTA 2GETHER 시즌2 6회 - 2부',
              'description': 'DOTA 2GETHER 시즌2 6회 - 2부',
              'upload_date': '20130831',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)',
              'duration': 3868,
              'view_count': int,
          },
diff --git a/youtube_dl/extractor/dbtv.py b/youtube_dl/extractor/dbtv.py

index 6d880d43d6507077018f9489749947d83a36f64b..f232f0dc536f612530e6ca7cfa0fde97e20b9467 100644 (file)
--- a/youtube_dl/extractor/dbtv.py
+++ b/youtube_dl/extractor/dbtv.py
@@ -17,7 +17,7 @@ class DBTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Skulle teste ut fornøyelsespark, men kollegaen var bare opptatt av bikinikroppen',
              'description': 'md5:1504a54606c4dde3e4e61fc97aa857e0',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'timestamp': 1404039863,
              'upload_date': '20140629',
              'duration': 69.544,
diff --git a/youtube_dl/extractor/dctp.py b/youtube_dl/extractor/dctp.py

index 14ba88715887caeb9144e68384417b2e7b518b07..00fbbff2fa35d2212d521a43e0e1b41b281d477b 100644 (file)
--- a/youtube_dl/extractor/dctp.py
+++ b/youtube_dl/extractor/dctp.py
@@ -17,7 +17,7 @@ class DctpTvIE(InfoExtractor):
              'title': 'Videoinstallation für eine Kaufhausfassade',
              'description': 'Kurzfilm',
              'upload_date': '20110407',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/deezer.py b/youtube_dl/extractor/deezer.py

index 7a07f3267db874649e5bcc5228a1c7881ebe19d3..ec87b94dbcc74ae60e05d1c6f43a6e4429cbb721 100644 (file)
--- a/youtube_dl/extractor/deezer.py
+++ b/youtube_dl/extractor/deezer.py
@@ -19,7 +19,7 @@ class DeezerPlaylistIE(InfoExtractor):
              'id': '176747451',
              'title': 'Best!',
              'uploader': 'Anonymous',
-            'thumbnail': 're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$',
+            'thumbnail': r're:^https?://cdn-images.deezer.com/images/cover/.*\.jpg$',
          },
          'playlist_count': 30,
          'skip': 'Only available in .de',
diff --git a/youtube_dl/extractor/dhm.py b/youtube_dl/extractor/dhm.py

index 44e0c5d4d7094cf965555431e39387a78bdb6f83..aee72a6ed1e2daac661887b2ed225e898635c71b 100644 (file)
--- a/youtube_dl/extractor/dhm.py
+++ b/youtube_dl/extractor/dhm.py
@@ -17,7 +17,7 @@ class DHMIE(InfoExtractor):
              'title': 'MARSHALL PLAN AT WORK IN WESTERN GERMANY, THE',
              'description': 'md5:1fabd480c153f97b07add61c44407c82',
              'duration': 660,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://www.dhm.de/filmarchiv/02-mapping-the-wall/peter-g/rolle-1/',
@@ -26,7 +26,7 @@ class DHMIE(InfoExtractor):
              'id': 'rolle-1',
              'ext': 'flv',
              'title': 'ROLLE 1',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }]
  
diff --git a/youtube_dl/extractor/digiteka.py b/youtube_dl/extractor/digiteka.py

index 7bb79ffda0bbeda00ea103e59ab19ab746196de3..3dfde0d8c772746821afea8b16d0f5d9d8dc1cfb 100644 (file)
--- a/youtube_dl/extractor/digiteka.py
+++ b/youtube_dl/extractor/digiteka.py
@@ -36,7 +36,7 @@ class DigitekaIE(InfoExtractor):
              'id': 's8uk0r',
              'ext': 'mp4',
              'title': 'Loi sur la fin de vie: le texte prévoit un renforcement des directives anticipées',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 74,
              'upload_date': '20150317',
              'timestamp': 1426604939,
@@ -50,7 +50,7 @@ class DigitekaIE(InfoExtractor):
              'id': 'xvpfp8',
              'ext': 'mp4',
              'title': 'Two - C\'est La Vie (clip)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 233,
              'upload_date': '20150224',
              'timestamp': 1424760500,
diff --git a/youtube_dl/extractor/discoverygo.py b/youtube_dl/extractor/discoverygo.py

index c4e83b2c3790670ec7d6c1b7c9cca4e47b4d7779..2042493a8c7836ecae4efd23005101cf805116a7 100644 (file)
--- a/youtube_dl/extractor/discoverygo.py
+++ b/youtube_dl/extractor/discoverygo.py
@@ -6,7 +6,6 @@
      extract_attributes,
      int_or_none,
      parse_age_limit,
-    unescapeHTML,
      ExtractorError,
  )
  
@@ -49,7 +48,7 @@ def _real_extract(self, url):
                  webpage, 'video container'))
  
          video = self._parse_json(
-            unescapeHTML(container.get('data-video') or container.get('data-json')),
+            container.get('data-video') or container.get('data-json'),
              display_id)
  
          title = video['name']
diff --git a/youtube_dl/extractor/disney.py b/youtube_dl/extractor/disney.py

new file mode 100644 (file)

index 0000000..396873c
--- /dev/null
+++ b/youtube_dl/extractor/disney.py
@@ -0,0 +1,115 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+    compat_str,
+    determine_ext,
+)
+
+
+class DisneyIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+        https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})'''
+    _TESTS = [{
+        'url': 'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977',
+        'info_dict': {
+            'id': '545ed1857afee5a0ec239977',
+            'ext': 'mp4',
+            'title': 'Moana - Trailer',
+            'description': 'A fun adventure for the entire Family!  Bring home Moana on Digital HD Feb 21 & Blu-ray March 7',
+            'upload_date': '20170112',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }, {
+        'url': 'http://videos.disneylatino.com/ver/spider-man-de-regreso-a-casa-primer-adelanto-543a33a1850bdcfcca13bae2',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.en.disneyme.com/watch/future-worm/robo-carp-2001-544b66002aa7353cdd3f5114',
+        'only_matching': True,
+    }, {
+        'url': 'http://video.disneyturkiye.com.tr/izle/7c-7-cuceler/kimin-sesi-zaten-5456f3d015f6b36c8afdd0e2',
+        'only_matching': True,
+    }, {
+        'url': 'http://disneyjunior.disney.com/embed/546a4798ddba3d1612e4005d',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        domain, video_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(
+            'http://%s/embed/%s' % (domain, video_id), video_id)
+        video_data = self._parse_json(self._search_regex(
+            r'Disney\.EmbedVideo=({.+});', webpage, 'embed data'), video_id)['video']
+
+        for external in video_data.get('externals', []):
+            if external.get('source') == 'vevo':
+                return self.url_result('vevo:' + external['data_id'], 'Vevo')
+
+        title = video_data['title']
+
+        formats = []
+        for flavor in video_data.get('flavors', []):
+            flavor_format = flavor.get('format')
+            flavor_url = flavor.get('url')
+            if not flavor_url or not re.match(r'https?://', flavor_url):
+                continue
+            tbr = int_or_none(flavor.get('bitrate'))
+            if tbr == 99999:
+                formats.extend(self._extract_m3u8_formats(
+                    flavor_url, video_id, 'mp4', m3u8_id=flavor_format, fatal=False))
+                continue
+            format_id = []
+            if flavor_format:
+                format_id.append(flavor_format)
+            if tbr:
+                format_id.append(compat_str(tbr))
+            ext = determine_ext(flavor_url)
+            if flavor_format == 'applehttp' or ext == 'm3u8':
+                ext = 'mp4'
+            width = int_or_none(flavor.get('width'))
+            height = int_or_none(flavor.get('height'))
+            formats.append({
+                'format_id': '-'.join(format_id),
+                'url': flavor_url,
+                'width': width,
+                'height': height,
+                'tbr': tbr,
+                'ext': ext,
+                'vcodec': 'none' if (width == 0 and height == 0) else None,
+            })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for caption in video_data.get('captions', []):
+            caption_url = caption.get('url')
+            caption_format = caption.get('format')
+            if not caption_url or not caption_format or caption_format.startswith('unknown'):
+                continue
+            subtitles.setdefault(caption.get('language', 'en'), []).append({
+                'url': caption_url,
+                'ext': {
+                    'webvtt': 'vtt',
+                }.get(caption_format, caption_format),
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description') or video_data.get('short_desc'),
+            'thumbnail': video_data.get('thumb') or video_data.get('thumb_secure'),
+            'duration': int_or_none(video_data.get('duration_sec')),
+            'upload_date': unified_strdate(video_data.get('publish_date')),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
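
Note on the new disney.py above: `_real_extract` splits the page URL into a domain and a 24-character video id, then always fetches the `/embed/` page on that same domain. A minimal sketch of that normalisation step, assuming Python 3 and only the standard `re` module. The pattern and URLs are copied from this diff; `embed_url_for` is a hypothetical name.

```python
import re

# Pattern copied from DisneyIE._VALID_URL above ((?x) ignores the layout whitespace).
VALID_URL = r'''(?x)
    https?://(?P<domain>(?:[^/]+\.)?(?:disney\.[a-z]{2,3}(?:\.[a-z]{2})?|disney(?:(?:me|latino)\.com|turkiye\.com\.tr)|starwars\.com))/(?:embed/|(?:[^/]+/)+[\w-]+-)(?P<id>[a-z0-9]{24})'''


def embed_url_for(url):
    """Hypothetical helper: map a watch or embed URL to (video_id, embed page URL)."""
    domain, video_id = re.match(VALID_URL, url).groups()
    return video_id, 'http://%s/embed/%s' % (domain, video_id)


watch = embed_url_for(
    'http://video.disney.com/watch/moana-trailer-545ed1857afee5a0ec239977')
embed = embed_url_for('http://www.starwars.com/embed/54690d1e6c42e5f09a0fb097')
```

Both URL shapes end up at the same kind of embed page, so the extractor only has to parse one page layout.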
diff --git a/youtube_dl/extractor/douyutv.py b/youtube_dl/extractor/douyutv.py

index e366e17e68139288543243667d637544488a6a23..91159441369121773a5b3a5b02b5ecc9e9ee01fd 100644 (file)
--- a/youtube_dl/extractor/douyutv.py
+++ b/youtube_dl/extractor/douyutv.py
@@ -18,7 +18,7 @@
  
  class DouyuTVIE(InfoExtractor):
      IE_DESC = '斗鱼'
-    _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?P<id>[A-Za-z0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?douyu(?:tv)?\.com/(?:[^/]+/)*(?P<id>[A-Za-z0-9]+)'
      _TESTS = [{
          'url': 'http://www.douyutv.com/iseven',
          'info_dict': {
@@ -26,8 +26,8 @@ class DouyuTVIE(InfoExtractor):
              'display_id': 'iseven',
              'ext': 'flv',
              'title': 're:^清晨醒脑！T-ara根本停不下来！ [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 're:.*m7show@163\.com.*',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': r're:.*m7show@163\.com.*',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': '7师傅',
              'is_live': True,
          },
@@ -42,7 +42,7 @@ class DouyuTVIE(InfoExtractor):
              'ext': 'flv',
              'title': 're:^小漠从零单排记！——CSOL2躲猫猫 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'description': 'md5:746a2f7a253966a06755a912f0acc0d2',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'douyu小漠',
              'is_live': True,
          },
@@ -57,8 +57,8 @@ class DouyuTVIE(InfoExtractor):
              'display_id': '17732',
              'ext': 'flv',
              'title': 're:^清晨醒脑！T-ara根本停不下来！ [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'description': 're:.*m7show@163\.com.*',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'description': r're:.*m7show@163\.com.*',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': '7师傅',
              'is_live': True,
          },
@@ -68,6 +68,10 @@ class DouyuTVIE(InfoExtractor):
      }, {
          'url': 'http://www.douyu.com/xiaocang',
          'only_matching': True,
+    }, {
+        # \"room_id\"
+        'url': 'http://www.douyu.com/t/lpl',
+        'only_matching': True,
      }]
  
      # Decompile core.swf in webpage by ffdec "Search SWFs in memory". core.swf
@@ -82,7 +86,7 @@ def _real_extract(self, url):
          else:
              page = self._download_webpage(url, video_id)
              room_id = self._html_search_regex(
-                r'"room_id"\s*:\s*(\d+),', page, 'room id')
+                r'"room_id\\?"\s*:\s*(\d+),', page, 'room id')
  
          room = self._download_json(
              'http://m.douyu.com/html5/live?roomId=%s' % room_id, video_id,
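
Note on the douyutv.py change above: the optional `\\?` in the room id pattern lets it match both a plain JSON key and one whose closing quote is backslash-escaped inside a JavaScript string, as hinted by the `\"room_id\"` comment in the new test. A minimal sketch with the standard `re` module. The two page snippets and their room id values are invented for illustration.

```python
import re

# Pattern copied from the updated _html_search_regex call above.
ROOM_ID_RE = r'"room_id\\?"\s*:\s*(\d+),'

# Invented page fragments: plain JSON, and JSON embedded in a JS string literal.
plain_page = '{"room_id":17732,"room_name":"example"}'
escaped_page = r'var $ROOM = "{\"room_id\":606118,\"room_name\":\"example\"}";'


def find_room_id(page):
    """Return the first room id found in the page text, or None."""
    m = re.search(ROOM_ID_RE, page)
    return m.group(1) if m else None


plain_id = find_room_id(plain_page)
escaped_id = find_room_id(escaped_page)
```
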
diff --git a/youtube_dl/extractor/dplay.py b/youtube_dl/extractor/dplay.py

index 5790553f38ca29107bad44317fedb271dce0883a..32028bc3b79b61d249ad4bccaebadf745f9f942a 100644 (file)
--- a/youtube_dl/extractor/dplay.py
+++ b/youtube_dl/extractor/dplay.py
@@ -8,6 +8,7 @@
  from .common import InfoExtractor
  from ..compat import compat_urlparse
  from ..utils import (
+    USER_AGENTS,
      int_or_none,
      update_url_query,
  )
@@ -102,10 +103,16 @@ def extract_formats(protocol, manifest_url):
                      manifest_url, video_id, ext='mp4',
                      entry_protocol='m3u8_native', m3u8_id=protocol, fatal=False)
                  # Sometimes final URLs inside m3u8 are unsigned, let's fix this
-                # ourselves
+                # ourselves. Also fragments' URLs are only served signed for
+                # Safari user agent.
                  query = compat_urlparse.parse_qs(compat_urlparse.urlparse(manifest_url).query)
                  for m3u8_format in m3u8_formats:
-                    m3u8_format['url'] = update_url_query(m3u8_format['url'], query)
+                    m3u8_format.update({
+                        'url': update_url_query(m3u8_format['url'], query),
+                        'http_headers': {
+                            'User-Agent': USER_AGENTS['Safari'],
+                        },
+                    })
                  formats.extend(m3u8_formats)
              elif protocol == 'hds':
                  formats.extend(self._extract_f4m_formats(
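
Note on the dplay.py change above: each m3u8 variant now inherits the signed query string of the manifest URL and carries a per-format `User-Agent` header. A rough sketch of that step, assuming Python 3. `update_url_query` here is a simplified stand-in written with `urllib.parse`, not youtube-dl's own helper, and the URLs, the `token` parameter and the user agent string are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def update_url_query(url, query):
    """Simplified stand-in: merge `query` (name -> list of values) into `url`."""
    parsed = urlparse(url)
    merged = parse_qs(parsed.query)
    merged.update(query)
    return urlunparse(parsed._replace(query=urlencode(merged, doseq=True)))


def sign_variants(manifest_url, variants, user_agent):
    """Copy the manifest's query string onto every variant and attach headers."""
    query = parse_qs(urlparse(manifest_url).query)
    for variant in variants:
        variant.update({
            'url': update_url_query(variant['url'], query),
            'http_headers': {'User-Agent': user_agent},
        })
    return variants


signed = sign_variants(
    'http://example.invalid/master.m3u8?token=abc',   # placeholder manifest URL
    [{'url': 'http://example.invalid/v1/index.m3u8'}],
    'PlaceholderSafariUA/1.0')                        # real value: USER_AGENTS['Safari']
```
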
diff --git a/youtube_dl/extractor/dramafever.py b/youtube_dl/extractor/dramafever.py

index c115956121a242920ec8016e8c9f3558c34060c6..bcd9fe2a039550d36af3f1a63cb3cf8cc583cb2a 100644 (file)
--- a/youtube_dl/extractor/dramafever.py
+++ b/youtube_dl/extractor/dramafever.py
@@ -66,7 +66,7 @@ def _login(self):
  
  class DramaFeverIE(DramaFeverBaseIE):
      IE_NAME = 'dramafever'
-    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
+    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+/[0-9]+)(?:/|$)'
      _TESTS = [{
          'url': 'http://www.dramafever.com/drama/4512/1/Cooking_with_Shin/',
          'info_dict': {
@@ -76,7 +76,7 @@ class DramaFeverIE(DramaFeverBaseIE):
              'description': 'md5:a8eec7942e1664a6896fcd5e1287bfd0',
              'episode': 'Episode 1',
              'episode_number': 1,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1404336058,
              'upload_date': '20140702',
              'duration': 343,
@@ -94,7 +94,7 @@ class DramaFeverIE(DramaFeverBaseIE):
              'description': 'md5:3ff2ee8fedaef86e076791c909cf2e91',
              'episode': 'Mnet Asian Music Awards 2015 - Part 3',
              'episode_number': 4,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1450213200,
              'upload_date': '20151215',
              'duration': 5602,
@@ -103,6 +103,9 @@ class DramaFeverIE(DramaFeverBaseIE):
              # m3u8 download
              'skip_download': True,
          },
+    }, {
+        'url': 'https://www.dramafever.com/zh-cn/drama/4972/15/Doctor_Romantic/',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -148,7 +151,7 @@ def _real_extract(self, url):
  
  class DramaFeverSeriesIE(DramaFeverBaseIE):
      IE_NAME = 'dramafever:series'
-    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
+    _VALID_URL = r'https?://(?:www\.)?dramafever\.com/(?:[^/]+/)?drama/(?P<id>[0-9]+)(?:/(?:(?!\d+(?:/|$)).+)?)?$'
      _TESTS = [{
          'url': 'http://www.dramafever.com/drama/4512/Cooking_with_Shin/',
          'info_dict': {
diff --git a/youtube_dl/extractor/drbonanza.py b/youtube_dl/extractor/drbonanza.py

index 01271f8f06ff91b22680314d644485fe94434391..79ec212c890471bd72a0e88eaf7bec0da70af124 100644 (file)
--- a/youtube_dl/extractor/drbonanza.py
+++ b/youtube_dl/extractor/drbonanza.py
@@ -20,7 +20,7 @@ class DRBonanzaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Talkshowet - Leonard Cohen',
              'description': 'md5:8f34194fb30cd8c8a30ad8b27b70c0ca',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
              'timestamp': 1295537932,
              'upload_date': '20110120',
              'duration': 3664,
@@ -36,7 +36,7 @@ class DRBonanzaIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'EM fodbold 1992 Danmark - Tyskland finale Transmission',
              'description': 'md5:501e5a195749480552e214fbbed16c4e',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
              'timestamp': 1223274900,
              'upload_date': '20081006',
              'duration': 7369,
diff --git a/youtube_dl/extractor/dreisat.py b/youtube_dl/extractor/dreisat.py

index 908c9e514c41ea72bac0e6f6ede41def4ba0b20b..f138025d5564b27bef7d09c2d74d7aefffd8cfdc 100644 (file)
--- a/youtube_dl/extractor/dreisat.py
+++ b/youtube_dl/extractor/dreisat.py
@@ -2,10 +2,19 @@
  
  import re
  
-from .zdf import ZDFIE
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    unified_strdate,
+    xpath_text,
+    determine_ext,
+    qualities,
+    float_or_none,
+    ExtractorError,
+)
  
  
-class DreiSatIE(ZDFIE):
+class DreiSatIE(InfoExtractor):
      IE_NAME = '3sat'
      _VALID_URL = r'(?:https?://)?(?:www\.)?3sat\.de/mediathek/(?:index\.php|mediathek\.php)?\?(?:(?:mode|display)=[^&]+&)*obj=(?P<id>[0-9]+)$'
      _TESTS = [
@@ -31,6 +40,163 @@ class DreiSatIE(ZDFIE):
          },
      ]
  
+    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
+        param_groups = {}
+        for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
+            group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
+            params = {}
+            for param in param_group:
+                params[param.get('name')] = param.get('value')
+            param_groups[group_id] = params
+
+        formats = []
+        for video in smil.findall(self._xpath_ns('.//video', namespace)):
+            src = video.get('src')
+            if not src:
+                continue
+            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
+            group_id = video.get('paramGroup')
+            param_group = param_groups[group_id]
+            for proto in param_group['protocols'].split(','):
+                formats.append({
+                    'url': '%s://%s' % (proto, param_group['host']),
+                    'app': param_group['app'],
+                    'play_path': src,
+                    'ext': 'flv',
+                    'format_id': '%s-%d' % (proto, bitrate),
+                    'tbr': bitrate,
+                })
+        self._sort_formats(formats)
+        return formats
+
+    def extract_from_xml_url(self, video_id, xml_url):
+        doc = self._download_xml(
+            xml_url, video_id,
+            note='Downloading video info',
+            errnote='Failed to download video info')
+
+        status_code = doc.find('./status/statuscode')
+        if status_code is not None and status_code.text != 'ok':
+            code = status_code.text
+            if code == 'notVisibleAnymore':
+                message = 'Video %s is not available' % video_id
+            else:
+                message = '%s returned error: %s' % (self.IE_NAME, code)
+            raise ExtractorError(message, expected=True)
+
+        title = doc.find('.//information/title').text
+        description = xpath_text(doc, './/information/detail', 'description')
+        duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
+        uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
+        uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
+        upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
+
+        def xml_to_thumbnails(fnode):
+            thumbnails = []
+            for node in fnode:
+                thumbnail_url = node.text
+                if not thumbnail_url:
+                    continue
+                thumbnail = {
+                    'url': thumbnail_url,
+                }
+                if 'key' in node.attrib:
+                    m = re.match(r'^([0-9]+)x([0-9]+)$', node.attrib['key'])
+                    if m:
+                        thumbnail['width'] = int(m.group(1))
+                        thumbnail['height'] = int(m.group(2))
+                thumbnails.append(thumbnail)
+            return thumbnails
+
+        thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
+
+        format_nodes = doc.findall('.//formitaeten/formitaet')
+        quality = qualities(['veryhigh', 'high', 'med', 'low'])
+
+        def get_quality(elem):
+            return quality(xpath_text(elem, 'quality'))
+        format_nodes.sort(key=get_quality)
+        format_ids = []
+        formats = []
+        for fnode in format_nodes:
+            video_url = fnode.find('url').text
+            is_available = 'http://www.metafilegenerator' not in video_url
+            if not is_available:
+                continue
+            format_id = fnode.attrib['basetype']
+            quality = xpath_text(fnode, './quality', 'quality')
+            format_m = re.match(r'''(?x)
+                (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
+                (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
+            ''', format_id)
+
+            ext = determine_ext(video_url, None) or format_m.group('container')
+            if ext not in ('smil', 'f4m', 'm3u8'):
+                format_id = format_id + '-' + quality
+            if format_id in format_ids:
+                continue
+
+            if ext == 'meta':
+                continue
+            elif ext == 'smil':
+                formats.extend(self._extract_smil_formats(
+                    video_url, video_id, fatal=False))
+            elif ext == 'm3u8':
+                # the certificates are misconfigured (see
+                # https://github.com/rg3/youtube-dl/issues/8665)
+                if video_url.startswith('https://'):
+                    continue
+                formats.extend(self._extract_m3u8_formats(
+                    video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
+            elif ext == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    video_url, video_id, f4m_id=format_id, fatal=False))
+            else:
+                proto = format_m.group('proto').lower()
+
+                abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
+                vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
+
+                width = int_or_none(xpath_text(fnode, './width', 'width'))
+                height = int_or_none(xpath_text(fnode, './height', 'height'))
+
+                filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
+
+                format_note = ''
+                if not format_note:
+                    format_note = None
+
+                formats.append({
+                    'format_id': format_id,
+                    'url': video_url,
+                    'ext': ext,
+                    'acodec': format_m.group('acodec'),
+                    'vcodec': format_m.group('vcodec'),
+                    'abr': abr,
+                    'vbr': vbr,
+                    'width': width,
+                    'height': height,
+                    'filesize': filesize,
+                    'format_note': format_note,
+                    'protocol': proto,
+                    '_available': is_available,
+                })
+            format_ids.append(format_id)
+
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'thumbnails': thumbnails,
+            'uploader': uploader,
+            'uploader_id': uploader_id,
+            'upload_date': upload_date,
+            'formats': formats,
+        }
+
      def _real_extract(self, url):
          mobj = re.match(self._VALID_URL, url)
          video_id = mobj.group('id')
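
Note on the dreisat.py change above: `_parse_smil_formats` first collects each `paramGroup` (keyed by its `xml:id`) and then emits one RTMP format per protocol listed in the group a `<video>` element refers to. A minimal standalone sketch of that logic with `xml.etree.ElementTree`, assuming Python 3. The sample SMIL document is invented, and the sketch assumes every video carries a `system-bitrate` attribute.

```python
import xml.etree.ElementTree as ET

XML_NS = 'http://www.w3.org/XML/1998/namespace'

# Invented sample document with one parameter group and one video.
SAMPLE_SMIL = '''<smil>
  <head>
    <paramGroup xml:id="group1">
      <param name="host" value="media.example.invalid"/>
      <param name="app" value="ondemand"/>
      <param name="protocols" value="rtmp,rtmpt"/>
    </paramGroup>
  </head>
  <body>
    <video src="mp4:clip_1500k.mp4" system-bitrate="1500000" paramGroup="group1"/>
  </body>
</smil>'''


def parse_smil_formats(smil_text):
    smil = ET.fromstring(smil_text)
    # Map each group's xml:id to its name/value parameters.
    param_groups = {}
    for group in smil.findall('./head/paramGroup'):
        group_id = group.attrib.get('{%s}id' % XML_NS)
        param_groups[group_id] = {p.get('name'): p.get('value') for p in group}

    formats = []
    for video in smil.findall('.//video'):
        src = video.get('src')
        if not src:
            continue
        bitrate = float(video.get('system-bitrate')) / 1000  # bit/s -> kbit/s
        params = param_groups[video.get('paramGroup')]
        # One format per protocol the group advertises.
        for proto in params['protocols'].split(','):
            formats.append({
                'url': '%s://%s' % (proto, params['host']),
                'app': params['app'],
                'play_path': src,
                'ext': 'flv',
                'format_id': '%s-%d' % (proto, bitrate),
                'tbr': bitrate,
            })
    return formats


formats = parse_smil_formats(SAMPLE_SMIL)
```
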
diff --git a/youtube_dl/extractor/drtuber.py b/youtube_dl/extractor/drtuber.py

index 22da8e48105e5e8ee81a9cc948c67f6ec7d72eb8..1eca82b3b46ae47e511b0f2f3f8bd6bb505cdc23 100644 (file)
--- a/youtube_dl/extractor/drtuber.py
+++ b/youtube_dl/extractor/drtuber.py
@@ -22,7 +22,7 @@ class DrTuberIE(InfoExtractor):
              'like_count': int,
              'comment_count': int,
              'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'],
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }, {
diff --git a/youtube_dl/extractor/dumpert.py b/youtube_dl/extractor/dumpert.py

index e5aadcd25ccccb6f9838d0bd1417edc2fbe3bd0f..c9fc9b5a9df65cd8681ce8e0933473ea9658202d 100644 (file)
--- a/youtube_dl/extractor/dumpert.py
+++ b/youtube_dl/extractor/dumpert.py
@@ -21,7 +21,7 @@ class DumpertIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ik heb nieuws voor je',
              'description': 'Niet schrikken hoor',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://www.dumpert.nl/embed/6675421/dc440fe7/',
diff --git a/youtube_dl/extractor/eagleplatform.py b/youtube_dl/extractor/eagleplatform.py

index c2f593eca201a42f7023cc64d4237b5052fbc722..76d39adac5faa9912f42d271def60a46128be3f7 100644 (file)
--- a/youtube_dl/extractor/eagleplatform.py
+++ b/youtube_dl/extractor/eagleplatform.py
@@ -31,7 +31,7 @@ class EaglePlatformIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Навальный вышел на свободу',
              'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 87,
              'view_count': int,
              'age_limit': 0,
@@ -45,7 +45,7 @@ class EaglePlatformIE(InfoExtractor):
              'id': '12820',
              'ext': 'mp4',
              'title': "'O Sole Mio",
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 216,
              'view_count': int,
          },
diff --git a/youtube_dl/extractor/egghead.py b/youtube_dl/extractor/egghead.py

new file mode 100644 (file)

index 0000000..db92146
--- /dev/null
+++ b/youtube_dl/extractor/egghead.py
@@ -0,0 +1,39 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+
+
+class EggheadCourseIE(InfoExtractor):
+    IE_DESC = 'egghead.io course'
+    IE_NAME = 'egghead:course'
+    _VALID_URL = r'https://egghead\.io/courses/(?P<id>[a-zA-Z_0-9-]+)'
+    _TEST = {
+        'url': 'https://egghead.io/courses/professor-frisby-introduces-composable-functional-javascript',
+        'playlist_count': 29,
+        'info_dict': {
+            'id': 'professor-frisby-introduces-composable-functional-javascript',
+            'title': 'Professor Frisby Introduces Composable Functional JavaScript',
+            'description': 're:(?s)^This course teaches the ubiquitous.*You\'ll start composing functionality before you know it.$',
+        },
+    }
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        webpage = self._download_webpage(url, playlist_id)
+
+        title = self._html_search_regex(r'<h1 class="title">([^<]+)</h1>', webpage, 'title')
+        ul = self._search_regex(r'(?s)<ul class="series-lessons-list">(.*?)</ul>', webpage, 'session list')
+
+        found = re.findall(r'(?s)<a class="[^"]*"\s*href="([^"]+)">\s*<li class="item', ul)
+        entries = [self.url_result(m) for m in found]
+
+        return {
+            '_type': 'playlist',
+            'id': playlist_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'entries': entries,
+        }
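The two regular expressions used by EggheadCourseIE can be tried outside the extractor. This is a rough sketch only: the HTML in `SAMPLE_PAGE` is invented for illustration and may not reflect the real egghead.io markup, and `extract_lesson_urls` is a hypothetical standalone function, not part of the commit.

```python
import re

# Invented markup, shaped only to satisfy the two patterns from the diff.
SAMPLE_PAGE = '''
<h1 class="title">Sample Course</h1>
<ul class="series-lessons-list">
  <a class="lesson" href="https://egghead.io/lessons/first">
    <li class="item">First</li></a>
  <a class="lesson" href="https://egghead.io/lessons/second">
    <li class="item">Second</li></a>
</ul>
'''


def extract_lesson_urls(webpage):
    """Return lesson URLs found inside the lessons list, or [] if the list is missing."""
    ul = re.search(
        r'(?s)<ul class="series-lessons-list">(.*?)</ul>', webpage)
    if ul is None:
        return []
    return re.findall(
        r'(?s)<a class="[^"]*"\s*href="([^"]+)">\s*<li class="item',
        ul.group(1))


print(extract_lesson_urls(SAMPLE_PAGE))
```

In the extractor, each URL found this way is wrapped with `self.url_result(...)` to form the playlist entries.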
diff --git a/youtube_dl/extractor/einthusan.py b/youtube_dl/extractor/einthusan.py

index 443865ad27ba96eea8f78c56d14b72a54bc86389..6ca07a13d736b3909269aa1314d6e868150f8aa0 100644
--- a/youtube_dl/extractor/einthusan.py
+++ b/youtube_dl/extractor/einthusan.py
@@ -19,7 +19,7 @@ class EinthusanIE(InfoExtractor):
                  'id': '2447',
                  'ext': 'mp4',
                  'title': 'Ek Villain',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'description': 'md5:9d29fc91a7abadd4591fb862fa560d93',
              }
          },
@@ -30,7 +30,7 @@ class EinthusanIE(InfoExtractor):
                  'id': '1671',
                  'ext': 'mp4',
                  'title': 'Soodhu Kavvuum',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'description': 'md5:b40f2bf7320b4f9414f3780817b2af8c',
              }
          },
diff --git a/youtube_dl/extractor/elpais.py b/youtube_dl/extractor/elpais.py

index 8c725a4e631860584781b116e72b02dd05813fc2..99e00cf3c68ea93fc00d5301e1e6be5567a72bff 100644
--- a/youtube_dl/extractor/elpais.py
+++ b/youtube_dl/extractor/elpais.py
@@ -2,7 +2,7 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import unified_strdate
+from ..utils import strip_jsonp, unified_strdate
  
  
  class ElPaisIE(InfoExtractor):
@@ -29,6 +29,16 @@ class ElPaisIE(InfoExtractor):
              'description': 'Que sí, que las cápsulas son cómodas. Pero si le pides algo más a la vida, quizá deberías aprender a usar bien la cafetera italiana. No tienes más que ver este vídeo y seguir sus siete normas básicas.',
              'upload_date': '20160303',
          }
+    }, {
+        'url': 'http://elpais.com/elpais/2017/01/26/ciencia/1485456786_417876.html',
+        'md5': '9c79923a118a067e1a45789e1e0b0f9c',
+        'info_dict': {
+            'id': '1485456786_417876',
+            'ext': 'mp4',
+            'title': 'Hallado un barco de la antigua Roma que naufragó en Baleares hace 1.800 años',
+            'description': 'La nave portaba cientos de ánforas y se hundió cerca de la isla de Cabrera por razones desconocidas',
+            'upload_date': '20170127',
+        },
      }]
  
      def _real_extract(self, url):
@@ -37,8 +47,15 @@ def _real_extract(self, url):
  
          prefix = self._html_search_regex(
              r'var\s+url_cache\s*=\s*"([^"]+)";', webpage, 'URL prefix')
-        video_suffix = self._search_regex(
-            r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
+        id_multimedia = self._search_regex(
+            r"id_multimedia\s*=\s*'([^']+)'", webpage, 'ID multimedia', default=None)
+        if id_multimedia:
+            url_info = self._download_json(
+                'http://elpais.com/vdpep/1/?pepid=' + id_multimedia, video_id, transform_source=strip_jsonp)
+            video_suffix = url_info['mp4']
+        else:
+            video_suffix = self._search_regex(
+                r"(?:URLMediaFile|urlVideo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'", webpage, 'video URL')
          video_url = prefix + video_suffix
          thumbnail_suffix = self._search_regex(
              r"(?:URLMediaStill|urlFotogramaFijo_\d+)\s*=\s*url_cache\s*\+\s*'([^']+)'",
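The ElPaisIE change above prefers a JSONP lookup keyed by `id_multimedia` and falls back to the older inline-variable regex. Below is a hedged sketch of that control flow with no network access. `strip_jsonp_sketch` is a simplified stand-in written for this note, not youtube-dl's `strip_jsonp`, and the sample payload shape (a single `mp4` key) is assumed from the `url_info['mp4']` access in the diff.

```python
import json
import re


def strip_jsonp_sketch(code):
    """Simplified stand-in: drop a 'callback(' prefix and ')' / ');' suffix."""
    return re.sub(r'(?s)^\s*[\w.$]+\s*\((.*)\)\s*;?\s*$', r'\1', code)


def pick_video_suffix(jsonp_body, fallback_suffix):
    """Prefer the 'mp4' entry of the JSONP payload; otherwise use the fallback."""
    if jsonp_body:
        return json.loads(strip_jsonp_sketch(jsonp_body))['mp4']
    return fallback_suffix


# Assumed payload shape, for illustration only.
sample_body = 'cb({"mp4": "/videos/sample.mp4"});'
print(pick_video_suffix(sample_body, '/videos/legacy.mp4'))
print(pick_video_suffix(None, '/videos/legacy.mp4'))
```

In the extractor itself, the fallback branch is the original `_search_regex` call over the page source rather than a precomputed string.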
diff --git a/youtube_dl/extractor/eroprofile.py b/youtube_dl/extractor/eroprofile.py

index 297f8a6f5fa4371415554bfe6c44d0745c262491..c08643a17cb99a92dd508201ad5c1ca69fd863ad 100644
--- a/youtube_dl/extractor/eroprofile.py
+++ b/youtube_dl/extractor/eroprofile.py
@@ -22,7 +22,7 @@ class EroProfileIE(InfoExtractor):
              'display_id': 'sexy-babe-softcore',
              'ext': 'm4v',
              'title': 'sexy babe softcore',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'age_limit': 18,
          }
      }, {
@@ -32,7 +32,7 @@ class EroProfileIE(InfoExtractor):
              'id': '1133519',
              'ext': 'm4v',
              'title': 'Try It On Pee_cut_2.wmv - 4shared.com - file sharing - download movie file',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'age_limit': 18,
          },
          'skip': 'Requires login',
diff --git a/youtube_dl/extractor/escapist.py b/youtube_dl/extractor/escapist.py

index a3d7bbbcb3f45a4c098397d0622fc59324412fcc..4d8a3c13467b8478b6c2a4a91bae8679a778e062 100644
--- a/youtube_dl/extractor/escapist.py
+++ b/youtube_dl/extractor/escapist.py
@@ -45,7 +45,7 @@ class EscapistIE(InfoExtractor):
              'ext': 'mp4',
              'description': "Baldur's Gate: Original, Modded or Enhanced Edition? I'll break down what you can expect from the new Baldur's Gate: Enhanced Edition.",
              'title': "Breaking Down Baldur's Gate",
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 264,
              'uploader': 'The Escapist',
          }
@@ -57,7 +57,7 @@ class EscapistIE(InfoExtractor):
              'ext': 'mp4',
              'description': 'This week, Zero Punctuation reviews Evolve.',
              'title': 'Evolve - One vs Multiplayer',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 304,
              'uploader': 'The Escapist',
          }
diff --git a/youtube_dl/extractor/esri.py b/youtube_dl/extractor/esri.py

index d4205d7fbde331e3bb9fc94275da143575ebd454..e9dcaeb1dd165f86f8eac0a78a9147fa62ada1ab 100644
--- a/youtube_dl/extractor/esri.py
+++ b/youtube_dl/extractor/esri.py
@@ -22,7 +22,7 @@ class EsriVideoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'ArcGIS Online - Developing Applications',
              'description': 'Jeremy Bartley demonstrates how to develop applications with ArcGIS Online.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 185,
              'upload_date': '20120419',
          }
diff --git a/youtube_dl/extractor/europa.py b/youtube_dl/extractor/europa.py

index adc43919e72aa48fa052db641e5412c6dae9b999..1efc0b2ec04bc874fee5744803e4549dc9058cd1 100644
--- a/youtube_dl/extractor/europa.py
+++ b/youtube_dl/extractor/europa.py
@@ -23,7 +23,7 @@ class EuropaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'TRADE - Wikileaks on TTIP',
              'description': 'NEW  LIVE EC Midday press briefing of 11/08/2015',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150811',
              'duration': 34,
              'view_count': int,
diff --git a/youtube_dl/extractor/expotv.py b/youtube_dl/extractor/expotv.py

index ef11962f35035617a589e91cde5db43659099f66..95a8977821d3c292470e42f0f9170674ed9a6aa2 100644
--- a/youtube_dl/extractor/expotv.py
+++ b/youtube_dl/extractor/expotv.py
@@ -17,7 +17,7 @@ class ExpoTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'NYX Butter Lipstick Little Susie',
              'description': 'Goes on like butter, but looks better!',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Stephanie S.',
              'upload_date': '20150520',
              'view_count': int,
diff --git a/youtube_dl/extractor/extractors.py b/youtube_dl/extractor/extractors.py

index e4ee43ee3251284b0fc69cee5f4330634e85fb13..aa235bec1aeef1090c30e7a7b4f4af3cdfece6c2 100644
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -30,7 +30,10 @@
      AENetworksIE,
      HistoryTopicIE,
  )
-from .afreecatv import AfreecaTVIE
+from .afreecatv import (
+    AfreecaTVIE,
+    AfreecaTVGlobalIE,
+)
  from .airmozilla import AirMozillaIE
  from .aljazeera import AlJazeeraIE
  from .alphaporno import AlphaPornoIE
@@ -38,10 +41,7 @@
  from .animeondemand import AnimeOnDemandIE
  from .anitube import AnitubeIE
  from .anysex import AnySexIE
-from .aol import (
-    AolIE,
-    AolFeaturesIE,
-)
+from .aol import AolIE
  from .allocine import AllocineIE
  from .aparat import AparatIE
  from .appleconnect import AppleConnectIE
@@ -80,6 +80,10 @@
      AWAANLiveIE,
      AWAANSeasonIE,
  )
+from .azmedien import (
+    AZMedienIE,
+    AZMedienPlaylistIE,
+)
  from .azubu import AzubuIE, AzubuLiveIE
  from .baidu import BaiduVideoIE
  from .bambuser import BambuserIE, BambuserChannelIE
@@ -91,6 +95,7 @@
      BBCCoUkPlaylistIE,
      BBCIE,
  )
+from .beampro import BeamProLiveIE
  from .beeg import BeegIE
  from .behindkink import BehindKinkIE
  from .bellmedia import BellMediaIE
@@ -98,7 +103,10 @@
  from .bet import BetIE
  from .bigflix import BigflixIE
  from .bild import BildIE
-from .bilibili import BiliBiliIE
+from .bilibili import (
+    BiliBiliIE,
+    BiliBiliBangumiIE,
+)
  from .biobiochiletv import BioBioChileTVIE
  from .biqle import BIQLEIE
  from .bleacherreport import (
@@ -150,6 +158,7 @@
  )
  from .cbssports import CBSSportsIE
  from .ccc import CCCIE
+from .ccma import CCMAIE
  from .cctv import CCTVIE
  from .cda import CDAIE
  from .ceskatelevize import CeskaTelevizeIE
@@ -180,6 +189,7 @@
  from .coub import CoubIE
  from .collegerama import CollegeRamaIE
  from .comedycentral import (
+    ComedyCentralFullEpisodesIE,
      ComedyCentralIE,
      ComedyCentralShortnameIE,
      ComedyCentralTVIE,
@@ -244,6 +254,7 @@
  from .defense import DefenseGouvFrIE
  from .discovery import DiscoveryIE
  from .discoverygo import DiscoveryGoIE
+from .disney import DisneyIE
  from .dispeak import DigitallySpeakingIE
  from .dropbox import DropboxIE
  from .dw import (
@@ -253,6 +264,7 @@
  from .eagleplatform import EaglePlatformIE
  from .ebaumsworld import EbaumsWorldIE
  from .echomsk import EchoMskIE
+from .egghead import EggheadCourseIE
  from .ehow import EHowIE
  from .eighttracks import EightTracksIE
  from .einthusan import EinthusanIE
@@ -322,7 +334,6 @@
  )
  from .freesound import FreesoundIE
  from .freespeech import FreespeechIE
-from .freevideo import FreeVideoIE
  from .funimation import FunimationIE
  from .funnyordie import FunnyOrDieIE
  from .fusion import FusionIE
@@ -372,6 +383,7 @@
  )
  from .historicfilms import HistoricFilmsIE
  from .hitbox import HitboxIE, HitboxLiveIE
+from .hitrecord import HitRecordIE
  from .hornbunny import HornBunnyIE
  from .hotnewhiphop import HotNewHipHopIE
  from .hotstar import HotStarIE
@@ -399,6 +411,7 @@
      ImgurAlbumIE,
  )
  from .ina import InaIE
+from .inc import IncIE
  from .indavideo import (
      IndavideoIE,
      IndavideoEmbedIE,
@@ -409,6 +422,7 @@
  from .iprima import IPrimaIE
  from .iqiyi import IqiyiIE
  from .ir90tv import Ir90TvIE
+from .itv import ITVIE
  from .ivi import (
      IviIE,
      IviCompilationIE
@@ -449,7 +463,10 @@
      KuwoMvIE,
  )
  from .la7 import LA7IE
-from .laola1tv import Laola1TvIE
+from .laola1tv import (
+    Laola1TvEmbedIE,
+    Laola1TvIE,
+)
  from .lci import LCIIE
  from .lcp import (
      LcpPlayIE,
@@ -501,6 +518,8 @@
  )
  from .matchtv import MatchTVIE
  from .mdr import MDRIE
+from .meipai import MeipaiIE
+from .melonvod import MelonVODIE
  from .meta import METAIE
  from .metacafe import MetacafeIE
  from .metacritic import MetacriticIE
@@ -542,6 +561,7 @@
      MTVVideoIE,
      MTVServicesEmbeddedIE,
      MTVDEIE,
+    MTV81IE,
  )
  from .muenchentv import MuenchenTVIE
  from .musicplayon import MusicPlayOnIE
@@ -591,6 +611,7 @@
      NextMediaIE,
      NextMediaActionNewsIE,
      AppleDailyIE,
+    NextTVIE,
  )
  from .nfb import NFBIE
  from .nfl import NFLIE
@@ -652,6 +673,9 @@
      NRKPlaylistIE,
      NRKSkoleIE,
      NRKTVIE,
+    NRKTVDirekteIE,
+    NRKTVEpisodesIE,
+    NRKTVSeriesIE,
  )
  from .ntvde import NTVDeIE
  from .ntvru import NTVRuIE
@@ -664,6 +688,7 @@
  from .odatv import OdaTVIE
  from .odnoklassniki import OdnoklassnikiIE
  from .oktoberfesttv import OktoberfestTVIE
+from .ondemandkorea import OnDemandKoreaIE
  from .onet import (
      OnetIE,
      OnetChannelIE,
@@ -694,6 +719,7 @@
  from .philharmoniedeparis import PhilharmonieDeParisIE
  from .phoenix import PhoenixIE
  from .photobucket import PhotobucketIE
+from .piksel import PikselIE
  from .pinkbike import PinkbikeIE
  from .pladform import PladformIE
  from .playfm import PlayFMIE
@@ -713,6 +739,7 @@
  )
  from .porn91 import Porn91IE
  from .porncom import PornComIE
+from .pornflip import PornFlipIE
  from .pornhd import PornHdIE
  from .pornhub import (
      PornHubIE,
@@ -807,8 +834,6 @@
  from .scivee import SciVeeIE
  from .screencast import ScreencastIE
  from .screencastomatic import ScreencastOMaticIE
-from .screenjunkies import ScreenJunkiesIE
-from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
  from .seeker import SeekerIE
  from .senateisvp import SenateISVPIE
  from .sendtonews import SendtoNewsIE
@@ -819,7 +844,7 @@
      SharedIE,
      VivoIE,
  )
-from .sharesix import ShareSixIE
+from .showroomlive import ShowRoomLiveIE
  from .sina import SinaIE
  from .sixplay import SixPlayIE
  from .skynewsarabia import (
@@ -901,6 +926,7 @@
  )
  from .teachingchannel import TeachingChannelIE
  from .teamcoco import TeamcocoIE
+from .teamfourstar import TeamFourStarIE
  from .techtalks import TechTalksIE
  from .ted import TEDIE
  from .tele13 import Tele13IE
@@ -969,6 +995,11 @@
  )
  from .tv3 import TV3IE
  from .tv4 import TV4IE
+from .tva import TVAIE
+from .tvanouvelles import (
+    TVANouvellesIE,
+    TVANouvellesArticleIE,
+)
  from .tvc import (
      TVCIE,
      TVCArticleIE,
@@ -997,7 +1028,10 @@
      TwitchChapterIE,
      TwitchVodIE,
      TwitchProfileIE,
+    TwitchAllVideosIE,
+    TwitchUploadsIE,
      TwitchPastBroadcastsIE,
+    TwitchHighlightsIE,
      TwitchStreamIE,
      TwitchClipsIE,
  )
@@ -1011,6 +1045,7 @@
      UdemyCourseIE
  )
  from .udn import UDNEmbedIE
+from .uktvplay import UKTVPlayIE
  from .digiteka import DigitekaIE
  from .unistra import UnistraIE
  from .uol import UOLIE
@@ -1050,6 +1085,7 @@
  from .viceland import VicelandIE
  from .vidbit import VidbitIE
  from .viddler import ViddlerIE
+from .videa import VideaIE
  from .videodetective import VideoDetectiveIE
  from .videofyme import VideofyMeIE
  from .videomega import VideoMegaIE
@@ -1059,7 +1095,6 @@
      VideomoreSeasonIE,
  )
  from .videopremium import VideoPremiumIE
-from .videott import VideoTtIE
  from .vidio import VidioIE
  from .vidme import (
      VidmeIE,
@@ -1094,12 +1129,20 @@
      VikiIE,
      VikiChannelIE,
  )
+from .viu import (
+    ViuIE,
+    ViuPlaylistIE,
+    ViuOTTIE,
+)
  from .vk import (
      VKIE,
      VKUserVideosIE,
      VKWallPostIE,
  )
-from .vlive import VLiveIE
+from .vlive import (
+    VLiveIE,
+    VLiveChannelIE
+)
  from .vodlocker import VodlockerIE
  from .vodplatform import VODPlatformIE
  from .voicerepublic import VoiceRepublicIE
@@ -1108,6 +1151,7 @@
  from .vrt import VRTIE
  from .vube import VubeIE
  from .vuclip import VuClipIE
+from .vvvvid import VVVVIDIE
  from .vyborymos import VyboryMosIE
  from .vzaar import VzaarIE
  from .walla import WallaIE
@@ -1121,6 +1165,10 @@
      WDRIE,
      WDRMobileIE,
  )
+from .webcaster import (
+    WebcasterIE,
+    WebcasterFeedIE,
+)
  from .webofstories import (
      WebOfStoriesIE,
      WebOfStoriesPlaylistIE,
diff --git a/youtube_dl/extractor/facebook.py b/youtube_dl/extractor/facebook.py

index b4d38e5c258b830e192bcfa2639f2074d9217434..b325c82004b8aedc612cf3656c54816dcaf48e94 100644
--- a/youtube_dl/extractor/facebook.py
+++ b/youtube_dl/extractor/facebook.py
@@ -12,14 +12,16 @@
      compat_urllib_parse_unquote_plus,
  )
  from ..utils import (
+    clean_html,
      error_to_compat_str,
      ExtractorError,
+    get_element_by_id,
      int_or_none,
+    js_to_json,
      limit_length,
      sanitized_Request,
+    try_get,
      urlencode_postdata,
-    get_element_by_id,
-    clean_html,
  )
  
  
@@ -27,7 +29,7 @@ class FacebookIE(InfoExtractor):
      _VALID_URL = r'''(?x)
                  (?:
                      https?://
-                        (?:[\w-]+\.)?facebook\.com/
+                        (?:[\w-]+\.)?(?:facebook\.com|facebookcorewwwi\.onion)/
                          (?:[^#]*?\#!/)?
                          (?:
                              (?:
@@ -71,7 +73,7 @@ class FacebookIE(InfoExtractor):
          'info_dict': {
              'id': '274175099429670',
              'ext': 'mp4',
-            'title': 'Facebook video #274175099429670',
+            'title': 'Asif Nawab Butt posted a video to his Timeline.',
              'uploader': 'Asif Nawab Butt',
              'upload_date': '20140506',
              'timestamp': 1399398998,
@@ -150,6 +152,9 @@ class FacebookIE(InfoExtractor):
      }, {
          'url': 'https://zh-hk.facebook.com/peoplespower/videos/1135894589806027/',
          'only_matching': True,
+    }, {
+        'url': 'https://www.facebookcorewwwi.onion/video.php?v=274175099429670',
+        'only_matching': True,
      }]
  
      @staticmethod
@@ -240,12 +245,30 @@ def _extract_from_url(self, url, video_id, fatal_if_no_video=True):
  
          video_data = None
  
+        def extract_video_data(instances):
+            for item in instances:
+                if item[1][0] == 'VideoConfig':
+                    video_item = item[2][0]
+                    if video_item.get('video_id') == video_id:
+                        return video_item['videoData']
+
          server_js_data = self._parse_json(self._search_regex(
-            r'handleServerJS\(({.+})(?:\);|,")', webpage, 'server js data', default='{}'), video_id)
-        for item in server_js_data.get('instances', []):
-            if item[1][0] == 'VideoConfig':
-                video_data = item[2][0]['videoData']
-                break
+            r'handleServerJS\(({.+})(?:\);|,")', webpage,
+            'server js data', default='{}'), video_id, fatal=False)
+
+        if server_js_data:
+            video_data = extract_video_data(server_js_data.get('instances', []))
+
+        if not video_data:
+            server_js_data = self._parse_json(
+                self._search_regex(
+                    r'bigPipe\.onPageletArrive\(({.+?})\)\s*;\s*}\s*\)\s*,\s*["\']onPageletArrive\s+stream_pagelet',
+                    webpage, 'js data', default='{}'),
+                video_id, transform_source=js_to_json, fatal=False)
+            if server_js_data:
+                video_data = extract_video_data(try_get(
+                    server_js_data, lambda x: x['jsmods']['instances'],
+                    list) or [])
  
          if not video_data:
              if not fatal_if_no_video:
@@ -255,6 +278,8 @@ def _extract_from_url(self, url, video_id, fatal_if_no_video=True):
                  raise ExtractorError(
                      'The video is not available, Facebook said: "%s"' % m_msg.group(1),
                      expected=True)
+            elif '>You must log in to continue' in webpage:
+                self.raise_login_required()
              else:
                  raise ExtractorError('Cannot parse data')
  
@@ -293,10 +318,16 @@ def _extract_from_url(self, url, video_id, fatal_if_no_video=True):
              video_title = self._html_search_regex(
                  r'(?s)<span class="fbPhotosPhotoCaption".*?id="fbPhotoPageCaption"><span class="hasCaption">(.*?)</span>',
                  webpage, 'alternative title', default=None)
-            video_title = limit_length(video_title, 80)
          if not video_title:
+            video_title = self._html_search_meta(
+                'description', webpage, 'title')
+        if video_title:
+            video_title = limit_length(video_title, 80)
+        else:
              video_title = 'Facebook video #%s' % video_id
-        uploader = clean_html(get_element_by_id('fbPhotoPageAuthorName', webpage))
+        uploader = clean_html(get_element_by_id(
+            'fbPhotoPageAuthorName', webpage)) or self._search_regex(
+            r'ownerName\s*:\s*"([^"]+)"', webpage, 'uploader', fatal=False)
          timestamp = int_or_none(self._search_regex(
              r'<abbr[^>]+data-utime=["\'](\d+)', webpage,
              'timestamp', default=None))
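The FacebookIE change above factors the instance scan into `extract_video_data` and guards the nested lookup with `try_get`. The following sketch mirrors that logic on invented data. `try_get_sketch` is a simplified stand-in written for this note (not the `try_get` in `youtube_dl.utils`), `video_id` is passed explicitly instead of being captured from the enclosing scope, and the sample structure is assumed only from the index accesses visible in the diff.

```python
def try_get_sketch(src, getter, expected_type=None):
    """Simplified stand-in: return getter(src), or None on lookup errors or a type mismatch."""
    try:
        value = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(value, expected_type):
        return value
    return None


def extract_video_data(instances, video_id):
    """Return videoData of the VideoConfig instance whose video_id matches, else None."""
    for item in instances:
        if item[1][0] == 'VideoConfig':
            video_item = item[2][0]
            if video_item.get('video_id') == video_id:
                return video_item['videoData']
    return None


# Invented page data, shaped after the index accesses in the diff.
page_data = {'jsmods': {'instances': [
    ['a', ['Other'], [{}]],
    ['b', ['VideoConfig'], [{'video_id': '111', 'videoData': [{'sd_src': 'first'}]}]],
    ['c', ['VideoConfig'], [{'video_id': '222', 'videoData': [{'sd_src': 'second'}]}]],
]}}

instances = try_get_sketch(page_data, lambda x: x['jsmods']['instances'], list) or []
print(extract_video_data(instances, '222'))
```

Compared with the removed loop, which took the first `VideoConfig` entry unconditionally, the new version only accepts the entry whose `video_id` matches the requested video.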
diff --git a/youtube_dl/extractor/fc2.py b/youtube_dl/extractor/fc2.py

index c032d4d0282cc7907b08ec42de9ac842dd4a34c2..448647d727159d97b2f940e76136888af1abc64a 100644
--- a/youtube_dl/extractor/fc2.py
+++ b/youtube_dl/extractor/fc2.py
@@ -133,7 +133,7 @@ class FC2EmbedIE(InfoExtractor):
              'id': '201403223kCqB3Ez',
              'ext': 'flv',
              'title': 'プリズン･ブレイク S1-01 マイケル 【吹替】',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/firsttv.py b/youtube_dl/extractor/firsttv.py

index 6b662cc3cd78e4acf661af473f2374b5ec2af05c..081c7184233d3e79d0a2d684bd693631b7600eb2 100644
--- a/youtube_dl/extractor/firsttv.py
+++ b/youtube_dl/extractor/firsttv.py
@@ -2,7 +2,10 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..compat import compat_urlparse
+from ..compat import (
+    compat_str,
+    compat_urlparse,
+)
  from ..utils import (
      int_or_none,
      qualities,
@@ -22,9 +25,8 @@ class FirstTVIE(InfoExtractor):
          'info_dict': {
              'id': '40049',
              'ext': 'mp4',
-            'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
-            'description': 'md5:36a39c1d19618fec57d12efe212a8370',
-            'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
+            'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
+            'thumbnail': r're:^https?://.*\.(?:jpg|JPG)$',
              'upload_date': '20150212',
              'duration': 2694,
          },
@@ -34,9 +36,8 @@ class FirstTVIE(InfoExtractor):
          'info_dict': {
              'id': '364746',
              'ext': 'mp4',
-            'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
-            'description': 'md5:a242eea0031fd180a4497d52640a9572',
-            'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
+            'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
+            'thumbnail': r're:^https?://.*\.(?:jpg|JPG)$',
              'upload_date': '20160407',
              'duration': 179,
              'formats': 'mincount:3',
@@ -44,6 +45,17 @@ class FirstTVIE(InfoExtractor):
          'params': {
              'skip_download': True,
          },
+    }, {
+        'url': 'http://www.1tv.ru/news/issue/2016-12-01/14:00',
+        'info_dict': {
+            'id': '14:00',
+            'title': 'Выпуск новостей в 14:00   1 декабря 2016 года. Новости. Первый канал',
+            'description': 'md5:2e921b948f8c1ff93901da78ebdb1dfd',
+        },
+        'playlist_count': 13,
+    }, {
+        'url': 'http://www.1tv.ru/shows/tochvtoch-supersezon/vystupleniya/evgeniy-dyatlov-vladimir-vysockiy-koni-priveredlivye-toch-v-toch-supersezon-fragment-vypuska-ot-06-11-2016',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -51,43 +63,91 @@ def _real_extract(self, url):
  
          webpage = self._download_webpage(url, display_id)
          playlist_url = compat_urlparse.urljoin(url, self._search_regex(
-            r'data-playlist-url="([^"]+)', webpage, 'playlist url'))
+            r'data-playlist-url=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'playlist url', group='url'))
+
+        parsed_url = compat_urlparse.urlparse(playlist_url)
+        qs = compat_urlparse.parse_qs(parsed_url.query)
+        item_ids = qs.get('videos_ids[]') or qs.get('news_ids[]')
+
+        items = self._download_json(playlist_url, display_id)
+
+        if item_ids:
+            items = [
+                item for item in items
+                if item.get('uid') and compat_str(item['uid']) in item_ids]
+        else:
+            items = [items[0]]
+
+        entries = []
+        QUALITIES = ('ld', 'sd', 'hd', )
+
+        for item in items:
+            title = item['title']
+            quality = qualities(QUALITIES)
+            formats = []
+            path = None
+            for f in item.get('mbr', []):
+                src = f.get('src')
+                if not src or not isinstance(src, compat_str):
+                    continue
+                tbr = int_or_none(self._search_regex(
+                    r'_(\d{3,})\.mp4', src, 'tbr', default=None))
+                if not path:
+                    path = self._search_regex(
+                        r'//[^/]+/(.+?)_\d+\.mp4', src,
+                        'm3u8 path', default=None)
+                formats.append({
+                    'url': src,
+                    'format_id': f.get('name'),
+                    'tbr': tbr,
+                    'source_preference': quality(f.get('name')),
+                })
+            # m3u8 URL format is reverse engineered from [1] (search for
+            # master.m3u8). dashEdges (that is currently balancer-vod.1tv.ru)
+            # is taken from [2].
+            # 1. http://static.1tv.ru/player/eump1tv-current/eump-1tv.all.min.js?rnd=9097422834:formatted
+            # 2. http://static.1tv.ru/player/eump1tv-config/config-main.js?rnd=9097422834
+            if not path and len(formats) == 1:
+                path = self._search_regex(
+                    r'//[^/]+/(.+?$)', formats[0]['url'],
+                    'm3u8 path', default=None)
+            if path:
+                if len(formats) == 1:
+                    m3u8_path = ','
+                else:
+                    tbrs = [compat_str(t) for t in sorted(f['tbr'] for f in formats)]
+                    m3u8_path = '_,%s,%s' % (','.join(tbrs), '.mp4')
+                formats.extend(self._extract_m3u8_formats(
+                    'http://balancer-vod.1tv.ru/%s%s.urlset/master.m3u8'
+                    % (path, m3u8_path),
+                    display_id, 'mp4',
+                    entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
+            self._sort_formats(formats)
+
+            thumbnail = item.get('poster') or self._og_search_thumbnail(webpage)
+            duration = int_or_none(item.get('duration') or self._html_search_meta(
+                'video:duration', webpage, 'video duration', fatal=False))
+            upload_date = unified_strdate(self._html_search_meta(
+                'ya:ovs:upload_date', webpage, 'upload date', default=None))
  
-        item = self._download_json(playlist_url, display_id)[0]
-        video_id = item['id']
-        quality = qualities(('ld', 'sd', 'hd', ))
-        formats = []
-        for f in item.get('mbr', []):
-            src = f.get('src')
-            if not src:
-                continue
-            fname = f.get('name')
-            formats.append({
-                'url': src,
-                'format_id': fname,
-                'quality': quality(fname),
+            entries.append({
+                'id': compat_str(item.get('id') or item['uid']),
+                'thumbnail': thumbnail,
+                'title': title,
+                'upload_date': upload_date,
+                'duration': int_or_none(duration),
+                'formats': formats
              })
-        self._sort_formats(formats)
  
          title = self._html_search_regex(
              (r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
               r"'title'\s*:\s*'([^']+)'"),
-            webpage, 'title', default=None) or item['title']
+            webpage, 'title', default=None) or self._og_search_title(
+            webpage, default=None)
          description = self._html_search_regex(
              r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
              webpage, 'description', default=None) or self._html_search_meta(
-            'description', webpage, 'description')
-        duration = int_or_none(self._html_search_meta(
-            'video:duration', webpage, 'video duration', fatal=False))
-        upload_date = unified_strdate(self._html_search_meta(
-            'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
+            'description', webpage, 'description', default=None)
  
-        return {
-            'id': video_id,
-            'thumbnail': item.get('poster') or self._og_search_thumbnail(webpage),
-            'title': title,
-            'description': description,
-            'upload_date': upload_date,
-            'duration': int_or_none(duration),
-            'formats': formats
-        }
+        return self.playlist_result(entries, display_id, title, description)
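The master.m3u8 URL construction in the FirstTVIE hunk above is easier to follow on concrete strings. This is a sketch under stated assumptions: `build_master_m3u8_url` is a hypothetical standalone function that mirrors the path and bitrate handling from the diff, the `cdn.example` source URLs are invented, and nothing here checks that the resulting URL is actually served.

```python
import re

BALANCER = 'http://balancer-vod.1tv.ru'  # host as it appears in the diff


def build_master_m3u8_url(srcs):
    """Build the HLS master playlist URL from a list of MP4 URLs, or return None."""
    path = None
    tbrs = []
    for src in srcs:
        tbr = re.search(r'_(\d{3,})\.mp4', src)
        if tbr:
            tbrs.append(int(tbr.group(1)))
        if not path:
            m = re.search(r'//[^/]+/(.+?)_\d+\.mp4', src)
            if m:
                path = m.group(1)
    # A single file without a bitrate suffix: use its whole path.
    if not path and len(srcs) == 1:
        m = re.search(r'//[^/]+/(.+?$)', srcs[0])
        if m:
            path = m.group(1)
    if not path:
        return None
    if len(srcs) == 1:
        suffix = ','
    else:
        suffix = '_,%s,%s' % (','.join(str(t) for t in sorted(tbrs)), '.mp4')
    return '%s/%s%s.urlset/master.m3u8' % (BALANCER, path, suffix)


multi = [
    'http://cdn.example/clips/abc_2400.mp4',
    'http://cdn.example/clips/abc_900.mp4',
    'http://cdn.example/clips/abc_500.mp4',
]
print(build_master_m3u8_url(multi))
print(build_master_m3u8_url(['http://cdn.example/clips/single.mp4']))
```

Unlike the extractor, this sketch skips sources whose bitrate cannot be parsed instead of sorting `None` values alongside integers.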
diff --git a/youtube_dl/extractor/fivetv.py b/youtube_dl/extractor/fivetv.py

index 13fbc4da2c6fbc7c535c49a66e2a64f9dc042511..15736c9fe91e6d5a860641bcbd3be49636b83b47 100644
--- a/youtube_dl/extractor/fivetv.py
+++ b/youtube_dl/extractor/fivetv.py
@@ -25,7 +25,7 @@ class FiveTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Россияне выбрали имя для общенациональной платежной системы',
              'description': 'md5:a8aa13e2b7ad36789e9f77a74b6de660',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 180,
          },
      }, {
@@ -35,7 +35,7 @@ class FiveTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': '3D принтер',
              'description': 'md5:d76c736d29ef7ec5c0cf7d7c65ffcb41',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 180,
          },
      }, {
@@ -44,7 +44,7 @@ class FiveTVIE(InfoExtractor):
              'id': 'glavnoe',
              'ext': 'mp4',
              'title': 'Итоги недели с 8 по 14 июня 2015 года',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://www.5-tv.ru/glavnoe/broadcasts/508645/',
diff --git a/youtube_dl/extractor/fktv.py b/youtube_dl/extractor/fktv.py

index a3a2915998dc1cc2fca8f5ccdf6cec6cac0d528b..2958452f470bca7f7322fa9dcdccca66f525cee0 100644 (file)
--- a/youtube_dl/extractor/fktv.py
+++ b/youtube_dl/extractor/fktv.py
@@ -19,7 +19,7 @@ class FKTVIE(InfoExtractor):
              'id': '1',
              'ext': 'mp4',
              'title': 'Folge 1 vom 10. April 2007',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/flipagram.py b/youtube_dl/extractor/flipagram.py

index 1902a23938eb0452eb83908ef6f12f8576bf0617..b7be40f1b90f4f7ace02ba2b2687fe1e2e61ce30 100644 (file)
--- a/youtube_dl/extractor/flipagram.py
+++ b/youtube_dl/extractor/flipagram.py
@@ -81,7 +81,7 @@ def _real_extract(self, url):
              'filesize': int_or_none(cover.get('size')),
          } for cover in flipagram.get('covers', []) if cover.get('url')]
  
-        # Note that this only retrieves comments that are initally loaded.
+        # Note that this only retrieves comments that are initially loaded.
          # For videos with large amounts of comments, most won't be retrieved.
          comments = []
          for comment in video_data.get('comments', {}).get(video_id, {}).get('items', []):
diff --git a/youtube_dl/extractor/foxgay.py b/youtube_dl/extractor/foxgay.py

index 39174fcecca44b54ce42a174f59f3d14fbec2592..e887ae48869426617fdbf797182cef93f97ac2ef 100644 (file)
--- a/youtube_dl/extractor/foxgay.py
+++ b/youtube_dl/extractor/foxgay.py
@@ -20,7 +20,7 @@ class FoxgayIE(InfoExtractor):
              'title': 'Fuck Turkish-style',
              'description': 'md5:6ae2d9486921891efe89231ace13ffdf',
              'age_limit': 18,
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/foxnews.py b/youtube_dl/extractor/foxnews.py

index 229bcb175789ee78b12ae71dbcca811de69d9b65..dc0662f74ce5a84d59aa94333ee14d56a592cda2 100644 (file)
--- a/youtube_dl/extractor/foxnews.py
+++ b/youtube_dl/extractor/foxnews.py
@@ -22,7 +22,7 @@ class FoxNewsIE(AMPIE):
                  'duration': 265,
                  'timestamp': 1304411491,
                  'upload_date': '20110503',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -36,7 +36,7 @@ class FoxNewsIE(AMPIE):
                  'duration': 292,
                  'timestamp': 1417662047,
                  'upload_date': '20141204',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  # m3u8 download
@@ -111,7 +111,7 @@ class FoxNewsInsiderIE(InfoExtractor):
              'description': 'Is campus censorship getting out of control?',
              'timestamp': 1472168725,
              'upload_date': '20160825',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/franceculture.py b/youtube_dl/extractor/franceculture.py

index 56048ffc21e8de8810b7e6b10122cc621927fbba..b98da692cb23ccc1a6de7a8657f0d8331640280f 100644 (file)
--- a/youtube_dl/extractor/franceculture.py
+++ b/youtube_dl/extractor/franceculture.py
@@ -17,7 +17,7 @@ class FranceCultureIE(InfoExtractor):
              'display_id': 'rendez-vous-au-pays-des-geeks',
              'ext': 'mp3',
              'title': 'Rendez-vous au pays des geeks',
-            'thumbnail': 're:^https?://.*\\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140301',
              'vcodec': 'none',
          }
diff --git a/youtube_dl/extractor/francetv.py b/youtube_dl/extractor/francetv.py

index e7068d1aed9573199211a29a91486bd72e9aecd0..48d43ae58e80bd3b054068e59f4e43464e31ec0f 100644 (file)
--- a/youtube_dl/extractor/francetv.py
+++ b/youtube_dl/extractor/francetv.py
@@ -168,7 +168,7 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
              'id': 'NI_173343',
              'ext': 'mp4',
              'title': 'Les entreprises familiales : le secret de la réussite',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'timestamp': 1433273139,
              'upload_date': '20150602',
          },
@@ -184,7 +184,7 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'Olivier Monthus, réalisateur de "Bretagne, le choix de l’Armor"',
              'description': 'md5:a3264114c9d29aeca11ced113c37b16c',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'timestamp': 1458300695,
              'upload_date': '20160318',
          },
diff --git a/youtube_dl/extractor/freesound.py b/youtube_dl/extractor/freesound.py

index 5ff62af2a33d1743709bdb076dc0c80be0e3156b..138b6bc58cf9aa282c8afc3b6498ba84884197fc 100644 (file)
--- a/youtube_dl/extractor/freesound.py
+++ b/youtube_dl/extractor/freesound.py
@@ -3,10 +3,16 @@
  import re
  
  from .common import InfoExtractor
+from ..utils import (
+    float_or_none,
+    get_element_by_class,
+    get_element_by_id,
+    unified_strdate,
+)
  
  
  class FreesoundIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?freesound\.org/people/([^/]+)/sounds/(?P<id>[^/]+)'
+    _VALID_URL = r'https?://(?:www\.)?freesound\.org/people/[^/]+/sounds/(?P<id>[^/]+)'
      _TEST = {
          'url': 'http://www.freesound.org/people/miklovan/sounds/194503/',
          'md5': '12280ceb42c81f19a515c745eae07650',
@@ -14,26 +20,60 @@ class FreesoundIE(InfoExtractor):
              'id': '194503',
              'ext': 'mp3',
              'title': 'gulls in the city.wav',
-            'uploader': 'miklovan',
              'description': 'the sounds of seagulls in the city',
+            'duration': 130.233,
+            'uploader': 'miklovan',
+            'upload_date': '20130715',
+            'tags': list,
          }
      }
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        music_id = mobj.group('id')
-        webpage = self._download_webpage(url, music_id)
-        title = self._html_search_regex(
-            r'<div id="single_sample_header">.*?<a href="#">(.+?)</a>',
-            webpage, 'music title', flags=re.DOTALL)
+        audio_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, audio_id)
+
+        audio_url = self._og_search_property('audio', webpage, 'song url')
+        title = self._og_search_property('audio:title', webpage, 'song title')
+
          description = self._html_search_regex(
-            r'<div id="sound_description">(.*?)</div>', webpage, 'description',
-            fatal=False, flags=re.DOTALL)
+            r'(?s)id=["\']sound_description["\'][^>]*>(.+?)</div>',
+            webpage, 'description', fatal=False)
+
+        duration = float_or_none(
+            get_element_by_class('duration', webpage), scale=1000)
+
+        upload_date = unified_strdate(get_element_by_id('sound_date', webpage))
+        uploader = self._og_search_property(
+            'audio:artist', webpage, 'uploader', fatal=False)
+
+        channels = self._html_search_regex(
+            r'Channels</dt><dd>(.+?)</dd>', webpage,
+            'channels info', fatal=False)
+
+        tags_str = get_element_by_class('tags', webpage)
+        tags = re.findall(r'<a[^>]+>([^<]+)', tags_str) if tags_str else None
+
+        audio_urls = [audio_url]
+
+        LQ_FORMAT = '-lq.mp3'
+        if LQ_FORMAT in audio_url:
+            audio_urls.append(audio_url.replace(LQ_FORMAT, '-hq.mp3'))
+
+        formats = [{
+            'url': format_url,
+            'format_note': channels,
+            'quality': quality,
+        } for quality, format_url in enumerate(audio_urls)]
+        self._sort_formats(formats)
  
          return {
-            'id': music_id,
+            'id': audio_id,
              'title': title,
-            'url': self._og_search_property('audio', webpage, 'music url'),
-            'uploader': self._og_search_property('audio:artist', webpage, 'music uploader'),
              'description': description,
+            'duration': duration,
+            'uploader': uploader,
+            'upload_date': upload_date,
+            'tags': tags,
+            'formats': formats,
          }
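
Note on the new Freesound format list: the extractor keeps the URL from the page and, when that URL carries the `-lq.mp3` suffix, derives an `-hq.mp3` sibling that is ranked higher through `enumerate`. A standalone sketch of that step, assuming a hypothetical `build_audio_formats` function and an example URL that are not in the original:

```python
def build_audio_formats(audio_url, channels=None):
    # Start with the URL advertised by the page.
    audio_urls = [audio_url]

    # If it is the low-quality preview, guess the high-quality sibling.
    lq_suffix = '-lq.mp3'
    if lq_suffix in audio_url:
        audio_urls.append(audio_url.replace(lq_suffix, '-hq.mp3'))

    # Later entries get a larger "quality" value, so the derived
    # high-quality URL is preferred when formats are sorted.
    return [{
        'url': format_url,
        'format_note': channels,
        'quality': quality,
    } for quality, format_url in enumerate(audio_urls)]


# Hypothetical preview URL, for illustration only.
formats = build_audio_formats('http://example.com/previews/194503-lq.mp3', 'Stereo')
```

Whether the derived `-hq.mp3` URL actually exists is not checked here; the sketch only mirrors the list construction shown in the diff.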
diff --git a/youtube_dl/extractor/freevideo.py b/youtube_dl/extractor/freevideo.py

deleted file mode 100644 (file)

index cd8423a..0000000
--- a/youtube_dl/extractor/freevideo.py
+++ /dev/null
@@ -1,38 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-from ..utils import ExtractorError
-
-
-class FreeVideoIE(InfoExtractor):
-    _VALID_URL = r'^https?://www.freevideo.cz/vase-videa/(?P<id>[^.]+)\.html(?:$|[?#])'
-
-    _TEST = {
-        'url': 'http://www.freevideo.cz/vase-videa/vysukany-zadecek-22033.html',
-        'info_dict': {
-            'id': 'vysukany-zadecek-22033',
-            'ext': 'mp4',
-            'title': 'vysukany-zadecek-22033',
-            'age_limit': 18,
-        },
-        'skip': 'Blocked outside .cz',
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage, handle = self._download_webpage_handle(url, video_id)
-        if '//www.czechav.com/' in handle.geturl():
-            raise ExtractorError(
-                'Access to freevideo is blocked from your location',
-                expected=True)
-
-        video_url = self._search_regex(
-            r'\s+url: "(http://[a-z0-9-]+.cdn.freevideo.cz/stream/.*?/video.mp4)"',
-            webpage, 'video URL')
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'title': video_id,
-            'age_limit': 18,
-        }
diff --git a/youtube_dl/extractor/funimation.py b/youtube_dl/extractor/funimation.py

index 0ad0d9b6a9fe789228487e861139fa2166d88767..eba00cd5acc0c8d931173a5f85e2e1fa03c2f78f 100644 (file)
--- a/youtube_dl/extractor/funimation.py
+++ b/youtube_dl/extractor/funimation.py
@@ -29,7 +29,7 @@ class FunimationIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Air - 1 - Breeze',
              'description': 'md5:1769f43cd5fc130ace8fd87232207892',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
          'skip': 'Access without user interaction is forbidden by CloudFlare, and video removed',
      }, {
@@ -40,7 +40,7 @@ class FunimationIE(InfoExtractor):
              'ext': 'mp4',
              'title': '.hack//SIGN - 1 - Role Play',
              'description': 'md5:b602bdc15eef4c9bbb201bb6e6a4a2dd',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
          'skip': 'Access without user interaction is forbidden by CloudFlare',
      }, {
@@ -51,7 +51,7 @@ class FunimationIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Attack on Titan: Junior High - Broadcast Dub Preview',
              'description': 'md5:f8ec49c0aff702a7832cd81b8a44f803',
-            'thumbnail': 're:https?://.*\.(?:jpg|png)',
+            'thumbnail': r're:https?://.*\.(?:jpg|png)',
          },
          'skip': 'Access without user interaction is forbidden by CloudFlare',
      }]
diff --git a/youtube_dl/extractor/funnyordie.py b/youtube_dl/extractor/funnyordie.py

index 8c5ffc9e84cec305e9fc813a6366b360b7e36230..81c0ce9a360d3f28476905849565ff341c26b883 100644 (file)
--- a/youtube_dl/extractor/funnyordie.py
+++ b/youtube_dl/extractor/funnyordie.py
@@ -17,7 +17,7 @@ class FunnyOrDieIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Heart-Shaped Box: Literal Video Version',
              'description': 'md5:ea09a01bc9a1c46d9ab696c01747c338',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'thumbnail': r're:^http:.*\.jpg$',
          },
      }, {
          'url': 'http://www.funnyordie.com/embed/e402820827',
@@ -26,7 +26,10 @@ class FunnyOrDieIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Please Use This Song (Jon Lajoie)',
              'description': 'Please use this to sell something.  www.jonlajoie.com',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'thumbnail': r're:^http:.*\.jpg$',
+        },
+        'params': {
+            'skip_download': True,
          },
      }, {
          'url': 'http://www.funnyordie.com/articles/ebf5e34fc8/10-hours-of-walking-in-nyc-as-a-man',
@@ -51,19 +54,45 @@ def _real_extract(self, url):
  
          formats = []
  
-        formats.extend(self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+        m3u8_formats = self._extract_m3u8_formats(
+            m3u8_url, video_id, 'mp4', 'm3u8_native',
+            m3u8_id='hls', fatal=False)
+        source_formats = list(filter(
+            lambda f: f.get('vcodec') != 'none' and f.get('resolution') != 'multiple',
+            m3u8_formats))
  
-        bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)[,/]', m3u8_url)]
+        bitrates = [int(bitrate) for bitrate in re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)]
          bitrates.sort()
  
-        for bitrate in bitrates:
-            for link in links:
-                formats.append({
-                    'url': self._proto_relative_url('%s%d.%s' % (link[0], bitrate, link[1])),
-                    'format_id': '%s-%d' % (link[1], bitrate),
-                    'vbr': bitrate,
-                })
+        if source_formats:
+            self._sort_formats(source_formats)
+
+        for bitrate, f in zip(bitrates, source_formats or [{}] * len(bitrates)):
+            for path, ext in links:
+                ff = f.copy()
+                if ff:
+                    if ext != 'mp4':
+                        ff = dict(
+                            [(k, v) for k, v in ff.items()
+                             if k in ('height', 'width', 'format_id')])
+                    ff.update({
+                        'format_id': ff['format_id'].replace('hls', ext),
+                        'ext': ext,
+                        'protocol': 'http',
+                    })
+                else:
+                    ff.update({
+                        'format_id': '%s-%d' % (ext, bitrate),
+                        'vbr': bitrate,
+                    })
+                ff['url'] = self._proto_relative_url(
+                    '%s%d.%s' % (path, bitrate, ext))
+                formats.append(ff)
+        self._check_formats(formats, video_id)
+
+        formats.extend(m3u8_formats)
+        self._sort_formats(
+            formats, field_preference=('height', 'width', 'tbr', 'format_id'))
  
          subtitles = {}
          for src, src_lang in re.findall(r'<track kind="captions" src="([^"]+)" srclang="([^"]+)"', webpage):
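
Note on the bitrate regex change above: the old pattern consumed the trailing delimiter, so when two bitrates share a single separator the second one has no leading delimiter left to match and is skipped. The lookahead `(?=[,/])` checks the delimiter without consuming it. A small sketch, assuming a hypothetical manifest URL whose layout may differ from the real one:

```python
import re

# Hypothetical manifest URL, for illustration only.
m3u8_url = 'http://example.com/hls/,v1800,v2500,v3000,/master.m3u8'

# Old pattern: the trailing [,/] is consumed, so every other
# comma-separated bitrate is missed.
old = re.findall(r'[,/]v(\d+)[,/]', m3u8_url)

# New pattern: the trailing delimiter is only asserted, so it remains
# available as the leading delimiter of the next match.
new = re.findall(r'[,/]v(\d+)(?=[,/])', m3u8_url)

bitrates = sorted(int(bitrate) for bitrate in new)
```

With this input the old pattern finds two bitrates and the new one finds all three.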
diff --git a/youtube_dl/extractor/fusion.py b/youtube_dl/extractor/fusion.py

index b4ab4cbb7e0e86ea9a5ff43a02dfa0ec61e3d25e..ede729b5262c286c347b544fe7493bea020b5afd 100644 (file)
--- a/youtube_dl/extractor/fusion.py
+++ b/youtube_dl/extractor/fusion.py
@@ -29,7 +29,7 @@ def _real_extract(self, url):
          webpage = self._download_webpage(url, display_id)
  
          ooyala_code = self._search_regex(
-            r'data-video-id=(["\'])(?P<code>.+?)\1',
+            r'data-ooyala-id=(["\'])(?P<code>(?:(?!\1).)+)\1',
              webpage, 'ooyala code', group='code')
  
          return OoyalaIE._build_url_result(ooyala_code)
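
Note on the new capture group above: `(?:(?!\1).)+` accepts any character except the opening quote, so the value can never run past its own closing quote. The lazy `.+?` usually gives the same result, but on an empty attribute it swallows the closing quote and continues into the next attribute. A sketch of the difference, assuming a hypothetical `find_code` helper and made-up HTML snippets, with the same attribute name in both patterns for comparison:

```python
import re

LAZY = r'data-ooyala-id=(["\'])(?P<code>.+?)\1'
TEMPERED = r'data-ooyala-id=(["\'])(?P<code>(?:(?!\1).)+)\1'


def find_code(pattern, html):
    # Hypothetical helper: return the captured code, or None if absent.
    mobj = re.search(pattern, html)
    return mobj.group('code') if mobj else None


# Made-up markup, for illustration only.
normal = '<div data-ooyala-id="abc123" class="player">'
empty = '<div data-ooyala-id="" title="foo">'
```

Both patterns return `abc123` for `normal`; for `empty`, the lazy pattern returns a bogus value spanning into `title`, while the tempered pattern finds nothing.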
diff --git a/youtube_dl/extractor/gamersyde.py b/youtube_dl/extractor/gamersyde.py

index d545e01bb8db7a9694efa9d691f817cc9e394357..a218a6944d149d86a549db18a84d6f2ee31b796e 100644 (file)
--- a/youtube_dl/extractor/gamersyde.py
+++ b/youtube_dl/extractor/gamersyde.py
@@ -20,7 +20,7 @@ class GamersydeIE(InfoExtractor):
              'ext': 'mp4',
              'duration': 372,
              'title': 'Bloodborne - Birth of a hero',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/gamespot.py b/youtube_dl/extractor/gamespot.py

index 4e859e09aa16b7608ee103e851f0d0928bbfdb30..682c49e797aab0d63e5ecd7a7ab75e2a65f71e34 100644 (file)
--- a/youtube_dl/extractor/gamespot.py
+++ b/youtube_dl/extractor/gamespot.py
@@ -63,7 +63,7 @@ def _real_extract(self, url):
              streams, ('progressive_hd', 'progressive_high', 'progressive_low'))
          if progressive_url and manifest_url:
              qualities_basename = self._search_regex(
-                '/([^/]+)\.csmil/',
+                r'/([^/]+)\.csmil/',
                  manifest_url, 'qualities basename', default=None)
              if qualities_basename:
                  QUALITIES_RE = r'((,\d+)+,?)'
diff --git a/youtube_dl/extractor/gamestar.py b/youtube_dl/extractor/gamestar.py

index 55a34604af2cd2bca83ebc2c7957f1f4eb7401f1..e607d6ab8215db56afd3810613f24bb2debf63f5 100644 (file)
--- a/youtube_dl/extractor/gamestar.py
+++ b/youtube_dl/extractor/gamestar.py
@@ -18,7 +18,7 @@ class GameStarIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Hobbit 3: Die Schlacht der Fünf Heere - Teaser-Trailer zum dritten Teil',
              'description': 'Der Teaser-Trailer zu Hobbit 3: Die Schlacht der Fünf Heere zeigt einige Szenen aus dem dritten Teil der Saga und kündigt den...',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1406542020,
              'upload_date': '20140728',
              'duration': 17
diff --git a/youtube_dl/extractor/gazeta.py b/youtube_dl/extractor/gazeta.py

index 18ef5c252a9adc0ac2a1e6ae6806d2ea9b5b2546..57c67a4510f428c2c9f466532d8ddc1c19e2c809 100644 (file)
--- a/youtube_dl/extractor/gazeta.py
+++ b/youtube_dl/extractor/gazeta.py
@@ -16,7 +16,7 @@ class GazetaIE(InfoExtractor):
              'ext': 'mp4',
              'title': '«70–80 процентов гражданских в Донецке на грани голода»',
              'description': 'md5:38617526050bd17b234728e7f9620a71',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'skip': 'video not found',
      }, {
diff --git a/youtube_dl/extractor/generic.py b/youtube_dl/extractor/generic.py

index bde65fa270fb399140e85ac63395060bd7007d2e..a23486620b8d8d9a9c81714b56777ebe3587c581 100644 (file)
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -56,10 +56,10 @@
  )
  from .onionstudios import OnionStudiosIE
  from .viewlift import ViewLiftEmbedIE
-from .screenwavemedia import ScreenwaveMediaIE
  from .mtv import MTVServicesEmbeddedIE
  from .pladform import PladformIE
  from .videomore import VideomoreIE
+from .webcaster import WebcasterFeedIE
  from .googledrive import GoogleDriveIE
  from .jwplatform import JWPlatformIE
  from .digiteka import DigitekaIE
@@ -73,8 +73,14 @@
  from .eagleplatform import EaglePlatformIE
  from .facebook import FacebookIE
  from .soundcloud import SoundcloudIE
+from .tunein import TuneInBaseIE
  from .vbox7 import Vbox7IE
  from .dbtv import DBTVIE
+from .piksel import PikselIE
+from .videa import VideaIE
+from .twentymin import TwentyMinutenIE
+from .ustream import UstreamIE
+from .openload import OpenloadIE
  
  
  class GenericIE(InfoExtractor):
@@ -236,7 +242,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Tikibad ontruimd wegens brand',
                  'description': 'md5:05ca046ff47b931f9b04855015e163a4',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 33,
              },
              'params': {
@@ -297,7 +303,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'upload_date': '20130224',
                  'uploader_id': 'TheVerge',
-                'description': 're:^Chris Ziegler takes a look at the\.*',
+                'description': r're:^Chris Ziegler takes a look at the\.*',
                  'uploader': 'The Verge',
                  'title': 'First Firefox OS phones side-by-side',
              },
@@ -343,10 +349,10 @@ class GenericIE(InfoExtractor):
              },
              'skip': 'There is a limit of 200 free downloads / month for the test song',
          },
-        # embedded brightcove video
-        # it also tests brightcove videos that need to set the 'Referer' in the
-        # http requests
          {
+            # embedded brightcove video
+            # it also tests brightcove videos that need to set the 'Referer'
+            # in the http requests
              'add_ie': ['BrightcoveLegacy'],
              'url': 'http://www.bfmtv.com/video/bfmbusiness/cours-bourse/cours-bourse-l-analyse-technique-154522/',
              'info_dict': {
@@ -360,6 +366,24 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,
              },
          },
+        {
+            # embedded with itemprop embedURL and video id spelled as `idVideo`
+            'add_ie': ['BrightcoveLegacy'],
+            'url': 'http://bfmbusiness.bfmtv.com/mediaplayer/chroniques/olivier-delamarche/',
+            'info_dict': {
+                'id': '5255628253001',
+                'ext': 'mp4',
+                'title': 'md5:37c519b1128915607601e75a87995fc0',
+                'description': 'md5:37f7f888b434bb8f8cc8dbd4f7a4cf26',
+                'uploader': 'BFM BUSINESS',
+                'uploader_id': '876450612001',
+                'timestamp': 1482255315,
+                'upload_date': '20161220',
+            },
+            'params': {
+                'skip_download': True,
+            },
+        },
          {
              # https://github.com/rg3/youtube-dl/issues/2253
              'url': 'http://bcove.me/i6nfkrc3',
@@ -401,6 +425,26 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,  # m3u8 download
              },
          },
+        {
+            # Brightcove with alternative playerID key
+            'url': 'http://www.nature.com/nmeth/journal/v9/n7/fig_tab/nmeth.2062_SV1.html',
+            'info_dict': {
+                'id': 'nmeth.2062_SV1',
+                'title': 'Simultaneous multiview imaging of the Drosophila syncytial blastoderm : Quantitative high-speed imaging of entire developing embryos with simultaneous multiview light-sheet microscopy : Nature Methods : Nature Research',
+            },
+            'playlist': [{
+                'info_dict': {
+                    'id': '2228375078001',
+                    'ext': 'mp4',
+                    'title': 'nmeth.2062-sv1',
+                    'description': 'nmeth.2062-sv1',
+                    'timestamp': 1363357591,
+                    'upload_date': '20130315',
+                    'uploader': 'Nature Publishing Group',
+                    'uploader_id': '1964492299001',
+                },
+            }],
+        },
          # ooyala video
          {
              'url': 'http://www.rollingstone.com/music/videos/norwegian-dj-cashmere-cat-goes-spartan-on-with-me-premiere-20131219',
@@ -518,7 +562,7 @@ class GenericIE(InfoExtractor):
                  'id': 'f4dafcad-ff21-423d-89b5-146cfd89fa1e',
                  'ext': 'mp4',
                  'title': 'Ужастики, русский трейлер (2015)',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 153,
              }
          },
@@ -546,17 +590,6 @@ class GenericIE(InfoExtractor):
                  'description': 'md5:8145d19d320ff3e52f28401f4c4283b9',
              }
          },
-        # Embedded Ustream video
-        {
-            'url': 'http://www.american.edu/spa/pti/nsa-privacy-janus-2014.cfm',
-            'md5': '27b99cdb639c9b12a79bca876a073417',
-            'info_dict': {
-                'id': '45734260',
-                'ext': 'flv',
-                'uploader': 'AU SPA:  The NSA and Privacy',
-                'title': 'NSA and Privacy Forum Debate featuring General Hayden and Barton Gellman'
-            }
-        },
          # nowvideo embed hidden behind percent encoding
          {
              'url': 'http://www.waoanime.tv/the-super-dimension-fortress-macross-episode-1/',
@@ -738,7 +771,7 @@ class GenericIE(InfoExtractor):
                  'duration': 48,
                  'timestamp': 1401537900,
                  'upload_date': '20140531',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          # Wistia embed
@@ -808,6 +841,21 @@ class GenericIE(InfoExtractor):
              },
              'playlist_mincount': 7,
          },
+        # TuneIn station embed
+        {
+            'url': 'http://radiocnrv.com/promouvoir-radio-cnrv/',
+            'info_dict': {
+                'id': '204146',
+                'ext': 'mp3',
+                'title': 'CNRV',
+                'location': 'Paris, France',
+                'is_live': True,
+            },
+            'params': {
+                # Live stream
+                'skip_download': True,
+            },
+        },
          # Livestream embed
          {
              'url': 'http://www.esa.int/Our_Activities/Space_Science/Rosetta/Philae_comet_touch-down_webcast',
@@ -972,6 +1020,20 @@ class GenericIE(InfoExtractor):
                  'skip_download': True,
              }
          },
+        {
+            # Kaltura embedded, some fileExt broken (#11480)
+            'url': 'http://www.cornell.edu/video/nima-arkani-hamed-standard-models-of-particle-physics',
+            'info_dict': {
+                'id': '1_sgtvehim',
+                'ext': 'mp4',
+                'title': 'Our "Standard Models" of particle physics and cosmology',
+                'description': 'md5:67ea74807b8c4fea92a6f38d6d323861',
+                'timestamp': 1321158993,
+                'upload_date': '20111113',
+                'uploader_id': 'kps1',
+            },
+            'add_ie': ['Kaltura'],
+        },
          # Eagle.Platform embed (generic URL)
          {
              'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -981,7 +1043,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Навальный вышел на свободу',
                  'description': 'md5:d97861ac9ae77377f3f20eaf9d04b4f5',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 87,
                  'view_count': int,
                  'age_limit': 0,
@@ -995,7 +1057,7 @@ class GenericIE(InfoExtractor):
                  'id': '12820',
                  'ext': 'mp4',
                  'title': "'O Sole Mio",
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 216,
                  'view_count': int,
              },
@@ -1008,7 +1070,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Тайны перевала Дятлова • 1 серия 2 часть',
                  'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 694,
                  'age_limit': 0,
              },
@@ -1020,7 +1082,7 @@ class GenericIE(InfoExtractor):
                  'id': '3519514',
                  'ext': 'mp4',
                  'title': 'Joe Dirt 2 Beautiful Loser Teaser Trailer',
-                'thumbnail': 're:^https?://.*\.png$',
+                'thumbnail': r're:^https?://.*\.png$',
                  'duration': 45.115,
              },
          },
@@ -1103,7 +1165,7 @@ class GenericIE(InfoExtractor):
                  'id': '300346',
                  'ext': 'mp4',
                  'title': '中一中男師變性 全校師生力挺',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  # m3u8 download
@@ -1149,7 +1211,7 @@ class GenericIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Sauvons les abeilles ! - Le débat',
                  'description': 'md5:d9082128b1c5277987825d684939ca26',
-                'thumbnail': 're:^https?://.*\.jpe?g$',
+                'thumbnail': r're:^https?://.*\.jpe?g$',
                  'timestamp': 1434970506,
                  'upload_date': '20150622',
                  'uploader': 'Public Sénat',
@@ -1163,7 +1225,7 @@ class GenericIE(InfoExtractor):
                  'id': '2855',
                  'ext': 'mp4',
                  'title': 'Don’t Understand Bitcoin? This Man Will Mumble An Explanation At You',
-                'thumbnail': 're:^https?://.*\.jpe?g$',
+                'thumbnail': r're:^https?://.*\.jpe?g$',
                  'uploader': 'ClickHole',
                  'uploader_id': 'clickhole',
              }
@@ -1189,16 +1251,6 @@ class GenericIE(InfoExtractor):
                  'duration': 248.667,
              },
          },
-        # ScreenwaveMedia embed
-        {
-            'url': 'http://www.thecinemasnob.com/the-cinema-snob/a-nightmare-on-elm-street-2-freddys-revenge1',
-            'md5': '24ace5baba0d35d55c6810b51f34e9e0',
-            'info_dict': {
-                'id': 'cinemasnob-55d26273809dd',
-                'ext': 'mp4',
-                'title': 'cinemasnob',
-            },
-        },
          # BrightcoveInPageEmbed embed
          {
              'url': 'http://www.geekandsundry.com/tabletop-bonus-wils-final-thoughts-on-dread/',
@@ -1399,6 +1451,29 @@ class GenericIE(InfoExtractor):
              },
              'playlist_mincount': 3,
          },
+        {
+            # Videa embeds
+            'url': 'http://forum.dvdtalk.com/movie-talk/623756-deleted-magic-star-wars-ot-deleted-alt-scenes-docu-style.html',
+            'info_dict': {
+                'id': '623756-deleted-magic-star-wars-ot-deleted-alt-scenes-docu-style',
+                'title': 'Deleted Magic - Star Wars: OT Deleted / Alt. Scenes Docu. Style - DVD Talk Forum',
+            },
+            'playlist_mincount': 2,
+        },
+        {
+            # 20 minuten embed
+            'url': 'http://www.20min.ch/schweiz/news/story/So-kommen-Sie-bei-Eis-und-Schnee-sicher-an-27032552',
+            'info_dict': {
+                'id': '523629',
+                'ext': 'mp4',
+                'title': 'So kommen Sie bei Eis und Schnee sicher an',
+                'description': 'md5:117c212f64b25e3d95747e5276863f7d',
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'add_ie': [TwentyMinutenIE.ie_key()],
+        }
          # {
          #     # TODO: find another test
          #     # http://schema.org/VideoObject
@@ -1890,7 +1965,14 @@ def _playlist_from_matches(matches, getter=None, ie=None):
                  re.search(r'SBN\.VideoLinkset\.ooyala\([\'"](?P<ec>.{32})[\'"]\)', webpage) or
                  re.search(r'data-ooyala-video-id\s*=\s*[\'"](?P<ec>.{32})[\'"]', webpage))
          if mobj is not None:
-            return OoyalaIE._build_url_result(smuggle_url(mobj.group('ec'), {'domain': url}))
+            embed_token = self._search_regex(
+                r'embedToken[\'"]?\s*:\s*[\'"]([^\'"]+)',
+                webpage, 'ooyala embed token', default=None)
+            return OoyalaIE._build_url_result(smuggle_url(
+                mobj.group('ec'), {
+                    'domain': url,
+                    'embed_token': embed_token,
+                }))
  
          # Look for multiple Ooyala embeds on SBN network websites
          mobj = re.search(r'SBN\.VideoLinkset\.entryGroup\((\[.*?\])', webpage)
@@ -2021,10 +2103,9 @@ def _playlist_from_matches(matches, getter=None, ie=None):
              return self.url_result(mobj.group('url'), 'TED')
  
          # Look for embedded Ustream videos
-        mobj = re.search(
-            r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'Ustream')
+        ustream_url = UstreamIE._extract_url(webpage)
+        if ustream_url:
+            return self.url_result(ustream_url, UstreamIE.ie_key())
  
          # Look for embedded arte.tv player
          mobj = re.search(
@@ -2055,6 +2136,11 @@ def _playlist_from_matches(matches, getter=None, ie=None):
          if soundcloud_urls:
              return _playlist_from_matches(soundcloud_urls, getter=unescapeHTML, ie=SoundcloudIE.ie_key())
  
+        # Look for tunein player
+        tunein_urls = TuneInBaseIE._extract_urls(webpage)
+        if tunein_urls:
+            return _playlist_from_matches(tunein_urls)
+
          # Look for embedded mtvservices player
          mtvservices_url = MTVServicesEmbeddedIE._extract_url(webpage)
          if mtvservices_url:
@@ -2140,6 +2226,11 @@ def _playlist_from_matches(matches, getter=None, ie=None):
          if videomore_url:
              return self.url_result(videomore_url)
  
+        # Look for Webcaster embeds
+        webcaster_url = WebcasterFeedIE._extract_url(self, webpage)
+        if webcaster_url:
+            return self.url_result(webcaster_url, ie=WebcasterFeedIE.ie_key())
+
          # Look for Playwire embeds
          mobj = re.search(
              r'<script[^>]+data-config=(["\'])(?P<url>(?:https?:)?//config\.playwire\.com/.+?)\1', webpage)
@@ -2206,11 +2297,6 @@ def _playlist_from_matches(matches, getter=None, ie=None):
          if jwplatform_url:
              return self.url_result(jwplatform_url, 'JWPlatform')
  
-        # Look for ScreenwaveMedia embeds
-        mobj = re.search(ScreenwaveMediaIE.EMBED_PATTERN, webpage)
-        if mobj is not None:
-            return self.url_result(unescapeHTML(mobj.group('url')), 'ScreenwaveMedia')
-
          # Look for Digiteka embeds
          digiteka_url = DigitekaIE._extract_url(webpage)
          if digiteka_url:
@@ -2221,6 +2307,11 @@ def _playlist_from_matches(matches, getter=None, ie=None):
          if arkena_url:
              return self.url_result(arkena_url, ArkenaIE.ie_key())
  
+        # Look for Piksel embeds
+        piksel_url = PikselIE._extract_url(webpage)
+        if piksel_url:
+            return self.url_result(piksel_url, PikselIE.ie_key())
+
          # Look for Limelight embeds
          mobj = re.search(r'LimelightPlayer\.doLoad(Media|Channel|ChannelList)\(["\'](?P<id>[a-z0-9]{32})', webpage)
          if mobj:
@@ -2232,6 +2323,16 @@ def _playlist_from_matches(matches, getter=None, ie=None):
              return self.url_result('limelight:%s:%s' % (
                  lm[mobj.group(1)], mobj.group(2)), 'Limelight%s' % mobj.group(1), mobj.group(2))
  
+        mobj = re.search(
+            r'''(?sx)
+                <object[^>]+class=(["\'])LimelightEmbeddedPlayerFlash\1[^>]*>.*?
+                    <param[^>]+
+                        name=(["\'])flashVars\2[^>]+
+                        value=(["\'])(?:(?!\3).)*mediaId=(?P<id>[a-z0-9]{32})
+            ''', webpage)
+        if mobj:
+            return self.url_result('limelight:media:%s' % mobj.group('id'))
+
          # Look for AdobeTVVideo embeds
          mobj = re.search(
              r'<iframe[^>]+src=[\'"]((?:https?:)?//video\.tv\.adobe\.com/v/\d+[^"]+)[\'"]',
@@ -2320,6 +2421,23 @@ def _playlist_from_matches(matches, getter=None, ie=None):
          if dbtv_urls:
              return _playlist_from_matches(dbtv_urls, ie=DBTVIE.ie_key())
  
+        # Look for Videa embeds
+        videa_urls = VideaIE._extract_urls(webpage)
+        if videa_urls:
+            return _playlist_from_matches(videa_urls, ie=VideaIE.ie_key())
+
+        # Look for 20 minuten embeds
+        twentymin_urls = TwentyMinutenIE._extract_urls(webpage)
+        if twentymin_urls:
+            return _playlist_from_matches(
+                twentymin_urls, ie=TwentyMinutenIE.ie_key())
+
+        # Look for Openload embeds
+        openload_urls = OpenloadIE._extract_urls(webpage)
+        if openload_urls:
+            return _playlist_from_matches(
+                openload_urls, ie=OpenloadIE.ie_key())
+
          # Looking for http://schema.org/VideoObject
          json_ld = self._search_json_ld(
              webpage, video_id, default={}, expected_type='VideoObject')
diff --git a/youtube_dl/extractor/giantbomb.py b/youtube_dl/extractor/giantbomb.py
index 87cd19147d707c50606c43eecb54aef828ba778b..29b684d35875031c0b5a256e0f12cf0695b90353 100644
--- a/youtube_dl/extractor/giantbomb.py
+++ b/youtube_dl/extractor/giantbomb.py
@@ -23,7 +23,7 @@ class GiantBombIE(InfoExtractor):
              'title': 'Quick Look: Destiny: The Dark Below',
              'description': 'md5:0aa3aaf2772a41b91d44c63f30dfad24',
              'duration': 2399,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/giga.py b/youtube_dl/extractor/giga.py
index 28eb733e2bac89818a54952f77b342fec6ebe4ff..5a9992a278580478655ebe7856d8183b2a56e58d 100644
--- a/youtube_dl/extractor/giga.py
+++ b/youtube_dl/extractor/giga.py
@@ -24,7 +24,7 @@ class GigaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Anime Awesome: Chihiros Reise ins Zauberland – Das Beste kommt zum Schluss',
              'description': 'md5:afdf5862241aded4718a30dff6a57baf',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 578,
              'timestamp': 1414749706,
              'upload_date': '20141031',
diff --git a/youtube_dl/extractor/glide.py b/youtube_dl/extractor/glide.py
index f0d951396fdba4f74027e81af629f7c27c253f9a..d94dfbf09307b44ddfd6b1576ebca67eb6b6f349 100644
--- a/youtube_dl/extractor/glide.py
+++ b/youtube_dl/extractor/glide.py
@@ -14,7 +14,7 @@ class GlideIE(InfoExtractor):
              'id': 'UZF8zlmuQbe4mr+7dCiQ0w==',
              'ext': 'mp4',
              'title': "Damon's Glide message",
-            'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
+            'thumbnail': r're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/go.py b/youtube_dl/extractor/go.py
index c7776b1868e617cf78e37dd4c4b3bc742e89e8b7..a34779b169ddf852d3378389f07189c1b051d38c 100644
--- a/youtube_dl/extractor/go.py
+++ b/youtube_dl/extractor/go.py
@@ -43,7 +43,10 @@ def _real_extract(self, url):
          sub_domain, video_id, display_id = re.match(self._VALID_URL, url).groups()
          if not video_id:
              webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(r'data-video-id=["\']VDKA(\w+)', webpage, 'video id')
+            video_id = self._search_regex(
+                # There may be inner quotes, e.g. data-video-id="'VDKA3609139'"
+                # from http://freeform.go.com/shows/shadowhunters/episodes/season-2/1-this-guilty-blood
+                r'data-video-id=["\']*VDKA(\w+)', webpage, 'video id')
          brand = self._BRANDS[sub_domain]
          video_data = self._download_json(
              'http://api.contents.watchabc.go.com/vp2/ws/contents/3000/videos/%s/001/-1/-1/-1/%s/-1/-1.json' % (brand, video_id),
diff --git a/youtube_dl/extractor/godtube.py b/youtube_dl/extractor/godtube.py
index 363dc66086e350af241959f2b547004ebd07d6db..92efd16b3e6234d9d64392a14ba47ac3f315a942 100644
--- a/youtube_dl/extractor/godtube.py
+++ b/youtube_dl/extractor/godtube.py
@@ -23,7 +23,7 @@ class GodTubeIE(InfoExtractor):
                  'timestamp': 1205712000,
                  'uploader': 'beverlybmusic',
                  'upload_date': '20080317',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
      ]
diff --git a/youtube_dl/extractor/goshgay.py b/youtube_dl/extractor/goshgay.py
index 74e1720ee325da8fb4c011eddec342fe2de62d9b..377981d3e41ca76c29daeecbb5045928dff87a43 100644
--- a/youtube_dl/extractor/goshgay.py
+++ b/youtube_dl/extractor/goshgay.py
@@ -19,7 +19,7 @@ class GoshgayIE(InfoExtractor):
              'id': '299069',
              'ext': 'flv',
              'title': 'DIESEL SFW XXX Video',
-            'thumbnail': 're:^http://.*\.jpg$',
+            'thumbnail': r're:^http://.*\.jpg$',
              'duration': 80,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/hbo.py b/youtube_dl/extractor/hbo.py
index cbf774377b7261c326bd71f5db2d5de8216be5f4..8116ad9bd42f840bc5875070d5f40e8d904b7abb 100644
--- a/youtube_dl/extractor/hbo.py
+++ b/youtube_dl/extractor/hbo.py
@@ -120,7 +120,7 @@ class HBOIE(HBOBaseIE):
              'id': '1437839',
              'ext': 'mp4',
              'title': 'Ep. 64 Clip: Encryption',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 1072,
          }
      }
@@ -141,7 +141,7 @@ class HBOEpisodeIE(HBOBaseIE):
              'display_id': 'ep-52-inside-the-episode',
              'ext': 'mp4',
              'title': 'Ep. 52: Inside the Episode',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 240,
          },
      }, {
diff --git a/youtube_dl/extractor/hearthisat.py b/youtube_dl/extractor/hearthisat.py
index 2564538820e7d534adc24fd8c967ee44490e0dc3..18c2520120463ebf17253f0696275f2ea2736d66 100644
--- a/youtube_dl/extractor/hearthisat.py
+++ b/youtube_dl/extractor/hearthisat.py
@@ -25,7 +25,7 @@ class HearThisAtIE(InfoExtractor):
              'id': '150939',
              'ext': 'wav',
              'title': 'Moofi - Dr. Kreep',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1421564134,
              'description': 'Listen to Dr. Kreep by Moofi on hearthis.at - Modular, Eurorack, Mutable Intruments Braids, Valhalla-DSP',
              'upload_date': '20150118',
@@ -46,7 +46,7 @@ class HearThisAtIE(InfoExtractor):
              'description': 'Listen to DJ Jim Hopkins -  Totally Bitchin\' 80\'s Dance Mix! by TwitchSF on hearthis.at - Dance',
              'upload_date': '20160328',
              'timestamp': 1459186146,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'comment_count': int,
              'view_count': int,
              'like_count': int,
diff --git a/youtube_dl/extractor/heise.py b/youtube_dl/extractor/heise.py
index 278d9f527fd41c8e1e2c180a9ae455a23fbef1fc..1629cdb8d5a7ca584321474cb160f9907884dd69 100644
--- a/youtube_dl/extractor/heise.py
+++ b/youtube_dl/extractor/heise.py
@@ -29,7 +29,7 @@ class HeiseIE(InfoExtractor):
              'timestamp': 1411812600,
              'upload_date': '20140927',
              'description': 'In uplink-Episode 3.3 geht es darum, wie man sich von Cloud-Anbietern emanzipieren kann, worauf man beim Kauf einer Tastatur achten sollte und was Smartphones über uns verraten.',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
          }
      }
  
diff --git a/youtube_dl/extractor/hellporno.py b/youtube_dl/extractor/hellporno.py
index 7a1c75b655439a953e46f1692cd672c6c27374ef..0ee8ea712c72e618a4d7544f26c376e94fcaf70d 100644
--- a/youtube_dl/extractor/hellporno.py
+++ b/youtube_dl/extractor/hellporno.py
@@ -6,12 +6,13 @@
  from ..utils import (
      js_to_json,
      remove_end,
+    determine_ext,
  )
  
  
  class HellPornoIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?hellporno\.com/videos/(?P<id>[^/]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?hellporno\.(?:com/videos|net/v)/(?P<id>[^/]+)'
+    _TESTS = [{
          'url': 'http://hellporno.com/videos/dixie-is-posing-with-naked-ass-very-erotic/',
          'md5': '1fee339c610d2049699ef2aa699439f1',
          'info_dict': {
@@ -19,10 +20,13 @@ class HellPornoIE(InfoExtractor):
              'display_id': 'dixie-is-posing-with-naked-ass-very-erotic',
              'ext': 'mp4',
              'title': 'Dixie is posing with naked ass very erotic',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
-    }
+    }, {
+        'url': 'http://hellporno.net/v/186271/',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          display_id = self._match_id(url)
@@ -38,7 +42,7 @@ def _real_extract(self, url):
  
          video_id = flashvars.get('video_id')
          thumbnail = flashvars.get('preview_url')
-        ext = flashvars.get('postfix', '.mp4')[1:]
+        ext = determine_ext(flashvars.get('postfix'), 'mp4')
  
          formats = []
          for video_url_key in ['video_url', 'video_alt_url']:
diff --git a/youtube_dl/extractor/historicfilms.py b/youtube_dl/extractor/historicfilms.py
index 6a36933ac2c98ada87b21af4089aa158d42a3112..56343e98fb6fe33b7d714289c60db47156f48ef2 100644
--- a/youtube_dl/extractor/historicfilms.py
+++ b/youtube_dl/extractor/historicfilms.py
@@ -14,7 +14,7 @@ class HistoricFilmsIE(InfoExtractor):
              'ext': 'mov',
              'title': 'Historic Films: GP-7',
              'description': 'md5:1a86a0f3ac54024e419aba97210d959a',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 2096,
          },
      }
diff --git a/youtube_dl/extractor/hitbox.py b/youtube_dl/extractor/hitbox.py
index ff797438dec12303aab55af0e29aac8bd35229c5..e21ebb8fb4057ac6b6d226b3c0f99501b3340cdf 100644
--- a/youtube_dl/extractor/hitbox.py
+++ b/youtube_dl/extractor/hitbox.py
@@ -25,7 +25,7 @@ class HitboxIE(InfoExtractor):
              'alt_title': 'hitboxlive - Aug 9th #6',
              'description': '',
              'ext': 'mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 215.1666,
              'resolution': 'HD 720p',
              'uploader': 'hitboxlive',
@@ -163,7 +163,7 @@ def _real_extract(self, url):
              if cdn.get('rtmpSubscribe') is True:
                  continue
              base_url = cdn.get('netConnectionUrl')
-            host = re.search('.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1)
+            host = re.search(r'.+\.([^\.]+\.[^\./]+)/.+', base_url).group(1)
              if base_url not in servers:
                  servers.append(base_url)
                  for stream in cdn.get('bitrates'):
diff --git a/youtube_dl/extractor/hitrecord.py b/youtube_dl/extractor/hitrecord.py
new file mode 100644
index 0000000..01a6946
--- /dev/null
+++ b/youtube_dl/extractor/hitrecord.py
@@ -0,0 +1,68 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    clean_html,
+    float_or_none,
+    int_or_none,
+    try_get,
+)
+
+
+class HitRecordIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?hitrecord\.org/records/(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://hitrecord.org/records/2954362',
+        'md5': 'fe1cdc2023bce0bbb95c39c57426aa71',
+        'info_dict': {
+            'id': '2954362',
+            'ext': 'mp4',
+            'title': 'A Very Different World (HITRECORD x ACLU)',
+            'description': 'md5:e62defaffab5075a5277736bead95a3d',
+            'duration': 139.327,
+            'timestamp': 1471557582,
+            'upload_date': '20160818',
+            'uploader': 'Zuzi.C12',
+            'uploader_id': '362811',
+            'view_count': int,
+            'like_count': int,
+            'comment_count': int,
+            'tags': list,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'https://hitrecord.org/api/web/records/%s' % video_id, video_id)
+
+        title = video['title']
+        video_url = video['source_url']['mp4_url']
+
+        tags = None
+        tags_list = try_get(video, lambda x: x['tags'], list)
+        if tags_list:
+            tags = [
+                t['text']
+                for t in tags_list
+                if isinstance(t, dict) and t.get('text') and
+                isinstance(t['text'], compat_str)]
+
+        return {
+            'id': video_id,
+            'url': video_url,
+            'title': title,
+            'description': clean_html(video.get('body')),
+            'duration': float_or_none(video.get('duration'), 1000),
+            'timestamp': int_or_none(video.get('created_at_i')),
+            'uploader': try_get(
+                video, lambda x: x['user']['username'], compat_str),
+            'uploader_id': try_get(
+                video, lambda x: compat_str(x['user']['id'])),
+            'view_count': int_or_none(video.get('total_views_count')),
+            'like_count': int_or_none(video.get('hearts_count')),
+            'comment_count': int_or_none(video.get('comments_count')),
+            'tags': tags,
+        }
diff --git a/youtube_dl/extractor/hornbunny.py b/youtube_dl/extractor/hornbunny.py
index 0615f06af4139acbd3164f5aaac1ab2ede4cdc27..c458a959d9767c47eeaaf7a05f5c853637b6ab34 100644
--- a/youtube_dl/extractor/hornbunny.py
+++ b/youtube_dl/extractor/hornbunny.py
@@ -20,7 +20,7 @@ class HornBunnyIE(InfoExtractor):
              'duration': 550,
              'age_limit': 18,
              'view_count': int,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/howstuffworks.py b/youtube_dl/extractor/howstuffworks.py
index 65ba2a48b069bd67d2b3382f2d87bc1160145612..2be68abad0af91f1b508bc2cfa6e984ac39dbfd0 100644
--- a/youtube_dl/extractor/howstuffworks.py
+++ b/youtube_dl/extractor/howstuffworks.py
@@ -21,7 +21,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'title': 'Cool Jobs - Iditarod Musher',
                  'description': 'Cold sleds, freezing temps and warm dog breath... an Iditarod musher\'s dream. Kasey-Dee Gardner jumps on a sled to find out what the big deal is.',
                  'display_id': 'cool-jobs-iditarod-musher',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 161,
              },
              'skip': 'Video broken',
@@ -34,7 +34,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'title': 'Survival Zone: Food and Water In the Savanna',
                  'description': 'Learn how to find both food and water while trekking in the African savannah. In this video from the Discovery Channel.',
                  'display_id': 'survival-zone-food-and-water-in-the-savanna',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -45,7 +45,7 @@ class HowStuffWorksIE(InfoExtractor):
                  'title': 'Sword Swallowing #1 by Dan Meyer',
                  'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
                  'display_id': 'sword-swallowing-1-by-dan-meyer',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/huajiao.py b/youtube_dl/extractor/huajiao.py
index cec0df09a1e78dcff6d2ed4118200e96a55b0050..4ca275dda18e45e18fd628a0c8a5104fd6cfb560 100644
--- a/youtube_dl/extractor/huajiao.py
+++ b/youtube_dl/extractor/huajiao.py
@@ -20,7 +20,7 @@ class HuajiaoIE(InfoExtractor):
              'title': '#新人求关注#',
              'description': 're:.*',
              'duration': 2424.0,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1475866459,
              'upload_date': '20161007',
              'uploader': 'Penny_余姿昀',
diff --git a/youtube_dl/extractor/huffpost.py b/youtube_dl/extractor/huffpost.py
index 059073749e67605464b6159b9391f71eb5a6052d..97e36f0568f45c0da495cdb54851405a12e51fc7 100644
--- a/youtube_dl/extractor/huffpost.py
+++ b/youtube_dl/extractor/huffpost.py
@@ -52,7 +52,7 @@ def _real_extract(self, url):
  
          thumbnails = []
          for url in filter(None, data['images'].values()):
-            m = re.match('.*-([0-9]+x[0-9]+)\.', url)
+            m = re.match(r'.*-([0-9]+x[0-9]+)\.', url)
              if not m:
                  continue
              thumbnails.append({
diff --git a/youtube_dl/extractor/imdb.py b/youtube_dl/extractor/imdb.py
index f0fc8d49a4ad50c128d124534fc37141cb510ba6..f95c00c7330f3db4b5354161804460cbc2bb53d0 100644
--- a/youtube_dl/extractor/imdb.py
+++ b/youtube_dl/extractor/imdb.py
@@ -13,7 +13,7 @@
  class ImdbIE(InfoExtractor):
      IE_NAME = 'imdb'
      IE_DESC = 'Internet Movie Database trailers'
-    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-)vi(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www|m)\.imdb\.com/(?:video/[^/]+/|title/tt\d+.*?#lb-|videoplayer/)vi(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://www.imdb.com/video/imdb/vi2524815897',
@@ -32,6 +32,9 @@ class ImdbIE(InfoExtractor):
      }, {
          'url': 'http://www.imdb.com/title/tt1667889/#lb-vi2524815897',
          'only_matching': True,
+    }, {
+        'url': 'http://www.imdb.com/videoplayer/vi1562949145',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/inc.py b/youtube_dl/extractor/inc.py
new file mode 100644
index 0000000..241ec83
--- /dev/null
+++ b/youtube_dl/extractor/inc.py
@@ -0,0 +1,41 @@
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .kaltura import KalturaIE
+
+
+class IncIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?inc\.com/(?:[^/]+/)+(?P<id>[^.]+)\.html'
+    _TESTS = [{
+        'url': 'http://www.inc.com/tip-sheet/bill-gates-says-these-5-books-will-make-you-smarter.html',
+        'md5': '7416739c9c16438c09fa35619d6ba5cb',
+        'info_dict': {
+            'id': '1_wqig47aq',
+            'ext': 'mov',
+            'title': 'Bill Gates Says These 5 Books Will Make You Smarter',
+            'description': 'md5:bea7ff6cce100886fc1995acb743237e',
+            'timestamp': 1474414430,
+            'upload_date': '20160920',
+            'uploader_id': 'video@inc.com',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.inc.com/video/david-whitford/founders-forum-tripadvisor-steve-kaufer-most-enjoyable-moment-for-entrepreneur.html',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        partner_id = self._search_regex(
+            r'var\s+_?bizo_data_partner_id\s*=\s*["\'](\d+)', webpage, 'partner id')
+
+        kaltura_id = self._parse_json(self._search_regex(
+            r'pageInfo\.videos\s*=\s*\[(.+)\];', webpage, 'kaltura id'),
+            display_id)['vid_kaltura_id']
+
+        return self.url_result(
+            'kaltura:%s:%s' % (partner_id, kaltura_id), KalturaIE.ie_key())
diff --git a/youtube_dl/extractor/indavideo.py b/youtube_dl/extractor/indavideo.py
index c6f080484a99f43614f104ead8023e8e57609cda..11cf3c60964fe55c21282ecccf48a7d80ae4bac5 100644
--- a/youtube_dl/extractor/indavideo.py
+++ b/youtube_dl/extractor/indavideo.py
@@ -19,7 +19,7 @@ class IndavideoEmbedIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Cicatánc',
              'description': '',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'cukiajanlo',
              'uploader_id': '83729',
              'timestamp': 1439193826,
@@ -102,7 +102,7 @@ class IndavideoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Vicces cica',
              'description': 'Játszik a tablettel. :D',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Jet_Pack',
              'uploader_id': '491217',
              'timestamp': 1390821212,
diff --git a/youtube_dl/extractor/infoq.py b/youtube_dl/extractor/infoq.py
index cca0b8a9323c0d2412c65610a3acb3ef2943ba6f..9fb71e8effe107c6e182c3537cd634b7ac21e9bb 100644
--- a/youtube_dl/extractor/infoq.py
+++ b/youtube_dl/extractor/infoq.py
@@ -4,7 +4,10 @@
  
  import base64
  
-from ..compat import compat_urllib_parse_unquote
+from ..compat import (
+    compat_urllib_parse_unquote,
+    compat_urlparse,
+)
  from ..utils import determine_ext
  from .bokecc import BokeCCBaseIE
  
@@ -33,9 +36,21 @@ class InfoQIE(BokeCCBaseIE):
              'ext': 'flv',
              'description': 'md5:308d981fb28fa42f49f9568322c683ff',
          },
+    }, {
+        'url': 'https://www.infoq.com/presentations/Simple-Made-Easy',
+        'md5': '0e34642d4d9ef44bf86f66f6399672db',
+        'info_dict': {
+            'id': 'Simple-Made-Easy',
+            'title': 'Simple Made Easy',
+            'ext': 'mp3',
+            'description': 'md5:3e0e213a8bbd074796ef89ea35ada25b',
+        },
+        'params': {
+            'format': 'bestaudio',
+        },
      }]
  
-    def _extract_rtmp_videos(self, webpage):
+    def _extract_rtmp_video(self, webpage):
          # The server URL is hardcoded
          video_url = 'rtmpe://video.infoq.com/cfx/st/'
  
@@ -47,28 +62,53 @@ def _extract_rtmp_videos(self, webpage):
          playpath = 'mp4:' + real_id
  
          return [{
-            'format_id': 'rtmp',
+            'format_id': 'rtmp_video',
              'url': video_url,
              'ext': determine_ext(playpath),
              'play_path': playpath,
          }]
  
-    def _extract_http_videos(self, webpage):
-        http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
-
+    def _extract_cookies(self, webpage):
          policy = self._search_regex(r'InfoQConstants.scp\s*=\s*\'([^\']+)\'', webpage, 'policy')
          signature = self._search_regex(r'InfoQConstants.scs\s*=\s*\'([^\']+)\'', webpage, 'signature')
          key_pair_id = self._search_regex(r'InfoQConstants.sck\s*=\s*\'([^\']+)\'', webpage, 'key-pair-id')
+        return 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
+            policy, signature, key_pair_id)
  
+    def _extract_http_video(self, webpage):
+        http_video_url = self._search_regex(r'P\.s\s*=\s*\'([^\']+)\'', webpage, 'video URL')
          return [{
-            'format_id': 'http',
+            'format_id': 'http_video',
              'url': http_video_url,
              'http_headers': {
-                'Cookie': 'CloudFront-Policy=%s; CloudFront-Signature=%s; CloudFront-Key-Pair-Id=%s' % (
-                    policy, signature, key_pair_id),
+                'Cookie': self._extract_cookies(webpage)
              },
          }]
  
+    def _extract_http_audio(self, webpage, video_id):
+        fields = self._hidden_inputs(webpage)
+        http_audio_url = fields.get('filename')
+        if http_audio_url is None:
+            return []
+
+        cookies_header = {'Cookie': self._extract_cookies(webpage)}
+
+        # base URL is found in the Location header in the response returned by
+        # GET https://www.infoq.com/mp3download.action?filename=... when logged in.
+        http_audio_url = compat_urlparse.urljoin('http://res.infoq.com/downloads/mp3downloads/', http_audio_url)
+
+        # the audio file is sometimes missing even if there is a download link,
+        # so probe the URL to make sure
+        if not self._is_valid_url(http_audio_url, video_id, headers=cookies_header):
+            return []
+
+        return [{
+            'format_id': 'http_audio',
+            'url': http_audio_url,
+            'vcodec': 'none',
+            'http_headers': cookies_header,
+        }]
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
@@ -80,7 +120,10 @@ def _real_extract(self, url):
              # for China videos, HTTP video URL exists but always fails with 403
              formats = self._extract_bokecc_formats(webpage, video_id)
          else:
-            formats = self._extract_rtmp_videos(webpage) + self._extract_http_videos(webpage)
+            formats = (
+                self._extract_rtmp_video(webpage) +
+                self._extract_http_video(webpage) +
+                self._extract_http_audio(webpage, video_id))
  
          self._sort_formats(formats)
  
diff --git a/youtube_dl/extractor/instagram.py b/youtube_dl/extractor/instagram.py
index 196407b063a9393b94c759be6c8080de9a494277..98f408c18650cf8393869432a861a3486575b533 100644
--- a/youtube_dl/extractor/instagram.py
+++ b/youtube_dl/extractor/instagram.py
@@ -22,7 +22,7 @@ class InstagramIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Video by naomipq',
              'description': 'md5:1f17f0ab29bd6fe2bfad705f58de3cb8',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1371748545,
              'upload_date': '20130620',
              'uploader_id': 'naomipq',
@@ -38,7 +38,7 @@ class InstagramIE(InfoExtractor):
              'id': 'BA-pQFBG8HZ',
              'ext': 'mp4',
              'title': 'Video by britneyspears',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1453760977,
              'upload_date': '20160125',
              'uploader_id': 'britneyspears',
@@ -169,7 +169,7 @@ class InstagramUserIE(InfoExtractor):
                  'id': '614605558512799803_462752227',
                  'ext': 'mp4',
                  'title': '#Porsche Intelligent Performance.',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'uploader': 'Porsche',
                  'uploader_id': 'porsche',
                  'timestamp': 1387486713,
diff --git a/youtube_dl/extractor/iprima.py b/youtube_dl/extractor/iprima.py
index da2cdc656ac90f15a575eceabf33309b084c8f28..0fe5768834cef9faed9226ebc8418661306f2b54 100644
--- a/youtube_dl/extractor/iprima.py
+++ b/youtube_dl/extractor/iprima.py
@@ -65,7 +65,7 @@ def extract_formats(format_url, format_key=None, lang=None):
  
          options = self._parse_json(
              self._search_regex(
-                r'(?s)var\s+playerOptions\s*=\s*({.+?});',
+                r'(?s)(?:TDIPlayerOptions|playerOptions)\s*=\s*({.+?});\s*\]\]',
                  playerpage, 'player options', default='{}'),
              video_id, transform_source=js_to_json, fatal=False)
          if options:
diff --git a/youtube_dl/extractor/ir90tv.py b/youtube_dl/extractor/ir90tv.py
index 214bcd5b59c1a95a7a34ebc2acd87b2dc6f76454..d5a3f6fa5dbbf0d962da53df48948af4fa1d7521 100644
--- a/youtube_dl/extractor/ir90tv.py
+++ b/youtube_dl/extractor/ir90tv.py
@@ -14,7 +14,7 @@ class Ir90TvIE(InfoExtractor):
              'id': '95719',
              'ext': 'mp4',
              'title': 'شایعات نقل و انتقالات مهم فوتبال اروپا 94/02/18',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://www.90tv.ir/video/95719/%D8%B4%D8%A7%DB%8C%D8%B9%D8%A7%D8%AA-%D9%86%D9%82%D9%84-%D9%88-%D8%A7%D9%86%D8%AA%D9%82%D8%A7%D9%84%D8%A7%D8%AA-%D9%85%D9%87%D9%85-%D9%81%D9%88%D8%AA%D8%A8%D8%A7%D9%84-%D8%A7%D8%B1%D9%88%D9%BE%D8%A7-940218',
diff --git a/youtube_dl/extractor/itv.py b/youtube_dl/extractor/itv.py

new file mode 100644 (file)

index 0000000..b0d8604
--- /dev/null
+++ b/youtube_dl/extractor/itv.py
@@ -0,0 +1,196 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import uuid
+import xml.etree.ElementTree as etree
+import json
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_str,
+    compat_etree_register_namespace,
+)
+from ..utils import (
+    extract_attributes,
+    xpath_with_ns,
+    xpath_element,
+    xpath_text,
+    int_or_none,
+    parse_duration,
+    ExtractorError,
+    determine_ext,
+)
+
+
+class ITVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?itv\.com/hub/[^/]+/(?P<id>[0-9a-zA-Z]+)'
+    _TEST = {
+        'url': 'http://www.itv.com/hub/mr-bean-animated-series/2a2936a0053',
+        'info_dict': {
+            'id': '2a2936a0053',
+            'ext': 'flv',
+            'title': 'Home Movie',
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        params = extract_attributes(self._search_regex(
+            r'(?s)(<[^>]+id="video"[^>]*>)', webpage, 'params'))
+
+        ns_map = {
+            'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
+            'tem': 'http://tempuri.org/',
+            'itv': 'http://schemas.datacontract.org/2004/07/Itv.BB.Mercury.Common.Types',
+            'com': 'http://schemas.itv.com/2009/05/Common',
+        }
+        for ns, full_ns in ns_map.items():
+            compat_etree_register_namespace(ns, full_ns)
+
+        def _add_ns(name):
+            return xpath_with_ns(name, ns_map)
+
+        def _add_sub_element(element, name):
+            return etree.SubElement(element, _add_ns(name))
+
+        req_env = etree.Element(_add_ns('soapenv:Envelope'))
+        _add_sub_element(req_env, 'soapenv:Header')
+        body = _add_sub_element(req_env, 'soapenv:Body')
+        get_playlist = _add_sub_element(body, 'tem:GetPlaylist')
+        request = _add_sub_element(get_playlist, 'tem:request')
+        _add_sub_element(request, 'itv:ProductionId').text = params['data-video-id']
+        _add_sub_element(request, 'itv:RequestGuid').text = compat_str(uuid.uuid4()).upper()
+        vodcrid = _add_sub_element(request, 'itv:Vodcrid')
+        _add_sub_element(vodcrid, 'com:Id')
+        _add_sub_element(request, 'itv:Partition')
+        user_info = _add_sub_element(get_playlist, 'tem:userInfo')
+        _add_sub_element(user_info, 'itv:Broadcaster').text = 'Itv'
+        _add_sub_element(user_info, 'itv:DM')
+        _add_sub_element(user_info, 'itv:RevenueScienceValue')
+        _add_sub_element(user_info, 'itv:SessionId')
+        _add_sub_element(user_info, 'itv:SsoToken')
+        _add_sub_element(user_info, 'itv:UserToken')
+        site_info = _add_sub_element(get_playlist, 'tem:siteInfo')
+        _add_sub_element(site_info, 'itv:AdvertisingRestriction').text = 'None'
+        _add_sub_element(site_info, 'itv:AdvertisingSite').text = 'ITV'
+        _add_sub_element(site_info, 'itv:AdvertisingType').text = 'Any'
+        _add_sub_element(site_info, 'itv:Area').text = 'ITVPLAYER.VIDEO'
+        _add_sub_element(site_info, 'itv:Category')
+        _add_sub_element(site_info, 'itv:Platform').text = 'DotCom'
+        _add_sub_element(site_info, 'itv:Site').text = 'ItvCom'
+        device_info = _add_sub_element(get_playlist, 'tem:deviceInfo')
+        _add_sub_element(device_info, 'itv:ScreenSize').text = 'Big'
+        player_info = _add_sub_element(get_playlist, 'tem:playerInfo')
+        _add_sub_element(player_info, 'itv:Version').text = '2'
+
+        headers = self.geo_verification_headers()
+        headers.update({
+            'Content-Type': 'text/xml; charset=utf-8',
+            'SOAPAction': 'http://tempuri.org/PlaylistService/GetPlaylist',
+        })
+        resp_env = self._download_xml(
+            params['data-playlist-url'], video_id,
+            headers=headers, data=etree.tostring(req_env))
+        playlist = xpath_element(resp_env, './/Playlist')
+        if playlist is None:
+            fault_string = xpath_text(resp_env, './/faultstring')
+            raise ExtractorError('%s said: %s' % (self.IE_NAME, fault_string))
+        title = xpath_text(playlist, 'EpisodeTitle', fatal=True)
+        video_element = xpath_element(playlist, 'VideoEntries/Video', fatal=True)
+        media_files = xpath_element(video_element, 'MediaFiles', fatal=True)
+        rtmp_url = media_files.attrib['base']
+
+        formats = []
+        for media_file in media_files.findall('MediaFile'):
+            play_path = xpath_text(media_file, 'URL')
+            if not play_path:
+                continue
+            tbr = int_or_none(media_file.get('bitrate'), 1000)
+            formats.append({
+                'format_id': 'rtmp' + ('-%d' % tbr if tbr else ''),
+                'url': rtmp_url,
+                'play_path': play_path,
+                'tbr': tbr,
+                'ext': 'flv',
+            })
+
+        ios_playlist_url = params.get('data-video-playlist')
+        hmac = params.get('data-video-hmac')
+        if ios_playlist_url and hmac:
+            headers = self.geo_verification_headers()
+            headers.update({
+                'Accept': 'application/vnd.itv.vod.playlist.v2+json',
+                'Content-Type': 'application/json',
+                'hmac': hmac.upper(),
+            })
+            ios_playlist = self._download_json(
+                ios_playlist_url, video_id, data=json.dumps({
+                    'user': {
+                        'itvUserId': '',
+                        'entitlements': [],
+                        'token': ''
+                    },
+                    'device': {
+                        'manufacturer': 'Apple',
+                        'model': 'iPad',
+                        'os': {
+                            'name': 'iPhone OS',
+                            'version': '9.3',
+                            'type': 'ios'
+                        }
+                    },
+                    'client': {
+                        'version': '4.1',
+                        'id': 'browser'
+                    },
+                    'variantAvailability': {
+                        'featureset': {
+                            'min': ['hls', 'aes'],
+                            'max': ['hls', 'aes']
+                        },
+                        'platformTag': 'mobile'
+                    }
+                }).encode(), headers=headers, fatal=False)
+            if ios_playlist:
+                video_data = ios_playlist.get('Playlist', {}).get('Video', {})
+                ios_base_url = video_data.get('Base')
+                for media_file in video_data.get('MediaFiles', []):
+                    href = media_file.get('Href')
+                    if not href:
+                        continue
+                    if ios_base_url:
+                        href = ios_base_url + href
+                    ext = determine_ext(href)
+                    if ext == 'm3u8':
+                        formats.extend(self._extract_m3u8_formats(href, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                    else:
+                        formats.append({
+                            'url': href,
+                        })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for caption_url in video_element.findall('ClosedCaptioningURIs/URL'):
+            if not caption_url.text:
+                continue
+            ext = determine_ext(caption_url.text, 'ttml')
+            subtitles.setdefault('en', []).append({
+                'url': caption_url.text,
+                'ext': 'ttml' if ext == 'xml' else ext,
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'formats': formats,
+            'subtitles': subtitles,
+            'episode_title': title,
+            'episode_number': int_or_none(xpath_text(playlist, 'EpisodeNumber')),
+            'series': xpath_text(playlist, 'ProgrammeTitle'),
+            'duration': parse_duration(xpath_text(playlist, 'Duration')),
+        }
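
Note on the SOAP request built in `ITVIE._real_extract`: the envelope is assembled with ElementTree by expanding `prefix:Tag` names into the `{uri}Tag` form and registering each prefix so the serialized XML keeps readable prefixes. A reduced sketch of that pattern using only the standard library; `add_ns` is a hypothetical stand-in for `xpath_with_ns`, and only two of the four namespaces are shown:

```python
import xml.etree.ElementTree as etree

NS_MAP = {
    'soapenv': 'http://schemas.xmlsoap.org/soap/envelope/',
    'tem': 'http://tempuri.org/',
}


def add_ns(name):
    # Hypothetical stand-in for xpath_with_ns: 'prefix:Tag' -> '{uri}Tag'
    prefix, tag = name.split(':')
    return '{%s}%s' % (NS_MAP[prefix], tag)


# Register prefixes so tostring() emits 'soapenv:' / 'tem:' instead of ns0, ns1.
for prefix, uri in NS_MAP.items():
    etree.register_namespace(prefix, uri)

envelope = etree.Element(add_ns('soapenv:Envelope'))
etree.SubElement(envelope, add_ns('soapenv:Header'))
body = etree.SubElement(envelope, add_ns('soapenv:Body'))
etree.SubElement(body, add_ns('tem:GetPlaylist'))

# Bytes payload, suitable for passing as POST data.
payload = etree.tostring(envelope)
```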
diff --git a/youtube_dl/extractor/ivi.py b/youtube_dl/extractor/ivi.py

index 7c8cb21c2c5619b4809f5daf8605958a808eccb9..3d3c15024457e30d2002a3ee19e6eeab8a29ee4d 100644 (file)
--- a/youtube_dl/extractor/ivi.py
+++ b/youtube_dl/extractor/ivi.py
@@ -28,7 +28,7 @@ class IviIE(InfoExtractor):
                  'title': 'Иван Васильевич меняет профессию',
                  'description': 'md5:b924063ea1677c8fe343d8a72ac2195f',
                  'duration': 5498,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
          },
@@ -46,7 +46,7 @@ class IviIE(InfoExtractor):
                  'episode': 'Дело Гольдберга (1 часть)',
                  'episode_number': 1,
                  'duration': 2655,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
          },
@@ -60,7 +60,7 @@ class IviIE(InfoExtractor):
                  'title': 'Кукла',
                  'description': 'md5:ffca9372399976a2d260a407cc74cce6',
                  'duration': 5599,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Only works from Russia',
          }
diff --git a/youtube_dl/extractor/izlesene.py b/youtube_dl/extractor/izlesene.py

index aa0728abc0155fa6abbe8e2a88de18dd89d85138..b1d72177d5acef2c48a82f7df18081005199b47e 100644 (file)
--- a/youtube_dl/extractor/izlesene.py
+++ b/youtube_dl/extractor/izlesene.py
@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
                  'description': 'md5:253753e2655dde93f59f74b572454f6d',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'uploader_id': 'pelikzzle',
                  'timestamp': int,
                  'upload_date': '20140702',
@@ -44,7 +44,7 @@ class IzleseneIE(InfoExtractor):
                  'id': '17997',
                  'ext': 'mp4',
                  'title': 'Tarkan Dortmund 2006 Konseri',
-                'thumbnail': 're:^https://.*\.jpg',
+                'thumbnail': r're:^https://.*\.jpg',
                  'uploader_id': 'parlayankiz',
                  'timestamp': int,
                  'upload_date': '20061112',
diff --git a/youtube_dl/extractor/jamendo.py b/youtube_dl/extractor/jamendo.py

index ee9acac09a4c14f02fc4857d9e22005a1dcd7951..595d7a5b75a25d7e5ac41b29c9052e22cc531e66 100644 (file)
--- a/youtube_dl/extractor/jamendo.py
+++ b/youtube_dl/extractor/jamendo.py
@@ -5,9 +5,27 @@
  
  from ..compat import compat_urlparse
  from .common import InfoExtractor
-
-
-class JamendoIE(InfoExtractor):
+from ..utils import parse_duration
+
+
+class JamendoBaseIE(InfoExtractor):
+    def _extract_meta(self, webpage, fatal=True):
+        title = self._og_search_title(
+            webpage, default=None) or self._search_regex(
+            r'<title>([^<]+)', webpage,
+            'title', default=None)
+        if title:
+            title = self._search_regex(
+                r'(.+?)\s*\|\s*Jamendo Music', title, 'title', default=None)
+        if not title:
+            title = self._html_search_meta(
+                'name', webpage, 'title', fatal=fatal)
+        mobj = re.search(r'(.+) - (.+)', title or '')
+        artist, second = mobj.groups() if mobj else [None] * 2
+        return title, artist, second
+
+
+class JamendoIE(JamendoBaseIE):
      _VALID_URL = r'https?://(?:www\.)?jamendo\.com/track/(?P<id>[0-9]+)/(?P<display_id>[^/?#&]+)'
      _TEST = {
          'url': 'https://www.jamendo.com/track/196219/stories-from-emona-i',
@@ -16,8 +34,11 @@ class JamendoIE(InfoExtractor):
              'id': '196219',
              'display_id': 'stories-from-emona-i',
              'ext': 'flac',
-            'title': 'Stories from Emona I',
-            'thumbnail': 're:^https?://.*\.jpg'
+            'title': 'Maya Filipič - Stories from Emona I',
+            'artist': 'Maya Filipič',
+            'track': 'Stories from Emona I',
+            'duration': 210,
+            'thumbnail': r're:^https?://.*\.jpg'
          }
      }
  
@@ -28,7 +49,7 @@ def _real_extract(self, url):
  
          webpage = self._download_webpage(url, display_id)
  
-        title = self._html_search_meta('name', webpage, 'title')
+        title, artist, track = self._extract_meta(webpage)
  
          formats = [{
              'url': 'https://%s.jamendo.com/?trackid=%s&format=%s&from=app-97dab294'
@@ -46,37 +67,47 @@ def _real_extract(self, url):
  
          thumbnail = self._html_search_meta(
              'image', webpage, 'thumbnail', fatal=False)
+        duration = parse_duration(self._search_regex(
+            r'<span[^>]+itemprop=["\']duration["\'][^>]+content=["\'](.+?)["\']',
+            webpage, 'duration', fatal=False))
  
          return {
              'id': track_id,
              'display_id': display_id,
              'thumbnail': thumbnail,
              'title': title,
+            'duration': duration,
+            'artist': artist,
+            'track': track,
              'formats': formats
          }
  
  
-class JamendoAlbumIE(InfoExtractor):
+class JamendoAlbumIE(JamendoBaseIE):
      _VALID_URL = r'https?://(?:www\.)?jamendo\.com/album/(?P<id>[0-9]+)/(?P<display_id>[\w-]+)'
      _TEST = {
          'url': 'https://www.jamendo.com/album/121486/duck-on-cover',
          'info_dict': {
              'id': '121486',
-            'title': 'Duck On Cover'
+            'title': 'Shearer - Duck On Cover'
          },
          'playlist': [{
              'md5': 'e1a2fcb42bda30dfac990212924149a8',
              'info_dict': {
                  'id': '1032333',
                  'ext': 'flac',
-                'title': 'Warmachine'
+                'title': 'Shearer - Warmachine',
+                'artist': 'Shearer',
+                'track': 'Warmachine',
              }
          }, {
              'md5': '1f358d7b2f98edfe90fd55dac0799d50',
              'info_dict': {
                  'id': '1032330',
                  'ext': 'flac',
-                'title': 'Without Your Ghost'
+                'title': 'Shearer - Without Your Ghost',
+                'artist': 'Shearer',
+                'track': 'Without Your Ghost',
              }
          }],
          'params': {
@@ -90,18 +121,18 @@ def _real_extract(self, url):
  
          webpage = self._download_webpage(url, mobj.group('display_id'))
  
-        title = self._html_search_meta('name', webpage, 'title')
-
-        entries = [
-            self.url_result(
-                compat_urlparse.urljoin(url, m.group('path')),
-                ie=JamendoIE.ie_key(),
-                video_id=self._search_regex(
-                    r'/track/(\d+)', m.group('path'),
-                    'track id', default=None))
-            for m in re.finditer(
-                r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
-                webpage)
-        ]
+        title, artist, album = self._extract_meta(webpage, fatal=False)
+
+        entries = [{
+            '_type': 'url_transparent',
+            'url': compat_urlparse.urljoin(url, m.group('path')),
+            'ie_key': JamendoIE.ie_key(),
+            'id': self._search_regex(
+                r'/track/(\d+)', m.group('path'), 'track id', default=None),
+            'artist': artist,
+            'album': album,
+        } for m in re.finditer(
+            r'<a[^>]+href=(["\'])(?P<path>(?:(?!\1).)+)\1[^>]+class=["\'][^>]*js-trackrow-albumpage-link',
+            webpage)]
  
          return self.playlist_result(entries, album_id, title)
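
Note on `JamendoBaseIE._extract_meta`: after the title is found, it is split on `' - '` into an artist part and a second part (track or album). The sketch below isolates just that split, assuming Python 3; `split_title` is a hypothetical name used only for illustration:

```python
import re


def split_title(title):
    # Same split as the tail of _extract_meta: 'Artist - Name' -> (artist, name).
    mobj = re.search(r'(.+) - (.+)', title or '')
    artist, second = mobj.groups() if mobj else [None] * 2
    return title, artist, second


simple = split_title('Shearer - Warmachine')
missing = split_title(None)
multi = split_title('A - B - C')
```

Because the first group is greedy, a title containing several `' - '` separators is split at the last one, so everything before it ends up in the artist field.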
diff --git a/youtube_dl/extractor/jove.py b/youtube_dl/extractor/jove.py

index cf73cd7533177d028cee83a2a013914b93f64b15..f9a034b78e41a8ec4b998f956c25c47715d6b1c7 100644 (file)
--- a/youtube_dl/extractor/jove.py
+++ b/youtube_dl/extractor/jove.py
@@ -21,7 +21,7 @@ class JoveIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Electrode Positioning and Montage in Transcranial Direct Current Stimulation',
                  'description': 'md5:015dd4509649c0908bc27f049e0262c6',
-                'thumbnail': 're:^https?://.*\.png$',
+                'thumbnail': r're:^https?://.*\.png$',
                  'upload_date': '20110523',
              }
          },
@@ -33,7 +33,7 @@ class JoveIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Culturing Caenorhabditis elegans in Axenic Liquid Media and Creation of Transgenic Worms by Microparticle Bombardment',
                  'description': 'md5:35ff029261900583970c4023b70f1dc9',
-                'thumbnail': 're:^https?://.*\.png$',
+                'thumbnail': r're:^https?://.*\.png$',
                  'upload_date': '20140802',
              }
          },
diff --git a/youtube_dl/extractor/jwplatform.py b/youtube_dl/extractor/jwplatform.py

index 5d56e0a28bd55b93153a92446834ba440ad59572..aff7ab49a9500c8bdabe78fac393eb30ef827db5 100644 (file)
--- a/youtube_dl/extractor/jwplatform.py
+++ b/youtube_dl/extractor/jwplatform.py
@@ -11,6 +11,7 @@
      int_or_none,
      js_to_json,
      mimetype2ext,
+    urljoin,
  )
  
  
@@ -110,10 +111,14 @@ def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
              tracks = video_data.get('tracks')
              if tracks and isinstance(tracks, list):
                  for track in tracks:
-                    if track.get('file') and track.get('kind') == 'captions':
-                        subtitles.setdefault(track.get('label') or 'en', []).append({
-                            'url': self._proto_relative_url(track['file'])
-                        })
+                    if track.get('kind') != 'captions':
+                        continue
+                    track_url = urljoin(base_url, track.get('file'))
+                    if not track_url:
+                        continue
+                    subtitles.setdefault(track.get('label') or 'en', []).append({
+                        'url': self._proto_relative_url(track_url)
+                    })
  
              entries.append({
                  'id': this_video_id,
@@ -121,7 +126,7 @@ def _parse_jwplayer_data(self, jwplayer_data, video_id=None, require_title=True,
                  'description': video_data.get('description'),
                  'thumbnail': self._proto_relative_url(video_data.get('image')),
                  'timestamp': int_or_none(video_data.get('pubdate')),
-                'duration': float_or_none(jwplayer_data.get('duration')),
+                'duration': float_or_none(jwplayer_data.get('duration') or video_data.get('duration')),
                  'subtitles': subtitles,
                  'formats': formats,
              })
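
Note on the caption handling in `_parse_jwplayer_data`: tracks that are not captions, or whose `file` cannot be resolved against the base URL, are now skipped instead of raising or producing a broken URL. A rough sketch of that filter, assuming Python 3; `safe_urljoin` is a hypothetical stand-in built on the standard library and is not youtube-dl's own `urljoin`:

```python
from urllib.parse import urljoin as std_urljoin


def safe_urljoin(base, path):
    # Hypothetical stand-in: return None when there is nothing to join.
    if not isinstance(path, str) or not path:
        return None
    return std_urljoin(base, path)


def collect_captions(base_url, tracks):
    subtitles = {}
    for track in tracks:
        if track.get('kind') != 'captions':
            continue
        track_url = safe_urljoin(base_url, track.get('file'))
        if not track_url:
            continue
        # Fall back to 'en' when the track carries no label.
        subtitles.setdefault(track.get('label') or 'en', []).append({
            'url': track_url,
        })
    return subtitles


captions = collect_captions('http://example.com/player/', [
    {'kind': 'captions', 'file': 'subs/en.vtt', 'label': 'English'},
    {'kind': 'thumbnails', 'file': 'thumbs.vtt'},
    {'kind': 'captions'},
])
```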
diff --git a/youtube_dl/extractor/kaltura.py b/youtube_dl/extractor/kaltura.py

index 91bc3a0a7c0af4690cf1a16713de1e76bccaa67a..5ef382f9f730091c079ab5083e0ab87f4677c407 100644 (file)
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -107,7 +107,7 @@ def _extract_url(webpage):
                          (?P<q1>['\"])wid(?P=q1)\s*:\s*
                          (?P<q2>['\"])_?(?P<partner_id>(?:(?!(?P=q2)).)+)(?P=q2),.*?
                          (?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
-                        (?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4),
+                        (?P<q4>['\"])(?P<id>(?:(?!(?P=q4)).)+)(?P=q4)(?:,|\s*\})
                  """, webpage) or
              re.search(
                  r'''(?xs)
@@ -266,6 +266,12 @@ def sign_url(unsigned_url):
              # skip for now.
              if f.get('fileExt') == 'chun':
                  continue
+            if not f.get('fileExt'):
+                # QT indicates QuickTime; some videos have broken fileExt
+                if f.get('containerFormat') == 'qt':
+                    f['fileExt'] = 'mov'
+                else:
+                    f['fileExt'] = 'mp4'
              video_url = sign_url(
                  '%s/flavorId/%s' % (data_url, f['id']))
              # audio-only has no videoCodecId (e.g. kaltura:1926081:0_c03e1b5g
@@ -316,6 +322,6 @@ def sign_url(unsigned_url):
              'thumbnail': info.get('thumbnailUrl'),
              'duration': info.get('duration'),
              'timestamp': info.get('createdAt'),
-            'uploader_id': info.get('userId'),
+            'uploader_id': info.get('userId') if info.get('userId') != 'None' else None,
              'view_count': info.get('plays'),
          }
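
Note on the two Kaltura fixes above: a flavor without `fileExt` now gets a default derived from `containerFormat`, and the literal string `'None'` returned for `userId` is mapped to a real `None`. A small sketch of both normalizations; the function names are hypothetical and exist only for this illustration:

```python
def normalize_flavor(f):
    # Missing or empty fileExt: 'qt' containers are QuickTime, otherwise assume mp4.
    if not f.get('fileExt'):
        f['fileExt'] = 'mov' if f.get('containerFormat') == 'qt' else 'mp4'
    return f


def clean_uploader_id(info):
    # The API may return the string 'None' instead of omitting the field.
    user_id = info.get('userId')
    return user_id if user_id != 'None' else None


qt_flavor = normalize_flavor({'id': 'a', 'containerFormat': 'qt'})
other_flavor = normalize_flavor({'id': 'b', 'fileExt': ''})
kept_flavor = normalize_flavor({'id': 'c', 'fileExt': 'webm'})
```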
diff --git a/youtube_dl/extractor/karrierevideos.py b/youtube_dl/extractor/karrierevideos.py

index c05263e6165159320376939c252af7dea7aeadb2..4e9eb67bf24690571176299de5ff900c7496fec8 100644 (file)
--- a/youtube_dl/extractor/karrierevideos.py
+++ b/youtube_dl/extractor/karrierevideos.py
@@ -20,7 +20,7 @@ class KarriereVideosIE(InfoExtractor):
              'ext': 'flv',
              'title': 'AltenpflegerIn',
              'description': 'md5:dbadd1259fde2159a9b28667cb664ae2',
-            'thumbnail': 're:^http://.*\.png',
+            'thumbnail': r're:^http://.*\.png',
          },
          'params': {
              # rtmp download
@@ -34,7 +34,7 @@ class KarriereVideosIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Väterkarenz und neue Chancen für Mütter - "Baby - was nun?"',
              'description': 'md5:97092c6ad1fd7d38e9d6a5fdeb2bcc33',
-            'thumbnail': 're:^http://.*\.png',
+            'thumbnail': r're:^http://.*\.png',
          },
          'params': {
              # rtmp download
diff --git a/youtube_dl/extractor/keezmovies.py b/youtube_dl/extractor/keezmovies.py

index 588a4d0ec4eda6e38817b26f192536c40a172f3e..e83115e2a6c7b7a63be5237340ca0845272f8c03 100644 (file)
--- a/youtube_dl/extractor/keezmovies.py
+++ b/youtube_dl/extractor/keezmovies.py
@@ -27,7 +27,7 @@ class KeezMoviesIE(InfoExtractor):
              'display_id': 'petite-asian-lady-mai-playing-in-bathtub',
              'ext': 'mp4',
              'title': 'Petite Asian Lady Mai Playing In Bathtub',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'view_count': int,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/ketnet.py b/youtube_dl/extractor/ketnet.py

index eb0a160089b395736a1370171ca7460e32f4e7e2..fb9c2dbd47789ae6f0457a4b2724c53875d14753 100644 (file)
--- a/youtube_dl/extractor/ketnet.py
+++ b/youtube_dl/extractor/ketnet.py
@@ -13,7 +13,7 @@ class KetnetIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Gluur mee op de filmset en op Pennenzakkenrock',
              'description': 'Gluur mee met Ghost Rockers op de filmset',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'https://www.ketnet.be/kijken/karrewiet/uitzending-8-september-2016',
diff --git a/youtube_dl/extractor/konserthusetplay.py b/youtube_dl/extractor/konserthusetplay.py

index 55291c66ff066733a8610abe5acc65b1e0daf7f3..c11cbcf4757238642639cb6fac454ce98bb4a5c5 100644 (file)
--- a/youtube_dl/extractor/konserthusetplay.py
+++ b/youtube_dl/extractor/konserthusetplay.py
@@ -2,29 +2,31 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
+    determine_ext,
      float_or_none,
      int_or_none,
  )
  
  
  class KonserthusetPlayIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?konserthusetplay\.se/\?.*\bm=(?P<id>[^&]+)'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?(?:konserthusetplay|rspoplay)\.se/\?.*\bm=(?P<id>[^&]+)'
+    _TESTS = [{
          'url': 'http://www.konserthusetplay.se/?m=CKDDnlCY-dhWAAqiMERd-A',
+        'md5': 'e3fd47bf44e864bd23c08e487abe1967',
          'info_dict': {
              'id': 'CKDDnlCY-dhWAAqiMERd-A',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': 'Orkesterns instrument: Valthornen',
              'description': 'md5:f10e1f0030202020396a4d712d2fa827',
              'thumbnail': 're:^https?://.*$',
-            'duration': 398.8,
+            'duration': 398.76,
          },
-        'params': {
-            # rtmp download
-            'skip_download': True,
-        },
-    }
+    }, {
+        'url': 'http://rspoplay.se/?m=elWuEH34SMKvaO4wO_cHBw',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -42,12 +44,18 @@ def _real_extract(self, url):
          player_config = media['playerconfig']
          playlist = player_config['playlist']
  
-        source = next(f for f in playlist if f.get('bitrates'))
+        source = next(f for f in playlist if f.get('bitrates') or f.get('provider'))
  
          FORMAT_ID_REGEX = r'_([^_]+)_h264m\.mp4'
  
          formats = []
  
+        m3u8_url = source.get('url')
+        if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
          fallback_url = source.get('fallbackUrl')
          fallback_format_id = None
          if fallback_url:
@@ -97,6 +105,13 @@ def _real_extract(self, url):
          thumbnail = media.get('image')
          duration = float_or_none(media.get('duration'), 1000)
  
+        subtitles = {}
+        captions = source.get('captionsAvailableLanguages')
+        if isinstance(captions, dict):
+            for lang, subtitle_url in captions.items():
+                if lang != 'none' and isinstance(subtitle_url, compat_str):
+                    subtitles.setdefault(lang, []).append({'url': subtitle_url})
+
          return {
              'id': video_id,
              'title': title,
@@ -104,4 +119,5 @@ def _real_extract(self, url):
              'thumbnail': thumbnail,
              'duration': duration,
              'formats': formats,
+            'subtitles': subtitles,
          }
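
Note on the new subtitle extraction in `KonserthusetPlayIE`: `captionsAvailableLanguages` is only trusted when it is a dict, the `'none'` entry is skipped, and non-string values are ignored. A standalone sketch of that loop, assuming Python 3 (plain `str` is used here in place of `compat_str`); `collect_subtitles` is a hypothetical name:

```python
def collect_subtitles(captions):
    subtitles = {}
    # Anything other than a dict (None, list, string) yields no subtitles.
    if isinstance(captions, dict):
        for lang, subtitle_url in captions.items():
            if lang != 'none' and isinstance(subtitle_url, str):
                subtitles.setdefault(lang, []).append({'url': subtitle_url})
    return subtitles


found = collect_subtitles({
    'sv': 'http://example.com/sv.vtt',
    'none': '',
    'en': None,
})
```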
diff --git a/youtube_dl/extractor/krasview.py b/youtube_dl/extractor/krasview.py

index cf8876fa1f2321e7b020e2e773452f82df1bd2f1..d27d052ff0c11937a910aa689f6583ea5a3c8148 100644 (file)
--- a/youtube_dl/extractor/krasview.py
+++ b/youtube_dl/extractor/krasview.py
@@ -23,7 +23,7 @@ class KrasViewIE(InfoExtractor):
              'title': 'Снег, лёд, заносы',
              'description': 'Снято в городе Нягань, в Ханты-Мансийском автономном округе.',
              'duration': 27,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'params': {
              'skip_download': 'Not accessible from Travis CI server',
diff --git a/youtube_dl/extractor/kusi.py b/youtube_dl/extractor/kusi.py

index 2e66e8cf9d791abe27d908e04e48fd6cd3bfd4dc..6a7e3baa70cc019b497828ab84d176e55216356f 100644 (file)
--- a/youtube_dl/extractor/kusi.py
+++ b/youtube_dl/extractor/kusi.py
@@ -27,7 +27,7 @@ class KUSIIE(InfoExtractor):
              'duration': 223.586,
              'upload_date': '20160826',
              'timestamp': 1472233118,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
      }, {
          'url': 'http://kusi.com/video?clipId=12203019',
diff --git a/youtube_dl/extractor/laola1tv.py b/youtube_dl/extractor/laola1tv.py

index 2fab38079aac0c5f20a1772d52fa52642cb520bf..3190b187c9dfb8fa9204e9761b47ded0c17f5f2d 100644 (file)
--- a/youtube_dl/extractor/laola1tv.py
+++ b/youtube_dl/extractor/laola1tv.py
@@ -1,25 +1,115 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..compat import (
-    compat_urllib_parse_urlencode,
-    compat_urlparse,
-)
  from ..utils import (
      ExtractorError,
-    sanitized_Request,
      unified_strdate,
      urlencode_postdata,
      xpath_element,
      xpath_text,
+    urljoin,
+    update_url_query,
  )
  
  
+class Laola1TvEmbedIE(InfoExtractor):
+    IE_NAME = 'laola1tv:embed'
+    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/titanplayer\.php\?.*?\bvideoid=(?P<id>\d+)'
+    _TEST = {
+        # flashvars.premium = "false";
+        'url': 'https://www.laola1.tv/titanplayer.php?videoid=708065&type=V&lang=en&portal=int&customer=1024',
+        'info_dict': {
+            'id': '708065',
+            'ext': 'mp4',
+            'title': 'MA Long CHN - FAN Zhendong CHN',
+            'uploader': 'ITTF - International Table Tennis Federation',
+            'upload_date': '20161211',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        flash_vars = self._search_regex(
+            r'(?s)flashvars\s*=\s*({.+?});', webpage, 'flash vars')
+
+        def get_flashvar(x, *args, **kwargs):
+            flash_var = self._search_regex(
+                r'%s\s*:\s*"([^"]+)"' % x,
+                flash_vars, x, default=None)
+            if not flash_var:
+                flash_var = self._search_regex([
+                    r'flashvars\.%s\s*=\s*"([^"]+)"' % x,
+                    r'%s\s*=\s*"([^"]+)"' % x],
+                    webpage, x, *args, **kwargs)
+            return flash_var
+
+        hd_doc = self._download_xml(
+            'http://www.laola1.tv/server/hd_video.php', video_id, query={
+                'play': get_flashvar('streamid'),
+                'partner': get_flashvar('partnerid'),
+                'portal': get_flashvar('portalid'),
+                'lang': get_flashvar('sprache'),
+                'v5ident': '',
+            })
+
+        _v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
+        title = _v('title', fatal=True)
+
+        token_url = None
+        premium = get_flashvar('premium', default=None)
+        if premium:
+            token_url = update_url_query(
+                _v('url', fatal=True), {
+                    'timestamp': get_flashvar('timestamp'),
+                    'auth': get_flashvar('auth'),
+                })
+        else:
+            data_abo = urlencode_postdata(
+                dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(','))))
+            token_url = self._download_json(
+                'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access',
+                video_id, query={
+                    'videoId': _v('id'),
+                    'target': self._search_regex(r'vs_target = (\d+);', webpage, 'vs target'),
+                    'label': _v('label'),
+                    'area': _v('area'),
+                }, data=data_abo)['data']['stream-access'][0]
+
+        token_doc = self._download_xml(
+            token_url, video_id, 'Downloading token',
+            headers=self.geo_verification_headers())
+
+        token_attrib = xpath_element(token_doc, './/token').attrib
+
+        if token_attrib['status'] != '0':
+            raise ExtractorError(
+                'Token error: %s' % token_attrib['comment'], expected=True)
+
+        formats = self._extract_akamai_formats(
+            '%s?hdnea=%s' % (token_attrib['url'], token_attrib['auth']),
+            video_id)
+        self._sort_formats(formats)
+
+        categories_str = _v('meta_sports')
+        categories = categories_str.split(',') if categories_str else []
+        is_live = _v('islive') == 'true'
+
+        return {
+            'id': video_id,
+            'title': self._live_title(title) if is_live else title,
+            'upload_date': unified_strdate(_v('time_date')),
+            'uploader': _v('meta_organisation'),
+            'categories': categories,
+            'is_live': is_live,
+            'formats': formats,
+        }
+
+
  class Laola1TvIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/(?P<lang>[a-z]+)-(?P<portal>[a-z]+)/(?P<kind>[^/]+)/(?P<slug>[^/?#&]+)'
+    IE_NAME = 'laola1tv'
+    _VALID_URL = r'https?://(?:www\.)?laola1\.tv/[a-z]+-[a-z]+/[^/]+/(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://www.laola1.tv/de-de/video/straubing-tigers-koelner-haie/227883.html',
          'info_dict': {
@@ -67,85 +157,20 @@ class Laola1TvIE(InfoExtractor):
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        display_id = mobj.group('slug')
-        kind = mobj.group('kind')
-        lang = mobj.group('lang')
-        portal = mobj.group('portal')
+        display_id = self._match_id(url)
  
          webpage = self._download_webpage(url, display_id)
  
          if 'Dieser Livestream ist bereits beendet.' in webpage:
              raise ExtractorError('This live stream has already finished.', expected=True)
  
-        iframe_url = self._search_regex(
+        iframe_url = urljoin(url, self._search_regex(
              r'<iframe[^>]*?id="videoplayer"[^>]*?src="([^"]+)"',
-            webpage, 'iframe url')
-
-        video_id = self._search_regex(
-            r'videoid=(\d+)', iframe_url, 'video id')
-
-        iframe = self._download_webpage(compat_urlparse.urljoin(
-            url, iframe_url), display_id, 'Downloading iframe')
-
-        partner_id = self._search_regex(
-            r'partnerid\s*:\s*(["\'])(?P<partner_id>.+?)\1',
-            iframe, 'partner id', group='partner_id')
-
-        hd_doc = self._download_xml(
-            'http://www.laola1.tv/server/hd_video.php?%s'
-            % compat_urllib_parse_urlencode({
-                'play': video_id,
-                'partner': partner_id,
-                'portal': portal,
-                'lang': lang,
-                'v5ident': '',
-            }), display_id)
-
-        _v = lambda x, **k: xpath_text(hd_doc, './/video/' + x, **k)
-        title = _v('title', fatal=True)
-
-        VS_TARGETS = {
-            'video': '2',
-            'livestream': '17',
-        }
-
-        req = sanitized_Request(
-            'https://club.laola1.tv/sp/laola1/api/v3/user/session/premium/player/stream-access?%s' %
-            compat_urllib_parse_urlencode({
-                'videoId': video_id,
-                'target': VS_TARGETS.get(kind, '2'),
-                'label': _v('label'),
-                'area': _v('area'),
-            }),
-            urlencode_postdata(
-                dict((i, v) for i, v in enumerate(_v('req_liga_abos').split(',')))))
-
-        token_url = self._download_json(req, display_id)['data']['stream-access'][0]
-        token_doc = self._download_xml(token_url, display_id, 'Downloading token')
-
-        token_attrib = xpath_element(token_doc, './/token').attrib
-        token_auth = token_attrib['auth']
-
-        if token_auth in ('blocked', 'restricted', 'error'):
-            raise ExtractorError(
-                'Token error: %s' % token_attrib['comment'], expected=True)
-
-        formats = self._extract_f4m_formats(
-            '%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
-            video_id, f4m_id='hds')
-        self._sort_formats(formats)
-
-        categories_str = _v('meta_sports')
-        categories = categories_str.split(',') if categories_str else []
+            webpage, 'iframe url'))
  
          return {
-            'id': video_id,
+            '_type': 'url',
              'display_id': display_id,
-            'title': title,
-            'upload_date': unified_strdate(_v('time_date')),
-            'uploader': _v('meta_organisation'),
-            'categories': categories,
-            'is_live': _v('islive') == 'true',
-            'formats': formats,
+            'url': iframe_url,
+            'ie_key': 'Laola1TvEmbed',
          }
diff --git a/youtube_dl/extractor/leeco.py b/youtube_dl/extractor/leeco.py

index c48a5aad17ad36324b3cf70956d0ed234ffa522b..4321f90c87febbf44b4bedec97a0ba3d6a3e3b49 100644 (file)
--- a/youtube_dl/extractor/leeco.py
+++ b/youtube_dl/extractor/leeco.py
@@ -386,8 +386,8 @@ def b64decode(s):
          return formats
  
      def _real_extract(self, url):
-        uu_mobj = re.search('uu=([\w]+)', url)
-        vu_mobj = re.search('vu=([\w]+)', url)
+        uu_mobj = re.search(r'uu=([\w]+)', url)
+        vu_mobj = re.search(r'vu=([\w]+)', url)
  
          if not uu_mobj or not vu_mobj:
              raise ExtractorError('Invalid URL: %s' % url, expected=True)
diff --git a/youtube_dl/extractor/lemonde.py b/youtube_dl/extractor/lemonde.py

index be66fff0390e184cc1fd1c8dfa5ccd155664b760..42568f315ed1b6818907f1236705ebdeed2c02cb 100644 (file)
--- a/youtube_dl/extractor/lemonde.py
+++ b/youtube_dl/extractor/lemonde.py
@@ -12,7 +12,7 @@ class LemondeIE(InfoExtractor):
              'id': 'lqm3kl',
              'ext': 'mp4',
              'title': "Comprendre l'affaire Bygmalion en 5 minutes",
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 320,
              'upload_date': '20160119',
              'timestamp': 1453194778,
diff --git a/youtube_dl/extractor/libraryofcongress.py b/youtube_dl/extractor/libraryofcongress.py

index 0a94366fd8059b093d6b6600e380d829b4fe34c7..40295a30b51f733b637c651cc8a434ede14f517a 100644 (file)
--- a/youtube_dl/extractor/libraryofcongress.py
+++ b/youtube_dl/extractor/libraryofcongress.py
@@ -25,7 +25,7 @@ class LibraryOfCongressIE(InfoExtractor):
              'id': '90716351',
              'ext': 'mp4',
              'title': "Pa's trip to Mars",
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 0,
              'view_count': int,
          },
diff --git a/youtube_dl/extractor/libsyn.py b/youtube_dl/extractor/libsyn.py

index d375695f5a26dbc072455777487ed239820c1ec6..4750b03a3fb2f47818858338b7eb9a8b4889c012 100644 (file)
--- a/youtube_dl/extractor/libsyn.py
+++ b/youtube_dl/extractor/libsyn.py
@@ -41,7 +41,7 @@ def _real_extract(self, url):
  
          formats = [{
              'url': media_url,
-        } for media_url in set(re.findall('var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))]
+        } for media_url in set(re.findall(r'var\s+mediaURL(?:Libsyn)?\s*=\s*"([^"]+)"', webpage))]
  
          podcast_title = self._search_regex(
              r'<h2>([^<]+)</h2>', webpage, 'podcast title', default=None)
diff --git a/youtube_dl/extractor/lifenews.py b/youtube_dl/extractor/lifenews.py

index afce2010eafadc3ceaab1eaa7d846e5e6360d547..42e263bfaba76f97a4318ab5624872fabda435a2 100644 (file)
--- a/youtube_dl/extractor/lifenews.py
+++ b/youtube_dl/extractor/lifenews.py
@@ -176,7 +176,7 @@ class LifeEmbedIE(InfoExtractor):
              'id': 'e50c2dec2867350528e2574c899b8291',
              'ext': 'mp4',
              'title': 'e50c2dec2867350528e2574c899b8291',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          # with 1080p
diff --git a/youtube_dl/extractor/limelight.py b/youtube_dl/extractor/limelight.py

index b7bfa7a6d524e4a5ebd190947b52a369a211e753..e635f3c4dc46c6407a166dec9ab2ef06981b6221 100644 (file)
--- a/youtube_dl/extractor/limelight.py
+++ b/youtube_dl/extractor/limelight.py
@@ -59,14 +59,26 @@ def _extract_info(self, streams, mobile_urls, properties):
                      format_id = 'rtmp'
                      if stream.get('videoBitRate'):
                          format_id += '-%d' % int_or_none(stream['videoBitRate'])
-                    http_url = 'http://cpl.delvenetworks.com/' + rtmp.group('playpath')[4:]
-                    urls.append(http_url)
-                    http_fmt = fmt.copy()
-                    http_fmt.update({
-                        'url': http_url,
-                        'format_id': format_id.replace('rtmp', 'http'),
-                    })
-                    formats.append(http_fmt)
+                    http_format_id = format_id.replace('rtmp', 'http')
+
+                    CDN_HOSTS = (
+                        ('delvenetworks.com', 'cpl.delvenetworks.com'),
+                        ('video.llnw.net', 's2.content.video.llnw.net'),
+                    )
+                    for cdn_host, http_host in CDN_HOSTS:
+                        if cdn_host not in rtmp.group('host').lower():
+                            continue
+                        http_url = 'http://%s/%s' % (http_host, rtmp.group('playpath')[4:])
+                        urls.append(http_url)
+                        if self._is_valid_url(http_url, video_id, http_format_id):
+                            http_fmt = fmt.copy()
+                            http_fmt.update({
+                                'url': http_url,
+                                'format_id': http_format_id,
+                            })
+                            formats.append(http_fmt)
+                            break
+
                      fmt.update({
                          'url': rtmp.group('url'),
                          'play_path': rtmp.group('playpath'),
@@ -164,7 +176,7 @@ class LimelightMediaIE(LimelightBaseIE):
              'ext': 'mp4',
              'title': 'HaP and the HB Prince Trailer',
              'description': 'md5:8005b944181778e313d95c1237ddb640',
-            'thumbnail': 're:^https?://.*\.jpeg$',
+            'thumbnail': r're:^https?://.*\.jpeg$',
              'duration': 144.23,
              'timestamp': 1244136834,
              'upload_date': '20090604',
@@ -181,7 +193,7 @@ class LimelightMediaIE(LimelightBaseIE):
              'id': 'a3e00274d4564ec4a9b29b9466432335',
              'ext': 'mp4',
              'title': '3Play Media Overview Video',
-            'thumbnail': 're:^https?://.*\.jpeg$',
+            'thumbnail': r're:^https?://.*\.jpeg$',
              'duration': 78.101,
              'timestamp': 1338929955,
              'upload_date': '20120605',
diff --git a/youtube_dl/extractor/litv.py b/youtube_dl/extractor/litv.py

index ded717cf2823f6b999d310eafe72801ff507daa3..337b1b15cf9d783750fa225b5538edfbc1fcfde2 100644 (file)
--- a/youtube_dl/extractor/litv.py
+++ b/youtube_dl/extractor/litv.py
@@ -31,7 +31,7 @@ class LiTVIE(InfoExtractor):
              'id': 'VOD00041610',
              'ext': 'mp4',
              'title': '花千骨第1集',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'description': 'md5:c7017aa144c87467c4fb2909c4b05d6f',
              'episode_number': 1,
          },
@@ -80,7 +80,7 @@ def _real_extract(self, url):
          webpage = self._download_webpage(url, video_id)
  
          program_info = self._parse_json(self._search_regex(
-            'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
+            r'var\s+programInfo\s*=\s*([^;]+)', webpage, 'VOD data', default='{}'),
              video_id)
  
          season_list = list(program_info.get('seasonList', {}).values())
diff --git a/youtube_dl/extractor/liveleak.py b/youtube_dl/extractor/liveleak.py

index ea0565ac05099aab8c05609aee4140a1b4c2c1c7..c7de65353e616dc0d5f2ee1b0128c059d6f4f933 100644 (file)
--- a/youtube_dl/extractor/liveleak.py
+++ b/youtube_dl/extractor/liveleak.py
@@ -18,7 +18,7 @@ class LiveLeakIE(InfoExtractor):
              'description': 'extremely bad day for this guy..!',
              'uploader': 'ljfriel2',
              'title': 'Most unlucky car accident',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          'url': 'http://www.liveleak.com/view?i=f93_1390833151',
@@ -29,7 +29,7 @@ class LiveLeakIE(InfoExtractor):
              'description': 'German Television Channel NDR does an exclusive interview with Edward Snowden.\r\nUploaded on LiveLeak cause German Television thinks the rest of the world isn\'t intereseted in Edward Snowden.',
              'uploader': 'ARD_Stinkt',
              'title': 'German Television does first Edward Snowden Interview (ENGLISH)',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
      }, {
          'url': 'http://www.liveleak.com/view?i=4f7_1392687779',
@@ -52,8 +52,24 @@ class LiveLeakIE(InfoExtractor):
              'description': 'Happened on 27.7.2014. \r\nAt 0:53 you can see people still swimming at near beach.',
              'uploader': 'bony333',
              'title': 'Crazy Hungarian tourist films close call waterspout in Croatia',
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          }
+    }, {
+        # Covers https://github.com/rg3/youtube-dl/pull/10664#issuecomment-247439521
+        'url': 'http://m.liveleak.com/view?i=763_1473349649',
+        'add_ie': ['Youtube'],
+        'info_dict': {
+            'id': '763_1473349649',
+            'ext': 'mp4',
+            'title': 'Reporters and public officials ignore epidemic of black on asian violence in Sacramento | Colin Flaherty',
+            'description': 'Colin being the warrior he is and showing the injustice Asians in Sacramento are being subjected to.',
+            'uploader': 'Ziz',
+            'upload_date': '20160908',
+            'uploader_id': 'UCEbta5E_jqlZmEJsriTEtnw'
+        },
+        'params': {
+            'skip_download': True,
+        },
      }]
  
      @staticmethod
@@ -87,7 +103,7 @@ def _real_extract(self, url):
              else:
                  # Maybe an embed?
                  embed_url = self._search_regex(
-                    r'<iframe[^>]+src="(http://www.prochan.com/embed\?[^"]+)"',
+                    r'<iframe[^>]+src="(https?://(?:www\.)?(?:prochan|youtube)\.com/embed[^"]+)"',
                      webpage, 'embed URL')
                  return {
                      '_type': 'url_transparent',
@@ -107,6 +123,7 @@ def _real_extract(self, url):
              'format_note': s.get('label'),
              'url': s['file'],
          } for i, s in enumerate(sources)]
+
          for i, s in enumerate(sources):
              # Removing '.h264_*.mp4' gives the raw video, which is essentially
              # the same video without the LiveLeak logo at the top (see
diff --git a/youtube_dl/extractor/livestream.py b/youtube_dl/extractor/livestream.py

index bc7894bf13ed29963aa1dad7880cf8549be1ca77..c863413bf008baa6baf0233b8185a10ea119d091 100644 (file)
--- a/youtube_dl/extractor/livestream.py
+++ b/youtube_dl/extractor/livestream.py
@@ -37,7 +37,7 @@ class LivestreamIE(InfoExtractor):
              'duration': 5968.0,
              'like_count': int,
              'view_count': int,
-            'thumbnail': 're:^http://.*\.jpg$'
+            'thumbnail': r're:^http://.*\.jpg$'
          }
      }, {
          'url': 'http://new.livestream.com/tedx/cityenglish',
diff --git a/youtube_dl/extractor/lnkgo.py b/youtube_dl/extractor/lnkgo.py

index fd23b0b43fa91af1e828c9038f83ac228b93aa94..068378c9c509a0483650feac42a8ffe92cc60328 100644 (file)
--- a/youtube_dl/extractor/lnkgo.py
+++ b/youtube_dl/extractor/lnkgo.py
@@ -22,7 +22,7 @@ class LnkGoIE(InfoExtractor):
              'description': 'md5:d82a5e36b775b7048617f263a0e3475e',
              'age_limit': 7,
              'duration': 3019,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
          'params': {
              'skip_download': True,  # HLS download
@@ -37,7 +37,7 @@ class LnkGoIE(InfoExtractor):
              'description': 'md5:7352d113a242a808676ff17e69db6a69',
              'age_limit': 18,
              'duration': 346,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
          'params': {
              'skip_download': True,  # HLS download
diff --git a/youtube_dl/extractor/lynda.py b/youtube_dl/extractor/lynda.py

index f4dcfd93fa760878566568636d9c2b864b6c7556..da94eab561b91d6b70675911e432b5750d5d5b04 100644 (file)
--- a/youtube_dl/extractor/lynda.py
+++ b/youtube_dl/extractor/lynda.py
@@ -73,7 +73,7 @@ def _login(self):
  
          # Already logged in
          if any(re.search(p, signin_page) for p in (
-                'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
+                r'isLoggedIn\s*:\s*true', r'logout\.aspx', r'>Log out<')):
              return
  
          # Step 2: submit email
diff --git a/youtube_dl/extractor/matchtv.py b/youtube_dl/extractor/matchtv.py

index 33b0b539fa9dfde80274d983aa003ea7b39e6622..bc9933a8134eea759918aad3475dd42c7f6b406e 100644 (file)
--- a/youtube_dl/extractor/matchtv.py
+++ b/youtube_dl/extractor/matchtv.py
@@ -14,7 +14,7 @@ class MatchTVIE(InfoExtractor):
          'info_dict': {
              'id': 'matchtv-live',
              'ext': 'flv',
-            'title': 're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^Матч ТВ - Прямой эфир \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
          },
          'params': {
diff --git a/youtube_dl/extractor/mdr.py b/youtube_dl/extractor/mdr.py

index 2100583df46ab7955846f8e3b08467d13ed3440e..6e4290aadd6e0d3543d5b48d5df39577032437e2 100644 (file)
--- a/youtube_dl/extractor/mdr.py
+++ b/youtube_dl/extractor/mdr.py
@@ -72,7 +72,7 @@ def _real_extract(self, url):
  
          data_url = self._search_regex(
              r'(?:dataURL|playerXml(?:["\'])?)\s*:\s*(["\'])(?P<url>.+/(?:video|audio)-?[0-9]+-avCustom\.xml)\1',
-            webpage, 'data url', group='url').replace('\/', '/')
+            webpage, 'data url', group='url').replace(r'\/', '/')
  
          doc = self._download_xml(
              compat_urlparse.urljoin(url, data_url), video_id)
diff --git a/youtube_dl/extractor/meipai.py b/youtube_dl/extractor/meipai.py

new file mode 100644 (file)

index 0000000..c8eacb4
--- /dev/null
+++ b/youtube_dl/extractor/meipai.py
@@ -0,0 +1,104 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_duration,
+    unified_timestamp,
+)
+
+
+class MeipaiIE(InfoExtractor):
+    IE_DESC = '美拍'
+    _VALID_URL = r'https?://(?:www\.)?meipai\.com/media/(?P<id>[0-9]+)'
+    _TESTS = [{
+        # regular uploaded video
+        'url': 'http://www.meipai.com/media/531697625',
+        'md5': 'e3e9600f9e55a302daecc90825854b4f',
+        'info_dict': {
+            'id': '531697625',
+            'ext': 'mp4',
+            'title': '#葉子##阿桑##余姿昀##超級女聲#',
+            'description': '#葉子##阿桑##余姿昀##超級女聲#',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 152,
+            'timestamp': 1465492420,
+            'upload_date': '20160609',
+            'view_count': 35511,
+            'creator': '她她-TATA',
+            'tags': ['葉子', '阿桑', '余姿昀', '超級女聲'],
+        }
+    }, {
+        # record of live streaming
+        'url': 'http://www.meipai.com/media/585526361',
+        'md5': 'ff7d6afdbc6143342408223d4f5fb99a',
+        'info_dict': {
+            'id': '585526361',
+            'ext': 'mp4',
+            'title': '姿昀和善願 練歌練琴啦😁😁😁',
+            'description': '姿昀和善願 練歌練琴啦😁😁😁',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 5975,
+            'timestamp': 1474311799,
+            'upload_date': '20160919',
+            'view_count': 1215,
+            'creator': '她她-TATA',
+        }
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._og_search_title(
+            webpage, default=None) or self._html_search_regex(
+            r'<title[^>]*>([^<]+)</title>', webpage, 'title')
+
+        formats = []
+
+        # recorded playback of live streaming
+        m3u8_url = self._html_search_regex(
+            r'file:\s*encodeURIComponent\((["\'])(?P<url>(?:(?!\1).)+)\1\)',
+            webpage, 'm3u8 url', group='url', default=None)
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        if not formats:
+            # regular uploaded video
+            video_url = self._search_regex(
+                r'data-video=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage, 'video url',
+                group='url', default=None)
+            if video_url:
+                formats.append({
+                    'url': video_url,
+                    'format_id': 'http',
+                })
+
+        timestamp = unified_timestamp(self._og_search_property(
+            'video:release_date', webpage, 'release date', fatal=False))
+
+        tags = [t for t in self._og_search_property(
+            'video:tag', webpage, 'tags', default='').split(',') if t]
+
+        view_count = int_or_none(self._html_search_meta(
+            'interactionCount', webpage, 'view count'))
+        duration = parse_duration(self._html_search_meta(
+            'duration', webpage, 'duration'))
+        creator = self._og_search_property(
+            'video:director', webpage, 'creator', fatal=False)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': self._og_search_description(webpage),
+            'thumbnail': self._og_search_thumbnail(webpage),
+            'duration': duration,
+            'timestamp': timestamp,
+            'view_count': view_count,
+            'creator': creator,
+            'tags': tags,
+            'formats': formats,
+        }
diff --git a/youtube_dl/extractor/melonvod.py b/youtube_dl/extractor/melonvod.py

new file mode 100644 (file)

index 0000000..bd8cf13
--- /dev/null
+++ b/youtube_dl/extractor/melonvod.py
@@ -0,0 +1,72 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    urljoin,
+)
+
+
+class MelonVODIE(InfoExtractor):
+    _VALID_URL = r'https?://vod\.melon\.com/video/detail2\.html?\?.*?mvId=(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://vod.melon.com/video/detail2.htm?mvId=50158734',
+        'info_dict': {
+            'id': '50158734',
+            'ext': 'mp4',
+            'title': "Jessica 'Wonderland' MV Making Film",
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'artist': 'Jessica (제시카)',
+            'upload_date': '20161212',
+            'duration': 203,
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        play_info = self._download_json(
+            'http://vod.melon.com/video/playerInfo.json', video_id,
+            note='Downloading player info JSON', query={'mvId': video_id})
+
+        title = play_info['mvInfo']['MVTITLE']
+
+        info = self._download_json(
+            'http://vod.melon.com/delivery/streamingInfo.json', video_id,
+            note='Downloading streaming info JSON',
+            query={
+                'contsId': video_id,
+                'contsType': 'VIDEO',
+            })
+
+        stream_info = info['streamingInfo']
+
+        formats = self._extract_m3u8_formats(
+            stream_info['encUrl'], video_id, 'mp4', m3u8_id='hls')
+        self._sort_formats(formats)
+
+        artist_list = play_info.get('artistList')
+        artist = None
+        if isinstance(artist_list, list):
+            artist = ', '.join(
+                [a['ARTISTNAMEWEBLIST']
+                 for a in artist_list if a.get('ARTISTNAMEWEBLIST')])
+
+        thumbnail = urljoin(info.get('staticDomain'), stream_info.get('imgPath'))
+
+        duration = int_or_none(stream_info.get('playTime'))
+        upload_date = stream_info.get('mvSvcOpenDt', '')[:8] or None
+
+        return {
+            'id': video_id,
+            'title': title,
+            'artist': artist,
+            'thumbnail': thumbnail,
+            'upload_date': upload_date,
+            'duration': duration,
+            'formats': formats
+        }
diff --git a/youtube_dl/extractor/metacafe.py b/youtube_dl/extractor/metacafe.py

index e6e7659a1de0ebe86f48a4128192de5d14d6d586..9880924e692380fffde3d0c776da329225de4ef8 100644 (file)
--- a/youtube_dl/extractor/metacafe.py
+++ b/youtube_dl/extractor/metacafe.py
@@ -133,7 +133,7 @@ def _real_extract(self, url):
          video_id, display_id = re.match(self._VALID_URL, url).groups()
  
          # the video may come from an external site
-        m_external = re.match('^(\w{2})-(.*)$', video_id)
+        m_external = re.match(r'^(\w{2})-(.*)$', video_id)
          if m_external is not None:
              prefix, ext_id = m_external.groups()
              # Check if video comes from YouTube
diff --git a/youtube_dl/extractor/mgoon.py b/youtube_dl/extractor/mgoon.py

index 94bc87b00797951f932394166d6c5b8f5c3e6d1a..7bb473900fcdb39dccbabbed325350fc04349c4f 100644 (file)
--- a/youtube_dl/extractor/mgoon.py
+++ b/youtube_dl/extractor/mgoon.py
@@ -27,7 +27,7 @@ class MgoonIE(InfoExtractor):
                  'upload_date': '20131220',
                  'ext': 'mp4',
                  'title': 'md5:543aa4c27a4931d371c3f433e8cebebc',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              }
          },
          {
diff --git a/youtube_dl/extractor/mgtv.py b/youtube_dl/extractor/mgtv.py

index e0bb5d208856a121f40f533fcacf3b7bd98d13ea..659ede8c2254d6ce524953c298c8a69b0b13d745 100644 (file)
--- a/youtube_dl/extractor/mgtv.py
+++ b/youtube_dl/extractor/mgtv.py
@@ -18,7 +18,7 @@ class MGTVIE(InfoExtractor):
              'title': '我是歌手第四季双年巅峰会：韩红李玟“双王”领军对抗',
              'description': '我是歌手第四季双年巅峰会',
              'duration': 7461,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          # no tbr extracted from stream_url
diff --git a/youtube_dl/extractor/minhateca.py b/youtube_dl/extractor/minhateca.py

index e6730b75a68d27c16e694fedaac088d27a0ab1ec..dccc542497692ac6aa14a6e36f6b96b0aad7741d 100644 (file)
--- a/youtube_dl/extractor/minhateca.py
+++ b/youtube_dl/extractor/minhateca.py
@@ -19,7 +19,7 @@ class MinhatecaIE(InfoExtractor):
              'id': '125848331',
              'ext': 'mp4',
              'title': 'youtube-dl test video',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'filesize_approx': 1530000,
              'duration': 9,
              'view_count': int,
diff --git a/youtube_dl/extractor/ministrygrid.py b/youtube_dl/extractor/ministrygrid.py

index 10190d5f6e1f3f55b3274855c7614bea62b620e5..8ad9239c50519b15bb4f4db3f41bde9d199759ac 100644 (file)
--- a/youtube_dl/extractor/ministrygrid.py
+++ b/youtube_dl/extractor/ministrygrid.py
@@ -17,7 +17,7 @@ class MinistryGridIE(InfoExtractor):
              'id': '3453494717001',
              'ext': 'mp4',
              'title': 'The Gospel by Numbers',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'upload_date': '20140410',
              'description': 'Coming soon from T4G 2014!',
              'uploader_id': '2034960640001',
diff --git a/youtube_dl/extractor/mitele.py b/youtube_dl/extractor/mitele.py

index c41ab1e91a7a2dd5655fcef7230e5ceadd305648..79e0b8ada1aaefeb90b479967d0f9e2197818bff 100644 (file)
--- a/youtube_dl/extractor/mitele.py
+++ b/youtube_dl/extractor/mitele.py
@@ -75,7 +75,7 @@ def _get_player_info(self, url, webpage):
  
  class MiTeleIE(InfoExtractor):
      IE_DESC = 'mitele.es'
-    _VALID_URL = r'https?://(?:www\.)?mitele\.es/programas-tv/(?:[^/]+/)(?P<id>[^/]+)/player'
+    _VALID_URL = r'https?://(?:www\.)?mitele\.es/(?:[^/]+/)+(?P<id>[^/]+)/player'
  
      _TESTS = [{
          'url': 'http://www.mitele.es/programas-tv/diario-de/57b0dfb9c715da65618b4afa/player',
@@ -86,8 +86,11 @@ class MiTeleIE(InfoExtractor):
              'description': 'md5:3b6fce7eaa41b2d97358726378d9369f',
              'series': 'Diario de',
              'season': 'La redacción',
+            'season_number': 14,
+            'season_id': 'diario_de_t14_11981',
              'episode': 'Programa 144',
-            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'episode_number': 3,
+            'thumbnail': r're:(?i)^https?://.*\.jpg$',
              'duration': 2913,
          },
          'add_ie': ['Ooyala'],
@@ -101,60 +104,102 @@ class MiTeleIE(InfoExtractor):
              'description': 'md5:5ff132013f0cd968ffbf1f5f3538a65f',
              'series': 'Cuarto Milenio',
              'season': 'Temporada 6',
+            'season_number': 6,
+            'season_id': 'cuarto_milenio_t06_12715',
              'episode': 'Programa 226',
-            'thumbnail': 're:(?i)^https?://.*\.jpg$',
+            'episode_number': 24,
+            'thumbnail': r're:(?i)^https?://.*\.jpg$',
              'duration': 7313,
          },
          'params': {
              'skip_download': True,
          },
          'add_ie': ['Ooyala'],
+    }, {
+        'url': 'http://www.mitele.es/series-online/la-que-se-avecina/57aac5c1c915da951a8b45ed/player',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage(url, video_id)
  
-        gigya_url = self._search_regex(r'<gigya-api>[^>]*</gigya-api>[^>]*<script\s*src="([^"]*)">[^>]*</script>', webpage, 'gigya', default=None)
-        gigya_sc = self._download_webpage(compat_urlparse.urljoin(r'http://www.mitele.es/', gigya_url), video_id, 'Downloading gigya script')
+        gigya_url = self._search_regex(
+            r'<gigya-api>[^>]*</gigya-api>[^>]*<script\s+src="([^"]*)">[^>]*</script>',
+            webpage, 'gigya', default=None)
+        gigya_sc = self._download_webpage(
+            compat_urlparse.urljoin('http://www.mitele.es/', gigya_url),
+            video_id, 'Downloading gigya script')
+
          # Get an appKey/uuid for getting the session key
-        appKey_var = self._search_regex(r'value\("appGridApplicationKey",([0-9a-f]+)\)', gigya_sc, 'appKey variable')
-        appKey = self._search_regex(r'var %s="([0-9a-f]+)"' % appKey_var, gigya_sc, 'appKey')
-        uid = compat_str(uuid.uuid4())
-        session_url = 'https://appgrid-api.cloud.accedo.tv/session?appKey=%s&uuid=%s' % (appKey, uid)
-        session_json = self._download_json(session_url, video_id, 'Downloading session keys')
-        sessionKey = compat_str(session_json['sessionKey'])
-
-        paths_url = 'https://appgrid-api.cloud.accedo.tv/metadata/general_configuration,%20web_configuration?sessionKey=' + sessionKey
-        paths = self._download_json(paths_url, video_id, 'Downloading paths JSON')
+        appKey_var = self._search_regex(
+            r'value\s*\(\s*["\']appGridApplicationKey["\']\s*,\s*([0-9a-f]+)',
+            gigya_sc, 'appKey variable')
+        appKey = self._search_regex(
+            r'var\s+%s\s*=\s*["\']([0-9a-f]+)' % appKey_var, gigya_sc, 'appKey')
+
+        session_json = self._download_json(
+            'https://appgrid-api.cloud.accedo.tv/session',
+            video_id, 'Downloading session keys', query={
+                'appKey': appKey,
+                'uuid': compat_str(uuid.uuid4()),
+            })
+
+        paths = self._download_json(
+            'https://appgrid-api.cloud.accedo.tv/metadata/general_configuration,%20web_configuration',
+            video_id, 'Downloading paths JSON',
+            query={'sessionKey': compat_str(session_json['sessionKey'])})
+
          ooyala_s = paths['general_configuration']['api_configuration']['ooyala_search']
-        data_p = (
-            'http://' + ooyala_s['base_url'] + ooyala_s['full_path'] + ooyala_s['provider_id'] +
-            '/docs/' + video_id + '?include_titles=Series,Season&product_name=test&format=full')
-        data = self._download_json(data_p, video_id, 'Downloading data JSON')
-        source = data['hits']['hits'][0]['_source']
-        embedCode = source['offers'][0]['embed_codes'][0]
+        source = self._download_json(
+            'http://%s%s%s/docs/%s' % (
+                ooyala_s['base_url'], ooyala_s['full_path'],
+                ooyala_s['provider_id'], video_id),
+            video_id, 'Downloading data JSON', query={
+                'include_titles': 'Series,Season',
+                'product_name': 'test',
+                'format': 'full',
+            })['hits']['hits'][0]['_source']
  
+        embedCode = source['offers'][0]['embed_codes'][0]
          titles = source['localizable_titles'][0]
+
          title = titles.get('title_medium') or titles['title_long']
-        episode = titles['title_sort_name']
-        description = titles['summary_long']
-        titles_series = source['localizable_titles_series'][0]
-        series = titles_series['title_long']
-        titles_season = source['localizable_titles_season'][0]
-        season = titles_season['title_medium']
-        duration = parse_duration(source['videos'][0]['duration'])
+
+        description = titles.get('summary_long') or titles.get('summary_medium')
+
+        def get(key1, key2):
+            value1 = source.get(key1)
+            if not value1 or not isinstance(value1, list):
+                return
+            if not isinstance(value1[0], dict):
+                return
+            return value1[0].get(key2)
+
+        series = get('localizable_titles_series', 'title_medium')
+
+        season = get('localizable_titles_season', 'title_medium')
+        season_number = int_or_none(source.get('season_number'))
+        season_id = source.get('season_id')
+
+        episode = titles.get('title_sort_name')
+        episode_number = int_or_none(source.get('episode_number'))
+
+        duration = parse_duration(get('videos', 'duration'))
  
          return {
              '_type': 'url_transparent',
              # for some reason only HLS is supported
-            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8'}),
+            'url': smuggle_url('ooyala:' + embedCode, {'supportedformats': 'm3u8,dash'}),
              'id': video_id,
              'title': title,
              'description': description,
              'series': series,
              'season': season,
+            'season_number': season_number,
+            'season_id': season_id,
              'episode': episode,
+            'episode_number': episode_number,
              'duration': duration,
-            'thumbnail': source['images'][0]['url'],
+            'thumbnail': get('images', 'url'),
          }
diff --git a/youtube_dl/extractor/mixcloud.py b/youtube_dl/extractor/mixcloud.py
index 560fe188b675a619785332eea285484fa85154bf..a24b3165a49670444024f4503877efa3467b8dbc 100644
--- a/youtube_dl/extractor/mixcloud.py
+++ b/youtube_dl/extractor/mixcloud.py
@@ -16,13 +16,12 @@
      clean_html,
      ExtractorError,
      OnDemandPagedList,
-    parse_count,
      str_to_int,
  )
  
  
  class MixcloudIE(InfoExtractor):
-    _VALID_URL = r'^(?:https?://)?(?:www\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
+    _VALID_URL = r'https?://(?:(?:www|beta|m)\.)?mixcloud\.com/([^/]+)/(?!stream|uploads|favorites|listens|playlists)([^/]+)'
      IE_NAME = 'mixcloud'
  
      _TESTS = [{
@@ -34,9 +33,8 @@ class MixcloudIE(InfoExtractor):
              'description': 'After quite a long silence from myself, finally another Drum\'n\'Bass mix with my favourite current dance floor bangers.',
              'uploader': 'Daniel Holbach',
              'uploader_id': 'dholbach',
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
              'view_count': int,
-            'like_count': int,
          },
      }, {
          'url': 'http://www.mixcloud.com/gillespeterson/caribou-7-inch-vinyl-mix-chat/',
@@ -49,8 +47,10 @@ class MixcloudIE(InfoExtractor):
              'uploader_id': 'gillespeterson',
              'thumbnail': 're:https?://.*',
              'view_count': int,
-            'like_count': int,
          },
+    }, {
+        'url': 'https://beta.mixcloud.com/RedLightRadio/nosedrip-15-red-light-radio-01-18-2016/',
+        'only_matching': True,
      }]
  
      # See https://www.mixcloud.com/media/js2/www_js_2.9e23256562c080482435196ca3975ab5.js
@@ -86,26 +86,18 @@ def _real_extract(self, url):
  
          song_url = play_info['stream_url']
  
-        PREFIX = (
-            r'm-play-on-spacebar[^>]+'
-            r'(?:\s+[a-zA-Z0-9-]+(?:="[^"]+")?)*?\s+')
-        title = self._html_search_regex(
-            PREFIX + r'm-title="([^"]+)"', webpage, 'title')
+        title = self._html_search_regex(r'm-title="([^"]+)"', webpage, 'title')
          thumbnail = self._proto_relative_url(self._html_search_regex(
-            PREFIX + r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail',
-            fatal=False))
+            r'm-thumbnail-url="([^"]+)"', webpage, 'thumbnail', fatal=False))
          uploader = self._html_search_regex(
-            PREFIX + r'm-owner-name="([^"]+)"',
-            webpage, 'uploader', fatal=False)
+            r'm-owner-name="([^"]+)"', webpage, 'uploader', fatal=False)
          uploader_id = self._search_regex(
              r'\s+"profile": "([^"]+)",', webpage, 'uploader id', fatal=False)
          description = self._og_search_description(webpage)
-        like_count = parse_count(self._search_regex(
-            r'\bbutton-favorite[^>]+>.*?<span[^>]+class=["\']toggle-number[^>]+>\s*([^<]+)',
-            webpage, 'like count', default=None))
          view_count = str_to_int(self._search_regex(
              [r'<meta itemprop="interactionCount" content="UserPlays:([0-9]+)"',
-             r'/listeners/?">([0-9,.]+)</a>'],
+             r'/listeners/?">([0-9,.]+)</a>',
+             r'm-tooltip=["\']([\d,.]+) plays'],
              webpage, 'play count', default=None))
  
          return {
@@ -117,7 +109,6 @@ def _real_extract(self, url):
              'uploader': uploader,
              'uploader_id': uploader_id,
              'view_count': view_count,
-            'like_count': like_count,
          }
  
  
diff --git a/youtube_dl/extractor/mlb.py b/youtube_dl/extractor/mlb.py
index e242b897f2b63cf624805c7564cf7e2f02a9d16b..59cd4b8389f28a72f9d16df70edfa64a7ce2ba40 100644
--- a/youtube_dl/extractor/mlb.py
+++ b/youtube_dl/extractor/mlb.py
@@ -37,7 +37,7 @@ class MLBIE(InfoExtractor):
                  'duration': 66,
                  'timestamp': 1405980600,
                  'upload_date': '20140721',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -51,7 +51,7 @@ class MLBIE(InfoExtractor):
                  'duration': 46,
                  'timestamp': 1405105800,
                  'upload_date': '20140711',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -65,7 +65,7 @@ class MLBIE(InfoExtractor):
                  'duration': 488,
                  'timestamp': 1405399936,
                  'upload_date': '20140715',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -79,7 +79,7 @@ class MLBIE(InfoExtractor):
                  'duration': 52,
                  'timestamp': 1405390722,
                  'upload_date': '20140715',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/mnet.py b/youtube_dl/extractor/mnet.py
index e3f42e7bdae7503d69d7585a407c3d68e333086e..6a85dcbd522cfb087499daab81482fac84d75a0a 100644
--- a/youtube_dl/extractor/mnet.py
+++ b/youtube_dl/extractor/mnet.py
@@ -22,7 +22,7 @@ class MnetIE(InfoExtractor):
              'timestamp': 1451564040,
              'age_limit': 0,
              'thumbnails': 'mincount:5',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'ext': 'flv',
          },
          'params': {
diff --git a/youtube_dl/extractor/moevideo.py b/youtube_dl/extractor/moevideo.py
index 91ee9c4e95204718cb069fe1dc36908821b7af6d..44bcc498254dc2503b8ce103b59d1c207c44df66 100644
--- a/youtube_dl/extractor/moevideo.py
+++ b/youtube_dl/extractor/moevideo.py
@@ -30,7 +30,7 @@ class MoeVideoIE(InfoExtractor):
                  'ext': 'flv',
                  'title': 'Sink cut out machine',
                  'description': 'md5:f29ff97b663aefa760bf7ca63c8ca8a8',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'width': 540,
                  'height': 360,
                  'duration': 179,
@@ -46,7 +46,7 @@ class MoeVideoIE(InfoExtractor):
                  'ext': 'flv',
                  'title': 'Operacion Condor.',
                  'description': 'md5:7e68cb2fcda66833d5081c542491a9a3',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'width': 480,
                  'height': 296,
                  'duration': 6027,
diff --git a/youtube_dl/extractor/mofosex.py b/youtube_dl/extractor/mofosex.py
index e3bbe5aa8997694f62a07d8a2e0c383aa64daae1..54716f5c7af1dc15e9b0a5b5174b08ba68782bce 100644
--- a/youtube_dl/extractor/mofosex.py
+++ b/youtube_dl/extractor/mofosex.py
@@ -18,7 +18,7 @@ class MofosexIE(KeezMoviesIE):
              'display_id': 'amateur-teen-playing-and-masturbating-318131',
              'ext': 'mp4',
              'title': 'amateur teen playing and masturbating',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20121114',
              'view_count': int,
              'like_count': int,
diff --git a/youtube_dl/extractor/mojvideo.py b/youtube_dl/extractor/mojvideo.py
index 0ba435dc5597219e9b569c441e58e3a196e1bfbe..165e658c94424268f3c8df0987dffbdff474723a 100644
--- a/youtube_dl/extractor/mojvideo.py
+++ b/youtube_dl/extractor/mojvideo.py
@@ -20,7 +20,7 @@ class MojvideoIE(InfoExtractor):
              'display_id': 'v-avtu-pred-mano-rdecelaska-alfi-nipic',
              'ext': 'mp4',
              'title': 'V avtu pred mano rdečelaska - Alfi Nipič',
-            'thumbnail': 're:^http://.*\.jpg$',
+            'thumbnail': r're:^http://.*\.jpg$',
              'duration': 242,
          }
      }
diff --git a/youtube_dl/extractor/motherless.py b/youtube_dl/extractor/motherless.py
index 5e1a8a71a93aa28962d7f260af966d10cf8e9f7a..6fe3b6049b2917ed5d7b075d0ca2c7ae943c459f 100644
--- a/youtube_dl/extractor/motherless.py
+++ b/youtube_dl/extractor/motherless.py
@@ -23,7 +23,7 @@ class MotherlessIE(InfoExtractor):
              'categories': ['Gaming', 'anal', 'reluctant', 'rough', 'Wife'],
              'upload_date': '20100913',
              'uploader_id': 'famouslyfuckedup',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'age_limit': 18,
          }
      }, {
@@ -37,7 +37,7 @@ class MotherlessIE(InfoExtractor):
                             'game', 'hairy'],
              'upload_date': '20140622',
              'uploader_id': 'Sulivana7x',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'age_limit': 18,
          },
          'skip': '404',
@@ -51,7 +51,7 @@ class MotherlessIE(InfoExtractor):
              'categories': ['superheroine heroine  superher'],
              'upload_date': '20140827',
              'uploader_id': 'shade0230',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'age_limit': 18,
          }
      }, {
diff --git a/youtube_dl/extractor/movieclips.py b/youtube_dl/extractor/movieclips.py
index 30c206f9b61e22d3e029a68979643fc6ee7de635..5453da1acfe19a103d97a026b725b81e86d7859d 100644
--- a/youtube_dl/extractor/movieclips.py
+++ b/youtube_dl/extractor/movieclips.py
@@ -20,7 +20,7 @@ class MovieClipsIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Warcraft Trailer 1',
              'description': 'Watch Trailer 1 from Warcraft (2016). Legendary’s WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1446843055,
              'upload_date': '20151106',
              'uploader': 'Movieclips',
diff --git a/youtube_dl/extractor/moviezine.py b/youtube_dl/extractor/moviezine.py
index 478e3996743d1eca8434a786b58c4bd799a7dc55..85cc6e22f59cd5af0e194dee675beb3bc69a9369 100644
--- a/youtube_dl/extractor/moviezine.py
+++ b/youtube_dl/extractor/moviezine.py
@@ -16,7 +16,7 @@ class MoviezineIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Oculus - Trailer 1',
              'description': 'md5:40cc6790fc81d931850ca9249b40e8a4',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }
  
diff --git a/youtube_dl/extractor/movingimage.py b/youtube_dl/extractor/movingimage.py
index bb789c32edb45e78e9806faaae169af09826135e..4f62d628a24dbf65db894dce2dc24e56fde7403a 100644
--- a/youtube_dl/extractor/movingimage.py
+++ b/youtube_dl/extractor/movingimage.py
@@ -18,7 +18,7 @@ class MovingImageIE(InfoExtractor):
              'title': 'SHETLAND WOOL',
              'description': 'md5:c5afca6871ad59b4271e7704fe50ab04',
              'duration': 900,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/msn.py b/youtube_dl/extractor/msn.py
index d75ce8b3b510b68ca0dfe754d8fcf1741e6cbd9d..1473bcf4845d4b8470393686a7bd9baf1df0e398 100644
--- a/youtube_dl/extractor/msn.py
+++ b/youtube_dl/extractor/msn.py
@@ -78,11 +78,6 @@ def _real_extract(self, url):
                  m3u8_formats = self._extract_m3u8_formats(
                      format_url, display_id, 'mp4',
                      m3u8_id='hls', fatal=False)
-                # Despite metadata in m3u8 all video+audio formats are
-                # actually video-only (no audio)
-                for f in m3u8_formats:
-                    if f.get('acodec') != 'none' and f.get('vcodec') != 'none':
-                        f['acodec'] = 'none'
                  formats.extend(m3u8_formats)
              else:
                  formats.append({
diff --git a/youtube_dl/extractor/mtv.py b/youtube_dl/extractor/mtv.py
index 74a3a035e771803154b6685fcd4cbfd3dbb20a9b..8acea1461a662dc40840526c4efabcbe7a7c29b0 100644
--- a/youtube_dl/extractor/mtv.py
+++ b/youtube_dl/extractor/mtv.py
@@ -17,6 +17,7 @@
      sanitized_Request,
      strip_or_none,
      timeconvert,
+    try_get,
      unescapeHTML,
      update_url_query,
      url_basename,
@@ -41,15 +42,6 @@ def _remove_template_parameter(url):
          # Remove the templates, like &device={device}
          return re.sub(r'&[^=]*?={.*?}(?=(&|$))', '', url)
  
-    # This was originally implemented for ComedyCentral, but it also works here
-    @classmethod
-    def _transform_rtmp_url(cls, rtmp_video_url):
-        m = re.match(r'^rtmpe?://.*?/(?P<finalid>gsp\..+?/.*)$', rtmp_video_url)
-        if not m:
-            return {'rtmp': rtmp_video_url}
-        base = 'http://viacommtvstrmfs.fplive.net/'
-        return {'http': base + m.group('finalid')}
-
      def _get_feed_url(self, uri):
          return self._FEED_URL
  
@@ -76,7 +68,7 @@ def _extract_mobile_video_formats(self, mtvn_id):
          url = re.sub(r'.+pxE=mp4', 'http://mtvnmobile.vo.llnwd.net/kip0/_pxn=0+_pxK=18639+_pxE=mp4', url, 1)
          return [{'url': url, 'ext': 'mp4'}]
  
-    def _extract_video_formats(self, mdoc, mtvn_id):
+    def _extract_video_formats(self, mdoc, mtvn_id, video_id):
          if re.match(r'.*/(error_country_block\.swf|geoblock\.mp4|copyright_error\.flv(?:\?geo\b.+?)?)$', mdoc.find('.//src').text) is not None:
              if mtvn_id is not None and self._MOBILE_TEMPLATE is not None:
                  self.to_screen('The normal version is not available from your '
@@ -87,21 +79,33 @@ def _extract_video_formats(self, mdoc, mtvn_id):
  
          formats = []
          for rendition in mdoc.findall('.//rendition'):
-            try:
-                _, _, ext = rendition.attrib['type'].partition('/')
-                rtmp_video_url = rendition.find('./src').text
-                if rtmp_video_url.endswith('siteunavail.png'):
-                    continue
-                new_urls = self._transform_rtmp_url(rtmp_video_url)
-                formats.extend([{
-                    'ext': 'flv' if new_url.startswith('rtmp') else ext,
-                    'url': new_url,
-                    'format_id': '-'.join(filter(None, [kind, rendition.get('bitrate')])),
-                    'width': int(rendition.get('width')),
-                    'height': int(rendition.get('height')),
-                } for kind, new_url in new_urls.items()])
-            except (KeyError, TypeError):
-                raise ExtractorError('Invalid rendition field.')
+            if rendition.get('method') == 'hls':
+                hls_url = rendition.find('./src').text
+                formats.extend(self._extract_m3u8_formats(
+                    hls_url, video_id, ext='mp4', entry_protocol='m3u8_native',
+                    m3u8_id='hls'))
+            else:
+                # fms
+                try:
+                    _, _, ext = rendition.attrib['type'].partition('/')
+                    rtmp_video_url = rendition.find('./src').text
+                    if 'error_not_available.swf' in rtmp_video_url:
+                        raise ExtractorError(
+                            '%s said: video is not available' % self.IE_NAME,
+                            expected=True)
+                    if rtmp_video_url.endswith('siteunavail.png'):
+                        continue
+                    formats.extend([{
+                        'ext': 'flv' if rtmp_video_url.startswith('rtmp') else ext,
+                        'url': rtmp_video_url,
+                        'format_id': '-'.join(filter(None, [
+                            'rtmp' if rtmp_video_url.startswith('rtmp') else None,
+                            rendition.get('bitrate')])),
+                        'width': int(rendition.get('width')),
+                        'height': int(rendition.get('height')),
+                    }])
+                except (KeyError, TypeError):
+                    raise ExtractorError('Invalid rendition field.')
          self._sort_formats(formats)
          return formats
  
@@ -117,15 +121,17 @@ def _extract_subtitles(self, mdoc, mtvn_id):
              } for typographic in transcript.findall('./typographic')]
          return subtitles
  
-    def _get_video_info(self, itemdoc):
+    def _get_video_info(self, itemdoc, use_hls=True):
          uri = itemdoc.find('guid').text
          video_id = self._id_from_uri(uri)
          self.report_extraction(video_id)
          content_el = itemdoc.find('%s/%s' % (_media_xml_tag('group'), _media_xml_tag('content')))
          mediagen_url = self._remove_template_parameter(content_el.attrib['url'])
+        mediagen_url = mediagen_url.replace('device={device}', '')
          if 'acceptMethods' not in mediagen_url:
              mediagen_url += '&' if '?' in mediagen_url else '?'
-            mediagen_url += 'acceptMethods=fms'
+            mediagen_url += 'acceptMethods='
+            mediagen_url += 'hls' if use_hls else 'fms'
  
          mediagen_doc = self._download_xml(mediagen_url, video_id,
                                            'Downloading video urls')
@@ -166,9 +172,11 @@ def _get_video_info(self, itemdoc):
          if mtvn_id_node is not None:
              mtvn_id = mtvn_id_node.text
  
+        formats = self._extract_video_formats(mediagen_doc, mtvn_id, video_id)
+
          return {
              'title': title,
-            'formats': self._extract_video_formats(mediagen_doc, mtvn_id),
+            'formats': formats,
              'subtitles': self._extract_subtitles(mediagen_doc, mtvn_id),
              'id': video_id,
              'thumbnail': self._get_thumbnail_url(uri, itemdoc),
@@ -183,13 +191,13 @@ def _get_feed_query(self, uri):
              data['lang'] = self._LANG
          return data
  
-    def _get_videos_info(self, uri):
+    def _get_videos_info(self, uri, use_hls=True):
          video_id = self._id_from_uri(uri)
          feed_url = self._get_feed_url(uri)
          info_url = update_url_query(feed_url, self._get_feed_query(uri))
-        return self._get_videos_info_from_url(info_url, video_id)
+        return self._get_videos_info_from_url(info_url, video_id, use_hls)
  
-    def _get_videos_info_from_url(self, url, video_id):
+    def _get_videos_info_from_url(self, url, video_id, use_hls=True):
          idoc = self._download_xml(
              url, video_id,
              'Downloading info', transform_source=fix_xml_ampersands)
@@ -198,9 +206,30 @@ def _get_videos_info_from_url(self, url, video_id):
          description = xpath_text(idoc, './channel/description')
  
          return self.playlist_result(
-            [self._get_video_info(item) for item in idoc.findall('.//item')],
+            [self._get_video_info(item, use_hls) for item in idoc.findall('.//item')],
              playlist_title=title, playlist_description=description)
  
+    def _extract_triforce_mgid(self, webpage, data_zone=None, video_id=None):
+        triforce_feed = self._parse_json(self._search_regex(
+            r'triforceManifestFeed\s*=\s*({.+?})\s*;\s*\n', webpage,
+            'triforce feed', default='{}'), video_id, fatal=False)
+
+        data_zone = self._search_regex(
+            r'data-zone=(["\'])(?P<zone>.+?_lc_promo.*?)\1', webpage,
+            'data zone', default=data_zone, group='zone')
+
+        feed_url = try_get(
+            triforce_feed, lambda x: x['manifest']['zones'][data_zone]['feed'],
+            compat_str)
+        if not feed_url:
+            return
+
+        feed = self._download_json(feed_url, video_id, fatal=False)
+        if not feed:
+            return
+
+        return try_get(feed, lambda x: x['result']['data']['id'], compat_str)
+
      def _extract_mgid(self, webpage):
          try:
              # the url can be http://media.mtvnservices.com/fb/{mgid}.swf
@@ -221,7 +250,11 @@ def _extract_mgid(self, webpage):
              sm4_embed = self._html_search_meta(
                  'sm4:video:embed', webpage, 'sm4 embed', default='')
              mgid = self._search_regex(
-                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid')
+                r'embed/(mgid:.+?)["\'&?/]', sm4_embed, 'mgid', default=None)
+
+        if not mgid:
+            mgid = self._extract_triforce_mgid(webpage)
+
          return mgid
  
      def _real_extract(self, url):
@@ -271,7 +304,7 @@ def _real_extract(self, url):
  
  class MTVIE(MTVServicesInfoExtractor):
      IE_NAME = 'mtv'
-    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|full-episodes)/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:www\.)?mtv\.com/(?:video-clips|(?:full-)?episodes)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://www.mtv.com/feeds/mrss/'
  
      _TESTS = [{
@@ -288,9 +321,41 @@ class MTVIE(MTVServicesInfoExtractor):
      }, {
          'url': 'http://www.mtv.com/full-episodes/94tujl/unlocking-the-truth-gates-of-hell-season-1-ep-101',
          'only_matching': True,
+    }, {
+        'url': 'http://www.mtv.com/episodes/g8xu7q/teen-mom-2-breaking-the-wall-season-7-ep-713',
+        'only_matching': True,
      }]
  
  
+class MTV81IE(InfoExtractor):
+    IE_NAME = 'mtv81'
+    _VALID_URL = r'https?://(?:www\.)?mtv81\.com/videos/(?P<id>[^/?#.]+)'
+
+    _TEST = {
+        'url': 'http://www.mtv81.com/videos/artist-to-watch/the-godfather-of-japanese-hip-hop-segment-1/',
+        'md5': '1edbcdf1e7628e414a8c5dcebca3d32b',
+        'info_dict': {
+            'id': '5e14040d-18a4-47c4-a582-43ff602de88e',
+            'ext': 'mp4',
+            'title': 'Unlocking The Truth|July 18, 2016|1|101|Trailer',
+            'description': '"Unlocking the Truth" premieres August 17th at 11/10c.',
+            'timestamp': 1468846800,
+            'upload_date': '20160718',
+        },
+    }
+
+    def _extract_mgid(self, webpage):
+        return self._search_regex(
+            r'getTheVideo\((["\'])(?P<id>mgid:.+?)\1', webpage,
+            'mgid', group='id')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        mgid = self._extract_mgid(webpage)
+        return self.url_result('http://media.mtvnservices.com/embed/%s' % mgid)
+
+
  class MTVVideoIE(MTVServicesInfoExtractor):
      IE_NAME = 'mtv:video'
      _VALID_URL = r'''(?x)^https?://
diff --git a/youtube_dl/extractor/muenchentv.py b/youtube_dl/extractor/muenchentv.py
index d9f17613633d245283f5f5745acca2feb273cbf5..2cc2bf229b3bee21fa5b79e40af2a666cd4ccea1 100644
--- a/youtube_dl/extractor/muenchentv.py
+++ b/youtube_dl/extractor/muenchentv.py
@@ -22,7 +22,7 @@ class MuenchenTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 're:^münchen.tv-Livestream [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'is_live': True,
-            'thumbnail': 're:^https?://.*\.jpg$'
+            'thumbnail': r're:^https?://.*\.jpg$'
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/mwave.py b/youtube_dl/extractor/mwave.py
index fea1caf478b2a862ae3a028b4a80041b734a5e1b..a67276596f0cf148d2f944a1f1375831a18eca4a 100644
--- a/youtube_dl/extractor/mwave.py
+++ b/youtube_dl/extractor/mwave.py
@@ -18,7 +18,7 @@ class MwaveIE(InfoExtractor):
              'id': '168859',
              'ext': 'flv',
              'title': '[M COUNTDOWN] SISTAR - SHAKE IT',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'M COUNTDOWN',
              'duration': 206,
              'view_count': int,
@@ -70,7 +70,7 @@ class MwaveMeetGreetIE(InfoExtractor):
              'id': '173294',
              'ext': 'flv',
              'title': '[MEET&GREET] Park BoRam',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Mwave',
              'duration': 3634,
              'view_count': int,
diff --git a/youtube_dl/extractor/myvi.py b/youtube_dl/extractor/myvi.py
index 4c65be122fd536c6e39fd9f10052e8933e96417d..621ae74a7930cbaefb1f5c867de27d70499fe5d6 100644
--- a/youtube_dl/extractor/myvi.py
+++ b/youtube_dl/extractor/myvi.py
@@ -27,7 +27,7 @@ class MyviIE(SprutoBaseIE):
              'id': 'f16b2bbd-cde8-481c-a981-7cd48605df43',
              'ext': 'mp4',
              'title': 'хозяин жизни',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 25,
          },
      }, {
diff --git a/youtube_dl/extractor/myvideo.py b/youtube_dl/extractor/myvideo.py
index 6d447a4935e49cd3c4f7525fff6ffe5e9883656e..6bb64eb63c52018c34650a6361e7d70cad2459e0 100644
--- a/youtube_dl/extractor/myvideo.py
+++ b/youtube_dl/extractor/myvideo.py
@@ -160,7 +160,7 @@ def _real_extract(self, url):
          else:
              video_playpath = ''
  
-        video_swfobj = self._search_regex('swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj')
+        video_swfobj = self._search_regex(r'swfobject.embedSWF\(\'(.+?)\'', webpage, 'swfobj')
          video_swfobj = compat_urllib_parse_unquote(video_swfobj)
  
          video_title = self._html_search_regex("<h1(?: class='globalHd')?>(.*?)</h1>",
diff --git a/youtube_dl/extractor/naver.py b/youtube_dl/extractor/naver.py
index 055070ff54fd8990c2e58ab1d6df037b19f3a029..e8131333f8458505b7a323f378c5fee848414934 100644
--- a/youtube_dl/extractor/naver.py
+++ b/youtube_dl/extractor/naver.py
@@ -12,10 +12,10 @@
  
  
  class NaverIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:m\.)?tvcast\.naver\.com/v/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:m\.)?tv(?:cast)?\.naver\.com/v/(?P<id>\d+)'
  
      _TESTS = [{
-        'url': 'http://tvcast.naver.com/v/81652',
+        'url': 'http://tv.naver.com/v/81652',
          'info_dict': {
              'id': '81652',
              'ext': 'mp4',
@@ -24,7 +24,7 @@ class NaverIE(InfoExtractor):
              'upload_date': '20130903',
          },
      }, {
-        'url': 'http://tvcast.naver.com/v/395837',
+        'url': 'http://tv.naver.com/v/395837',
          'md5': '638ed4c12012c458fefcddfd01f173cd',
          'info_dict': {
              'id': '395837',
@@ -34,6 +34,9 @@ class NaverIE(InfoExtractor):
              'upload_date': '20150519',
          },
          'skip': 'Georestricted',
+    }, {
+        'url': 'http://tvcast.naver.com/v/81652',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/nbc.py b/youtube_dl/extractor/nbc.py
index 7f1bd9229303ec0390c9d10937374a0cc986790b..434a94de49b9f1623385f20386feaa1b34da75fe 100644
--- a/youtube_dl/extractor/nbc.py
+++ b/youtube_dl/extractor/nbc.py
@@ -9,6 +9,7 @@
      lowercase_escape,
      smuggle_url,
      unescapeHTML,
+    update_url_query,
  )
  
  
@@ -208,7 +209,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.nbcnews.com/watch/nbcnews-com/how-twitter-reacted-to-the-snowden-interview-269389891880',
              'md5': 'af1adfa51312291a017720403826bb64',
              'info_dict': {
-                'id': '269389891880',
+                'id': 'p_tweet_snow_140529',
                  'ext': 'mp4',
                  'title': 'How Twitter Reacted To The Snowden Interview',
                  'description': 'md5:65a0bd5d76fe114f3c2727aa3a81fe64',
@@ -232,7 +233,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.nbcnews.com/nightly-news/video/nightly-news-with-brian-williams-full-broadcast-february-4-394064451844',
              'md5': '73135a2e0ef819107bbb55a5a9b2a802',
              'info_dict': {
-                'id': '394064451844',
+                'id': 'nn_netcast_150204',
                  'ext': 'mp4',
                  'title': 'Nightly News with Brian Williams Full Broadcast (February 4)',
                  'description': 'md5:1c10c1eccbe84a26e5debb4381e2d3c5',
@@ -245,7 +246,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.nbcnews.com/business/autos/volkswagen-11-million-vehicles-could-have-suspect-software-emissions-scandal-n431456',
              'md5': 'a49e173825e5fcd15c13fc297fced39d',
              'info_dict': {
-                'id': '529953347624',
+                'id': 'x_lon_vwhorn_150922',
                  'ext': 'mp4',
                  'title': 'Volkswagen U.S. Chief:\xa0 We Have Totally Screwed Up',
                  'description': 'md5:c8be487b2d80ff0594c005add88d8351',
@@ -258,7 +259,7 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.today.com/video/see-the-aurora-borealis-from-space-in-stunning-new-nasa-video-669831235788',
              'md5': '118d7ca3f0bea6534f119c68ef539f71',
              'info_dict': {
-                'id': '669831235788',
+                'id': 'tdy_al_space_160420',
                  'ext': 'mp4',
                  'title': 'See the aurora borealis from space in stunning new NASA video',
                  'description': 'md5:74752b7358afb99939c5f8bb2d1d04b1',
@@ -271,15 +272,14 @@ class NBCNewsIE(ThePlatformIE):
              'url': 'http://www.msnbc.com/all-in-with-chris-hayes/watch/the-chaotic-gop-immigration-vote-314487875924',
              'md5': '6d236bf4f3dddc226633ce6e2c3f814d',
              'info_dict': {
-                'id': '314487875924',
+                'id': 'n_hayes_Aimm_140801_272214',
                  'ext': 'mp4',
                  'title': 'The chaotic GOP immigration vote',
                  'description': 'The Republican House votes on a border bill that has no chance of getting through the Senate or signed by the President and is drawing criticism from all sides.',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': 1406937606,
                  'upload_date': '20140802',
                  'uploader': 'NBCU-NEWS',
-                'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
              },
          },
          {
@@ -311,28 +311,41 @@ def _real_extract(self, url):
          else:
              # "feature" and "nightly-news" pages use theplatform.com
              video_id = mobj.group('mpx_id')
-            if not video_id.isdigit():
-                webpage = self._download_webpage(url, video_id)
-                info = None
-                bootstrap_json = self._search_regex(
-                    [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
-                     r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"'],
-                    webpage, 'bootstrap json', default=None)
+            webpage = self._download_webpage(url, video_id)
+
+            filter_param = 'byId'
+            bootstrap_json = self._search_regex(
+                [r'(?m)(?:var\s+(?:bootstrapJson|playlistData)|NEWS\.videoObj)\s*=\s*({.+});?\s*$',
+                 r'videoObj\s*:\s*({.+})', r'data-video="([^"]+)"',
+                 r'jQuery\.extend\(Drupal\.settings\s*,\s*({.+?})\);'],
+                webpage, 'bootstrap json', default=None)
+            if bootstrap_json:
                  bootstrap = self._parse_json(
                      bootstrap_json, video_id, transform_source=unescapeHTML)
+
+                info = None
                  if 'results' in bootstrap:
                      info = bootstrap['results'][0]['video']
                  elif 'video' in bootstrap:
                      info = bootstrap['video']
+                elif 'msnbcVideoInfo' in bootstrap:
+                    info = bootstrap['msnbcVideoInfo']['meta']
+                elif 'msnbcThePlatform' in bootstrap:
+                    info = bootstrap['msnbcThePlatform']['videoPlayer']['video']
                  else:
                      info = bootstrap
-                video_id = info['mpxId']
+
+                if 'guid' in info:
+                    video_id = info['guid']
+                    filter_param = 'byGuid'
+                elif 'mpxId' in info:
+                    video_id = info['mpxId']
  
              return {
                  '_type': 'url_transparent',
                  'id': video_id,
                  # http://feed.theplatform.com/f/2E2eJC/nbcnews also works
-                'url': 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews?byId=%s' % video_id,
+                'url': update_url_query('http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews', {filter_param: video_id}),
                  'ie_key': 'ThePlatformFeed',
              }
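
Review note on the nbc.py change: the feed lookup now prefers the `guid` field (queried with `byGuid`) and falls back to `mpxId` (queried with `byId`), which matches the new non-numeric ids in the tests. The sketch below reproduces only that selection logic. `pick_filter` and `build_feed_url` are hypothetical names, and `urlencode` from the standard library stands in for youtube-dl's `update_url_query`, on the assumption that the base feed URL carries no query string of its own.

```python
from urllib.parse import urlencode

FEED_URL = 'http://feed.theplatform.com/f/2E2eJC/nnd_NBCNews'


def pick_filter(info, fallback_id):
    """Choose the feed filter the way the diff does: guid first, then mpxId."""
    if 'guid' in info:
        return 'byGuid', info['guid']
    if 'mpxId' in info:
        return 'byId', info['mpxId']
    # Neither key present: keep the id taken from the page URL.
    return 'byId', fallback_id


def build_feed_url(info, fallback_id):
    """Build the ThePlatform feed URL for the selected filter (illustrative)."""
    filter_param, video_id = pick_filter(info, fallback_id)
    # Stand-in for update_url_query: FEED_URL has no existing query to merge.
    return FEED_URL + '?' + urlencode({filter_param: video_id})


if __name__ == '__main__':
    print(build_feed_url({'guid': 'tdy_al_space_160420'}, '669831235788'))
    print(build_feed_url({'mpxId': '669831235788'}, '669831235788'))
```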
  
diff --git a/youtube_dl/extractor/ndr.py b/youtube_dl/extractor/ndr.py

index e3b0da2e966eb9486ab5307a933c51d74f2a14ba..07528d140f38bfa68a0d04cb85978d1017bae547 100644 (file)
--- a/youtube_dl/extractor/ndr.py
+++ b/youtube_dl/extractor/ndr.py
@@ -302,7 +302,7 @@ class NDREmbedIE(NDREmbedBaseIE):
          'info_dict': {
              'id': 'livestream217',
              'ext': 'flv',
-            'title': 're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
              'upload_date': '20150910',
          },
@@ -367,7 +367,7 @@ class NJoyEmbedIE(NDREmbedBaseIE):
          'info_dict': {
              'id': 'webradioweltweit100',
              'ext': 'mp3',
-            'title': 're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
+            'title': r're:^N-JOY Weltweit \d{4}-\d{2}-\d{2} \d{2}:\d{2}$',
              'is_live': True,
              'uploader': 'njoy',
              'upload_date': '20150810',
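
Review note on the recurring `'re:…'` to `r're:…'` edits in this commit: the string value is unchanged at runtime, since `\d` and `\.` are not recognised escapes, but a non-raw literal containing them raises a `DeprecationWarning` on Python 3.6 and later. That is presumably the motivation. The sketch below assumes the test suite treats a value starting with `re:` as a regular expression to match against the extracted field; `matches_expected` is a hypothetical stand-in for that check, not the project's own helper.

```python
import re

# Expected value copied from the NDREmbedIE test in this diff.
EXPECTED_TITLE = r're:^NDR Fernsehen Niedersachsen \d{4}-\d{2}-\d{2} \d{2}:\d{2}$'


def matches_expected(expected, actual):
    """Compare a field against an expectation, honouring the 're:' prefix.

    Hypothetical helper: values starting with 're:' are matched as regular
    expressions, everything else is compared for equality.
    """
    if expected.startswith('re:'):
        return re.match(expected[len('re:'):], actual) is not None
    return expected == actual


if __name__ == '__main__':
    # The raw literal and the explicitly escaped literal are the same string.
    print(r'\d{4}' == '\\d{4}')
    print(matches_expected(EXPECTED_TITLE, 'NDR Fernsehen Niedersachsen 2015-09-10 12:30'))
```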
diff --git a/youtube_dl/extractor/ndtv.py b/youtube_dl/extractor/ndtv.py

index 96528f6499d1e02c5208e61fe8abd1f606b29392..255f608783edad0aa3838de028dd8ea07d9ae1b0 100644 (file)
--- a/youtube_dl/extractor/ndtv.py
+++ b/youtube_dl/extractor/ndtv.py
@@ -21,7 +21,7 @@ class NDTVIE(InfoExtractor):
              'description': 'md5:ab2d4b4a6056c5cb4caa6d729deabf02',
              'upload_date': '20131208',
              'duration': 1327,
-            'thumbnail': 're:https?://.*\.jpg',
+            'thumbnail': r're:https?://.*\.jpg',
          },
      }
  
diff --git a/youtube_dl/extractor/netzkino.py b/youtube_dl/extractor/netzkino.py

index 0d165a82ad53ac8ac16ca8943c934db9fb28b720..aec3026b12755e38d3ace0e8978a72409dcc562c 100644 (file)
--- a/youtube_dl/extractor/netzkino.py
+++ b/youtube_dl/extractor/netzkino.py
@@ -25,7 +25,7 @@ class NetzkinoIE(InfoExtractor):
              'comments': 'mincount:3',
              'description': 'md5:1eddeacc7e62d5a25a2d1a7290c64a28',
              'upload_date': '20120813',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'timestamp': 1344858571,
              'age_limit': 12,
          },
diff --git a/youtube_dl/extractor/nextmedia.py b/youtube_dl/extractor/nextmedia.py

index dee9056d39e9bb0076d390054006c6dd4246afae..680f03aad4b318a70806555ac14d57a4bdfd05e0 100644 (file)
--- a/youtube_dl/extractor/nextmedia.py
+++ b/youtube_dl/extractor/nextmedia.py
@@ -2,7 +2,15 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from ..utils import parse_iso8601
+from ..compat import compat_urlparse
+from ..utils import (
+    clean_html,
+    get_element_by_class,
+    int_or_none,
+    parse_iso8601,
+    remove_start,
+    unified_timestamp,
+)
  
  
  class NextMediaIE(InfoExtractor):
@@ -15,7 +23,7 @@ class NextMediaIE(InfoExtractor):
              'id': '53109199',
              'ext': 'mp4',
              'title': '【佔領金鐘】50外國領事議員撐場 讚學生勇敢香港有希望',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:28222b9912b6665a21011b034c70fcc7',
              'timestamp': 1415456273,
              'upload_date': '20141108',
@@ -30,6 +38,12 @@ def _real_extract(self, url):
          return self._extract_from_nextmedia_page(news_id, url, page)
  
      def _extract_from_nextmedia_page(self, news_id, url, page):
+        redirection_url = self._search_regex(
+            r'window\.location\.href\s*=\s*([\'"])(?P<url>(?!\1).+)\1',
+            page, 'redirection URL', default=None, group='url')
+        if redirection_url:
+            return self.url_result(compat_urlparse.urljoin(url, redirection_url))
+
          title = self._fetch_title(page)
          video_url = self._search_regex(self._URL_PATTERN, page, 'video url')
  
@@ -76,7 +90,7 @@ class NextMediaActionNewsIE(NextMediaIE):
              'id': '19009428',
              'ext': 'mp4',
              'title': '【壹週刊】細10年男友偷食　50歲邵美琪再失戀',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:cd802fad1f40fd9ea178c1e2af02d659',
              'timestamp': 1421791200,
              'upload_date': '20150120',
@@ -93,7 +107,7 @@ def _real_extract(self, url):
  
  class AppleDailyIE(NextMediaIE):
      IE_DESC = '臺灣蘋果日報'
-    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/(?:animation|appledaily|enews|realtimenews|actionnews)/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
+    _VALID_URL = r'https?://(www|ent)\.appledaily\.com\.tw/[^/]+/[^/]+/[^/]+/(?P<date>\d+)/(?P<id>\d+)(/.*)?'
      _TESTS = [{
          'url': 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694',
          'md5': 'a843ab23d150977cc55ef94f1e2c1e4d',
@@ -101,7 +115,7 @@ class AppleDailyIE(NextMediaIE):
              'id': '36354694',
              'ext': 'mp4',
              'title': '周亭羽走過摩鐵陰霾2男陪吃 九把刀孤寒看醫生',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:2acd430e59956dc47cd7f67cb3c003f4',
              'upload_date': '20150128',
          }
@@ -112,7 +126,7 @@ class AppleDailyIE(NextMediaIE):
              'id': '550549',
              'ext': 'mp4',
              'title': '不滿被踩腳　山東兩大媽一路打下車',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:175b4260c1d7c085993474217e4ab1b4',
              'upload_date': '20150128',
          }
@@ -123,7 +137,7 @@ class AppleDailyIE(NextMediaIE):
              'id': '5003671',
              'ext': 'mp4',
              'title': '20正妹熱舞　《刀龍傳說Online》火辣上市',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:23c0aac567dc08c9c16a3161a2c2e3cd',
              'upload_date': '20150128',
          },
@@ -150,13 +164,17 @@ class AppleDailyIE(NextMediaIE):
              'id': '35770334',
              'ext': 'mp4',
              'title': '咖啡占卜測 XU裝熟指數',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'md5:7b859991a6a4fedbdf3dd3b66545c748',
              'upload_date': '20140417',
          },
      }, {
          'url': 'http://www.appledaily.com.tw/actionnews/appledaily/7/20161003/960588/',
          'only_matching': True,
+    }, {
+        # Redirected from http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694
+        'url': 'http://ent.appledaily.com.tw/section/article/headline/20150128/36354694',
+        'only_matching': True,
      }]
  
      _URL_PATTERN = r'\{url: \'(.+)\'\}'
@@ -173,3 +191,48 @@ def _fetch_timestamp(self, page):
  
      def _fetch_description(self, page):
          return self._html_search_meta('description', page, 'news description')
+
+
+class NextTVIE(InfoExtractor):
+    IE_DESC = '壹電視'
+    _VALID_URL = r'https?://(?:www\.)?nexttv\.com\.tw/(?:[^/]+/)+(?P<id>\d+)'
+
+    _TEST = {
+        'url': 'http://www.nexttv.com.tw/news/realtime/politics/11779671',
+        'info_dict': {
+            'id': '11779671',
+            'ext': 'mp4',
+            'title': '「超收稅」近4千億！　藍議員籲發消費券',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1484825400,
+            'upload_date': '20170119',
+            'view_count': int,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+
+        title = self._html_search_regex(
+            r'<h1[^>]*>([^<]+)</h1>', webpage, 'title')
+
+        data = self._hidden_inputs(webpage)
+
+        video_url = data['ntt-vod-src-detailview']
+
+        date_str = get_element_by_class('date', webpage)
+        timestamp = unified_timestamp(date_str + '+0800') if date_str else None
+
+        view_count = int_or_none(remove_start(
+            clean_html(get_element_by_class('click', webpage)), '點閱：'))
+
+        return {
+            'id': video_id,
+            'title': title,
+            'url': video_url,
+            'thumbnail': data.get('ntt-vod-img-src'),
+            'timestamp': timestamp,
+            'view_count': view_count,
+        }
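
Review note on the redirect handling added to `_extract_from_nextmedia_page`: the page is scanned for a JavaScript `window.location.href = '…'` assignment, and the captured target is resolved against the current URL before being handed back for re-extraction. A standalone sketch follows. The sample page body is an assumption made for illustration, `find_redirect` is a hypothetical name, and `urllib.parse.urljoin` is used where the extractor uses `compat_urlparse.urljoin`.

```python
import re
from urllib.parse import urljoin

# Pattern copied from the diff: a quoted value whose closing quote must be
# the same character as the opening one (backreference \1).
REDIRECT_RE = r'window\.location\.href\s*=\s*([\'"])(?P<url>(?!\1).+)\1'


def find_redirect(page_url, page):
    """Return the absolute redirect target found in page, or None."""
    mobj = re.search(REDIRECT_RE, page)
    if not mobj:
        return None
    # Relative targets are resolved against the page that was fetched.
    return urljoin(page_url, mobj.group('url'))


if __name__ == '__main__':
    old_url = 'http://ent.appledaily.com.tw/enews/article/entertainment/20150128/36354694'
    # Assumed page content, not captured from the real site.
    sample = "<script>window.location.href = '/section/article/headline/20150128/36354694';</script>"
    print(find_redirect(old_url, sample))
```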
diff --git a/youtube_dl/extractor/nfl.py b/youtube_dl/extractor/nfl.py

index 3930d16f16e4d295e9afeb84f88eb36dc7ffc30b..460deb162df7994caa389b1f37c4174cec3fbf78 100644 (file)
--- a/youtube_dl/extractor/nfl.py
+++ b/youtube_dl/extractor/nfl.py
@@ -72,7 +72,7 @@ class NFLIE(InfoExtractor):
              'description': 'md5:56323bfb0ac4ee5ab24bd05fdf3bf478',
              'upload_date': '20140921',
              'timestamp': 1411337580,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://prod.www.steelers.clubs.nfl.com/video-and-audio/videos/LIVE_Post_Game_vs_Browns/9d72f26a-9e2b-4718-84d3-09fb4046c266',
@@ -84,7 +84,7 @@ class NFLIE(InfoExtractor):
              'description': 'md5:6a97f7e5ebeb4c0e69a418a89e0636e8',
              'upload_date': '20131229',
              'timestamp': 1388354455,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://www.nfl.com/news/story/0ap3000000467586/article/patriots-seahawks-involved-in-lategame-skirmish',
diff --git a/youtube_dl/extractor/nick.py b/youtube_dl/extractor/nick.py

index 7672845bfd0c6ebbc08ef326f024f4a02bb44a71..08a75929e1e049249759f94e6179cfa8932ba87c 100644 (file)
--- a/youtube_dl/extractor/nick.py
+++ b/youtube_dl/extractor/nick.py
@@ -10,7 +10,7 @@
  class NickIE(MTVServicesInfoExtractor):
      # None of videos on the website are still alive?
      IE_NAME = 'nick.com'
-    _VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
+    _VALID_URL = r'https?://(?:(?:www|beta)\.)?nick(?:jr)?\.com/(?:[^/]+/)?(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
      _FEED_URL = 'http://udat.mtvnservices.com/service1/dispatch.htm'
      _TESTS = [{
          'url': 'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
@@ -57,6 +57,9 @@ class NickIE(MTVServicesInfoExtractor):
      }, {
          'url': 'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
          'only_matching': True,
+    }, {
+        'url': 'http://beta.nick.com/nicky-ricky-dicky-and-dawn/videos/nicky-ricky-dicky-dawn-301-full-episode/',
+        'only_matching': True,
      }]
  
      def _get_feed_query(self, uri):
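
Review note on the nick.py change: the widened `_VALID_URL` adds the `beta.` host prefix and an optional extra path segment. A quick way to sanity-check both patterns against the test URLs in this hunk is sketched below; `extract_id` is a hypothetical helper and both patterns are copied verbatim from the diff.

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?nick(?:jr)?\.com/(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'
NEW_VALID_URL = r'https?://(?:(?:www|beta)\.)?nick(?:jr)?\.com/(?:[^/]+/)?(?:videos/clip|[^/]+/videos)/(?P<id>[^/?#.]+)'


def extract_id(pattern, url):
    """Return the display id captured by pattern, or None (illustrative)."""
    mobj = re.match(pattern, url)
    return mobj.group('id') if mobj else None


if __name__ == '__main__':
    urls = (
        'http://www.nick.com/videos/clip/alvinnn-and-the-chipmunks-112-full-episode.html',
        'http://www.nickjr.com/paw-patrol/videos/pups-save-a-goldrush-s3-ep302-full-episode/',
        'http://beta.nick.com/nicky-ricky-dicky-and-dawn/videos/nicky-ricky-dicky-dawn-301-full-episode/',
    )
    for url in urls:
        print(extract_id(OLD_VALID_URL, url), '|', extract_id(NEW_VALID_URL, url))
```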
diff --git a/youtube_dl/extractor/niconico.py b/youtube_dl/extractor/niconico.py

index a104e33f8bdea73540779e41db45d92c1249668a..8baac23e4b16643a59e6a83570862b6a5afd2b45 100644 (file)
--- a/youtube_dl/extractor/niconico.py
+++ b/youtube_dl/extractor/niconico.py
@@ -7,7 +7,6 @@
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_urllib_parse_urlencode,
      compat_urlparse,
  )
  from ..utils import (
@@ -40,6 +39,7 @@ class NiconicoIE(InfoExtractor):
              'description': '(c) copyright 2008, Blender Foundation / www.bigbuckbunny.org',
              'duration': 33,
          },
+        'skip': 'Requires an account',
      }, {
          # File downloaded with and without credentials are different, so omit
          # the md5 field
@@ -55,6 +55,7 @@ class NiconicoIE(InfoExtractor):
              'timestamp': 1304065916,
              'duration': 209,
          },
+        'skip': 'Requires an account',
      }, {
          # 'video exists but is marked as "deleted"
          # md5 is unstable
@@ -65,9 +66,10 @@ class NiconicoIE(InfoExtractor):
              'description': 'deleted',
              'title': 'ドラえもんエターナル第3話「決戦第3新東京市」＜前編＞',
              'upload_date': '20071224',
-            'timestamp': 1198527840,  # timestamp field has different value if logged in
+            'timestamp': int,  # timestamp field has different value if logged in
              'duration': 304,
          },
+        'skip': 'Requires an account',
      }, {
          'url': 'http://www.nicovideo.jp/watch/so22543406',
          'info_dict': {
@@ -79,13 +81,12 @@ class NiconicoIE(InfoExtractor):
              'upload_date': '20140104',
              'uploader': 'アニメロチャンネル',
              'uploader_id': '312',
-        }
+        },
+        'skip': 'The viewing period of the video you were searching for has expired.',
      }]
  
      _VALID_URL = r'https?://(?:www\.|secure\.)?nicovideo\.jp/watch/(?P<id>(?:[a-z]{2})?[0-9]+)'
      _NETRC_MACHINE = 'niconico'
-    # Determine whether the downloader used authentication to download video
-    _AUTHENTICATED = False
  
      def _real_initialize(self):
          self._login()
@@ -109,8 +110,6 @@ def _login(self):
          if re.search(r'(?i)<h1 class="mb8p4">Log in error</h1>', login_results) is not None:
              self._downloader.report_warning('unable to log in: bad username or password')
              return False
-        # Successful login
-        self._AUTHENTICATED = True
          return True
  
      def _real_extract(self, url):
@@ -128,35 +127,19 @@ def _real_extract(self, url):
              'http://ext.nicovideo.jp/api/getthumbinfo/' + video_id, video_id,
              note='Downloading video info page')
  
-        if self._AUTHENTICATED:
-            # Get flv info
-            flv_info_webpage = self._download_webpage(
-                'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
-                video_id, 'Downloading flv info')
-        else:
-            # Get external player info
-            ext_player_info = self._download_webpage(
-                'http://ext.nicovideo.jp/thumb_watch/' + video_id, video_id)
-            thumb_play_key = self._search_regex(
-                r'\'thumbPlayKey\'\s*:\s*\'(.*?)\'', ext_player_info, 'thumbPlayKey')
-
-            # Get flv info
-            flv_info_data = compat_urllib_parse_urlencode({
-                'k': thumb_play_key,
-                'v': video_id
-            })
-            flv_info_request = sanitized_Request(
-                'http://ext.nicovideo.jp/thumb_watch', flv_info_data,
-                {'Content-Type': 'application/x-www-form-urlencoded'})
-            flv_info_webpage = self._download_webpage(
-                flv_info_request, video_id,
-                note='Downloading flv info', errnote='Unable to download flv info')
+        # Get flv info
+        flv_info_webpage = self._download_webpage(
+            'http://flapi.nicovideo.jp/api/getflv/' + video_id + '?as3=1',
+            video_id, 'Downloading flv info')
  
          flv_info = compat_urlparse.parse_qs(flv_info_webpage)
          if 'url' not in flv_info:
              if 'deleted' in flv_info:
                  raise ExtractorError('The video has been deleted.',
                                       expected=True)
+            elif 'closed' in flv_info:
+                raise ExtractorError('Niconico videos now require logging in',
+                                     expected=True)
              else:
                  raise ExtractorError('Unable to find video URL')
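
Review note on the niconico.py change: with the anonymous `thumb_watch` path removed, every request goes through `getflv`, and the response is a URL-encoded key/value body that is checked for `url`, `deleted` and `closed` in that order. The sketch below reproduces that decision chain with the standard library. The sample bodies are assumptions, `flv_url_or_error` is a hypothetical name, and `ValueError` stands in for `ExtractorError`.

```python
from urllib.parse import parse_qs


def flv_url_or_error(flv_info_webpage):
    """Return the media URL from a getflv-style body or raise ValueError.

    Illustrative only: mirrors the order of checks in the diff.
    """
    flv_info = parse_qs(flv_info_webpage)
    if 'url' not in flv_info:
        if 'deleted' in flv_info:
            raise ValueError('The video has been deleted.')
        elif 'closed' in flv_info:
            raise ValueError('Niconico videos now require logging in')
        else:
            raise ValueError('Unable to find video URL')
    # parse_qs maps each key to a list of values; take the first one.
    return flv_info['url'][0]


if __name__ == '__main__':
    # Assumed response bodies, not captured from the real API.
    print(flv_url_or_error('url=http%3A%2F%2Fexample.invalid%2Fvideo.flv&ms=x'))
    try:
        flv_url_or_error('closed=1&done=true')
    except ValueError as err:
        print('error:', err)
```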
  
diff --git a/youtube_dl/extractor/normalboots.py b/youtube_dl/extractor/normalboots.py

index 6aa0895b82e5949657a62b009addc8e93885936e..61fe571dfea17a3fab3206eb6eea0b7a2cbb979b 100644 (file)
--- a/youtube_dl/extractor/normalboots.py
+++ b/youtube_dl/extractor/normalboots.py
@@ -2,7 +2,7 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
-from .screenwavemedia import ScreenwaveMediaIE
+from .jwplatform import JWPlatformIE
  
  from ..utils import (
      unified_strdate,
@@ -25,7 +25,7 @@ class NormalbootsIE(InfoExtractor):
              # m3u8 download
              'skip_download': True,
          },
-        'add_ie': ['ScreenwaveMedia'],
+        'add_ie': ['JWPlatform'],
      }
  
      def _real_extract(self, url):
@@ -39,15 +39,13 @@ def _real_extract(self, url):
              r'<span style="text-transform:uppercase; font-size:inherit;">[A-Za-z]+, (?P<date>.*)</span>',
              webpage, 'date', fatal=False))
  
-        screenwavemedia_url = self._html_search_regex(
-            ScreenwaveMediaIE.EMBED_PATTERN, webpage, 'screenwave URL',
-            group='url')
+        jwplatform_url = JWPlatformIE._extract_url(webpage)
  
          return {
              '_type': 'url_transparent',
              'id': video_id,
-            'url': screenwavemedia_url,
-            'ie_key': ScreenwaveMediaIE.ie_key(),
+            'url': jwplatform_url,
+            'ie_key': JWPlatformIE.ie_key(),
              'title': self._og_search_title(webpage),
              'description': self._og_search_description(webpage),
              'thumbnail': self._og_search_thumbnail(webpage),
diff --git a/youtube_dl/extractor/nosvideo.py b/youtube_dl/extractor/nosvideo.py

index eab816e4916bc2fae7d72cde598cb5b5f69bfde4..53c500c351690cf82872f3c724d6a5c60a6b5f93 100644 (file)
--- a/youtube_dl/extractor/nosvideo.py
+++ b/youtube_dl/extractor/nosvideo.py
@@ -17,7 +17,7 @@
  
  class NosVideoIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?nosvideo\.com/' + \
-                 '(?:embed/|\?v=)(?P<id>[A-Za-z0-9]{12})/?'
+                 r'(?:embed/|\?v=)(?P<id>[A-Za-z0-9]{12})/?'
      _PLAYLIST_URL = 'http://nosvideo.com/xml/{xml_id:s}.xml'
      _FILE_DELETED_REGEX = r'<b>File Not Found</b>'
      _TEST = {
@@ -27,7 +27,7 @@ class NosVideoIE(InfoExtractor):
              'id': 'mu8fle7g7rpq',
              'ext': 'mp4',
              'title': 'big_buck_bunny_480p_surround-fix.avi.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/nova.py b/youtube_dl/extractor/nova.py

index 103952345aa98ed186515452baf2f945409ffdaa..06cb8cb3f5a867104a8bd67abd9ce60bbc1df25f 100644 (file)
--- a/youtube_dl/extractor/nova.py
+++ b/youtube_dl/extractor/nova.py
@@ -21,7 +21,7 @@ class NovaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Duel: Michal Hrdlička a Petr Suchoň',
              'description': 'md5:d0cc509858eee1b1374111c588c6f5d5',
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
          },
          'params': {
              # rtmp download
@@ -36,7 +36,7 @@ class NovaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Podzemní nemocnice v pražské Krči',
              'description': 'md5:f0a42dd239c26f61c28f19e62d20ef53',
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
          }
      }, {
          'url': 'http://novaplus.nova.cz/porad/policie-modrava/video/5591-policie-modrava-15-dil-blondynka-na-hrbitove',
@@ -46,7 +46,7 @@ class NovaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Policie Modrava - 15. díl - Blondýnka na hřbitově',
              'description': 'md5:dc24e50be5908df83348e50d1431295e',  # Make sure this description is clean of html tags
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
          },
          'params': {
              # rtmp download
@@ -58,7 +58,7 @@ class NovaIE(InfoExtractor):
              'id': '1756858',
              'ext': 'flv',
              'title': 'Televizní noviny - 30. 5. 2015',
-            'thumbnail': 're:^https?://.*\.(?:jpg)',
+            'thumbnail': r're:^https?://.*\.(?:jpg)',
              'upload_date': '20150530',
          },
          'params': {
@@ -72,7 +72,7 @@ class NovaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Zaklínač 3: Divoký hon',
              'description': 're:.*Pokud se stejně jako my nemůžete.*',
-            'thumbnail': 're:https?://.*\.jpg(\?.*)?',
+            'thumbnail': r're:https?://.*\.jpg(\?.*)?',
              'upload_date': '20150521',
          },
          'params': {
diff --git a/youtube_dl/extractor/novamov.py b/youtube_dl/extractor/novamov.py

index 3bbd4735502e113fcc46a07981ff5863c52fef15..829c71960442a1159efd729f30704d046e92babd 100644 (file)
--- a/youtube_dl/extractor/novamov.py
+++ b/youtube_dl/extractor/novamov.py
@@ -24,7 +24,7 @@ class NovaMovIE(InfoExtractor):
                                  )
                                  (?P<id>[a-z\d]{13})
                              '''
-    _VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
+    _VALID_URL = _VALID_URL_TEMPLATE % {'host': r'novamov\.com'}
  
      _HOST = 'www.novamov.com'
  
@@ -104,7 +104,7 @@ class WholeCloudIE(NovaMovIE):
      IE_NAME = 'wholecloud'
      IE_DESC = 'WholeCloud'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': '(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'(?:wholecloud\.net|movshare\.(?:net|sx|ag))'}
  
      _HOST = 'www.wholecloud.net'
  
@@ -128,7 +128,7 @@ class NowVideoIE(NovaMovIE):
      IE_NAME = 'nowvideo'
      IE_DESC = 'NowVideo'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'nowvideo\.(?:to|ch|ec|sx|eu|at|ag|co|li)'}
  
      _HOST = 'www.nowvideo.to'
  
@@ -152,7 +152,7 @@ class VideoWeedIE(NovaMovIE):
      IE_NAME = 'videoweed'
      IE_DESC = 'VideoWeed'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'videoweed\.(?:es|com)'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'videoweed\.(?:es|com)'}
  
      _HOST = 'www.videoweed.es'
  
@@ -176,7 +176,7 @@ class CloudTimeIE(NovaMovIE):
      IE_NAME = 'cloudtime'
      IE_DESC = 'CloudTime'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'cloudtime\.to'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'cloudtime\.to'}
  
      _HOST = 'www.cloudtime.to'
  
@@ -190,7 +190,7 @@ class AuroraVidIE(NovaMovIE):
      IE_NAME = 'auroravid'
      IE_DESC = 'AuroraVid'
  
-    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'auroravid\.to'}
+    _VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': r'auroravid\.to'}
  
      _HOST = 'www.auroravid.to'
  
diff --git a/youtube_dl/extractor/nowness.py b/youtube_dl/extractor/nowness.py

index 7e53463164b281e84a349a6fc382f5e203f278a4..b6c5ee6e417e12d35731e45e5755435867a3b67b 100644 (file)
--- a/youtube_dl/extractor/nowness.py
+++ b/youtube_dl/extractor/nowness.py
@@ -62,7 +62,7 @@ class NownessIE(NownessBaseIE):
              'ext': 'mp4',
              'title': 'Candor: The Art of Gesticulation',
              'description': 'Candor: The Art of Gesticulation',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1446745676,
              'upload_date': '20151105',
              'uploader_id': '2385340575001',
@@ -76,7 +76,7 @@ class NownessIE(NownessBaseIE):
              'ext': 'mp4',
              'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
              'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1407315371,
              'upload_date': '20140806',
              'uploader_id': '2385340575001',
@@ -91,7 +91,7 @@ class NownessIE(NownessBaseIE):
              'ext': 'mp4',
              'title': 'Bleu, Blanc, Rouge - A Godard Supercut',
              'description': 'md5:f0ea5f1857dffca02dbd37875d742cec',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'upload_date': '20150607',
              'uploader': 'Cinema Sem Lei',
              'uploader_id': 'cinemasemlei',
diff --git a/youtube_dl/extractor/nowtv.py b/youtube_dl/extractor/nowtv.py

index 916a102bfc381cbfe9d2baf83ceb5d39241cd69d..e43b37136e13f2547b43bc474ae3654d36af6595 100644 (file)
--- a/youtube_dl/extractor/nowtv.py
+++ b/youtube_dl/extractor/nowtv.py
@@ -83,7 +83,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Inka Bause stellt die neuen Bauern vor',
              'description': 'md5:e234e1ed6d63cf06be5c070442612e7e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432580700,
              'upload_date': '20150525',
              'duration': 2786,
@@ -101,7 +101,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Berlin - Tag & Nacht (Folge 934)',
              'description': 'md5:c85e88c2e36c552dfe63433bc9506dd0',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432666800,
              'upload_date': '20150526',
              'duration': 2641,
@@ -119,7 +119,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Hals- und Beinbruch',
              'description': 'md5:b50d248efffe244e6f56737f0911ca57',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432415400,
              'upload_date': '20150523',
              'duration': 2742,
@@ -137,7 +137,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Angst!',
              'description': 'md5:30cbc4c0b73ec98bcd73c9f2a8c17c4e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1222632900,
              'upload_date': '20080928',
              'duration': 3025,
@@ -155,7 +155,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': 'Thema u.a.: Der erste Blick: Die Apple Watch',
              'description': 'md5:4312b6c9d839ffe7d8caf03865a531af',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432751700,
              'upload_date': '20150527',
              'duration': 1083,
@@ -173,7 +173,7 @@ class NowTVIE(NowTVBaseIE):
              'ext': 'flv',
              'title': "Büro-Fall / Chihuahua 'Joel'",
              'description': 'md5:e62cb6bf7c3cc669179d4f1eb279ad8d',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1432408200,
              'upload_date': '20150523',
              'duration': 3092,
diff --git a/youtube_dl/extractor/noz.py b/youtube_dl/extractor/noz.py

index c47a33d1570537aedc0dc3c2415a5f138fbdc5bb..ccafd77232b5f1e69f61dd5a99e904c7775a51a7 100644 (file)
--- a/youtube_dl/extractor/noz.py
+++ b/youtube_dl/extractor/noz.py
@@ -24,7 +24,7 @@ class NozIE(InfoExtractor):
              'duration': 215,
              'title': '3:2 - Deutschland gewinnt Badminton-Länderspiel in Melle',
              'description': 'Vor rund 370 Zuschauern gewinnt die deutsche Badminton-Nationalmannschaft am Donnerstag ein EM-Vorbereitungsspiel gegen Frankreich in Melle. Video Moritz Frankenberg.',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/npo.py b/youtube_dl/extractor/npo.py

index c91f5846171be2a720523a4531313703d18920fd..9624371450d0d555a8d843aac84346eae4b236fc 100644 (file)
--- a/youtube_dl/extractor/npo.py
+++ b/youtube_dl/extractor/npo.py
@@ -241,7 +241,7 @@ def _get_info(self, video_id):
          if metadata.get('tt888') == 'ja':
              subtitles['nl'] = [{
                  'ext': 'vtt',
-                'url': 'http://e.omroep.nl/tt888/%s' % video_id,
+                'url': 'http://tt888.omroep.nl/tt888/%s' % video_id,
              }]
  
          return {
diff --git a/youtube_dl/extractor/nrk.py b/youtube_dl/extractor/nrk.py

index aed98141be2365aeca164649a829a0efa72b6b01..fc3c0cd3ccb25ab8c41fdb1b8e9b424458c93209 100644 (file)
--- a/youtube_dl/extractor/nrk.py
+++ b/youtube_dl/extractor/nrk.py
@@ -17,14 +17,15 @@
  class NRKBaseIE(InfoExtractor):
      _faked_ip = None
  
-    def _download_webpage(self, *args, **kwargs):
+    def _download_webpage_handle(self, *args, **kwargs):
          # NRK checks X-Forwarded-For HTTP header in order to figure out the
          # origin of the client behind proxy. This allows to bypass geo
          # restriction by faking this header's value to some Norway IP.
          # We will do so once we encounter any geo restriction error.
          if self._faked_ip:
-            kwargs.setdefault('headers', {})['X-Forwarded-For'] = self._faked_ip
-        return super(NRKBaseIE, self)._download_webpage(*args, **kwargs)
+            # NB: str is intentional
+            kwargs.setdefault(str('headers'), {})['X-Forwarded-For'] = self._faked_ip
+        return super(NRKBaseIE, self)._download_webpage_handle(*args, **kwargs)
  
      def _fake_ip(self):
          # Use fake IP from 37.191.128.0/17 in order to workaround geo
@@ -43,8 +44,17 @@ def _real_extract(self, url):
          title = data.get('fullTitle') or data.get('mainTitle') or data['title']
          video_id = data.get('id') or video_id
  
+        http_headers = {'X-Forwarded-For': self._faked_ip} if self._faked_ip else {}
+
          entries = []
  
+        conviva = data.get('convivaStatistics') or {}
+        live = (data.get('mediaElementType') == 'Live' or
+                data.get('isLive') is True or conviva.get('isLive'))
+
+        def make_title(t):
+            return self._live_title(t) if live else t
+
          media_assets = data.get('mediaAssets')
          if media_assets and isinstance(media_assets, list):
              def video_id_and_title(idx):
@@ -58,6 +68,13 @@ def video_id_and_title(idx):
                  if not formats:
                      continue
                  self._sort_formats(formats)
+
+                # Some f4m streams may not work with hdcore in fragments' URLs
+                for f in formats:
+                    extra_param = f.get('extra_param_to_segment_url')
+                    if extra_param and 'hdcore' in extra_param:
+                        del f['extra_param_to_segment_url']
+
                  entry_id, entry_title = video_id_and_title(num)
                  duration = parse_duration(asset.get('duration'))
                  subtitles = {}
@@ -69,10 +86,11 @@ def video_id_and_title(idx):
                          })
                  entries.append({
                      'id': asset.get('carrierId') or entry_id,
-                    'title': entry_title,
+                    'title': make_title(entry_title),
                      'duration': duration,
                      'subtitles': subtitles,
                      'formats': formats,
+                    'http_headers': http_headers,
                  })
  
          if not entries:
@@ -83,14 +101,15 @@ def video_id_and_title(idx):
                  duration = parse_duration(data.get('duration'))
                  entries = [{
                      'id': video_id,
-                    'title': title,
+                    'title': make_title(title),
                      'duration': duration,
                      'formats': formats,
                  }]
  
          if not entries:
-            message_type = data.get('messageType')
-            if message_type == 'ProgramIsGeoBlocked' and not self._faked_ip:
+            message_type = data.get('messageType', '')
+            # Can be ProgramIsGeoBlocked or ChannelIsGeoBlocked*
+            if 'IsGeoBlocked' in message_type and not self._faked_ip:
                  self.report_warning(
                      'Video is geo restricted, trying to fake IP')
                  self._fake_ip()
@@ -106,10 +125,25 @@ def video_id_and_title(idx):
                      message_type, message_type)),
                  expected=True)
  
-        conviva = data.get('convivaStatistics') or {}
          series = conviva.get('seriesName') or data.get('seriesTitle')
          episode = conviva.get('episodeName') or data.get('episodeNumberOrDate')
  
+        season_number = None
+        episode_number = None
+        if data.get('mediaElementType') == 'Episode':
+            _season_episode = data.get('scoresStatistics', {}).get('springStreamStream') or \
+                data.get('relativeOriginUrl', '')
+            EPISODENUM_RE = [
+                r'/s(?P<season>\d{,2})e(?P<episode>\d{,2})\.',
+                r'/sesong-(?P<season>\d{,2})/episode-(?P<episode>\d{,2})',
+            ]
+            season_number = int_or_none(self._search_regex(
+                EPISODENUM_RE, _season_episode, 'season number',
+                default=None, group='season'))
+            episode_number = int_or_none(self._search_regex(
+                EPISODENUM_RE, _season_episode, 'episode number',
+                default=None, group='episode'))
+
          thumbnails = None
          images = data.get('images')
          if images and isinstance(images, dict):
@@ -122,11 +156,15 @@ def video_id_and_title(idx):
                  } for image in web_images if image.get('imageUrl')]
  
          description = data.get('description')
+        category = data.get('mediaAnalytics', {}).get('category')
  
          common_info = {
              'description': description,
              'series': series,
              'episode': episode,
+            'season_number': season_number,
+            'episode_number': episode_number,
+            'categories': [category] if category else None,
              'age_limit': parse_age_limit(data.get('legalAge')),
              'thumbnails': thumbnails,
          }
@@ -189,7 +227,15 @@ class NRKIE(NRKBaseIE):
  
  class NRKTVIE(NRKBaseIE):
      IE_DESC = 'NRK TV and NRK Radio'
-    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/(?:serie/[^/]+|program)/(?P<id>[a-zA-Z]{4}\d{8})(?:/\d{2}-\d{2}-\d{4})?(?:#del=(?P<part_id>\d+))?'
+    _EPISODE_RE = r'(?P<id>[a-zA-Z]{4}\d{8})'
+    _VALID_URL = r'''(?x)
+                        https?://
+                            (?:tv|radio)\.nrk(?:super)?\.no/
+                            (?:serie/[^/]+|program)/
+                            (?![Ee]pisodes)%s
+                            (?:/\d{2}-\d{2}-\d{4})?
+                            (?:\#del=(?P<part_id>\d+))?
+                    ''' % _EPISODE_RE
      _API_HOST = 'psapi-we.nrk.no'
  
      _TESTS = [{
@@ -201,63 +247,145 @@ class NRKTVIE(NRKBaseIE):
              'title': '20 spørsmål 23.05.2014',
              'description': 'md5:bdea103bc35494c143c6a9acdd84887a',
              'duration': 1741,
+            'series': '20 spørsmål - TV',
+            'episode': '23.05.2014',
          },
      }, {
          'url': 'https://tv.nrk.no/program/mdfp15000514',
-        'md5': '43d0be26663d380603a9cf0c24366531',
          'info_dict': {
              'id': 'MDFP15000514CA',
              'ext': 'mp4',
              'title': 'Grunnlovsjubiléet - Stor ståhei for ingenting 24.05.2014',
              'description': 'md5:89290c5ccde1b3a24bb8050ab67fe1db',
              'duration': 4605,
+            'series': 'Kunnskapskanalen',
+            'episode': '24.05.2014',
+        },
+        'params': {
+            'skip_download': True,
          },
      }, {
          # single playlist video
          'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015#del=2',
-        'md5': 'adbd1dbd813edaf532b0a253780719c2',
          'info_dict': {
              'id': 'MSPO40010515-part2',
              'ext': 'flv',
              'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
              'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
          },
-        'skip': 'Only works from Norway',
+        'params': {
+            'skip_download': True,
+        },
+        'expected_warnings': ['Video is geo restricted'],
+        'skip': 'particular part is not supported currently',
      }, {
          'url': 'https://tv.nrk.no/serie/tour-de-ski/MSPO40010515/06-01-2015',
          'playlist': [{
-            'md5': '9480285eff92d64f06e02a5367970a7a',
              'info_dict': {
-                'id': 'MSPO40010515-part1',
-                'ext': 'flv',
-                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 1:2)',
-                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+                'id': 'MSPO40010515AH',
+                'ext': 'mp4',
+                'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 1)',
+                'description': 'md5:c03aba1e917561eface5214020551b7a',
+                'duration': 772,
+                'series': 'Tour de Ski',
+                'episode': '06.01.2015',
+            },
+            'params': {
+                'skip_download': True,
              },
          }, {
-            'md5': 'adbd1dbd813edaf532b0a253780719c2',
              'info_dict': {
-                'id': 'MSPO40010515-part2',
-                'ext': 'flv',
-                'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn 06.01.2015 (del 2:2)',
-                'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
+                'id': 'MSPO40010515BH',
+                'ext': 'mp4',
+                'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015 (Part 2)',
+                'description': 'md5:c03aba1e917561eface5214020551b7a',
+                'duration': 6175,
+                'series': 'Tour de Ski',
+                'episode': '06.01.2015',
+            },
+            'params': {
+                'skip_download': True,
              },
          }],
          'info_dict': {
              'id': 'MSPO40010515',
-            'title': 'Tour de Ski: Sprint fri teknikk, kvinner og menn',
-            'description': 'md5:238b67b97a4ac7d7b4bf0edf8cc57d26',
-            'duration': 6947.52,
+            'title': 'Sprint fri teknikk, kvinner og menn 06.01.2015',
+            'description': 'md5:c03aba1e917561eface5214020551b7a',
+        },
+        'expected_warnings': ['Video is geo restricted'],
+    }, {
+        'url': 'https://tv.nrk.no/serie/anno/KMTE50001317/sesong-3/episode-13',
+        'info_dict': {
+            'id': 'KMTE50001317AA',
+            'ext': 'mp4',
+            'title': 'Anno 13:30',
+            'description': 'md5:11d9613661a8dbe6f9bef54e3a4cbbfa',
+            'duration': 2340,
+            'series': 'Anno',
+            'episode': '13:30',
+            'season_number': 3,
+            'episode_number': 13,
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'https://tv.nrk.no/serie/nytt-paa-nytt/MUHH46000317/27-01-2017',
+        'info_dict': {
+            'id': 'MUHH46000317AA',
+            'ext': 'mp4',
+            'title': 'Nytt på Nytt 27.01.2017',
+            'description': 'md5:5358d6388fba0ea6f0b6d11c48b9eb4b',
+            'duration': 1796,
+            'series': 'Nytt på nytt',
+            'episode': '27.01.2017',
+        },
+        'params': {
+            'skip_download': True,
          },
-        'skip': 'Only works from Norway',
      }, {
          'url': 'https://radio.nrk.no/serie/dagsnytt/NPUB21019315/12-07-2015#',
          'only_matching': True,
      }]
  
  
-class NRKPlaylistIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
+class NRKTVDirekteIE(NRKTVIE):
+    IE_DESC = 'NRK TV Direkte and NRK Radio Direkte'
+    _VALID_URL = r'https?://(?:tv|radio)\.nrk\.no/direkte/(?P<id>[^/?#&]+)'
  
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/direkte/nrk1',
+        'only_matching': True,
+    }, {
+        'url': 'https://radio.nrk.no/direkte/p1_oslo_akershus',
+        'only_matching': True,
+    }]
+
+
+class NRKPlaylistBaseIE(InfoExtractor):
+    def _extract_description(self, webpage):
+        pass
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, playlist_id)
+
+        entries = [
+            self.url_result('nrk:%s' % video_id, NRKIE.ie_key())
+            for video_id in re.findall(self._ITEM_RE, webpage)
+        ]
+
+        playlist_title = self._extract_title(webpage)
+        playlist_description = self._extract_description(webpage)
+
+        return self.playlist_result(
+            entries, playlist_id, playlist_title, playlist_description)
+
+
+class NRKPlaylistIE(NRKPlaylistBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?nrk\.no/(?!video|skole)(?:[^/]+/)+(?P<id>[^/]+)'
+    _ITEM_RE = r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"'
      _TESTS = [{
          'url': 'http://www.nrk.no/troms/gjenopplev-den-historiske-solformorkelsen-1.12270763',
          'info_dict': {
@@ -276,23 +404,86 @@ class NRKPlaylistIE(InfoExtractor):
          'playlist_count': 5,
      }]
  
+    def _extract_title(self, webpage):
+        return self._og_search_title(webpage, fatal=False)
+
+    def _extract_description(self, webpage):
+        return self._og_search_description(webpage)
+
+
+class NRKTVEpisodesIE(NRKPlaylistBaseIE):
+    _VALID_URL = r'https?://tv\.nrk\.no/program/[Ee]pisodes/[^/]+/(?P<id>\d+)'
+    _ITEM_RE = r'data-episode=["\']%s' % NRKTVIE._EPISODE_RE
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/program/episodes/nytt-paa-nytt/69031',
+        'info_dict': {
+            'id': '69031',
+            'title': 'Nytt på nytt, sesong: 201210',
+        },
+        'playlist_count': 4,
+    }]
+
+    def _extract_title(self, webpage):
+        return self._html_search_regex(
+            r'<h1>([^<]+)</h1>', webpage, 'title', fatal=False)
+
+
+class NRKTVSeriesIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:tv|radio)\.nrk(?:super)?\.no/serie/(?P<id>[^/]+)'
+    _ITEM_RE = r'(?:data-season=["\']|id=["\']season-)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://tv.nrk.no/serie/groenn-glede',
+        'info_dict': {
+            'id': 'groenn-glede',
+            'title': 'Grønn glede',
+            'description': 'md5:7576e92ae7f65da6993cf90ee29e4608',
+        },
+        'playlist_mincount': 9,
+    }, {
+        'url': 'http://tv.nrksuper.no/serie/labyrint',
+        'info_dict': {
+            'id': 'labyrint',
+            'title': 'Labyrint',
+            'description': 'md5:58afd450974c89e27d5a19212eee7115',
+        },
+        'playlist_mincount': 3,
+    }, {
+        'url': 'https://tv.nrk.no/serie/broedrene-dal-og-spektralsteinene',
+        'only_matching': True,
+    }, {
+        'url': 'https://tv.nrk.no/serie/saving-the-human-race',
+        'only_matching': True,
+    }, {
+        'url': 'https://tv.nrk.no/serie/postmann-pat',
+        'only_matching': True,
+    }]
+
+    @classmethod
+    def suitable(cls, url):
+        return False if NRKTVIE.suitable(url) else super(NRKTVSeriesIE, cls).suitable(url)
+
      def _real_extract(self, url):
-        playlist_id = self._match_id(url)
+        series_id = self._match_id(url)
  
-        webpage = self._download_webpage(url, playlist_id)
+        webpage = self._download_webpage(url, series_id)
  
          entries = [
-            self.url_result('nrk:%s' % video_id, 'NRK')
-            for video_id in re.findall(
-                r'class="[^"]*\brich\b[^"]*"[^>]+data-video-id="([^"]+)"',
-                webpage)
+            self.url_result(
+                'https://tv.nrk.no/program/Episodes/{series}/{season}'.format(
+                    series=series_id, season=season_id))
+            for season_id in re.findall(self._ITEM_RE, webpage)
          ]
  
-        playlist_title = self._og_search_title(webpage)
-        playlist_description = self._og_search_description(webpage)
+        title = self._html_search_meta(
+            'seriestitle', webpage,
+            'title', default=None) or self._og_search_title(
+            webpage, fatal=False)
  
-        return self.playlist_result(
-            entries, playlist_id, playlist_title, playlist_description)
+        description = self._html_search_meta(
+            'series_description', webpage,
+            'description', default=None) or self._og_search_description(webpage)
+
+        return self.playlist_result(entries, series_id, title, description)
  
  
  class NRKSkoleIE(InfoExtractor):
diff --git a/youtube_dl/extractor/ntvde.py b/youtube_dl/extractor/ntvde.py

index d28a8154247f75cbc612f7999083cd60275c5a88..101a5374ccd9c780d6f34e8ff82ef67148e8a0cb 100644 (file)
--- a/youtube_dl/extractor/ntvde.py
+++ b/youtube_dl/extractor/ntvde.py
@@ -22,7 +22,7 @@ class NTVDeIE(InfoExtractor):
          'info_dict': {
              'id': '14438086',
              'ext': 'mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'title': 'Schnee und Glätte führen zu zahlreichen Unfällen und Staus',
              'alt_title': 'Winterchaos auf deutschen Straßen',
              'description': 'Schnee und Glätte sorgen deutschlandweit für einen chaotischen Start in die Woche: Auf den Straßen kommt es zu kilometerlangen Staus und Dutzenden Glätteunfällen. In Düsseldorf und München wirbelt der Schnee zudem den Flugplan durcheinander. Dutzende Flüge landen zu spät, einige fallen ganz aus.',
diff --git a/youtube_dl/extractor/ntvru.py b/youtube_dl/extractor/ntvru.py

index 7d7a785ab10e7b71ceb4729a012ebb574c7752d5..4f9cedb84a47a8481b2c4058c5a59483b9a613bd 100644 (file)
--- a/youtube_dl/extractor/ntvru.py
+++ b/youtube_dl/extractor/ntvru.py
@@ -21,7 +21,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
              'description': 'Командующий Черноморским флотом провел переговоры в штабе ВМС Украины',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 136,
          },
      }, {
@@ -32,7 +32,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
              'description': 'Родные пассажиров пропавшего Boeing не верят в трагический исход',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 172,
          },
      }, {
@@ -43,7 +43,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': '«Сегодня». 21 марта 2014 года. 16:00',
              'description': '«Сегодня». 21 марта 2014 года. 16:00',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 1496,
          },
      }, {
@@ -54,7 +54,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Остросюжетный фильм «Кома»',
              'description': 'Остросюжетный фильм «Кома»',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 5592,
          },
      }, {
@@ -65,7 +65,7 @@ class NTVRuIE(InfoExtractor):
              'ext': 'mp4',
              'title': '«Дело врачей»: «Деревце жизни»',
              'description': '«Дело врачей»: «Деревце жизни»',
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
              'duration': 2590,
          },
      }]
diff --git a/youtube_dl/extractor/oktoberfesttv.py b/youtube_dl/extractor/oktoberfesttv.py

index 50fbbc79c12761449adc70e74a58f0442f5b9cfa..a914068f958943eddb79a992413d86bad1c343fa 100644 (file)
--- a/youtube_dl/extractor/oktoberfesttv.py
+++ b/youtube_dl/extractor/oktoberfesttv.py
@@ -13,7 +13,7 @@ class OktoberfestTVIE(InfoExtractor):
              'id': 'hb-zelt',
              'ext': 'mp4',
              'title': 're:^Live-Kamera: Hofbräuzelt [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'is_live': True,
          },
          'params': {
diff --git a/youtube_dl/extractor/ondemandkorea.py b/youtube_dl/extractor/ondemandkorea.py

new file mode 100644 (file)

index 0000000..de1d6b0
--- /dev/null
+++ b/youtube_dl/extractor/ondemandkorea.py
@@ -0,0 +1,60 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .jwplatform import JWPlatformBaseIE
+from ..utils import (
+    ExtractorError,
+    js_to_json,
+)
+
+
+class OnDemandKoreaIE(JWPlatformBaseIE):
+    _VALID_URL = r'https?://(?:www\.)?ondemandkorea\.com/(?P<id>[^/]+)\.html'
+    _TEST = {
+        'url': 'http://www.ondemandkorea.com/ask-us-anything-e43.html',
+        'info_dict': {
+            'id': 'ask-us-anything-e43',
+            'ext': 'mp4',
+            'title': 'Ask Us Anything : E43',
+            'thumbnail': r're:^https?://.*\.jpg$',
+        },
+        'params': {
+            'skip_download': 'm3u8 download'
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id, fatal=False)
+
+        if not webpage:
+            # Page sometimes returns captcha page with HTTP 403
+            raise ExtractorError(
+                'Unable to access page. You may have been blocked.',
+                expected=True)
+
+        if 'msg_block_01.png' in webpage:
+            self.raise_geo_restricted(
+                'This content is not available in your region')
+
+        if 'This video is only available to ODK PLUS members.' in webpage:
+            raise ExtractorError(
+                'This video is only available to ODK PLUS members.',
+                expected=True)
+
+        title = self._og_search_title(webpage)
+
+        jw_config = self._parse_json(
+            self._search_regex(
+                r'(?s)jwplayer\(([\'"])(?:(?!\1).)+\1\)\.setup\s*\((?P<options>.+?)\);',
+                webpage, 'jw config', group='options'),
+            video_id, transform_source=js_to_json)
+        info = self._parse_jwplayer_data(
+            jw_config, video_id, require_title=False, m3u8_id='hls',
+            base_url=url)
+
+        info.update({
+            'title': title,
+            'thumbnail': self._og_search_thumbnail(webpage),
+        })
+        return info
diff --git a/youtube_dl/extractor/onionstudios.py b/youtube_dl/extractor/onionstudios.py

index 6fb1a3fcc0bd565677b232adcb883b3649715dde..1d336cf3069d8aae29eeb4e90a7c3f20241cab2e 100644 (file)
--- a/youtube_dl/extractor/onionstudios.py
+++ b/youtube_dl/extractor/onionstudios.py
@@ -22,7 +22,7 @@ class OnionStudiosIE(InfoExtractor):
              'id': '2937',
              'ext': 'mp4',
              'title': 'Hannibal charges forward, stops for a cocktail',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'The A.V. Club',
              'uploader_id': 'the-av-club',
          },
diff --git a/youtube_dl/extractor/ooyala.py b/youtube_dl/extractor/ooyala.py

index c2807d0f61b2ab5134944bd0c79b2030df80d3a1..84be2b1e3fe9fec47e915c13b14a8f23c0383232 100644 (file)
--- a/youtube_dl/extractor/ooyala.py
+++ b/youtube_dl/extractor/ooyala.py
@@ -18,7 +18,7 @@ class OoyalaBaseIE(InfoExtractor):
      _CONTENT_TREE_BASE = _PLAYER_BASE + 'player_api/v1/content_tree/'
      _AUTHORIZATION_URL_TEMPLATE = _PLAYER_BASE + 'sas/player_api/v2/authorization/embed_code/%s/%s?'
  
-    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None):
+    def _extract(self, content_tree_url, video_id, domain='example.org', supportedformats=None, embed_token=None):
          content_tree = self._download_json(content_tree_url, video_id)['content_tree']
          metadata = content_tree[list(content_tree)[0]]
          embed_code = metadata['embed_code']
@@ -29,7 +29,8 @@ def _extract(self, content_tree_url, video_id, domain='example.org', supportedfo
              self._AUTHORIZATION_URL_TEMPLATE % (pcode, embed_code) +
              compat_urllib_parse_urlencode({
                  'domain': domain,
-                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds',
+                'supportedFormats': supportedformats or 'mp4,rtmp,m3u8,hds,dash,smooth',
+                'embedToken': embed_token,
              }), video_id)
  
          cur_auth_data = auth_data['authorization_data'][embed_code]
@@ -52,6 +53,12 @@ def _extract(self, content_tree_url, video_id, domain='example.org', supportedfo
                  elif delivery_type == 'hds' or ext == 'f4m':
                      formats.extend(self._extract_f4m_formats(
                          s_url + '?hdcore=3.7.0', embed_code, f4m_id='hds', fatal=False))
+                elif delivery_type == 'dash' or ext == 'mpd':
+                    formats.extend(self._extract_mpd_formats(
+                        s_url, embed_code, mpd_id='dash', fatal=False))
+                elif delivery_type == 'smooth':
+                    formats.extend(self._extract_ism_formats(
+                        s_url, embed_code, ism_id='mss', fatal=False))
                  elif ext == 'smil':
                      formats.extend(self._extract_smil_formats(
                          s_url, embed_code, fatal=False))
@@ -146,8 +153,9 @@ def _real_extract(self, url):
          embed_code = self._match_id(url)
          domain = smuggled_data.get('domain')
          supportedformats = smuggled_data.get('supportedformats')
+        embed_token = smuggled_data.get('embed_token')
          content_tree_url = self._CONTENT_TREE_BASE + 'embed_code/%s/%s' % (embed_code, embed_code)
-        return self._extract(content_tree_url, embed_code, domain, supportedformats)
+        return self._extract(content_tree_url, embed_code, domain, supportedformats, embed_token)
  
  
  class OoyalaExternalIE(OoyalaBaseIE):
diff --git a/youtube_dl/extractor/openload.py b/youtube_dl/extractor/openload.py

index d3d4101de8a84edda5ec641d299ed7e3813a2910..32289d8976dcf602839546d179c06ef224a79b20 100644 (file)
--- a/youtube_dl/extractor/openload.py
+++ b/youtube_dl/extractor/openload.py
@@ -1,11 +1,10 @@
  # coding: utf-8
-from __future__ import unicode_literals, division
+from __future__ import unicode_literals
+
+import re
  
  from .common import InfoExtractor
-from ..compat import (
-    compat_chr,
-    compat_ord,
-)
+from ..compat import compat_chr
  from ..utils import (
      determine_ext,
      ExtractorError,
@@ -13,7 +12,7 @@
  
  
  class OpenloadIE(InfoExtractor):
-    _VALID_URL = r'https?://openload\.(?:co|io)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
+    _VALID_URL = r'https?://(?:openload\.(?:co|io)|oload\.tv)/(?:f|embed)/(?P<id>[a-zA-Z0-9-_]+)'
  
      _TESTS = [{
          'url': 'https://openload.co/f/kUEfGclsU9o',
@@ -22,7 +21,7 @@ class OpenloadIE(InfoExtractor):
              'id': 'kUEfGclsU9o',
              'ext': 'mp4',
              'title': 'skyrim_no-audio_1080.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'https://openload.co/embed/rjC09fkPLYs',
@@ -30,7 +29,7 @@ class OpenloadIE(InfoExtractor):
              'id': 'rjC09fkPLYs',
              'ext': 'mp4',
              'title': 'movie.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'subtitles': {
                  'en': [{
                      'ext': 'vtt',
@@ -54,8 +53,17 @@ class OpenloadIE(InfoExtractor):
          # for title and ext
          'url': 'https://openload.co/embed/Sxz5sADo82g/',
          'only_matching': True,
+    }, {
+        'url': 'https://oload.tv/embed/KnG-kKZdcfY/',
+        'only_matching': True,
      }]
  
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+src=["\']((?:https?://)?(?:openload\.(?:co|io)|oload\.tv)/embed/[a-zA-Z0-9-_]+)',
+            webpage)
+
      def _real_extract(self, url):
          video_id = self._match_id(url)
          webpage = self._download_webpage('https://openload.co/embed/%s/' % video_id, video_id)
@@ -63,29 +71,21 @@ def _real_extract(self, url):
          if 'File not found' in webpage or 'deleted by the owner' in webpage:
              raise ExtractorError('File not found', expected=True)
  
-        # The following decryption algorithm is written by @yokrysty and
-        # declared to be freely used in youtube-dl
-        # See https://github.com/rg3/youtube-dl/issues/10408
-        enc_data = self._html_search_regex(
-            r'<span[^>]*>([^<]+)</span>\s*<span[^>]*>[^<]+</span>\s*<span[^>]+id="streamurl"',
-            webpage, 'encrypted data')
+        ol_id = self._search_regex(
+            r'<span[^>]+id="[^"]+"[^>]*>([0-9]+)</span>',
+            webpage, 'openload ID')
  
-        magic = compat_ord(enc_data[-1])
-        video_url_chars = []
+        first_three_chars = int(float(ol_id[:3]))
+        fifth_char = int(float(ol_id[3:5]))
+        urlcode = ''
+        num = 5
  
-        for idx, c in enumerate(enc_data):
-            j = compat_ord(c)
-            if j == magic:
-                j -= 1
-            elif j == magic - 1:
-                j += 1
-            if j >= 33 and j <= 126:
-                j = ((j + 14) % 94) + 33
-            if idx == len(enc_data) - 1:
-                j += 3
-            video_url_chars += compat_chr(j)
+        while num < len(ol_id):
+            urlcode += compat_chr(int(float(ol_id[num:][:3])) +
+                                  first_three_chars - fifth_char * int(float(ol_id[num + 3:][:2])))
+            num += 5
  
-        video_url = 'https://openload.co/stream/%s?mime=true' % ''.join(video_url_chars)
+        video_url = 'https://openload.co/stream/' + urlcode
  
          title = self._og_search_title(webpage, default=None) or self._search_regex(
              r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
@@ -101,8 +101,7 @@ def _real_extract(self, url):
              'thumbnail': self._og_search_thumbnail(webpage, default=None),
              'url': video_url,
              # Seems all videos have extensions in their titles
-            'ext': determine_ext(title),
+            'ext': determine_ext(title, 'mp4'),
              'subtitles': subtitles,
          }
-
          return info_dict
diff --git a/youtube_dl/extractor/orf.py b/youtube_dl/extractor/orf.py

index b4cce7ea9334c7bbaf9e617932189504dcd25121..1e2c54e68c3eb16b0ee8e9afb7b50b07ae429207 100644 (file)
--- a/youtube_dl/extractor/orf.py
+++ b/youtube_dl/extractor/orf.py
@@ -247,7 +247,7 @@ class ORFIPTVIE(InfoExtractor):
              'title': 'Weitere Evakuierungen um Vulkan Calbuco',
              'description': 'md5:d689c959bdbcf04efeddedbf2299d633',
              'duration': 68.197,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150425',
          },
      }
diff --git a/youtube_dl/extractor/pandoratv.py b/youtube_dl/extractor/pandoratv.py
index 2b07958bb1f5815a162dadadff4f450f7ea0e97d..89c95fffb6e6ca7279d927424c19be9325e2ac4d 100644
--- a/youtube_dl/extractor/pandoratv.py
+++ b/youtube_dl/extractor/pandoratv.py
@@ -11,6 +11,7 @@
      float_or_none,
      parse_duration,
      str_to_int,
+    urlencode_postdata,
  )
  
  
@@ -25,7 +26,7 @@ class PandoraTVIE(InfoExtractor):
              'ext': 'flv',
              'title': '頭を撫でてくれる？',
              'description': '頭を撫でてくれる？',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 39,
              'upload_date': '20151218',
              'uploader': 'カワイイ動物まとめ',
@@ -56,6 +57,22 @@ def _real_extract(self, url):
                  r'^v(\d+)[Uu]rl$', format_id, 'height', default=None)
              if not height:
                  continue
+
+            play_url = self._download_json(
+                'http://m.pandora.tv/?c=api&m=play_url', video_id,
+                data=urlencode_postdata({
+                    'prgid': video_id,
+                    'runtime': info.get('runtime'),
+                    'vod_url': format_url,
+                }),
+                headers={
+                    'Origin': url,
+                    'Content-Type': 'application/x-www-form-urlencoded',
+                })
+            format_url = play_url.get('url')
+            if not format_url:
+                continue
+
              formats.append({
                  'format_id': '%sp' % height,
                  'url': format_url,
diff --git a/youtube_dl/extractor/pbs.py b/youtube_dl/extractor/pbs.py
index b490ef74c5fb768751d4598ff88e70a13d41c060..6baed773fc6bf741a69f1baf222148065ef169c4 100644
--- a/youtube_dl/extractor/pbs.py
+++ b/youtube_dl/extractor/pbs.py
@@ -236,7 +236,7 @@ class PBSIE(InfoExtractor):
                  'title': 'Great Performances - Dudamel Conducts Verdi Requiem at the Hollywood Bowl - Full',
                  'description': 'md5:657897370e09e2bc6bf0f8d2cd313c6b',
                  'duration': 6559,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -249,7 +249,7 @@ class PBSIE(InfoExtractor):
                  'description': 'md5:c741d14e979fc53228c575894094f157',
                  'title': 'NOVA - Killer Typhoon',
                  'duration': 3172,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'upload_date': '20140122',
                  'age_limit': 10,
              },
@@ -270,7 +270,7 @@ class PBSIE(InfoExtractor):
                  'title': 'American Experience - Death and the Civil War, Chapter 1',
                  'description': 'md5:67fa89a9402e2ee7d08f53b920674c18',
                  'duration': 682,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  'skip_download': True,  # requires ffmpeg
@@ -286,7 +286,7 @@ class PBSIE(InfoExtractor):
                  'title': 'FRONTLINE - United States of Secrets (Part One)',
                  'description': 'md5:55756bd5c551519cc4b7703e373e217e',
                  'duration': 6851,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -302,7 +302,7 @@ class PBSIE(InfoExtractor):
                  'title': "A Chef's Life - Season 3, Ep. 5: Prickly Business",
                  'description': 'md5:c0ff7475a4b70261c7e58f493c2792a5',
                  'duration': 1480,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
@@ -315,7 +315,7 @@ class PBSIE(InfoExtractor):
                  'title': 'FRONTLINE - The Atomic Artists',
                  'description': 'md5:f677e4520cfacb4a5ce1471e31b57800',
                  'duration': 723,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  'skip_download': True,  # requires ffmpeg
@@ -330,7 +330,7 @@ class PBSIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'FRONTLINE - Netanyahu at War',
                  'duration': 6852,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'formats': 'mincount:8',
              },
          },
@@ -350,6 +350,15 @@ class PBSIE(InfoExtractor):
          410: 'This video has expired and is no longer available for online streaming.',
      }
  
+    def _real_initialize(self):
+        cookie = (self._download_json(
+            'http://localization.services.pbs.org/localize/auto/cookie/',
+            None, headers=self.geo_verification_headers(), fatal=False) or {}).get('cookie')
+        if cookie:
+            station = self._search_regex(r'#?s=\["([^"]+)"', cookie, 'station')
+            if station:
+                self._set_cookie('.pbs.org', 'pbsol.station', station)
+
      def _extract_webpage(self, url):
          mobj = re.match(self._VALID_URL, url)
  
@@ -476,7 +485,8 @@ def extract_redirect_urls(info):
  
              redirect_info = self._download_json(
                  '%s?format=json' % redirect['url'], display_id,
-                'Downloading %s video url info' % (redirect_id or num))
+                'Downloading %s video url info' % (redirect_id or num),
+                headers=self.geo_verification_headers())
  
              if redirect_info['status'] == 'error':
                  raise ExtractorError(
@@ -558,7 +568,7 @@ def extract_redirect_urls(info):
          # Try turning it to 'program - title' naming scheme if possible
          alt_title = info.get('program', {}).get('title')
          if alt_title:
-            info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + '[\s\-:]+', '', info['title'])
+            info['title'] = alt_title + ' - ' + re.sub(r'^' + alt_title + r'[\s\-:]+', '', info['title'])
  
          description = info.get('description') or info.get(
              'program', {}).get('description') or description
diff --git a/youtube_dl/extractor/people.py b/youtube_dl/extractor/people.py
index 9ecdbc13b7535765222b422c815e0dc78f2f69b9..6ca95715eec0a2ac74bd5529d65f9a1e6347211f 100644
--- a/youtube_dl/extractor/people.py
+++ b/youtube_dl/extractor/people.py
@@ -14,7 +14,7 @@ class PeopleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Astronaut Love Triangle Victim Speaks Out: “The Crime in 2007 Hasn’t Defined Us”',
              'description': 'Colleen Shipman speaks to PEOPLE for the first time about life after the attack',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 246.318,
              'timestamp': 1458720585,
              'upload_date': '20160323',
diff --git a/youtube_dl/extractor/phoenix.py b/youtube_dl/extractor/phoenix.py
index ac009f60f7785ea4efaaa7b0c867c10a998e877e..e435c28e171b9a25d3cc839b7b67a6c0d27e2272 100644
--- a/youtube_dl/extractor/phoenix.py
+++ b/youtube_dl/extractor/phoenix.py
@@ -1,9 +1,9 @@
  from __future__ import unicode_literals
  
-from .zdf import ZDFIE
+from .dreisat import DreiSatIE
  
  
-class PhoenixIE(ZDFIE):
+class PhoenixIE(DreiSatIE):
      IE_NAME = 'phoenix.de'
      _VALID_URL = r'''(?x)https?://(?:www\.)?phoenix\.de/content/
          (?:
diff --git a/youtube_dl/extractor/piksel.py b/youtube_dl/extractor/piksel.py
new file mode 100644
index 0000000..d44edcd
--- /dev/null
+++ b/youtube_dl/extractor/piksel.py
@@ -0,0 +1,106 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    dict_get,
+    int_or_none,
+    unescapeHTML,
+    parse_iso8601,
+)
+
+
+class PikselIE(InfoExtractor):
+    _VALID_URL = r'https?://player\.piksel\.com/v/(?P<id>[a-z0-9]+)'
+    _TEST = {
+        'url': 'http://player.piksel.com/v/nv60p12f',
+        'md5': 'd9c17bbe9c3386344f9cfd32fad8d235',
+        'info_dict': {
+            'id': 'nv60p12f',
+            'ext': 'mp4',
+            'title': 'فن الحياة  - الحلقة 1',
+            'description': 'احدث برامج الداعية الاسلامي " مصطفي حسني " فى رمضان 2016علي النهار نور',
+            'timestamp': 1465231790,
+            'upload_date': '20160606',
+        }
+    }
+
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=["\'](?P<url>(?:https?:)?//player\.piksel\.com/v/[a-z0-9]+)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+        app_token = self._search_regex(
+            r'clientAPI\s*:\s*"([^"]+)"', webpage, 'app token')
+        response = self._download_json(
+            'http://player.piksel.com/ws/ws_program/api/%s/mode/json/apiv/5' % app_token,
+            video_id, query={
+                'v': video_id
+            })['response']
+        failure = response.get('failure')
+        if failure:
+            raise ExtractorError(response['failure']['reason'], expected=True)
+        video_data = response['WsProgramResponse']['program']['asset']
+        title = video_data['title']
+
+        formats = []
+
+        m3u8_url = dict_get(video_data, [
+            'm3u8iPadURL',
+            'ipadM3u8Url',
+            'm3u8AndroidURL',
+            'm3u8iPhoneURL',
+            'iphoneM3u8Url'])
+        if m3u8_url:
+            formats.extend(self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native',
+                m3u8_id='hls', fatal=False))
+
+        asset_type = dict_get(video_data, ['assetType', 'asset_type'])
+        for asset_file in video_data.get('assetFiles', []):
+            # TODO: extract rtmp formats
+            http_url = asset_file.get('http_url')
+            if not http_url:
+                continue
+            tbr = None
+            vbr = int_or_none(asset_file.get('videoBitrate'), 1024)
+            abr = int_or_none(asset_file.get('audioBitrate'), 1024)
+            if asset_type == 'video':
+                tbr = vbr + abr
+            elif asset_type == 'audio':
+                tbr = abr
+
+            format_id = ['http']
+            if tbr:
+                format_id.append(compat_str(tbr))
+
+            formats.append({
+                'format_id': '-'.join(format_id),
+                'url': unescapeHTML(http_url),
+                'vbr': vbr,
+                'abr': abr,
+                'width': int_or_none(asset_file.get('videoWidth')),
+                'height': int_or_none(asset_file.get('videoHeight')),
+                'filesize': int_or_none(asset_file.get('filesize')),
+                'tbr': tbr,
+            })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'thumbnail': video_data.get('thumbnailUrl'),
+            'timestamp': parse_iso8601(video_data.get('dateadd')),
+            'formats': formats,
+        }
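
In the format loop of piksel.py above, `vbr` and `abr` come from `int_or_none`, which returns None for a missing field, so `vbr + abr` would fail if an asset file lacks one of the two bitrates. The sketch below shows one None-tolerant way to compute the total. It is not what the commit does: `total_bitrate` is a hypothetical helper, and `to_int_or_none` is a simplified stand-in for youtube-dl's `int_or_none`, assumed here to divide by `scale` in the same way.

```python
def to_int_or_none(value, scale=1):
    """Simplified stand-in for int_or_none: None or unparsable input gives None."""
    if value is None or value == '':
        return None
    try:
        return int(value) // scale
    except (TypeError, ValueError):
        return None


def total_bitrate(asset_type, vbr, abr):
    """Return the total bitrate, or None when it cannot be determined."""
    if asset_type == 'video':
        # Only sum when both parts are known; otherwise leave tbr unset.
        if vbr is None or abr is None:
            return None
        return vbr + abr
    if asset_type == 'audio':
        return abr
    return None


# Hypothetical asset file with a missing audio bitrate.
asset_file = {'videoBitrate': '2048000'}
vbr = to_int_or_none(asset_file.get('videoBitrate'), 1024)
abr = to_int_or_none(asset_file.get('audioBitrate'), 1024)
print(total_bitrate('video', vbr, abr))
```

Returning None for an incomplete pair keeps the `if tbr:` check in the extractor meaningful, since the format id then simply omits the bitrate suffix.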
diff --git a/youtube_dl/extractor/pinkbike.py b/youtube_dl/extractor/pinkbike.py
index a52210fabf538bff34cc39029f07778755873b65..6a4580d54c8733166316c80a2361499e85eb9baf 100644
--- a/youtube_dl/extractor/pinkbike.py
+++ b/youtube_dl/extractor/pinkbike.py
@@ -23,7 +23,7 @@ class PinkbikeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Brandon Semenuk - RAW 100',
              'description': 'Official release: www.redbull.ca/rupertwalker',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 100,
              'upload_date': '20150406',
              'uploader': 'revelco',
diff --git a/youtube_dl/extractor/pladform.py b/youtube_dl/extractor/pladform.py
index 77e1211d6095cf17464ce09a27b756157b4931e9..e38c7618e4d29177721f21a36479b7cbd3d0cf28 100644
--- a/youtube_dl/extractor/pladform.py
+++ b/youtube_dl/extractor/pladform.py
@@ -34,7 +34,7 @@ class PladformIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Тайны перевала Дятлова • 1 серия 2 часть',
              'description': 'Документальный сериал-расследование одной из самых жутких тайн ХХ века',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 694,
              'age_limit': 0,
          },
diff --git a/youtube_dl/extractor/playtvak.py b/youtube_dl/extractor/playtvak.py
index 1e8096a259ad5568d87b96bd566f646ae641862f..391e1bd09ca5677d196c0f67a86c0cb1421b2158 100644
--- a/youtube_dl/extractor/playtvak.py
+++ b/youtube_dl/extractor/playtvak.py
@@ -25,7 +25,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Vyžeňte vosy a sršně ze zahrady',
              'description': 'md5:f93d398691044d303bc4a3de62f3e976',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'duration': 279,
              'timestamp': 1438732860,
              'upload_date': '20150805',
@@ -38,7 +38,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'flv',
              'title': 're:^Přímý přenos iDNES.cz [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'description': 'Sledujte provoz na ranveji Letiště Václava Havla v Praze',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'is_live': True,
          },
          'params': {
@@ -52,7 +52,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Zavřeli jsme mraženou pizzu do auta. Upekla se',
              'description': 'md5:01e73f02329e2e5760bd5eed4d42e3c2',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'duration': 39,
              'timestamp': 1438969140,
              'upload_date': '20150807',
@@ -66,7 +66,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Táhni! Demonstrace proti imigrantům budila emoce',
              'description': 'md5:97c81d589a9491fbfa323c9fa3cca72c',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'timestamp': 1439052180,
              'upload_date': '20150808',
              'is_live': False,
@@ -79,7 +79,7 @@ class PlaytvakIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Recesisté udělali z billboardu kolotoč',
              'description': 'md5:7369926049588c3989a66c9c1a043c4c',
-            'thumbnail': 're:(?i)^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:(?i)^https?://.*\.(?:jpg|png)$',
              'timestamp': 1415725500,
              'upload_date': '20141111',
              'is_live': False,
diff --git a/youtube_dl/extractor/playvid.py b/youtube_dl/extractor/playvid.py
index 79c2db08541e93d1d377c53c3e8adc415f4302e2..4aef186ea22b4dab1be50a0bdd6dbcbbcae1e2b1 100644
--- a/youtube_dl/extractor/playvid.py
+++ b/youtube_dl/extractor/playvid.py
@@ -34,7 +34,7 @@ class PlayvidIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ellen Euro Cutie Blond Takes a Sexy Survey Get Facial in The Park',
              'age_limit': 18,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }]
  
diff --git a/youtube_dl/extractor/playwire.py b/youtube_dl/extractor/playwire.py
index 0bc7431189a0eed819fb85a6fbbdc1558a4b84ed..4d96a10a7156225140eee7b1332a282316efb928 100644
--- a/youtube_dl/extractor/playwire.py
+++ b/youtube_dl/extractor/playwire.py
@@ -18,7 +18,7 @@ class PlaywireIE(InfoExtractor):
              'id': '3353705',
              'ext': 'mp4',
              'title': 'S04_RM_UCL_Rus',
-            'thumbnail': 're:^https?://.*\.png$',
+            'thumbnail': r're:^https?://.*\.png$',
              'duration': 145.94,
          },
      }, {
diff --git a/youtube_dl/extractor/pluralsight.py b/youtube_dl/extractor/pluralsight.py
index 0ffd41ecd3b73bdaaba3b27cd1638cdf0383103e..5c798e874837ff1704650fa991d5b07cde8ab210 100644
--- a/youtube_dl/extractor/pluralsight.py
+++ b/youtube_dl/extractor/pluralsight.py
@@ -157,13 +157,10 @@ def _real_extract(self, url):
  
          display_id = '%s-%s' % (name, clip_id)
  
-        parsed_url = compat_urlparse.urlparse(url)
-
-        payload_url = compat_urlparse.urlunparse(parsed_url._replace(
-            netloc='app.pluralsight.com', path='player/api/v1/payload'))
-
          course = self._download_json(
-            payload_url, display_id, headers={'Referer': url})['payload']['course']
+            'https://app.pluralsight.com/player/user/api/v1/player/payload',
+            display_id, data=urlencode_postdata({'courseId': course_name}),
+            headers={'Referer': url})
  
          collection = course['modules']
  
diff --git a/youtube_dl/extractor/polskieradio.py b/youtube_dl/extractor/polskieradio.py
index 5ff173774a410bf0eba85069f6f1ed33cd583e7b..2ac1fcb0bc90f500696ca0dc29db9f4c911f0d27 100644
--- a/youtube_dl/extractor/polskieradio.py
+++ b/youtube_dl/extractor/polskieradio.py
@@ -36,7 +36,7 @@ class PolskieRadioIE(InfoExtractor):
                  'timestamp': 1456594200,
                  'upload_date': '20160227',
                  'duration': 2364,
-                'thumbnail': 're:^https?://static\.prsa\.pl/images/.*\.jpg$'
+                'thumbnail': r're:^https?://static\.prsa\.pl/images/.*\.jpg$'
              },
          }],
      }, {
diff --git a/youtube_dl/extractor/porncom.py b/youtube_dl/extractor/porncom.py
index d85e0294df62d7540304f2a8e87c4f989fcc2e07..8218c7d3bf7ddc8cc7de74f2fc5d2d838cecc982 100644
--- a/youtube_dl/extractor/porncom.py
+++ b/youtube_dl/extractor/porncom.py
@@ -22,7 +22,7 @@ class PornComIE(InfoExtractor):
              'display_id': 'teen-grabs-a-dildo-and-fucks-her-pussy-live-on-1hottie-i-rec',
              'ext': 'mp4',
              'title': 'Teen grabs a dildo and fucks her pussy live on 1hottie, I rec',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 551,
              'view_count': int,
              'age_limit': 18,
diff --git a/youtube_dl/extractor/pornflip.py b/youtube_dl/extractor/pornflip.py
new file mode 100644
index 0000000..a4a5d39
--- /dev/null
+++ b/youtube_dl/extractor/pornflip.py
@@ -0,0 +1,92 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import (
+    compat_parse_qs,
+    compat_str,
+)
+from ..utils import (
+    int_or_none,
+    try_get,
+    unified_timestamp,
+)
+
+
+class PornFlipIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?pornflip\.com/(?:v|embed)/(?P<id>[0-9A-Za-z]{11})'
+    _TESTS = [{
+        'url': 'https://www.pornflip.com/v/wz7DfNhMmep',
+        'md5': '98c46639849145ae1fd77af532a9278c',
+        'info_dict': {
+            'id': 'wz7DfNhMmep',
+            'ext': 'mp4',
+            'title': '2 Amateurs swallow make his dream cumshots true',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'duration': 112,
+            'timestamp': 1481655502,
+            'upload_date': '20161213',
+            'uploader_id': '106786',
+            'uploader': 'figifoto',
+            'view_count': int,
+            'age_limit': 18,
+        }
+    }, {
+        'url': 'https://www.pornflip.com/embed/wz7DfNhMmep',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'https://www.pornflip.com/v/%s' % video_id, video_id)
+
+        flashvars = compat_parse_qs(self._search_regex(
+            r'<embed[^>]+flashvars=(["\'])(?P<flashvars>(?:(?!\1).)+)\1',
+            webpage, 'flashvars', group='flashvars'))
+
+        title = flashvars['video_vars[title]'][0]
+
+        def flashvar(kind):
+            return try_get(
+                flashvars, lambda x: x['video_vars[%s]' % kind][0], compat_str)
+
+        formats = []
+        for key, value in flashvars.items():
+            if not (value and isinstance(value, list)):
+                continue
+            format_url = value[0]
+            if key == 'video_vars[hds_manifest]':
+                formats.extend(self._extract_mpd_formats(
+                    format_url, video_id, mpd_id='dash', fatal=False))
+                continue
+            height = self._search_regex(
+                r'video_vars\[video_urls\]\[(\d+)', key, 'height', default=None)
+            if not height:
+                continue
+            formats.append({
+                'url': format_url,
+                'format_id': 'http-%s' % height,
+                'height': int_or_none(height),
+            })
+        self._sort_formats(formats)
+
+        uploader = self._html_search_regex(
+            (r'<span[^>]+class="name"[^>]*>\s*<a[^>]+>\s*<strong>(?P<uploader>[^<]+)',
+             r'<meta[^>]+content=(["\'])[^>]*\buploaded by (?P<uploader>.+?)\1'),
+            webpage, 'uploader', fatal=False, group='uploader')
+
+        return {
+            'id': video_id,
+            'formats': formats,
+            'title': title,
+            'thumbnail': flashvar('big_thumb'),
+            'duration': int_or_none(flashvar('duration')),
+            'timestamp': unified_timestamp(self._html_search_meta(
+                'uploadDate', webpage, 'timestamp')),
+            'uploader_id': flashvar('author_id'),
+            'uploader': uploader,
+            'view_count': int_or_none(flashvar('views')),
+            'age_limit': 18,
+        }
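
The HTTP format extraction in pornflip.py above relies on the query-string parser returning a dict of lists and on the height being embedded in the key name. The sketch below reproduces just that step under stated assumptions: it uses Python 3's `urllib.parse.parse_qs` in place of `compat_parse_qs`, the helper name `http_formats` is hypothetical, and the flashvars string and URLs are synthetic.

```python
import re
from urllib.parse import parse_qs


def http_formats(flashvars_qs):
    """Collect height-keyed HTTP formats from a flashvars query string (sketch)."""
    flashvars = parse_qs(flashvars_qs)
    formats = []
    for key, value in flashvars.items():
        # parse_qs maps each key to a list of values; skip anything empty.
        if not value:
            continue
        mobj = re.search(r'video_vars\[video_urls\]\[(\d+)', key)
        if not mobj:
            continue
        height = int(mobj.group(1))
        formats.append({
            'url': value[0],
            'format_id': 'http-%d' % height,
            'height': height,
        })
    # Lowest quality first, similar in spirit to _sort_formats.
    return sorted(formats, key=lambda f: f['height'])


sample = (
    'video_vars[title]=Demo'
    '&video_vars[video_urls][720p]=http://example.com/720.mp4'
    '&video_vars[video_urls][360p]=http://example.com/360.mp4'
)
for fmt in http_formats(sample):
    print(fmt['format_id'], fmt['url'])
```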
diff --git a/youtube_dl/extractor/pornhd.py b/youtube_dl/extractor/pornhd.py
index 8df12eec0d44c371d99b536b55694cfd2211f9d0..842317e6c9cc2312064fae4e61e5703352aa0096 100644
--- a/youtube_dl/extractor/pornhd.py
+++ b/youtube_dl/extractor/pornhd.py
@@ -21,7 +21,7 @@ class PornHdIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Restroom selfie masturbation',
              'description': 'md5:3748420395e03e31ac96857a8f125b2b',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'view_count': int,
              'age_limit': 18,
          }
@@ -35,7 +35,7 @@ class PornHdIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Sierra loves doing laundry',
              'description': 'md5:8ff0523848ac2b8f9b065ba781ccf294',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'view_count': int,
              'age_limit': 18,
          },
diff --git a/youtube_dl/extractor/pornhub.py b/youtube_dl/extractor/pornhub.py
index 40dbe6967fac2126b7bf6e6a1245768b3c039c8e..3eaf56973ec35072d8f0549c5850357ca94ed12b 100644
--- a/youtube_dl/extractor/pornhub.py
+++ b/youtube_dl/extractor/pornhub.py
@@ -229,7 +229,14 @@ def _real_extract(self, url):
  
          webpage = self._download_webpage(url, playlist_id)
  
-        entries = self._extract_entries(webpage)
+        # Only process container div with main playlist content skipping
+        # drop-down menu that uses similar pattern for videos (see
+        # https://github.com/rg3/youtube-dl/issues/11594).
+        container = self._search_regex(
+            r'(?s)(<div[^>]+class=["\']container.+)', webpage,
+            'container', default=webpage)
+
+        entries = self._extract_entries(container)
  
          playlist = self._parse_json(
              self._search_regex(
@@ -243,12 +250,12 @@ def _real_extract(self, url):
  class PornHubPlaylistIE(PornHubPlaylistBaseIE):
      _VALID_URL = r'https?://(?:www\.)?pornhub\.com/playlist/(?P<id>\d+)'
      _TESTS = [{
-        'url': 'http://www.pornhub.com/playlist/6201671',
+        'url': 'http://www.pornhub.com/playlist/4667351',
          'info_dict': {
-            'id': '6201671',
-            'title': 'P0p4',
+            'id': '4667351',
+            'title': 'Nataly Hot',
          },
-        'playlist_mincount': 35,
+        'playlist_mincount': 2,
      }]
  
  
diff --git a/youtube_dl/extractor/pornotube.py b/youtube_dl/extractor/pornotube.py
index 63816c3588cebe889e77a24a928cc789ef07c7d5..1b5b9a320dcd31a0f28ad6ed8a20555008072d88 100644
--- a/youtube_dl/extractor/pornotube.py
+++ b/youtube_dl/extractor/pornotube.py
@@ -19,7 +19,7 @@ class PornotubeIE(InfoExtractor):
              'description': 'md5:a8304bef7ef06cb4ab476ca6029b01b0',
              'categories': ['Adult Humor', 'Blondes'],
              'uploader': 'Alpha Blue Archives',
-            'thumbnail': 're:^https?://.*\\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1417582800,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/pornovoisines.py b/youtube_dl/extractor/pornovoisines.py
index 58f557e3995f25a3787018150c953cb088e4fe81..b6b71069d31070644afd46e2d92a939e4b33f744 100644
--- a/youtube_dl/extractor/pornovoisines.py
+++ b/youtube_dl/extractor/pornovoisines.py
@@ -23,7 +23,7 @@ class PornoVoisinesIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Recherche appartement',
              'description': 'md5:fe10cb92ae2dd3ed94bb4080d11ff493',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140925',
              'duration': 120,
              'view_count': int,
diff --git a/youtube_dl/extractor/pornoxo.py b/youtube_dl/extractor/pornoxo.py
index 3c9087f2dfe3caa30c879f4905e857a046fd789c..1a0cce7e0274bb4a06bf9b0604d9ebdf75cf3df5 100644
--- a/youtube_dl/extractor/pornoxo.py
+++ b/youtube_dl/extractor/pornoxo.py
@@ -20,7 +20,7 @@ class PornoXOIE(JWPlatformBaseIE):
              'display_id': 'striptease-from-sexy-secretary',
              'description': 'md5:0ee35252b685b3883f4a1d38332f9980',
              'categories': list,  # NSFW
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }
diff --git a/youtube_dl/extractor/presstv.py b/youtube_dl/extractor/presstv.py
index 2da93ed348671a363120d03cb626a4e9d808fd9d..b5c279203b9486e765f6e79f2d9ce8b67acf73b1 100644
--- a/youtube_dl/extractor/presstv.py
+++ b/youtube_dl/extractor/presstv.py
@@ -19,7 +19,7 @@ class PressTVIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Organic mattresses used to clean waste water',
              'upload_date': '20160409',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'description': 'md5:20002e654bbafb6908395a5c0cfcd125'
          }
      }
diff --git a/youtube_dl/extractor/promptfile.py b/youtube_dl/extractor/promptfile.py
index d40cca06f989b7c99329e1650497a06e9a6390e4..23ac93d7e248bce034fcb221d26089d8be412ee2 100644
--- a/youtube_dl/extractor/promptfile.py
+++ b/youtube_dl/extractor/promptfile.py
@@ -20,7 +20,7 @@ class PromptFileIE(InfoExtractor):
              'id': '86D1CE8462-576CAAE416',
              'ext': 'mp4',
              'title': 'oceans.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/prosiebensat1.py b/youtube_dl/extractor/prosiebensat1.py
index 7cc07a2ad5b88c51aa9f5d339839fd743727e17e..5091d8456faf3a4841ba770ea408aa75be806f83 100644
--- a/youtube_dl/extractor/prosiebensat1.py
+++ b/youtube_dl/extractor/prosiebensat1.py
@@ -85,6 +85,9 @@ def fix_bitrate(bitrate):
                      formats.extend(self._extract_m3u8_formats(
                          source_url, clip_id, 'mp4', 'm3u8_native',
                          m3u8_id='hls', fatal=False))
+                elif mimetype == 'application/dash+xml':
+                    formats.extend(self._extract_mpd_formats(
+                        source_url, clip_id, mpd_id='dash', fatal=False))
                  else:
                      tbr = fix_bitrate(source['bitrate'])
                      if protocol in ('rtmp', 'rtmpe'):
@@ -144,16 +147,12 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              'url': 'http://www.prosieben.de/tv/circus-halligalli/videos/218-staffel-2-episode-18-jahresrueckblick-ganze-folge',
              'info_dict': {
                  'id': '2104602',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Episode 18 - Staffel 2',
                  'description': 'md5:8733c81b702ea472e069bc48bb658fc1',
                  'upload_date': '20131231',
                  'duration': 5845.04,
              },
-            'params': {
-                # rtmp download
-                'skip_download': True,
-            },
          },
          {
              'url': 'http://www.prosieben.de/videokatalog/Gesellschaft/Leben/Trends/video-Lady-Umstyling-f%C3%BCr-Audrina-Rebekka-Audrina-Fergen-billig-aussehen-Battal-Modica-700544.html',
@@ -255,7 +254,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              'url': 'http://www.the-voice-of-germany.de/video/31-andreas-kuemmert-rocket-man-clip',
              'info_dict': {
                  'id': '2572814',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Andreas Kümmert: Rocket Man',
                  'description': 'md5:6ddb02b0781c6adf778afea606652e38',
                  'upload_date': '20131017',
@@ -269,7 +268,7 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
              'url': 'http://www.fem.com/wellness/videos/wellness-video-clip-kurztripps-zum-valentinstag.html',
              'info_dict': {
                  'id': '2156342',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Kurztrips zum Valentinstag',
                  'description': 'Romantischer Kurztrip zum Valentinstag? Nina Heinemann verrät, was sich hier wirklich lohnt.',
                  'duration': 307.24,
@@ -286,12 +285,13 @@ class ProSiebenSat1IE(ProSiebenSat1BaseIE):
                  'description': 'md5:63b8963e71f481782aeea877658dec84',
              },
              'playlist_count': 2,
+            'skip': 'This video is unavailable',
          },
          {
              'url': 'http://www.7tv.de/circus-halligalli/615-best-of-circus-halligalli-ganze-folge',
              'info_dict': {
                  'id': '4187506',
-                'ext': 'flv',
+                'ext': 'mp4',
                  'title': 'Best of Circus HalliGalli',
                  'description': 'md5:8849752efd90b9772c9db6fdf87fb9e9',
                  'upload_date': '20151229',
@@ -372,7 +372,9 @@ def _extract_clip(self, url, webpage):
          title = self._html_search_regex(self._TITLE_REGEXES, webpage, 'title')
          info = self._extract_video_info(url, clip_id)
          description = self._html_search_regex(
-            self._DESCRIPTION_REGEXES, webpage, 'description', fatal=False)
+            self._DESCRIPTION_REGEXES, webpage, 'description', default=None)
+        if description is None:
+            description = self._og_search_description(webpage)
          thumbnail = self._og_search_thumbnail(webpage)
          upload_date = unified_strdate(self._html_search_regex(
              self._UPLOAD_DATE_REGEXES, webpage, 'upload date', default=None))
@@ -391,7 +393,7 @@ def _extract_playlist(self, url, webpage):
              self._PLAYLIST_ID_REGEXES, webpage, 'playlist id')
          playlist = self._parse_json(
              self._search_regex(
-                'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
+                r'var\s+contentResources\s*=\s*(\[.+?\]);\s*</script',
                  webpage, 'playlist'),
              playlist_id)
          entries = []
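Note on the `_extract_clip` hunk above: switching from `fatal=False` to `default=None` makes a missing description a silent miss rather than a warning, and the new lines then fall back to the Open Graph description. A minimal standalone sketch of that fallback pattern, assuming plain `re` instead of the extractor helpers; `search_first`, `extract_description` and the `og:description` pattern are hypothetical illustrations, not youtube-dl code:

```python
import re

# Hypothetical stand-in for an Open Graph lookup; the real helper is more lenient.
OG_DESCRIPTION_RE = r'<meta[^>]+property="og:description"[^>]+content="([^"]*)"'


def search_first(patterns, text, default=None):
    """Return group 1 of the first pattern that matches, else `default`."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(1)
    return default


def extract_description(webpage, description_regexes):
    # First try the site-specific patterns without complaining on a miss ...
    description = search_first(description_regexes, webpage, default=None)
    # ... then fall back to the generic Open Graph tag.
    if description is None:
        description = search_first([OG_DESCRIPTION_RE], webpage, default=None)
    return description


page = '<meta property="og:description" content="Fallback text">'
print(extract_description(page, [r'<p class="desc">([^<]+)</p>']))
```

The site-specific patterns still win whenever they match; the fallback only runs when they all miss.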
diff --git a/youtube_dl/extractor/puls4.py b/youtube_dl/extractor/puls4.py
index 1c54af0022f087788d6bb11a25639f1a184b42b8..80091b85f88db2025982d195a54c64e164152881 100644
--- a/youtube_dl/extractor/puls4.py
+++ b/youtube_dl/extractor/puls4.py
@@ -10,7 +10,7 @@
  
  
  class Puls4IE(ProSiebenSat1BaseIE):
-    _VALID_URL = r'https?://(?:www\.)?puls4\.com/(?P<id>(?:[^/]+/)*?videos/[^?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?puls4\.com/(?P<id>[^?#&]+)'
      _TESTS = [{
          'url': 'http://www.puls4.com/2-minuten-2-millionen/staffel-3/videos/2min2miotalk/Tobias-Homberger-von-myclubs-im-2min2miotalk-118118',
          'md5': 'fd3c6b0903ac72c9d004f04bc6bb3e03',
@@ -22,6 +22,12 @@ class Puls4IE(ProSiebenSat1BaseIE):
              'upload_date': '20160830',
              'uploader': 'PULS_4',
          },
+    }, {
+        'url': 'http://www.puls4.com/pro-und-contra/wer-wird-prasident/Ganze-Folgen/Wer-wird-Praesident.-Norbert-Hofer',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.puls4.com/pro-und-contra/wer-wird-prasident/Ganze-Folgen/Wer-wird-Praesident-Analyse-des-Interviews-mit-Norbert-Hofer-416598',
+        'only_matching': True,
      }]
      _TOKEN = 'puls4'
      _SALT = '01!kaNgaiNgah1Ie4AeSha'
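Note on the `_VALID_URL` change above: the old pattern required a `videos/` path segment, so the two new `only_matching` URLs (under `Ganze-Folgen/`) would not have matched. A small check with both patterns copied from the hunk; the `matches` helper is only an illustration and assumes the pattern is applied with `re.match`:

```python
import re

OLD_VALID_URL = r'https?://(?:www\.)?puls4\.com/(?P<id>(?:[^/]+/)*?videos/[^?#]+)'
NEW_VALID_URL = r'https?://(?:www\.)?puls4\.com/(?P<id>[^?#&]+)'

VIDEOS_URL = 'http://www.puls4.com/2-minuten-2-millionen/staffel-3/videos/2min2miotalk/Tobias-Homberger-von-myclubs-im-2min2miotalk-118118'
FULL_EPISODE_URL = 'http://www.puls4.com/pro-und-contra/wer-wird-prasident/Ganze-Folgen/Wer-wird-Praesident.-Norbert-Hofer'


def matches(pattern, url):
    """True if the pattern matches from the start of the URL."""
    return re.match(pattern, url) is not None


for url in (VIDEOS_URL, FULL_EPISODE_URL):
    print(matches(OLD_VALID_URL, url), matches(NEW_VALID_URL, url), url)
```

The new pattern accepts any path up to the first `?`, `#` or `&`, so it is considerably broader than the old one.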
diff --git a/youtube_dl/extractor/qqmusic.py b/youtube_dl/extractor/qqmusic.py
index 37cb9e2c9dded7c9fa6e1e9eeef4ebeccdf9b4a9..17c27da46da7576205afba3c53254728f711d974 100644
--- a/youtube_dl/extractor/qqmusic.py
+++ b/youtube_dl/extractor/qqmusic.py
@@ -29,7 +29,7 @@ class QQMusicIE(InfoExtractor):
              'release_date': '20141227',
              'creator': '林俊杰',
              'description': 'md5:d327722d0361576fde558f1ac68a7065',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'note': 'There is no mp3-320 version of this song.',
@@ -42,7 +42,7 @@ class QQMusicIE(InfoExtractor):
              'release_date': '20050626',
              'creator': '李季美',
              'description': 'md5:46857d5ed62bc4ba84607a805dccf437',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'note': 'lyrics not in .lrc format',
@@ -54,7 +54,7 @@ class QQMusicIE(InfoExtractor):
              'release_date': '19970225',
              'creator': 'Dark Funeral',
              'description': 'md5:ed14d5bd7ecec19609108052c25b2c11',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              'skip_download': True,
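Note on the recurring `'re:...'` to `r're:...'` change in this and the following files: `\.` is not a recognised escape in an ordinary Python string literal, so both spellings produce the same value, but newer Python versions warn about the non-raw form. A minimal sketch that evaluates both spellings from source text so that any warning is captured rather than printed; stripping the `re:` prefix and matching with `re.match` is an assumption about how the test harness treats such values:

```python
import ast
import re
import warnings

# Source text of the two literals, as they appear before and after the change.
NON_RAW_SRC = r"'re:^https?://.*\.jpg$'"
RAW_SRC = r"r're:^https?://.*\.jpg$'"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    non_raw_value = ast.literal_eval(NON_RAW_SRC)
    raw_value = ast.literal_eval(RAW_SRC)

# Both literals denote the same string; only the non-raw one may warn.
print(non_raw_value == raw_value)
print([w.category.__name__ for w in caught])

# Assumed harness behaviour: drop the 're:' prefix, then match the actual value.
pattern = raw_value[len('re:'):]
print(bool(re.match(pattern, 'https://example.com/cover.jpg')))
```

The change is therefore behaviour-preserving for these test expectations; it only removes the invalid escape sequences.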
diff --git a/youtube_dl/extractor/r7.py b/youtube_dl/extractor/r7.py
index 069dbfaed0638e396d024ec81d5142d18f9ad90f..ed38c77ebb6bdeaacabff4b565fe121ee86d07fb 100644
--- a/youtube_dl/extractor/r7.py
+++ b/youtube_dl/extractor/r7.py
@@ -23,7 +23,7 @@ class R7IE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Policiais humilham suspeito à beira da morte: "Morre com dignidade"',
              'description': 'md5:01812008664be76a6479aa58ec865b72',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 98,
              'like_count': int,
              'view_count': int,
diff --git a/youtube_dl/extractor/radiobremen.py b/youtube_dl/extractor/radiobremen.py
index 0aa8d059bf81dffd28df727650b20aafc49302eb..2c35f9845177b6bda4dd352f207ee2e36efdcb8d 100644
--- a/youtube_dl/extractor/radiobremen.py
+++ b/youtube_dl/extractor/radiobremen.py
@@ -20,7 +20,7 @@ class RadioBremenIE(InfoExtractor):
              'duration': 178,
              'width': 512,
              'title': 'Druck auf Patrick Öztürk',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'description': 'Gegen den SPD-Bürgerschaftsabgeordneten Patrick Öztürk wird wegen Beihilfe zum gewerbsmäßigen Betrug ermittelt. Am Donnerstagabend sollte er dem Vorstand des SPD-Unterbezirks Bremerhaven dazu Rede und Antwort stehen.',
          },
      }
diff --git a/youtube_dl/extractor/radiode.py b/youtube_dl/extractor/radiode.py
index aa5f6f8ad41d1dcdb3cb975e2fcf883c8d3ac7f9..2c06c8b1e416c4d8c0d98f3a917c0823064458f8 100644
--- a/youtube_dl/extractor/radiode.py
+++ b/youtube_dl/extractor/radiode.py
@@ -13,7 +13,7 @@ class RadioDeIE(InfoExtractor):
              'ext': 'mp3',
              'title': 're:^NDR 2 [0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}$',
              'description': 'md5:591c49c702db1a33751625ebfb67f273',
-            'thumbnail': 're:^https?://.*\.png',
+            'thumbnail': r're:^https?://.*\.png',
              'is_live': True,
          },
          'params': {
diff --git a/youtube_dl/extractor/radiojavan.py b/youtube_dl/extractor/radiojavan.py
index ec4fa6e602ea779dd6d3a530ea6cfb639eee3cf4..a53ad97a56ef9000ea5ed65fbf0e24276b03f6f3 100644
--- a/youtube_dl/extractor/radiojavan.py
+++ b/youtube_dl/extractor/radiojavan.py
@@ -18,7 +18,7 @@ class RadioJavanIE(InfoExtractor):
              'id': 'chaartaar-ashoobam',
              'ext': 'mp4',
              'title': 'Chaartaar - Ashoobam',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'upload_date': '20150215',
              'view_count': int,
              'like_count': int,
diff --git a/youtube_dl/extractor/rai.py b/youtube_dl/extractor/rai.py
index dc640b1bcb58ddb79c89e5f2346a5bc5c63a3547..41afbd9afa5472fdbd782db06f392abd587cf570 100644
--- a/youtube_dl/extractor/rai.py
+++ b/youtube_dl/extractor/rai.py
@@ -120,7 +120,7 @@ class RaiTVIE(RaiBaseIE):
                  'description': 'md5:f27c544694cacb46a078db84ec35d2d9',
                  'upload_date': '20140407',
                  'duration': 6160,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              }
          },
          {
@@ -133,7 +133,7 @@ class RaiTVIE(RaiBaseIE):
                  'title': 'TG PRIMO TEMPO',
                  'upload_date': '20140612',
                  'duration': 1758,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'skip': 'Geo-restricted to Italy',
          },
@@ -169,7 +169,7 @@ class RaiTVIE(RaiBaseIE):
                  'description': 'md5:364b604f7db50594678f483353164fb8',
                  'upload_date': '20140923',
                  'duration': 386,
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              }
          },
      ]
diff --git a/youtube_dl/extractor/rbmaradio.py b/youtube_dl/extractor/rbmaradio.py
index 471928ef86b5d434953fc694eef0bb7da8edd334..53b82fba3964b519fe3829ad1f7384755e943b7c 100644
--- a/youtube_dl/extractor/rbmaradio.py
+++ b/youtube_dl/extractor/rbmaradio.py
@@ -22,7 +22,7 @@ class RBMARadioIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Main Stage - Ford & Lopatin',
              'description': 'md5:4f340fb48426423530af5a9d87bd7b91',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2452,
              'timestamp': 1307103164,
              'upload_date': '20110603',
diff --git a/youtube_dl/extractor/reuters.py b/youtube_dl/extractor/reuters.py
index 961d504eb261cf4fd05eb50f16401abde5076d42..9dc482d21634965b3a16be80c08ee3b2952eee6c 100644
--- a/youtube_dl/extractor/reuters.py
+++ b/youtube_dl/extractor/reuters.py
@@ -32,7 +32,7 @@ def _real_extract(self, url):
              webpage, 'video data'))
  
          def get_json_value(key, fatal=False):
-            return self._search_regex('"%s"\s*:\s*"([^"]+)"' % key, video_data, key, fatal=fatal)
+            return self._search_regex(r'"%s"\s*:\s*"([^"]+)"' % key, video_data, key, fatal=fatal)
  
          title = unescapeHTML(get_json_value('title', fatal=True))
          mmid, fid = re.search(r',/(\d+)\?f=(\d+)', get_json_value('flv', fatal=True)).groups()
diff --git a/youtube_dl/extractor/reverbnation.py b/youtube_dl/extractor/reverbnation.py
index 4875009e5cafd68867b67393d36d90625e5f29c8..4cb99c244c34369902d085a60b068c5991d37ad8 100644
--- a/youtube_dl/extractor/reverbnation.py
+++ b/youtube_dl/extractor/reverbnation.py
@@ -18,7 +18,7 @@ class ReverbNationIE(InfoExtractor):
              'title': 'MONA LISA',
              'uploader': 'ALKILADOS',
              'uploader_id': '216429',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/ro220.py b/youtube_dl/extractor/ro220.py
index 962b524e94d2bddd12781b8dace06a1b28bc2c71..69934ef2b042903afeb4ca6bada108ec39caf2d1 100644
--- a/youtube_dl/extractor/ro220.py
+++ b/youtube_dl/extractor/ro220.py
@@ -14,7 +14,7 @@ class Ro220IE(InfoExtractor):
              'id': 'LYV6doKo7f',
              'ext': 'mp4',
              'title': 'Luati-le Banii sez 4 ep 1',
-            'description': 're:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$',
+            'description': r're:^Iata-ne reveniti dupa o binemeritata vacanta\. +Va astept si pe Facebook cu pareri si comentarii.$',
          }
      }
  
diff --git a/youtube_dl/extractor/rockstargames.py b/youtube_dl/extractor/rockstargames.py
index 48128e219bf468a1e3a01ec0f1116304ced1221d..cd6904bc935ef4c3a308cee47cff56602a8de691 100644
--- a/youtube_dl/extractor/rockstargames.py
+++ b/youtube_dl/extractor/rockstargames.py
@@ -18,7 +18,7 @@ class RockstarGamesIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Further Adventures in Finance and Felony Trailer',
              'description': 'md5:6d31f55f30cb101b5476c4a379e324a3',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1464876000,
              'upload_date': '20160602',
          }
diff --git a/youtube_dl/extractor/roosterteeth.py b/youtube_dl/extractor/roosterteeth.py
index f5b2f560c7f70c4e341aaf38f9718ea4994b811b..46dfc78f5edac0e9e8ef66f37efa4bbd7afcf3ea 100644
--- a/youtube_dl/extractor/roosterteeth.py
+++ b/youtube_dl/extractor/roosterteeth.py
@@ -26,7 +26,7 @@ class RoosterTeethIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Million Dollars, But...: Million Dollars, But... The Game Announcement',
              'description': 'md5:0cc3b21986d54ed815f5faeccd9a9ca5',
-            'thumbnail': 're:^https?://.*\.png$',
+            'thumbnail': r're:^https?://.*\.png$',
              'series': 'Million Dollars, But...',
              'episode': 'Million Dollars, But... The Game Announcement',
              'comment_count': int,
diff --git a/youtube_dl/extractor/rottentomatoes.py b/youtube_dl/extractor/rottentomatoes.py
index 1d404d20aa8b2223c68cada46e4bfe87613eb6ae..14c8e823698174f60890d9c27535e1dce40c9ce6 100644
--- a/youtube_dl/extractor/rottentomatoes.py
+++ b/youtube_dl/extractor/rottentomatoes.py
@@ -14,7 +14,7 @@ class RottenTomatoesIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Toy Story 3',
              'description': 'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }
  
diff --git a/youtube_dl/extractor/rte.py b/youtube_dl/extractor/rte.py
index ebe563ebb89e86e28a6bf55669cd066aca44d851..a6fac6c35d00327c2858f9aead301845c4af572a 100644
--- a/youtube_dl/extractor/rte.py
+++ b/youtube_dl/extractor/rte.py
@@ -4,118 +4,31 @@
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_HTTPError
  from ..utils import (
      float_or_none,
      parse_iso8601,
      unescapeHTML,
+    ExtractorError,
  )
  
  
-class RteIE(InfoExtractor):
-    IE_NAME = 'rte'
-    IE_DESC = 'Raidió Teilifís Éireann TV'
-    _VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/[^/]+/(?P<id>[0-9]+)'
-    _TEST = {
-        'url': 'http://www.rte.ie/player/ie/show/iwitness-862/10478715/',
-        'info_dict': {
-            'id': '10478715',
-            'ext': 'flv',
-            'title': 'Watch iWitness  online',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'iWitness : The spirit of Ireland, one voice and one minute at a time.',
-            'duration': 60.046,
-        },
-        'params': {
-            'skip_download': 'f4m fails with --test atm'
-        }
-    }
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        title = self._og_search_title(webpage)
-        description = self._html_search_meta('description', webpage, 'description')
-        duration = float_or_none(self._html_search_meta(
-            'duration', webpage, 'duration', fatal=False), 1000)
-
-        thumbnail = None
-        thumbnail_meta = self._html_search_meta('thumbnail', webpage)
-        if thumbnail_meta:
-            thumbnail_id = self._search_regex(
-                r'uri:irus:(.+)', thumbnail_meta,
-                'thumbnail id', fatal=False)
-            if thumbnail_id:
-                thumbnail = 'http://img.rasset.ie/%s.jpg' % thumbnail_id
-
-        feeds_url = self._html_search_meta('feeds-prefix', webpage, 'feeds url') + video_id
-        json_string = self._download_json(feeds_url, video_id)
-
-        # f4m_url = server + relative_url
-        f4m_url = json_string['shows'][0]['media:group'][0]['rte:server'] + json_string['shows'][0]['media:group'][0]['url']
-        f4m_formats = self._extract_f4m_formats(f4m_url, video_id)
-        self._sort_formats(f4m_formats)
-
-        return {
-            'id': video_id,
-            'title': title,
-            'formats': f4m_formats,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-        }
-
-
-class RteRadioIE(InfoExtractor):
-    IE_NAME = 'rte:radio'
-    IE_DESC = 'Raidió Teilifís Éireann radio'
-    # Radioplayer URLs have two distinct specifier formats,
-    # the old format #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
-    # the new format #!rii=b<channel_id>_<id>_<playable_item_id>_<date>_
-    # where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
-    # An <id> uniquely defines an individual recording, and is the only part we require.
-    _VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:b?[0-9]*)(?:%3A|:|%5F|_)(?P<id>[0-9]+)'
-
-    _TESTS = [{
-        # Old-style player URL; HLS and RTMPE formats
-        'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
-        'info_dict': {
-            'id': '10507902',
-            'ext': 'mp4',
-            'title': 'Gloria',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 'md5:9ce124a7fb41559ec68f06387cabddf0',
-            'timestamp': 1451203200,
-            'upload_date': '20151227',
-            'duration': 7230.0,
-        },
-        'params': {
-            'skip_download': 'f4m fails with --test atm'
-        }
-    }, {
-        # New-style player URL; RTMPE formats only
-        'url': 'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_',
-        'info_dict': {
-            'id': '3250678',
-            'ext': 'flv',
-            'title': 'The Lyric Concert with Paul Herriott',
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': '',
-            'timestamp': 1333742400,
-            'upload_date': '20120406',
-            'duration': 7199.016,
-        },
-        'params': {
-            'skip_download': 'f4m fails with --test atm'
-        }
-    }]
-
+class RteBaseIE(InfoExtractor):
      def _real_extract(self, url):
          item_id = self._match_id(url)
  
-        json_string = self._download_json(
-            'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
-            item_id)
+        try:
+            json_string = self._download_json(
+                'http://www.rte.ie/rteavgen/getplaylist/?type=web&format=json&id=' + item_id,
+                item_id)
+        except ExtractorError as ee:
+            if isinstance(ee.cause, compat_HTTPError) and ee.cause.code == 404:
+                error_info = self._parse_json(ee.cause.read().decode(), item_id, fatal=False)
+                if error_info:
+                    raise ExtractorError(
+                        '%s said: %s' % (self.IE_NAME, error_info['message']),
+                        expected=True)
+            raise
  
          # NB the string values in the JSON are stored using XML escaping(!)
          show = json_string['shows'][0]
@@ -163,3 +76,67 @@ def _real_extract(self, url):
              'duration': duration,
              'formats': formats,
          }
+
+
+class RteIE(RteBaseIE):
+    IE_NAME = 'rte'
+    IE_DESC = 'Raidió Teilifís Éireann TV'
+    _VALID_URL = r'https?://(?:www\.)?rte\.ie/player/[^/]{2,3}/show/[^/]+/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.rte.ie/player/ie/show/iwitness-862/10478715/',
+        'md5': '4a76eb3396d98f697e6e8110563d2604',
+        'info_dict': {
+            'id': '10478715',
+            'ext': 'mp4',
+            'title': 'iWitness',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': 'The spirit of Ireland, one voice and one minute at a time.',
+            'duration': 60.046,
+            'upload_date': '20151012',
+            'timestamp': 1444694160,
+        },
+    }
+
+
+class RteRadioIE(RteBaseIE):
+    IE_NAME = 'rte:radio'
+    IE_DESC = 'Raidió Teilifís Éireann radio'
+    # Radioplayer URLs have two distinct specifier formats,
+    # the old format #!rii=<channel_id>:<id>:<playable_item_id>:<date>:
+    # the new format #!rii=b<channel_id>_<id>_<playable_item_id>_<date>_
+    # where the IDs are int/empty, the date is DD-MM-YYYY, and the specifier may be truncated.
+    # An <id> uniquely defines an individual recording, and is the only part we require.
+    _VALID_URL = r'https?://(?:www\.)?rte\.ie/radio/utils/radioplayer/rteradioweb\.html#!rii=(?:b?[0-9]*)(?:%3A|:|%5F|_)(?P<id>[0-9]+)'
+
+    _TESTS = [{
+        # Old-style player URL; HLS and RTMPE formats
+        'url': 'http://www.rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=16:10507902:2414:27-12-2015:',
+        'md5': 'c79ccb2c195998440065456b69760411',
+        'info_dict': {
+            'id': '10507902',
+            'ext': 'mp4',
+            'title': 'Gloria',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': 'md5:9ce124a7fb41559ec68f06387cabddf0',
+            'timestamp': 1451203200,
+            'upload_date': '20151227',
+            'duration': 7230.0,
+        },
+    }, {
+        # New-style player URL; RTMPE formats only
+        'url': 'http://rte.ie/radio/utils/radioplayer/rteradioweb.html#!rii=b16_3250678_8861_06-04-2012_',
+        'info_dict': {
+            'id': '3250678',
+            'ext': 'flv',
+            'title': 'The Lyric Concert with Paul Herriott',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': '',
+            'timestamp': 1333742400,
+            'upload_date': '20120406',
+            'duration': 7199.016,
+        },
+        'params': {
+            # rtmp download
+            'skip_download': True,
+        },
+    }]
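Note on the new `try`/`except` in `RteBaseIE._real_extract` above: a 404 from the playlist endpoint may carry a JSON body with a `message` field, which is surfaced as an expected error, while anything else is re-raised unchanged. A minimal standalone sketch of the same idea using only the standard library; `ExpectedError`, `fetch_playlist` and `fake_fetch_404` are hypothetical names, and unlike the hunk the sketch checks that `message` is present before using it:

```python
import io
import json
from urllib.error import HTTPError


class ExpectedError(Exception):
    """Hypothetical stand-in for an error raised with expected=True."""


def fetch_playlist(fetch, item_id, site_name='rte'):
    """Call fetch(item_id); turn a 404 with a JSON message into ExpectedError."""
    try:
        return fetch(item_id)
    except HTTPError as err:
        if err.code == 404:
            try:
                error_info = json.loads(err.read().decode('utf-8'))
            except ValueError:
                error_info = None  # body was not JSON; fall through to re-raise
            if error_info and 'message' in error_info:
                raise ExpectedError(
                    '%s said: %s' % (site_name, error_info['message']))
        raise


def fake_fetch_404(item_id):
    # Simulated server response, used only to exercise the error path.
    body = io.BytesIO(json.dumps({'message': 'Item not found'}).encode('utf-8'))
    raise HTTPError('http://example.invalid/' + item_id, 404, 'Not Found', {}, body)


try:
    fetch_playlist(fake_fetch_404, '10478715')
except ExpectedError as exc:
    print(exc)
```

Other status codes, and 404 responses without a usable JSON body, propagate as the original exception.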
diff --git a/youtube_dl/extractor/rtl2.py b/youtube_dl/extractor/rtl2.py
index cb4ee88033ba1d761faac452de724a6c44f08503..721ee733ce38c7e4a95c7806b10bbbd346166451 100644
--- a/youtube_dl/extractor/rtl2.py
+++ b/youtube_dl/extractor/rtl2.py
@@ -2,7 +2,9 @@
  from __future__ import unicode_literals
  
  import re
+
  from .common import InfoExtractor
+from ..utils import int_or_none
  
  
  class RTL2IE(InfoExtractor):
@@ -13,7 +15,7 @@ class RTL2IE(InfoExtractor):
              'id': 'folge-203-0',
              'ext': 'f4v',
              'title': 'GRIP sucht den Sommerkönig',
-            'description': 'Matthias, Det und Helge treten gegeneinander an.'
+            'description': 'md5:e3adbb940fd3c6e76fa341b8748b562f'
          },
          'params': {
              # rtmp download
@@ -25,7 +27,7 @@ class RTL2IE(InfoExtractor):
              'id': '21040-anna-erwischt-alex',
              'ext': 'mp4',
              'title': 'Anna erwischt Alex!',
-            'description': 'Anna ist Alex\' Tochter bei Köln 50667.'
+            'description': 'Anna nimmt ihrem Vater nicht ab, dass er nicht spielt. Und tatsächlich erwischt sie ihn auf frischer Tat.'
          },
          'params': {
              # rtmp download
@@ -52,34 +54,47 @@ def _real_extract(self, url):
                  r'vico_id\s*:\s*([0-9]+)', webpage, 'vico_id')
              vivi_id = self._html_search_regex(
                  r'vivi_id\s*:\s*([0-9]+)', webpage, 'vivi_id')
-        info_url = 'http://www.rtl2.de/video/php/get_video.php?vico_id=' + vico_id + '&vivi_id=' + vivi_id
  
-        info = self._download_json(info_url, video_id)
+        info = self._download_json(
+            'http://www.rtl2.de/sites/default/modules/rtl2/mediathek/php/get_video_jw.php',
+            video_id, query={
+                'vico_id': vico_id,
+                'vivi_id': vivi_id,
+            })
          video_info = info['video']
          title = video_info['titel']
-        description = video_info.get('beschreibung')
-        thumbnail = video_info.get('image')
  
-        download_url = video_info['streamurl']
-        download_url = download_url.replace('\\', '')
-        stream_url = 'mp4:' + self._html_search_regex(r'ondemand/(.*)', download_url, 'stream URL')
-        rtmp_conn = ['S:connect', 'O:1', 'NS:pageUrl:' + url, 'NB:fpad:0', 'NN:videoFunction:1', 'O:0']
+        formats = []
+
+        rtmp_url = video_info.get('streamurl')
+        if rtmp_url:
+            rtmp_url = rtmp_url.replace('\\', '')
+            stream_url = 'mp4:' + self._html_search_regex(r'/ondemand/(.+)', rtmp_url, 'stream URL')
+            rtmp_conn = ['S:connect', 'O:1', 'NS:pageUrl:' + url, 'NB:fpad:0', 'NN:videoFunction:1', 'O:0']
+
+            formats.append({
+                'format_id': 'rtmp',
+                'url': rtmp_url,
+                'play_path': stream_url,
+                'player_url': 'http://www.rtl2.de/flashplayer/vipo_player.swf',
+                'page_url': url,
+                'flash_version': 'LNX 11,2,202,429',
+                'rtmp_conn': rtmp_conn,
+                'no_resume': True,
+                'preference': 1,
+            })
+
+        m3u8_url = video_info.get('streamurl_hls')
+        if m3u8_url:
+            formats.extend(self._extract_akamai_formats(m3u8_url, video_id))
  
-        formats = [{
-            'url': download_url,
-            'play_path': stream_url,
-            'player_url': 'http://www.rtl2.de/flashplayer/vipo_player.swf',
-            'page_url': url,
-            'flash_version': 'LNX 11,2,202,429',
-            'rtmp_conn': rtmp_conn,
-            'no_resume': True,
-        }]
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': title,
-            'thumbnail': thumbnail,
-            'description': description,
+            'thumbnail': video_info.get('image'),
+            'description': video_info.get('beschreibung'),
+            'duration': int_or_none(video_info.get('duration')),
              'formats': formats,
          }
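Note on the `RTL2IE._real_extract` rewrite above: the request parameters move from string concatenation into a `query` dict, and the format list is now assembled from whichever of `streamurl` (RTMP) and `streamurl_hls` (HLS) is present. A minimal standalone sketch of both steps; `build_info_url` and `collect_formats` are hypothetical helpers, and the single `hls` entry is a placeholder for the manifest expansion that `_extract_akamai_formats` performs in the hunk:

```python
from urllib.parse import urlencode

INFO_ENDPOINT = 'http://www.rtl2.de/sites/default/modules/rtl2/mediathek/php/get_video_jw.php'


def build_info_url(vico_id, vivi_id):
    """Encode the two IDs as a query string instead of concatenating them."""
    return INFO_ENDPOINT + '?' + urlencode({'vico_id': vico_id, 'vivi_id': vivi_id})


def collect_formats(video_info):
    """Collect one entry per available stream URL; either key may be missing."""
    formats = []
    rtmp_url = video_info.get('streamurl')
    if rtmp_url:
        formats.append({
            'format_id': 'rtmp',
            'url': rtmp_url.replace('\\', ''),  # the API escapes slashes
            'preference': 1,
        })
    m3u8_url = video_info.get('streamurl_hls')
    if m3u8_url:
        # Placeholder: the hunk expands this manifest into several formats.
        formats.append({'format_id': 'hls', 'url': m3u8_url})
    return formats


print(build_info_url('1', '2'))
print(collect_formats({'streamurl_hls': 'http://example.invalid/master.m3u8'}))
```

Using `.get()` for both keys means a response with only one stream type no longer raises a `KeyError`.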
diff --git a/youtube_dl/extractor/rtlnl.py b/youtube_dl/extractor/rtlnl.py
index f0250af8a4c0e06e2642c9cced9f6b330dc1d4f6..54076de280f4e4ae24321c875f553cfcd146a574 100644
--- a/youtube_dl/extractor/rtlnl.py
+++ b/youtube_dl/extractor/rtlnl.py
@@ -40,7 +40,7 @@ class RtlNlIE(InfoExtractor):
              'ext': 'mp4',
              'timestamp': 1424039400,
              'title': 'RTL Nieuws - Nieuwe beelden Kopenhagen: chaos direct na aanslag',
-            'thumbnail': 're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
+            'thumbnail': r're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed$',
              'upload_date': '20150215',
              'description': 'Er zijn nieuwe beelden vrijgegeven die vlak na de aanslag in Kopenhagen zijn gemaakt. Op de video is goed te zien hoe omstanders zich bekommeren om één van de slachtoffers, terwijl de eerste agenten ter plaatse komen.',
          }
@@ -52,7 +52,7 @@ class RtlNlIE(InfoExtractor):
              'id': 'f536aac0-1dc3-4314-920e-3bd1c5b3811a',
              'ext': 'mp4',
              'title': 'RTL Nieuws - Meer beelden van overval juwelier',
-            'thumbnail': 're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
+            'thumbnail': r're:^https?://screenshots\.rtl\.nl/(?:[^/]+/)*sz=[0-9]+x[0-9]+/uuid=f536aac0-1dc3-4314-920e-3bd1c5b3811a$',
              'timestamp': 1437233400,
              'upload_date': '20150718',
              'duration': 30.474,
diff --git a/youtube_dl/extractor/rtp.py b/youtube_dl/extractor/rtp.py
index 82b323cdd4e40b027d3a6c2c06e9ea9d58b171e2..533ee27cbefcb1bb19bc2bedf06163a0c64d74e2 100644
--- a/youtube_dl/extractor/rtp.py
+++ b/youtube_dl/extractor/rtp.py
@@ -16,7 +16,7 @@ class RTPIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Paixões Cruzadas',
              'description': 'As paixões musicais de António Cartaxo e António Macedo',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'params': {
              # rtmp download
diff --git a/youtube_dl/extractor/rts.py b/youtube_dl/extractor/rts.py
index 3cc32847b7d0ffb937465a4b5f2d9f33f864bc09..48f17b828c1a8c6b0494ad1edb2bc3f914b3417a 100644
--- a/youtube_dl/extractor/rts.py
+++ b/youtube_dl/extractor/rts.py
@@ -4,27 +4,24 @@
  import re
  
  from .srgssr import SRGSSRIE
-from ..compat import (
-    compat_str,
-    compat_urllib_parse_urlparse,
-)
+from ..compat import compat_str
  from ..utils import (
      int_or_none,
      parse_duration,
      parse_iso8601,
      unescapeHTML,
-    xpath_text,
+    determine_ext,
  )
  
  
  class RTSIE(SRGSSRIE):
      IE_DESC = 'RTS.ch'
-    _VALID_URL = r'rts:(?P<rts_id>\d+)|https?://(?:www\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html'
+    _VALID_URL = r'rts:(?P<rts_id>\d+)|https?://(?:.+?\.)?rts\.ch/(?:[^/]+/){2,}(?P<id>[0-9]+)-(?P<display_id>.+?)\.html'
  
      _TESTS = [
          {
              'url': 'http://www.rts.ch/archives/tv/divers/3449373-les-enfants-terribles.html',
-            'md5': 'f254c4b26fb1d3c183793d52bc40d3e7',
+            'md5': 'ff7f8450a90cf58dacb64e29707b4a8e',
              'info_dict': {
                  'id': '3449373',
                  'display_id': 'les-enfants-terribles',
@@ -35,38 +32,20 @@ class RTSIE(SRGSSRIE):
                  'uploader': 'Divers',
                  'upload_date': '19680921',
                  'timestamp': -40280400,
-                'thumbnail': 're:^https?://.*\.image',
+                'thumbnail': r're:^https?://.*\.image',
                  'view_count': int,
              },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            }
          },
          {
              'url': 'http://www.rts.ch/emissions/passe-moi-les-jumelles/5624067-entre-ciel-et-mer.html',
-            'md5': 'f1077ac5af686c76528dc8d7c5df29ba',
              'info_dict': {
-                'id': '5742494',
-                'display_id': '5742494',
-                'ext': 'mp4',
-                'duration': 3720,
-                'title': 'Les yeux dans les cieux - Mon homard au Canada',
-                'description': 'md5:d22ee46f5cc5bac0912e5a0c6d44a9f7',
-                'uploader': 'Passe-moi les jumelles',
-                'upload_date': '20140404',
-                'timestamp': 1396635300,
-                'thumbnail': 're:^https?://.*\.image',
-                'view_count': int,
+                'id': '5624065',
+                'title': 'Passe-moi les jumelles',
              },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            }
+            'playlist_mincount': 4,
          },
          {
              'url': 'http://www.rts.ch/video/sport/hockey/5745975-1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski.html',
-            'md5': 'b4326fecd3eb64a458ba73c73e91299d',
              'info_dict': {
                  'id': '5745975',
                  'display_id': '1-2-kloten-fribourg-5-2-second-but-pour-gotteron-par-kwiatowski',
@@ -77,14 +56,18 @@ class RTSIE(SRGSSRIE):
                  'uploader': 'Hockey',
                  'upload_date': '20140403',
                  'timestamp': 1396556882,
-                'thumbnail': 're:^https?://.*\.image',
+                'thumbnail': r're:^https?://.*\.image',
                  'view_count': int,
              },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            },
              'skip': 'Blocked outside Switzerland',
          },
          {
              'url': 'http://www.rts.ch/video/info/journal-continu/5745356-londres-cachee-par-un-epais-smog.html',
-            'md5': '9f713382f15322181bb366cc8c3a4ff0',
+            'md5': '1bae984fe7b1f78e94abc74e802ed99f',
              'info_dict': {
                  'id': '5745356',
                  'display_id': 'londres-cachee-par-un-epais-smog',
@@ -92,16 +75,12 @@ class RTSIE(SRGSSRIE):
                  'duration': 33,
                  'title': 'Londres cachée par un épais smog',
                  'description': 'Un important voile de smog recouvre Londres depuis mercredi, provoqué par la pollution et du sable du Sahara.',
-                'uploader': 'Le Journal en continu',
+                'uploader': 'L\'actu en vidéo',
                  'upload_date': '20140403',
                  'timestamp': 1396537322,
-                'thumbnail': 're:^https?://.*\.image',
+                'thumbnail': r're:^https?://.*\.image',
                  'view_count': int,
              },
-            'params': {
-                # m3u8 download
-                'skip_download': True,
-            }
          },
          {
              'url': 'http://www.rts.ch/audio/couleur3/programmes/la-belle-video-de-stephane-laurenceau/5706148-urban-hippie-de-damien-krisl-03-04-2014.html',
@@ -125,6 +104,10 @@ class RTSIE(SRGSSRIE):
                  'title': 'Hockey: Davos décroche son 31e titre de champion de Suisse',
              },
              'playlist_mincount': 5,
+        },
+        {
+            'url': 'http://pages.rts.ch/emissions/passe-moi-les-jumelles/5624065-entre-ciel-et-mer.html',
+            'only_matching': True,
          }
      ]
  
@@ -142,19 +125,32 @@ def download_json(internal_id):
  
          # media_id extracted out of URL is not always a real id
          if 'video' not in all_info and 'audio' not in all_info:
-            page = self._download_webpage(url, display_id)
+            entries = []
  
-            # article with videos on rhs
-            videos = re.findall(
-                r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:([^"]+)"',
-                page)
-            if not videos:
+            for item in all_info.get('items', []):
+                item_url = item.get('url')
+                if not item_url:
+                    continue
+                entries.append(self.url_result(item_url, 'RTS'))
+
+            if not entries:
+                page, urlh = self._download_webpage_handle(url, display_id)
+                if re.match(self._VALID_URL, urlh.geturl()).group('id') != media_id:
+                    return self.url_result(urlh.geturl(), 'RTS')
+
+                # article with videos on rhs
                  videos = re.findall(
-                    r'(?s)<iframe[^>]+class="srg-player"[^>]+src="[^"]+urn:([^"]+)"',
+                    r'<article[^>]+class="content-item"[^>]*>\s*<a[^>]+data-video-urn="urn:([^"]+)"',
                      page)
-            if videos:
-                entries = [self.url_result('srgssr:%s' % video_urn, 'SRGSSR') for video_urn in videos]
-                return self.playlist_result(entries, media_id, self._og_search_title(page))
+                if not videos:
+                    videos = re.findall(
+                        r'(?s)<iframe[^>]+class="srg-player"[^>]+src="[^"]+urn:([^"]+)"',
+                        page)
+                if videos:
+                    entries = [self.url_result('srgssr:%s' % video_urn, 'SRGSSR') for video_urn in videos]
+
+            if entries:
+                return self.playlist_result(entries, media_id, all_info.get('title'))
  
              internal_id = self._html_search_regex(
                  r'<(?:video|audio) data-id="([0-9]+)"', page,
@@ -168,36 +164,29 @@ def download_json(internal_id):
  
          info = all_info['video']['JSONinfo'] if 'video' in all_info else all_info['audio']
  
-        upload_timestamp = parse_iso8601(info.get('broadcast_date'))
-        duration = info.get('duration') or info.get('cutout') or info.get('cutduration')
-        if isinstance(duration, compat_str):
-            duration = parse_duration(duration)
-        view_count = info.get('plays')
-        thumbnail = unescapeHTML(info.get('preview_image_url'))
+        title = info['title']
  
          def extract_bitrate(url):
              return int_or_none(self._search_regex(
                  r'-([0-9]+)k\.', url, 'bitrate', default=None))
  
          formats = []
-        for format_id, format_url in info['streams'].items():
-            if format_id == 'hds_sd' and 'hds' in info['streams']:
+        streams = info.get('streams', {})
+        for format_id, format_url in streams.items():
+            if format_id == 'hds_sd' and 'hds' in streams:
                  continue
-            if format_id == 'hls_sd' and 'hls' in info['streams']:
+            if format_id == 'hls_sd' and 'hls' in streams:
                  continue
-            if format_url.endswith('.f4m'):
-                token = self._download_xml(
-                    'http://tp.srgssr.ch/token/akahd.xml?stream=%s/*' % compat_urllib_parse_urlparse(format_url).path,
-                    media_id, 'Downloading %s token' % format_id)
-                auth_params = xpath_text(token, './/authparams', 'auth params')
-                if not auth_params:
-                    continue
-                formats.extend(self._extract_f4m_formats(
-                    '%s?%s&hdcore=3.4.0&plugin=aasp-3.4.0.132.66' % (format_url, auth_params),
-                    media_id, f4m_id=format_id, fatal=False))
-            elif format_url.endswith('.m3u8'):
-                formats.extend(self._extract_m3u8_formats(
-                    format_url, media_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
+            ext = determine_ext(format_url)
+            if ext in ('m3u8', 'f4m'):
+                format_url = self._get_tokenized_src(format_url, media_id, format_id)
+                if ext == 'f4m':
+                    formats.extend(self._extract_f4m_formats(
+                        format_url + ('?' if '?' not in format_url else '&') + 'hdcore=3.4.0',
+                        media_id, f4m_id=format_id, fatal=False))
+                else:
+                    formats.extend(self._extract_m3u8_formats(
+                        format_url, media_id, 'mp4', 'm3u8_native', m3u8_id=format_id, fatal=False))
              else:
                  formats.append({
                      'format_id': format_id,
@@ -205,25 +194,37 @@ def extract_bitrate(url):
                      'tbr': extract_bitrate(format_url),
                  })
  
-        if 'media' in info:
-            formats.extend([{
-                'format_id': '%s-%sk' % (media['ext'], media['rate']),
-                'url': 'http://download-video.rts.ch/%s' % media['url'],
-                'tbr': media['rate'] or extract_bitrate(media['url']),
-            } for media in info['media'] if media.get('rate')])
+        for media in info.get('media', []):
+            media_url = media.get('url')
+            if not media_url or re.match(r'https?://', media_url):
+                continue
+            rate = media.get('rate')
+            ext = media.get('ext') or determine_ext(media_url, 'mp4')
+            format_id = ext
+            if rate:
+                format_id += '-%dk' % rate
+            formats.append({
+                'format_id': format_id,
+                'url': 'http://download-video.rts.ch/' + media_url,
+                'tbr': rate or extract_bitrate(media_url),
+            })
  
          self._check_formats(formats, media_id)
          self._sort_formats(formats)
  
+        duration = info.get('duration') or info.get('cutout') or info.get('cutduration')
+        if isinstance(duration, compat_str):
+            duration = parse_duration(duration)
+
          return {
              'id': media_id,
              'display_id': display_id,
              'formats': formats,
-            'title': info['title'],
+            'title': title,
              'description': info.get('intro'),
              'duration': duration,
-            'view_count': view_count,
+            'view_count': int_or_none(info.get('plays')),
              'uploader': info.get('programName'),
-            'timestamp': upload_timestamp,
-            'thumbnail': thumbnail,
+            'timestamp': parse_iso8601(info.get('broadcast_date')),
+            'thumbnail': unescapeHTML(info.get('preview_image_url')),
          }
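Reviewer note on the rts.py hunks above: two small pieces of logic are easy to misread in diff form, namely how `hdcore=3.4.0` is appended to a URL that may already carry a query string, and how the `media` entries are turned into format dicts. The following is a minimal standalone sketch, not the extractor itself. The helper names `append_query` and `build_media_format` are hypothetical, the extension fallback is simplified to a plain `'mp4'` default instead of `determine_ext`, and `tbr` is taken from `rate` only.

```python
import re


def append_query(url, param):
    """Append a raw 'key=value' string, choosing '?' or '&' as needed."""
    separator = '&' if '?' in url else '?'
    return url + separator + param


def build_media_format(media, base='http://download-video.rts.ch/'):
    """Return a format dict for a relative media entry, or None to skip it."""
    media_url = media.get('url')
    # Skip entries without a URL and entries that are already absolute.
    if not media_url or re.match(r'https?://', media_url):
        return None
    rate = media.get('rate')
    # Simplification: the diff falls back to determine_ext(media_url, 'mp4').
    ext = media.get('ext') or 'mp4'
    format_id = ext
    if rate:
        format_id += '-%dk' % rate
    return {
        'format_id': format_id,
        'url': base + media_url,
        'tbr': rate,
    }
```

Unlike the removed list comprehension, entries without a `rate` are no longer dropped; they simply get a shorter `format_id`.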
diff --git a/youtube_dl/extractor/rtve.py b/youtube_dl/extractor/rtve.py
index 6a43b036e924470055aea3910d1c5ea807483fdb..746677a24892f61249d32757ba5e4cac92d1f756 100644
--- a/youtube_dl/extractor/rtve.py
+++ b/youtube_dl/extractor/rtve.py
@@ -209,7 +209,10 @@ def _real_extract(self, url):
          title += ' ' + time.strftime('%Y-%m-%dZ%H%M%S', start_time)
  
          vidplayer_id = self._search_regex(
-            r'playerId=player([0-9]+)', webpage, 'internal video ID')
+            (r'playerId=player([0-9]+)',
+             r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
+             r'data-id=["\'](\d+)'),
+            webpage, 'internal video ID')
          png_url = 'http://www.rtve.es/ztnr/movil/thumbnail/amonet/videos/%s.png' % vidplayer_id
          png = self._download_webpage(png_url, video_id, 'Downloading url information')
          m3u8_url = _decrypt_url(png)
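Reviewer note on the rtve.py hunk above: the single pattern is replaced by a tuple of three, tried in order, so the first one that matches supplies the ID. A rough standalone sketch of that fallback behaviour, using only the standard `re` module, is shown below. `first_match` is a hypothetical helper for illustration and the sample HTML in the usage note is made up; the three patterns are copied from the diff.

```python
import re

# Patterns copied from the hunk above, in the same priority order.
PATTERNS = (
    r'playerId=player([0-9]+)',
    r'class=["\'].*?\blive_mod\b.*?["\'][^>]+data-assetid=["\'](\d+)',
    r'data-id=["\'](\d+)',
)


def first_match(patterns, text):
    """Return group 1 of the first pattern that matches, else None."""
    for pattern in patterns:
        mobj = re.search(pattern, text)
        if mobj:
            return mobj.group(1)
    return None
```

For example, `first_match(PATTERNS, '<div class="a live_mod b" data-assetid="456">')` falls through the first pattern and is resolved by the second.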
diff --git a/youtube_dl/extractor/rtvnh.py b/youtube_dl/extractor/rtvnh.py
index f6454c6b0082ed431fa74de49dd5881d3b0b7a0f..6a00f7007221e40fd06293e9c60d31409641d94c 100644
--- a/youtube_dl/extractor/rtvnh.py
+++ b/youtube_dl/extractor/rtvnh.py
@@ -14,7 +14,7 @@ class RTVNHIE(InfoExtractor):
              'id': '131946',
              'ext': 'mp4',
              'title': 'Grote zoektocht in zee bij Zandvoort naar vermiste vrouw',
-            'thumbnail': 're:^https?:.*\.jpg$'
+            'thumbnail': r're:^https?:.*\.jpg$'
          }
      }
  
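Reviewer note on the recurring `'re:...'` to `r're:...'` change in this commit: the pattern text is unchanged; only the literal becomes a raw string, so `\.` is no longer an unrecognised escape sequence in an ordinary string literal. The sketch below illustrates both points under the assumption that expected values prefixed with `re:` are treated as regular expressions by the test harness. `matches_expected` is a hypothetical stand-in, not the project's helper.

```python
import re

# The same characters written two ways: an ordinary literal with an escaped
# backslash, and a raw literal as introduced by the diff.
PLAIN = 're:^https?:.*\\.jpg$'
RAW = r're:^https?:.*\.jpg$'


def matches_expected(expected, actual):
    """Compare a value with an expectation; a 're:' prefix marks a regex."""
    if expected.startswith('re:'):
        return re.match(expected[len('re:'):], actual) is not None
    return expected == actual
```

Because `PLAIN == RAW`, a change of this kind should not alter which thumbnails a test accepts.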
diff --git a/youtube_dl/extractor/rudo.py b/youtube_dl/extractor/rudo.py
index 9a330c1961b75f662caa457fabe231e6aa4bcb8a..3bfe934d82c9db7fe7ff8b1b8820d4c96ca0cb32 100644
--- a/youtube_dl/extractor/rudo.py
+++ b/youtube_dl/extractor/rudo.py
@@ -28,7 +28,7 @@ class RudoIE(JWPlatformBaseIE):
      @classmethod
      def _extract_url(self, webpage):
          mobj = re.search(
-            '<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
+            r'<iframe[^>]+src=(?P<q1>[\'"])(?P<url>(?:https?:)?//rudo\.video/vod/[0-9a-zA-Z]+)(?P=q1)',
              webpage)
          if mobj:
              return mobj.group('url')
diff --git a/youtube_dl/extractor/ruhd.py b/youtube_dl/extractor/ruhd.py
index ce631b46c30bcd2eda03c798d61bed616f41e0b4..2b830cf477eef731caef1f2a6cddf10ef3efa14c 100644
--- a/youtube_dl/extractor/ruhd.py
+++ b/youtube_dl/extractor/ruhd.py
@@ -14,7 +14,7 @@ class RUHDIE(InfoExtractor):
              'ext': 'divx',
              'title': 'КОТ бааааам',
              'description': 'классный кот)',
-            'thumbnail': 're:^http://.*\.jpg$',
+            'thumbnail': r're:^http://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/ruutu.py b/youtube_dl/extractor/ruutu.py
index 2fce4e81b7f44c4c70ff5e6e775a4743032a231b..20d01754a17998f90c64f33cf76693028dd57103 100644
--- a/youtube_dl/extractor/ruutu.py
+++ b/youtube_dl/extractor/ruutu.py
@@ -5,6 +5,7 @@
  from ..compat import compat_urllib_parse_urlparse
  from ..utils import (
      determine_ext,
+    ExtractorError,
      int_or_none,
      xpath_attr,
      xpath_text,
@@ -22,7 +23,7 @@ class RuutuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Oletko aina halunnut tietää mitä tapahtuu vain hetki ennen lähetystä? - Nyt se selvisi!',
                  'description': 'md5:cfc6ccf0e57a814360df464a91ff67d6',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 114,
                  'age_limit': 0,
              },
@@ -35,7 +36,7 @@ class RuutuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Superpesis: katso koko kausi Ruudussa',
                  'description': 'md5:bfb7336df2a12dc21d18fa696c9f8f23',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 40,
                  'age_limit': 0,
              },
@@ -48,7 +49,7 @@ class RuutuIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Osa 1: Mikael Jungner',
                  'description': 'md5:7d90f358c47542e3072ff65d7b1bcffe',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'age_limit': 0,
              },
          },
@@ -80,6 +81,9 @@ def extract_formats(node):
                      elif ext == 'f4m':
                          formats.extend(self._extract_f4m_formats(
                              video_url, video_id, f4m_id='hds', fatal=False))
+                    elif ext == 'mpd':
+                        formats.extend(self._extract_mpd_formats(
+                            video_url, video_id, mpd_id='dash', fatal=False))
                      else:
                          proto = compat_urllib_parse_urlparse(video_url).scheme
                          if not child.tag.startswith('HTTP') and proto != 'rtmp':
@@ -101,6 +105,11 @@ def extract_formats(node):
                          })
  
          extract_formats(video_xml.find('./Clip'))
+
+        drm = xpath_text(video_xml, './Clip/DRM', default=None)
+        if not formats and drm:
+            raise ExtractorError('This video is DRM protected.', expected=True)
+
          self._sort_formats(formats)
  
          return {
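Reviewer note on the ruutu.py hunk above: the DRM error is raised only when no formats were extracted and the clip carries a non-empty `DRM` element, so clips that also offer playable streams are not rejected. A minimal sketch of that guard using only `xml.etree.ElementTree` follows. `ExtractionFailed` and `check_drm` are hypothetical stand-ins for `ExtractorError` and the inline check, and the XML used in the usage note is invented.

```python
import xml.etree.ElementTree as ET


class ExtractionFailed(Exception):
    """Hypothetical stand-in for youtube-dl's ExtractorError."""


def check_drm(video_xml, formats):
    """Raise only if nothing was extracted and a DRM marker is present."""
    node = video_xml.find('./Clip/DRM')
    drm = node.text if node is not None else None
    if not formats and drm:
        raise ExtractionFailed('This video is DRM protected.')
    return formats
```

With `ET.fromstring('<Root><Clip><DRM>true</DRM></Clip></Root>')` and an empty format list the call raises; with at least one format it returns the list unchanged.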
diff --git a/youtube_dl/extractor/savefrom.py b/youtube_dl/extractor/savefrom.py
index 5b7367b94119792661506624264b121503cc6858..30f9cf8245856398239e88e5e8454c1df4bd8c3f 100644
--- a/youtube_dl/extractor/savefrom.py
+++ b/youtube_dl/extractor/savefrom.py
@@ -20,7 +20,7 @@ class SaveFromIE(InfoExtractor):
              'upload_date': '20120816',
              'uploader': 'Howcast',
              'uploader_id': 'Howcast',
-            'description': 're:(?s).* Hi, my name is Rene Dreifuss\. And I\'m here to show you some MMA.*',
+            'description': r're:(?s).* Hi, my name is Rene Dreifuss\. And I\'m here to show you some MMA.*',
          },
          'params': {
              'skip_download': True
diff --git a/youtube_dl/extractor/sbs.py b/youtube_dl/extractor/sbs.py
index 43131fb7e5ce82d69d25bf639ce6c2bffe35182a..845712a7640afe9f675757c2f711830c9c79a00f 100644
--- a/youtube_dl/extractor/sbs.py
+++ b/youtube_dl/extractor/sbs.py
@@ -22,7 +22,7 @@ class SBSIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Dingo Conservation (The Feed)',
              'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'duration': 308,
              'timestamp': 1408613220,
              'upload_date': '20140821',
diff --git a/youtube_dl/extractor/screencast.py b/youtube_dl/extractor/screencast.py
index ed9de964841e52c1e5753556d6b9e53339ba23c3..62a6a8337ccf5d247a38be29cf93b5b72f36dfd7 100644
--- a/youtube_dl/extractor/screencast.py
+++ b/youtube_dl/extractor/screencast.py
@@ -21,7 +21,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'm4v',
              'title': 'Color Measurement with Ocean Optics Spectrometers',
              'description': 'md5:240369cde69d8bed61349a199c5fb153',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://www.screencast.com/t/V2uXehPJa1ZI',
@@ -31,7 +31,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'mov',
              'title': 'The Amadeus Spectrometer',
              'description': 're:^In this video, our friends at.*To learn more about Amadeus, visit',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://www.screencast.com/t/aAB3iowa',
@@ -41,7 +41,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Google Earth Export',
              'description': 'Provides a demo of a CommunityViz export to Google Earth, one of the 3D viewing options.',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://www.screencast.com/t/X3ddTrYh',
@@ -51,7 +51,7 @@ class ScreencastIE(InfoExtractor):
              'ext': 'wmv',
              'title': 'Toolkit 6 User Group Webinar (2014-03-04) - Default Judgment and First Impression',
              'description': 'md5:7b9f393bc92af02326a5c5889639eab0',
-            'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:gif|jpg)$',
          }
      }, {
          'url': 'http://screencast.com/t/aAB3iowa',
diff --git a/youtube_dl/extractor/screencastomatic.py b/youtube_dl/extractor/screencastomatic.py
index 7a88a42cd84dbfd9f343567dffb5f462c10329b7..94a2a37d20696fa3ffc65b6f1df04cab42c7785d 100644
--- a/youtube_dl/extractor/screencastomatic.py
+++ b/youtube_dl/extractor/screencastomatic.py
@@ -14,7 +14,7 @@ class ScreencastOMaticIE(JWPlatformBaseIE):
              'id': 'c2lD3BeOPl',
              'ext': 'mp4',
              'title': 'Welcome to 3-4 Philosophy @ DECV!',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
              'duration': 369.163,
          }
diff --git a/youtube_dl/extractor/screenjunkies.py b/youtube_dl/extractor/screenjunkies.py
deleted file mode 100644
index 02e574c..0000000
--- a/youtube_dl/extractor/screenjunkies.py
+++ /dev/null
@@ -1,138 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..compat import compat_str
-from ..utils import (
-    int_or_none,
-    parse_age_limit,
-)
-
-
-class ScreenJunkiesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?screenjunkies\.com/video/(?P<display_id>[^/]+?)(?:-(?P<id>\d+))?(?:[/?#&]|$)'
-    _TESTS = [{
-        'url': 'http://www.screenjunkies.com/video/best-quentin-tarantino-movie-2841915',
-        'md5': '5c2b686bec3d43de42bde9ec047536b0',
-        'info_dict': {
-            'id': '2841915',
-            'display_id': 'best-quentin-tarantino-movie',
-            'ext': 'mp4',
-            'title': 'Best Quentin Tarantino Movie',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 3671,
-            'age_limit': 13,
-            'tags': list,
-        },
-    }, {
-        'url': 'http://www.screenjunkies.com/video/honest-trailers-the-dark-knight',
-        'info_dict': {
-            'id': '2348808',
-            'display_id': 'honest-trailers-the-dark-knight',
-            'ext': 'mp4',
-            'title': "Honest Trailers: 'The Dark Knight'",
-            'thumbnail': 're:^https?://.*\.jpg',
-            'age_limit': 10,
-            'tags': list,
-        },
-    }, {
-        # requires subscription but worked around
-        'url': 'http://www.screenjunkies.com/video/knocking-dead-ep-1-the-show-so-far-3003285',
-        'info_dict': {
-            'id': '3003285',
-            'display_id': 'knocking-dead-ep-1-the-show-so-far',
-            'ext': 'mp4',
-            'title': 'Knocking Dead Ep 1: State of The Dead Recap',
-            'thumbnail': 're:^https?://.*\.jpg',
-            'duration': 3307,
-            'age_limit': 13,
-            'tags': list,
-        },
-    }]
-
-    _DEFAULT_BITRATES = (48, 150, 496, 864, 2240)
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id')
-
-        if not video_id:
-            webpage = self._download_webpage(url, display_id)
-            video_id = self._search_regex(
-                (r'src=["\']/embed/(\d+)', r'data-video-content-id=["\'](\d+)'),
-                webpage, 'video id')
-
-        webpage = self._download_webpage(
-            'http://www.screenjunkies.com/embed/%s' % video_id,
-            display_id, 'Downloading video embed page')
-        embed_vars = self._parse_json(
-            self._search_regex(
-                r'(?s)embedVars\s*=\s*({.+?})\s*</script>', webpage, 'embed vars'),
-            display_id)
-
-        title = embed_vars['contentName']
-
-        formats = []
-        bitrates = []
-        for f in embed_vars.get('media', []):
-            if not f.get('uri') or f.get('mediaPurpose') != 'play':
-                continue
-            bitrate = int_or_none(f.get('bitRate'))
-            if bitrate:
-                bitrates.append(bitrate)
-            formats.append({
-                'url': f['uri'],
-                'format_id': 'http-%d' % bitrate if bitrate else 'http',
-                'width': int_or_none(f.get('width')),
-                'height': int_or_none(f.get('height')),
-                'tbr': bitrate,
-                'format': 'mp4',
-            })
-
-        if not bitrates:
-            # When subscriptionLevel > 0, i.e. plus subscription is required
-            # media list will be empty. However, hds and hls uris are still
-            # available. We can grab them assuming bitrates to be default.
-            bitrates = self._DEFAULT_BITRATES
-
-        auth_token = embed_vars.get('AuthToken')
-
-        def construct_manifest_url(base_url, ext):
-            pieces = [base_url]
-            pieces.extend([compat_str(b) for b in bitrates])
-            pieces.append('_kbps.mp4.%s?%s' % (ext, auth_token))
-            return ','.join(pieces)
-
-        if bitrates and auth_token:
-            hds_url = embed_vars.get('hdsUri')
-            if hds_url:
-                f4m_formats = self._extract_f4m_formats(
-                    construct_manifest_url(hds_url, 'f4m'),
-                    display_id, f4m_id='hds', fatal=False)
-                if len(f4m_formats) == len(bitrates):
-                    for f, bitrate in zip(f4m_formats, bitrates):
-                        if not f.get('tbr'):
-                            f['format_id'] = 'hds-%d' % bitrate
-                            f['tbr'] = bitrate
-                # TODO: fix f4m downloader to handle manifests without bitrates if possible
-                # formats.extend(f4m_formats)
-
-            hls_url = embed_vars.get('hlsUri')
-            if hls_url:
-                formats.extend(self._extract_m3u8_formats(
-                    construct_manifest_url(hls_url, 'm3u8'),
-                    display_id, 'mp4', entry_protocol='m3u8_native', m3u8_id='hls', fatal=False))
-        self._sort_formats(formats)
-
-        return {
-            'id': video_id,
-            'display_id': display_id,
-            'title': title,
-            'thumbnail': embed_vars.get('thumbUri'),
-            'duration': int_or_none(embed_vars.get('videoLengthInSeconds')) or None,
-            'age_limit': parse_age_limit(embed_vars.get('audienceRating')),
-            'tags': embed_vars.get('tags', '').split(','),
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/screenwavemedia.py b/youtube_dl/extractor/screenwavemedia.py
deleted file mode 100644
index 7d77e88..0000000
--- a/youtube_dl/extractor/screenwavemedia.py
+++ /dev/null
@@ -1,146 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    int_or_none,
-    unified_strdate,
-    js_to_json,
-)
-
-
-class ScreenwaveMediaIE(InfoExtractor):
-    _VALID_URL = r'(?:https?:)?//player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=(?P<id>[A-Za-z0-9-]+)'
-    EMBED_PATTERN = r'src=(["\'])(?P<url>(?:https?:)?//player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?.*\bid=.+?)\1'
-    _TESTS = [{
-        'url': 'http://player.screenwavemedia.com/play/play.php?playerdiv=videoarea&companiondiv=squareAd&id=Cinemassacre-19911',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-
-        playerdata = self._download_webpage(
-            'http://player.screenwavemedia.com/player.php?id=%s' % video_id,
-            video_id, 'Downloading player webpage')
-
-        vidtitle = self._search_regex(
-            r'\'vidtitle\'\s*:\s*"([^"]+)"', playerdata, 'vidtitle').replace('\\/', '/')
-
-        playerconfig = self._download_webpage(
-            'http://player.screenwavemedia.com/player.js',
-            video_id, 'Downloading playerconfig webpage')
-
-        videoserver = self._search_regex(r'SWMServer\s*=\s*"([\d\.]+)"', playerdata, 'videoserver')
-
-        sources = self._parse_json(
-            js_to_json(
-                re.sub(
-                    r'(?s)/\*.*?\*/', '',
-                    self._search_regex(
-                        r'sources\s*:\s*(\[[^\]]+?\])', playerconfig,
-                        'sources',
-                    ).replace(
-                        "' + thisObj.options.videoserver + '",
-                        videoserver
-                    ).replace(
-                        "' + playerVidId + '",
-                        video_id
-                    )
-                )
-            ),
-            video_id, fatal=False
-        )
-
-        # Fallback to hardcoded sources if JS changes again
-        if not sources:
-            self.report_warning('Falling back to a hardcoded list of streams')
-            sources = [{
-                'file': 'http://%s/vod/%s_%s.mp4' % (videoserver, video_id, format_id),
-                'type': 'mp4',
-                'label': format_label,
-            } for format_id, format_label in (
-                ('low', '144p Low'), ('med', '160p Med'), ('high', '360p High'), ('hd1', '720p HD1'))]
-            sources.append({
-                'file': 'http://%s/vod/smil:%s.smil/playlist.m3u8' % (videoserver, video_id),
-                'type': 'hls',
-            })
-
-        formats = []
-        for source in sources:
-            file_ = source.get('file')
-            if not file_:
-                continue
-            if source.get('type') == 'hls':
-                formats.extend(self._extract_m3u8_formats(file_, video_id, ext='mp4'))
-            else:
-                format_id = self._search_regex(
-                    r'_(.+?)\.[^.]+$', file_, 'format id', default=None)
-                if not self._is_valid_url(file_, video_id, format_id or 'video'):
-                    continue
-                format_label = source.get('label')
-                height = int_or_none(self._search_regex(
-                    r'^(\d+)[pP]', format_label, 'height', default=None))
-                formats.append({
-                    'url': file_,
-                    'format_id': format_id,
-                    'format': format_label,
-                    'ext': source.get('type'),
-                    'height': height,
-                })
-        self._sort_formats(formats, field_preference=('height', 'width', 'tbr', 'format_id'))
-
-        return {
-            'id': video_id,
-            'title': vidtitle,
-            'formats': formats,
-        }
-
-
-class TeamFourIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?teamfourstar\.com/video/(?P<id>[a-z0-9\-]+)/?'
-    _TEST = {
-        'url': 'http://teamfourstar.com/video/a-moment-with-tfs-episode-4/',
-        'info_dict': {
-            'id': 'TeamFourStar-5292a02f20bfa',
-            'ext': 'mp4',
-            'upload_date': '20130401',
-            'description': 'Check out this and more on our website: http://teamfourstar.com\nTFS Store: http://sharkrobot.com/team-four-star\nFollow on Twitter: http://twitter.com/teamfourstar\nLike on FB: http://facebook.com/teamfourstar',
-            'title': 'A Moment With TFS Episode 4',
-        },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        },
-    }
-
-    def _real_extract(self, url):
-        display_id = self._match_id(url)
-        webpage = self._download_webpage(url, display_id)
-
-        playerdata_url = self._search_regex(
-            r'src="(http://player\d?\.screenwavemedia\.com/(?:play/)?[a-zA-Z]+\.php\?[^"]*\bid=.+?)"',
-            webpage, 'player data URL')
-
-        video_title = self._html_search_regex(
-            r'<div class="heroheadingtitle">(?P<title>.+?)</div>',
-            webpage, 'title')
-        video_date = unified_strdate(self._html_search_regex(
-            r'<div class="heroheadingdate">(?P<date>.+?)</div>',
-            webpage, 'date', fatal=False))
-        video_description = self._html_search_regex(
-            r'(?s)<div class="postcontent">(?P<description>.+?)</div>',
-            webpage, 'description', fatal=False)
-        video_thumbnail = self._og_search_thumbnail(webpage)
-
-        return {
-            '_type': 'url_transparent',
-            'display_id': display_id,
-            'title': video_title,
-            'description': video_description,
-            'upload_date': video_date,
-            'thumbnail': video_thumbnail,
-            'url': playerdata_url,
-        }
diff --git a/youtube_dl/extractor/senateisvp.py b/youtube_dl/extractor/senateisvp.py
index 35540c082ef2f7c4d6fa9cf9ce8acf404bc33a8c..387a4f7f6952adcb6d1954106ce580f44cde6e6f 100644
--- a/youtube_dl/extractor/senateisvp.py
+++ b/youtube_dl/extractor/senateisvp.py
@@ -55,7 +55,7 @@ class SenateISVPIE(InfoExtractor):
              'id': 'judiciary031715',
              'ext': 'mp4',
              'title': 'Integrated Senate Video Player',
-            'thumbnail': 're:^https?://.*\.(?:jpg|png)$',
+            'thumbnail': r're:^https?://.*\.(?:jpg|png)$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/sendtonews.py b/youtube_dl/extractor/sendtonews.py
index 2dbe490bba7717a7719290113f26ed5c795ae218..9880a5a78c1f4b18d41c55e1899405dbdb98e7dc 100644
--- a/youtube_dl/extractor/sendtonews.py
+++ b/youtube_dl/extractor/sendtonews.py
@@ -8,6 +8,9 @@
      float_or_none,
      parse_iso8601,
      update_url_query,
+    int_or_none,
+    determine_protocol,
+    unescapeHTML,
  )
  
  
@@ -20,18 +23,18 @@ class SendtoNewsIE(JWPlatformBaseIE):
          'info_dict': {
              'id': 'GxfCe0Zo7D-175909-5588'
          },
-        'playlist_count': 9,
+        'playlist_count': 8,
          # test the first video only to prevent lengthy tests
          'playlist': [{
              'info_dict': {
-                'id': '198180',
+                'id': '240385',
                  'ext': 'mp4',
-                'title': 'Recap: CLE 5, LAA 4',
-                'description': '8/14/16: Naquin, Almonte lead Indians in 5-4 win',
-                'duration': 57.343,
-                'thumbnail': 're:https?://.*\.jpg$',
-                'upload_date': '20160815',
-                'timestamp': 1471221961,
+                'title': 'Indians introduce Encarnacion',
+                'description': 'Indians president of baseball operations Chris Antonetti and Edwin Encarnacion discuss the slugger\'s three-year contract with Cleveland',
+                'duration': 137.898,
+                'thumbnail': r're:https?://.*\.jpg$',
+                'upload_date': '20170105',
+                'timestamp': 1483649762,
              },
          }],
          'params': {
@@ -64,7 +67,20 @@ def _real_extract(self, url):
          for video in playlist_data['playlistData'][0]:
              info_dict = self._parse_jwplayer_data(
                  video['jwconfiguration'],
-                require_title=False, rtmp_params={'no_resume': True})
+                require_title=False, m3u8_id='hls', rtmp_params={'no_resume': True})
+
+            for f in info_dict['formats']:
+                if f.get('tbr'):
+                    continue
+                tbr = int_or_none(self._search_regex(
+                    r'/(\d+)k/', f['url'], 'bitrate', default=None))
+                if not tbr:
+                    continue
+                f.update({
+                    'format_id': '%s-%d' % (determine_protocol(f), tbr),
+                    'tbr': tbr,
+                })
+            self._sort_formats(info_dict['formats'], ('tbr', 'height', 'width', 'format_id'))
  
              thumbnails = []
              if video.get('thumbnailUrl'):
@@ -78,8 +94,8 @@ def _real_extract(self, url):
                      'url': video['smThumbnailUrl'],
                  })
              info_dict.update({
-                'title': video['S_headLine'],
-                'description': video.get('S_fullStory'),
+                'title': video['S_headLine'].strip(),
+                'description': unescapeHTML(video.get('S_fullStory')),
                  'thumbnails': thumbnails,
                  'duration': float_or_none(video.get('SM_length')),
                  'timestamp': parse_iso8601(video.get('S_sysDate'), delimiter=' '),
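Note on the hunk above: the added loop fills in a missing `tbr` from a `/<digits>k/` segment of the format URL. A minimal standalone sketch of that idea, assuming plain dicts and the standard `re` module in place of youtube-dl's `_search_regex`, `int_or_none` and `determine_protocol` helpers. The `fill_bitrate` name, the `protocol` argument and the sample URLs are hypothetical.

```python
import re


def fill_bitrate(fmt, protocol='http'):
    """Set 'tbr' and 'format_id' from a '/<digits>k/' URL segment, if absent."""
    if fmt.get('tbr'):
        return fmt  # bitrate already known, leave the format untouched
    m = re.search(r'/(\d+)k/', fmt['url'])
    if not m:
        return fmt  # no bitrate hint in the URL
    tbr = int(m.group(1))
    fmt.update({
        'format_id': '%s-%d' % (protocol, tbr),
        'tbr': tbr,
    })
    return fmt


# Hypothetical sample URLs: one with a bitrate segment, one without.
formats = [
    fill_bitrate({'url': 'http://example.com/video/1200k/clip.mp4'}),
    fill_bitrate({'url': 'http://example.com/video/clip.mp4'}),
]
```

Formats without a recognizable segment are left as they were, which mirrors the `continue` branches in the hunk.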
diff --git a/youtube_dl/extractor/sexu.py b/youtube_dl/extractor/sexu.py
index a99b2a8e7be1bc9de8a01d6ae2de6fb36055703c..5e22ea73029b3d254d76fa0722c8041daa17a6fd 100644
--- a/youtube_dl/extractor/sexu.py
+++ b/youtube_dl/extractor/sexu.py
@@ -14,7 +14,7 @@ class SexuIE(InfoExtractor):
              'title': 'md5:4d05a19a5fc049a63dbbaf05fb71d91b',
              'description': 'md5:2b75327061310a3afb3fbd7d09e2e403',
              'categories': list,  # NSFW
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          }
      }
diff --git a/youtube_dl/extractor/sharesix.py b/youtube_dl/extractor/sharesix.py
deleted file mode 100644
index 9cce5ce..0000000
--- a/youtube_dl/extractor/sharesix.py
+++ /dev/null
@@ -1,91 +0,0 @@
-# coding: utf-8
-from __future__ import unicode_literals
-
-import re
-
-from .common import InfoExtractor
-from ..utils import (
-    parse_duration,
-    sanitized_Request,
-    urlencode_postdata,
-)
-
-
-class ShareSixIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?sharesix\.com/(?:f/)?(?P<id>[0-9a-zA-Z]+)'
-    _TESTS = [
-        {
-            'url': 'http://sharesix.com/f/OXjQ7Y6',
-            'md5': '9e8e95d8823942815a7d7c773110cc93',
-            'info_dict': {
-                'id': 'OXjQ7Y6',
-                'ext': 'mp4',
-                'title': 'big_buck_bunny_480p_surround-fix.avi',
-                'duration': 596,
-                'width': 854,
-                'height': 480,
-            },
-        },
-        {
-            'url': 'http://sharesix.com/lfrwoxp35zdd',
-            'md5': 'dd19f1435b7cec2d7912c64beeee8185',
-            'info_dict': {
-                'id': 'lfrwoxp35zdd',
-                'ext': 'flv',
-                'title': 'WhiteBoard___a_Mac_vs_PC_Parody_Cartoon.mp4.flv',
-                'duration': 65,
-                'width': 1280,
-                'height': 720,
-            },
-        }
-    ]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        fields = {
-            'method_free': 'Free'
-        }
-        post = urlencode_postdata(fields)
-        req = sanitized_Request(url, post)
-        req.add_header('Content-type', 'application/x-www-form-urlencoded')
-
-        webpage = self._download_webpage(req, video_id,
-                                         'Downloading video page')
-
-        video_url = self._search_regex(
-            r"var\slnk1\s=\s'([^']+)'", webpage, 'video URL')
-        title = self._html_search_regex(
-            r'(?s)<dt>Filename:</dt>.+?<dd>(.+?)</dd>', webpage, 'title')
-        duration = parse_duration(
-            self._search_regex(
-                r'(?s)<dt>Length:</dt>.+?<dd>(.+?)</dd>',
-                webpage,
-                'duration',
-                fatal=False
-            )
-        )
-
-        m = re.search(
-            r'''(?xs)<dt>Width\sx\sHeight</dt>.+?
-                     <dd>(?P<width>\d+)\sx\s(?P<height>\d+)</dd>''',
-            webpage
-        )
-        width = height = None
-        if m:
-            width, height = int(m.group('width')), int(m.group('height'))
-
-        formats = [{
-            'format_id': 'sd',
-            'url': video_url,
-            'width': width,
-            'height': height,
-        }]
-
-        return {
-            'id': video_id,
-            'title': title,
-            'duration': duration,
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/showroomlive.py b/youtube_dl/extractor/showroomlive.py
new file mode 100644
index 0000000..efd9d56
--- /dev/null
+++ b/youtube_dl/extractor/showroomlive.py
@@ -0,0 +1,84 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    urljoin,
+)
+
+
+class ShowRoomLiveIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?showroom-live\.com/(?!onlive|timetable|event|campaign|news|ranking|room)(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'https://www.showroom-live.com/48_Nana_Okada',
+        'only_matching': True,
+    }
+
+    def _real_extract(self, url):
+        broadcaster_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, broadcaster_id)
+
+        room_id = self._search_regex(
+            (r'SrGlobal\.roomId\s*=\s*(\d+)',
+             r'(?:profile|room)\?room_id\=(\d+)'), webpage, 'room_id')
+
+        room = self._download_json(
+            urljoin(url, '/api/room/profile?room_id=%s' % room_id),
+            broadcaster_id)
+
+        is_live = room.get('is_onlive')
+        if is_live is not True:
+            raise ExtractorError('%s is offline' % broadcaster_id, expected=True)
+
+        uploader = room.get('performer_name') or broadcaster_id
+        title = room.get('room_name') or room.get('main_name') or uploader
+
+        streaming_url_list = self._download_json(
+            urljoin(url, '/api/live/streaming_url?room_id=%s' % room_id),
+            broadcaster_id)['streaming_url_list']
+
+        formats = []
+        for stream in streaming_url_list:
+            stream_url = stream.get('url')
+            if not stream_url:
+                continue
+            stream_type = stream.get('type')
+            if stream_type == 'hls':
+                m3u8_formats = self._extract_m3u8_formats(
+                    stream_url, broadcaster_id, ext='mp4', m3u8_id='hls',
+                    live=True)
+                for f in m3u8_formats:
+                    f['quality'] = int_or_none(stream.get('quality', 100))
+                formats.extend(m3u8_formats)
+            elif stream_type == 'rtmp':
+                stream_name = stream.get('stream_name')
+                if not stream_name:
+                    continue
+                formats.append({
+                    'url': stream_url,
+                    'play_path': stream_name,
+                    'page_url': url,
+                    'player_url': 'https://www.showroom-live.com/assets/swf/v3/ShowRoomLive.swf',
+                    'rtmp_live': True,
+                    'ext': 'flv',
+                    'format_id': 'rtmp',
+                    'format_note': stream.get('label'),
+                    'quality': int_or_none(stream.get('quality', 100)),
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': compat_str(room.get('live_id') or broadcaster_id),
+            'title': self._live_title(title),
+            'description': room.get('description'),
+            'timestamp': int_or_none(room.get('current_live_started_at')),
+            'uploader': uploader,
+            'uploader_id': broadcaster_id,
+            'view_count': int_or_none(room.get('view_num')),
+            'formats': formats,
+            'is_live': True,
+        }
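Note on `_VALID_URL` in the new extractor above: the negative lookahead rejects site sections such as `/onlive` or `/room`, so only broadcaster pages match. A small sketch that exercises the same pattern with the standard `re` module; the `broadcaster_id` helper is hypothetical and stands in for `_match_id`.

```python
import re

# Pattern copied from the extractor above.
VALID_URL = (
    r'https?://(?:www\.)?showroom-live\.com/'
    r'(?!onlive|timetable|event|campaign|news|ranking|room)(?P<id>[^/?#&]+)'
)


def broadcaster_id(url):
    """Return the broadcaster ID, or None when the URL is a site section."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


matched = broadcaster_id('https://www.showroom-live.com/48_Nana_Okada')
section = broadcaster_id('https://www.showroom-live.com/onlive')
```

Because the lookahead tests prefixes only, a path that merely starts with one of the listed words (for example `/rooms`) is rejected as well.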
diff --git a/youtube_dl/extractor/skysports.py b/youtube_dl/extractor/skysports.py
index 9dc78c7d2b27748fd2e1c083e7b265448cbff500..4ca9f6b3c811f59ef11eb82d173554341f3ab66d 100644
--- a/youtube_dl/extractor/skysports.py
+++ b/youtube_dl/extractor/skysports.py
@@ -2,18 +2,19 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..utils import strip_or_none
  
  
  class SkySportsIE(InfoExtractor):
      _VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/(?P<id>[0-9]+)'
      _TEST = {
          'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine',
-        'md5': 'c44a1db29f27daf9a0003e010af82100',
+        'md5': '77d59166cddc8d3cb7b13e35eaf0f5ec',
          'info_dict': {
              'id': '10328419',
-            'ext': 'flv',
-            'title': 'Bale: Its our time to shine',
-            'description': 'md5:9fd1de3614d525f5addda32ac3c482c9',
+            'ext': 'mp4',
+            'title': 'Bale: It\'s our time to shine',
+            'description': 'md5:e88bda94ae15f7720c5cb467e777bb6d',
          },
          'add_ie': ['Ooyala'],
      }
@@ -28,6 +29,6 @@ def _real_extract(self, url):
              'url': 'ooyala:%s' % self._search_regex(
                  r'data-video-id="([^"]+)"', webpage, 'ooyala id'),
              'title': self._og_search_title(webpage),
-            'description': self._og_search_description(webpage),
+            'description': strip_or_none(self._og_search_description(webpage)),
              'ie_key': 'Ooyala',
          }
diff --git a/youtube_dl/extractor/slutload.py b/youtube_dl/extractor/slutload.py
index 18cc7721e142c7493bbebdfcb59f621e3fedaf4f..7145d285a0244acdbdb58dfbf330d2c75391dea3 100644
--- a/youtube_dl/extractor/slutload.py
+++ b/youtube_dl/extractor/slutload.py
@@ -13,7 +13,7 @@ class SlutloadIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'virginie baisee en cam',
              'age_limit': 18,
-            'thumbnail': 're:https?://.*?\.jpg'
+            'thumbnail': r're:https?://.*?\.jpg'
          }
      }
  
diff --git a/youtube_dl/extractor/smotri.py b/youtube_dl/extractor/smotri.py
index def46abda45c5d4899f3c3e5a3fb775592efdfa6..370fa887968128281a6286f78a1fdf4bf59f7b9f 100644
--- a/youtube_dl/extractor/smotri.py
+++ b/youtube_dl/extractor/smotri.py
@@ -81,7 +81,7 @@ class SmotriIE(InfoExtractor):
                  'uploader': 'psavari1',
                  'uploader_id': 'psavari1',
                  'upload_date': '20081103',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
              'params': {
                  'videopassword': '223322',
@@ -117,7 +117,7 @@ class SmotriIE(InfoExtractor):
                  'uploader': 'вАся',
                  'uploader_id': 'asya_prosto',
                  'upload_date': '20081218',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'age_limit': 18,
              },
              'params': {
diff --git a/youtube_dl/extractor/snotr.py b/youtube_dl/extractor/snotr.py
index 4819fe5b4b6322cc02e9e1fdd4c128cbe28e55b0..f773547483fbf7828118b7b3ff2e537e05b9628c 100644
--- a/youtube_dl/extractor/snotr.py
+++ b/youtube_dl/extractor/snotr.py
@@ -22,7 +22,7 @@ class SnotrIE(InfoExtractor):
              'duration': 248,
              'filesize_approx': 40700000,
              'description': 'A drone flying through Fourth of July Fireworks',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'expected_warnings': ['description'],
      }, {
@@ -34,7 +34,7 @@ class SnotrIE(InfoExtractor):
              'duration': 126,
              'filesize_approx': 8500000,
              'description': 'The top 10 George W. Bush moments, brought to you by David Letterman!',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }]
  
diff --git a/youtube_dl/extractor/soundcloud.py b/youtube_dl/extractor/soundcloud.py
index 3b7ecb3c343291e3fec8af451b4bb2bc3dde9fae..b3aa4ce26ab95933b40f3606c86b8ae6cefc531b 100644
--- a/youtube_dl/extractor/soundcloud.py
+++ b/youtube_dl/extractor/soundcloud.py
@@ -121,7 +121,7 @@ class SoundcloudIE(InfoExtractor):
          },
      ]
  
-    _CLIENT_ID = '02gUJC0hH2ct1EGOcYXQIzRFU91c72Ea'
+    _CLIENT_ID = 'fDoItMDbsbZz8dY16ZzARCZmzgHBPotA'
      _IPHONE_CLIENT_ID = '376f225bf427445fc4bfb6b99b72e0bf'
  
      @staticmethod
@@ -173,46 +173,54 @@ def _extract_info_dict(self, info, full_title=None, quiet=False, secret_token=No
              })
  
          # We have to retrieve the url
-        streams_url = ('http://api.soundcloud.com/i1/tracks/{0}/streams?'
-                       'client_id={1}&secret_token={2}'.format(track_id, self._IPHONE_CLIENT_ID, secret_token))
          format_dict = self._download_json(
-            streams_url,
-            track_id, 'Downloading track url')
+            'http://api.soundcloud.com/i1/tracks/%s/streams' % track_id,
+            track_id, 'Downloading track url', query={
+                'client_id': self._CLIENT_ID,
+                'secret_token': secret_token,
+            })
  
          for key, stream_url in format_dict.items():
+            abr = int_or_none(self._search_regex(
+                r'_(\d+)_url', key, 'audio bitrate', default=None))
              if key.startswith('http'):
-                formats.append({
+                stream_formats = [{
                      'format_id': key,
                      'ext': ext,
                      'url': stream_url,
-                    'vcodec': 'none',
-                })
+                }]
              elif key.startswith('rtmp'):
                  # The url doesn't have an rtmp app, we have to extract the playpath
                  url, path = stream_url.split('mp3:', 1)
-                formats.append({
+                stream_formats = [{
                      'format_id': key,
                      'url': url,
                      'play_path': 'mp3:' + path,
                      'ext': 'flv',
-                    'vcodec': 'none',
-                })
-
-            if not formats:
-                # We fallback to the stream_url in the original info, this
-                # cannot be always used, sometimes it can give an HTTP 404 error
-                formats.append({
-                    'format_id': 'fallback',
-                    'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
-                    'ext': ext,
-                    'vcodec': 'none',
-                })
-
-            for f in formats:
-                if f['format_id'].startswith('http'):
-                    f['protocol'] = 'http'
-                if f['format_id'].startswith('rtmp'):
-                    f['protocol'] = 'rtmp'
+                }]
+            elif key.startswith('hls'):
+                stream_formats = self._extract_m3u8_formats(
+                    stream_url, track_id, 'mp3', entry_protocol='m3u8_native',
+                    m3u8_id=key, fatal=False)
+            else:
+                continue
+
+            for f in stream_formats:
+                f['abr'] = abr
+
+            formats.extend(stream_formats)
+
+        if not formats:
+            # We fallback to the stream_url in the original info, this
+            # cannot be always used, sometimes it can give an HTTP 404 error
+            formats.append({
+                'format_id': 'fallback',
+                'url': info['stream_url'] + '?client_id=' + self._CLIENT_ID,
+                'ext': ext,
+            })
+
+        for f in formats:
+            f['vcodec'] = 'none'
  
          self._check_formats(formats, track_id)
          self._sort_formats(formats)
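Note on the hunk above: each key of the streams response now yields an audio bitrate via the `_(\d+)_url` pattern, and RTMP URLs are split at the `mp3:` marker to recover the play path. A minimal sketch of those two steps, assuming plain dicts and the standard `re` module instead of the extractor helpers. The `parse_stream` name and the sample keys and URLs are hypothetical, and the HLS branch is left out because it needs a network request.

```python
import re


def parse_stream(key, stream_url):
    """Build one format dict from a (key, URL) pair, or None if unsupported."""
    m = re.search(r'_(\d+)_url', key)
    abr = int(m.group(1)) if m else None
    if key.startswith('http'):
        fmt = {'format_id': key, 'url': stream_url}
    elif key.startswith('rtmp'):
        # The URL carries no RTMP app, so split off the play path.
        url, path = stream_url.split('mp3:', 1)
        fmt = {
            'format_id': key,
            'url': url,
            'play_path': 'mp3:' + path,
            'ext': 'flv',
        }
    else:
        return None
    fmt['abr'] = abr
    fmt['vcodec'] = 'none'  # audio-only stream
    return fmt


# Hypothetical sample entries.
http_fmt = parse_stream('http_mp3_128_url', 'http://example.com/track.mp3')
rtmp_fmt = parse_stream('rtmp_mp3_128_url', 'rtmp://example.com/mp3:path/to/track')
```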
diff --git a/youtube_dl/extractor/soundgasm.py b/youtube_dl/extractor/soundgasm.py
index 3a4ddf57ea369a0b250a4d786738e0ea4db9e1dd..e004e2c5ab12705c8d9ff5e12b25f53579539c72 100644
--- a/youtube_dl/extractor/soundgasm.py
+++ b/youtube_dl/extractor/soundgasm.py
@@ -27,7 +27,7 @@ def _real_extract(self, url):
          webpage = self._download_webpage(url, display_id)
          audio_url = self._html_search_regex(
              r'(?s)m4a\:\s"([^"]+)"', webpage, 'audio URL')
-        audio_id = re.split('\/|\.', audio_url)[-2]
+        audio_id = re.split(r'\/|\.', audio_url)[-2]
          description = self._html_search_regex(
              r'(?s)<li>Description:\s(.*?)<\/li>', webpage, 'description',
              fatal=False)
diff --git a/youtube_dl/extractor/southpark.py b/youtube_dl/extractor/southpark.py
index 08f8c5744a84dffda03904afd30d44cac42f2917..d8ce416fc7d1a9ec2e3561752890d916f2bcf93a 100644
--- a/youtube_dl/extractor/southpark.py
+++ b/youtube_dl/extractor/southpark.py
@@ -6,7 +6,7 @@
  
  class SouthParkIE(MTVServicesInfoExtractor):
      IE_NAME = 'southpark.cc.com'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.cc\.com/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
  
      _FEED_URL = 'http://www.southparkstudios.com/feeds/video-player/mrss'
  
@@ -75,7 +75,7 @@ class SouthParkDeIE(SouthParkIE):
  
  class SouthParkNlIE(SouthParkIE):
      IE_NAME = 'southpark.nl'
-    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|full-episodes)/(?P<id>.+?)(\?|#|$))'
+    _VALID_URL = r'https?://(?:www\.)?(?P<url>southpark\.nl/(?:clips|(?:full-)?episodes)/(?P<id>.+?)(\?|#|$))'
      _FEED_URL = 'http://www.southpark.nl/feeds/video-player/mrss/'
  
      _TESTS = [{
diff --git a/youtube_dl/extractor/spankbang.py b/youtube_dl/extractor/spankbang.py
index 186d22b7d1608b01bb0a3d45082403e6a58bb05e..123c33ac36e275d8b624c8830153235c3a4ef338 100644
--- a/youtube_dl/extractor/spankbang.py
+++ b/youtube_dl/extractor/spankbang.py
@@ -15,7 +15,7 @@ class SpankBangIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'fantasy solo',
              'description': 'Watch fantasy solo free HD porn video - 05 minutes - dillion harper masturbates on a bed free adult movies.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'silly2587',
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/spankwire.py b/youtube_dl/extractor/spankwire.py
index 92a7120a3242e732ceb58f51b4391a5efbc569d8..44d8fa52f3071ca00971624db81ce4ad6b2141e3 100644
--- a/youtube_dl/extractor/spankwire.py
+++ b/youtube_dl/extractor/spankwire.py
@@ -85,7 +85,7 @@ def _real_extract(self, url):
              r'playerData\.cdnPath([0-9]{3,})\s*=\s*(?:encodeURIComponent\()?["\']([^"\']+)["\']', webpage)
          heights = [int(video[0]) for video in videos]
          video_urls = list(map(compat_urllib_parse_unquote, [video[1] for video in videos]))
-        if webpage.find('flashvars\.encrypted = "true"') != -1:
+        if webpage.find(r'flashvars\.encrypted = "true"') != -1:
              password = self._search_regex(
                  r'flashvars\.video_title = "([^"]+)',
                  webpage, 'password').replace('+', ' ')
diff --git a/youtube_dl/extractor/spiegeltv.py b/youtube_dl/extractor/spiegeltv.py
index 034bd47ff617bdc96d572b7065b3af03c7117468..e1cfb869834cf0d50b04a11c5fc137d7c9afcad8 100644
--- a/youtube_dl/extractor/spiegeltv.py
+++ b/youtube_dl/extractor/spiegeltv.py
@@ -18,7 +18,7 @@ class SpiegeltvIE(InfoExtractor):
              'ext': 'm4v',
              'title': 'Flug MH370',
              'description': 'Das Rätsel um die Boeing 777 der Malaysia-Airlines',
-            'thumbnail': 're:http://.*\.jpg$',
+            'thumbnail': r're:http://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/spike.py b/youtube_dl/extractor/spike.py
index 218785ee4e11045bcbb09416cd3bc6862a757ac0..c59896a17905c006eabb40271d846eebe7908a66 100644
--- a/youtube_dl/extractor/spike.py
+++ b/youtube_dl/extractor/spike.py
@@ -1,5 +1,7 @@
  from __future__ import unicode_literals
  
+import re
+
  from .mtv import MTVServicesInfoExtractor
  
  
@@ -16,6 +18,15 @@ class SpikeIE(MTVServicesInfoExtractor):
              'timestamp': 1388120400,
              'upload_date': '20131227',
          },
+    }, {
+        'url': 'http://www.spike.com/full-episodes/j830qm/lip-sync-battle-joel-mchale-vs-jim-rash-season-2-ep-209',
+        'md5': 'b25c6f16418aefb9ad5a6cae2559321f',
+        'info_dict': {
+            'id': '37ace3a8-1df6-48be-85b8-38df8229e241',
+            'ext': 'mp4',
+            'title': 'Lip Sync Battle|April 28, 2016|2|209|Joel McHale Vs. Jim Rash|Act 1',
+            'description': 'md5:a739ca8f978a7802f67f8016d27ce114',
+        },
      }, {
          'url': 'http://www.spike.com/video-clips/lhtu8m/',
          'only_matching': True,
@@ -32,3 +43,12 @@ class SpikeIE(MTVServicesInfoExtractor):
  
      _FEED_URL = 'http://www.spike.com/feeds/mrss/'
      _MOBILE_TEMPLATE = 'http://m.spike.com/videos/video.rbml?id=%s'
+    _CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)')
+
+    def _extract_mgid(self, webpage):
+        mgid = super(SpikeIE, self)._extract_mgid(webpage)
+        if mgid is None:
+            url_parts = self._search_regex(self._CUSTOM_URL_REGEX, webpage, 'episode_id')
+            video_type, episode_id = url_parts.split('/', 1)
+            mgid = 'mgid:arc:{0}:spike.com:{1}'.format(video_type, episode_id)
+        return mgid
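Note on `_extract_mgid` above: when the base class finds no mgid, the override derives one from a `spikenetworkapp://` link in the page. A standalone sketch of that fallback, assuming only the standard `re` module; the `mgid_from_app_url` helper and the sample markup are hypothetical, and the ID is taken from the test case in this diff.

```python
import re

# Pattern copied from _CUSTOM_URL_REGEX above.
CUSTOM_URL_REGEX = re.compile(r'spikenetworkapp://([^/]+/[-a-fA-F0-9]+)')


def mgid_from_app_url(webpage):
    """Derive an mgid from the app deep link, or None if there is none."""
    m = CUSTOM_URL_REGEX.search(webpage)
    if m is None:
        return None
    video_type, episode_id = m.group(1).split('/', 1)
    return 'mgid:arc:{0}:spike.com:{1}'.format(video_type, episode_id)


# Hypothetical page fragment.
page = '<meta content="spikenetworkapp://episode/37ace3a8-1df6-48be-85b8-38df8229e241">'
mgid = mgid_from_app_url(page)
```

Unlike the extractor, which lets `_search_regex` raise when nothing matches, this sketch returns `None`.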
diff --git a/youtube_dl/extractor/sport5.py b/youtube_dl/extractor/sport5.py
index 7e67833062d0a21d2c663b1b5d24246d653f0116..a417b5a4ef0ddf302f11dc36f430572192e64262 100644
--- a/youtube_dl/extractor/sport5.py
+++ b/youtube_dl/extractor/sport5.py
@@ -41,7 +41,7 @@ def _real_extract(self, url):
  
          webpage = self._download_webpage(url, media_id)
  
-        video_id = self._html_search_regex('clipId=([\w-]+)', webpage, 'video id')
+        video_id = self._html_search_regex(r'clipId=([\w-]+)', webpage, 'video id')
  
          metadata = self._download_xml(
              'http://sport5-metadata-rr-d.nsacdn.com/vod/vod/%s/HDS/metadata.xml' % video_id,
diff --git a/youtube_dl/extractor/sportbox.py b/youtube_dl/extractor/sportbox.py
index e5c28ae890ee61536052a5716677d486d0a5b43e..b512cd20fbe495023eb7ec9f2c342245b5f2a44d 100644
--- a/youtube_dl/extractor/sportbox.py
+++ b/youtube_dl/extractor/sportbox.py
@@ -21,7 +21,7 @@ class SportBoxIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Гонка 2  заезд ««Объединенный 2000»: классы Туринг и Супер-продакшн',
              'description': 'md5:3d72dc4a006ab6805d82f037fdc637ad',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140928',
          },
          'params': {
@@ -73,7 +73,7 @@ class SportBoxEmbedIE(InfoExtractor):
              'id': '211355',
              'ext': 'mp4',
              'title': 'В Новороссийске прошел детский турнир «Поле славы боевой»',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/sportdeutschland.py b/youtube_dl/extractor/sportdeutschland.py
index a9927f6e29d1d52463cefc3503414305bec0e919..a3c35a899a2186f1e937771cd0e34df408b2d361 100644
--- a/youtube_dl/extractor/sportdeutschland.py
+++ b/youtube_dl/extractor/sportdeutschland.py
@@ -20,8 +20,8 @@ class SportDeutschlandIE(InfoExtractor):
              'title': 're:Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen',
              'categories': ['Badminton'],
              'view_count': int,
-            'thumbnail': 're:^https?://.*\.jpg$',
-            'description': 're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'description': r're:Die Badminton-WM 2014 aus Kopenhagen bei Sportdeutschland\.TV',
              'timestamp': int,
              'upload_date': 're:^201408[23][0-9]$',
          },
@@ -38,7 +38,7 @@ class SportDeutschlandIE(InfoExtractor):
              'timestamp': 1408976060,
              'duration': 2732,
              'title': 'Li-Ning Badminton Weltmeisterschaft 2014 Kopenhagen: Herren Einzel, Wei Lee vs. Keun Lee',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'view_count': int,
              'categories': ['Li-Ning Badminton WM 2014'],
  
diff --git a/youtube_dl/extractor/srgssr.py b/youtube_dl/extractor/srgssr.py
index 246970c4d98a7d4592deadc1c7744c1504ccefef..319a48a7a543dfcfade0cb91726103a66d864711 100644
--- a/youtube_dl/extractor/srgssr.py
+++ b/youtube_dl/extractor/srgssr.py
@@ -4,6 +4,7 @@
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_urllib_parse_urlparse
  from ..utils import (
      ExtractorError,
      parse_iso8601,
@@ -23,6 +24,16 @@ class SRGSSRIE(InfoExtractor):
          'STARTDATE': 'This video is not yet available. Please try again later.',
      }
  
+    def _get_tokenized_src(self, url, video_id, format_id):
+        sp = compat_urllib_parse_urlparse(url).path.split('/')
+        token = self._download_json(
+            'http://tp.srgssr.ch/akahd/token?acl=/%s/%s/*' % (sp[1], sp[2]),
+            video_id, 'Downloading %s token' % format_id, fatal=False) or {}
+        auth_params = token.get('token', {}).get('authparams')
+        if auth_params:
+            url += '?' + auth_params
+        return url
+
      def get_media_data(self, bu, media_type, media_id):
          media_data = self._download_json(
              'http://il.srgssr.ch/integrationlayer/1.0/ue/%s/%s/play/%s.json' % (bu, media_type, media_id),
@@ -37,9 +48,6 @@ def get_media_data(self, bu, media_type, media_id):
      def _real_extract(self, url):
          bu, media_type, media_id = re.match(self._VALID_URL, url).groups()
  
-        if bu == 'rts':
-            return self.url_result('rts:%s' % media_id, 'RTS')
-
          media_data = self.get_media_data(bu, media_type, media_id)
  
          metadata = media_data['AssetMetadatas']['AssetMetadata'][0]
@@ -61,14 +69,16 @@ def _real_extract(self, url):
                  asset_url = asset['text']
                  quality = asset['@quality']
                  format_id = '%s-%s' % (protocol, quality)
-                if protocol == 'HTTP-HDS':
-                    formats.extend(self._extract_f4m_formats(
-                        asset_url + '?hdcore=3.4.0', media_id,
-                        f4m_id=format_id, fatal=False))
-                elif protocol == 'HTTP-HLS':
-                    formats.extend(self._extract_m3u8_formats(
-                        asset_url, media_id, 'mp4', 'm3u8_native',
-                        m3u8_id=format_id, fatal=False))
+                if protocol.startswith('HTTP-HDS') or protocol.startswith('HTTP-HLS'):
+                    asset_url = self._get_tokenized_src(asset_url, media_id, format_id)
+                    if protocol.startswith('HTTP-HDS'):
+                        formats.extend(self._extract_f4m_formats(
+                            asset_url + ('?' if '?' not in asset_url else '&') + 'hdcore=3.4.0',
+                            media_id, f4m_id=format_id, fatal=False))
+                    elif protocol.startswith('HTTP-HLS'):
+                        formats.extend(self._extract_m3u8_formats(
+                            asset_url, media_id, 'mp4', 'm3u8_native',
+                            m3u8_id=format_id, fatal=False))
                  else:
                      formats.append({
                          'format_id': format_id,
@@ -94,10 +104,10 @@ class SRGSSRPlayIE(InfoExtractor):
  
      _TESTS = [{
          'url': 'http://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?id=28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-        'md5': '4cd93523723beff51bb4bee974ee238d',
+        'md5': 'da6b5b3ac9fa4761a942331cef20fcb3',
          'info_dict': {
              'id': '28e1a57d-5b76-4399-8ab3-9097f071e6c5',
-            'ext': 'm4v',
+            'ext': 'mp4',
              'upload_date': '20130701',
              'title': 'Snowden beantragt Asyl in Russland',
              'timestamp': 1372713995,
@@ -140,7 +150,7 @@ class SRGSSRPlayIE(InfoExtractor):
              'uploader': '19h30',
              'upload_date': '20141201',
              'timestamp': 1417458600,
-            'thumbnail': 're:^https?://.*\.image',
+            'thumbnail': r're:^https?://.*\.image',
              'view_count': int,
          },
          'params': {
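Note on the srgssr.py hunks above: `_get_tokenized_src` builds the token ACL from the first two path segments of the asset URL, and the HDS branch now picks `?` or `&` depending on whether the URL already has a query string. A minimal sketch of both string operations, assuming Python 3's `urllib.parse` in place of `compat_urllib_parse_urlparse`; the helper names and sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def token_acl(url):
    """Build the ACL path from the first two path segments of the URL."""
    sp = urlparse(url).path.split('/')
    return '/%s/%s/*' % (sp[1], sp[2])


def append_query(url, params):
    """Append a raw query string, choosing '?' or '&' as the separator."""
    return url + ('&' if '?' in url else '?') + params


# Hypothetical asset URL.
acl = token_acl('http://example.com/i/vod/clip/master.m3u8')
plain = append_query('http://example.com/a.f4m', 'hdcore=3.4.0')
tokenized = append_query('http://example.com/a.f4m?hdnts=abc', 'hdcore=3.4.0')
```

The separator check matters once the token has been appended: a second `?` would produce a malformed manifest URL.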
diff --git a/youtube_dl/extractor/srmediathek.py b/youtube_dl/extractor/srmediathek.py
index b03272f7a273e8a3726adb03d805bd2a449849bf..28baf901c9f021c15544f099f78dd5d5a6b9165c 100644
--- a/youtube_dl/extractor/srmediathek.py
+++ b/youtube_dl/extractor/srmediathek.py
@@ -20,7 +20,7 @@ class SRMediathekIE(ARDMediathekIE):
              'ext': 'mp4',
              'title': 'sportarena (26.10.2014)',
              'description': 'Ringen: KSV Köllerbach gegen Aachen-Walheim; Frauen-Fußball: 1. FC Saarbrücken gegen Sindelfingen; Motorsport: Rallye in Losheim; dazu: Interview mit Timo Bernhard; Turnen: TG Saar; Reitsport: Deutscher Voltigier-Pokal; Badminton: Interview mit Michael Fuchs ',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'skip': 'no longer available',
      }, {
diff --git a/youtube_dl/extractor/stanfordoc.py b/youtube_dl/extractor/stanfordoc.py
index 4a3d8bb8f267b588c59e2f16b208955a70d362d9..cce65fb1014d3595670707d8009832ea37f448dc 100644
--- a/youtube_dl/extractor/stanfordoc.py
+++ b/youtube_dl/extractor/stanfordoc.py
@@ -66,7 +66,7 @@ def _real_extract(self, url):
                  r'(?s)<description>([^<]+)</description>',
                  coursepage, 'description', fatal=False)
  
-            links = orderedSet(re.findall('<a href="(VideoPage.php\?[^"]+)">', coursepage))
+            links = orderedSet(re.findall(r'<a href="(VideoPage.php\?[^"]+)">', coursepage))
              info['entries'] = [self.url_result(
                  'http://openclassroom.stanford.edu/MainFolder/%s' % unescapeHTML(l)
              ) for l in links]
@@ -84,7 +84,7 @@ def _real_extract(self, url):
              rootpage = self._download_webpage(rootURL, info['id'],
                                                errnote='Unable to download course info page')
  
-            links = orderedSet(re.findall('<a href="(CoursePage.php\?[^"]+)">', rootpage))
+            links = orderedSet(re.findall(r'<a href="(CoursePage.php\?[^"]+)">', rootpage))
              info['entries'] = [self.url_result(
                  'http://openclassroom.stanford.edu/MainFolder/%s' % unescapeHTML(l)
              ) for l in links]
diff --git a/youtube_dl/extractor/stitcher.py b/youtube_dl/extractor/stitcher.py
index 0f8782d038c9fdadf903b05479ff468a039c6aa4..97d1ff6811b27140c77932a766b7cb9d3dbfe7b6 100644
--- a/youtube_dl/extractor/stitcher.py
+++ b/youtube_dl/extractor/stitcher.py
@@ -22,7 +22,7 @@ class StitcherIE(InfoExtractor):
              'title': 'Machine Learning Mastery and Cancer Clusters',
              'description': 'md5:55163197a44e915a14a1ac3a1de0f2d3',
              'duration': 1604,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
      }, {
          'url': 'http://www.stitcher.com/podcast/panoply/vulture-tv/e/the-rare-hourlong-comedy-plus-40846275?autoplay=true',
@@ -33,7 +33,7 @@ class StitcherIE(InfoExtractor):
              'title': "The CW's 'Crazy Ex-Girlfriend'",
              'description': 'md5:04f1e2f98eb3f5cbb094cea0f9e19b17',
              'duration': 2235,
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
          },
          'params': {
              'skip_download': True,
diff --git a/youtube_dl/extractor/streamable.py b/youtube_dl/extractor/streamable.py
index 2c26fa689003c6203399eca293c32c8998636ea5..e973c867c1a23eeeacbbc706269e50900f4f60b9 100644
--- a/youtube_dl/extractor/streamable.py
+++ b/youtube_dl/extractor/streamable.py
@@ -21,7 +21,7 @@ class StreamableIE(InfoExtractor):
                  'id': 'dnd1',
                  'ext': 'mp4',
                  'title': 'Mikel Oiarzabal scores to make it 0-3 for La Real against Espanyol',
-                'thumbnail': 're:https?://.*\.jpg$',
+                'thumbnail': r're:https?://.*\.jpg$',
                  'uploader': 'teabaker',
                  'timestamp': 1454964157.35115,
                  'upload_date': '20160208',
@@ -37,7 +37,7 @@ class StreamableIE(InfoExtractor):
                  'id': 'moo',
                  'ext': 'mp4',
                  'title': '"Please don\'t eat me!"',
-                'thumbnail': 're:https?://.*\.jpg$',
+                'thumbnail': r're:https?://.*\.jpg$',
                  'timestamp': 1426115495,
                  'upload_date': '20150311',
                  'duration': 12,
diff --git a/youtube_dl/extractor/streetvoice.py b/youtube_dl/extractor/streetvoice.py

index e529051d100b8024007229200648ea259b3d1677..91612c7f22d260c8544cd0ead31dd830daab0424 100644 (file)
--- a/youtube_dl/extractor/streetvoice.py
+++ b/youtube_dl/extractor/streetvoice.py
@@ -16,7 +16,7 @@ class StreetVoiceIE(InfoExtractor):
              'ext': 'mp3',
              'title': '輸',
              'description': 'Crispy脆樂團 - 輸',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 260,
              'upload_date': '20091018',
              'uploader': 'Crispy脆樂團',
diff --git a/youtube_dl/extractor/sunporno.py b/youtube_dl/extractor/sunporno.py

index ef9be7926866f6420d802f14cfdf83b3a9e4f69b..68051169b974d7bc748238566be1c31734eb8ed7 100644 (file)
--- a/youtube_dl/extractor/sunporno.py
+++ b/youtube_dl/extractor/sunporno.py
@@ -21,7 +21,7 @@ class SunPornoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'md5:0a400058e8105d39e35c35e7c5184164',
              'description': 'md5:a31241990e1bd3a64e72ae99afb325fb',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 302,
              'age_limit': 18,
          }
diff --git a/youtube_dl/extractor/svt.py b/youtube_dl/extractor/svt.py

index fb0a4b24ef5bf65ff13ca2288395f09540e71d48..10cf808857e231cee482434010161a93eee85027 100644 (file)
--- a/youtube_dl/extractor/svt.py
+++ b/youtube_dl/extractor/svt.py
@@ -129,7 +129,7 @@ class SVTPlayIE(SVTBaseIE):
              'ext': 'mp4',
              'title': 'Flygplan till Haile Selassie',
              'duration': 3527,
-            'thumbnail': 're:^https?://.*[\.-]jpg$',
+            'thumbnail': r're:^https?://.*[\.-]jpg$',
              'age_limit': 0,
              'subtitles': {
                  'sv': [{
diff --git a/youtube_dl/extractor/swrmediathek.py b/youtube_dl/extractor/swrmediathek.py

index 6d69f7686b37bd2b39b6362373eadefedef0b932..0f615979e132278c91deef05cc4f0e812c158f3d 100644 (file)
--- a/youtube_dl/extractor/swrmediathek.py
+++ b/youtube_dl/extractor/swrmediathek.py
@@ -1,10 +1,12 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import re
-
  from .common import InfoExtractor
-from ..utils import parse_duration
+from ..utils import (
+    parse_duration,
+    int_or_none,
+    determine_protocol,
+)
  
  
  class SWRMediathekIE(InfoExtractor):
@@ -18,7 +20,7 @@ class SWRMediathekIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'SWR odysso',
              'description': 'md5:2012e31baad36162e97ce9eb3f157b8a',
-            'thumbnail': 're:^http:.*\.jpg$',
+            'thumbnail': r're:^http:.*\.jpg$',
              'duration': 2602,
              'upload_date': '20140515',
              'uploader': 'SWR Fernsehen',
@@ -32,12 +34,13 @@ class SWRMediathekIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Nachtcafé - Alltagsdroge Alkohol - zwischen Sektempfang und Komasaufen',
              'description': 'md5:e0a3adc17e47db2c23aab9ebc36dbee2',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'duration': 5305,
              'upload_date': '20140516',
              'uploader': 'SWR Fernsehen',
              'uploader_id': '990030',
          },
+        'skip': 'redirect to http://swrmediathek.de/index.htm?hinweis=swrlink',
      }, {
          'url': 'http://swrmediathek.de/player.htm?show=bba23e10-cb93-11e3-bf7f-0026b975f2e6',
          'md5': '4382e4ef2c9d7ce6852535fa867a0dd3',
@@ -46,59 +49,67 @@ class SWRMediathekIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Saša Stanišic: Vor dem Fest',
              'description': 'md5:5b792387dc3fbb171eb709060654e8c9',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'duration': 3366,
              'upload_date': '20140520',
              'uploader': 'SWR 2',
              'uploader_id': '284670',
-        }
+        },
+        'skip': 'redirect to http://swrmediathek.de/index.htm?hinweis=swrlink',
      }]
  
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
+        video_id = self._match_id(url)
  
          video = self._download_json(
-            'http://swrmediathek.de/AjaxEntry?ekey=%s' % video_id, video_id, 'Downloading video JSON')
+            'http://swrmediathek.de/AjaxEntry?ekey=%s' % video_id,
+            video_id, 'Downloading video JSON')
  
          attr = video['attr']
-        media_type = attr['entry_etype']
+        title = attr['entry_title']
+        media_type = attr.get('entry_etype')
  
          formats = []
-        for entry in video['sub']:
-            if entry['name'] != 'entry_media':
+        for entry in video.get('sub', []):
+            if entry.get('name') != 'entry_media':
                  continue
  
-            entry_attr = entry['attr']
-            codec = entry_attr['val0']
-            quality = int(entry_attr['val1'])
-
-            fmt = {
-                'url': entry_attr['val2'],
-                'quality': quality,
-            }
-
-            if media_type == 'Video':
-                fmt.update({
-                    'format_note': ['144p', '288p', '544p', '720p'][quality - 1],
-                    'vcodec': codec,
-                })
-            elif media_type == 'Audio':
-                fmt.update({
-                    'acodec': codec,
+            entry_attr = entry.get('attr', {})
+            f_url = entry_attr.get('val2')
+            if not f_url:
+                continue
+            codec = entry_attr.get('val0')
+            if codec == 'm3u8':
+                formats.extend(self._extract_m3u8_formats(
+                    f_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+            elif codec == 'f4m':
+                formats.extend(self._extract_f4m_formats(
+                    f_url + '?hdcore=3.7.0', video_id,
+                    f4m_id='hds', fatal=False))
+            else:
+                formats.append({
+                    'format_id': determine_protocol({'url': f_url}),
+                    'url': f_url,
+                    'quality': int_or_none(entry_attr.get('val1')),
+                    'vcodec': codec if media_type == 'Video' else 'none',
+                    'acodec': codec if media_type == 'Audio' else None,
                  })
-            formats.append(fmt)
-
          self._sort_formats(formats)
  
+        upload_date = None
+        entry_pdatet = attr.get('entry_pdatet')
+        if entry_pdatet:
+            upload_date = entry_pdatet[:-4]
+
          return {
              'id': video_id,
-            'title': attr['entry_title'],
-            'description': attr['entry_descl'],
-            'thumbnail': attr['entry_image_16_9'],
-            'duration': parse_duration(attr['entry_durat']),
-            'upload_date': attr['entry_pdatet'][:-4],
-            'uploader': attr['channel_title'],
-            'uploader_id': attr['channel_idkey'],
+            'title': title,
+            'description': attr.get('entry_descl'),
+            'thumbnail': attr.get('entry_image_16_9'),
+            'duration': parse_duration(attr.get('entry_durat')),
+            'upload_date': upload_date,
+            'uploader': attr.get('channel_title'),
+            'uploader_id': attr.get('channel_idkey'),
              'formats': formats,
          }
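Note on the swrmediathek.py rewrite: the new code replaces direct indexing with `.get()` lookups and skips entries that lack a URL, so a partial JSON response no longer raises `KeyError`. A minimal sketch of that pattern follows, outside the extractor class. It omits the `m3u8` and `f4m` branches, `int_or_none` here is a simplified stand-in for the real helper in `youtube_dl.utils`, and `collect_formats` and the sample payload are hypothetical.

```python
def int_or_none(value):
    """Simplified stand-in for youtube-dl's int_or_none helper."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def collect_formats(video, media_type):
    """Build format dicts, skipping entries without a usable URL."""
    formats = []
    for entry in video.get('sub', []):
        if entry.get('name') != 'entry_media':
            continue
        entry_attr = entry.get('attr', {})
        f_url = entry_attr.get('val2')
        if not f_url:
            continue
        codec = entry_attr.get('val0')
        formats.append({
            'url': f_url,
            'quality': int_or_none(entry_attr.get('val1')),
            'vcodec': codec if media_type == 'Video' else 'none',
            'acodec': codec if media_type == 'Audio' else None,
        })
    return formats


# Hypothetical payload: one complete entry, one without a URL, one unrelated.
sample = {
    'sub': [
        {'name': 'entry_media',
         'attr': {'val0': 'h264', 'val1': '3', 'val2': 'http://example.invalid/v.mp4'}},
        {'name': 'entry_media', 'attr': {'val0': 'h264', 'val1': '2'}},
        {'name': 'entry_other', 'attr': {}},
    ],
}

formats = collect_formats(sample, 'Video')
print(formats)
print(collect_formats({}, 'Video'))
```

Only the first entry survives, and an empty response yields an empty list instead of an exception. In the extractor, `title` is still read with `attr['entry_title']`, so a missing title remains a hard failure.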
diff --git a/youtube_dl/extractor/tagesschau.py b/youtube_dl/extractor/tagesschau.py

index 8670cee28d381de6011e3187db3024bcc40519de..c351b754594a08be2f585f901c3a71ac425bcfd7 100644 (file)
--- a/youtube_dl/extractor/tagesschau.py
+++ b/youtube_dl/extractor/tagesschau.py
@@ -23,7 +23,7 @@ class TagesschauPlayerIE(InfoExtractor):
              'id': '179517',
              'ext': 'mp4',
              'title': 'Marie Kristin Boese, ARD Berlin, über den zukünftigen Kurs der AfD',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
              'formats': 'mincount:6',
          },
      }, {
@@ -33,7 +33,7 @@ class TagesschauPlayerIE(InfoExtractor):
              'id': '29417',
              'ext': 'mp3',
              'title': 'Trabi - Bye, bye Rennpappe',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
              'formats': 'mincount:2',
          },
      }, {
@@ -135,7 +135,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Regierungsumbildung in Athen: Neue Minister in Griechenland vereidigt',
              'description': '18.07.2015 20:10 Uhr',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          'url': 'http://www.tagesschau.de/multimedia/sendung/ts-5727.html',
@@ -145,7 +145,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Sendung: tagesschau \t04.12.2014 20:00 Uhr',
              'description': 'md5:695c01bfd98b7e313c501386327aea59',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          # exclusive audio
@@ -156,7 +156,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Trabi - Bye, bye Rennpappe',
              'description': 'md5:8687dda862cbbe2cfb2df09b56341317',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          # audio in article
@@ -167,7 +167,7 @@ class TagesschauIE(InfoExtractor):
              'ext': 'mp3',
              'title': 'Viele Baustellen für neuen BND-Chef',
              'description': 'md5:1e69a54be3e1255b2b07cdbce5bcd8b4',
-            'thumbnail': 're:^https?:.*\.jpg$',
+            'thumbnail': r're:^https?:.*\.jpg$',
          },
      }, {
          'url': 'http://www.tagesschau.de/inland/afd-parteitag-135.html',
diff --git a/youtube_dl/extractor/tass.py b/youtube_dl/extractor/tass.py

index 5293393efc219526b61fe04ff12ff25f1d49b33c..6d336da788b8f3b51b3c61d3d0a2215388f0342a 100644 (file)
--- a/youtube_dl/extractor/tass.py
+++ b/youtube_dl/extractor/tass.py
@@ -21,7 +21,7 @@ class TassIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Посетителям московского зоопарка показали красную панду',
                  'description': 'Приехавшую из Дублина Зейну можно увидеть в павильоне "Кошки тропиков"',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
              },
          },
          {
diff --git a/youtube_dl/extractor/tdslifeway.py b/youtube_dl/extractor/tdslifeway.py

index 4d1f5c8016063ce1d18e5152e479b788e6152c25..101c6ee31a97bfa9841aeebffb911f662853d89c 100644 (file)
--- a/youtube_dl/extractor/tdslifeway.py
+++ b/youtube_dl/extractor/tdslifeway.py
@@ -13,7 +13,7 @@ class TDSLifewayIE(InfoExtractor):
              'id': '3453494717001',
              'ext': 'mp4',
              'title': 'The Gospel by Numbers',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'upload_date': '20140410',
              'description': 'Coming soon from T4G 2014!',
              'uploader_id': '2034960640001',
diff --git a/youtube_dl/extractor/teachertube.py b/youtube_dl/extractor/teachertube.py

index df5d5556fadf82c8dc680643389fdeccf989793f..f14713a78904c0e879571d6642061f3baa7a617a 100644 (file)
--- a/youtube_dl/extractor/teachertube.py
+++ b/youtube_dl/extractor/teachertube.py
@@ -24,7 +24,7 @@ class TeacherTubeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Measures of dispersion from a frequency table',
              'description': 'Measures of dispersion from a frequency table',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.teachertube.com/viewVideo.php?video_id=340064',
@@ -34,7 +34,7 @@ class TeacherTubeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'How to Make Paper Dolls _ Paper Art Projects',
              'description': 'Learn how to make paper dolls in this simple',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://www.teachertube.com/music.php?music_id=8805',
diff --git a/youtube_dl/extractor/teamfourstar.py b/youtube_dl/extractor/teamfourstar.py

new file mode 100644 (file)

index 0000000..a8c6ed7
--- /dev/null
+++ b/youtube_dl/extractor/teamfourstar.py
@@ -0,0 +1,48 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .jwplatform import JWPlatformIE
+from ..utils import unified_strdate
+
+
+class TeamFourStarIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?teamfourstar\.com/(?P<id>[a-z0-9\-]+)'
+    _TEST = {
+        'url': 'http://teamfourstar.com/tfs-abridged-parody-episode-1-2/',
+        'info_dict': {
+            'id': '0WdZO31W',
+            'title': 'TFS Abridged Parody Episode 1',
+            'description': 'md5:d60bc389588ebab2ee7ad432bda953ae',
+            'ext': 'mp4',
+            'timestamp': 1394168400,
+            'upload_date': '20080508',
+        },
+    }
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+        webpage = self._download_webpage(url, display_id)
+
+        jwplatform_url = JWPlatformIE._extract_url(webpage)
+
+        video_title = self._html_search_regex(
+            r'<h1[^>]+class="entry-title"[^>]*>(?P<title>.+?)</h1>',
+            webpage, 'title')
+        video_date = unified_strdate(self._html_search_regex(
+            r'<span[^>]+class="meta-date date updated"[^>]*>(?P<date>.+?)</span>',
+            webpage, 'date', fatal=False))
+        video_description = self._html_search_regex(
+            r'(?s)<div[^>]+class="content-inner"[^>]*>.*?(?P<description><p>.+?)</div>',
+            webpage, 'description', fatal=False)
+        video_thumbnail = self._og_search_thumbnail(webpage)
+
+        return {
+            '_type': 'url_transparent',
+            'display_id': display_id,
+            'title': video_title,
+            'description': video_description,
+            'upload_date': video_date,
+            'thumbnail': video_thumbnail,
+            'url': jwplatform_url,
+        }
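Note on the new TeamFourStar extractor: the page slug captured by `_VALID_URL` is used as `display_id`, while the `id` in the test (`0WdZO31W`) is presumably supplied by the JWPlatform extractor that the `url_transparent` result delegates to. The sketch below exercises only the URL pattern. `match_id` is a hypothetical helper that approximates what `_match_id` does, and the second URL is made up.

```python
import re

# Pattern copied from the new extractor.
VALID_URL = r'https?://(?:www\.)?teamfourstar\.com/(?P<id>[a-z0-9\-]+)'


def match_id(url):
    """Return the 'id' group, or None when the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None


print(match_id('http://teamfourstar.com/tfs-abridged-parody-episode-1-2/'))
print(match_id('http://example.invalid/tfs-abridged-parody-episode-1-2/'))
```

The first call returns the slug `tfs-abridged-parody-episode-1-2`; the second returns `None` because the host does not match.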
diff --git a/youtube_dl/extractor/ted.py b/youtube_dl/extractor/ted.py

index 451cde76d2e757fcdfb30ad96847b16aa4d156ff..1b1afab32c349be119f3db8c19c6ed68e5c5ccce 100644 (file)
--- a/youtube_dl/extractor/ted.py
+++ b/youtube_dl/extractor/ted.py
@@ -47,7 +47,7 @@ class TEDIE(InfoExtractor):
              'id': 'tSVI8ta_P4w',
              'ext': 'mp4',
              'title': 'Vishal Sikka: The beauty and power of algorithms',
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
              'description': 'md5:6261fdfe3e02f4f579cbbfc00aff73f4',
              'upload_date': '20140122',
              'uploader_id': 'TEDInstitute',
@@ -189,7 +189,7 @@ def _talk_info(self, url, video_name):
                          'format_id': '%s-%sk' % (format_id, bitrate),
                          'tbr': bitrate,
                      })
-                    if re.search('\d+k', h264_url):
+                    if re.search(r'\d+k', h264_url):
                          http_url = h264_url
              elif format_id == 'rtmp':
                  streamer = talk_info.get('streamer')
diff --git a/youtube_dl/extractor/telebruxelles.py b/youtube_dl/extractor/telebruxelles.py

index eefecc490c5d13476259497e79f7a3ebe68caee7..5886e9c1bb7e0c4e9b192480ac2cfa48118ffe2a 100644 (file)
--- a/youtube_dl/extractor/telebruxelles.py
+++ b/youtube_dl/extractor/telebruxelles.py
@@ -7,33 +7,30 @@
  
  
  class TeleBruxellesIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
+    _VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt|emission)/?(?P<id>[^/#?]+)'
      _TESTS = [{
-        'url': 'http://www.telebruxelles.be/news/auditions-devant-parlement-francken-galant-tres-attendus/',
-        'md5': '59439e568c9ee42fb77588b2096b214f',
+        'url': 'http://bx1.be/news/que-risque-lauteur-dune-fausse-alerte-a-la-bombe/',
+        'md5': 'a2a67a5b1c3e8c9d33109b902f474fd9',
          'info_dict': {
-            'id': '11942',
-            'display_id': 'auditions-devant-parlement-francken-galant-tres-attendus',
-            'ext': 'flv',
-            'title': 'Parlement : Francken et Galant répondent aux interpellations de l’opposition',
-            'description': 're:Les auditions des ministres se poursuivent*'
-        },
-        'params': {
-            'skip_download': 'requires rtmpdump'
+            'id': '158856',
+            'display_id': 'que-risque-lauteur-dune-fausse-alerte-a-la-bombe',
+            'ext': 'mp4',
+            'title': 'Que risque l’auteur d’une fausse alerte à la bombe ?',
+            'description': 'md5:3cf8df235d44ebc5426373050840e466',
          },
      }, {
-        'url': 'http://www.telebruxelles.be/sport/basket-brussels-bat-mons-80-74/',
-        'md5': '181d3fbdcf20b909309e5aef5c6c6047',
+        'url': 'http://bx1.be/sport/futsal-schaerbeek-sincline-5-3-a-thulin/',
+        'md5': 'dfe07ecc9c153ceba8582ac912687675',
          'info_dict': {
-            'id': '10091',
-            'display_id': 'basket-brussels-bat-mons-80-74',
-            'ext': 'flv',
-            'title': 'Basket : le Brussels bat Mons 80-74',
-            'description': 're:^Ils l\u2019on fait ! En basket, le B*',
-        },
-        'params': {
-            'skip_download': 'requires rtmpdump'
+            'id': '158433',
+            'display_id': 'futsal-schaerbeek-sincline-5-3-a-thulin',
+            'ext': 'mp4',
+            'title': 'Futsal : Schaerbeek s’incline 5-3 à Thulin',
+            'description': 'md5:fd013f1488d5e2dceb9cebe39e2d569b',
          },
+    }, {
+        'url': 'http://bx1.be/emission/bxenf1-gastronomie/',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -50,13 +47,13 @@ def _real_extract(self, url):
              r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
              webpage, 'RTMP url')
          rtmp_url = re.sub(r'"\s*\+\s*"', '', rtmp_url)
+        formats = self._extract_wowza_formats(rtmp_url, article_id or display_id)
+        self._sort_formats(formats)
  
          return {
              'id': article_id or display_id,
              'display_id': display_id,
              'title': title,
              'description': description,
-            'url': rtmp_url,
-            'ext': 'flv',
-            'rtmp_live': True  # if rtmpdump is not called with "--live" argument, the download is blocked and can be completed
+            'formats': formats,
          }
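Note on the telebruxelles.py context lines: the page builds the RTMP URL from concatenated JavaScript string literals, so the extractor captures the whole expression and then removes each `" + "` join before handing the result to `_extract_wowza_formats`. A minimal sketch of that capture-and-join step, using plain `re.search` in place of the extractor's `_search_regex` and a hypothetical page fragment:

```python
import re

# Hypothetical page fragment in the shape the extractor's pattern expects.
webpage = 'file: "rtmp://example.invalid/vod/mp4:" + "clips/demo" + ".mp4"'

# Pattern copied from the context lines of the hunk above.
mobj = re.search(
    r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
    webpage)
raw_url = mobj.group(1)

# Drop each '" + "' join so the pieces read as one URL.
rtmp_url = re.sub(r'"\s*\+\s*"', '', raw_url)

print(raw_url)
print(rtmp_url)
```

The joined value is `rtmp://example.invalid/vod/mp4:clips/demo.mp4`.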
diff --git a/youtube_dl/extractor/telegraaf.py b/youtube_dl/extractor/telegraaf.py

index 58078c531d151e319fb7e707d8116a730507962b..0f576c1aba1a01491f657d8d892291cfe7934f21 100644 (file)
--- a/youtube_dl/extractor/telegraaf.py
+++ b/youtube_dl/extractor/telegraaf.py
@@ -17,7 +17,7 @@ class TelegraafIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Tikibad ontruimd wegens brand',
              'description': 'md5:05ca046ff47b931f9b04855015e163a4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 33,
          },
          'params': {
diff --git a/youtube_dl/extractor/telemb.py b/youtube_dl/extractor/telemb.py

index 1bbd0e7bdfbf7148ce6c82c57c3e5113263626d3..9bcac4ec008239b5f0a11bea385f674d545bf2d9 100644 (file)
--- a/youtube_dl/extractor/telemb.py
+++ b/youtube_dl/extractor/telemb.py
@@ -19,7 +19,7 @@ class TeleMBIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Mons - Cook with Danielle : des cours de cuisine en anglais ! - Les reportages',
                  'description': 'md5:bc5225f47b17c309761c856ad4776265',
-                'thumbnail': 're:^http://.*\.(?:jpg|png)$',
+                'thumbnail': r're:^http://.*\.(?:jpg|png)$',
              }
          },
          {
@@ -32,7 +32,7 @@ class TeleMBIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Havré - Incendie mortel - Les reportages',
                  'description': 'md5:5e54cb449acb029c2b7734e2d946bd4a',
-                'thumbnail': 're:^http://.*\.(?:jpg|png)$',
+                'thumbnail': r're:^http://.*\.(?:jpg|png)$',
              }
          },
      ]
diff --git a/youtube_dl/extractor/telewebion.py b/youtube_dl/extractor/telewebion.py

index 7786b281371181b8e42378cac766946fdf59b762..1207b1a1b8cdcc5fc5b3d1c71b51c54ba1c300e4 100644 (file)
--- a/youtube_dl/extractor/telewebion.py
+++ b/youtube_dl/extractor/telewebion.py
@@ -13,7 +13,7 @@ class TelewebionIE(InfoExtractor):
              'id': '1263668',
              'ext': 'mp4',
              'title': 'قرعه\u200cکشی لیگ قهرمانان اروپا',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'view_count': int,
          },
          'params': {
diff --git a/youtube_dl/extractor/theplatform.py b/youtube_dl/extractor/theplatform.py

index cfbf7f4e1562c78ea1d5ae44437694a5325eb70b..192d8fa292e0a6f360929590274d06b4745fb8f6 100644 (file)
--- a/youtube_dl/extractor/theplatform.py
+++ b/youtube_dl/extractor/theplatform.py
@@ -33,7 +33,9 @@
  
  class ThePlatformBaseIE(OnceIE):
      def _extract_theplatform_smil(self, smil_url, video_id, note='Downloading SMIL data'):
-        meta = self._download_xml(smil_url, video_id, note=note, query={'format': 'SMIL'})
+        meta = self._download_xml(
+            smil_url, video_id, note=note, query={'format': 'SMIL'},
+            headers=self.geo_verification_headers())
          error_element = find_xpath_attr(meta, _x('.//smil:ref'), 'src')
          if error_element is not None and error_element.attrib['src'].startswith(
                  'http://link.theplatform.com/s/errorFiles/Unavailable.'):
@@ -154,7 +156,7 @@ class ThePlatformIE(ThePlatformBaseIE, AdobePassIE):
              'title': 'iPhone Siri’s sassy response to a math question has people talking',
              'description': 'md5:a565d1deadd5086f3331d57298ec6333',
              'duration': 83.0,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1435752600,
              'upload_date': '20150701',
              'uploader': 'NBCU-NEWS',
@@ -295,7 +297,7 @@ class ThePlatformFeedIE(ThePlatformBaseIE):
              'ext': 'mp4',
              'title': 'The Biden factor: will Joe run in 2016?',
              'description': 'Could Vice President Joe Biden be preparing a 2016 campaign? Mark Halperin and Sam Stein weigh in.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20140208',
              'timestamp': 1391824260,
              'duration': 467.0,
diff --git a/youtube_dl/extractor/thisamericanlife.py b/youtube_dl/extractor/thisamericanlife.py

index 36493a5de06cb0401b78fc5f1ecf2fca59208cb7..91e45f2c3def81545454e20e0b5e07617fa54030 100644 (file)
--- a/youtube_dl/extractor/thisamericanlife.py
+++ b/youtube_dl/extractor/thisamericanlife.py
@@ -13,7 +13,7 @@ class ThisAmericanLifeIE(InfoExtractor):
              'ext': 'm4a',
              'title': '487: Harper High School, Part One',
              'description': 'md5:ee40bdf3fb96174a9027f76dbecea655',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://www.thisamericanlife.org/play_full.php?play=487',
diff --git a/youtube_dl/extractor/thisoldhouse.py b/youtube_dl/extractor/thisoldhouse.py

index 7629f0d10e4ebc40bf25b0f02f52b2524ab9e303..197258df141b4b6864afa0e4c1df7d0db431f64e 100644 (file)
--- a/youtube_dl/extractor/thisoldhouse.py
+++ b/youtube_dl/extractor/thisoldhouse.py
@@ -5,10 +5,10 @@
  
  
  class ThisOldHouseIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to)/(?P<id>[^/?#]+)'
+    _VALID_URL = r'https?://(?:www\.)?thisoldhouse\.com/(?:watch|how-to|tv-episode)/(?P<id>[^/?#]+)'
      _TESTS = [{
          'url': 'https://www.thisoldhouse.com/how-to/how-to-build-storage-bench',
-        'md5': '568acf9ca25a639f0c4ff905826b662f',
+        'md5': '946f05bbaa12a33f9ae35580d2dfcfe3',
          'info_dict': {
              'id': '2REGtUDQ',
              'ext': 'mp4',
@@ -20,6 +20,9 @@ class ThisOldHouseIE(InfoExtractor):
      }, {
          'url': 'https://www.thisoldhouse.com/watch/arlington-arts-crafts-arts-and-crafts-class-begins',
          'only_matching': True,
+    }, {
+        'url': 'https://www.thisoldhouse.com/tv-episode/ask-toh-shelf-rough-electric',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
diff --git a/youtube_dl/extractor/tinypic.py b/youtube_dl/extractor/tinypic.py

index c43cace24d5bfd107328944d0bd290594ec06b3f..bc2def508c41afbc62bd2148eb12c4a8be0a649e 100644 (file)
--- a/youtube_dl/extractor/tinypic.py
+++ b/youtube_dl/extractor/tinypic.py
@@ -34,7 +34,7 @@ def _real_extract(self, url):
          webpage = self._download_webpage(url, video_id, 'Downloading page')
  
          mobj = re.search(r'(?m)fo\.addVariable\("file",\s"(?P<fileid>[\da-z]+)"\);\n'
-                         '\s+fo\.addVariable\("s",\s"(?P<serverid>\d+)"\);', webpage)
+                         r'\s+fo\.addVariable\("s",\s"(?P<serverid>\d+)"\);', webpage)
          if mobj is None:
              raise ExtractorError('Video %s does not exist' % video_id, expected=True)
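Note on the tinypic.py change: when a pattern is split across adjacent string literals, the `r` prefix applies to each literal separately, which is why the second piece needed its own prefix. A minimal sketch, using the pattern from the hunk above with plain `re.search` and a hypothetical page fragment:

```python
import re

# Each adjacent literal carries its own r prefix; the pieces are
# concatenated only after each has been read as a raw string.
PATTERN = (
    r'(?m)fo\.addVariable\("file",\s"(?P<fileid>[\da-z]+)"\);\n'
    r'\s+fo\.addVariable\("s",\s"(?P<serverid>\d+)"\);')

# Hypothetical page fragment shaped like the player setup the pattern targets.
webpage = (
    'fo.addVariable("file", "abc123");\n'
    '    fo.addVariable("s", "5");')

mobj = re.search(PATTERN, webpage)
print(mobj.group('fileid'), mobj.group('serverid'))
```

In the raw literal, `\n` reaches the regex engine as a backslash followed by `n`, which the engine itself interprets as a newline, so the match still spans both lines.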
  
diff --git a/youtube_dl/extractor/tnaflix.py b/youtube_dl/extractor/tnaflix.py

index 77d56b8ca87306a66c22a7e41c5d01de6bba9cb6..7e6ec3430bda4bd042d0b598ad2c7ef4dea53e77 100644 (file)
--- a/youtube_dl/extractor/tnaflix.py
+++ b/youtube_dl/extractor/tnaflix.py
@@ -91,7 +91,7 @@ def _real_extract(self, url):
          formats = []
  
          def extract_video_url(vl):
-            return re.sub('speed=\d+', 'speed=', unescapeHTML(vl.text))
+            return re.sub(r'speed=\d+', 'speed=', unescapeHTML(vl.text))
  
          video_link = cfg_xml.find('./videoLink')
          if video_link is not None:
@@ -174,7 +174,7 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
              'display_id': '6538',
              'ext': 'mp4',
              'title': 'Educational xxx video',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
          },
          'params': {
@@ -209,7 +209,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE):
              'display_id': 'Carmella-Decesare-striptease',
              'ext': 'mp4',
              'title': 'Carmella Decesare - striptease',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 91,
              'age_limit': 18,
              'categories': ['Porn Stars'],
@@ -224,7 +224,7 @@ class TNAFlixIE(TNAFlixNetworkBaseIE):
              'ext': 'mp4',
              'title': 'Educational xxx video',
              'description': 'md5:b4fab8f88a8621c8fabd361a173fe5b8',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 164,
              'age_limit': 18,
              'uploader': 'bobwhite39',
@@ -250,7 +250,7 @@ class EMPFlixIE(TNAFlixNetworkBaseIE):
              'ext': 'mp4',
              'title': 'Amateur Finger Fuck',
              'description': 'Amateur solo finger fucking.',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'duration': 83,
              'age_limit': 18,
              'uploader': 'cwbike',
@@ -280,7 +280,7 @@ class MovieFapIE(TNAFlixNetworkBaseIE):
              'ext': 'mp4',
              'title': 'Experienced MILF Amazing Handjob',
              'description': 'Experienced MILF giving an Amazing Handjob',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
              'uploader': 'darvinfred06',
              'view_count': int,
@@ -298,7 +298,7 @@ class MovieFapIE(TNAFlixNetworkBaseIE):
              'ext': 'flv',
              'title': 'Jeune Couple Russe',
              'description': 'Amateur',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'age_limit': 18,
              'uploader': 'whiskeyjar',
              'view_count': int,
diff --git a/youtube_dl/extractor/toutv.py b/youtube_dl/extractor/toutv.py

index 573f2ff6b5a09eb4c234faf2a953ea58d29851d8..26d770992ab1618c95094513371a01fcf99d1a70 100644 (file)
--- a/youtube_dl/extractor/toutv.py
+++ b/youtube_dl/extractor/toutv.py
@@ -56,7 +56,7 @@ def _real_initialize(self):
                  'state': state,
              })
          login_form = self._search_regex(
-            r'(?s)(<form[^>]+id="Form-login".+?</form>)', login_webpage, 'login form')
+            r'(?s)(<form[^>]+(?:id|name)="Form-login".+?</form>)', login_webpage, 'login form')
          form_data = self._hidden_inputs(login_form)
          form_data.update({
              'login-email': email,
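Note on the toutv.py change: the login form is now located by either an `id` or a `name` attribute equal to `Form-login`. The sketch below checks the updated pattern against two hypothetical pages, one for each attribute. It uses plain `re.search` in place of the extractor's `_search_regex`.

```python
import re

# Pattern copied from the updated extractor.
LOGIN_FORM_RE = r'(?s)(<form[^>]+(?:id|name)="Form-login".+?</form>)'

# Hypothetical markup: one page identifies the form by id, the other by name.
page_with_id = (
    '<form method="post" id="Form-login">\n'
    '<input type="hidden" name="a" value="1">\n'
    '</form>')
page_with_name = (
    '<form method="post" name="Form-login">\n'
    '<input type="hidden" name="a" value="1">\n'
    '</form>')

results = [re.search(LOGIN_FORM_RE, page) is not None
           for page in (page_with_id, page_with_name)]
print(results)
```

Both pages match. The `(?s)` flag is what allows `.+?` to cross the line breaks inside the form.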
diff --git a/youtube_dl/extractor/tudou.py b/youtube_dl/extractor/tudou.py

index bb8b8e23424e7943f2133028aca187d4fcffeab9..2aae55e7e8f8742b471e4f8ffe94ab2ae79bae25 100644 (file)
--- a/youtube_dl/extractor/tudou.py
+++ b/youtube_dl/extractor/tudou.py
@@ -23,7 +23,7 @@ class TudouIE(InfoExtractor):
              'id': '159448201',
              'ext': 'f4v',
              'title': '卡马乔国足开大脚长传冲吊集锦',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1372113489000,
              'description': '卡马乔卡家军，开大脚先进战术不完全集锦！',
              'duration': 289.04,
@@ -36,7 +36,7 @@ class TudouIE(InfoExtractor):
              'id': '117049447',
              'ext': 'f4v',
              'title': 'La Sylphide-Bolshoi-Ekaterina Krysanova & Vyacheslav Lopatin 2012',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'timestamp': 1349207518000,
              'description': 'md5:294612423894260f2dcd5c6c04fe248b',
              'duration': 5478.33,
diff --git a/youtube_dl/extractor/tumblr.py b/youtube_dl/extractor/tumblr.py

index ebe411e12aa5fa44e201dcaefc52e839e5b2d212..786143525d4d7cf4455ec59eff20a5e3a88dc4ea 100644 (file)
--- a/youtube_dl/extractor/tumblr.py
+++ b/youtube_dl/extractor/tumblr.py
@@ -17,7 +17,7 @@ class TumblrIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'tatiana maslany news, Orphan Black || DVD extra - behind the scenes ↳...',
              'description': 'md5:37db8211e40b50c7c44e95da14f630b7',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          'url': 'http://5sostrum.tumblr.com/post/90208453769/yall-forgetting-the-greatest-keek-of-them-all',
@@ -27,7 +27,7 @@ class TumblrIE(InfoExtractor):
              'ext': 'mp4',
              'title': '5SOS STRUM ;]',
              'description': 'md5:dba62ac8639482759c8eb10ce474586a',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          'url': 'http://hdvideotest.tumblr.com/post/130323439814/test-description-for-my-hd-video',
@@ -37,7 +37,7 @@ class TumblrIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'HD Video Testing \u2014 Test description for my HD video',
              'description': 'md5:97cc3ab5fcd27ee4af6356701541319c',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
          'params': {
              'format': 'hd',
@@ -92,7 +92,7 @@ class TumblrIE(InfoExtractor):
              'title': 'Video by victoriassecret',
              'description': 'Invisibility or flight…which superpower would YOU choose? #VSFashionShow #ThisOrThat',
              'uploader_id': 'victoriassecret',
-            'thumbnail': 're:^https?://.*\.jpg'
+            'thumbnail': r're:^https?://.*\.jpg'
          },
          'add_ie': ['Instagram'],
      }]
diff --git a/youtube_dl/extractor/tunein.py b/youtube_dl/extractor/tunein.py

index ae4cfaec29b493c3b8b8e11705629901a07a2bf2..7e51de89ed6082d35737142e85efb19726b03985 100644 (file)
--- a/youtube_dl/extractor/tunein.py
+++ b/youtube_dl/extractor/tunein.py
@@ -11,6 +11,12 @@
  class TuneInBaseIE(InfoExtractor):
      _API_BASE_URL = 'http://tunein.com/tuner/tune/'
  
+    @staticmethod
+    def _extract_urls(webpage):
+        return re.findall(
+            r'<iframe[^>]+src=["\'](?P<url>(?:https?://)?tunein\.com/embed/player/[pst]\d+)',
+            webpage)
+
      def _real_extract(self, url):
          content_id = self._match_id(url)
  
@@ -69,82 +75,83 @@ class TuneInClipIE(TuneInBaseIE):
      _VALID_URL = r'https?://(?:www\.)?tunein\.com/station/.*?audioClipId\=(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=AudioClip&audioclipId=%s'
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816',
-            'md5': '99f00d772db70efc804385c6b47f4e77',
-            'info_dict': {
-                'id': '816',
-                'title': '32m',
-                'ext': 'mp3',
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/station/?stationId=246119&audioClipId=816',
+        'md5': '99f00d772db70efc804385c6b47f4e77',
+        'info_dict': {
+            'id': '816',
+            'title': '32m',
+            'ext': 'mp3',
          },
-    ]
+    }]
  
  
  class TuneInStationIE(TuneInBaseIE):
      IE_NAME = 'tunein:station'
-    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId\=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-s|station/.*?StationId=|embed/player/s)(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=Station&stationId=%s'
  
      @classmethod
      def suitable(cls, url):
          return False if TuneInClipIE.suitable(url) else super(TuneInStationIE, cls).suitable(url)
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/radio/Jazz24-885-s34682/',
-            'info_dict': {
-                'id': '34682',
-                'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
-                'ext': 'mp3',
-                'location': 'Tacoma, WA',
-            },
-            'params': {
-                'skip_download': True,  # live stream
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/radio/Jazz24-885-s34682/',
+        'info_dict': {
+            'id': '34682',
+            'title': 'Jazz 24 on 88.5 Jazz24 - KPLU-HD2',
+            'ext': 'mp3',
+            'location': 'Tacoma, WA',
+        },
+        'params': {
+            'skip_download': True,  # live stream
          },
-    ]
+    }, {
+        'url': 'http://tunein.com/embed/player/s6404/',
+        'only_matching': True,
+    }]
  
  
  class TuneInProgramIE(TuneInBaseIE):
      IE_NAME = 'tunein:program'
-    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId\=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:radio/.*?-p|program/.*?ProgramId=|embed/player/p)(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=Program&programId=%s'
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/radio/Jazz-24-p2506/',
-            'info_dict': {
-                'id': '2506',
-                'title': 'Jazz 24 on 91.3 WUKY-HD3',
-                'ext': 'mp3',
-                'location': 'Lexington, KY',
-            },
-            'params': {
-                'skip_download': True,  # live stream
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/radio/Jazz-24-p2506/',
+        'info_dict': {
+            'id': '2506',
+            'title': 'Jazz 24 on 91.3 WUKY-HD3',
+            'ext': 'mp3',
+            'location': 'Lexington, KY',
          },
-    ]
+        'params': {
+            'skip_download': True,  # live stream
+        },
+    }, {
+        'url': 'http://tunein.com/embed/player/p191660/',
+        'only_matching': True,
+    }]
  
  
  class TuneInTopicIE(TuneInBaseIE):
      IE_NAME = 'tunein:topic'
-    _VALID_URL = r'https?://(?:www\.)?tunein\.com/topic/.*?TopicId\=(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?tunein\.com/(?:topic/.*?TopicId=|embed/player/t)(?P<id>\d+)'
      _API_URL_QUERY = '?tuneType=Topic&topicId=%s'
  
-    _TESTS = [
-        {
-            'url': 'http://tunein.com/topic/?TopicId=101830576',
-            'md5': 'c31a39e6f988d188252eae7af0ef09c9',
-            'info_dict': {
-                'id': '101830576',
-                'title': 'Votez pour moi du 29 octobre 2015 (29/10/15)',
-                'ext': 'mp3',
-                'location': 'Belgium',
-            },
+    _TESTS = [{
+        'url': 'http://tunein.com/topic/?TopicId=101830576',
+        'md5': 'c31a39e6f988d188252eae7af0ef09c9',
+        'info_dict': {
+            'id': '101830576',
+            'title': 'Votez pour moi du 29 octobre 2015 (29/10/15)',
+            'ext': 'mp3',
+            'location': 'Belgium',
          },
-    ]
+    }, {
+        'url': 'http://tunein.com/embed/player/t101830576/',
+        'only_matching': True,
+    }]
  
  
  class TuneInShortenerIE(InfoExtractor):
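Note on `_extract_urls` in the tunein.py hunk above: because the pattern contains exactly one capturing group, `re.findall` returns the text of that group rather than the whole match. A minimal sketch, assuming a made-up page fragment (the markup below is hypothetical and not taken from tunein.com):

```python
import re

# Pattern copied from TuneInBaseIE._extract_urls in the diff above.
EMBED_RE = r'<iframe[^>]+src=["\'](?P<url>(?:https?://)?tunein\.com/embed/player/[pst]\d+)'

# Hypothetical page markup, invented for this illustration.
webpage = (
    '<iframe src="http://tunein.com/embed/player/s6404/" width="100%"></iframe>'
    "<iframe src='tunein.com/embed/player/p191660'></iframe>"
    '<iframe src="http://example.com/other"></iframe>'
)

# With a single capturing group, findall yields that group's text per match.
urls = re.findall(EMBED_RE, webpage)
```

The trailing slash of the first embed is not captured, since the group ends at `\d+`.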
diff --git a/youtube_dl/extractor/turbo.py b/youtube_dl/extractor/turbo.py

index 7ae63a4992a74368ec8b5f6a266a298cb6776b79..25aa9c58e522ec0cdecceeeb296f129c8da92a2d 100644 (file)
--- a/youtube_dl/extractor/turbo.py
+++ b/youtube_dl/extractor/turbo.py
@@ -24,7 +24,7 @@ class TurboIE(InfoExtractor):
              'duration': 3715,
              'title': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia... ',
              'description': 'Turbo du 07/09/2014 : Renault Twingo 3, Bentley Continental GT Speed, CES, Guide Achat Dacia...',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/tv2.py b/youtube_dl/extractor/tv2.py

index bd28267b0cb6a0154133c98f567c24f054b5459a..d5071e8a5faf72a25c6b21cdfb2987b6a731fb32 100644 (file)
--- a/youtube_dl/extractor/tv2.py
+++ b/youtube_dl/extractor/tv2.py
@@ -126,7 +126,7 @@ def _real_extract(self, url):
  
          if not assets:
              # New embed pattern
-            for v in re.findall('TV2ContentboxVideo\(({.+?})\)', webpage):
+            for v in re.findall(r'TV2ContentboxVideo\(({.+?})\)', webpage):
                  video = self._parse_json(
                      v, playlist_id, transform_source=js_to_json, fatal=False)
                  if not video:
diff --git a/youtube_dl/extractor/tv4.py b/youtube_dl/extractor/tv4.py

index 5d2d8f13239e6ac5b10f5506143216301e5d4ecf..ad79db92beb3825dc1293b047acf7c61ca99386a 100644 (file)
--- a/youtube_dl/extractor/tv4.py
+++ b/youtube_dl/extractor/tv4.py
@@ -4,11 +4,10 @@
  from .common import InfoExtractor
  from ..compat import compat_str
  from ..utils import (
-    ExtractorError,
      int_or_none,
      parse_iso8601,
      try_get,
-    update_url_query,
+    determine_ext,
  )
  
  
@@ -28,24 +27,24 @@ class TV4IE(InfoExtractor):
      _TESTS = [
          {
              'url': 'http://www.tv4.se/kalla-fakta/klipp/kalla-fakta-5-english-subtitles-2491650',
-            'md5': '909d6454b87b10a25aa04c4bdd416a9b',
+            'md5': 'cb837212f342d77cec06e6dad190e96d',
              'info_dict': {
                  'id': '2491650',
                  'ext': 'mp4',
                  'title': 'Kalla Fakta 5 (english subtitles)',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'timestamp': int,
                  'upload_date': '20131125',
              },
          },
          {
              'url': 'http://www.tv4play.se/iframe/video/3054113',
-            'md5': '77f851c55139ffe0ebd41b6a5552489b',
+            'md5': 'cb837212f342d77cec06e6dad190e96d',
              'info_dict': {
                  'id': '3054113',
                  'ext': 'mp4',
                  'title': 'Så här jobbar ficktjuvarna - se avslöjande bilder',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'description': 'Unika bilder avslöjar hur turisternas fickor vittjas mitt på Stockholms central. Två experter på ficktjuvarna avslöjar knepen du ska se upp för.',
                  'timestamp': int,
                  'upload_date': '20150130',
@@ -75,11 +74,10 @@ def _real_extract(self, url):
          # If is_geo_restricted is true, it doesn't necessarily mean we can't download it
          if info.get('is_geo_restricted'):
              self.report_warning('This content might not be available in your country due to licensing restrictions.')
-        if info.get('requires_subscription'):
-            raise ExtractorError('This content requires subscription.', expected=True)
  
          title = info['title']
  
+        subtitles = {}
          formats = []
          # http formats are linked with unresolvable host
          for kind in ('hls', ''):
@@ -87,26 +85,41 @@ def _real_extract(self, url):
                  'https://prima.tv4play.se/api/web/asset/%s/play.json' % video_id,
                  video_id, 'Downloading sources JSON', query={
                      'protocol': kind,
-                    'videoFormat': 'MP4+WEBVTTS+WEBVTT',
+                    'videoFormat': 'MP4+WEBVTT',
                  })
-            item = try_get(data, lambda x: x['playback']['items']['item'], dict)
-            manifest_url = item.get('url')
-            if not isinstance(manifest_url, compat_str):
+            items = try_get(data, lambda x: x['playback']['items']['item'])
+            if not items:
                  continue
-            if kind == 'hls':
-                formats.extend(self._extract_m3u8_formats(
-                    manifest_url, video_id, 'mp4', entry_protocol='m3u8_native',
-                    m3u8_id=kind, fatal=False))
-            else:
-                formats.extend(self._extract_f4m_formats(
-                    update_url_query(manifest_url, {'hdcore': '3.8.0'}),
-                    video_id, f4m_id='hds', fatal=False))
+            if isinstance(items, dict):
+                items = [items]
+            for item in items:
+                manifest_url = item.get('url')
+                if not isinstance(manifest_url, compat_str):
+                    continue
+                ext = determine_ext(manifest_url)
+                if ext == 'm3u8':
+                    formats.extend(self._extract_m3u8_formats(
+                        manifest_url, video_id, 'mp4', entry_protocol='m3u8_native',
+                        m3u8_id=kind, fatal=False))
+                elif ext == 'f4m':
+                    formats.extend(self._extract_akamai_formats(
+                        manifest_url, video_id, {
+                            'hls': 'tv4play-i.akamaihd.net',
+                        }))
+                elif ext == 'webvtt':
+                    subtitles = self._merge_subtitles(
+                        subtitles, {
+                            'sv': [{
+                                'url': manifest_url,
+                                'ext': 'vtt',
+                            }]})
          self._sort_formats(formats)
  
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
+            'subtitles': subtitles,
              'description': info.get('description'),
              'timestamp': parse_iso8601(info.get('broadcast_date_time')),
              'duration': int_or_none(info.get('duration')),
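Note on the tv4.py hunk above: the new loop accepts either a single item dict or a list of items, then branches on the manifest URL's extension. The sketch below mirrors only that control flow. `guess_ext` is a hypothetical, simplified stand-in for youtube-dl's `determine_ext`, `classify_items` and its labels are invented for illustration, and the example URLs are made up.

```python
import os.path
from urllib.parse import urlparse


def guess_ext(url):
    """Hypothetical, simplified stand-in for youtube-dl's determine_ext."""
    path = urlparse(url).path
    return os.path.splitext(path)[1].lstrip('.') or None


def classify_items(items):
    """Group item URLs by kind, following the branches in the hunk above."""
    if not items:
        return {}
    # The API may return one dict instead of a list of dicts.
    if isinstance(items, dict):
        items = [items]
    # Labels are illustrative; the extractor calls different helpers per branch.
    kinds = {'m3u8': 'hls', 'f4m': 'akamai', 'webvtt': 'subtitles'}
    grouped = {}
    for item in items:
        url = item.get('url')
        if not isinstance(url, str):
            continue  # skip items without a usable URL
        kind = kinds.get(guess_ext(url))
        if kind:
            grouped.setdefault(kind, []).append(url)
    return grouped


single = classify_items({'url': 'http://example.com/a/master.m3u8?x=1'})
mixed = classify_items([
    {'url': 'http://example.com/sub.webvtt'},
    {'url': None},
    {'url': 'http://example.com/v.f4m'},
])
```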
diff --git a/youtube_dl/extractor/tva.py b/youtube_dl/extractor/tva.py

new file mode 100644 (file)

index 0000000..3ced098
--- /dev/null
+++ b/youtube_dl/extractor/tva.py
@@ -0,0 +1,54 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    parse_iso8601,
+    smuggle_url,
+)
+
+
+class TVAIE(InfoExtractor):
+    _VALID_URL = r'https?://videos\.tva\.ca/episode/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://videos.tva.ca/episode/85538',
+        'info_dict': {
+            'id': '85538',
+            'ext': 'mp4',
+            'title': 'Épisode du 25 janvier 2017',
+            'description': 'md5:e9e7fb5532ab37984d2dc87229cadf98',
+            'upload_date': '20170126',
+            'timestamp': 1485442329,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        video_data = self._download_json(
+            "https://d18jmrhziuoi7p.cloudfront.net/isl/api/v1/dataservice/Items('%s')" % video_id,
+            video_id, query={
+                '$expand': 'Metadata,CustomId',
+                '$select': 'Metadata,Id,Title,ShortDescription,LongDescription,CreatedDate,CustomId,AverageUserRating,Categories,ShowName',
+                '$format': 'json',
+            })
+        metadata = video_data.get('Metadata', {})
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'title': video_data['Title'],
+            'url': smuggle_url('ooyala:' + video_data['CustomId'], {'supportedformats': 'm3u8,hds'}),
+            'description': video_data.get('LongDescription') or video_data.get('ShortDescription'),
+            'series': video_data.get('ShowName'),
+            'episode': metadata.get('EpisodeTitle'),
+            'episode_number': int_or_none(metadata.get('EpisodeNumber')),
+            'categories': video_data.get('Categories'),
+            'average_rating': video_data.get('AverageUserRating'),
+            'timestamp': parse_iso8601(video_data.get('CreatedDate')),
+            'ie_key': 'Ooyala',
+        }
diff --git a/youtube_dl/extractor/tvanouvelles.py b/youtube_dl/extractor/tvanouvelles.py

new file mode 100644 (file)

index 0000000..1086176
--- /dev/null
+++ b/youtube_dl/extractor/tvanouvelles.py
@@ -0,0 +1,65 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from .brightcove import BrightcoveNewIE
+
+
+class TVANouvellesIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tvanouvelles\.ca/videos/(?P<id>\d+)'
+    _TEST = {
+        'url': 'http://www.tvanouvelles.ca/videos/5117035533001',
+        'info_dict': {
+            'id': '5117035533001',
+            'ext': 'mp4',
+            'title': 'L’industrie du taxi dénonce l’entente entre Québec et Uber: explications',
+            'description': 'md5:479653b7c8cf115747bf5118066bd8b3',
+            'uploader_id': '1741764581',
+            'timestamp': 1473352030,
+            'upload_date': '20160908',
+        },
+        'add_ie': ['BrightcoveNew'],
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1741764581/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        brightcove_id = self._match_id(url)
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id,
+            BrightcoveNewIE.ie_key(), brightcove_id)
+
+
+class TVANouvellesArticleIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?tvanouvelles\.ca/(?:[^/]+/)+(?P<id>[^/?#&]+)'
+    _TEST = {
+        'url': 'http://www.tvanouvelles.ca/2016/11/17/des-policiers-qui-ont-la-meche-un-peu-courte',
+        'info_dict': {
+            'id': 'des-policiers-qui-ont-la-meche-un-peu-courte',
+            'title': 'Des policiers qui ont «la mèche un peu courte»?',
+            'description': 'md5:92d363c8eb0f0f030de9a4a84a90a3a0',
+        },
+        'playlist_mincount': 4,
+    }
+
+    @classmethod
+    def suitable(cls, url):
+        return False if TVANouvellesIE.suitable(url) else super(TVANouvellesArticleIE, cls).suitable(url)
+
+    def _real_extract(self, url):
+        display_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, display_id)
+
+        entries = [
+            self.url_result(
+                'http://www.tvanouvelles.ca/videos/%s' % mobj.group('id'),
+                ie=TVANouvellesIE.ie_key(), video_id=mobj.group('id'))
+            for mobj in re.finditer(
+                r'data-video-id=(["\'])?(?P<id>\d+)', webpage)]
+
+        title = self._og_search_title(webpage, fatal=False)
+        description = self._og_search_description(webpage)
+
+        return self.playlist_result(entries, display_id, title, description)
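Note on the `suitable()` override in tvanouvelles.py above: the article pattern is broad enough to match plain video URLs as well, which appears to be why `TVANouvellesArticleIE` defers to `TVANouvellesIE` first. A minimal sketch with the two patterns copied from the diff; `pick_extractor` is a hypothetical helper, not part of youtube-dl.

```python
import re

# Patterns copied from the two _VALID_URL definitions in the diff above.
VIDEO_RE = r'https?://(?:www\.)?tvanouvelles\.ca/videos/(?P<id>\d+)'
ARTICLE_RE = r'https?://(?:www\.)?tvanouvelles\.ca/(?:[^/]+/)+(?P<id>[^/?#&]+)'


def pick_extractor(url):
    """Hypothetical helper: report which pattern should claim the URL."""
    if re.match(VIDEO_RE, url):
        return 'video'  # the more specific pattern wins
    if re.match(ARTICLE_RE, url):
        return 'article'
    return None


video_url = 'http://www.tvanouvelles.ca/videos/5117035533001'
article_url = ('http://www.tvanouvelles.ca/2016/11/17/'
               'des-policiers-qui-ont-la-meche-un-peu-courte')

# The article pattern also matches the video URL, hence the explicit ordering.
overlap = re.match(ARTICLE_RE, video_url) is not None
```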
diff --git a/youtube_dl/extractor/tvc.py b/youtube_dl/extractor/tvc.py

index 4065354ddde2c63698908dfac81dc98cac77e79d..008f64cc2e6486cf779f482c24d86f03a740d939 100644 (file)
--- a/youtube_dl/extractor/tvc.py
+++ b/youtube_dl/extractor/tvc.py
@@ -19,7 +19,7 @@ class TVCIE(InfoExtractor):
              'id': '74622',
              'ext': 'mp4',
              'title': 'События. "События". Эфир от 22.05.2015 14:30',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1122,
          },
      }
@@ -72,7 +72,7 @@ class TVCArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'События. "События". Эфир от 22.05.2015 14:30',
              'description': 'md5:ad7aa7db22903f983e687b8a3e98c6dd',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1122,
          },
      }, {
@@ -82,7 +82,7 @@ class TVCArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Эксперты: в столице встал вопрос о максимально безопасных остановках',
              'description': 'md5:f2098f71e21f309e89f69b525fd9846e',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 278,
          },
      }, {
@@ -92,7 +92,7 @@ class TVCArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Ещё не поздно. Эфир от 03.08.2013',
              'description': 'md5:51fae9f3f8cfe67abce014e428e5b027',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 3316,
          },
      }]
diff --git a/youtube_dl/extractor/tweakers.py b/youtube_dl/extractor/tweakers.py

index 7a9386cde3d9e0e5d78bfd368d47819430c53e85..2b10d9bcaec909caa303bb33c1621527a3299797 100644 (file)
--- a/youtube_dl/extractor/tweakers.py
+++ b/youtube_dl/extractor/tweakers.py
@@ -18,7 +18,7 @@ class TweakersIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'New Nintendo 3DS XL - Op alle fronten beter',
              'description': 'md5:3789b21fed9c0219e9bcaacd43fab280',
-            'thumbnail': 're:^https?://.*\.jpe?g$',
+            'thumbnail': r're:^https?://.*\.jpe?g$',
              'duration': 386,
              'uploader_id': 's7JeEm',
          }
diff --git a/youtube_dl/extractor/twentyfourvideo.py b/youtube_dl/extractor/twentyfourvideo.py

index af92b713b08e22343f84a282d3db59b355623f04..a983ebf05ac512242415a3052fbd172668ff060e 100644 (file)
--- a/youtube_dl/extractor/twentyfourvideo.py
+++ b/youtube_dl/extractor/twentyfourvideo.py
@@ -12,7 +12,7 @@
  
  class TwentyFourVideoIE(InfoExtractor):
      IE_NAME = '24video'
-    _VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?24video\.(?:net|me|xxx|sex)/(?:video/(?:view|xml)/|player/new24_play\.swf\?id=)(?P<id>\d+)'
  
      _TESTS = [{
          'url': 'http://www.24video.net/video/view/1044982',
@@ -22,7 +22,7 @@ class TwentyFourVideoIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Эротика каменного века',
              'description': 'Как смотрели порно в каменном веке.',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'SUPERTELO',
              'duration': 31,
              'timestamp': 1275937857,
@@ -43,7 +43,7 @@ def _real_extract(self, url):
          video_id = self._match_id(url)
  
          webpage = self._download_webpage(
-            'http://www.24video.net/video/view/%s' % video_id, video_id)
+            'http://www.24video.sex/video/view/%s' % video_id, video_id)
  
          title = self._og_search_title(webpage)
          description = self._html_search_regex(
@@ -69,11 +69,11 @@ def _real_extract(self, url):
  
          # Sets some cookies
          self._download_xml(
-            r'http://www.24video.net/video/xml/%s?mode=init' % video_id,
+            r'http://www.24video.sex/video/xml/%s?mode=init' % video_id,
              video_id, 'Downloading init XML')
  
          video_xml = self._download_xml(
-            'http://www.24video.net/video/xml/%s?mode=play' % video_id,
+            'http://www.24video.sex/video/xml/%s?mode=play' % video_id,
              video_id, 'Downloading video XML')
  
          video = xpath_element(video_xml, './/video', 'video', fatal=True)
diff --git a/youtube_dl/extractor/twentymin.py b/youtube_dl/extractor/twentymin.py

index b721ecb0a106a710b6d140d7d21309307196a684..4fd1aa4bfbdaea2ec5abbac2161f6aea25e5fbbd 100644 (file)
--- a/youtube_dl/extractor/twentymin.py
+++ b/youtube_dl/extractor/twentymin.py
@@ -4,91 +4,88 @@
  import re
  
  from .common import InfoExtractor
-from ..utils import remove_end
+from ..utils import (
+    int_or_none,
+    try_get,
+)
  
  
  class TwentyMinutenIE(InfoExtractor):
      IE_NAME = '20min'
-    _VALID_URL = r'https?://(?:www\.)?20min\.ch/(?:videotv/*\?.*\bvid=(?P<id>\d+)|(?:[^/]+/)*(?P<display_id>[^/#?]+))'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:www\.)?20min\.ch/
+                        (?:
+                            videotv/*\?.*?\bvid=|
+                            videoplayer/videoplayer\.html\?.*?\bvideoId@
+                        )
+                        (?P<id>\d+)
+                    '''
      _TESTS = [{
-        # regular video
          'url': 'http://www.20min.ch/videotv/?vid=469148&cid=2',
-        'md5': 'b52d6bc6ea6398e6a38f12cfd418149c',
+        'md5': 'e7264320db31eed8c38364150c12496e',
          'info_dict': {
              'id': '469148',
-            'ext': 'flv',
+            'ext': 'mp4',
              'title': '85 000 Franken für 15 perfekte Minuten',
-            'description': 'Was die Besucher vom Silvesterzauber erwarten können. (Video: Alice Grosjean/Murat Temel)',
-            'thumbnail': 'http://thumbnails.20min-tv.ch/server063/469148/frame-72-469148.jpg'
-        }
-    }, {
-        # news article with video
-        'url': 'http://www.20min.ch/schweiz/news/story/-Wir-muessen-mutig-nach-vorne-schauen--22050469',
-        'md5': 'cd4cbb99b94130cff423e967cd275e5e',
-        'info_dict': {
-            'id': '469408',
-            'display_id': '-Wir-muessen-mutig-nach-vorne-schauen--22050469',
-            'ext': 'flv',
-            'title': '«Wir müssen mutig nach vorne schauen»',
-            'description': 'Kein Land sei innovativer als die Schweiz, sagte Johann Schneider-Ammann in seiner Neujahrsansprache. Das Land müsse aber seine Hausaufgaben machen.',
-            'thumbnail': 'http://www.20min.ch/images/content/2/2/0/22050469/10/teaserbreit.jpg'
+            'thumbnail': r're:https?://.*\.jpg$',
          },
-        'skip': '"This video is no longer available" is shown both on the web page and in the downloaded file.',
      }, {
-        # YouTube embed
-        'url': 'http://www.20min.ch/ro/sports/football/story/Il-marque-une-bicyclette-de-plus-de-30-metres--21115184',
-        'md5': 'cec64d59aa01c0ed9dbba9cf639dd82f',
+        'url': 'http://www.20min.ch/videoplayer/videoplayer.html?params=client@twentyDE|videoId@523629',
          'info_dict': {
-            'id': 'ivM7A7SpDOs',
+            'id': '523629',
              'ext': 'mp4',
-            'title': 'GOLAZO DE CHILENA DE JAVI GÓMEZ, FINALISTA AL BALÓN DE CLM 2016',
-            'description': 'md5:903c92fbf2b2f66c09de514bc25e9f5a',
-            'upload_date': '20160424',
-            'uploader': 'RTVCM Castilla-La Mancha',
-            'uploader_id': 'RTVCM',
+            'title': 'So kommen Sie bei Eis und Schnee sicher an',
+            'description': 'md5:117c212f64b25e3d95747e5276863f7d',
+            'thumbnail': r're:https?://.*\.jpg$',
+        },
+        'params': {
+            'skip_download': True,
          },
-        'add_ie': ['Youtube'],
      }, {
          'url': 'http://www.20min.ch/videotv/?cid=44&vid=468738',
          'only_matching': True,
-    }, {
-        'url': 'http://www.20min.ch/ro/sortir/cinema/story/Grandir-au-bahut--c-est-dur-18927411',
-        'only_matching': True,
      }]
  
+    @staticmethod
+    def _extract_urls(webpage):
+        return [m.group('url') for m in re.finditer(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?://)?(?:www\.)?20min\.ch/videoplayer/videoplayer\.html\?.*?\bvideoId@\d+.*?)\1',
+            webpage)]
+
      def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-        display_id = mobj.group('display_id') or video_id
+        video_id = self._match_id(url)
+
+        video = self._download_json(
+            'http://api.20min.ch/video/%s/show' % video_id,
+            video_id)['content']
  
-        webpage = self._download_webpage(url, display_id)
+        title = video['title']
  
-        youtube_url = self._html_search_regex(
-            r'<iframe[^>]+src="((?:https?:)?//www\.youtube\.com/embed/[^"]+)"',
-            webpage, 'YouTube embed URL', default=None)
-        if youtube_url is not None:
-            return self.url_result(youtube_url, 'Youtube')
+        formats = [{
+            'format_id': format_id,
+            'url': 'http://podcast.20min-tv.ch/podcast/20min/%s%s.mp4' % (video_id, p),
+            'quality': quality,
+        } for quality, (format_id, p) in enumerate([('sd', ''), ('hd', 'h')])]
+        self._sort_formats(formats)
  
-        title = self._html_search_regex(
-            r'<h1>.*?<span>(.+?)</span></h1>',
-            webpage, 'title', default=None)
-        if not title:
-            title = remove_end(re.sub(
-                r'^20 [Mm]inuten.*? -', '', self._og_search_title(webpage)), ' - News')
+        description = video.get('lead')
+        thumbnail = video.get('thumbnail')
  
-        if not video_id:
-            video_id = self._search_regex(
-                r'"file\d?"\s*,\s*\"(\d+)', webpage, 'video id')
+        def extract_count(kind):
+            return try_get(
+                video,
+                lambda x: int_or_none(x['communityobject']['thumbs_%s' % kind]))
  
-        description = self._html_search_meta(
-            'description', webpage, 'description')
-        thumbnail = self._og_search_thumbnail(webpage)
+        like_count = extract_count('up')
+        dislike_count = extract_count('down')
  
          return {
              'id': video_id,
-            'display_id': display_id,
-            'url': 'http://speed.20min-tv.ch/%sm.flv' % video_id,
              'title': title,
              'description': description,
              'thumbnail': thumbnail,
+            'like_count': like_count,
+            'dislike_count': dislike_count,
+            'formats': formats,
          }
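Note on the format list in twentymin.py above: `enumerate` supplies the `quality` rank, so the order of the `(format_id, suffix)` pairs decides which format sorts as better. A minimal sketch of the same comprehension outside the extractor; `build_formats` is a hypothetical wrapper, and the ID is the one from the first test case.

```python
def build_formats(video_id):
    """Sketch of the list built in TwentyMinutenIE._real_extract."""
    return [{
        'format_id': format_id,
        'url': 'http://podcast.20min-tv.ch/podcast/20min/%s%s.mp4' % (video_id, suffix),
        'quality': quality,  # position in the list: 0 for sd, 1 for hd
    } for quality, (format_id, suffix) in enumerate([('sd', ''), ('hd', 'h')])]


formats = build_formats('469148')
```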
diff --git a/youtube_dl/extractor/twitch.py b/youtube_dl/extractor/twitch.py

index 77414a242d68f2309985235f0418d47c77194417..1ca159a4db62b70dfea23d93e37fb04324b6f563 100644 (file)
--- a/youtube_dl/extractor/twitch.py
+++ b/youtube_dl/extractor/twitch.py
@@ -22,6 +22,7 @@
      orderedSet,
      parse_duration,
      parse_iso8601,
+    update_url_query,
      urlencode_postdata,
  )
  
@@ -205,7 +206,14 @@ class TwitchChapterIE(TwitchItemBaseIE):
  
  class TwitchVodIE(TwitchItemBaseIE):
      IE_NAME = 'twitch:vod'
-    _VALID_URL = r'%s/[^/]+/v/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:
+                            (?:www\.)?twitch\.tv/(?:[^/]+/v|videos)/|
+                            player\.twitch\.tv/\?.*?\bvideo=v
+                        )
+                        (?P<id>\d+)
+                    '''
      _ITEM_TYPE = 'vod'
      _ITEM_SHORTCUT = 'v'
  
@@ -215,7 +223,7 @@ class TwitchVodIE(TwitchItemBaseIE):
              'id': 'v6528877',
              'ext': 'mp4',
              'title': 'LCK Summer Split - Week 6 Day 1',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 17208,
              'timestamp': 1435131709,
              'upload_date': '20150624',
@@ -235,7 +243,7 @@ class TwitchVodIE(TwitchItemBaseIE):
              'id': 'v11230755',
              'ext': 'mp4',
              'title': 'Untitled Broadcast',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1638,
              'timestamp': 1439746708,
              'upload_date': '20150816',
@@ -248,6 +256,12 @@ class TwitchVodIE(TwitchItemBaseIE):
              'skip_download': True,
          },
          'skip': 'HTTP Error 404: Not Found',
+    }, {
+        'url': 'http://player.twitch.tv/?t=5m10s&video=v6528877',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.twitch.tv/videos/6528877',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -279,6 +293,18 @@ def _real_extract(self, url):
          if 't' in query:
              info['start_time'] = parse_duration(query['t'][0])
  
+        if info.get('timestamp') is not None:
+            info['subtitles'] = {
+                'rechat': [{
+                    'url': update_url_query(
+                        'https://rechat.twitch.tv/rechat-messages', {
+                            'video_id': 'v%s' % item_id,
+                            'start': info['timestamp'],
+                        }),
+                    'ext': 'json',
+                }],
+            }
+
          return info
  
  
@@ -300,7 +326,7 @@ def _extract_playlist(self, channel_id):
              response = self._call_api(
                  self._PLAYLIST_PATH % (channel_id, offset, limit),
                  channel_id,
-                'Downloading %s videos JSON page %s'
+                'Downloading %s JSON page %s'
                  % (self._PLAYLIST_TYPE, counter_override or counter))
              page_entries = self._extract_playlist_page(response)
              if not page_entries:
@@ -350,19 +376,72 @@ class TwitchProfileIE(TwitchPlaylistBaseIE):
      }
  
  
-class TwitchPastBroadcastsIE(TwitchPlaylistBaseIE):
-    IE_NAME = 'twitch:past_broadcasts'
-    _VALID_URL = r'%s/(?P<id>[^/]+)/profile/past_broadcasts/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
-    _PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcasts=true'
+class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
+    _VALID_URL_VIDEOS_BASE = r'%s/(?P<id>[^/]+)/videos' % TwitchBaseIE._VALID_URL_BASE
+    _PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcast_type='
+
+
+class TwitchAllVideosIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:all'
+    _VALID_URL = r'%s/all' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive,upload,highlight'
+    _PLAYLIST_TYPE = 'all videos'
+
+    _TEST = {
+        'url': 'https://www.twitch.tv/spamfish/videos/all',
+        'info_dict': {
+            'id': 'spamfish',
+            'title': 'Spamfish',
+        },
+        'playlist_mincount': 869,
+    }
+
+
+class TwitchUploadsIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:uploads'
+    _VALID_URL = r'%s/uploads' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'upload'
+    _PLAYLIST_TYPE = 'uploads'
+
+    _TEST = {
+        'url': 'https://www.twitch.tv/spamfish/videos/uploads',
+        'info_dict': {
+            'id': 'spamfish',
+            'title': 'Spamfish',
+        },
+        'playlist_mincount': 0,
+    }
+
+
+class TwitchPastBroadcastsIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:past-broadcasts'
+    _VALID_URL = r'%s/past-broadcasts' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive'
      _PLAYLIST_TYPE = 'past broadcasts'
  
      _TEST = {
-        'url': 'http://www.twitch.tv/spamfish/profile/past_broadcasts',
+        'url': 'https://www.twitch.tv/spamfish/videos/past-broadcasts',
+        'info_dict': {
+            'id': 'spamfish',
+            'title': 'Spamfish',
+        },
+        'playlist_mincount': 0,
+    }
+
+
+class TwitchHighlightsIE(TwitchVideosBaseIE):
+    IE_NAME = 'twitch:videos:highlights'
+    _VALID_URL = r'%s/highlights' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
+    _PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'highlight'
+    _PLAYLIST_TYPE = 'highlights'
+
+    _TEST = {
+        'url': 'https://www.twitch.tv/spamfish/videos/highlights',
          'info_dict': {
              'id': 'spamfish',
              'title': 'Spamfish',
          },
-        'playlist_mincount': 54,
+        'playlist_mincount': 805,
      }
  
  
@@ -474,7 +553,7 @@ class TwitchClipsIE(InfoExtractor):
              'id': 'AggressiveCobraPoooound',
              'ext': 'mp4',
              'title': 'EA Play 2016 Live from the Novo Theatre',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'creator': 'EA',
              'uploader': 'stereotype_',
              'uploader_id': 'stereotype_',
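Note on the twitch.py hunks above: the rechat subtitles entry added to TwitchVodIE._real_extract attaches video_id and start as query parameters to the rechat endpoint. The following is a minimal standard-library sketch of that URL construction, assuming Python 3.7+ (dict order is preserved). `build_rechat_url` is a hypothetical helper and not part of youtube-dl, which uses its own update_url_query utility; the id and timestamp in the usage note are illustrative values only.

```python
from urllib.parse import urlencode

# Endpoint copied from the hunk above.
RECHAT_ENDPOINT = 'https://rechat.twitch.tv/rechat-messages'


def build_rechat_url(item_id, timestamp):
    """Hypothetical helper: build the rechat URL for a VOD id and start time."""
    query = urlencode({
        'video_id': 'v%s' % item_id,  # the extractor prefixes the item id with 'v'
        'start': timestamp,           # UNIX timestamp of the broadcast start
    })
    return '%s?%s' % (RECHAT_ENDPOINT, query)
```

For example, build_rechat_url('6528877', 1439746708) yields a URL ending in ?video_id=v6528877&start=1439746708.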
diff --git a/youtube_dl/extractor/twitter.py b/youtube_dl/extractor/twitter.py
index 3411fcf7eb753154aa034474641b5327be7ea127..37e3bc4129fdc43033556d5ab9941965f70b3b8c 100644
--- a/youtube_dl/extractor/twitter.py
+++ b/youtube_dl/extractor/twitter.py
@@ -25,7 +25,7 @@ def _get_vmap_video_url(self, vmap_url, video_id):
  
  class TwitterCardIE(TwitterBaseIE):
      IE_NAME = 'twitter:card'
-    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos/tweet)/(?P<id>\d+)'
+    _VALID_URL = r'https?://(?:www\.)?twitter\.com/i/(?:cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
      _TESTS = [
          {
              'url': 'https://twitter.com/i/cards/tfw/v1/560070183650213889',
@@ -34,7 +34,7 @@ class TwitterCardIE(TwitterBaseIE):
                  'id': '560070183650213889',
                  'ext': 'mp4',
                  'title': 'Twitter Card',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 30.033,
              }
          },
@@ -45,7 +45,7 @@ class TwitterCardIE(TwitterBaseIE):
                  'id': '623160978427936768',
                  'ext': 'mp4',
                  'title': 'Twitter Card',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 80.155,
              },
          },
@@ -82,8 +82,11 @@ class TwitterCardIE(TwitterBaseIE):
                  'id': '705235433198714880',
                  'ext': 'mp4',
                  'title': 'Twitter web player',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
              },
+        }, {
+            'url': 'https://twitter.com/i/videos/752274308186120192',
+            'only_matching': True,
          },
      ]
  
@@ -198,7 +201,7 @@ class TwitterIE(InfoExtractor):
              'id': '643211948184596480',
              'ext': 'mp4',
              'title': 'FREE THE NIPPLE - FTN supporters on Hollywood Blvd today!',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'description': 'FREE THE NIPPLE on Twitter: "FTN supporters on Hollywood Blvd today! http://t.co/c7jHH749xJ"',
              'uploader': 'FREE THE NIPPLE',
              'uploader_id': 'freethenipple',
@@ -214,7 +217,7 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Gifs - tu vai cai tu vai cai tu nao eh capaz disso tu vai cai',
              'description': 'Gifs on Twitter: "tu vai cai tu vai cai tu nao eh capaz disso tu vai cai https://t.co/tM46VHFlO5"',
-            'thumbnail': 're:^https?://.*\.png',
+            'thumbnail': r're:^https?://.*\.png',
              'uploader': 'Gifs',
              'uploader_id': 'giphz',
          },
@@ -254,7 +257,7 @@ class TwitterIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'JG - BEAT PROD: @suhmeduh #Damndaniel',
              'description': 'JG on Twitter: "BEAT PROD: @suhmeduh  https://t.co/HBrQ4AfpvZ #Damndaniel https://t.co/byBooq2ejZ"',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'uploader': 'JG',
              'uploader_id': 'jaydingeer',
          },
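Note on the twitter.py hunks above: the updated TwitterCardIE._VALID_URL makes the /tweet path segment optional, so the shorter /i/videos/<id> form is accepted alongside the existing ones. A small sketch that exercises the pattern outside the extractor; `match_id` is a hypothetical stand-in for the extractor's own id matching, not youtube-dl code.

```python
import re

# Pattern copied from the updated TwitterCardIE._VALID_URL above.
VALID_URL = (
    r'https?://(?:www\.)?twitter\.com/i/'
    r'(?:cards/tfw/v1|videos(?:/tweet)?)/(?P<id>\d+)'
)


def match_id(url):
    """Return the numeric id captured by the pattern, or None if no match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```

With this pattern, both https://twitter.com/i/videos/752274308186120192 and the card form https://twitter.com/i/cards/tfw/v1/560070183650213889 resolve to their numeric ids.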
diff --git a/youtube_dl/extractor/udn.py b/youtube_dl/extractor/udn.py
index 57dd73aef6f6254f22cdcd814e2d76b20c75b847..daf45d0b4e1a3710832875f79e160ebc759849dd 100644
--- a/youtube_dl/extractor/udn.py
+++ b/youtube_dl/extractor/udn.py
@@ -23,7 +23,7 @@ class UDNEmbedIE(InfoExtractor):
              'id': '300040',
              'ext': 'mp4',
              'title': '生物老師男變女 全校挺"做自己"',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'params': {
              # m3u8 download
diff --git a/youtube_dl/extractor/uktvplay.py b/youtube_dl/extractor/uktvplay.py
new file mode 100644
index 0000000..2137502
--- /dev/null
+++ b/youtube_dl/extractor/uktvplay.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class UKTVPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/.+?\?.*?\bvideo=(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://uktvplay.uktv.co.uk/shows/world-at-war/c/200/watch-online/?video=2117008346001',
+        'md5': '',
+        'info_dict': {
+            'id': '2117008346001',
+            'ext': 'mp4',
+            'title': 'Pincers',
+            'description': 'Pincers',
+            'uploader_id': '1242911124001',
+            'upload_date': '20130124',
+            'timestamp': 1359049267,
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'expected_warnings': ['Failed to download MPD manifest']
+    }
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/1242911124001/H1xnMOqP_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % video_id,
+            'BrightcoveNew', video_id)
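Note on uktvplay.py above: the new extractor only pulls the numeric video= parameter out of the page URL and hands a Brightcove player URL to the BrightcoveNew extractor. A minimal sketch of that mapping without the youtube-dl machinery; `brightcove_url` is a hypothetical helper name, while the pattern and template are copied from the file above.

```python
import re

VALID_URL = r'https?://uktvplay\.uktv\.co\.uk/.+?\?.*?\bvideo=(?P<id>\d+)'
BRIGHTCOVE_URL_TEMPLATE = (
    'http://players.brightcove.net/1242911124001/'
    'H1xnMOqP_default/index.html?videoId=%s'
)


def brightcove_url(url):
    """Map a UKTV Play page URL to its Brightcove player URL, or None."""
    mobj = re.match(VALID_URL, url)
    if mobj is None:
        return None
    return BRIGHTCOVE_URL_TEMPLATE % mobj.group('id')
```

Applied to the test URL from the file above, the result ends in index.html?videoId=2117008346001.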
diff --git a/youtube_dl/extractor/uol.py b/youtube_dl/extractor/uol.py
index c27c643871a5c741a11a134474c2dba20dc91e5e..e67083004789f250faf842ee31fc2b343ad54754 100644
--- a/youtube_dl/extractor/uol.py
+++ b/youtube_dl/extractor/uol.py
@@ -84,12 +84,27 @@ class UOLIE(InfoExtractor):
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        if not video_id.isdigit():
-            embed_page = self._download_webpage('https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id, video_id)
-            video_id = self._search_regex(r'mediaId=(\d+)', embed_page, 'media id')
+        media_id = None
+
+        if video_id.isdigit():
+            media_id = video_id
+
+        if not media_id:
+            embed_page = self._download_webpage(
+                'https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id,
+                video_id, 'Downloading embed page', fatal=False)
+            if embed_page:
+                media_id = self._search_regex(
+                    (r'uol\.com\.br/(\d+)', r'mediaId=(\d+)'),
+                    embed_page, 'media id', default=None)
+
+        if not media_id:
+            webpage = self._download_webpage(url, video_id)
+            media_id = self._search_regex(r'mediaId=(\d+)', webpage, 'media id')
+
          video_data = self._download_json(
-            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % video_id,
-            video_id)['item']
+            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % media_id,
+            media_id)['item']
          title = video_data['title']
  
          query = {
@@ -118,7 +133,7 @@ def _real_extract(self, url):
              tags.append(tag_description)
  
          return {
-            'id': video_id,
+            'id': media_id,
              'title': title,
              'description': clean_html(video_data.get('desMedia')),
              'thumbnail': video_data.get('thumbnail'),
diff --git a/youtube_dl/extractor/uplynk.py b/youtube_dl/extractor/uplynk.py
index 2cd22cf8a1afa51403b3b9801ca7dd08c03503a9..f06bf5b127fd0f352937d79fa6d5267fcb7cdb26 100644
--- a/youtube_dl/extractor/uplynk.py
+++ b/youtube_dl/extractor/uplynk.py
@@ -30,7 +30,9 @@ class UplynkIE(InfoExtractor):
      def _extract_uplynk_info(self, uplynk_content_url):
          path, external_id, video_id, session_id = re.match(UplynkIE._VALID_URL, uplynk_content_url).groups()
          display_id = video_id or external_id
-        formats = self._extract_m3u8_formats('http://content.uplynk.com/%s.m3u8' % path, display_id, 'mp4')
+        formats = self._extract_m3u8_formats(
+            'http://content.uplynk.com/%s.m3u8' % path,
+            display_id, 'mp4', 'm3u8_native')
          if session_id:
              for f in formats:
                  f['extra_param_to_segment_url'] = 'pbs=' + session_id
diff --git a/youtube_dl/extractor/urort.py b/youtube_dl/extractor/urort.py
index 8872cfcb2795ab0bfb9db1ad5418eb61dd0dffc6..8f6edab4b1f21b241b41accfe4cafefc2dd0092f 100644
--- a/youtube_dl/extractor/urort.py
+++ b/youtube_dl/extractor/urort.py
@@ -21,7 +21,7 @@ class UrortIE(InfoExtractor):
              'id': '33124-24',
              'ext': 'mp3',
              'title': 'The Bomb',
-            'thumbnail': 're:^https?://.+\.jpg',
+            'thumbnail': r're:^https?://.+\.jpg',
              'uploader': 'Gerilja',
              'uploader_id': 'Gerilja',
              'upload_date': '20100323',
diff --git a/youtube_dl/extractor/ustream.py b/youtube_dl/extractor/ustream.py
index 0c06bf36bd5f76cabecc47e699ad56a45ba63a4a..5737d4d16c853193f5beb08b2eff7126c6be3d3c 100644
--- a/youtube_dl/extractor/ustream.py
+++ b/youtube_dl/extractor/ustream.py
@@ -69,6 +69,13 @@ class UstreamIE(InfoExtractor):
          },
      }]
  
+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
+        if mobj is not None:
+            return mobj.group('url')
+
      def _get_stream_info(self, url, video_id, app_id_ver, extra_note=None):
          def num_to_hex(n):
              return hex(n)[2:]
diff --git a/youtube_dl/extractor/ustudio.py b/youtube_dl/extractor/ustudio.py
index 3484a204658e1f09d472c0b31026ec6621121f1f..56509beedc027ae9eae3f3f36d91be32238d729a 100644
--- a/youtube_dl/extractor/ustudio.py
+++ b/youtube_dl/extractor/ustudio.py
@@ -22,7 +22,7 @@ class UstudioIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'San Francisco: Golden Gate Bridge',
              'description': 'md5:23925500697f2c6d4830e387ba51a9be',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20111107',
              'uploader': 'Tony Farley',
          }
diff --git a/youtube_dl/extractor/varzesh3.py b/youtube_dl/extractor/varzesh3.py
index 84698371a8ab2daf77faae1684141eb32425f232..f474ed73f861910d9c593510a4aff6be8244e903 100644
--- a/youtube_dl/extractor/varzesh3.py
+++ b/youtube_dl/extractor/varzesh3.py
@@ -22,7 +22,7 @@ class Varzesh3IE(InfoExtractor):
              'ext': 'mp4',
              'title': '۵ واکنش برتر دروازه‌بانان؛هفته ۲۶ بوندسلیگا',
              'description': 'فصل ۲۰۱۵-۲۰۱۴',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
          'skip': 'HTTP 404 Error',
      }, {
@@ -67,7 +67,7 @@ def _real_extract(self, url):
              webpage, display_id, default=None)
          if video_id is None:
              video_id = self._search_regex(
-                'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
+                r'var\s+VideoId\s*=\s*(\d+);', webpage, 'video id',
                  default=display_id)
  
          return {
diff --git a/youtube_dl/extractor/vbox7.py b/youtube_dl/extractor/vbox7.py
index a1e0851b7424e4c73cd34b72c02f16bc1905b6ce..bef6394626d4eca25785d35199c1092f69f45b54 100644
--- a/youtube_dl/extractor/vbox7.py
+++ b/youtube_dl/extractor/vbox7.py
@@ -4,11 +4,22 @@
  import re
  
  from .common import InfoExtractor
-from ..utils import urlencode_postdata
+from ..utils import ExtractorError
  
  
  class Vbox7IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?vbox7\.com/(?:play:|emb/external\.php\?.*?\bvid=)(?P<id>[\da-fA-F]+)'
+    _VALID_URL = r'''(?x)
+                    https?://
+                        (?:[^/]+\.)?vbox7\.com/
+                        (?:
+                            play:|
+                            (?:
+                                emb/external\.php|
+                                player/ext\.swf
+                            )\?.*?\bvid=
+                        )
+                        (?P<id>[\da-fA-F]+)
+                    '''
      _TESTS = [{
          'url': 'http://vbox7.com/play:0946fff23c',
          'md5': 'a60f9ab3a3a2f013ef9a967d5f7be5bf',
@@ -16,6 +27,14 @@ class Vbox7IE(InfoExtractor):
              'id': '0946fff23c',
              'ext': 'mp4',
              'title': 'Борисов: Притеснен съм за бъдещето на България',
+            'description': 'По думите му е опасно страната ни да бъде обявена за "сигурна"',
+            'thumbnail': r're:^https?://.*\.jpg$',
+            'timestamp': 1470982814,
+            'upload_date': '20160812',
+            'uploader': 'zdraveibulgaria',
+        },
+        'params': {
+            'proxy': '127.0.0.1:8118',
          },
      }, {
          'url': 'http://vbox7.com/play:249bb972c2',
@@ -29,12 +48,15 @@ class Vbox7IE(InfoExtractor):
      }, {
          'url': 'http://vbox7.com/emb/external.php?vid=a240d20f9c&autoplay=1',
          'only_matching': True,
+    }, {
+        'url': 'http://i49.vbox7.com/player/ext.swf?vid=0946fff23c&autoplay=1',
+        'only_matching': True,
      }]
  
      @staticmethod
      def _extract_url(webpage):
          mobj = re.search(
-            '<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
+            r'<iframe[^>]+src=(?P<q>["\'])(?P<url>(?:https?:)?//vbox7\.com/emb/external\.php.+?)(?P=q)',
              webpage)
          if mobj:
              return mobj.group('url')
@@ -42,33 +64,41 @@ def _extract_url(webpage):
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        webpage = self._download_webpage(
-            'http://vbox7.com/play:%s' % video_id, video_id)
-
-        title = self._html_search_regex(
-            r'<title>(.+?)</title>', webpage, 'title').split('/')[0].strip()
+        response = self._download_json(
+            'https://www.vbox7.com/ajax/video/nextvideo.php?vid=%s' % video_id,
+            video_id)
  
-        video_url = self._search_regex(
-            r'src\s*:\s*(["\'])(?P<url>.+?.mp4.*?)\1',
-            webpage, 'video url', default=None, group='url')
+        if 'error' in response:
+            raise ExtractorError(
+                '%s said: %s' % (self.IE_NAME, response['error']), expected=True)
  
-        thumbnail_url = self._og_search_thumbnail(webpage)
+        video = response['options']
  
-        if not video_url:
-            info_response = self._download_webpage(
-                'http://vbox7.com/play/magare.do', video_id,
-                'Downloading info webpage',
-                data=urlencode_postdata({'as3': '1', 'vid': video_id}),
-                headers={'Content-Type': 'application/x-www-form-urlencoded'})
-            final_url, thumbnail_url = map(
-                lambda x: x.split('=')[1], info_response.split('&'))
+        title = video['title']
+        video_url = video['src']
  
          if '/na.mp4' in video_url:
              self.raise_geo_restricted()
  
-        return {
+        uploader = video.get('uploader')
+
+        webpage = self._download_webpage(
+            'http://vbox7.com/play:%s' % video_id, video_id, fatal=False)
+
+        info = {}
+
+        if webpage:
+            info = self._search_json_ld(
+                webpage.replace('"/*@context"', '"@context"'), video_id,
+                fatal=False)
+
+        info.update({
              'id': video_id,
-            'url': self._proto_relative_url(video_url, 'http:'),
              'title': title,
-            'thumbnail': thumbnail_url,
-        }
+            'url': video_url,
+            'uploader': uploader,
+            'thumbnail': self._proto_relative_url(
+                info.get('thumbnail') or (self._og_search_thumbnail(webpage) if webpage else None),
+                'http:'),
+        })
+        return info
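Note on the vbox7.py hunks above: the rewritten Vbox7IE._VALID_URL uses verbose mode so that the subdomain prefix and the extra player/ext.swf embed form are easier to read. A short sketch that checks the pattern against the test URLs listed in the diff; `match_id` is a hypothetical stand-in for the extractor's own id matching.

```python
import re

# Pattern copied from the updated Vbox7IE._VALID_URL above.
# (?x) makes whitespace and newlines inside the pattern insignificant.
VALID_URL = r'''(?x)
    https?://
        (?:[^/]+\.)?vbox7\.com/
        (?:
            play:|
            (?:
                emb/external\.php|
                player/ext\.swf
            )\?.*?\bvid=
        )
        (?P<id>[\da-fA-F]+)
    '''


def match_id(url):
    """Return the hexadecimal video id, or None if the URL does not match."""
    mobj = re.match(VALID_URL, url)
    return mobj.group('id') if mobj else None
```

The play: form, the emb/external.php form and the new i49.vbox7.com/player/ext.swf form all yield the same kind of hexadecimal id.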
diff --git a/youtube_dl/extractor/vessel.py b/youtube_dl/extractor/vessel.py
index 6b9c227db7a8a88e89b2df8efd3e067613bf605a..80a643dfe6d6a7a160cb4035b52b9a95b03769cf 100644
--- a/youtube_dl/extractor/vessel.py
+++ b/youtube_dl/extractor/vessel.py
@@ -24,7 +24,7 @@ class VesselIE(InfoExtractor):
              'id': 'HDN7G5UMs',
              'ext': 'mp4',
              'title': 'Nvidia GeForce GTX Titan X - The Best Video Card on the Market?',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'upload_date': '20150317',
              'description': 'Did Nvidia pull out all the stops on the Titan X, or does its performance leave something to be desired?',
              'timestamp': int,
diff --git a/youtube_dl/extractor/vevo.py b/youtube_dl/extractor/vevo.py
index 783efda7d337217fe0ed86a97a5dfa0902a5b7bf..c4e37f694426c175b1f33d7795ca01baf6f7547b 100644
--- a/youtube_dl/extractor/vevo.py
+++ b/youtube_dl/extractor/vevo.py
@@ -4,9 +4,9 @@
  
  from .common import InfoExtractor
  from ..compat import (
-    compat_etree_fromstring,
      compat_str,
      compat_urlparse,
+    compat_HTTPError,
  )
  from ..utils import (
      ExtractorError,
@@ -51,7 +51,7 @@ class VevoIE(VevoBaseIE):
              'artist': 'Hurts',
              'genre': 'Pop',
          },
-        'expected_warnings': ['Unable to download SMIL file'],
+        'expected_warnings': ['Unable to download SMIL file', 'Unable to download info'],
      }, {
          'note': 'v3 SMIL format',
          'url': 'http://www.vevo.com/watch/cassadee-pope/i-wish-i-could-break-your-heart/USUV71302923',
@@ -67,7 +67,7 @@ class VevoIE(VevoBaseIE):
              'artist': 'Cassadee Pope',
              'genre': 'Country',
          },
-        'expected_warnings': ['Unable to download SMIL file'],
+        'expected_warnings': ['Unable to download SMIL file', 'Unable to download info'],
      }, {
          'note': 'Age-limited video',
          'url': 'https://www.vevo.com/watch/justin-timberlake/tunnel-vision-explicit/USRV81300282',
@@ -83,7 +83,7 @@ class VevoIE(VevoBaseIE):
              'artist': 'Justin Timberlake',
              'genre': 'Pop',
          },
-        'expected_warnings': ['Unable to download SMIL file'],
+        'expected_warnings': ['Unable to download SMIL file', 'Unable to download info'],
      }, {
          'note': 'No video_info',
          'url': 'http://www.vevo.com/watch/k-camp-1/Till-I-Die/USUV71503000',
@@ -91,15 +91,33 @@ class VevoIE(VevoBaseIE):
          'info_dict': {
              'id': 'USUV71503000',
              'ext': 'mp4',
-            'title': 'K Camp - Till I Die',
+            'title': 'K Camp ft. T.I. - Till I Die',
              'age_limit': 18,
              'timestamp': 1449468000,
              'upload_date': '20151207',
              'uploader': 'K Camp',
              'track': 'Till I Die',
              'artist': 'K Camp',
-            'genre': 'Rap/Hip-Hop',
+            'genre': 'Hip-Hop',
+        },
+        'expected_warnings': ['Unable to download SMIL file', 'Unable to download info'],
+    }, {
+        'note': 'Featured test',
+        'url': 'https://www.vevo.com/watch/lemaitre/Wait/USUV71402190',
+        'md5': 'd28675e5e8805035d949dc5cf161071d',
+        'info_dict': {
+            'id': 'USUV71402190',
+            'ext': 'mp4',
+            'title': 'Lemaitre ft. LoLo - Wait',
+            'age_limit': 0,
+            'timestamp': 1413432000,
+            'upload_date': '20141016',
+            'uploader': 'Lemaitre',
+            'track': 'Wait',
+            'artist': 'Lemaitre',
+            'genre': 'Electronic',
          },
+        'expected_warnings': ['Unable to download SMIL file', 'Unable to download info'],
      }, {
          'note': 'Only available via webpage',
          'url': 'http://www.vevo.com/watch/GBUV71600656',
@@ -122,21 +140,6 @@ class VevoIE(VevoBaseIE):
          'url': 'http://www.vevo.com/watch/INS171400764',
          'only_matching': True,
      }]
-    _SMIL_BASE_URL = 'http://smil.lvl3.vevo.com'
-    _SOURCE_TYPES = {
-        0: 'youtube',
-        1: 'brightcove',
-        2: 'http',
-        3: 'hls_ios',
-        4: 'hls',
-        5: 'smil',  # http
-        7: 'f4m_cc',
-        8: 'f4m_ak',
-        9: 'f4m_l3',
-        10: 'ism',
-        13: 'smil',  # rtmp
-        18: 'dash',
-    }
      _VERSIONS = {
          0: 'youtube',  # only in AuthenticateVideo videoVersions
          1: 'level3',
@@ -145,41 +148,6 @@ class VevoIE(VevoBaseIE):
          4: 'amazon',
      }
  
-    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
-        formats = []
-        els = smil.findall('.//{http://www.w3.org/2001/SMIL20/Language}video')
-        for el in els:
-            src = el.attrib['src']
-            m = re.match(r'''(?xi)
-                (?P<ext>[a-z0-9]+):
-                (?P<path>
-                    [/a-z0-9]+     # The directory and main part of the URL
-                    _(?P<tbr>[0-9]+)k
-                    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
-                    _(?P<vcodec>[a-z0-9]+)
-                    _(?P<vbr>[0-9]+)
-                    _(?P<acodec>[a-z0-9]+)
-                    _(?P<abr>[0-9]+)
-                    \.[a-z0-9]+  # File extension
-                )''', src)
-            if not m:
-                continue
-
-            format_url = self._SMIL_BASE_URL + m.group('path')
-            formats.append({
-                'url': format_url,
-                'format_id': 'smil_' + m.group('tbr'),
-                'vcodec': m.group('vcodec'),
-                'acodec': m.group('acodec'),
-                'tbr': int(m.group('tbr')),
-                'vbr': int(m.group('vbr')),
-                'abr': int(m.group('abr')),
-                'ext': m.group('ext'),
-                'width': int(m.group('width')),
-                'height': int(m.group('height')),
-            })
-        return formats
-
      def _initialize_api(self, video_id):
          req = sanitized_Request(
              'http://www.vevo.com/auth', data=b'')
@@ -188,7 +156,7 @@ def _initialize_api(self, video_id):
              note='Retrieving oauth token',
              errnote='Unable to retrieve oauth token')
  
-        if 'THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION' in webpage:
+        if re.search(r'(?i)THIS PAGE IS CURRENTLY UNAVAILABLE IN YOUR REGION', webpage):
              self.raise_geo_restricted(
                  '%s said: This page is currently unavailable in your region' % self.IE_NAME)
  
@@ -196,145 +164,91 @@ def _initialize_api(self, video_id):
          self._api_url_template = self.http_scheme() + '//apiv2.vevo.com/%s?token=' + auth_info['access_token']
  
      def _call_api(self, path, *args, **kwargs):
-        return self._download_json(self._api_url_template % path, *args, **kwargs)
+        try:
+            data = self._download_json(self._api_url_template % path, *args, **kwargs)
+        except ExtractorError as e:
+            if isinstance(e.cause, compat_HTTPError):
+                errors = self._parse_json(e.cause.read().decode(), None)['errors']
+                error_message = ', '.join([error['message'] for error in errors])
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error_message), expected=True)
+            raise
+        return data
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        json_url = 'http://api.vevo.com/VideoService/AuthenticateVideo?isrc=%s' % video_id
-        response = self._download_json(
-            json_url, video_id, 'Downloading video info',
-            'Unable to download info', fatal=False) or {}
-        video_info = response.get('video') or {}
+        self._initialize_api(video_id)
+
+        video_info = self._call_api(
+            'video/%s' % video_id, video_id, 'Downloading api video info',
+            'Failed to download video info')
+
+        video_versions = self._call_api(
+            'video/%s/streams' % video_id, video_id,
+            'Downloading video versions info',
+            'Failed to download video versions info',
+            fatal=False)
+
+        # Some videos are only available via webpage (e.g.
+        # https://github.com/rg3/youtube-dl/issues/9366)
+        if not video_versions:
+            webpage = self._download_webpage(url, video_id)
+            video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
+
+        uploader = None
          artist = None
          featured_artist = None
-        uploader = None
-        view_count = None
+        artists = video_info.get('artists') or []
+        for curr_artist in artists:
+            if curr_artist.get('role') == 'Featured':
+                featured_artist = curr_artist['name']
+            else:
+                artist = uploader = curr_artist['name']
+
          formats = []
+        for video_version in video_versions:
+            version = self._VERSIONS.get(video_version['version'])
+            version_url = video_version.get('url')
+            if not version_url:
+                continue
  
-        if not video_info:
-            try:
-                self._initialize_api(video_id)
-            except ExtractorError:
-                ytid = response.get('errorInfo', {}).get('ytid')
-                if ytid:
-                    self.report_warning(
-                        'Video is geoblocked, trying with the YouTube video %s' % ytid)
-                    return self.url_result(ytid, 'Youtube', ytid)
-
-                raise
-
-            video_info = self._call_api(
-                'video/%s' % video_id, video_id, 'Downloading api video info',
-                'Failed to download video info')
-
-            video_versions = self._call_api(
-                'video/%s/streams' % video_id, video_id,
-                'Downloading video versions info',
-                'Failed to download video versions info',
-                fatal=False)
-
-            # Some videos are only available via webpage (e.g.
-            # https://github.com/rg3/youtube-dl/issues/9366)
-            if not video_versions:
-                webpage = self._download_webpage(url, video_id)
-                video_versions = self._extract_json(webpage, video_id, 'streams')[video_id][0]
-
-            timestamp = parse_iso8601(video_info.get('releaseDate'))
-            artists = video_info.get('artists')
-            if artists:
-                artist = uploader = artists[0]['name']
-            view_count = int_or_none(video_info.get('views', {}).get('total'))
-
-            for video_version in video_versions:
-                version = self._VERSIONS.get(video_version['version'])
-                version_url = video_version.get('url')
-                if not version_url:
+            if '.ism' in version_url:
+                continue
+            elif '.mpd' in version_url:
+                formats.extend(self._extract_mpd_formats(
+                    version_url, video_id, mpd_id='dash-%s' % version,
+                    note='Downloading %s MPD information' % version,
+                    errnote='Failed to download %s MPD information' % version,
+                    fatal=False))
+            elif '.m3u8' in version_url:
+                formats.extend(self._extract_m3u8_formats(
+                    version_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls-%s' % version,
+                    note='Downloading %s m3u8 information' % version,
+                    errnote='Failed to download %s m3u8 information' % version,
+                    fatal=False))
+            else:
+                m = re.search(r'''(?xi)
+                    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
+                    _(?P<vcodec>[a-z0-9]+)
+                    _(?P<vbr>[0-9]+)
+                    _(?P<acodec>[a-z0-9]+)
+                    _(?P<abr>[0-9]+)
+                    \.(?P<ext>[a-z0-9]+)''', version_url)
+                if not m:
                      continue
  
-                if '.ism' in version_url:
-                    continue
-                elif '.mpd' in version_url:
-                    formats.extend(self._extract_mpd_formats(
-                        version_url, video_id, mpd_id='dash-%s' % version,
-                        note='Downloading %s MPD information' % version,
-                        errnote='Failed to download %s MPD information' % version,
-                        fatal=False))
-                elif '.m3u8' in version_url:
-                    formats.extend(self._extract_m3u8_formats(
-                        version_url, video_id, 'mp4', 'm3u8_native',
-                        m3u8_id='hls-%s' % version,
-                        note='Downloading %s m3u8 information' % version,
-                        errnote='Failed to download %s m3u8 information' % version,
-                        fatal=False))
-                else:
-                    m = re.search(r'''(?xi)
-                        _(?P<width>[0-9]+)x(?P<height>[0-9]+)
-                        _(?P<vcodec>[a-z0-9]+)
-                        _(?P<vbr>[0-9]+)
-                        _(?P<acodec>[a-z0-9]+)
-                        _(?P<abr>[0-9]+)
-                        \.(?P<ext>[a-z0-9]+)''', version_url)
-                    if not m:
-                        continue
-
-                    formats.append({
-                        'url': version_url,
-                        'format_id': 'http-%s-%s' % (version, video_version['quality']),
-                        'vcodec': m.group('vcodec'),
-                        'acodec': m.group('acodec'),
-                        'vbr': int(m.group('vbr')),
-                        'abr': int(m.group('abr')),
-                        'ext': m.group('ext'),
-                        'width': int(m.group('width')),
-                        'height': int(m.group('height')),
-                    })
-        else:
-            timestamp = int_or_none(self._search_regex(
-                r'/Date\((\d+)\)/',
-                video_info['releaseDate'], 'release date', fatal=False),
-                scale=1000)
-            artists = video_info.get('mainArtists')
-            if artists:
-                artist = uploader = artists[0]['artistName']
-
-            featured_artists = video_info.get('featuredArtists')
-            if featured_artists:
-                featured_artist = featured_artists[0]['artistName']
-
-            smil_parsed = False
-            for video_version in video_info['videoVersions']:
-                version = self._VERSIONS.get(video_version['version'])
-                if version == 'youtube':
-                    continue
-                else:
-                    source_type = self._SOURCE_TYPES.get(video_version['sourceType'])
-                    renditions = compat_etree_fromstring(video_version['data'])
-                    if source_type == 'http':
-                        for rend in renditions.findall('rendition'):
-                            attr = rend.attrib
-                            formats.append({
-                                'url': attr['url'],
-                                'format_id': 'http-%s-%s' % (version, attr['name']),
-                                'height': int_or_none(attr.get('frameheight')),
-                                'width': int_or_none(attr.get('frameWidth')),
-                                'tbr': int_or_none(attr.get('totalBitrate')),
-                                'vbr': int_or_none(attr.get('videoBitrate')),
-                                'abr': int_or_none(attr.get('audioBitrate')),
-                                'vcodec': attr.get('videoCodec'),
-                                'acodec': attr.get('audioCodec'),
-                            })
-                    elif source_type == 'hls':
-                        formats.extend(self._extract_m3u8_formats(
-                            renditions.find('rendition').attrib['url'], video_id,
-                            'mp4', 'm3u8_native', m3u8_id='hls-%s' % version,
-                            note='Downloading %s m3u8 information' % version,
-                            errnote='Failed to download %s m3u8 information' % version,
-                            fatal=False))
-                    elif source_type == 'smil' and version == 'level3' and not smil_parsed:
-                        formats.extend(self._extract_smil_formats(
-                            renditions.find('rendition').attrib['url'], video_id, False))
-                        smil_parsed = True
+                formats.append({
+                    'url': version_url,
+                    'format_id': 'http-%s-%s' % (version, video_version['quality']),
+                    'vcodec': m.group('vcodec'),
+                    'acodec': m.group('acodec'),
+                    'vbr': int(m.group('vbr')),
+                    'abr': int(m.group('abr')),
+                    'ext': m.group('ext'),
+                    'width': int(m.group('width')),
+                    'height': int(m.group('height')),
+                })
          self._sort_formats(formats)
  
          track = video_info['title']
@@ -355,17 +269,15 @@ def _real_extract(self, url):
          else:
              age_limit = None
  
-        duration = video_info.get('duration')
-
          return {
              'id': video_id,
              'title': title,
              'formats': formats,
              'thumbnail': video_info.get('imageUrl') or video_info.get('thumbnailUrl'),
-            'timestamp': timestamp,
+            'timestamp': parse_iso8601(video_info.get('releaseDate')),
              'uploader': uploader,
-            'duration': duration,
-            'view_count': view_count,
+            'duration': int_or_none(video_info.get('duration')),
+            'view_count': int_or_none(video_info.get('views', {}).get('total')),
              'age_limit': age_limit,
              'track': track,
              'artist': uploader,
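Note on the HTTP fallback branch in the Vevo hunk above: the regex pulls the format fields straight out of the file name. Below is a minimal standalone sketch of that mapping. The pattern and field names are taken from the diff; the helper name `parse_version_url` and the sample URL are hypothetical and used only for illustration.

```python
import re

# Pattern copied from the diff. (?x) ignores the layout whitespace and
# (?i) makes the codec and extension groups case-insensitive.
_HTTP_FORMAT_RE = re.compile(r'''(?xi)
    _(?P<width>[0-9]+)x(?P<height>[0-9]+)
    _(?P<vcodec>[a-z0-9]+)
    _(?P<vbr>[0-9]+)
    _(?P<acodec>[a-z0-9]+)
    _(?P<abr>[0-9]+)
    \.(?P<ext>[a-z0-9]+)''')


def parse_version_url(version_url):
    """Hypothetical helper: return format fields, or None when the URL does not match."""
    m = _HTTP_FORMAT_RE.search(version_url)
    if not m:
        return None
    return {
        'url': version_url,
        'vcodec': m.group('vcodec'),
        'acodec': m.group('acodec'),
        'vbr': int(m.group('vbr')),
        'abr': int(m.group('abr')),
        'ext': m.group('ext'),
        'width': int(m.group('width')),
        'height': int(m.group('height')),
    }


if __name__ == '__main__':
    # Made-up URL shaped the way the pattern expects.
    print(parse_version_url(
        'http://example.com/clip_1280x720_h264_2000_aac_128.mp4'))
```

URLs that do not carry this suffix yield `None`, which corresponds to the `continue` in the extractor loop.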
diff --git a/youtube_dl/extractor/vgtv.py b/youtube_dl/extractor/vgtv.py
index 3b38ac700296a2eef8c12f0b45406f54785d7684..8a574bc269789e14f3dcadd6167c5caaa46e49e3 100644
--- a/youtube_dl/extractor/vgtv.py
+++ b/youtube_dl/extractor/vgtv.py
@@ -61,7 +61,7 @@ class VGTVIE(XstreamIE):
                  'ext': 'mp4',
                  'title': 'Hevnen er søt: Episode 10 - Abu',
                  'description': 'md5:e25e4badb5f544b04341e14abdc72234',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 648.000,
                  'timestamp': 1404626400,
                  'upload_date': '20140706',
@@ -76,7 +76,7 @@ class VGTVIE(XstreamIE):
                  'ext': 'flv',
                  'title': 'OPPTAK: VGTV følger EM-kvalifiseringen',
                  'description': 'md5:3772d9c0dc2dff92a886b60039a7d4d3',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 9103.0,
                  'timestamp': 1410113864,
                  'upload_date': '20140907',
@@ -96,7 +96,7 @@ class VGTVIE(XstreamIE):
                  'ext': 'mp4',
                  'title': 'V75 fra Solvalla 30.05.15',
                  'description': 'md5:b3743425765355855f88e096acc93231',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'duration': 25966,
                  'timestamp': 1432975582,
                  'upload_date': '20150530',
@@ -200,7 +200,7 @@ def _real_extract(self, url):
              format_info = {
                  'url': mp4_url,
              }
-            mobj = re.search('(\d+)_(\d+)_(\d+)', mp4_url)
+            mobj = re.search(r'(\d+)_(\d+)_(\d+)', mp4_url)
              if mobj:
                  tbr = int(mobj.group(3))
                  format_info.update({
@@ -246,7 +246,7 @@ class BTArticleIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Alrekstad internat',
              'description': 'md5:dc81a9056c874fedb62fc48a300dac58',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 191,
              'timestamp': 1289991323,
              'upload_date': '20101117',
diff --git a/youtube_dl/extractor/vidbit.py b/youtube_dl/extractor/vidbit.py
index e7ac5a8425bbccad286fb5f49b9d6ca040ed0cfe..91f45b7cc78d38451591a652a0472f6ba35c0383 100644
--- a/youtube_dl/extractor/vidbit.py
+++ b/youtube_dl/extractor/vidbit.py
@@ -20,7 +20,7 @@ class VidbitIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Intro to VidBit',
              'description': 'md5:5e0d6142eec00b766cbf114bfd3d16b7',
-            'thumbnail': 're:https?://.*\.jpg$',
+            'thumbnail': r're:https?://.*\.jpg$',
              'upload_date': '20160618',
              'view_count': int,
              'comment_count': int,
diff --git a/youtube_dl/extractor/viddler.py b/youtube_dl/extractor/viddler.py
index 8d92aee878d3ad0c0d5725db755451c88e527f66..67808e7e623c420fc7507efa0a57d550dd64655a 100644
--- a/youtube_dl/extractor/viddler.py
+++ b/youtube_dl/extractor/viddler.py
@@ -26,7 +26,7 @@ class ViddlerIE(InfoExtractor):
              'timestamp': 1335371429,
              'upload_date': '20120425',
              'duration': 100.89,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'view_count': int,
              'comment_count': int,
              'categories': ['video content', 'high quality video', 'video made easy', 'how to produce video with limited resources', 'viddler'],
diff --git a/youtube_dl/extractor/videa.py b/youtube_dl/extractor/videa.py
new file mode 100644
index 0000000..311df58
--- /dev/null
+++ b/youtube_dl/extractor/videa.py
@@ -0,0 +1,97 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    mimetype2ext,
+    parse_codecs,
+    xpath_element,
+    xpath_text,
+)
+
+
+class VideaIE(InfoExtractor):
+    _VALID_URL = r'''(?x)
+                    https?://
+                        videa\.hu/
+                        (?:
+                            videok/(?:[^/]+/)*[^?#&]+-|
+                            player\?.*?\bv=|
+                            player/v/
+                        )
+                        (?P<id>[^?#&]+)
+                    '''
+    _TESTS = [{
+        'url': 'http://videa.hu/videok/allatok/az-orult-kigyasz-285-kigyot-kigyo-8YfIAjxwWGwT8HVQ',
+        'md5': '97a7af41faeaffd9f1fc864a7c7e7603',
+        'info_dict': {
+            'id': '8YfIAjxwWGwT8HVQ',
+            'ext': 'mp4',
+            'title': 'Az őrült kígyász 285 kígyót enged szabadon',
+            'thumbnail': 'http://videa.hu/static/still/1.4.1.1007274.1204470.3',
+            'duration': 21,
+        },
+    }, {
+        'url': 'http://videa.hu/videok/origo/jarmuvek/supercars-elozes-jAHDWfWSJH5XuFhH',
+        'only_matching': True,
+    }, {
+        'url': 'http://videa.hu/player?v=8YfIAjxwWGwT8HVQ',
+        'only_matching': True,
+    }, {
+        'url': 'http://videa.hu/player/v/8YfIAjxwWGwT8HVQ?autoplay=1',
+        'only_matching': True,
+    }]
+
+    @staticmethod
+    def _extract_urls(webpage):
+        return [url for _, url in re.findall(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//videa\.hu/player\?.*?\bv=.+?)\1',
+            webpage)]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        info = self._download_xml(
+            'http://videa.hu/videaplayer_get_xml.php', video_id,
+            query={'v': video_id})
+
+        video = xpath_element(info, './/video', 'video', fatal=True)
+        sources = xpath_element(info, './/video_sources', 'sources', fatal=True)
+
+        title = xpath_text(video, './title', fatal=True)
+
+        formats = []
+        for source in sources.findall('./video_source'):
+            source_url = source.text
+            if not source_url:
+                continue
+            f = parse_codecs(source.get('codecs'))
+            f.update({
+                'url': source_url,
+                'ext': mimetype2ext(source.get('mimetype')) or 'mp4',
+                'format_id': source.get('name'),
+                'width': int_or_none(source.get('width')),
+                'height': int_or_none(source.get('height')),
+            })
+            formats.append(f)
+        self._sort_formats(formats)
+
+        thumbnail = xpath_text(video, './poster_src')
+        duration = int_or_none(xpath_text(video, './duration'))
+
+        age_limit = None
+        is_adult = xpath_text(video, './is_adult_content', default=None)
+        if is_adult:
+            age_limit = 18 if is_adult == '1' else 0
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'age_limit': age_limit,
+            'formats': formats,
+        }
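Note on the new `_VALID_URL` in videa.py: the three alternatives (slug page, `player?v=` and `player/v/`) all end in the same `id` group. The sketch below checks this against the URLs listed in `_TESTS`. The pattern is copied from the diff; `match_id` is a simplified, hypothetical stand-in for `InfoExtractor._match_id` that returns `None` on a mismatch.

```python
import re

# Pattern copied from the diff. In verbose mode '#' normally starts a
# comment, but not inside a character class such as [^?#&].
_VALID_URL = r'''(?x)
    https?://
        videa\.hu/
        (?:
            videok/(?:[^/]+/)*[^?#&]+-|
            player\?.*?\bv=|
            player/v/
        )
        (?P<id>[^?#&]+)
    '''


def match_id(url):
    """Simplified stand-in for _match_id: return the 'id' group or None."""
    m = re.match(_VALID_URL, url)
    return m.group('id') if m else None


if __name__ == '__main__':
    for test_url in (
        'http://videa.hu/videok/allatok/az-orult-kigyasz-285-kigyot-kigyo-8YfIAjxwWGwT8HVQ',
        'http://videa.hu/videok/origo/jarmuvek/supercars-elozes-jAHDWfWSJH5XuFhH',
        'http://videa.hu/player?v=8YfIAjxwWGwT8HVQ',
        'http://videa.hu/player/v/8YfIAjxwWGwT8HVQ?autoplay=1',
    ):
        print(match_id(test_url))
```

For slug URLs the greedy `[^?#&]+-` backtracks to the last hyphen, so the ID is whatever follows the final `-` in the path.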
diff --git a/youtube_dl/extractor/videomega.py b/youtube_dl/extractor/videomega.py
index 4f0dcd18c7f28ab17aec58c814d53fd8ae21e7ac..c02830dddcd838dbfc8070c17a009d5924a00f7e 100644
--- a/youtube_dl/extractor/videomega.py
+++ b/youtube_dl/extractor/videomega.py
@@ -19,7 +19,7 @@ class VideoMegaIE(InfoExtractor):
              'id': 'AOSQBJYKIDDIKYJBQSOA',
              'ext': 'mp4',
              'title': '1254207',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }, {
          'url': 'http://videomega.tv/cdn.php?ref=AOSQBJYKIDDIKYJBQSOA&width=1070&height=600',
diff --git a/youtube_dl/extractor/videomore.py b/youtube_dl/extractor/videomore.py
index 7f25665864c696757903deeb582a64f16eec0d85..9b56630de3516b000436d57ca6e6cbcc580cc0ec 100644
--- a/youtube_dl/extractor/videomore.py
+++ b/youtube_dl/extractor/videomore.py
@@ -23,7 +23,7 @@ class VideomoreIE(InfoExtractor):
              'title': 'Кино в деталях 5 сезон В гостях Алексей Чумаков и Юлия Ковальчук',
              'series': 'Кино в деталях',
              'episode': 'В гостях Алексей Чумаков и Юлия Ковальчук',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2910,
              'view_count': int,
              'comment_count': int,
@@ -37,7 +37,7 @@ class VideomoreIE(InfoExtractor):
              'title': 'Молодежка 2 сезон 40 серия',
              'series': 'Молодежка',
              'episode': '40 серия',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 2809,
              'view_count': int,
              'comment_count': int,
@@ -53,7 +53,7 @@ class VideomoreIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Промо Команда проиграла из-за Бакина?',
              'episode': 'Команда проиграла из-за Бакина?',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 29,
              'age_limit': 16,
              'view_count': int,
@@ -145,7 +145,7 @@ class VideomoreVideoIE(InfoExtractor):
              'ext': 'flv',
              'title': 'Ёлки 3',
              'description': '',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 5579,
              'age_limit': 6,
              'view_count': int,
@@ -168,7 +168,7 @@ class VideomoreVideoIE(InfoExtractor):
              'ext': 'flv',
              'title': '1 серия. Здравствуй, Аквавилль!',
              'description': 'md5:c6003179538b5d353e7bcd5b1372b2d7',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 754,
              'age_limit': 6,
              'view_count': int,
diff --git a/youtube_dl/extractor/videott.py b/youtube_dl/extractor/videott.py
deleted file mode 100644
index 0f79871..0000000
--- a/youtube_dl/extractor/videott.py
+++ /dev/null
@@ -1,65 +0,0 @@
-from __future__ import unicode_literals
-
-import re
-import base64
-
-from .common import InfoExtractor
-from ..utils import (
-    unified_strdate,
-    int_or_none,
-)
-
-
-class VideoTtIE(InfoExtractor):
-    _WORKING = False
-    ID_NAME = 'video.tt'
-    IE_DESC = 'video.tt - Your True Tube'
-    _VALID_URL = r'https?://(?:www\.)?video\.tt/(?:(?:video|embed)/|watch_video\.php\?v=)(?P<id>[\da-zA-Z]{9})'
-
-    _TESTS = [{
-        'url': 'http://www.video.tt/watch_video.php?v=amd5YujV8',
-        'md5': 'b13aa9e2f267effb5d1094443dff65ba',
-        'info_dict': {
-            'id': 'amd5YujV8',
-            'ext': 'flv',
-            'title': 'Motivational video Change your mind in just 2.50 mins',
-            'description': '',
-            'upload_date': '20130827',
-            'uploader': 'joseph313',
-        }
-    }, {
-        'url': 'http://video.tt/embed/amd5YujV8',
-        'only_matching': True,
-    }]
-
-    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        settings = self._download_json(
-            'http://www.video.tt/player_control/settings.php?v=%s' % video_id, video_id,
-            'Downloading video JSON')['settings']
-
-        video = settings['video_details']['video']
-
-        formats = [
-            {
-                'url': base64.b64decode(res['u'].encode('utf-8')).decode('utf-8'),
-                'ext': 'flv',
-                'format_id': res['l'],
-            } for res in settings['res'] if res['u']
-        ]
-
-        return {
-            'id': video_id,
-            'title': video['title'],
-            'description': video['description'],
-            'thumbnail': settings['config']['thumbnail'],
-            'upload_date': unified_strdate(video['added']),
-            'uploader': video['owner'],
-            'view_count': int_or_none(video['view_count']),
-            'comment_count': None if video.get('comment_count') == '--' else int_or_none(video['comment_count']),
-            'like_count': int_or_none(video['liked']),
-            'dislike_count': int_or_none(video['disliked']),
-            'formats': formats,
-        }
diff --git a/youtube_dl/extractor/vidio.py b/youtube_dl/extractor/vidio.py
index 6898042de728f533367e2bedc50abc710939c1d6..4e4b4e38caaf920eacc7e29b487d9a9ad26d90cc 100644
--- a/youtube_dl/extractor/vidio.py
+++ b/youtube_dl/extractor/vidio.py
@@ -18,7 +18,7 @@ class VidioIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'DJ_AMBRED - Booyah (Live 2015)',
              'description': 'md5:27dc15f819b6a78a626490881adbadf8',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 149,
              'like_count': int,
          },
diff --git a/youtube_dl/extractor/vidme.py b/youtube_dl/extractor/vidme.py
index b1156d531aba6793fc7ce7dda9649950d922f606..e9ff336c4f5cb2e5a4b08fe5a97aa9993bdf87e0 100644
--- a/youtube_dl/extractor/vidme.py
+++ b/youtube_dl/extractor/vidme.py
@@ -23,7 +23,7 @@ class VidmeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Fishing for piranha - the easy way',
              'description': 'source: https://www.facebook.com/photo.php?v=312276045600871',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1406313244,
              'upload_date': '20140725',
              'age_limit': 0,
@@ -39,7 +39,7 @@ class VidmeIE(InfoExtractor):
              'id': 'Gc6M',
              'ext': 'mp4',
              'title': 'O Mere Dil ke chain - Arnav and Khushi VM',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1441211642,
              'upload_date': '20150902',
              'uploader': 'SunshineM',
@@ -61,7 +61,7 @@ class VidmeIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'The Carver',
              'description': 'md5:e9c24870018ae8113be936645b93ba3c',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1433203629,
              'upload_date': '20150602',
              'uploader': 'Thomas',
@@ -82,7 +82,7 @@ class VidmeIE(InfoExtractor):
              'id': 'Wmur',
              'ext': 'mp4',
              'title': 'naked smoking & stretching',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1430931613,
              'upload_date': '20150506',
              'uploader': 'naked-yogi',
@@ -115,7 +115,7 @@ class VidmeIE(InfoExtractor):
              'id': 'e5g',
              'ext': 'mp4',
              'title': 'Video upload (e5g)',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'timestamp': 1401480195,
              'upload_date': '20140530',
              'uploader': None,
diff --git a/youtube_dl/extractor/viewlift.py b/youtube_dl/extractor/viewlift.py
index 19500eba84f1bf7b4fdf7c59b8b56ca7e5b91efc..18735cfb23d907e4fb83882cda10dda7eb6f41ba 100644
--- a/youtube_dl/extractor/viewlift.py
+++ b/youtube_dl/extractor/viewlift.py
@@ -14,7 +14,7 @@
  
  
  class ViewLiftBaseIE(InfoExtractor):
-    _DOMAINS_REGEX = '(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv'
+    _DOMAINS_REGEX = r'(?:snagfilms|snagxtreme|funnyforfree|kiddovid|winnersview|monumentalsportsnetwork|vayafilm)\.com|kesari\.tv'
  
  
  class ViewLiftEmbedIE(ViewLiftBaseIE):
@@ -110,7 +110,7 @@ class ViewLiftIE(ViewLiftBaseIE):
              'ext': 'mp4',
              'title': 'Lost for Life',
              'description': 'md5:fbdacc8bb6b455e464aaf98bc02e1c82',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 4489,
              'categories': ['Documentary', 'Crime', 'Award Winning', 'Festivals']
          }
@@ -123,7 +123,7 @@ class ViewLiftIE(ViewLiftBaseIE):
              'ext': 'mp4',
              'title': 'India',
              'description': 'md5:5c168c5a8f4719c146aad2e0dfac6f5f',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 979,
              'categories': ['Documentary', 'Sports', 'Politics']
          }
@@ -160,7 +160,7 @@ def _real_extract(self, url):
  
          snag = self._parse_json(
              self._search_regex(
-                'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
+                r'Snag\.page\.data\s*=\s*(\[.+?\]);', webpage, 'snag'),
              display_id)
  
          for item in snag:
diff --git a/youtube_dl/extractor/viewster.py b/youtube_dl/extractor/viewster.py
index a93196a0772fd5588dd2f55c327427d00e814eb4..52dd95e2fe19041f01e49b1a4a3f9566c1f0e60b 100644
--- a/youtube_dl/extractor/viewster.py
+++ b/youtube_dl/extractor/viewster.py
@@ -157,7 +157,7 @@ def concat(suffix, sep='-'):
                          formats.extend(m3u8_formats)
                  else:
                      qualities_basename = self._search_regex(
-                        '/([^/]+)\.csmil/',
+                        r'/([^/]+)\.csmil/',
                          manifest_url, 'qualities basename', default=None)
                      if not qualities_basename:
                          continue
diff --git a/youtube_dl/extractor/viidea.py b/youtube_dl/extractor/viidea.py
index a4f914d1449ad1ad4fd38938fe81591a703d6120..4adcd183030438762f4a82be90adb171e1a38d34 100644
--- a/youtube_dl/extractor/viidea.py
+++ b/youtube_dl/extractor/viidea.py
@@ -40,7 +40,7 @@ class ViideaIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Automatics, robotics and biocybernetics',
              'description': 'md5:815fc1deb6b3a2bff99de2d5325be482',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1372349289,
              'upload_date': '20130627',
              'duration': 565,
@@ -58,7 +58,7 @@ class ViideaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'NLP at Google',
              'description': 'md5:fc7a6d9bf0302d7cc0e53f7ca23747b3',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1284375600,
              'upload_date': '20100913',
              'duration': 5352,
@@ -74,7 +74,7 @@ class ViideaIE(InfoExtractor):
              'id': '23181',
              'title': 'Deep Learning Summer School, Montreal 2015',
              'description': 'md5:0533a85e4bd918df52a01f0e1ebe87b7',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1438560000,
          },
          'playlist_count': 30,
@@ -85,7 +85,7 @@ class ViideaIE(InfoExtractor):
              'id': '9737',
              'display_id': 'mlss09uk_bishop_ibi',
              'title': 'Introduction To Bayesian Inference',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
              'timestamp': 1251622800,
          },
          'playlist': [{
@@ -94,7 +94,7 @@ class ViideaIE(InfoExtractor):
                  'display_id': 'mlss09uk_bishop_ibi_part1',
                  'ext': 'wmv',
                  'title': 'Introduction To Bayesian Inference (Part 1)',
-                'thumbnail': 're:http://.*\.jpg',
+                'thumbnail': r're:http://.*\.jpg',
                  'duration': 4622,
                  'timestamp': 1251622800,
                  'upload_date': '20090830',
@@ -105,7 +105,7 @@ class ViideaIE(InfoExtractor):
                  'display_id': 'mlss09uk_bishop_ibi_part2',
                  'ext': 'wmv',
                  'title': 'Introduction To Bayesian Inference (Part 2)',
-                'thumbnail': 're:http://.*\.jpg',
+                'thumbnail': r're:http://.*\.jpg',
                  'duration': 5641,
                  'timestamp': 1251622800,
                  'upload_date': '20090830',
diff --git a/youtube_dl/extractor/viki.py b/youtube_dl/extractor/viki.py
index 4351ac4571935fa3c3ace915c0b97f20e67ec18d..9c48701c1a568589e0a875f35fa800386c4a4058 100644
--- a/youtube_dl/extractor/viki.py
+++ b/youtube_dl/extractor/viki.py
@@ -1,11 +1,12 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import json
-import time
-import hmac
  import hashlib
+import hmac
  import itertools
+import json
+import re
+import time
  
  from .common import InfoExtractor
  from ..utils import (
@@ -276,10 +277,14 @@ def _real_extract(self, url):
              height = int_or_none(self._search_regex(
                  r'^(\d+)[pP]$', format_id, 'height', default=None))
              for protocol, format_dict in stream_dict.items():
+                # rtmps URLs do not seem to work
+                if protocol == 'rtmps':
+                    continue
+                format_url = format_dict['url']
                  if format_id == 'm3u8':
                      m3u8_formats = self._extract_m3u8_formats(
-                        format_dict['url'], video_id, 'mp4',
-                        entry_protocol='m3u8_native', preference=-1,
+                        format_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native',
                          m3u8_id='m3u8-%s' % protocol, fatal=False)
                      # Despite CODECS metadata in m3u8 all video-only formats
                      # are actually video+audio
@@ -287,9 +292,23 @@ def _real_extract(self, url):
                          if f.get('acodec') == 'none' and f.get('vcodec') != 'none':
                              f['acodec'] = None
                      formats.extend(m3u8_formats)
+                elif format_url.startswith('rtmp'):
+                    mobj = re.search(
+                        r'^(?P<url>rtmp://[^/]+/(?P<app>.+?))/(?P<playpath>mp4:.+)$',
+                        format_url)
+                    if not mobj:
+                        continue
+                    formats.append({
+                        'format_id': 'rtmp-%s' % format_id,
+                        'ext': 'flv',
+                        'url': mobj.group('url'),
+                        'play_path': mobj.group('playpath'),
+                        'app': mobj.group('app'),
+                        'page_url': url,
+                    })
                  else:
                      formats.append({
-                        'url': format_dict['url'],
+                        'url': format_url,
                          'format_id': '%s-%s' % (format_id, protocol),
                          'height': height,
                      })
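Note on the new RTMP branch in the Viki hunk above: the regex splits a stream URL into the connection URL, the application name and the play path. Below is a minimal standalone sketch of that split. The pattern and the output keys are taken from the diff; the helper name `split_rtmp_url` and the sample URLs are hypothetical.

```python
import re

# Pattern copied from the diff: 'url' covers scheme, host and app,
# 'app' is the lazily matched application path, and 'playpath' starts
# at the 'mp4:' prefix.
_RTMP_RE = re.compile(
    r'^(?P<url>rtmp://[^/]+/(?P<app>.+?))/(?P<playpath>mp4:.+)$')


def split_rtmp_url(format_url, page_url=None):
    """Hypothetical helper: build the RTMP format fields, or return None."""
    mobj = _RTMP_RE.search(format_url)
    if not mobj:
        return None
    return {
        'ext': 'flv',
        'url': mobj.group('url'),
        'play_path': mobj.group('playpath'),
        'app': mobj.group('app'),
        'page_url': page_url,
    }


if __name__ == '__main__':
    # Made-up URL shaped the way the pattern expects.
    print(split_rtmp_url('rtmp://example.com/vod/mp4:videos/1234_480p.mp4'))
```

Because the pattern requires the literal `rtmp://` scheme, an `rtmps://` URL would not match here either; the extractor already skips that protocol earlier in the loop.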
diff --git a/youtube_dl/extractor/vimeo.py b/youtube_dl/extractor/vimeo.py
index 51c69a80c216889315a4c5fe070572100c13dd36..61cc469bf27b58bfc70eb8bd036737ec0a4cb66c 100644
--- a/youtube_dl/extractor/vimeo.py
+++ b/youtube_dl/extractor/vimeo.py
@@ -21,12 +21,12 @@
      sanitized_Request,
      smuggle_url,
      std_headers,
-    unified_strdate,
+    try_get,
+    unified_timestamp,
      unsmuggle_url,
      urlencode_postdata,
      unescapeHTML,
      parse_filesize,
-    try_get,
  )
  
  
@@ -92,29 +92,30 @@ def _set_vimeo_cookie(self, name, value):
      def _vimeo_sort_formats(self, formats):
          # Bitrates are completely broken. Single m3u8 may contain entries in kbps and bps
          # at the same time without actual units specified. This leads to wrong sorting.
-        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'format_id'))
+        self._sort_formats(formats, field_preference=('preference', 'height', 'width', 'fps', 'tbr', 'format_id'))
  
      def _parse_config(self, config, video_id):
+        video_data = config['video']
          # Extract title
-        video_title = config['video']['title']
+        video_title = video_data['title']
  
          # Extract uploader, uploader_url and uploader_id
-        video_uploader = config['video'].get('owner', {}).get('name')
-        video_uploader_url = config['video'].get('owner', {}).get('url')
+        video_uploader = video_data.get('owner', {}).get('name')
+        video_uploader_url = video_data.get('owner', {}).get('url')
          video_uploader_id = video_uploader_url.split('/')[-1] if video_uploader_url else None
  
          # Extract video thumbnail
-        video_thumbnail = config['video'].get('thumbnail')
+        video_thumbnail = video_data.get('thumbnail')
          if video_thumbnail is None:
-            video_thumbs = config['video'].get('thumbs')
+            video_thumbs = video_data.get('thumbs')
              if video_thumbs and isinstance(video_thumbs, dict):
                  _, video_thumbnail = sorted((int(width if width.isdigit() else 0), t_url) for (width, t_url) in video_thumbs.items())[-1]
  
          # Extract video duration
-        video_duration = int_or_none(config['video'].get('duration'))
+        video_duration = int_or_none(video_data.get('duration'))
  
          formats = []
-        config_files = config['video'].get('files') or config['request'].get('files', {})
+        config_files = video_data.get('files') or config['request'].get('files', {})
          for f in config_files.get('progressive', []):
              video_url = f.get('url')
              if not video_url:
@@ -127,10 +128,33 @@ def _parse_config(self, config, video_id):
                  'fps': int_or_none(f.get('fps')),
                  'tbr': int_or_none(f.get('bitrate')),
              })
-        m3u8_url = config_files.get('hls', {}).get('url')
-        if m3u8_url:
-            formats.extend(self._extract_m3u8_formats(
-                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls', fatal=False))
+
+        for files_type in ('hls', 'dash'):
+            for cdn_name, cdn_data in config_files.get(files_type, {}).get('cdns', {}).items():
+                manifest_url = cdn_data.get('url')
+                if not manifest_url:
+                    continue
+                format_id = '%s-%s' % (files_type, cdn_name)
+                if files_type == 'hls':
+                    formats.extend(self._extract_m3u8_formats(
+                        manifest_url, video_id, 'mp4',
+                        'm3u8_native', m3u8_id=format_id,
+                        note='Downloading %s m3u8 information' % cdn_name,
+                        fatal=False))
+                elif files_type == 'dash':
+                    mpd_pattern = r'/%s/(?:sep/)?video/' % video_id
+                    mpd_manifest_urls = []
+                    if re.search(mpd_pattern, manifest_url):
+                        for suffix, repl in (('', 'video'), ('_sep', 'sep/video')):
+                            mpd_manifest_urls.append((format_id + suffix, re.sub(
+                                mpd_pattern, '/%s/%s/' % (video_id, repl), manifest_url)))
+                    else:
+                        mpd_manifest_urls = [(format_id, manifest_url)]
+                    for f_id, m_url in mpd_manifest_urls:
+                        formats.extend(self._extract_mpd_formats(
+                            m_url.replace('/master.json', '/master.mpd'), video_id, f_id,
+                            'Downloading %s MPD information' % cdn_name,
+                            fatal=False))
  
          subtitles = {}
          text_tracks = config['request'].get('text_tracks')
@@ -189,11 +213,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': "youtube-dl test video - \u2605 \" ' \u5e78 / \\ \u00e4 \u21ad \U0001d550",
                  'description': 'md5:2d3305bad981a06ff79f027f19865021',
+                'timestamp': 1355990239,
                  'upload_date': '20121220',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user7108434',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user7108434',
                  'uploader_id': 'user7108434',
                  'uploader': 'Filippo Valsorda',
                  'duration': 10,
+                'license': 'by-sa',
              },
          },
          {
@@ -203,7 +229,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'info_dict': {
                  'id': '68093876',
                  'ext': 'mp4',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/openstreetmapus',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/openstreetmapus',
                  'uploader_id': 'openstreetmapus',
                  'uploader': 'OpenStreetMap US',
                  'title': 'Andy Allan - Putting the Carto into OpenStreetMap Cartography',
@@ -220,7 +246,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Kathy Sierra: Building the minimum Badass User, Business of Software 2012',
                  'uploader': 'The BLN & Business of Software',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/theblnbusinessofsoftware',
                  'uploader_id': 'theblnbusinessofsoftware',
                  'duration': 3610,
                  'description': None,
@@ -234,12 +260,13 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'id': '68375962',
                  'ext': 'mp4',
                  'title': 'youtube-dl password protected test video',
+                'timestamp': 1371200155,
                  'upload_date': '20130614',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user18948128',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user18948128',
                  'uploader_id': 'user18948128',
                  'uploader': 'Jaime Marquínez Ferrándiz',
                  'duration': 10,
-                'description': 'This is "youtube-dl password protected test video" by  on Vimeo, the home for high quality videos and the people who love them.',
+                'description': 'md5:dca3ea23adb29ee387127bc4ddfce63f',
              },
              'params': {
                  'videopassword': 'youtube-dl',
@@ -253,10 +280,11 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Key & Peele: Terrorist Interrogation',
                  'description': 'md5:8678b246399b070816b12313e8b4eb5c',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/atencio',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/atencio',
                  'uploader_id': 'atencio',
                  'uploader': 'Peter Atencio',
-                'upload_date': '20130927',
+                'timestamp': 1380339469,
+                'upload_date': '20130928',
                  'duration': 187,
              },
          },
@@ -268,8 +296,9 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'The New Vimeo Player (You Know, For Videos)',
                  'description': 'md5:2ec900bf97c3f389378a96aee11260ea',
+                'timestamp': 1381846109,
                  'upload_date': '20131015',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/staff',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/staff',
                  'uploader_id': 'staff',
                  'uploader': 'Vimeo Staff',
                  'duration': 62,
@@ -284,21 +313,22 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Pier Solar OUYA Official Trailer',
                  'uploader': 'Tulio Gonçalves',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user28849593',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user28849593',
                  'uploader_id': 'user28849593',
              },
          },
          {
              # contains original format
              'url': 'https://vimeo.com/33951933',
-            'md5': '2d9f5475e0537f013d0073e812ab89e6',
+            'md5': '53c688fa95a55bf4b7293d37a89c5c53',
              'info_dict': {
                  'id': '33951933',
                  'ext': 'mp4',
                  'title': 'FOX CLASSICS - Forever Classic ID - A Full Minute',
                  'uploader': 'The DMCI',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/dmci',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/dmci',
                  'uploader_id': 'dmci',
+                'timestamp': 1324343742,
                  'upload_date': '20111220',
                  'description': 'md5:ae23671e82d05415868f7ad1aec21147',
              },
@@ -309,11 +339,12 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'url': 'https://vimeo.com/channels/tributes/6213729',
              'info_dict': {
                  'id': '6213729',
-                'ext': 'mp4',
+                'ext': 'mov',
                  'title': 'Vimeo Tribute: The Shining',
                  'uploader': 'Casey Donahue',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/caseydonahue',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/caseydonahue',
                  'uploader_id': 'caseydonahue',
+                'timestamp': 1250886430,
                  'upload_date': '20090821',
                  'description': 'md5:bdbf314014e58713e6e5b66eb252f4a6',
              },
@@ -323,7 +354,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
              'expected_warnings': ['Unable to download JSON metadata'],
          },
          {
-            # redirects to ondemand extractor and should be passed throught it
+            # redirects to ondemand extractor and should be passed through it
              # for successful extraction
              'url': 'https://vimeo.com/73445910',
              'info_dict': {
@@ -331,7 +362,7 @@ class VimeoIE(VimeoBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'The Reluctant Revolutionary',
                  'uploader': '10Ft Films',
-                'uploader_url': 're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
+                'uploader_url': r're:https?://(?:www\.)?vimeo\.com/tenfootfilms',
                  'uploader_id': 'tenfootfilms',
              },
              'params': {
@@ -462,6 +493,9 @@ def _real_extract(self, url):
                      '%s said: %s' % (self.IE_NAME, seed_status['title']),
                      expected=True)
  
+        cc_license = None
+        timestamp = None
+
          # Extract the config JSON
          try:
              try:
@@ -475,14 +509,18 @@ def _real_extract(self, url):
                      vimeo_clip_page_config = self._search_regex(
                          r'vimeo\.clip_page_config\s*=\s*({.+?});', webpage,
                          'vimeo clip page config')
-                    config_url = self._parse_json(
-                        vimeo_clip_page_config, video_id)['player']['config_url']
+                    page_config = self._parse_json(vimeo_clip_page_config, video_id)
+                    config_url = page_config['player']['config_url']
+                    cc_license = page_config.get('cc_license')
+                    timestamp = try_get(
+                        page_config, lambda x: x['clip']['uploaded_on'],
+                        compat_str)
                  config_json = self._download_webpage(config_url, video_id)
                  config = json.loads(config_json)
              except RegexNotFoundError:
                  # For pro videos or player.vimeo.com urls
                  # We try to find out to which variable is assigned the config dic
-                m_variable_name = re.search('(\w)\.video\.id', webpage)
+                m_variable_name = re.search(r'(\w)\.video\.id', webpage)
                  if m_variable_name is not None:
                      config_re = r'%s=({[^}].+?});' % re.escape(m_variable_name.group(1))
                  else:
@@ -545,10 +583,10 @@ def is_rented():
              self._downloader.report_warning('Cannot find video description')
  
          # Extract upload date
-        video_upload_date = None
-        mobj = re.search(r'<time[^>]+datetime="([^"]+)"', webpage)
-        if mobj is not None:
-            video_upload_date = unified_strdate(mobj.group(1))
+        if not timestamp:
+            timestamp = self._search_regex(
+                r'<time[^>]+datetime="([^"]+)"', webpage,
+                'timestamp', default=None)
  
          try:
              view_count = int(self._search_regex(r'UserPlays:(\d+)', webpage, 'view count'))
@@ -585,15 +623,22 @@ def is_rented():
          info_dict = self._parse_config(config, video_id)
          formats.extend(info_dict['formats'])
          self._vimeo_sort_formats(formats)
+
+        if not cc_license:
+            cc_license = self._search_regex(
+                r'<link[^>]+rel=["\']license["\'][^>]+href=(["\'])(?P<license>(?:(?!\1).)+)\1',
+                webpage, 'license', default=None, group='license')
+
          info_dict.update({
              'id': video_id,
              'formats': formats,
-            'upload_date': video_upload_date,
+            'timestamp': unified_timestamp(timestamp),
              'description': video_description,
              'webpage_url': url,
              'view_count': view_count,
              'like_count': like_count,
              'comment_count': comment_count,
+            'license': cc_license,
          })
  
          return info_dict
@@ -611,9 +656,12 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'המעבדה - במאי יותם פלדמן',
              'uploader': 'גם סרטים',
-            'uploader_url': 're:https?://(?:www\.)?vimeo\.com/gumfilms',
+            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/gumfilms',
              'uploader_id': 'gumfilms',
          },
+        'params': {
+            'format': 'best[protocol=https]',
+        },
      }, {
          # requires Referer to be passed along with og:video:url
          'url': 'https://vimeo.com/ondemand/36938/126682985',
@@ -622,7 +670,7 @@ class VimeoOndemandIE(VimeoBaseInfoExtractor):
              'ext': 'mp4',
              'title': 'Rävlock, rätt läte på rätt plats',
              'uploader': 'Lindroth & Norin',
-            'uploader_url': 're:https?://(?:www\.)?vimeo\.com/user14430847',
+            'uploader_url': r're:https?://(?:www\.)?vimeo\.com/user14430847',
              'uploader_id': 'user14430847',
          },
          'params': {
@@ -712,12 +760,12 @@ def _title_and_entries(self, list_id, base_url):
              # Try extracting href first since not all videos are available via
              # short https://vimeo.com/id URL (e.g. https://vimeo.com/channels/tributes/6213729)
              clips = re.findall(
-                r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)', webpage)
+                r'id="clip_(\d+)"[^>]*>\s*<a[^>]+href="(/(?:[^/]+/)*\1)(?:[^>]+\btitle="([^"]+)")?', webpage)
              if clips:
-                for video_id, video_url in clips:
+                for video_id, video_url, video_title in clips:
                      yield self.url_result(
                          compat_urlparse.urljoin(base_url, video_url),
-                        VimeoIE.ie_key(), video_id=video_id)
+                        VimeoIE.ie_key(), video_id=video_id, video_title=video_title)
              # More relaxed fallback
              else:
                  for video_id in re.findall(r'id=["\']clip_(\d+)', webpage):
@@ -842,7 +890,7 @@ class VimeoReviewIE(VimeoBaseInfoExtractor):
              'title': 're:(?i)^Death by dogma versus assembling agile . Sander Hoogendoorn',
              'uploader': 'DevWeek Events',
              'duration': 2773,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader_id': 'user22258446',
          }
      }, {
@@ -866,10 +914,14 @@ def _real_initialize(self):
  
      def _get_config_url(self, webpage_url, video_id, video_password_verified=False):
          webpage = self._download_webpage(webpage_url, video_id)
-        data = self._parse_json(self._search_regex(
-            r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
-            default=NO_DEFAULT if video_password_verified else '{}'), video_id)
-        config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
+        config_url = self._html_search_regex(
+            r'data-config-url=(["\'])(?P<url>(?:(?!\1).)+)\1', webpage,
+            'config URL', default=None, group='url')
+        if not config_url:
+            data = self._parse_json(self._search_regex(
+                r'window\s*=\s*_extend\(window,\s*({.+?})\);', webpage, 'data',
+                default=NO_DEFAULT if video_password_verified else '{}'), video_id)
+            config_url = data.get('vimeo_esi', {}).get('config', {}).get('configUrl')
          if config_url is None:
              self._verify_video_password(webpage_url, video_id, webpage)
              config_url = self._get_config_url(
diff --git a/youtube_dl/extractor/vimple.py b/youtube_dl/extractor/vimple.py
index 7fd9b777b4b6bb88cd08e9e625f74f41e8775092..c74b437668c62c7ac38d085433532c994ab8fbb7 100644
--- a/youtube_dl/extractor/vimple.py
+++ b/youtube_dl/extractor/vimple.py
@@ -37,7 +37,7 @@ class VimpleIE(SprutoBaseIE):
              'ext': 'mp4',
              'title': 'Sunset',
              'duration': 20,
-            'thumbnail': 're:https?://.*?\.jpg',
+            'thumbnail': r're:https?://.*?\.jpg',
          },
      }, {
          'url': 'http://player.vimple.ru/iframe/52e1beec-1314-4a83-aeac-c61562eadbf9',
diff --git a/youtube_dl/extractor/viu.py b/youtube_dl/extractor/viu.py
new file mode 100644
index 0000000..3fd889c
--- /dev/null
+++ b/youtube_dl/extractor/viu.py
@@ -0,0 +1,249 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..compat import compat_str
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+)
+
+
+class ViuBaseIE(InfoExtractor):
+    def _real_initialize(self):
+        viu_auth_res = self._request_webpage(
+            'https://www.viu.com/api/apps/v2/authenticate', None,
+            'Requesting Viu auth', query={
+                'acct': 'test',
+                'appid': 'viu_desktop',
+                'fmt': 'json',
+                'iid': 'guest',
+                'languageid': 'default',
+                'platform': 'desktop',
+                'userid': 'guest',
+                'useridtype': 'guest',
+                'ver': '1.0'
+            }, headers=self.geo_verification_headers())
+        self._auth_token = viu_auth_res.info()['X-VIU-AUTH']
+
+    def _call_api(self, path, *args, **kwargs):
+        headers = self.geo_verification_headers()
+        headers.update({
+            'X-VIU-AUTH': self._auth_token
+        })
+        headers.update(kwargs.get('headers', {}))
+        kwargs['headers'] = headers
+        response = self._download_json(
+            'https://www.viu.com/api/' + path, *args, **kwargs)['response']
+        if response.get('status') != 'success':
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, response['message']), expected=True)
+        return response
+
+
+class ViuIE(ViuBaseIE):
+    _VALID_URL = r'(?:viu:|https?://www\.viu\.com/[a-z]{2}/media/)(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'https://www.viu.com/en/media/1116705532?containerId=playlist-22168059',
+        'info_dict': {
+            'id': '1116705532',
+            'ext': 'mp4',
+            'title': 'Citizen Khan - Ep 1',
+            'description': 'md5:d7ea1604f49e5ba79c212c551ce2110e',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to India',
+    }, {
+        'url': 'https://www.viu.com/en/media/1130599965',
+        'info_dict': {
+            'id': '1130599965',
+            'ext': 'mp4',
+            'title': 'Jealousy Incarnate - Episode 1',
+            'description': 'md5:d3d82375cab969415d2720b6894361e9',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to Indonesia',
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video_data = self._call_api(
+            'clip/load', video_id, 'Downloading video data', query={
+                'appid': 'viu_desktop',
+                'fmt': 'json',
+                'id': video_id
+            })['item'][0]
+
+        title = video_data['title']
+
+        m3u8_url = None
+        url_path = video_data.get('urlpathd') or video_data.get('urlpath')
+        tdirforwhole = video_data.get('tdirforwhole')
+        # #EXT-X-BYTERANGE is not supported by native hls downloader
+        # and ffmpeg (#10955)
+        # hls_file = video_data.get('hlsfile')
+        hls_file = video_data.get('jwhlsfile')
+        if url_path and tdirforwhole and hls_file:
+            m3u8_url = '%s/%s/%s' % (url_path, tdirforwhole, hls_file)
+        else:
+            # m3u8_url = re.sub(
+            #     r'(/hlsc_)[a-z]+(\d+\.m3u8)',
+            #     r'\1whe\2', video_data['href'])
+            m3u8_url = video_data['href']
+        formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for key, value in video_data.items():
+            mobj = re.match(r'^subtitle_(?P<lang>[^_]+)_(?P<ext>(vtt|srt))', key)
+            if not mobj:
+                continue
+            subtitles.setdefault(mobj.group('lang'), []).append({
+                'url': value,
+                'ext': mobj.group('ext')
+            })
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'series': video_data.get('moviealbumshowname'),
+            'episode': title,
+            'episode_number': int_or_none(video_data.get('episodeno')),
+            'duration': int_or_none(video_data.get('duration')),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
+
+
+class ViuPlaylistIE(ViuBaseIE):
+    IE_NAME = 'viu:playlist'
+    _VALID_URL = r'https?://www\.viu\.com/[^/]+/listing/playlist-(?P<id>\d+)'
+    _TEST = {
+        'url': 'https://www.viu.com/en/listing/playlist-22461380',
+        'info_dict': {
+            'id': '22461380',
+            'title': 'The Good Wife',
+        },
+        'playlist_count': 16,
+        'skip': 'Geo-restricted to Indonesia',
+    }
+
+    def _real_extract(self, url):
+        playlist_id = self._match_id(url)
+        playlist_data = self._call_api(
+            'container/load', playlist_id,
+            'Downloading playlist info', query={
+                'appid': 'viu_desktop',
+                'fmt': 'json',
+                'id': 'playlist-' + playlist_id
+            })['container']
+
+        entries = []
+        for item in playlist_data.get('item', []):
+            item_id = item.get('id')
+            if not item_id:
+                continue
+            item_id = compat_str(item_id)
+            entries.append(self.url_result(
+                'viu:' + item_id, 'Viu', item_id))
+
+        return self.playlist_result(
+            entries, playlist_id, playlist_data.get('title'))
+
+
+class ViuOTTIE(InfoExtractor):
+    IE_NAME = 'viu:ott'
+    _VALID_URL = r'https?://(?:www\.)?viu\.com/ott/(?P<country_code>[a-z]{2})/[a-z]{2}-[a-z]{2}/vod/(?P<id>\d+)'
+    _TESTS = [{
+        'url': 'http://www.viu.com/ott/sg/en-us/vod/3421/The%20Prime%20Minister%20and%20I',
+        'info_dict': {
+            'id': '3421',
+            'ext': 'mp4',
+            'title': 'A New Beginning',
+            'description': 'md5:1e7486a619b6399b25ba6a41c0fe5b2c',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to Singapore',
+    }, {
+        'url': 'http://www.viu.com/ott/hk/zh-hk/vod/7123/%E5%A4%A7%E4%BA%BA%E5%A5%B3%E5%AD%90',
+        'info_dict': {
+            'id': '7123',
+            'ext': 'mp4',
+            'title': '這就是我的生活之道',
+            'description': 'md5:4eb0d8b08cf04fcdc6bbbeb16043434f',
+        },
+        'params': {
+            'skip_download': 'm3u8 download',
+        },
+        'skip': 'Geo-restricted to Hong Kong',
+    }]
+
+    def _real_extract(self, url):
+        country_code, video_id = re.match(self._VALID_URL, url).groups()
+
+        product_data = self._download_json(
+            'http://www.viu.com/ott/%s/index.php' % country_code, video_id,
+            'Downloading video info', query={
+                'r': 'vod/ajax-detail',
+                'platform_flag_label': 'web',
+                'product_id': video_id,
+            })['data']
+
+        video_data = product_data.get('current_product')
+        if not video_data:
+            raise ExtractorError('This video is not available in your region.', expected=True)
+
+        stream_data = self._download_json(
+            'https://d1k2us671qcoau.cloudfront.net/distribute_web_%s.php' % country_code,
+            video_id, 'Downloading stream info', query={
+                'ccs_product_id': video_data['ccs_product_id'],
+            })['data']['stream']
+
+        stream_sizes = stream_data.get('size', {})
+        formats = []
+        for vid_format, stream_url in stream_data.get('url', {}).items():
+            height = int_or_none(self._search_regex(
+                r's(\d+)p', vid_format, 'height', default=None))
+            formats.append({
+                'format_id': vid_format,
+                'url': stream_url,
+                'height': height,
+                'ext': 'mp4',
+                'filesize': int_or_none(stream_sizes.get(vid_format))
+            })
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for sub in video_data.get('subtitle', []):
+            sub_url = sub.get('url')
+            if not sub_url:
+                continue
+            subtitles.setdefault(sub.get('name'), []).append({
+                'url': sub_url,
+                'ext': 'srt',
+            })
+
+        title = video_data['synopsis'].strip()
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': video_data.get('description'),
+            'series': product_data.get('series', {}).get('name'),
+            'episode': title,
+            'episode_number': int_or_none(video_data.get('number')),
+            'duration': int_or_none(stream_data.get('duration')),
+            'thumbnail': video_data.get('cover_image_url'),
+            'formats': formats,
+            'subtitles': subtitles,
+        }
diff --git a/youtube_dl/extractor/vk.py b/youtube_dl/extractor/vk.py
index 1990e7093acabb2dce11faebfddd220e8d88392b..6e6c3a0e16361e3ce3120e3200efbcd231c27b86 100644
--- a/youtube_dl/extractor/vk.py
+++ b/youtube_dl/extractor/vk.py
@@ -245,7 +245,7 @@ class VKIE(VKBaseIE):
              },
          },
          {
-            # finished live stream, live_mp4
+            # finished live stream, postlive_mp4
              'url': 'https://vk.com/videos-387766?z=video-387766_456242764%2Fpl_-387766_-2',
              'md5': '90d22d051fccbbe9becfccc615be6791',
              'info_dict': {
@@ -258,7 +258,7 @@ class VKIE(VKBaseIE):
              },
          },
          {
-            # live stream, hls and rtmp links,most likely already finished live
+            # live stream, hls and rtmp links, most likely already finished live
              # stream by the time you are reading this comment
              'url': 'https://vk.com/video-140332_456239111',
              'only_matching': True,
@@ -378,12 +378,24 @@ def _real_extract(self, url):
          if not data:
              data = self._parse_json(
                  self._search_regex(
-                    r'<!json>\s*({.+?})\s*<!>', info_page, 'json'),
-                video_id)['player']['params'][0]
+                    r'<!json>\s*({.+?})\s*<!>', info_page, 'json', default='{}'),
+                video_id)
+            if data:
+                data = data['player']['params'][0]
+
+        if not data:
+            data = self._parse_json(
+                self._search_regex(
+                    r'var\s+playerParams\s*=\s*({.+?})\s*;\s*\n', info_page,
+                    'player params'),
+                video_id)['params'][0]
  
          title = unescapeHTML(data['md_title'])
  
-        if data.get('live') == 2:
+        # 2 = live
+        # 3 = post live (finished live)
+        is_live = data.get('live') == 2
+        if is_live:
              title = self._live_title(title)
  
          timestamp = unified_timestamp(self._html_search_regex(
@@ -398,7 +410,8 @@ def _real_extract(self, url):
          for format_id, format_url in data.items():
              if not isinstance(format_url, compat_str) or not format_url.startswith(('http', '//', 'rtmp')):
                  continue
-            if format_id.startswith(('url', 'cache')) or format_id in ('extra_data', 'live_mp4'):
+            if (format_id.startswith(('url', 'cache')) or
+                    format_id in ('extra_data', 'live_mp4', 'postlive_mp4')):
                  height = int_or_none(self._search_regex(
                      r'^(?:url|cache)(\d+)', format_id, 'height', default=None))
                  formats.append({
@@ -408,8 +421,9 @@ def _real_extract(self, url):
                  })
              elif format_id == 'hls':
                  formats.extend(self._extract_m3u8_formats(
-                    format_url, video_id, 'mp4', m3u8_id=format_id,
-                    fatal=False, live=True))
+                    format_url, video_id, 'mp4',
+                    entry_protocol='m3u8' if is_live else 'm3u8_native',
+                    m3u8_id=format_id, fatal=False, live=is_live))
              elif format_id == 'rtmp':
                  formats.append({
                      'format_id': format_id,
@@ -427,6 +441,7 @@ def _real_extract(self, url):
              'duration': data.get('duration'),
              'timestamp': timestamp,
              'view_count': view_count,
+            'is_live': is_live,
          }
  
  
diff --git a/youtube_dl/extractor/vlive.py b/youtube_dl/extractor/vlive.py
index 8d671cca767d4592a5428f7d3ad855e952df5353..b9718901b8339e3d5fee85a15d20514b038b66aa 100644
--- a/youtube_dl/extractor/vlive.py
+++ b/youtube_dl/extractor/vlive.py
@@ -2,22 +2,29 @@
  from __future__ import unicode_literals
  
  import re
+import time
+import itertools
  
  from .common import InfoExtractor
+from ..compat import (
+    compat_urllib_parse_urlencode,
+    compat_str,
+)
  from ..utils import (
      dict_get,
      ExtractorError,
      float_or_none,
      int_or_none,
      remove_start,
+    try_get,
+    urlencode_postdata,
  )
-from ..compat import compat_urllib_parse_urlencode
  
  
  class VLiveIE(InfoExtractor):
      IE_NAME = 'vlive'
      _VALID_URL = r'https?://(?:(?:www|m)\.)?vlive\.tv/video/(?P<id>[0-9]+)'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.vlive.tv/video/1326',
          'md5': 'cc7314812855ce56de70a06a27314983',
          'info_dict': {
@@ -27,7 +34,20 @@ class VLiveIE(InfoExtractor):
              'creator': "Girl's Day",
              'view_count': int,
          },
-    }
+    }, {
+        'url': 'http://www.vlive.tv/video/16937',
+        'info_dict': {
+            'id': '16937',
+            'ext': 'mp4',
+            'title': '[V LIVE] 첸백시 걍방',
+            'creator': 'EXO',
+            'view_count': int,
+            'subtitles': 'mincount:12',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
@@ -35,17 +55,23 @@ def _real_extract(self, url):
          webpage = self._download_webpage(
              'http://www.vlive.tv/video/%s' % video_id, video_id)
  
-        video_params = self._search_regex(
-            r'\bvlive\.video\.init\(([^)]+)\)',
-            webpage, 'video params')
-        status, _, _, live_params, long_video_id, key = re.split(
-            r'"\s*,\s*"', video_params)[2:8]
+        VIDEO_PARAMS_RE = r'\bvlive\.video\.init\(([^)]+)'
+        VIDEO_PARAMS_FIELD = 'video params'
+
+        params = self._parse_json(self._search_regex(
+            VIDEO_PARAMS_RE, webpage, VIDEO_PARAMS_FIELD, default=''), video_id,
+            transform_source=lambda s: '[' + s + ']', fatal=False)
+
+        if not params or len(params) < 7:
+            params = self._search_regex(
+                VIDEO_PARAMS_RE, webpage, VIDEO_PARAMS_FIELD)
+            params = [p.strip(r'"') for p in re.split(r'\s*,\s*', params)]
+
+        status, long_video_id, key = params[2], params[5], params[6]
          status = remove_start(status, 'PRODUCT_')
  
          if status == 'LIVE_ON_AIR' or status == 'BIG_EVENT_ON_AIR':
-            live_params = self._parse_json('"%s"' % live_params, video_id)
-            live_params = self._parse_json(live_params, video_id)
-            return self._live(video_id, webpage, live_params)
+            return self._live(video_id, webpage)
          elif status == 'VOD_ON_AIR' or status == 'BIG_EVENT_INTRO':
              if long_video_id and key:
                  return self._replay(video_id, webpage, long_video_id, key)
@@ -76,7 +102,22 @@ def _get_common_fields(self, webpage):
              'thumbnail': thumbnail,
          }
  
-    def _live(self, video_id, webpage, live_params):
+    def _live(self, video_id, webpage):
+        init_page = self._download_webpage(
+            'http://www.vlive.tv/video/init/view',
+            video_id, note='Downloading live webpage',
+            data=urlencode_postdata({'videoSeq': video_id}),
+            headers={
+                'Referer': 'http://www.vlive.tv/video/%s' % video_id,
+                'Content-Type': 'application/x-www-form-urlencoded'
+            })
+
+        live_params = self._search_regex(
+            r'"liveStreamInfo"\s*:\s*(".*"),',
+            init_page, 'live stream info')
+        live_params = self._parse_json(live_params, video_id)
+        live_params = self._parse_json(live_params, video_id)
+
          formats = []
          for vid in live_params.get('resolutions', []):
              formats.extend(self._extract_m3u8_formats(
@@ -85,10 +126,14 @@ def _live(self, video_id, webpage, live_params):
                  fatal=False, live=True))
          self._sort_formats(formats)
  
-        return dict(self._get_common_fields(webpage),
-                    id=video_id,
-                    formats=formats,
-                    is_live=True)
+        info = self._get_common_fields(webpage)
+        info.update({
+            'title': self._live_title(info['title']),
+            'id': video_id,
+            'formats': formats,
+            'is_live': True,
+        })
+        return info
  
      def _replay(self, video_id, webpage, long_video_id, key):
          playinfo = self._download_json(
@@ -116,14 +161,103 @@ def _replay(self, video_id, webpage, long_video_id, key):
  
          subtitles = {}
          for caption in playinfo.get('captions', {}).get('list', []):
-            lang = dict_get(caption, ('language', 'locale', 'country', 'label'))
+            lang = dict_get(caption, ('locale', 'language', 'country', 'label'))
              if lang and caption.get('source'):
                  subtitles[lang] = [{
                      'ext': 'vtt',
                      'url': caption['source']}]
  
-        return dict(self._get_common_fields(webpage),
-                    id=video_id,
-                    formats=formats,
-                    view_count=view_count,
-                    subtitles=subtitles)
+        info = self._get_common_fields(webpage)
+        info.update({
+            'id': video_id,
+            'formats': formats,
+            'view_count': view_count,
+            'subtitles': subtitles,
+        })
+        return info
+
+
+class VLiveChannelIE(InfoExtractor):
+    IE_NAME = 'vlive:channel'
+    _VALID_URL = r'https?://channels\.vlive\.tv/(?P<id>[0-9A-Z]+)'
+    _TEST = {
+        'url': 'http://channels.vlive.tv/FCD4B',
+        'info_dict': {
+            'id': 'FCD4B',
+            'title': 'MAMAMOO',
+        },
+        'playlist_mincount': 110
+    }
+    _APP_ID = '8c6cc7b45d2568fb668be6e05b6e5a3b'
+
+    def _real_extract(self, url):
+        channel_code = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://channels.vlive.tv/%s/video' % channel_code, channel_code)
+
+        app_id = None
+
+        app_js_url = self._search_regex(
+            r'<script[^>]+src=(["\'])(?P<url>http.+?/app\.js.*?)\1',
+            webpage, 'app js', default=None, group='url')
+
+        if app_js_url:
+            app_js = self._download_webpage(
+                app_js_url, channel_code, 'Downloading app JS', fatal=False)
+            if app_js:
+                app_id = self._search_regex(
+                    r'Global\.VFAN_APP_ID\s*=\s*[\'"]([^\'"]+)[\'"]',
+                    app_js, 'app id', default=None)
+
+        app_id = app_id or self._APP_ID
+
+        channel_info = self._download_json(
+            'http://api.vfan.vlive.tv/vproxy/channelplus/decodeChannelCode',
+            channel_code, note='Downloading decode channel code',
+            query={
+                'app_id': app_id,
+                'channelCode': channel_code,
+                '_': int(time.time())
+            })
+
+        channel_seq = channel_info['result']['channelSeq']
+        channel_name = None
+        entries = []
+
+        for page_num in itertools.count(1):
+            video_list = self._download_json(
+                'http://api.vfan.vlive.tv/vproxy/channelplus/getChannelVideoList',
+                channel_code, note='Downloading channel list page #%d' % page_num,
+                query={
+                    'app_id': app_id,
+                    'channelSeq': channel_seq,
+                    'maxNumOfRows': 1000,
+                    '_': int(time.time()),
+                    'pageNo': page_num
+                }
+            )
+
+            if not channel_name:
+                channel_name = try_get(
+                    video_list,
+                    lambda x: x['result']['channelInfo']['channelName'],
+                    compat_str)
+
+            videos = try_get(
+                video_list, lambda x: x['result']['videoList'], list)
+            if not videos:
+                break
+
+            for video in videos:
+                video_id = video.get('videoSeq')
+                if not video_id:
+                    continue
+                video_id = compat_str(video_id)
+                entries.append(
+                    self.url_result(
+                        'http://www.vlive.tv/video/%s' % video_id,
+                        ie=VLiveIE.ie_key(), video_id=video_id))
+
+        return self.playlist_result(
+            entries, channel_code, channel_name)
diff --git a/youtube_dl/extractor/vodlocker.py b/youtube_dl/extractor/vodlocker.py
index bbfa6e5f26f6043af52ae168b6e2cebb7463edfc..02c9617d297926b75b62cfdd5a4b33f085a9741a 100644
--- a/youtube_dl/extractor/vodlocker.py
+++ b/youtube_dl/extractor/vodlocker.py
@@ -20,7 +20,7 @@ class VodlockerIE(InfoExtractor):
              'id': 'e8wvyzz4sl42',
              'ext': 'mp4',
              'title': 'Germany vs Brazil',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }]
  
diff --git a/youtube_dl/extractor/voicerepublic.py b/youtube_dl/extractor/voicerepublic.py
index 4f1a99a8989d736c1de572e6372b022544102f87..59e1359c48628af9b4c53bedc337fa6b9b3d1396 100644
--- a/youtube_dl/extractor/voicerepublic.py
+++ b/youtube_dl/extractor/voicerepublic.py
@@ -26,7 +26,7 @@ class VoiceRepublicIE(InfoExtractor):
              'ext': 'm4a',
              'title': 'Watching the Watchers: Building a Sousveillance State',
              'description': 'Secret surveillance programs have metadata too. The people and companies that operate secret surveillance programs can be surveilled.',
-            'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
+            'thumbnail': r're:^https?://.*\.(?:png|jpg)$',
              'duration': 1800,
              'view_count': int,
          }
diff --git a/youtube_dl/extractor/vporn.py b/youtube_dl/extractor/vporn.py
index 1557a0e0406ebfb75c2b5b4583c74f05c5dd2cc7..858ac9e71422548f688600450e3f7cc0a630c500 100644
--- a/youtube_dl/extractor/vporn.py
+++ b/youtube_dl/extractor/vporn.py
@@ -7,6 +7,7 @@
      ExtractorError,
      parse_duration,
      str_to_int,
+    urljoin,
  )
  
  
@@ -22,7 +23,7 @@ class VpornIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Violet on her 19th birthday',
                  'description': 'Violet dances in front of the camera which is sure to get you horny.',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'uploader': 'kileyGrope',
                  'categories': ['Masturbation', 'Teen'],
                  'duration': 393,
@@ -40,7 +41,7 @@ class VpornIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Hana Shower',
                  'description': 'Hana showers at the bathroom.',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'uploader': 'Hmmmmm',
                  'categories': ['Big Boobs', 'Erotic', 'Teen', 'Female', '720p'],
                  'duration': 588,
@@ -66,10 +67,9 @@ def _real_extract(self, url):
          description = self._html_search_regex(
              r'class="(?:descr|description_txt)">(.*?)</div>',
              webpage, 'description', fatal=False)
-        thumbnail = self._html_search_regex(
-            r'flashvars\.imageUrl\s*=\s*"([^"]+)"', webpage, 'description', fatal=False, default=None)
-        if thumbnail:
-            thumbnail = 'http://www.vporn.com' + thumbnail
+        thumbnail = urljoin('http://www.vporn.com', self._html_search_regex(
+            r'flashvars\.imageUrl\s*=\s*"([^"]+)"', webpage, 'thumbnail',
+            default=None))
  
          uploader = self._html_search_regex(
              r'(?s)Uploaded by:.*?<a href="/user/[^"]+"[^>]*>(.+?)</a>',
diff --git a/youtube_dl/extractor/vube.py b/youtube_dl/extractor/vube.py
index 10ca6acb12469f85267405f9431b9508c0537e57..8ce3a6b81b6ed69e5fd4960ab4721ecd45bf750e 100644
--- a/youtube_dl/extractor/vube.py
+++ b/youtube_dl/extractor/vube.py
@@ -26,7 +26,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Best Drummer Ever [HD]',
                  'description': 'md5:2d63c4b277b85c2277761c2cf7337d71',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
                  'uploader': 'William',
                  'timestamp': 1406876915,
                  'upload_date': '20140801',
@@ -45,7 +45,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Chiara Grispo - Price Tag by Jessie J',
                  'description': 'md5:8ea652a1f36818352428cb5134933313',
-                'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$',
+                'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102e7e63057-5ebc-4f5c-4065-6ce4ebde131f\.jpg$',
                  'uploader': 'Chiara.Grispo',
                  'timestamp': 1388743358,
                  'upload_date': '20140103',
@@ -65,7 +65,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'My 7 year old Sister and I singing "Alive" by Krewella',
                  'description': 'md5:40bcacb97796339f1690642c21d56f4a',
-                'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$',
+                'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/102265d5a9f-0f17-4f6b-5753-adf08484ee1e\.jpg$',
                  'uploader': 'Seraina',
                  'timestamp': 1396492438,
                  'upload_date': '20140403',
@@ -84,7 +84,7 @@ class VubeIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'Frozen - Let It Go Cover by Siren Gene',
                  'description': 'My rendition of "Let It Go" originally sung by Idina Menzel.',
-                'thumbnail': 're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$',
+                'thumbnail': r're:^http://frame\.thestaticvube\.com/snap/[0-9x]+/10283ab622a-86c9-4681-51f2-30d1f65774af\.jpg$',
                  'uploader': 'Siren',
                  'timestamp': 1395448018,
                  'upload_date': '20140322',
diff --git a/youtube_dl/extractor/vvvvid.py b/youtube_dl/extractor/vvvvid.py
new file mode 100644
index 0000000..d44ec85
--- /dev/null
+++ b/youtube_dl/extractor/vvvvid.py
@@ -0,0 +1,140 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    ExtractorError,
+    int_or_none,
+    str_or_none,
+)
+
+
+class VVVVIDIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?vvvvid\.it/#!(?:show|anime|film|series)/(?P<show_id>\d+)/[^/]+/(?P<season_id>\d+)/(?P<id>[0-9]+)'
+    _TESTS = [{
+        # video_type == 'video/vvvvid'
+        'url': 'https://www.vvvvid.it/#!show/434/perche-dovrei-guardarlo-di-dario-moccia/437/489048/ping-pong',
+        'md5': 'b8d3cecc2e981adc3835adf07f6df91b',
+        'info_dict': {
+            'id': '489048',
+            'ext': 'mp4',
+            'title': 'Ping Pong',
+        },
+    }, {
+        # video_type == 'video/rcs'
+        'url': 'https://www.vvvvid.it/#!show/376/death-note-live-action/377/482493/episodio-01',
+        'md5': '33e0edfba720ad73a8782157fdebc648',
+        'info_dict': {
+            'id': '482493',
+            'ext': 'mp4',
+            'title': 'Episodio 01',
+        },
+    }]
+    _conn_id = None
+
+    def _real_initialize(self):
+        self._conn_id = self._download_json(
+            'https://www.vvvvid.it/user/login',
+            None, headers=self.geo_verification_headers())['data']['conn_id']
+
+    def _real_extract(self, url):
+        show_id, season_id, video_id = re.match(self._VALID_URL, url).groups()
+        response = self._download_json(
+            'https://www.vvvvid.it/vvvvid/ondemand/%s/season/%s' % (show_id, season_id),
+            video_id, headers=self.geo_verification_headers(), query={
+                'conn_id': self._conn_id,
+            })
+        if response['result'] == 'error':
+            raise ExtractorError('%s said: %s' % (
+                self.IE_NAME, response['message']), expected=True)
+
+        vid = int(video_id)
+        video_data = list(filter(
+            lambda episode: episode.get('video_id') == vid, response['data']))[0]
+        formats = []
+
+        # vvvvid embed_info decryption algorithm is reverse engineered from function $ds(h) at vvvvid.js
+        def ds(h):
+            g = "MNOPIJKL89+/4567UVWXQRSTEFGHABCDcdefYZabstuvopqr0123wxyzklmnghij"
+
+            def f(m):
+                l = []
+                o = 0
+                b = False
+                m_len = len(m)
+                while ((not b) and o < m_len):
+                    n = m[o] << 2
+                    o += 1
+                    k = -1
+                    j = -1
+                    if o < m_len:
+                        n += m[o] >> 4
+                        o += 1
+                        if o < m_len:
+                            k = (m[o - 1] << 4) & 255
+                            k += m[o] >> 2
+                            o += 1
+                            if o < m_len:
+                                j = (m[o - 1] << 6) & 255
+                                j += m[o]
+                                o += 1
+                            else:
+                                b = True
+                        else:
+                            b = True
+                    else:
+                        b = True
+                    l.append(n)
+                    if k != -1:
+                        l.append(k)
+                    if j != -1:
+                        l.append(j)
+                return l
+
+            c = []
+            for e in h:
+                c.append(g.index(e))
+
+            c_len = len(c)
+            for e in range(c_len * 2 - 1, -1, -1):
+                a = c[e % c_len] ^ c[(e + 1) % c_len]
+                c[e % c_len] = a
+
+            c = f(c)
+            d = ''
+            for e in c:
+                d += chr(e)
+
+            return d
+
+        for quality in ('_sd', ''):
+            embed_code = video_data.get('embed_info' + quality)
+            if not embed_code:
+                continue
+            embed_code = ds(embed_code)
+            video_type = video_data.get('video_type')
+            if video_type in ('video/rcs', 'video/kenc'):
+                formats.extend(self._extract_akamai_formats(
+                    embed_code, video_id))
+            else:
+                formats.extend(self._extract_wowza_formats(
+                    'http://sb.top-ix.org/videomg/_definst_/mp4:%s/playlist.m3u8' % embed_code, video_id))
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_data['title'],
+            'formats': formats,
+            'thumbnail': video_data.get('thumbnail'),
+            'duration': int_or_none(video_data.get('length')),
+            'series': video_data.get('show_title'),
+            'season_id': season_id,
+            'season_number': video_data.get('season_number'),
+            'episode_id': str_or_none(video_data.get('id')),
+            'episode_number': int_or_none(video_data.get('number')),
+            'episode_title': video_data['title'],
+            'view_count': int_or_none(video_data.get('views')),
+            'like_count': int_or_none(video_data.get('video_likes')),
+        }
diff --git a/youtube_dl/extractor/walla.py b/youtube_dl/extractor/walla.py
index 8b9488340368ea0292fa2614336778099c9eb11e..cbb54867244839e0447324f5a57e07cef2f6c646 100644
--- a/youtube_dl/extractor/walla.py
+++ b/youtube_dl/extractor/walla.py
@@ -20,7 +20,7 @@ class WallaIE(InfoExtractor):
              'ext': 'flv',
              'title': 'וואן דיירקשן: ההיסטריה',
              'description': 'md5:de9e2512a92442574cdb0913c49bc4d8',
-            'thumbnail': 're:^https?://.*\.jpg',
+            'thumbnail': r're:^https?://.*\.jpg',
              'duration': 3600,
          },
          'params': {
diff --git a/youtube_dl/extractor/watchindianporn.py b/youtube_dl/extractor/watchindianporn.py
index 5d3b5bdb4cb904acabea0864dd76be8a0cc62c30..ed099beea632b7eed16e93a7b386b6aec3089778 100644
--- a/youtube_dl/extractor/watchindianporn.py
+++ b/youtube_dl/extractor/watchindianporn.py
@@ -22,7 +22,7 @@ class WatchIndianPornIE(InfoExtractor):
              'display_id': 'hot-milf-from-kerala-shows-off-her-gorgeous-large-breasts-on-camera',
              'ext': 'mp4',
              'title': 'Hot milf from kerala shows off her gorgeous large breasts on camera',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'LoveJay',
              'upload_date': '20160428',
              'duration': 226,
diff --git a/youtube_dl/extractor/webcaster.py b/youtube_dl/extractor/webcaster.py
new file mode 100644
index 0000000..e4b65f5
--- /dev/null
+++ b/youtube_dl/extractor/webcaster.py
@@ -0,0 +1,102 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    determine_ext,
+    xpath_text,
+)
+
+
+class WebcasterIE(InfoExtractor):
+    _VALID_URL = r'https?://bl\.webcaster\.pro/(?:quote|media)/start/free_(?P<id>[^/]+)'
+    _TESTS = [{
+        # http://video.khl.ru/quotes/393859
+        'url': 'http://bl.webcaster.pro/quote/start/free_c8cefd240aa593681c8d068cff59f407_hd/q393859/eb173f99dd5f558674dae55f4ba6806d/1480289104?sr%3D105%26fa%3D1%26type_id%3D18',
+        'md5': '0c162f67443f30916ff1c89425dcd4cd',
+        'info_dict': {
+            'id': 'c8cefd240aa593681c8d068cff59f407_hd',
+            'ext': 'mp4',
+            'title': 'Сибирь - Нефтехимик. Лучшие моменты первого периода',
+            'thumbnail': r're:^https?://.*\.jpg$',
+        },
+    }, {
+        'url': 'http://bl.webcaster.pro/media/start/free_6246c7a4453ac4c42b4398f840d13100_hd/2_2991109016/e8d0d82587ef435480118f9f9c41db41/4635726126',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        video = self._download_xml(url, video_id)
+
+        title = xpath_text(video, './/event_name', 'event name', fatal=True)
+
+        def make_id(parts, separator):
+            return separator.join(filter(None, parts))
+
+        formats = []
+        for format_id in (None, 'noise'):
+            track_tag = make_id(('track', format_id), '_')
+            for track in video.findall('.//iphone/%s' % track_tag):
+                track_url = track.text
+                if not track_url:
+                    continue
+                if determine_ext(track_url) == 'm3u8':
+                    m3u8_formats = self._extract_m3u8_formats(
+                        track_url, video_id, 'mp4',
+                        entry_protocol='m3u8_native',
+                        m3u8_id=make_id(('hls', format_id), '-'), fatal=False)
+                    for f in m3u8_formats:
+                        f.update({
+                            'source_preference': 0 if format_id == 'noise' else 1,
+                            'format_note': track.get('title'),
+                        })
+                    formats.extend(m3u8_formats)
+        self._sort_formats(formats)
+
+        thumbnail = xpath_text(video, './/image', 'thumbnail')
+
+        return {
+            'id': video_id,
+            'title': title,
+            'thumbnail': thumbnail,
+            'formats': formats,
+        }
+
+
+class WebcasterFeedIE(InfoExtractor):
+    _VALID_URL = r'https?://bl\.webcaster\.pro/feed/start/free_(?P<id>[^/]+)'
+    _TEST = {
+        'url': 'http://bl.webcaster.pro/feed/start/free_c8cefd240aa593681c8d068cff59f407_hd/q393859/eb173f99dd5f558674dae55f4ba6806d/1480289104',
+        'only_matching': True,
+    }
+
+    @staticmethod
+    def _extract_url(ie, webpage):
+        mobj = re.search(
+            r'<(?:object|a[^>]+class=["\']webcaster-player["\'])[^>]+data(?:-config)?=(["\']).*?config=(?P<url>https?://bl\.webcaster\.pro/feed/start/free_.*?)(?:[?&]|\1)',
+            webpage)
+        if mobj:
+            return mobj.group('url')
+        for secure in (True, False):
+            video_url = ie._og_search_video_url(
+                webpage, secure=secure, default=None)
+            if video_url:
+                mobj = re.search(
+                    r'config=(?P<url>https?://bl\.webcaster\.pro/feed/start/free_[^?&=]+)',
+                    video_url)
+                if mobj:
+                    return mobj.group('url')
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        feed = self._download_xml(url, video_id)
+
+        video_url = xpath_text(
+            feed, ('video_hd', 'video'), 'video url', fatal=True)
+
+        return self.url_result(video_url, WebcasterIE.ie_key())
diff --git a/youtube_dl/extractor/webofstories.py b/youtube_dl/extractor/webofstories.py
index 7aea47ed52f7f64032034ab43d51dbe524bff2b3..1eb1f67024acfca7948d7450f2bca849069190cc 100644
--- a/youtube_dl/extractor/webofstories.py
+++ b/youtube_dl/extractor/webofstories.py
@@ -19,7 +19,7 @@ class WebOfStoriesIE(InfoExtractor):
              'id': '4536',
              'ext': 'mp4',
              'title': 'The temperature of the sun',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'Hans Bethe talks about calculating the temperature of the sun',
              'duration': 238,
          }
@@ -30,7 +30,7 @@ class WebOfStoriesIE(InfoExtractor):
              'id': '55908',
              'ext': 'mp4',
              'title': 'The story of Gemmata obscuriglobus',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'Planctomycete talks about The story of Gemmata obscuriglobus',
              'duration': 169,
          },
@@ -42,7 +42,7 @@ class WebOfStoriesIE(InfoExtractor):
              'id': '54215',
              'ext': 'mp4',
              'title': '"A Leg to Stand On"',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'description': 'Oliver Sacks talks about the death and resurrection of a limb',
              'duration': 97,
          },
@@ -134,7 +134,7 @@ def _real_extract(self, url):
  
          entries = [
              self.url_result('http://www.webofstories.com/play/%s' % video_number, 'WebOfStories')
-            for video_number in set(re.findall('href="/playAll/%s\?sId=(\d+)"' % playlist_id, webpage))
+            for video_number in set(re.findall(r'href="/playAll/%s\?sId=(\d+)"' % playlist_id, webpage))
          ]
  
          title = self._search_regex(
diff --git a/youtube_dl/extractor/weiqitv.py b/youtube_dl/extractor/weiqitv.py
index 8e09156c26c58b4cc184dbe97e679ee9b8dfa47f..7e0befd3922b15194fbcf5e142419c1c62ae46c3 100644
--- a/youtube_dl/extractor/weiqitv.py
+++ b/youtube_dl/extractor/weiqitv.py
@@ -37,11 +37,11 @@ def _real_extract(self, url):
          page = self._download_webpage(url, media_id)
  
          info_json_str = self._search_regex(
-            'var\s+video\s*=\s*(.+});', page, 'info json str')
+            r'var\s+video\s*=\s*(.+});', page, 'info json str')
          info_json = self._parse_json(info_json_str, media_id)
  
          letvcloud_url = self._search_regex(
-            'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url')
+            r'var\s+letvurl\s*=\s*"([^"]+)', page, 'letvcloud url')
  
          return {
              '_type': 'url_transparent',
diff --git a/youtube_dl/extractor/xbef.py b/youtube_dl/extractor/xbef.py
index e4a2baad22534d772a90b8ec5832c11833f10281..4c41e98b27ff1ede17b9d0d803f18f5de3ebafe3 100644
--- a/youtube_dl/extractor/xbef.py
+++ b/youtube_dl/extractor/xbef.py
@@ -14,7 +14,7 @@ class XBefIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'md5:7358a9faef8b7b57acda7c04816f170e',
              'age_limit': 18,
-            'thumbnail': 're:^http://.*\.jpg',
+            'thumbnail': r're:^http://.*\.jpg',
          }
      }
  
diff --git a/youtube_dl/extractor/xfileshare.py b/youtube_dl/extractor/xfileshare.py
index de344bad25309c03b1d7378ceb6b3968c2d4c47a..e616adce3ab3333291a316d19c224c846006feea 100644
--- a/youtube_dl/extractor/xfileshare.py
+++ b/youtube_dl/extractor/xfileshare.py
@@ -44,7 +44,7 @@ class XFileShareIE(InfoExtractor):
              'id': '06y9juieqpmi',
              'ext': 'mp4',
              'title': 'Rebecca Black My Moment Official Music Video Reaction-6GK87Rc8bzQ',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
      }, {
          'url': 'http://gorillavid.in/embed-z08zf8le23c6-960x480.html',
@@ -56,7 +56,7 @@ class XFileShareIE(InfoExtractor):
              'id': '3rso4kdn6f9m',
              'ext': 'mp4',
              'title': 'Micro Pig piglets ready on 16th July 2009-bG0PdrCdxUc',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          }
      }, {
          'url': 'http://movpod.in/0wguyyxi1yca',
@@ -67,7 +67,7 @@ class XFileShareIE(InfoExtractor):
              'id': '3ivfabn7573c',
              'ext': 'mp4',
              'title': 'youtube-dl test video \'äBaW_jenozKc.mp4.mp4',
-            'thumbnail': 're:http://.*\.jpg',
+            'thumbnail': r're:http://.*\.jpg',
          },
          'skip': 'Video removed',
      }, {
diff --git a/youtube_dl/extractor/xhamster.py b/youtube_dl/extractor/xhamster.py
index bd8e1af2e0f6c25fc44aea36c23b813b092b4438..36a8c98407bcaf8466660279f2e444df48083a3a 100644
--- a/youtube_dl/extractor/xhamster.py
+++ b/youtube_dl/extractor/xhamster.py
@@ -5,8 +5,8 @@
  from .common import InfoExtractor
  from ..utils import (
      dict_get,
-    float_or_none,
      int_or_none,
+    parse_duration,
      unified_strdate,
  )
  
@@ -22,7 +22,7 @@ class XHamsterIE(InfoExtractor):
              'title': 'FemaleAgent Shy beauty takes the bait',
              'upload_date': '20121014',
              'uploader': 'Ruseful2011',
-            'duration': 893.52,
+            'duration': 893,
              'age_limit': 18,
          },
      }, {
@@ -33,7 +33,7 @@ class XHamsterIE(InfoExtractor):
              'title': 'Britney Spears  Sexy Booty',
              'upload_date': '20130914',
              'uploader': 'jojo747400',
-            'duration': 200.48,
+            'duration': 200,
              'age_limit': 18,
          },
          'params': {
@@ -48,7 +48,7 @@ class XHamsterIE(InfoExtractor):
              'title': '....',
              'upload_date': '20160208',
              'uploader': 'parejafree',
-            'duration': 72.0,
+            'duration': 72,
              'age_limit': 18,
          },
          'params': {
@@ -101,9 +101,9 @@ def is_hd(webpage):
               r'''<video[^>]+poster=(?P<q>["'])(?P<thumbnail>.+?)(?P=q)[^>]*>'''],
              webpage, 'thumbnail', fatal=False, group='thumbnail')
  
-        duration = float_or_none(self._search_regex(
-            r'(["\'])duration\1\s*:\s*(["\'])(?P<duration>.+?)\2',
-            webpage, 'duration', fatal=False, group='duration'))
+        duration = parse_duration(self._search_regex(
+            r'Runtime:\s*</span>\s*([\d:]+)', webpage,
+            'duration', fatal=False))
  
          view_count = int_or_none(self._search_regex(
              r'content=["\']User(?:View|Play)s:(\d+)',
diff --git a/youtube_dl/extractor/xiami.py b/youtube_dl/extractor/xiami.py
index 86abef25704bbc9e8a1494ecc2146b5c5bfabe32..d017e03de2092c8726bdad7e86b364b57e44e136 100644
--- a/youtube_dl/extractor/xiami.py
+++ b/youtube_dl/extractor/xiami.py
@@ -16,7 +16,9 @@ def _download_webpage(self, *args, **kwargs):
          return webpage
  
      def _extract_track(self, track, track_id=None):
-        title = track['title']
+        track_name = track.get('songName') or track.get('name') or track['subName']
+        artist = track.get('artist') or track.get('artist_name') or track.get('singers')
+        title = '%s - %s' % (artist, track_name) if artist else track_name
          track_url = self._decrypt(track['location'])
  
          subtitles = {}
@@ -31,9 +33,10 @@ def _extract_track(self, track, track_id=None):
              'thumbnail': track.get('pic') or track.get('album_pic'),
              'duration': int_or_none(track.get('length')),
              'creator': track.get('artist', '').split(';')[0],
-            'track': title,
-            'album': track.get('album_name'),
-            'artist': track.get('artist'),
+            'track': track_name,
+            'track_number': int_or_none(track.get('track')),
+            'album': track.get('album_name') or track.get('title'),
+            'artist': artist,
              'subtitles': subtitles,
          }
  
@@ -68,14 +71,14 @@ def _decrypt(origin):
  class XiamiSongIE(XiamiBaseIE):
      IE_NAME = 'xiami:song'
      IE_DESC = '虾米音乐'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/song/(?P<id>[^/?#&]+)'
      _TESTS = [{
          'url': 'http://www.xiami.com/song/1775610518',
          'md5': '521dd6bea40fd5c9c69f913c232cb57e',
          'info_dict': {
              'id': '1775610518',
              'ext': 'mp3',
-            'title': 'Woman',
+            'title': 'HONNE - Woman',
              'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
              'duration': 265,
              'creator': 'HONNE',
@@ -95,7 +98,7 @@ class XiamiSongIE(XiamiBaseIE):
          'info_dict': {
              'id': '1775256504',
              'ext': 'mp3',
-            'title': '悟空',
+            'title': '戴荃 - 悟空',
              'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
              'duration': 200,
              'creator': '戴荃',
@@ -109,6 +112,26 @@ class XiamiSongIE(XiamiBaseIE):
              },
          },
          'skip': 'Georestricted',
+    }, {
+        'url': 'http://www.xiami.com/song/1775953850',
+        'info_dict': {
+            'id': '1775953850',
+            'ext': 'mp3',
+            'title': 'До Скону - Чума Пожирает Землю',
+            'thumbnail': r're:http://img\.xiami\.net/images/album/.*\.jpg',
+            'duration': 683,
+            'creator': 'До Скону',
+            'track': 'Чума Пожирает Землю',
+            'track_number': 7,
+            'album': 'Ад',
+            'artist': 'До Скону',
+        },
+        'params': {
+            'skip_download': True,
+        },
+    }, {
+        'url': 'http://www.xiami.com/song/xLHGwgd07a1',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -124,7 +147,7 @@ def _real_extract(self, url):
  class XiamiAlbumIE(XiamiPlaylistBaseIE):
      IE_NAME = 'xiami:album'
      IE_DESC = '虾米音乐 - 专辑'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/album/(?P<id>[^/?#&]+)'
      _TYPE = '1'
      _TESTS = [{
          'url': 'http://www.xiami.com/album/2100300444',
@@ -136,28 +159,34 @@ class XiamiAlbumIE(XiamiPlaylistBaseIE):
      }, {
          'url': 'http://www.xiami.com/album/512288?spm=a1z1s.6843761.1110925389.6.hhE9p9',
          'only_matching': True,
+    }, {
+        'url': 'http://www.xiami.com/album/URVDji2a506',
+        'only_matching': True,
      }]
  
  
  class XiamiArtistIE(XiamiPlaylistBaseIE):
      IE_NAME = 'xiami:artist'
      IE_DESC = '虾米音乐 - 歌手'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/artist/(?P<id>[^/?#&]+)'
      _TYPE = '2'
-    _TEST = {
+    _TESTS = [{
          'url': 'http://www.xiami.com/artist/2132?spm=0.0.0.0.dKaScp',
          'info_dict': {
              'id': '2132',
          },
          'playlist_count': 20,
          'skip': 'Georestricted',
-    }
+    }, {
+        'url': 'http://www.xiami.com/artist/bC5Tk2K6eb99',
+        'only_matching': True,
+    }]
  
  
  class XiamiCollectionIE(XiamiPlaylistBaseIE):
      IE_NAME = 'xiami:collection'
      IE_DESC = '虾米音乐 - 精选集'
-    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[0-9]+)'
+    _VALID_URL = r'https?://(?:www\.)?xiami\.com/collect/(?P<id>[^/?#&]+)'
      _TYPE = '3'
      _TEST = {
          'url': 'http://www.xiami.com/collect/156527391?spm=a1z1s.2943601.6856193.12.4jpBnr',
diff --git a/youtube_dl/extractor/xuite.py b/youtube_dl/extractor/xuite.py
index 4b9c1ee9c5222f48c5634184f703baa062cf3ae9..e0818201a2b9ff122904fcbbf7a775a770c5dc5c 100644
--- a/youtube_dl/extractor/xuite.py
+++ b/youtube_dl/extractor/xuite.py
@@ -24,7 +24,7 @@ class XuiteIE(InfoExtractor):
              'id': '3860914',
              'ext': 'mp3',
              'title': '孤單南半球-歐德陽',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 247.246,
              'timestamp': 1314932940,
              'upload_date': '20110902',
@@ -40,7 +40,7 @@ class XuiteIE(InfoExtractor):
              'id': '25925099',
              'ext': 'mp4',
              'title': 'BigBuckBunny_320x180',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 596.458,
              'timestamp': 1454242500,
              'upload_date': '20160131',
@@ -58,7 +58,7 @@ class XuiteIE(InfoExtractor):
              'ext': 'mp4',
              'title': '暗殺教室 02',
              'description': '字幕:【極影字幕社】',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'duration': 1384.907,
              'timestamp': 1421481240,
              'upload_date': '20150117',
diff --git a/youtube_dl/extractor/yesjapan.py b/youtube_dl/extractor/yesjapan.py
index 112a6c030138e6c7d0e58619d40f3af012e13362..681338c96a2c743362773e6ea036a5dc64824326 100644
--- a/youtube_dl/extractor/yesjapan.py
+++ b/youtube_dl/extractor/yesjapan.py
@@ -21,7 +21,7 @@ class YesJapanIE(InfoExtractor):
              'ext': 'mp4',
              'timestamp': 1416391590,
              'upload_date': '20141119',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          }
      }
  
diff --git a/youtube_dl/extractor/yinyuetai.py b/youtube_dl/extractor/yinyuetai.py
index 834d860af32871678f8fef5afd345353273ebc19..1fd8d35c637224a8609b23c87db3206a24987a63 100644
--- a/youtube_dl/extractor/yinyuetai.py
+++ b/youtube_dl/extractor/yinyuetai.py
@@ -18,7 +18,7 @@ class YinYueTaiIE(InfoExtractor):
              'title': '少女时代_PARTY_Music Video Teaser',
              'creator': '少女时代',
              'duration': 25,
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://v.yinyuetai.com/video/h5/2322376',
diff --git a/youtube_dl/extractor/ynet.py b/youtube_dl/extractor/ynet.py
index 0d943c3432a57570afc229ab4efce66b2118f763..c4ae4d88eb0f64dd24d63c7e882ab49552a962c2 100644
--- a/youtube_dl/extractor/ynet.py
+++ b/youtube_dl/extractor/ynet.py
@@ -17,7 +17,7 @@ class YnetIE(InfoExtractor):
                  'id': 'L-11659-99244',
                  'ext': 'flv',
                  'title': 'איש לא יודע מאיפה באנו',
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
              }
          }, {
              'url': 'http://hot.ynet.co.il/home/0,7340,L-8859-84418,00.html',
@@ -25,7 +25,7 @@ class YnetIE(InfoExtractor):
                  'id': 'L-8859-84418',
                  'ext': 'flv',
                  'title': "צפו: הנשיקה הלוהטת של תורגי' ויוליה פלוטקין",
-                'thumbnail': 're:^https?://.*\.jpg',
+                'thumbnail': r're:^https?://.*\.jpg',
              }
          }
      ]
diff --git a/youtube_dl/extractor/youporn.py b/youtube_dl/extractor/youporn.py
index 0265a64a7d3c014001b2d0e81789f0e904b32d62..34ab878a4167580b50ef5b3d5bee94e58b02fe41 100644
--- a/youtube_dl/extractor/youporn.py
+++ b/youtube_dl/extractor/youporn.py
@@ -24,7 +24,7 @@ class YouPornIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Sex Ed: Is It Safe To Masturbate Daily?',
              'description': 'Love & Sex Answers: http://bit.ly/DanAndJenn -- Is It Unhealthy To Masturbate Daily?',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Ask Dan And Jennifer',
              'upload_date': '20101221',
              'average_rating': int,
@@ -43,7 +43,7 @@ class YouPornIE(InfoExtractor):
              'ext': 'mp4',
              'title': 'Big Tits Awesome Brunette On amazing webcam show',
              'description': 'http://sweetlivegirls.com Big Tits Awesome Brunette On amazing webcam show.mp4',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
              'uploader': 'Unknown',
              'upload_date': '20111125',
              'average_rating': int,
diff --git a/youtube_dl/extractor/yourupload.py b/youtube_dl/extractor/yourupload.py
index 4e25d6f22312a0dca9f1997baa3bacd1c3fd263d..9fa77283899ceb005b6fed8f4ebcb626d38241ce 100644
--- a/youtube_dl/extractor/yourupload.py
+++ b/youtube_dl/extractor/yourupload.py
@@ -2,44 +2,37 @@
  from __future__ import unicode_literals
  
  from .common import InfoExtractor
+from ..utils import urljoin
  
  
  class YourUploadIE(InfoExtractor):
-    _VALID_URL = r'''(?x)https?://(?:www\.)?
-        (?:yourupload\.com/watch|
-           embed\.yourupload\.com|
-           embed\.yucache\.net
-        )/(?P<id>[A-Za-z0-9]+)
-        '''
-    _TESTS = [
-        {
-            'url': 'http://yourupload.com/watch/14i14h',
-            'md5': '5e2c63385454c557f97c4c4131a393cd',
-            'info_dict': {
-                'id': '14i14h',
-                'ext': 'mp4',
-                'title': 'BigBuckBunny_320x180.mp4',
-                'thumbnail': 're:^https?://.*\.jpe?g',
-            }
-        },
-        {
-            'url': 'http://embed.yourupload.com/14i14h',
-            'only_matching': True,
-        },
-        {
-            'url': 'http://embed.yucache.net/14i14h?client_file_id=803349',
-            'only_matching': True,
-        },
-    ]
+    _VALID_URL = r'https?://(?:www\.)?(?:yourupload\.com/(?:watch|embed)|embed\.yourupload\.com)/(?P<id>[A-Za-z0-9]+)'
+    _TESTS = [{
+        'url': 'http://yourupload.com/watch/14i14h',
+        'md5': '5e2c63385454c557f97c4c4131a393cd',
+        'info_dict': {
+            'id': '14i14h',
+            'ext': 'mp4',
+            'title': 'BigBuckBunny_320x180.mp4',
+            'thumbnail': r're:^https?://.*\.jpe?g',
+        }
+    }, {
+        'url': 'http://www.yourupload.com/embed/14i14h',
+        'only_matching': True,
+    }, {
+        'url': 'http://embed.yourupload.com/14i14h',
+        'only_matching': True,
+    }]
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
  
-        embed_url = 'http://embed.yucache.net/{0:}'.format(video_id)
+        embed_url = 'http://www.yourupload.com/embed/%s' % video_id
+
          webpage = self._download_webpage(embed_url, video_id)
  
          title = self._og_search_title(webpage)
-        video_url = self._og_search_video_url(webpage)
+        video_url = urljoin(embed_url, self._og_search_video_url(webpage))
          thumbnail = self._og_search_thumbnail(webpage, default=None)
  
          return {
diff --git a/youtube_dl/extractor/youtube.py b/youtube_dl/extractor/youtube.py
index 545246bcd74a8b94878d2b631d23029539e0438b..f2f75110445a6bad461fcd92d4ae3d26dd456d69 100644
--- a/youtube_dl/extractor/youtube.py
+++ b/youtube_dl/extractor/youtube.py
@@ -40,6 +40,7 @@
      sanitized_Request,
      smuggle_url,
      str_to_int,
+    try_get,
      unescapeHTML,
      unified_strdate,
      unsmuggle_url,
@@ -316,6 +317,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
          '137': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
          '138': {'ext': 'mp4', 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},  # Height can vary (https://github.com/rg3/youtube-dl/issues/4559)
          '160': {'ext': 'mp4', 'height': 144, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
+        '212': {'ext': 'mp4', 'height': 480, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
          '264': {'ext': 'mp4', 'height': 1440, 'format_note': 'DASH video', 'vcodec': 'h264', 'preference': -40},
          '298': {'ext': 'mp4', 'height': 720, 'format_note': 'DASH video', 'vcodec': 'h264', 'fps': 60, 'preference': -40},
          '299': {'ext': 'mp4', 'height': 1080, 'format_note': 'DASH video', 'vcodec': 'h264', 'fps': 60, 'preference': -40},
@@ -376,12 +378,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                  'uploader': 'Philipp Hagemeister',
                  'uploader_id': 'phihag',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
                  'upload_date': '20121002',
                  'license': 'Standard YouTube License',
                  'description': 'test chars:  "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                  'categories': ['Science & Technology'],
                  'tags': ['youtube-dl'],
+                'duration': 10,
                  'like_count': int,
                  'dislike_count': int,
                  'start_time': 1,
@@ -401,9 +404,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
                           'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
                           'iconic ep', 'iconic', 'love', 'it'],
+                'duration': 180,
                  'uploader': 'Icona Pop',
                  'uploader_id': 'IconaPop',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IconaPop',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IconaPop',
                  'license': 'Standard YouTube License',
                  'creator': 'Icona Pop',
              }
@@ -418,9 +422,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'Justin Timberlake - Tunnel Vision (Explicit)',
                  'alt_title': 'Tunnel Vision',
                  'description': 'md5:64249768eec3bc4276236606ea996373',
+                'duration': 419,
                  'uploader': 'justintimberlakeVEVO',
                  'uploader_id': 'justintimberlakeVEVO',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
                  'license': 'Standard YouTube License',
                  'creator': 'Justin Timberlake',
                  'age_limit': 18,
@@ -437,7 +442,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'description': 'md5:09b78bd971f1e3e289601dfba15ca4f7',
                  'uploader': 'SET India',
                  'uploader_id': 'setindia',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/setindia',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/setindia',
                  'license': 'Standard YouTube License',
                  'age_limit': 18,
              }
@@ -451,12 +456,13 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'youtube-dl test video "\'/\\ä↭𝕐',
                  'uploader': 'Philipp Hagemeister',
                  'uploader_id': 'phihag',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/phihag',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/phihag',
                  'upload_date': '20121002',
                  'license': 'Standard YouTube License',
                  'description': 'test chars:  "\'/\\ä↭𝕐\ntest URL: https://github.com/rg3/youtube-dl/issues/1892\n\nThis is a test video for youtube-dl.\n\nFor more information, contact phihag@phihag.de .',
                  'categories': ['Science & Technology'],
                  'tags': ['youtube-dl'],
+                'duration': 10,
                  'like_count': int,
                  'dislike_count': int,
              },
@@ -472,7 +478,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'm4a',
                  'upload_date': '20121002',
                  'uploader_id': '8KVIDEO',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/8KVIDEO',
                  'description': '',
                  'uploader': '8KVIDEO',
                  'license': 'Standard YouTube License',
@@ -492,6 +498,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'm4a',
                  'title': 'Afrojack, Spree Wilson - The Spark ft. Spree Wilson',
                  'description': 'md5:12e7067fa6735a77bdcbb58cb1187d2d',
+                'duration': 244,
                  'uploader': 'AfrojackVEVO',
                  'uploader_id': 'AfrojackVEVO',
                  'upload_date': '20131011',
@@ -511,6 +518,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': 'Taylor Swift - Shake It Off',
                  'alt_title': 'Shake It Off',
                  'description': 'md5:95f66187cd7c8b2c13eb78e1223b63c3',
+                'duration': 242,
                  'uploader': 'TaylorSwiftVEVO',
                  'uploader_id': 'TaylorSwiftVEVO',
                  'upload_date': '20140818',
@@ -528,10 +536,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'info_dict': {
                  'id': 'T4XJQO3qol8',
                  'ext': 'mp4',
+                'duration': 219,
                  'upload_date': '20100909',
                  'uploader': 'The Amazing Atheist',
                  'uploader_id': 'TheAmazingAtheist',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheAmazingAtheist',
                  'license': 'Standard YouTube License',
                  'title': 'Burning Everyone\'s Koran',
                  'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
@@ -544,10 +553,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'id': 'HtVdAasjOgU',
                  'ext': 'mp4',
                  'title': 'The Witcher 3: Wild Hunt - The Sword Of Destiny Trailer',
-                'description': 're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
+                'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
+                'duration': 142,
                  'uploader': 'The Witcher',
                  'uploader_id': 'WitcherGame',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
                  'upload_date': '20140605',
                  'license': 'Standard YouTube License',
                  'age_limit': 18,
@@ -561,9 +571,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Dedication To My Ex (Miss That) (Lyric Video)',
                  'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
+                'duration': 247,
                  'uploader': 'LloydVEVO',
                  'uploader_id': 'LloydVEVO',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
                  'upload_date': '20110629',
                  'license': 'Standard YouTube License',
                  'age_limit': 18,
@@ -575,9 +586,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'info_dict': {
                  'id': '__2ABJjxzNo',
                  'ext': 'mp4',
+                'duration': 266,
                  'upload_date': '20100430',
                  'uploader_id': 'deadmau5',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/deadmau5',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
                  'creator': 'deadmau5',
                  'description': 'md5:12c56784b8032162bb936a5f76d55360',
                  'uploader': 'deadmau5',
@@ -595,9 +607,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
              'info_dict': {
                  'id': 'lqQg6PlCWgI',
                  'ext': 'mp4',
+                'duration': 6085,
                  'upload_date': '20150827',
                  'uploader_id': 'olympic',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/olympic',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/olympic',
                  'license': 'Standard YouTube License',
                  'description': 'HO09  - Women -  GER-AUS - Hockey - 31 July 2012 - London 2012 Olympic Games',
                  'uploader': 'Olympic',
@@ -614,9 +627,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'id': '_b-2C3KPAM0',
                  'ext': 'mp4',
                  'stretched_ratio': 16 / 9.,
+                'duration': 85,
                  'upload_date': '20110310',
                  'uploader_id': 'AllenMeow',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/AllenMeow',
                  'description': 'made by Wacom from Korea | 字幕&加油添醋 by TY\'s Allen | 感謝heylisa00cavey1001同學熱情提供梗及翻譯',
                  'uploader': '孫艾倫',
                  'license': 'Standard YouTube License',
@@ -648,9 +662,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'md5:7b81415841e02ecd4313668cde88737a',
                  'description': 'md5:116377fd2963b81ec4ce64b542173306',
+                'duration': 220,
                  'upload_date': '20150625',
                  'uploader_id': 'dorappi2000',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/dorappi2000',
                  'uploader': 'dorappi2000',
                  'license': 'Standard YouTube License',
                  'formats': 'mincount:32',
@@ -690,10 +705,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (Main Camera)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7335,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }, {
@@ -702,10 +718,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (kreestuh)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7337,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }, {
@@ -714,10 +731,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (grizzle)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7337,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }, {
@@ -726,10 +744,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                      'ext': 'mp4',
                      'title': 'teamPGP: Rocket League Noob Stream (zim)',
                      'description': 'md5:dc7872fb300e143831327f1bae3af010',
+                    'duration': 7334,
                      'upload_date': '20150721',
                      'uploader': 'Beer Games Beer',
                      'uploader_id': 'beergamesbeer',
-                    'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
+                    'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/beergamesbeer',
                      'license': 'Standard YouTube License',
                  },
              }],
@@ -767,9 +786,10 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'title': '{dark walk}; Loki/AC/Dishonored; collab w/Elflover21',
                  'alt_title': 'Dark Walk',
                  'description': 'md5:8085699c11dc3f597ce0410b0dcbb34a',
+                'duration': 133,
                  'upload_date': '20151119',
                  'uploader_id': 'IronSoulElf',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IronSoulElf',
                  'uploader': 'IronSoulElf',
                  'license': 'Standard YouTube License',
                  'creator': 'Todd Haberman, Daniel Law Heath & Aaron Kaplan',
@@ -808,10 +828,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'md5:e41008789470fc2533a3252216f1c1d1',
                  'description': 'md5:a677553cf0840649b731a3024aeff4cc',
+                'duration': 721,
                  'upload_date': '20150127',
                  'uploader_id': 'BerkmanCenter',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
-                'uploader': 'BerkmanCenter',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/BerkmanCenter',
+                'uploader': 'The Berkman Klein Center for Internet & Society',
                  'license': 'Creative Commons Attribution license (reuse allowed)',
              },
              'params': {
@@ -826,10 +847,11 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'ext': 'mp4',
                  'title': 'Democratic Socialism and Foreign Policy | Bernie Sanders',
                  'description': 'md5:dda0d780d5a6e120758d1711d062a867',
+                'duration': 4060,
                  'upload_date': '20151119',
                  'uploader': 'Bernie 2016',
                  'uploader_id': 'UCH1dpzjCEiGAt8CXkryhkZg',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCH1dpzjCEiGAt8CXkryhkZg',
                  'license': 'Creative Commons Attribution license (reuse allowed)',
              },
              'params': {
@@ -856,12 +878,42 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
                  'upload_date': '20150811',
                  'uploader': 'FlixMatrix',
                  'uploader_id': 'FlixMatrixKaravan',
-                'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/FlixMatrixKaravan',
                  'license': 'Standard YouTube License',
              },
              'params': {
                  'skip_download': True,
              },
+        },
+        {
+            # YouTube Red video with episode data
+            'url': 'https://www.youtube.com/watch?v=iqKdEhx-dD4',
+            'info_dict': {
+                'id': 'iqKdEhx-dD4',
+                'ext': 'mp4',
+                'title': 'Isolation - Mind Field (Ep 1)',
+                'description': 'md5:8013b7ddea787342608f63a13ddc9492',
+                'duration': 2085,
+                'upload_date': '20170118',
+                'uploader': 'Vsauce',
+                'uploader_id': 'Vsauce',
+                'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/Vsauce',
+                'license': 'Standard YouTube License',
+                'series': 'Mind Field',
+                'season_number': 1,
+                'episode_number': 1,
+            },
+            'params': {
+                'skip_download': True,
+            },
+            'expected_warnings': [
+                'Skipping DASH manifest',
+            ],
+        },
+        {
+            # itag 212
+            'url': '1t24XAntNCY',
+            'only_matching': True,
          }
      ]
  
@@ -976,8 +1028,9 @@ def _genslice(start, end, step):
  
      def _parse_sig_js(self, jscode):
          funcname = self._search_regex(
-            r'\.sig\|\|([a-zA-Z0-9$]+)\(', jscode,
-            'Initial JS player signature function name')
+            (r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
+             r'\.sig\|\|(?P<sig>[a-zA-Z0-9$]+)\('),
+            jscode, 'Initial JS player signature function name', group='sig')
  
          jsi = JSInterpreter(jscode)
          initial_function = jsi.extract_function(funcname)
@@ -998,6 +1051,9 @@ def _decrypt_signature(self, s, video_id, player_url, age_gate=False):
  
          if player_url.startswith('//'):
              player_url = 'https:' + player_url
+        elif not re.match(r'https?://', player_url):
+            player_url = compat_urlparse.urljoin(
+                'https://www.youtube.com', player_url)
          try:
              player_id = (player_url, self._signature_cache_id(s))
              if player_id not in self._player_cache:
@@ -1448,6 +1504,16 @@ def add_dash_mpd(video_info):
          else:
              video_alt_title = video_creator = None
  
+        m_episode = re.search(
+            r'<div[^>]+id="watch7-headline"[^>]*>\s*<span[^>]*>.*?>(?P<series>[^<]+)</a></b>\s*S(?P<season>\d+)\s*•\s*E(?P<episode>\d+)</span>',
+            video_webpage)
+        if m_episode:
+            series = m_episode.group('series')
+            season_number = int(m_episode.group('season'))
+            episode_number = int(m_episode.group('episode'))
+        else:
+            series = season_number = episode_number = None
+
          m_cat_container = self._search_regex(
              r'(?s)<h4[^>]*>\s*Category\s*</h4>\s*<ul[^>]*>(.*?)</ul>',
              video_webpage, 'categories', default=None)
@@ -1476,11 +1542,11 @@ def _extract_count(count_name):
          video_subtitles = self.extract_subtitles(video_id, video_webpage)
          automatic_captions = self.extract_automatic_captions(video_id, video_webpage)
  
-        if 'length_seconds' not in video_info:
-            self._downloader.report_warning('unable to extract video duration')
-            video_duration = None
-        else:
-            video_duration = int(compat_urllib_parse_unquote_plus(video_info['length_seconds'][0]))
+        video_duration = try_get(
+            video_info, lambda x: int_or_none(x['length_seconds'][0]))
+        if not video_duration:
+            video_duration = parse_duration(self._html_search_meta(
+                'duration', video_webpage, 'video duration'))
  
          # annotations
          video_annotations = None
@@ -1737,6 +1803,9 @@ def decrypt_sig(mobj):
              'is_live': is_live,
              'start_time': start_time,
              'end_time': end_time,
+            'series': series,
+            'season_number': season_number,
+            'episode_number': episode_number,
          }
  
  
@@ -1788,15 +1857,15 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
                              youtu\.be/[0-9A-Za-z_-]{11}\?.*?\blist=
                          )
                          (
-                            (?:PL|LL|EC|UU|FL|RD|UL)?[0-9A-Za-z-_]{10,}
+                            (?:PL|LL|EC|UU|FL|RD|UL|TL)?[0-9A-Za-z-_]{10,}
                              # Top tracks, they can also include dots
                              |(?:MC)[\w\.]*
                          )
                          .*
                       |
-                        ((?:PL|LL|EC|UU|FL|RD|UL)[0-9A-Za-z-_]{10,})
+                        ((?:PL|LL|EC|UU|FL|RD|UL|TL)[0-9A-Za-z-_]{10,})
                       )"""
-    _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s'
+    _TEMPLATE_URL = 'https://www.youtube.com/playlist?list=%s&disable_polymer=true'
      _VIDEO_RE = r'href="\s*/watch\?v=(?P<id>[0-9A-Za-z_-]{11})&amp;[^"]*?index=(?P<index>\d+)(?:[^>]+>(?P<title>[^<]+))?'
      IE_NAME = 'youtube:playlist'
      _TESTS = [{
@@ -1813,6 +1882,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': 'YDL_Empty_List',
          },
          'playlist_count': 0,
+        'skip': 'This playlist is private',
      }, {
          'note': 'Playlist with deleted videos (#651). As a bonus, the video #51 is also twice in this list.',
          'url': 'https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC',
@@ -1844,6 +1914,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'id': 'PLtPgu7CB4gbY9oDN3drwC3cMbJggS7dKl',
          },
          'playlist_count': 2,
+        'skip': 'This playlist is private',
      }, {
          'note': 'embedded',
          'url': 'https://www.youtube.com/embed/videoseries?list=PL6IaIsEjSbf96XFRuNccS_RuEXwNdsoEu',
@@ -1877,7 +1948,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': "Smiley's People 01 detective, Adventure Series, Action",
              'uploader': 'STREEM',
              'uploader_id': 'UCyPhqAZgwYWZfxElWVbVJng',
-            'uploader_url': 're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng',
+            'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCyPhqAZgwYWZfxElWVbVJng',
              'upload_date': '20150526',
              'license': 'Standard YouTube License',
              'description': 'md5:507cdcb5a49ac0da37a920ece610be80',
@@ -1898,7 +1969,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
              'title': 'Small Scale Baler and Braiding Rugs',
              'uploader': 'Backus-Page House Museum',
              'uploader_id': 'backuspagemuseum',
-            'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
+            'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/backuspagemuseum',
              'upload_date': '20161008',
              'license': 'Standard YouTube License',
              'description': 'md5:800c0c78d5eb128500bffd4f0b4f2e8a',
@@ -1914,6 +1985,9 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
      }, {
          'url': 'https://youtu.be/uWyaPkt-VOI?list=PL9D9FC436B881BA21',
          'only_matching': True,
+    }, {
+        'url': 'TLGGrESM50VT6acwMjAyMjAxNw',
+        'only_matching': True,
      }]
  
      def _real_initialize(self):
@@ -1955,14 +2029,18 @@ def _extract_playlist(self, playlist_id):
          url = self._TEMPLATE_URL % playlist_id
          page = self._download_webpage(url, playlist_id)
  
-        for match in re.findall(r'<div class="yt-alert-message">([^<]+)</div>', page):
+        # the yt-alert-message now has tabindex attribute (see https://github.com/rg3/youtube-dl/issues/11604)
+        for match in re.findall(r'<div class="yt-alert-message"[^>]*>([^<]+)</div>', page):
              match = match.strip()
              # Check if the playlist exists or is private
-            if re.match(r'[^<]*(The|This) playlist (does not exist|is private)[^<]*', match):
-                raise ExtractorError(
-                    'The playlist doesn\'t exist or is private, use --username or '
-                    '--netrc to access it.',
-                    expected=True)
+            mobj = re.match(r'[^<]*(?:The|This) playlist (?P<reason>does not exist|is private)[^<]*', match)
+            if mobj:
+                reason = mobj.group('reason')
+                message = 'This playlist %s' % reason
+                if 'private' in reason:
+                    message += ', use --username or --netrc to access it'
+                message += '.'
+                raise ExtractorError(message, expected=True)
              elif re.match(r'[^<]*Invalid parameters[^<]*', match):
                  raise ExtractorError(
                      'Invalid parameters. Maybe URL is incorrect.',
@@ -2175,7 +2253,7 @@ def _build_template_url(self, url, channel_id):
  
  class YoutubeLiveIE(YoutubeBaseInfoExtractor):
      IE_DESC = 'YouTube.com live streams'
-    _VALID_URL = r'(?P<base_url>https?://(?:\w+\.)?youtube\.com/(?:user|channel|c)/(?P<id>[^/]+))/live'
+    _VALID_URL = r'(?P<base_url>https?://(?:\w+\.)?youtube\.com/(?:(?:user|channel|c)/)?(?P<id>[^/]+))/live'
      IE_NAME = 'youtube:live'
  
      _TESTS = [{
@@ -2186,7 +2264,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
              'title': 'The Young Turks - Live Main Show',
              'uploader': 'The Young Turks',
              'uploader_id': 'TheYoungTurks',
-            'uploader_url': 're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks',
+            'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/TheYoungTurks',
              'upload_date': '20150715',
              'license': 'Standard YouTube License',
              'description': 'md5:438179573adcdff3c97ebb1ee632b891',
@@ -2204,6 +2282,9 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
      }, {
          'url': 'https://www.youtube.com/c/CommanderVideoHq/live',
          'only_matching': True,
+    }, {
+        'url': 'https://www.youtube.com/TheYoungTurks/live',
+        'only_matching': True,
      }]
  
      def _real_extract(self, url):
@@ -2267,18 +2348,18 @@ def _get_n_results(self, query, n):
          videos = []
          limit = n
  
+        url_query = {
+            'search_query': query.encode('utf-8'),
+        }
+        url_query.update(self._EXTRA_QUERY_ARGS)
+        result_url = 'https://www.youtube.com/results?' + compat_urllib_parse_urlencode(url_query)
+
          for pagenum in itertools.count(1):
-            url_query = {
-                'search_query': query.encode('utf-8'),
-                'page': pagenum,
-                'spf': 'navigate',
-            }
-            url_query.update(self._EXTRA_QUERY_ARGS)
-            result_url = 'https://www.youtube.com/results?' + compat_urllib_parse_urlencode(url_query)
              data = self._download_json(
                  result_url, video_id='query "%s"' % query,
                  note='Downloading page %s' % pagenum,
-                errnote='Unable to download API page')
+                errnote='Unable to download API page',
+                query={'spf': 'navigate'})
              html_content = data[1]['body']['content']
  
              if 'class="search-message' in html_content:
@@ -2290,6 +2371,12 @@ def _get_n_results(self, query, n):
              videos += new_videos
              if not new_videos or len(videos) > limit:
                  break
+            next_link = self._html_search_regex(
+                r'href="(/results\?[^"]*\bsp=[^"]+)"[^>]*>\s*<span[^>]+class="[^"]*\byt-uix-button-content\b[^"]*"[^>]*>Next',
+                html_content, 'next link', default=None)
+            if next_link is None:
+                break
+            result_url = compat_urlparse.urljoin('https://www.youtube.com/', next_link)
  
          if len(videos) > n:
              videos = videos[:n]
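
Note on the `_extract_playlist` hunk above: the reworked alert handling can be tried outside the extractor. What follows is a minimal sketch, not the extractor itself. The two regular expressions are copied from the patch; the helper name `playlist_error_message` and the sample HTML are assumptions made for illustration, and the helper returns the message instead of raising `ExtractorError`.

```python
import re

# Regexes copied from the patched _extract_playlist; the rest is illustrative.
ALERT_RE = r'<div class="yt-alert-message"[^>]*>([^<]+)</div>'
REASON_RE = r'[^<]*(?:The|This) playlist (?P<reason>does not exist|is private)[^<]*'


def playlist_error_message(page):
    """Hypothetical helper: return the error text the patch would raise, or None."""
    for match in re.findall(ALERT_RE, page):
        mobj = re.match(REASON_RE, match.strip())
        if mobj:
            reason = mobj.group('reason')
            message = 'This playlist %s' % reason
            # Only a private playlist can be unlocked with credentials.
            if 'private' in reason:
                message += ', use --username or --netrc to access it'
            return message + '.'
    return None


# Hand-written sample markup (assumption), including the tabindex attribute
# that the widened [^>]* part of ALERT_RE is meant to tolerate.
sample = '<div class="yt-alert-message" tabindex="0">\n This playlist is private.\n</div>'
print(playlist_error_message(sample))
```

The named group `reason` lets the message say which of the two cases occurred, so the credentials hint is only shown when it can actually help.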
diff --git a/youtube_dl/extractor/zapiks.py b/youtube_dl/extractor/zapiks.py

index 22a9a57e882be49109c00036fa3559410b4e334f..bacb82eeeb2a549edbb0cbf6d0a67e07f28b595b 100644
--- a/youtube_dl/extractor/zapiks.py
+++ b/youtube_dl/extractor/zapiks.py
@@ -24,7 +24,7 @@ class ZapiksIE(InfoExtractor):
                  'ext': 'mp4',
                  'title': 'EP2S3 - Bon Appétit - Eh bé viva les pyrénées con!',
                  'description': 'md5:7054d6f6f620c6519be1fe710d4da847',
-                'thumbnail': 're:^https?://.*\.jpg$',
+                'thumbnail': r're:^https?://.*\.jpg$',
                  'duration': 528,
                  'timestamp': 1359044972,
                  'upload_date': '20130124',
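
Note on the `r're:...'` changes in this commit: the added `r` prefix does not change the value of these test patterns, it only makes the backslash explicit. Newer Python versions warn about unrecognised escapes such as `\.` in ordinary string literals, which is presumably the motivation. A small sketch, assuming (as in the youtube-dl test helper, to my understanding) that the `re:` prefix marks the rest of the string as a regular expression; the variable names and sample URLs are made up for illustration.

```python
import re

plain = 're:^https?://.*\\.jpg$'  # backslash doubled by hand, no warning
raw = r're:^https?://.*\.jpg$'    # raw-string form used after this commit

# Both literals denote exactly the same string.
assert plain == raw

# Strip the 're:' marker and use the remainder as a pattern.
pattern = raw[len('re:'):]
print(bool(re.match(pattern, 'http://example.com/thumb.jpg')))
print(bool(re.match(pattern, 'http://example.com/thumb.png')))
```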
diff --git a/youtube_dl/extractor/zdf.py b/youtube_dl/extractor/zdf.py

index 2ef17727592405b7bb20b378403d82470b52ce2f..a365923fbbadc093a484a972c85cf6070f1d2765 100644
--- a/youtube_dl/extractor/zdf.py
+++ b/youtube_dl/extractor/zdf.py
@@ -1,262 +1,312 @@
  # coding: utf-8
  from __future__ import unicode_literals
  
-import functools
  import re
  
  from .common import InfoExtractor
+from ..compat import compat_str
  from ..utils import (
-    int_or_none,
-    unified_strdate,
-    OnDemandPagedList,
-    xpath_text,
      determine_ext,
+    int_or_none,
+    NO_DEFAULT,
+    orderedSet,
+    parse_codecs,
      qualities,
-    float_or_none,
-    ExtractorError,
+    try_get,
+    unified_timestamp,
+    update_url_query,
+    urljoin,
  )
  
  
-class ZDFIE(InfoExtractor):
-    _VALID_URL = r'(?:zdf:|zdf:video:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/(.*beitrag/(?:video/)?))(?P<id>[0-9]+)(?:/[^/?]+)?(?:\?.*)?'
+class ZDFBaseIE(InfoExtractor):
+    def _call_api(self, url, player, referrer, video_id):
+        return self._download_json(
+            url, video_id, 'Downloading JSON content',
+            headers={
+                'Referer': referrer,
+                'Api-Auth': 'Bearer %s' % player['apiToken'],
+            })
+
+    def _extract_player(self, webpage, video_id, fatal=True):
+        return self._parse_json(
+            self._search_regex(
+                r'(?s)data-zdfplayer-jsb=(["\'])(?P<json>{.+?})\1', webpage,
+                'player JSON', default='{}' if not fatal else NO_DEFAULT,
+                group='json'),
+            video_id)
+
+
+class ZDFIE(ZDFBaseIE):
+    _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?]+)\.html'
+    _QUALITIES = ('auto', 'low', 'med', 'high', 'veryhigh')
  
      _TESTS = [{
-        'url': 'http://www.zdf.de/ZDFmediathek/beitrag/video/2037704/ZDFspezial---Ende-des-Machtpokers--?bc=sts;stt',
+        'url': 'https://www.zdf.de/service-und-hilfe/die-neue-zdf-mediathek/zdfmediathek-trailer-100.html',
          'info_dict': {
-            'id': '2037704',
-            'ext': 'webm',
-            'title': 'ZDFspezial - Ende des Machtpokers',
-            'description': 'Union und SPD haben sich auf einen Koalitionsvertrag geeinigt. Aber was bedeutet das für die Bürger? Sehen Sie hierzu das ZDFspezial "Ende des Machtpokers - Große Koalition für Deutschland".',
-            'duration': 1022,
-            'uploader': 'spezial',
-            'uploader_id': '225948',
-            'upload_date': '20131127',
-        },
-        'skip': 'Videos on ZDF.de are depublicised in short order',
+            'id': 'zdfmediathek-trailer-100',
+            'ext': 'mp4',
+            'title': 'Die neue ZDFmediathek',
+            'description': 'md5:3003d36487fb9a5ea2d1ff60beb55e8d',
+            'duration': 30,
+            'timestamp': 1477627200,
+            'upload_date': '20161028',
+        }
+    }, {
+        'url': 'https://www.zdf.de/filme/taunuskrimi/die-lebenden-und-die-toten-1---ein-taunuskrimi-100.html',
+        'only_matching': True,
+    }, {
+        'url': 'https://www.zdf.de/dokumentation/planet-e/planet-e-uebersichtsseite-weitere-dokumentationen-von-planet-e-100.html',
+        'only_matching': True,
      }]
  
-    def _parse_smil_formats(self, smil, smil_url, video_id, namespace=None, f4m_params=None, transform_rtmp_url=None):
-        param_groups = {}
-        for param_group in smil.findall(self._xpath_ns('./head/paramGroup', namespace)):
-            group_id = param_group.attrib.get(self._xpath_ns('id', 'http://www.w3.org/XML/1998/namespace'))
-            params = {}
-            for param in param_group:
-                params[param.get('name')] = param.get('value')
-            param_groups[group_id] = params
+    @staticmethod
+    def _extract_subtitles(src):
+        subtitles = {}
+        for caption in try_get(src, lambda x: x['captions'], list) or []:
+            subtitle_url = caption.get('uri')
+            if subtitle_url and isinstance(subtitle_url, compat_str):
+                lang = caption.get('language', 'deu')
+                subtitles.setdefault(lang, []).append({
+                    'url': subtitle_url,
+                })
+        return subtitles
+
+    def _extract_format(self, video_id, formats, format_urls, meta):
+        format_url = meta.get('url')
+        if not format_url or not isinstance(format_url, compat_str):
+            return
+        if format_url in format_urls:
+            return
+        format_urls.add(format_url)
+        mime_type = meta.get('mimeType')
+        ext = determine_ext(format_url)
+        if mime_type == 'application/x-mpegURL' or ext == 'm3u8':
+            formats.extend(self._extract_m3u8_formats(
+                format_url, video_id, 'mp4', m3u8_id='hls',
+                entry_protocol='m3u8_native', fatal=False))
+        elif mime_type == 'application/f4m+xml' or ext == 'f4m':
+            formats.extend(self._extract_f4m_formats(
+                update_url_query(format_url, {'hdcore': '3.7.0'}), video_id, f4m_id='hds', fatal=False))
+        else:
+            f = parse_codecs(meta.get('mimeCodec'))
+            format_id = ['http']
+            for p in (meta.get('type'), meta.get('quality')):
+                if p and isinstance(p, compat_str):
+                    format_id.append(p)
+            f.update({
+                'url': format_url,
+                'format_id': '-'.join(format_id),
+                'format_note': meta.get('quality'),
+                'language': meta.get('language'),
+                'quality': qualities(self._QUALITIES)(meta.get('quality')),
+                'preference': -10,
+            })
+            formats.append(f)
+
+    def _extract_entry(self, url, content, video_id):
+        title = content.get('title') or content['teaserHeadline']
+
+        t = content['mainVideoContent']['http://zdf.de/rels/target']
+
+        ptmd_path = t.get('http://zdf.de/rels/streams/ptmd')
+
+        if not ptmd_path:
+            ptmd_path = t[
+                'http://zdf.de/rels/streams/ptmd-template'].replace(
+                '{playerId}', 'portal')
+
+        ptmd = self._download_json(urljoin(url, ptmd_path), video_id)
  
          formats = []
-        for video in smil.findall(self._xpath_ns('.//video', namespace)):
-            src = video.get('src')
-            if not src:
+        track_uris = set()
+        for p in ptmd['priorityList']:
+            formitaeten = p.get('formitaeten')
+            if not isinstance(formitaeten, list):
                  continue
-            bitrate = float_or_none(video.get('system-bitrate') or video.get('systemBitrate'), 1000)
-            group_id = video.get('paramGroup')
-            param_group = param_groups[group_id]
-            for proto in param_group['protocols'].split(','):
-                formats.append({
-                    'url': '%s://%s' % (proto, param_group['host']),
-                    'app': param_group['app'],
-                    'play_path': src,
-                    'ext': 'flv',
-                    'format_id': '%s-%d' % (proto, bitrate),
-                    'tbr': bitrate,
-                })
+            for f in formitaeten:
+                f_qualities = f.get('qualities')
+                if not isinstance(f_qualities, list):
+                    continue
+                for quality in f_qualities:
+                    tracks = try_get(quality, lambda x: x['audio']['tracks'], list)
+                    if not tracks:
+                        continue
+                    for track in tracks:
+                        self._extract_format(
+                            video_id, formats, track_uris, {
+                                'url': track.get('uri'),
+                                'type': f.get('type'),
+                                'mimeType': f.get('mimeType'),
+                                'quality': quality.get('quality'),
+                                'language': track.get('language'),
+                            })
          self._sort_formats(formats)
-        return formats
-
-    def extract_from_xml_url(self, video_id, xml_url):
-        doc = self._download_xml(
-            xml_url, video_id,
-            note='Downloading video info',
-            errnote='Failed to download video info')
-
-        status_code = doc.find('./status/statuscode')
-        if status_code is not None and status_code.text != 'ok':
-            code = status_code.text
-            if code == 'notVisibleAnymore':
-                message = 'Video %s is not available' % video_id
-            else:
-                message = '%s returned error: %s' % (self.IE_NAME, code)
-            raise ExtractorError(message, expected=True)
-
-        title = doc.find('.//information/title').text
-        description = xpath_text(doc, './/information/detail', 'description')
-        duration = int_or_none(xpath_text(doc, './/details/lengthSec', 'duration'))
-        uploader = xpath_text(doc, './/details/originChannelTitle', 'uploader')
-        uploader_id = xpath_text(doc, './/details/originChannelId', 'uploader id')
-        upload_date = unified_strdate(xpath_text(doc, './/details/airtime', 'upload date'))
-        subtitles = {}
-        captions_url = doc.find('.//caption/url')
-        if captions_url is not None:
-            subtitles['de'] = [{
-                'url': captions_url.text,
-                'ext': 'ttml',
-            }]
-
-        def xml_to_thumbnails(fnode):
-            thumbnails = []
-            for node in fnode:
-                thumbnail_url = node.text
-                if not thumbnail_url:
+
+        thumbnails = []
+        layouts = try_get(
+            content, lambda x: x['teaserImageRef']['layouts'], dict)
+        if layouts:
+            for layout_key, layout_url in layouts.items():
+                if not isinstance(layout_url, compat_str):
                      continue
                  thumbnail = {
-                    'url': thumbnail_url,
+                    'url': layout_url,
+                    'format_id': layout_key,
                  }
-                if 'key' in node.attrib:
-                    m = re.match('^([0-9]+)x([0-9]+)$', node.attrib['key'])
-                    if m:
-                        thumbnail['width'] = int(m.group(1))
-                        thumbnail['height'] = int(m.group(2))
+                mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
+                if mobj:
+                    thumbnail.update({
+                        'width': int(mobj.group('width')),
+                        'height': int(mobj.group('height')),
+                    })
                  thumbnails.append(thumbnail)
-            return thumbnails
  
-        thumbnails = xml_to_thumbnails(doc.findall('.//teaserimages/teaserimage'))
+        return {
+            'id': video_id,
+            'title': title,
+            'description': content.get('leadParagraph') or content.get('teasertext'),
+            'duration': int_or_none(t.get('duration')),
+            'timestamp': unified_timestamp(content.get('editorialDate')),
+            'thumbnails': thumbnails,
+            'subtitles': self._extract_subtitles(ptmd),
+            'formats': formats,
+        }
  
-        format_nodes = doc.findall('.//formitaeten/formitaet')
-        quality = qualities(['veryhigh', 'high', 'med', 'low'])
+    def _extract_regular(self, url, player, video_id):
+        content = self._call_api(player['content'], player, url, video_id)
+        return self._extract_entry(player['content'], content, video_id)
  
-        def get_quality(elem):
-            return quality(xpath_text(elem, 'quality'))
-        format_nodes.sort(key=get_quality)
-        format_ids = []
-        formats = []
-        for fnode in format_nodes:
-            video_url = fnode.find('url').text
-            is_available = 'http://www.metafilegenerator' not in video_url
-            if not is_available:
-                continue
-            format_id = fnode.attrib['basetype']
-            quality = xpath_text(fnode, './quality', 'quality')
-            format_m = re.match(r'''(?x)
-                (?P<vcodec>[^_]+)_(?P<acodec>[^_]+)_(?P<container>[^_]+)_
-                (?P<proto>[^_]+)_(?P<index>[^_]+)_(?P<indexproto>[^_]+)
-            ''', format_id)
-
-            ext = determine_ext(video_url, None) or format_m.group('container')
-            if ext not in ('smil', 'f4m', 'm3u8'):
-                format_id = format_id + '-' + quality
-            if format_id in format_ids:
-                continue
+    def _extract_mobile(self, video_id):
+        document = self._download_json(
+            'https://zdf-cdn.live.cellular.de/mediathekV2/document/%s' % video_id,
+            video_id)['document']
  
-            if ext == 'meta':
-                continue
-            elif ext == 'smil':
-                formats.extend(self._extract_smil_formats(
-                    video_url, video_id, fatal=False))
-            elif ext == 'm3u8':
-                # the certificates are misconfigured (see
-                # https://github.com/rg3/youtube-dl/issues/8665)
-                if video_url.startswith('https://'):
-                    continue
-                formats.extend(self._extract_m3u8_formats(
-                    video_url, video_id, 'mp4', m3u8_id=format_id, fatal=False))
-            elif ext == 'f4m':
-                formats.extend(self._extract_f4m_formats(
-                    video_url, video_id, f4m_id=format_id, fatal=False))
-            else:
-                proto = format_m.group('proto').lower()
-
-                abr = int_or_none(xpath_text(fnode, './audioBitrate', 'abr'), 1000)
-                vbr = int_or_none(xpath_text(fnode, './videoBitrate', 'vbr'), 1000)
-
-                width = int_or_none(xpath_text(fnode, './width', 'width'))
-                height = int_or_none(xpath_text(fnode, './height', 'height'))
-
-                filesize = int_or_none(xpath_text(fnode, './filesize', 'filesize'))
-
-                format_note = ''
-                if not format_note:
-                    format_note = None
-
-                formats.append({
-                    'format_id': format_id,
-                    'url': video_url,
-                    'ext': ext,
-                    'acodec': format_m.group('acodec'),
-                    'vcodec': format_m.group('vcodec'),
-                    'abr': abr,
-                    'vbr': vbr,
-                    'width': width,
-                    'height': height,
-                    'filesize': filesize,
-                    'format_note': format_note,
-                    'protocol': proto,
-                    '_available': is_available,
-                })
-            format_ids.append(format_id)
+        title = document['titel']
  
+        formats = []
+        format_urls = set()
+        for f in document['formitaeten']:
+            self._extract_format(video_id, formats, format_urls, f)
          self._sort_formats(formats)
  
+        thumbnails = []
+        teaser_bild = document.get('teaserBild')
+        if isinstance(teaser_bild, dict):
+            for thumbnail_key, thumbnail in teaser_bild.items():
+                thumbnail_url = try_get(
+                    thumbnail, lambda x: x['url'], compat_str)
+                if thumbnail_url:
+                    thumbnails.append({
+                        'url': thumbnail_url,
+                        'id': thumbnail_key,
+                        'width': int_or_none(thumbnail.get('width')),
+                        'height': int_or_none(thumbnail.get('height')),
+                    })
+
          return {
              'id': video_id,
              'title': title,
-            'description': description,
-            'duration': duration,
+            'description': document.get('beschreibung'),
+            'duration': int_or_none(document.get('length')),
+            'timestamp': unified_timestamp(try_get(
+                document, lambda x: x['meta']['editorialDate'], compat_str)),
              'thumbnails': thumbnails,
-            'uploader': uploader,
-            'uploader_id': uploader_id,
-            'upload_date': upload_date,
+            'subtitles': self._extract_subtitles(document),
              'formats': formats,
-            'subtitles': subtitles,
          }
  
      def _real_extract(self, url):
          video_id = self._match_id(url)
-        xml_url = 'http://www.zdf.de/ZDFmediathek/xmlservice/web/beitragsDetails?ak=web&id=%s' % video_id
-        return self.extract_from_xml_url(video_id, xml_url)
  
+        webpage = self._download_webpage(url, video_id, fatal=False)
+        if webpage:
+            player = self._extract_player(webpage, url, fatal=False)
+            if player:
+                return self._extract_regular(url, player, video_id)
+
+        return self._extract_mobile(video_id)
  
-class ZDFChannelIE(InfoExtractor):
-    _VALID_URL = r'(?:zdf:topic:|https?://www\.zdf\.de/ZDFmediathek(?:#)?/.*kanaluebersicht/(?:[^/]+/)?)(?P<id>[0-9]+)'
+
+class ZDFChannelIE(ZDFBaseIE):
+    _VALID_URL = r'https?://www\.zdf\.de/(?:[^/]+/)*(?P<id>[^/?#&]+)'
      _TESTS = [{
-        'url': 'http://www.zdf.de/ZDFmediathek#/kanaluebersicht/1586442/sendung/Titanic',
+        'url': 'https://www.zdf.de/sport/das-aktuelle-sportstudio',
          'info_dict': {
-            'id': '1586442',
+            'id': 'das-aktuelle-sportstudio',
+            'title': 'das aktuelle sportstudio | ZDF',
          },
-        'playlist_count': 3,
-    }, {
-        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/aktuellste/332',
-        'only_matching': True,
+        'playlist_count': 21,
      }, {
-        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/meist-gesehen/332',
-        'only_matching': True,
+        'url': 'https://www.zdf.de/dokumentation/planet-e',
+        'info_dict': {
+            'id': 'planet-e',
+            'title': 'planet e.',
+        },
+        'playlist_count': 4,
      }, {
-        'url': 'http://www.zdf.de/ZDFmediathek/kanaluebersicht/_/1798716?bc=nrt;nrm?flash=off',
+        'url': 'https://www.zdf.de/filme/taunuskrimi/',
          'only_matching': True,
      }]
-    _PAGE_SIZE = 50
-
-    def _fetch_page(self, channel_id, page):
-        offset = page * self._PAGE_SIZE
-        xml_url = (
-            'http://www.zdf.de/ZDFmediathek/xmlservice/web/aktuellste?ak=web&offset=%d&maxLength=%d&id=%s'
-            % (offset, self._PAGE_SIZE, channel_id))
-        doc = self._download_xml(
-            xml_url, channel_id,
-            note='Downloading channel info',
-            errnote='Failed to download channel info')
-
-        title = doc.find('.//information/title').text
-        description = doc.find('.//information/detail').text
-        for asset in doc.findall('.//teasers/teaser'):
-            a_type = asset.find('./type').text
-            a_id = asset.find('./details/assetId').text
-            if a_type not in ('video', 'topic'):
-                continue
-            yield {
-                '_type': 'url',
-                'playlist_title': title,
-                'playlist_description': description,
-                'url': 'zdf:%s:%s' % (a_type, a_id),
-            }
+
+    @classmethod
+    def suitable(cls, url):
+        return False if ZDFIE.suitable(url) else super(ZDFChannelIE, cls).suitable(url)
  
      def _real_extract(self, url):
          channel_id = self._match_id(url)
-        entries = OnDemandPagedList(
-            functools.partial(self._fetch_page, channel_id), self._PAGE_SIZE)
  
-        return {
-            '_type': 'playlist',
-            'id': channel_id,
-            'entries': entries,
-        }
+        webpage = self._download_webpage(url, channel_id)
+
+        entries = [
+            self.url_result(item_url, ie=ZDFIE.ie_key())
+            for item_url in orderedSet(re.findall(
+                r'data-plusbar-url=["\'](http.+?\.html)', webpage))]
+
+        return self.playlist_result(
+            entries, channel_id, self._og_search_title(webpage, fatal=False))
+
+        r"""
+        player = self._extract_player(webpage, channel_id)
+
+        channel_id = self._search_regex(
+            r'docId\s*:\s*(["\'])(?P<id>(?!\1).+?)\1', webpage,
+            'channel id', group='id')
+
+        channel = self._call_api(
+            'https://api.zdf.de/content/documents/%s.json' % channel_id,
+            player, url, channel_id)
+
+        items = []
+        for module in channel['module']:
+            for teaser in try_get(module, lambda x: x['teaser'], list) or []:
+                t = try_get(
+                    teaser, lambda x: x['http://zdf.de/rels/target'], dict)
+                if not t:
+                    continue
+                items.extend(try_get(
+                    t,
+                    lambda x: x['resultsWithVideo']['http://zdf.de/rels/search/results'],
+                    list) or [])
+            items.extend(try_get(
+                module,
+                lambda x: x['filterRef']['resultsWithVideo']['http://zdf.de/rels/search/results'],
+                list) or [])
+
+        entries = []
+        entry_urls = set()
+        for item in items:
+            t = try_get(item, lambda x: x['http://zdf.de/rels/target'], dict)
+            if not t:
+                continue
+            sharing_url = t.get('http://zdf.de/rels/sharing-url')
+            if not sharing_url or not isinstance(sharing_url, compat_str):
+                continue
+            if sharing_url in entry_urls:
+                continue
+            entry_urls.add(sharing_url)
+            entries.append(self.url_result(
+                sharing_url, ie=ZDFIE.ie_key(), video_id=t.get('id')))
+
+        return self.playlist_result(entries, channel_id, channel.get('title'))
+        """
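
Note on the thumbnail handling in `_extract_entry` above: the width and height are now read from the layout key with a search rather than an anchored match, so keys that merely contain a `WxH` fragment still yield dimensions. A minimal standalone sketch under stated assumptions: the function name `thumbnails_from_layouts` and the sample keys and URLs are hypothetical, and plain `str` stands in for `compat_str` (Python 3 only).

```python
import re


def thumbnails_from_layouts(layouts):
    """Hypothetical standalone version of the thumbnail loop in _extract_entry."""
    thumbnails = []
    for layout_key, layout_url in layouts.items():
        # Skip entries whose value is not a URL string.
        if not isinstance(layout_url, str):
            continue
        thumbnail = {
            'url': layout_url,
            'format_id': layout_key,
        }
        # Dimensions are optional: keys without a WxH fragment keep only the URL.
        mobj = re.search(r'(?P<width>\d+)x(?P<height>\d+)', layout_key)
        if mobj:
            thumbnail.update({
                'width': int(mobj.group('width')),
                'height': int(mobj.group('height')),
            })
        thumbnails.append(thumbnail)
    return thumbnails


# Sample data is invented for illustration.
layouts = {
    '768x432': 'https://example.com/a.jpg',
    'original': 'https://example.com/b.jpg',
    'broken': None,
}
for thumbnail in thumbnails_from_layouts(layouts):
    print(thumbnail)
```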
diff --git a/youtube_dl/extractor/zingmp3.py b/youtube_dl/extractor/zingmp3.py

index 0f0e9d0eb9b1ac945934b11a134d143d82b19fb0..adfdcaabf6cb32ba9671db628007eeecbeb31b49 100644
--- a/youtube_dl/extractor/zingmp3.py
+++ b/youtube_dl/extractor/zingmp3.py
@@ -95,7 +95,7 @@ class ZingMp3IE(ZingMp3BaseInfoExtractor):
              'id': 'ZWZB9WAB',
              'title': 'Xa Mãi Xa',
              'ext': 'mp3',
-            'thumbnail': 're:^https?://.*\.jpg$',
+            'thumbnail': r're:^https?://.*\.jpg$',
          },
      }, {
          'url': 'http://mp3.zing.vn/video-clip/Let-It-Go-Frozen-OST-Sungha-Jung/ZW6BAEA0.html',
diff --git a/youtube_dl/jsinterp.py b/youtube_dl/jsinterp.py

index 9737f70021d3285a4e8df616467b764de1a91fa2..24cdec28c6cb2332232212d6bcf39d03edc27c7a 100644
--- a/youtube_dl/jsinterp.py
+++ b/youtube_dl/jsinterp.py
@@ -198,12 +198,12 @@ def interpret_expression(self, expr, local_vars, allow_recursion):
              return opfunc(x, y)
  
          m = re.match(
-            r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]+)\)$' % _NAME_RE, expr)
+            r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]*)\)$' % _NAME_RE, expr)
          if m:
              fname = m.group('func')
              argvals = tuple([
                  int(v) if v.isdigit() else local_vars[v]
-                for v in m.group('args').split(',')])
+                for v in m.group('args').split(',')]) if len(m.group('args')) > 0 else tuple()
              if fname not in self._functions:
                  self._functions[fname] = self.extract_function(fname)
              return self._functions[fname](argvals)
@@ -213,7 +213,7 @@ def interpret_expression(self, expr, local_vars, allow_recursion):
      def extract_object(self, objname):
          obj = {}
          obj_m = re.search(
-            (r'(?:var\s+)?%s\s*=\s*\{' % re.escape(objname)) +
+            (r'(?<!this\.)%s\s*=\s*\{' % re.escape(objname)) +
              r'\s*(?P<fields>([a-zA-Z$0-9]+\s*:\s*function\(.*?\)\s*\{.*?\}(?:,\s*)?)*)' +
              r'\}\s*;',
              self.code)
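
Note on the `interpret_expression` hunk above: changing `+` to `*` in the `args` group lets a call with no arguments, such as `f()`, match, and the added guard is needed because `''.split(',')` returns `['']`, which would otherwise be looked up in `local_vars`. A minimal sketch of just that parsing step; the helper `parse_call` is hypothetical, and `NAME_RE` is an assumed stand-in for the module's `_NAME_RE`.

```python
import re

# Assumed stand-in for jsinterp's _NAME_RE (a JavaScript identifier).
NAME_RE = r'[a-zA-Z_$][a-zA-Z_$0-9]*'
CALL_RE = r'^(?P<func>%s)\((?P<args>[a-zA-Z0-9_$,]*)\)$' % NAME_RE


def parse_call(expr, local_vars):
    """Hypothetical helper: return (function name, argument values) or None."""
    m = re.match(CALL_RE, expr)
    if not m:
        return None
    args = m.group('args')
    # An empty argument list must not be split: ''.split(',') == [''].
    argvals = tuple(
        int(v) if v.isdigit() else local_vars[v]
        for v in args.split(',')) if args else tuple()
    return m.group('func'), argvals


print(parse_call('f()', {}))
print(parse_call('g(a,3)', {'a': 7}))
```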
diff --git a/youtube_dl/options.py b/youtube_dl/options.py

index 53497fbc6f60a945b6350ce36e352a8eb6ef1f2c..3abf621c090484139d393516bd2554b6651989fe 100644
--- a/youtube_dl/options.py
+++ b/youtube_dl/options.py
@@ -178,6 +178,10 @@ def _scrub_eq(o):
          'When given in the global configuration file /etc/youtube-dl.conf: '
          'Do not read the user configuration in ~/.config/youtube-dl/config '
          '(%APPDATA%/youtube-dl/config.txt on Windows)')
+    general.add_option(
+        '--config-location',
+        dest='config_location', metavar='PATH',
+        help='Location of the configuration file; either the path to the config or its containing directory.')
      general.add_option(
          '--flat-playlist',
          action='store_const', dest='extract_flat', const='in_playlist',
@@ -212,23 +216,23 @@ def _scrub_eq(o):
      network.add_option(
          '--source-address',
          metavar='IP', dest='source_address', default=None,
-        help='Client-side IP address to bind to (experimental)',
+        help='Client-side IP address to bind to',
      )
      network.add_option(
          '-4', '--force-ipv4',
          action='store_const', const='0.0.0.0', dest='source_address',
-        help='Make all connections via IPv4 (experimental)',
+        help='Make all connections via IPv4',
      )
      network.add_option(
          '-6', '--force-ipv6',
          action='store_const', const='::', dest='source_address',
-        help='Make all connections via IPv6 (experimental)',
+        help='Make all connections via IPv6',
      )
      network.add_option(
          '--geo-verification-proxy',
          dest='geo_verification_proxy', default=None, metavar='URL',
          help='Use this proxy to verify the IP address for some geo-restricted sites. '
-        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading. (experimental)'
+        'The default proxy specified by --proxy (or none, if the options is not present) is used for the actual downloading.'
      )
      network.add_option(
          '--cn-verification-proxy',
@@ -293,7 +297,7 @@ def _scrub_eq(o):
          '--match-filter',
          metavar='FILTER', dest='match_filter', default=None,
          help=(
-            'Generic video filter (experimental). '
+            'Generic video filter. '
              'Specify any key (see help for -o for a list of available keys) to'
              ' match if the key is present, '
              '!key to check if the key is not present,'
@@ -341,7 +345,7 @@ def _scrub_eq(o):
      authentication.add_option(
          '-2', '--twofactor',
          dest='twofactor', metavar='TWOFACTOR',
-        help='Two-factor auth code')
+        help='Two-factor authentication code')
      authentication.add_option(
          '-n', '--netrc',
          action='store_true', dest='usenetrc', default=False,
@@ -446,7 +450,7 @@ def _scrub_eq(o):
          '--skip-unavailable-fragments',
          action='store_true', dest='skip_unavailable_fragments', default=True,
          help='Skip unavailable fragments (DASH and hlsnative only)')
-    general.add_option(
+    downloader.add_option(
          '--abort-on-unavailable-fragment',
          action='store_false', dest='skip_unavailable_fragments',
          help='Abort downloading when some fragment is not available')
@@ -469,7 +473,7 @@ def _scrub_eq(o):
      downloader.add_option(
          '--xattr-set-filesize',
          dest='xattr_set_filesize', action='store_true',
-        help='Set file xattribute ytdl.filesize with expected filesize (experimental)')
+        help='Set file xattribute ytdl.filesize with expected file size (experimental)')
      downloader.add_option(
          '--hls-prefer-native',
          dest='hls_prefer_native', action='store_true', default=None,
@@ -657,8 +661,12 @@ def _scrub_eq(o):
          help=('Output filename template, see the "OUTPUT TEMPLATE" for all the info'))
      filesystem.add_option(
          '--autonumber-size',
-        dest='autonumber_size', metavar='NUMBER',
-        help='Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given')
+        dest='autonumber_size', metavar='NUMBER', default=5, type=int,
+        help='Specify the number of digits in %(autonumber)s when it is present in output filename template or --auto-number option is given (default is %default)')
+    filesystem.add_option(
+        '--autonumber-start',
+        dest='autonumber_start', metavar='NUMBER', default=1, type=int,
+        help='Specify the start value for %(autonumber)s (default is %default)')
      filesystem.add_option(
          '--restrict-filenames',
          action='store_true', dest='restrictfilenames', default=False,
@@ -747,7 +755,7 @@ def _scrub_eq(o):
          help='Convert video files to audio-only files (requires ffmpeg or avconv and ffprobe or avprobe)')
      postproc.add_option(
          '--audio-format', metavar='FORMAT', dest='audioformat', default='best',
-        help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default')
+        help='Specify audio format: "best", "aac", "vorbis", "mp3", "m4a", "opus", or "wav"; "%default" by default; No effect without -x')
      postproc.add_option(
          '--audio-quality', metavar='QUALITY',
          dest='audioquality', default='5',
@@ -845,22 +853,32 @@ def compat_conf(conf):
              return conf
  
          command_line_conf = compat_conf(sys.argv[1:])
-
-        if '--ignore-config' in command_line_conf:
-            system_conf = []
-            user_conf = []
+        opts, args = parser.parse_args(command_line_conf)
+
+        system_conf = user_conf = custom_conf = []
+
+        if '--config-location' in command_line_conf:
+            location = compat_expanduser(opts.config_location)
+            if os.path.isdir(location):
+                location = os.path.join(location, 'youtube-dl.conf')
+            if not os.path.exists(location):
+                parser.error('config-location %s does not exist.' % location)
+            custom_conf = _readOptions(location)
+        elif '--ignore-config' in command_line_conf:
+            pass
          else:
              system_conf = _readOptions('/etc/youtube-dl.conf')
-            if '--ignore-config' in system_conf:
-                user_conf = []
-            else:
+            if '--ignore-config' not in system_conf:
                  user_conf = _readUserConf()
-        argv = system_conf + user_conf + command_line_conf
  
+        argv = system_conf + user_conf + custom_conf + command_line_conf
          opts, args = parser.parse_args(argv)
          if opts.verbose:
-            write_string('[debug] System config: ' + repr(_hide_login_info(system_conf)) + '\n')
-            write_string('[debug] User config: ' + repr(_hide_login_info(user_conf)) + '\n')
-            write_string('[debug] Command-line args: ' + repr(_hide_login_info(command_line_conf)) + '\n')
+            for conf_label, conf in (
+                    ('System config', system_conf),
+                    ('User config', user_conf),
+                    ('Custom config', custom_conf),
+                    ('Command-line args', command_line_conf)):
+                write_string('[debug] %s: %s\n' % (conf_label, repr(_hide_login_info(conf))))
  
      return parser, opts, args
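Note on the parseOpts hunk above: a simplified sketch of the resulting configuration precedence. The function `build_argv` and its parameters are hypothetical; the option lists stand in for what `_readOptions` and `_readUserConf` would return, and file handling is left out.

```python
def build_argv(command_line_conf, system_file_opts, user_file_opts,
               custom_file_opts=None):
    # custom_file_opts is None when --config-location was not given.
    system_conf, user_conf, custom_conf = [], [], []
    if custom_file_opts is not None:
        # A custom config replaces both the system and the user config.
        custom_conf = list(custom_file_opts)
    elif '--ignore-config' in command_line_conf:
        pass  # no configuration file is read at all
    else:
        system_conf = list(system_file_opts)
        # The system config may itself disable the user config.
        if '--ignore-config' not in system_conf:
            user_conf = list(user_file_opts)
    # Later entries win, so command-line arguments override every file.
    return system_conf + user_conf + custom_conf + command_line_conf


if __name__ == '__main__':
    print(build_argv(['-v'], ['--sys'], ['--user']))
    print(build_argv(['-v'], ['--sys'], ['--user'], ['--custom']))
```

The sketch uses three separate lists; the patch itself binds all three names to one shared empty list, which is safe there only because the names are rebound and never mutated in place.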
diff --git a/youtube_dl/postprocessor/metadatafromtitle.py b/youtube_dl/postprocessor/metadatafromtitle.py

index 920573da9d8f472b8fdd8681cab0be1c6331afb7..164edd3a820af4d0c3d1af48b9cf81a6b5460e9b 100644 (file)
--- a/youtube_dl/postprocessor/metadatafromtitle.py
+++ b/youtube_dl/postprocessor/metadatafromtitle.py
@@ -12,7 +12,7 @@ def __init__(self, downloader, titleformat):
          self._titleregex = self.format_to_regex(titleformat)
  
      def format_to_regex(self, fmt):
-        """
+        r"""
          Converts a string like
             '%(title)s - %(artist)s'
          to a regex like
diff --git a/youtube_dl/socks.py b/youtube_dl/socks.py

index 104807242bd3b0f35e0423faa096540be29d0a45..0f5d7bdb2128b17c2e1dba3144ff01d9b3d2f06a 100644 (file)
--- a/youtube_dl/socks.py
+++ b/youtube_dl/socks.py
@@ -55,12 +55,12 @@ class Socks5AddressType(object):
      ATYP_IPV6 = 0x04
  
  
-class ProxyError(IOError):
+class ProxyError(socket.error):
      ERR_SUCCESS = 0x00
  
      def __init__(self, code=None, msg=None):
          if code is not None and msg is None:
-            msg = self.CODES.get(code) and 'unknown error'
+            msg = self.CODES.get(code) or 'unknown error'
          super(ProxyError, self).__init__(code, msg)
  
  
@@ -103,6 +103,7 @@ class ProxyType(object):
      SOCKS4A = 1
      SOCKS5 = 2
  
+
  Proxy = collections.namedtuple('Proxy', (
      'type', 'host', 'port', 'username', 'password', 'remote_dns'))
  
@@ -122,7 +123,7 @@ def recvall(self, cnt):
          while len(data) < cnt:
              cur = self.recv(cnt - len(data))
              if not cur:
-                raise IOError('{0} bytes missing'.format(cnt - len(data)))
+                raise EOFError('{0} bytes missing'.format(cnt - len(data)))
              data += cur
          return data
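Note on the ProxyError hunk above: a small sketch of the difference between the old `and` and the new `or`. The `CODES` table here is a hypothetical sample, not the table from socks.py.

```python
CODES = {0x01: 'general SOCKS server failure'}  # hypothetical sample table


def message_before(code):
    # Old expression: a known code is replaced by 'unknown error',
    # and an unknown code yields None.
    return CODES.get(code) and 'unknown error'


def message_after(code):
    # New expression: a known code keeps its message,
    # and an unknown code falls back to 'unknown error'.
    return CODES.get(code) or 'unknown error'


if __name__ == '__main__':
    for code in (0x01, 0x7f):
        print(code, message_before(code), message_after(code))
```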
  
diff --git a/youtube_dl/swfinterp.py b/youtube_dl/swfinterp.py

index 7cf490aa43a878b3c377bea0b173c7a2b170c2c7..0c71585753134e93fba8d8de5cee003d31f050c9 100644 (file)
--- a/youtube_dl/swfinterp.py
+++ b/youtube_dl/swfinterp.py
@@ -115,6 +115,8 @@ def _u30(reader):
      res = _read_int(reader)
      assert res & 0xf0000000 == 0
      return res
+
+
  _u32 = _read_int
  
  
@@ -176,6 +178,7 @@ def __str__(self):
          return 'undefined'
      __repr__ = __str__
  
+
  undefined = _Undefined()
  
  
diff --git a/youtube_dl/utils.py b/youtube_dl/utils.py

index 9595bcf9f120ea4d24133e3f7399e637d14ac035..67a847ebad8238fc4f368f46b336b80e6caa3673 100644 (file)
--- a/youtube_dl/utils.py
+++ b/youtube_dl/utils.py
@@ -86,6 +86,11 @@ def register_socks_protocols():
  }
  
  
+USER_AGENTS = {
+    'Safari': 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) AppleWebKit/533.20.25 (KHTML, like Gecko) Version/5.0.4 Safari/533.20.27',
+}
+
+
  NO_DEFAULT = object()
  
  ENGLISH_MONTH_NAMES = [
@@ -123,7 +128,13 @@ def register_socks_protocols():
      '%d %B %Y',
      '%d %b %Y',
      '%B %d %Y',
+    '%B %dst %Y',
+    '%B %dnd %Y',
+    '%B %dth %Y',
      '%b %d %Y',
+    '%b %dst %Y',
+    '%b %dnd %Y',
+    '%b %dth %Y',
      '%b %dst %Y %I:%M',
      '%b %dnd %Y %I:%M',
      '%b %dth %Y %I:%M',
@@ -132,6 +143,7 @@ def register_socks_protocols():
      '%Y/%m/%d',
      '%Y/%m/%d %H:%M',
      '%Y/%m/%d %H:%M:%S',
+    '%Y-%m-%d %H:%M',
      '%Y-%m-%d %H:%M:%S',
      '%Y-%m-%d %H:%M:%S.%f',
      '%d.%m.%Y %H:%M',
@@ -496,7 +508,7 @@ def sanitize_path(s):
      if drive_or_unc:
          norm_path.pop(0)
      sanitized_path = [
-        path_part if path_part in ['.', '..'] else re.sub('(?:[/<>:"\\|\\\\?\\*]|[\s.]$)', '#', path_part)
+        path_part if path_part in ['.', '..'] else re.sub(r'(?:[/<>:"\|\\?\*]|[\s.]$)', '#', path_part)
          for path_part in norm_path]
      if drive_or_unc:
          sanitized_path.insert(0, drive_or_unc + os.path.sep)
@@ -1178,7 +1190,7 @@ def date_from_str(date_str):
          return today
      if date_str == 'yesterday':
          return today - datetime.timedelta(days=1)
-    match = re.match('(now|today)(?P<sign>[+-])(?P<time>\d+)(?P<unit>day|week|month|year)(s)?', date_str)
+    match = re.match(r'(now|today)(?P<sign>[+-])(?P<time>\d+)(?P<unit>day|week|month|year)(s)?', date_str)
      if match is not None:
          sign = match.group('sign')
          time = int(match.group('time'))
@@ -1695,6 +1707,16 @@ def base_url(url):
      return re.match(r'https?://[^?#&]+/', url).group()
  
  
+def urljoin(base, path):
+    if not isinstance(path, compat_str) or not path:
+        return None
+    if re.match(r'^(?:https?:)?//', path):
+        return path
+    if not isinstance(base, compat_str) or not re.match(r'^(?:https?:)?//', base):
+        return None
+    return compat_urlparse.urljoin(base, path)
+
+
  class HEADRequest(compat_urllib_request.Request):
      def get_method(self):
          return 'HEAD'
@@ -1751,7 +1773,7 @@ def parse_duration(s):
      s = s.strip()
  
      days, hours, mins, secs, ms = [None] * 5
-    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?$', s)
+    m = re.match(r'(?:(?:(?:(?P<days>[0-9]+):)?(?P<hours>[0-9]+):)?(?P<mins>[0-9]+):)?(?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?Z?$', s)
      if m:
          days, hours, mins, secs, ms = m.groups()
      else:
@@ -1768,11 +1790,11 @@ def parse_duration(s):
                  )?
                  (?:
                      (?P<secs>[0-9]+)(?P<ms>\.[0-9]+)?\s*s(?:ec(?:ond)?s?)?\s*
-                )?$''', s)
+                )?Z?$''', s)
          if m:
              days, hours, mins, secs, ms = m.groups()
          else:
-            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)$', s)
+            m = re.match(r'(?i)(?:(?P<hours>[0-9.]+)\s*(?:hours?)|(?P<mins>[0-9.]+)\s*(?:mins?\.?|minutes?)\s*)Z?$', s)
              if m:
                  hours, mins = m.groups()
              else:
@@ -2081,11 +2103,18 @@ def strip_jsonp(code):
  
  
  def js_to_json(code):
+    COMMENT_RE = r'/\*(?:(?!\*/).)*?\*/|//[^\n]*'
+    SKIP_RE = r'\s*(?:{comment})?\s*'.format(comment=COMMENT_RE)
+    INTEGER_TABLE = (
+        (r'(?s)^(0[xX][0-9a-fA-F]+){skip}:?$'.format(skip=SKIP_RE), 16),
+        (r'(?s)^(0+[0-7]+){skip}:?$'.format(skip=SKIP_RE), 8),
+    )
+
      def fix_kv(m):
          v = m.group(0)
          if v in ('true', 'false', 'null'):
              return v
-        elif v.startswith('/*') or v == ',':
+        elif v.startswith('/*') or v.startswith('//') or v == ',':
              return ""
  
          if v[0] in ("'", '"'):
@@ -2096,11 +2125,6 @@ def fix_kv(m):
                  '\\x': '\\u00',
              }.get(m.group(0), m.group(0)), v[1:-1])
  
-        INTEGER_TABLE = (
-            (r'^(0[xX][0-9a-fA-F]+)\s*:?$', 16),
-            (r'^(0+[0-7]+)\s*:?$', 8),
-        )
-
          for regex, base in INTEGER_TABLE:
              im = re.match(regex, v)
              if im:
@@ -2112,11 +2136,11 @@ def fix_kv(m):
      return re.sub(r'''(?sx)
          "(?:[^"\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^"\\]*"|
          '(?:[^'\\]*(?:\\\\|\\['"nurtbfx/\n]))*[^'\\]*'|
-        /\*.*?\*/|,(?=\s*[\]}])|
+        {comment}|,(?={skip}[\]}}])|
          [a-zA-Z_][.a-zA-Z_0-9]*|
-        \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:\s*:)?|
-        [0-9]+(?=\s*:)
-        ''', fix_kv, code)
+        \b(?:0[xX][0-9a-fA-F]+|0+[0-7]+)(?:{skip}:)?|
+        [0-9]+(?={skip}:)
+        '''.format(comment=COMMENT_RE, skip=SKIP_RE), fix_kv, code)
  
  
  def qualities(quality_ids):
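Note on the urljoin hunk in utils.py above: a sketch of the helper's behaviour, assuming Python 3 with `str` substituted for `compat_str` and `urllib.parse.urljoin` for `compat_urlparse.urljoin`. The sample URLs are made up for illustration.

```python
import re
from urllib.parse import urljoin as std_urljoin


def urljoin(base, path):
    # A missing or non-string path cannot be resolved.
    if not isinstance(path, str) or not path:
        return None
    # Absolute and protocol-relative paths are returned unchanged.
    if re.match(r'^(?:https?:)?//', path):
        return path
    # A relative path needs a usable http(s) or protocol-relative base.
    if not isinstance(base, str) or not re.match(r'^(?:https?:)?//', base):
        return None
    return std_urljoin(base, path)


if __name__ == '__main__':
    print(urljoin('http://foo.de/', '/a/b/c.txt'))
    print(urljoin('http://foo.de/a/b', 'c.txt'))
    print(urljoin(None, 'rel/path'))
```

Returning None instead of raising appears intended to let extractors pass possibly missing values straight through.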
diff --git a/youtube_dl/version.py b/youtube_dl/version.py

index 69df88c6e83690467baccefa5f32126929b7eb1d..0f9b6b703c752085861776cb9801594d1efad94e 100644 (file)
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
  from __future__ import unicode_literals
  
-__version__ = '2016.11.08.1'
+__version__ = '2017.02.01'