[DailyWire] Add extractors (#4084)

[yt-dlp.git] / CONTRIBUTING.md
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md

index ed4bf69d927f21ec143fe92bcadfd318157087c2..e48d2ebd0c7b1e71b43d7eeefe21874631858854 100644 (file)
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -11,6 +11,7 @@ # CONTRIBUTING TO YT-DLP
      - [Is anyone going to need the feature?](#is-anyone-going-to-need-the-feature)
      - [Is your question about yt-dlp?](#is-your-question-about-yt-dlp)
      - [Are you willing to share account details if needed?](#are-you-willing-to-share-account-details-if-needed)
+    - [Is the website primarily used for piracy](#is-the-website-primarily-used-for-piracy)
  - [DEVELOPER INSTRUCTIONS](#developer-instructions)
      - [Adding new feature or making overarching changes](#adding-new-feature-or-making-overarching-changes)
      - [Adding support for a new site](#adding-support-for-a-new-site)
@@ -24,6 +25,7 @@ # CONTRIBUTING TO YT-DLP
          - [Collapse fallbacks](#collapse-fallbacks)
          - [Trailing parentheses](#trailing-parentheses)
          - [Use convenience conversion and parsing functions](#use-convenience-conversion-and-parsing-functions)
+    - [My pull request is labeled pending-fixes](#my-pull-request-is-labeled-pending-fixes)
  - [EMBEDDING YT-DLP](README.md#embedding-yt-dlp)
  
  
@@ -113,7 +115,7 @@ ###  Is your question about yt-dlp?
  
  ### Are you willing to share account details if needed?
  
-The maintainers and potential contributors of the project often do not have an account for the website you are asking support for. So any developer interested in solving your issue may ask you for account details. It is your personal discression whether you are willing to share the account in order for the developer to try and solve your issue. However, if you are unwilling or unable to provide details, they obviously cannot work on the issue and it cannot be solved unless some developer who both has an account and is willing/able to contribute decides to solve it.
+The maintainers and potential contributors of the project often do not have an account for the website you are asking support for. So any developer interested in solving your issue may ask you for account details. It is your personal discretion whether you are willing to share the account in order for the developer to try and solve your issue. However, if you are unwilling or unable to provide details, they obviously cannot work on the issue and it cannot be solved unless some developer who both has an account and is willing/able to contribute decides to solve it.
  
  By sharing an account with anyone, you agree to bear all risks associated with it. The maintainers and yt-dlp can't be held responsible for any misuse of the credentials.
  
@@ -123,6 +125,10 @@ ### Are you willing to share account details if needed?
  - Change the password before sharing the account to something random (use [this](https://passwordsgenerator.net/) if you don't have a random password generator).
  - Change the password after receiving the account back.
  
+### Is the website primarily used for piracy?
+
+We follow [youtube-dl's policy](https://github.com/ytdl-org/youtube-dl#can-you-add-support-for-this-anime-video-site-or-site-which-shows-current-movies-for-free) to not support services that is primarily used for infringing copyright. Additionally, it has been decided to not to support porn sites that specialize in deep fake. We also cannot support any service that serves only [DRM protected content](https://en.wikipedia.org/wiki/Digital_rights_management). 
+
  
  
  
@@ -172,7 +178,6 @@ ## Adding support for a new site
  1. Start with this simple template and save it to `yt_dlp/extractor/yourextractor.py`:
  
      ```python
-    # coding: utf-8
      from .common import InfoExtractor
      
      
@@ -210,7 +215,7 @@ ## Adding support for a new site
              }
      ```
  1. Add an import in [`yt_dlp/extractor/extractors.py`](yt_dlp/extractor/extractors.py).
-1. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
+1. Run `python test/test_download.py TestDownload.test_YourExtractor` (note that `YourExtractor` doesn't end with `IE`). This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, the tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc. Note that tests with `only_matching` key in test's dict are not counted in. You can also run all the tests in one go with `TestDownload.test_YourExtractor_all`
  1. Make sure you have atleast one test for your extractor. Even if all videos covered by the extractor are expected to be inaccessible for automated testing, tests should still be added with a `skip` parameter indicating why the particular test is disabled from running.
  1. Have a look at [`yt_dlp/extractor/common.py`](yt_dlp/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](yt_dlp/extractor/common.py#L91-L426). Add tests and code for as many as you want.
  1. Make sure your code follows [yt-dlp coding conventions](#yt-dlp-coding-conventions) and check the code with [flake8](https://flake8.pycqa.org/en/latest/index.html#quickstart):
@@ -252,7 +257,11 @@ ### Mandatory and optional metafields
   - `title` (media title)
   - `url` (media download URL) or `formats`
  
-The aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken. While, in fact, only `id` is technically mandatory, due to compatibility reasons, yt-dlp also treats `title` as mandatory. The extractor is allowed to return the info dict without url or formats in some special cases if it allows the user to extract usefull information with `--ignore-no-formats-error` - Eg: when the video is a live stream that has not started yet.
+The aforementioned metafields are the critical data that the extraction does not make any sense without and if any of them fail to be extracted then the extractor is considered completely broken. While all extractors must return a `title`, they must also allow it's extraction to be non-fatal.
+
+For pornographic sites, appropriate `age_limit` must also be returned.
+
+The extractor is allowed to return the info dict without url or formats in some special cases if it allows the user to extract usefull information with `--ignore-no-formats-error` - Eg: when the video is a live stream that has not started yet.
  
  [Any field](yt_dlp/extractor/common.py#219-L426) apart from the aforementioned ones are considered **optional**. That means that extraction should be **tolerant** to situations when sources for these fields can potentially be unavailable (even if they are always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields.
  
@@ -365,21 +374,21 @@ ### Provide fallbacks
  
  #### Example
  
-Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
+Say `meta` from the previous example has a `title` and you are about to extract it like:
  
  ```python
-title = meta['title']
+title = meta.get('title')
  ```
  
-If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
+If `title` disappears from `meta` in future due to some changes on the hoster's side the title extraction would fail.
  
-Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
+Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback like:
  
  ```python
  title = meta.get('title') or self._og_search_title(webpage)
  ```
  
-This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
+This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`, making the extractor more robust.
  
  
  ### Regular expressions
@@ -422,7 +431,7 @@ ##### Example
      r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
  ```
  
-Or even better:
+which tolerates potential changes in the `style` attribute's value. Or even better:
  
  ```python
  title = self._search_regex(  # correct
@@ -430,7 +439,7 @@ ##### Example
      webpage, 'title', group='title')
  ```
  
-Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute: 
+which also handles both single quotes in addition to double quotes.
  
  The code definitely should not look like:
  
@@ -524,13 +533,13 @@ #### Example
  Correct:
  
  ```python
-title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+title = self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title')
  ```
  
  Incorrect:
  
  ```python
-TITLE_RE = r'<title>([^<]+)</title>'
+TITLE_RE = r'<h1>([^<]+)</h1>'
  # ...some lines of code...
  title = self._html_search_regex(TITLE_RE, webpage, 'title')
  ```
@@ -633,7 +642,7 @@ ### Use convenience conversion and parsing functions
  
  Use `url_or_none` for safe URL processing.
  
-Use `try_get`, `dict_get` and `traverse_obj` for safe metadata extraction from parsed JSON.
+Use `traverse_obj` and `try_call` (superseeds `dict_get` and `try_get`) for safe metadata extraction from parsed JSON.
  
  Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction. 
  
@@ -654,6 +663,10 @@ ##### Safely extract more optional metadata
  view_count = int_or_none(video.get('views'))
  ```
  
+# My pull request is labeled pending-fixes
+
+The `pending-fixes` label is added when there are changes requested to a PR. When the necessary changes are made, the label should be removed. However, despite our best efforts, it may sometimes happen that the maintainer did not see the changes or forgot to remove the label. If your PR is still marked as `pending-fixes` a few days after all requested changes have been made, feel free to ping the maintainer who labeled your issue and ask them to re-review and remove the label.
+