[DailyWire] Add extractors (#4084)

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md

index dbd6a84b258063cf82f83d67e826c2941159e60e..e48d2ebd0c7b1e71b43d7eeefe21874631858854 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -178,7 +178,6 @@ ## Adding support for a new site
  1. Start with this simple template and save it to `yt_dlp/extractor/yourextractor.py`:
  
      ```python
-    # coding: utf-8
      from .common import InfoExtractor
      
      
@@ -375,21 +374,21 @@ ### Provide fallbacks
  
  #### Example
  
-Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
+Say `meta` from the previous example has a `title` and you are about to extract it like this:
  
  ```python
-title = meta['title']
+title = meta.get('title')
  ```
  
-If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
+If `title` disappears from `meta` in the future due to changes on the hoster's side, `meta.get('title')` will return `None` and the title extraction will fail.
  
-Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
+Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta tag of `webpage`. In this case you can provide a fallback like this:
  
  ```python
  title = meta.get('title') or self._og_search_title(webpage)
  ```
  
-This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
+This code tries to extract the title from `meta` first and, if that fails, falls back to extracting `og:title` from `webpage`, making the extractor more robust.
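A minimal, self-contained sketch of the same fallback pattern in plain Python. `og_search_title` and `extract_title` are hypothetical helpers written only for illustration: `og_search_title` is a rough regex stand-in for `InfoExtractor._og_search_title`, not its actual implementation, and it assumes `property` appears before `content` inside the tag.

```python
import re


def og_search_title(webpage):
    # Hypothetical stand-in for InfoExtractor._og_search_title: return the
    # content of <meta property="og:title" content="..."> or None if absent.
    # Assumes `property` comes before `content` in the tag.
    mobj = re.search(
        r'<meta[^>]+property=(["\'])og:title\1[^>]+content=(["\'])(?P<content>.*?)\2',
        webpage)
    return mobj.group('content') if mobj else None


def extract_title(meta, webpage):
    # Prefer the structured metadata; fall back to og:title when 'title'
    # is missing (or empty) in `meta`.
    return meta.get('title') or og_search_title(webpage)


webpage = '<head><meta property="og:title" content="Fallback title"></head>'
print(extract_title({'title': 'From meta'}, webpage))  # From meta
print(extract_title({}, webpage))  # Fallback title
```

Because `or` is used, an empty string in `meta['title']` also triggers the fallback, not just a missing key.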
  
  
  ### Regular expressions
@@ -432,7 +431,7 @@ ##### Example
      r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
  ```
  
-Or even better:
+which tolerates potential changes in the `style` attribute's value. Or even better:
  
  ```python
  title = self._search_regex(  # correct
@@ -440,7 +439,7 @@ ##### Example
      webpage, 'title', group='title')
  ```
  
-Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute: 
+which also handles single quotes in addition to double quotes around the `class` attribute's value.
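To see this tolerance in action, here is a standalone sketch using Python's `re` module. The pattern and the `search_title` helper are illustrative assumptions rather than code taken from yt-dlp; they show how `\b`, a captured quote with a `\1` backreference, and `[^>]+` together survive extra `style` content, attribute reordering and a switch between quote styles.

```python
import re


def search_title(html):
    # `\b` rejects tags such as <spanner>, `(["\'])` captures whichever quote
    # opens the class value and `\1` requires the same quote to close it,
    # while `[^>]+` / `[^>]*` skip any other attributes such as `style`.
    mobj = re.search(
        r'<span\b[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)', html)
    return mobj.group('title') if mobj else None


for html in (
    '<span style="position: absolute; left: 910px;" class="title">Hello</span>',
    "<span style='left: 0' class='title'>Hello</span>",
    '<span class="title" data-id="1">Hello</span>',
):
    print(search_title(html))  # Hello (for each sample)

print(search_title('<spanner class="title">Hello</spanner>'))  # None
```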
  
  The code definitely should not look like:
  
@@ -534,13 +533,13 @@ #### Example
  Correct:
  
  ```python
-title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+title = self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title')
  ```
  
  Incorrect:
  
  ```python
-TITLE_RE = r'<title>([^<]+)</title>'
+TITLE_RE = r'<h1>([^<]+)</h1>'
  # ...some lines of code...
  title = self._html_search_regex(TITLE_RE, webpage, 'title')
  ```
@@ -643,7 +642,7 @@ ### Use convenience conversion and parsing functions
  
  Use `url_or_none` for safe URL processing.
  
-Use `try_get`, `dict_get` and `traverse_obj` for safe metadata extraction from parsed JSON.
+Use `traverse_obj` and `try_call` (which supersede `dict_get` and `try_get`) for safe metadata extraction from parsed JSON.
  
  Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
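The idea behind safe traversal of parsed JSON can be sketched in plain Python. `get_path` below is a hypothetical, heavily simplified helper written for illustration; it is not the implementation or API of `traverse_obj`, which supports a much richer path syntax (see its docstring in `yt_dlp.utils`).

```python
def get_path(obj, *path, default=None):
    """Hypothetical, simplified illustration of safe traversal (not
    yt-dlp's traverse_obj): walk `path` through nested dicts/lists and
    return `default` instead of raising when a step is missing or has
    the wrong type."""
    for key in path:
        if isinstance(obj, dict):
            obj = obj.get(key)
        elif isinstance(obj, (list, tuple)) and isinstance(key, int):
            # Out-of-range indices yield None instead of raising IndexError
            obj = obj[key] if -len(obj) <= key < len(obj) else None
        else:
            return default
        if obj is None:
            return default
    return obj


data = {'video': {'formats': [{'url': 'https://example.com/v.mp4'}]}}
print(get_path(data, 'video', 'formats', 0, 'url'))  # https://example.com/v.mp4
print(get_path(data, 'video', 'thumbnails', 0, 'url'))  # None
```

Returning `default` instead of raising `KeyError`, `IndexError` or `TypeError` keeps a missing optional field from aborting the whole extraction.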