1. Start with this simple template and save it to `yt_dlp/extractor/yourextractor.py`:
```python
- # coding: utf-8
from .common import InfoExtractor
#### Example
-Say `meta` from the previous example has a `title` and you are about to extract it. Since `title` is a mandatory meta field you should end up with something like:
+Say `meta` from the previous example has a `title` and you are about to extract it as follows:
```python
-title = meta['title']
+title = meta.get('title')
```
-If `title` disappears from `meta` in future due to some changes on the hoster's side the extraction would fail since `title` is mandatory. That's expected.
+If `title` disappears from `meta` in the future due to some changes on the hoster's side, the title extraction would fail.
-Assume that you have some another source you can extract `title` from, for example `og:title` HTML meta of a `webpage`. In this case you can provide a fallback scenario:
+Assume that you have another source you can extract `title` from, for example the `og:title` HTML meta of a `webpage`. In this case you can provide a fallback like:
```python
title = meta.get('title') or self._og_search_title(webpage)
```
-This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`.
+This code will try to extract from `meta` first and if it fails it will try extracting `og:title` from a `webpage`, making the extractor more robust.
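The fallback chain can be tried outside an extractor with a minimal, standalone sketch. Here `og_title` is a hypothetical, simplified stand-in for `self._og_search_title` (it is not yt-dlp's implementation), and `extract_title` is an illustrative name:

```python
import re


def og_title(webpage):
    # Hypothetical stand-in for self._og_search_title:
    # returns the og:title content, or None when the tag is absent.
    m = re.search(
        r'<meta[^>]+property=(["\'])og:title\1[^>]+content=(["\'])(?P<title>.*?)\2',
        webpage)
    return m.group('title') if m else None


def extract_title(meta, webpage):
    # Prefer the JSON metadata; fall back to the HTML meta tag.
    return meta.get('title') or og_title(webpage)
```

Because `meta.get('title')` returns `None` rather than raising when the key is missing, the `or` moves on to the second source.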
### Regular expressions
r'<span[^>]+class="title"[^>]*>([^<]+)', webpage, 'title')
```
-Or even better:
+which tolerates potential changes in the `style` attribute's value. Or even better:
```python
title = self._search_regex( # correct
webpage, 'title', group='title')
```
-Note how you tolerate potential changes in the `style` attribute's value or switch from using double quotes to single for `class` attribute:
+which also handles both single and double quotes for the `class` attribute.
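To see the effect of a relaxed pattern, here is a small sketch using the standard `re` module directly instead of `self._search_regex`. The pattern and the sample HTML strings are illustrative assumptions, not taken from a real site:

```python
import re

# A backreference (\1) makes the closing quote match the opening one,
# and [^>]+ / [^>]* tolerate extra attributes around `class`.
TOLERANT_TITLE = r'<span[^>]+class=(["\'])title\1[^>]*>(?P<title>[^<]+)'


def find_title(webpage):
    m = re.search(TOLERANT_TITLE, webpage)
    return m.group('title') if m else None


double_quoted = '<span style="position: absolute" class="title">Some title</span>'
single_quoted = "<span class='title' style='left: 0'>Some title</span>"
```

Both sample strings yield the same title, even though the quote style and attribute order differ.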
The code definitely should not look like:
Correct:
```python
-title = self._html_search_regex(r'<title>([^<]+)</title>', webpage, 'title')
+title = self._html_search_regex(r'<h1>([^<]+)</h1>', webpage, 'title')
```
Incorrect:
```python
-TITLE_RE = r'<title>([^<]+)</title>'
+TITLE_RE = r'<h1>([^<]+)</h1>'
# ...some lines of code...
title = self._html_search_regex(TITLE_RE, webpage, 'title')
```
Use `url_or_none` for safe URL processing.
-Use `try_get`, `dict_get` and `traverse_obj` for safe metadata extraction from parsed JSON.
+Use `traverse_obj` and `try_call` (supersedes `dict_get` and `try_get`) for safe metadata extraction from parsed JSON.
Use `unified_strdate` for uniform `upload_date` or any `YYYYMMDD` meta field extraction, `unified_timestamp` for uniform `timestamp` extraction, `parse_filesize` for `filesize` extraction, `parse_count` for count meta fields extraction, `parse_resolution`, `parse_duration` for `duration` extraction, `parse_age_limit` for `age_limit` extraction.
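The idea behind safe metadata extraction can be illustrated with a tiny, standalone helper. `safe_get` below is hypothetical and far simpler than `traverse_obj`; it only shows why a missing key or index should yield a default instead of an exception:

```python
def safe_get(obj, *path, default=None):
    """Follow keys/indices in `path`; return `default` instead of raising."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, out-of-range index, or a non-subscriptable value
            return default
    return default if obj is None else obj


# Illustrative parsed JSON, not from a real API
meta = {'video': {'formats': [{'url': 'https://example.com/v.mp4'}]}}
```

With this shape, `safe_get(meta, 'video', 'formats', 0, 'url')` returns the URL, while a path that does not exist returns `None` and lets the extractor continue with other fields.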