feat: Add auto-discovery from popular services (#225)
janw authored Jan 3, 2025
1 parent 7fe8f14 commit c0abf10
Showing 42 changed files with 141,479 additions and 92 deletions.
64 changes: 32 additions & 32 deletions .assets/podcast-archiver-dry-run.svg
14 changes: 7 additions & 7 deletions .assets/podcast-archiver-help.svg
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -62,5 +62,6 @@ repos:
language: system
require_serial: true
pass_filenames: false
always_run: true
files: ^podcast_archiver/config\.py$
types: [python]
29 changes: 29 additions & 0 deletions README.md
@@ -62,12 +62,41 @@ podcast-archiver --dir ~/Podcasts --feed https://feeds.feedburner.com/TheAnthrop

Podcast Archiver expects values to its `--feed/-f` argument to be URLs pointing to an [RSS feed of a podcast](https://archive.is/jYk3E).

If you are not certain whether the link you have for a show you like is an RSS feed, you can try passing it to Podcast Archiver directly. The archiver supports a variety of links from popular podcast players and platforms, including [Apple Podcasts](https://podcasts.apple.com/us/browse), [Overcast.fm](https://overcast.fm/), [Castro](https://castro.fm/), and [Pocket Casts](https://pocketcasts.com/):

```sh
# Archive from Apple Podcasts URL
podcast-archiver -f https://podcasts.apple.com/us/podcast/black-girl-gone-a-true-crime-podcast/id1556267741
# ... or just the ID
podcast-archiver -f 1556267741

# From Overcast podcast URL
podcast-archiver -f https://overcast.fm/itunes394775318/99-invisible
# ... or episode sharing links (will resolve to all episodes)
podcast-archiver -f https://overcast.fm/+AAyIOzrEy1g
```

#### Supported services

The table below lists most of the supported services and URLs. If you think that some service you like is missing here, [please let me know](https://github.com/janw/podcast-archiver/issues/new)!

| Service | Example URL |
| ------------------------------------- | -------------------------------------------------------------------------------------- |
| Apple Podcasts | <https://podcasts.apple.com/us/podcast/the-anthropocene-reviewed/id1342003491> |
| [Overcast](https://overcast.fm/) | <https://overcast.fm/itunes394775318/99-invisible>, <https://overcast.fm/+AAyIOzrEy1g> |
| [Castro](https://castro.fm/) | <https://castro.fm/podcast/f996ae94-70a2-4d9c-afbc-c70b5bacd120> |
| [SoundCloud](https://soundcloud.com/) | <https://soundcloud.com/chapo-trap-house> |

#### Local files

Feeds can also be "fetched" from a local file:

```bash
podcast-archiver -f file:/Users/janw/downloaded_feed.xml
```
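
The `file:` form works because of how Python's standard `urllib.parse.urlparse` splits such a URL: the `feed.py` change in this commit checks `parsed.scheme == "file"` and hands `parsed.path` to the parser. A minimal sketch of that split, assuming a helper named `local_feed_path` (hypothetical, not part of the project):

```python
from __future__ import annotations

from urllib.parse import urlparse


def local_feed_path(url: str) -> str | None:
    """Return the filesystem path of a file: URL, or None for any other scheme.

    Hypothetical helper for illustration only; the project itself inlines
    this check in FeedPage.from_url.
    """
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return parsed.path
    return None


print(local_feed_path("file:/Users/janw/downloaded_feed.xml"))
```

Anything that does not use the `file` scheme returns `None` here and would be fetched over the network instead.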

#### Testing without downloading

To find out whether you have the right feed, you can use the `--dry-run` option to output the discovered feed information and the episodes found. It prevents all downloads.

```sh
Expand Down
2 changes: 1 addition & 1 deletion config.yaml.example
@@ -1,5 +1,5 @@
## Podcast-Archiver configuration
## Generated using podcast-archiver v2.0.2
## Generated using podcast-archiver v2.1.0

# Field 'feeds': Feed URLs to archive.
#
5 changes: 5 additions & 0 deletions cspell.config.yaml
@@ -53,3 +53,8 @@ words:
- venv
- virtualenv
- willhaus
- rsps
- soundcloud
- logbuch
- netzpolitik
- wochendaemmerung
3 changes: 2 additions & 1 deletion hack/rich-codex.sh
@@ -3,6 +3,7 @@
TMPDIR=$(mktemp -d 2>/dev/null || mktemp -d -t 'tmpdir')

export FORCE_COLOR="1"
export TERM="xterm-256color"
export COLUMNS="120"
export CREATED_FILES="created.txt"
export DELETED_FILES="deleted.txt"
@@ -17,4 +18,4 @@ export PODCAST_ARCHIVER_IGNORE_DATABASE=true
# shellcheck disable=SC2064
trap "rm -rf '$TMPDIR'" EXIT

exec poetry run rich-codex --terminal-width $COLUMNS --notrim
poetry run rich-codex --terminal-width $COLUMNS --notrim --terminal-theme DIMMED_MONOKAI
3 changes: 2 additions & 1 deletion podcast_archiver/console.py
@@ -5,7 +5,8 @@

_theme = Theme(
{
"error": "bold dark_red",
"error": "dark_red bold",
"errorhint": "dark_red dim",
"warning": "orange1 bold",
"warning_hint": "orange1 dim",
"completed": "dark_cyan bold",
2 changes: 1 addition & 1 deletion podcast_archiver/constants.py
@@ -7,7 +7,7 @@
USER_AGENT = f"{PROG_NAME}/{__version__} (https://github.com/janw/podcast-archiver)"
ENVVAR_PREFIX = "PODCAST_ARCHIVER"

REQUESTS_TIMEOUT = 30
REQUESTS_TIMEOUT = (5, 30)

SUPPORTED_LINK_TYPES_RE = re.compile(r"^(audio|video)/")
DOWNLOAD_CHUNK_SIZE = 256 * 1024
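
The switch from `30` to `(5, 30)` relies on the `requests` convention that a single number applies to both the connect and the read phase, while a two-item tuple is read as `(connect, read)` in seconds. How the project passes this constant to its session is not shown in this diff, so the sketch below only illustrates that interpretation; `split_timeout` is a hypothetical name, not a `requests` or project function.

```python
from __future__ import annotations


def split_timeout(timeout: float | tuple[float, float]) -> tuple[float, float]:
    """Return (connect, read) seconds following the requests timeout convention.

    Hypothetical helper for illustration: a bare number covers both phases,
    a tuple sets them separately.
    """
    if isinstance(timeout, tuple):
        connect, read = timeout
        return float(connect), float(read)
    return float(timeout), float(timeout)


old_connect, old_read = split_timeout(30)       # previous value of REQUESTS_TIMEOUT
new_connect, new_read = split_timeout((5, 30))  # value after this commit
print(old_connect, old_read, new_connect, new_read)
```

Under that convention the read timeout stays at 30 seconds while the connect timeout drops to 5 seconds.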
4 changes: 4 additions & 0 deletions podcast_archiver/exceptions.py
@@ -45,3 +45,7 @@ class NotModified(PodcastArchiverException):
def __init__(self, info: FeedInfo, *args: object) -> None:
super().__init__(*args)
self.info = info


class NotSupported(PodcastArchiverException):
pass
37 changes: 26 additions & 11 deletions podcast_archiver/models/feed.py
@@ -10,7 +10,7 @@
from pydantic import AliasChoices, BaseModel, ConfigDict, Field, field_validator

from podcast_archiver.constants import MAX_TITLE_LENGTH
from podcast_archiver.exceptions import NotModified
from podcast_archiver.exceptions import NotModified, NotSupported
from podcast_archiver.logging import logger, rprint
from podcast_archiver.models.episode import EpisodeOrFallback
from podcast_archiver.models.field_types import LenientDatetime
@@ -90,6 +90,13 @@ def truncate_title(cls, value: str) -> str:
def field_titles(cls) -> list[str]:
return [field.title for field in cls.model_fields.values() if field.title]

@property
def alternate_rss(self) -> str | None:
for link in self.links:
if link.rel == "alternate" and link.link_type == "application/rss+xml":
return link.href
return None


class FeedPage(BaseModel):
model_config = ConfigDict(arbitrary_types_allowed=True)
@@ -103,38 +110,46 @@ class FeedPage(BaseModel):
episodes: list[EpisodeOrFallback] = Field(default_factory=list, validation_alias=AliasChoices("entries", "items"))

@classmethod
def parse_feed(cls, source: str | bytes, alt_url: str | None) -> FeedPage:
def parse_feed(cls, source: str | bytes, alt_url: str | None, retry: bool = False) -> FeedPage:
feedobj = feedparser.parse(source)
obj = cls.model_validate(feedobj)
if obj.bozo and (exc := obj.bozo_exception) and isinstance(exc, SAXParseException):
url = source if isinstance(source, str) and not alt_url else alt_url
if not obj.bozo:
return obj

if (fallback_url := obj.feed.alternate_rss) and not retry:
logger.info("Attempting to fetch alternate feed at %s", fallback_url)
return cls.from_url(fallback_url, retry=True)

url = source if isinstance(source, str) and not alt_url else alt_url
if (exc := obj.bozo_exception) and isinstance(exc, SAXParseException):
rprint(f"Feed content is not well-formed for {url}", style="warning")
rprint(f"Continuing processing but here be dragons ({exc.getMessage()})", style="warning_hint")
return obj
rprint(f"Attempting processing but here be dragons ({exc.getMessage()})", style="warning_hint")

raise NotSupported(f"Content at {url} is not supported")

@classmethod
def from_url(cls, url: str, *, known_info: FeedInfo | None = None) -> FeedPage:
def from_url(cls, url: str, *, known_info: FeedInfo | None = None, retry: bool = False) -> FeedPage:
parsed = urlparse(url)
if parsed.scheme == "file":
return cls.parse_feed(parsed.path, None)

if not known_info:
return cls.from_response(session.get_and_raise(url), alt_url=url)
return cls.from_response(session.get_and_raise(url), alt_url=url, retry=retry)

response = session.get_and_raise(url, last_modified=known_info.last_modified)
if response.status_code == HTTPStatus.NOT_MODIFIED:
logger.debug("Server reported 'not modified' from %s, skipping fetch.", known_info.last_modified)
raise NotModified(known_info)

instance = cls.from_response(response, alt_url=url)
instance = cls.from_response(response, alt_url=url, retry=retry)
if instance.feed.updated_time == known_info.updated_time:
logger.debug("Feed's updated time %s did not change, skipping fetch.", known_info.updated_time)
raise NotModified(known_info)

return instance

@classmethod
def from_response(cls, response: Response, alt_url: str | None) -> FeedPage:
instance = cls.parse_feed(response.content, alt_url=alt_url)
def from_response(cls, response: Response, alt_url: str | None, retry: bool) -> FeedPage:
instance = cls.parse_feed(response.content, alt_url=alt_url, retry=retry)
instance.feed.last_modified = response.headers.get("Last-Modified")
return instance
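
The discovery logic in this file boils down to: if the fetched content does not parse cleanly as a feed, look for a `rel="alternate"` link of type `application/rss+xml` and retry once against that URL. The following is a simplified, self-contained sketch of that control flow, not the project's code: `Link`, `Page`, `resolve`, and the injected `fetch` callable are hypothetical stand-ins for the pydantic models, `feedparser`, and the HTTP session, and `ValueError` stands in for `NotSupported`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Link:
    rel: str
    link_type: str
    href: str


@dataclass
class Page:
    bozo: bool  # True when the content did not parse as a well-formed feed
    links: list[Link] = field(default_factory=list)


def alternate_rss(page: Page) -> str | None:
    """Return the first advertised RSS alternate link, if any."""
    for link in page.links:
        if link.rel == "alternate" and link.link_type == "application/rss+xml":
            return link.href
    return None


def resolve(url: str, fetch: Callable[[str], Page], retry: bool = False) -> Page:
    """Fetch url and follow one alternate-RSS hop if the content is not a feed."""
    page = fetch(url)
    if not page.bozo:
        return page
    fallback = alternate_rss(page)
    if fallback and not retry:
        # The retry flag allows exactly one extra hop, which prevents loops.
        return resolve(fallback, fetch, retry=True)
    raise ValueError(f"Content at {url} is not supported")


# Usage with an in-memory fake instead of real HTTP requests.
pages = {
    "https://example.com/show": Page(
        bozo=True,
        links=[Link("alternate", "application/rss+xml", "https://example.com/feed.xml")],
    ),
    "https://example.com/feed.xml": Page(bozo=False),
}
feed = resolve("https://example.com/show", pages.__getitem__)
print(feed.bozo)
```

The sketch leaves out the `SAXParseException` warning branch of the real `parse_feed`; it only shows why the `retry` argument is threaded through `from_url`, `from_response`, and `parse_feed`.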