Apple is quietly tagging podcast episodes by topic

Earlier this year, I noticed something unexpected on Apple Podcasts web pages: a list of episode topics.

As far as I can tell, these topics aren’t displayed on-screen. Rather, they appear in the page source, and seem to suggest Apple is tagging individual podcast episodes by topic.

For example, here’s an episode of Today, Explained:

On-screen, Apple displays some basic information about this episode: publish date, episode notes, length, etc. Most of these details come directly from the Today, Explained RSS feed.

In addition to all those publicly-visible details, nestled in the web page source, one can find a list of topics: Ukraine, U.S. Politics, World politics, Council of Europe, United States government, Cultural anthropology, Russia, United States, Language geography, North America, U.S. House of Representatives, U.S. Democratic Party, War in Donbass, U.S. Congress, 2014 Russian military intervention in Ukraine, Joe Biden, Vladimir Putin, Elon Musk, and Nikita Khrushchev.

Impressively, many of these topics (e.g. Musk, Putin, Khrushchev) do not appear within episode metadata in the Today, Explained RSS feed. They’re not included in the episode title, description, or anywhere else.

So where are they coming from? Seemingly, the audio itself. This is easy enough to confirm: listen to the episode, or check the transcript.

It’s not just Today, Explained, either. Apple has assigned topics to many podcast episodes. When I checked in early November 2022, the top 250 shows in the Apple Top Shows list (US) contained a total of 70,094 episodes. Of those, 44,516 episodes (approximately 63.5%) were tagged with topics.

What’s going on here?

Here’s my best guess: Apple is using machine-generated transcriptions, then applying natural language processing techniques like topic modeling to generate lists of relevant topics on an episode-by-episode basis.
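To make that guess concrete, here is a toy sketch of the idea in Python. It is not Apple's method, which isn't public. Instead of real topic modeling, it substitutes simple phrase matching against a small hand-made lookup table, and scores each topic by its share of all matches. The table, the sample transcript, and the function name extract_topics are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical lookup table: phrase as spoken -> topic label.
# A production system would use trained models, not a hand-made list.
GAZETTEER = {
    "elon musk": "Elon Musk",
    "musk": "Elon Musk",
    "vladimir putin": "Vladimir Putin",
    "putin": "Vladimir Putin",
    "ukraine": "Ukraine",
}


def extract_topics(transcript, gazetteer=GAZETTEER):
    """Return (topic, share_of_mentions) pairs, most frequent first."""
    # Try longer phrases first so "elon musk" is counted once, not twice.
    phrases = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b",
        re.IGNORECASE,
    )
    counts = Counter(
        gazetteer[match.group(1).lower()]
        for match in pattern.finditer(transcript)
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return [(topic, n / total) for topic, n in counts.most_common()]


# Made-up transcript snippet, standing in for machine transcription output.
sample = (
    "Elon Musk tweeted about Ukraine. Putin responded, and Musk replied. "
    "Ukraine objected; Ukraine's allies agreed."
)
topics = extract_topics(sample)
for topic, share in topics:
    print(f"{topic}: {share:.2f}")
```

The share-of-mentions score is only a stand-in for whatever relevance measure a real system computes.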

Similar approaches have been used by organizations like The New York Times to improve their recommendation engines. Applying these techniques to spoken-word podcast audio makes sense, given Apple’s capabilities in what they call Natural Language Processing and Speech Technologies.

Apple’s episode topic tagging seems similar to Musixmatch Podcasts, an AI-powered podcast transcription service. As Ivan Mehta explains in TechCrunch:

Musixmatch’s podcast platform automatically generates transcription every day for some of the top podcast episodes across different topics and charts. It’s using its NLP base model architecture, Umberto, to tag keywords such as places, people and topics with Wikipedia IDs — alphanumeric IDs that are linked to topics on Wikipedia.

Speaking of Wikipedia…

Unstructured audio, meet structured data

Most (but not all) of Apple’s podcast episode topics map directly to Wikidata identifiers. This is fascinating.

To me, this seems like an attempt to take the messy, chaotic, and unruly world of spoken word audio, and bind it to the highly-structured world of modeled data from projects like Wikipedia.

Not only could this lead to new and better podcast discovery features within Apple Podcasts, but greater use of structured data also represents a potential win for podcast search and audio SEO.

We can see evidence Apple is using these topics within its existing search functions today. For example, when I searched the Apple Podcasts app for “War in Donbass,” the top recommended episode was from The Inquiry:

The phrase “War in Donbass” does not appear in this episode’s title or description, yet this episode ranks highly for that search phrase. How? Topics are a likely part of the answer.

Indeed, when I peeked under the hood, I saw that War in Donbass (Wikidata identifier Q16335075) is listed as a highly relevant topic for this episode of The Inquiry.
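If you want to check what a Wikidata identifier like this one refers to, Wikidata serves each item as JSON through its Special:EntityData endpoint. Here is a minimal Python sketch, using only the standard library. The function names and the User-Agent string are my own placeholders, and the parsing assumes the usual entities / labels / en / value layout of that JSON.

```python
import json
from urllib.request import Request, urlopen


def entity_url(qid):
    """Build the Wikidata JSON URL for an identifier such as 'Q16335075'."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def english_label(entity_data, qid):
    """Pull the English label out of already-parsed entity JSON.

    Returns None if the item or its English label is missing.
    """
    entity = entity_data.get("entities", {}).get(qid, {})
    return entity.get("labels", {}).get("en", {}).get("value")


def fetch_label(qid):
    """Download the item from Wikidata and return its English label."""
    request = Request(
        entity_url(qid),
        # Placeholder User-Agent; identify your own script here.
        headers={"User-Agent": "topic-lookup-example/0.1"},
    )
    with urlopen(request, timeout=10) as response:
        return english_label(json.load(response), qid)
```

Calling fetch_label("Q16335075") needs network access and returns whatever English label Wikidata currently holds for that item.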

Relevance, ranked

Every topic I found in Apple’s system has a per-episode relevance score. For example, here’s the list of topics for our Today, Explained episode, ranked by relevance:

It’s not difficult to see how Apple could use relevance-weighted topic data to significantly improve podcast episode discovery, especially when coupled with other podcast listening behaviour data that Apple tracks.

For example, imagine being able to explore episodes on a particular topic, published in the last week/month/year, ranked by a combination of topic relevance and total listening time among all Apple Podcasts users. Or imagine diving into the back catalog of a long-running show through the lens of topics you care most about.
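As a rough illustration of that first idea, here is a Python sketch that filters episodes to a recent window, then ranks them by a weighted blend of topic relevance and listening time. Everything in it is hypothetical: the Episode fields, the assumption that relevance scores fall between 0 and 1, the 70/30 weighting, and the sample data. None of it reflects how Apple actually ranks anything.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Episode:
    title: str
    published: date
    topic_relevance: dict   # topic -> relevance, assumed to be 0..1
    listening_hours: float  # total listening time, made-up figures


def rank_episodes(episodes, topic, today, window_days=30, relevance_weight=0.7):
    """Rank recent episodes on a topic by relevance blended with listening time."""
    cutoff = today - timedelta(days=window_days)
    candidates = [
        e for e in episodes
        if e.published >= cutoff and topic in e.topic_relevance
    ]
    if not candidates:
        return []
    # Scale listening time to 0..1 so it is comparable to relevance.
    max_hours = max(e.listening_hours for e in candidates) or 1.0

    def score(e):
        popularity = e.listening_hours / max_hours
        return (relevance_weight * e.topic_relevance[topic]
                + (1 - relevance_weight) * popularity)

    return sorted(candidates, key=score, reverse=True)


today = date(2022, 11, 15)
episodes = [
    Episode("Deep dive", today - timedelta(days=5), {"Ukraine": 0.9}, 100),
    Episode("Passing mention", today - timedelta(days=10), {"Ukraine": 0.4}, 1000),
    Episode("Old but relevant", today - timedelta(days=90), {"Ukraine": 0.95}, 500),
    Episode("Off topic", today - timedelta(days=2), {"Russia": 0.8}, 800),
]
ranked = rank_episodes(episodes, "Ukraine", today)
for episode in ranked:
    print(episode.title)
```

Raising relevance_weight favours episodes that are squarely about the topic; lowering it favours episodes that are simply popular.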

What do Apple’s topics mean for podcasters?

It seems Apple is extracting topics from automated transcripts, and using those topics to help power its podcast search features. If true, I see several important implications for podcasters.

  • Podcast SEO goes well beyond the textual elements in your show’s RSS feed
  • For some podcasts, it’s possible to see what Apple thinks your show is about
  • Like Apple’s transcript search, there’s no way for creators to directly control what Apple’s topic analysis thinks an episode is about. Whereas podcasters can directly manipulate metadata like title and description text through their RSS feeds, there doesn’t seem to be any way to edit, revise, or correct the topics Apple assigns to their episodes.
  • Combined with Apple’s podcast category data, episode topics could be used to identify editorial trends over time. Think Google Trends, but for podcasts.
  • Apple’s podcast topic identification could enable new and exciting discovery features, and help listeners find relevant episodes in an increasingly crowded landscape

I’m fascinated by Apple’s efforts to associate spoken-word audio with structured topic data, and I look forward to exploring this data and seeing what listener-facing features Apple will continue to build on top of it.

What does Apple Podcasts think your show is about?

Thanks to John Spurlock for reviewing an early draft of this post.

