Podcasts provide highly diverse content to a
massive listener base through a unique on-demand
modality. However, limited data has
prevented large-scale computational analysis
of the podcast ecosystem. To fill this gap, we
introduce a massive dataset of over 1.1M podcast
transcripts that is largely comprehensive of
all English-language podcasts available through
public RSS feeds from May and June of 2020.
This data is not limited to text, but includes
metadata, inferred speaker roles, and, for a
subset of 370K episodes, audio features and
speaker turns. Using this data, we conduct a
foundational investigation into the content, structure,
and responsiveness of this ecosystem. Together,
our data and analyses open the door to continued
computational research on this popular and
impactful medium.