# yt-cast: Generating podcasts from YouTube URLs

Start with a config.json like this:

    {
        "brandon-sanderson": [
        ]
    }


Tested URLs include:

• Playlist URLs (like the above)
• Single video URLs
• Channel URLs

Most other YouTube URLs should work too.
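To make the config format concrete, here is a minimal sketch of reading it. The `load_config` helper and the placeholder URLs are assumptions for illustration (the IDs are not real). The only thing taken from the post is the shape: each key names a feed and maps to a list of YouTube URLs.

```python
import json

# Placeholder URLs for illustration only; the IDs are not real.
EXAMPLE_CONFIG = '''
{
    "example-feed": [
        "https://www.youtube.com/playlist?list=PLAYLIST_ID",
        "https://www.youtube.com/watch?v=VIDEO_ID",
        "https://www.youtube.com/channel/CHANNEL_ID"
    ]
}
'''

def load_config(text):
    """Parse the config and check that every feed key maps to a list of URL strings."""
    config = json.loads(text)
    for key, urls in config.items():
        if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
            raise ValueError(f'{key!r} must map to a list of URL strings')
    return config

config = load_config(EXAMPLE_CONFIG)
for key, urls in config.items():
    # Each key becomes one feed, served at /<key>.xml
    print(f'/{key}.xml <- {len(urls)} source URL(s)')
```

Checking the shape up front is optional, but it turns a typo in the config into one clear error instead of a confusing failure inside a background thread.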

How? With these parts:

• The update-thread to download all the playlists and metadata
• The download-thread to download the actual episodes and convert them to mp3s
• Flask routes to generate the RSS feed and return the mp3s
• The main function
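As a rough sketch of how those parts could fit together (not the actual yt-cast code): one thread queues work, another drains it, and both run as daemon threads beside the web server. The names `download_queue`, `update_once` and `download_worker` are made up for illustration, and the downloads are faked with a list.

```python
import queue
import threading

download_queue = queue.Queue()   # hypothetical name: hands work from updater to downloader
downloaded = []                  # stands in for mp3 files on disk

def update_once(urls):
    """Stand-in for one pass of the update thread: queue every known video."""
    for url in urls:
        download_queue.put(url)

def download_worker():
    """Stand-in for the download thread: handle queued videos one at a time."""
    while True:
        url = download_queue.get()
        downloaded.append(url)       # a real worker would download and convert here
        download_queue.task_done()

# daemon=True means the worker exits together with the main program
threading.Thread(target=download_worker, daemon=True).start()

update_once(['video-a', 'video-b'])
download_queue.join()                # block until the worker has drained the queue
print(downloaded)
```

A `queue.Queue` is thread-safe, so the two threads need no extra locking to hand URLs to each other.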

    # A thread to download information on the requested URLs periodically
    # Update once a minute, but caches will probably mostly be used
    while True:
        with open(CONFIG_FILE, 'r') as fin:
            config = json.load(fin)

        for key in config:
            for url in config[key]:
                path = path_for(url)

                if not os.path.exists(path) or os.path.getmtime(path) + CACHE_TTL < time.time():
                    logging.info(f'Fetching {url}')
                    try:
                        # Fetch metadata only; the download thread fetches the media
                        with youtube_dl.YoutubeDL(YDL_OPTS) as ydl:
                            info = ydl.extract_info(url, download=False)

                        with open(path, 'w') as fout:
                            json.dump(info, fout)

                        if 'entries' in info:
                            for entry in info['entries']:
                                ...  # queue each entry of a playlist or channel for download
                        else:
                            ...  # queue the single video for download

                    except Exception as ex:
                        logging.error(f'Failed to fetch {url}: {ex}')

        time.sleep(60)


If the result has entries, the URL points at more than one video (a playlist or channel), so include them all. If not, it’s a single video. I did add a cutoff function to make sure I didn’t download every video on long feeds:

    # How far back in history the feed should go, as a YYYYMMDD date string
    def cutoff():
        return (datetime.date.today() - datetime.timedelta(**CUTOFF)).strftime('%Y%m%d')
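Here is a small sketch of how that cutoff can be used. The value of `CUTOFF`, the optional `today` parameter, and the `is_recent` helper are my assumptions for illustration. The post only shows that `CUTOFF` holds keyword arguments for `datetime.timedelta`.

```python
import datetime

CUTOFF = {'days': 30}  # assumed value; any datetime.timedelta keyword arguments work

def cutoff(today=None):
    """Return the oldest upload date to keep, formatted as YYYYMMDD."""
    today = today or datetime.date.today()
    return (today - datetime.timedelta(**CUTOFF)).strftime('%Y%m%d')

def is_recent(entry, today=None):
    # YYYYMMDD strings sort the same way as the dates they represent,
    # so a plain string comparison is enough.
    return entry['upload_date'] >= cutoff(today)

today = datetime.date(2021, 3, 15)
print(cutoff(today))                                   # 20210213
print(is_recent({'upload_date': '20210301'}, today))   # True
print(is_recent({'upload_date': '20210101'}, today))   # False
```

Because both sides are zero-padded YYYYMMDD strings, there is no need to parse the upload date back into a `date` object before comparing.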


Next, generate a hashed filename for each URL (mostly so that unusual characters in a URL can’t cause problems in a filename):

    # Get the cache file for a url
    def path_for(url):
        digest = hashlib.md5(url.encode()).hexdigest()
        return f'data/{digest}.json'


The update loop also checks each cache file’s mtime against CACHE_TTL, so a URL is only re-fetched every so often, even though the loop itself runs once per minute.
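The cache check is easy to try in isolation. This is a sketch under a few assumptions: the one-hour `CACHE_TTL`, the extra `root` and `now` parameters, and the `is_stale` helper are mine, added so the example can run in a temporary directory.

```python
import hashlib
import os
import tempfile
import time

CACHE_TTL = 60 * 60  # assumed: one hour, in seconds

def path_for(url, root='data'):
    """Map a URL to a stable, filesystem-safe cache filename."""
    digest = hashlib.md5(url.encode()).hexdigest()
    return os.path.join(root, f'{digest}.json')

def is_stale(path, now=None):
    """True when the cache file is missing or older than CACHE_TTL."""
    now = time.time() if now is None else now
    return not os.path.exists(path) or os.path.getmtime(path) + CACHE_TTL < now

with tempfile.TemporaryDirectory() as root:
    path = path_for('https://www.youtube.com/watch?v=VIDEO_ID', root)
    print(is_stale(path))            # True: nothing cached yet
    with open(path, 'w') as fout:
        fout.write('{}')
    print(is_stale(path))            # False: just written
    print(is_stale(path, now=time.time() + 2 * CACHE_TTL))  # True: TTL has passed
```

MD5 is fine here because the hash only needs to be stable and filename-safe, not secure.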

    # Download a single youtube video
    # Prepopulate with any missing videos
    for key in config:
        for url in config[key]:
            path = path_for(url)
            if not os.path.exists(path):
                continue

            with open(path, 'r') as fin:
                info = json.load(fin)

            if 'entries' in info:
                for entry in info['entries']:
                    ...  # download this entry
            else:
                ...  # download the single video

    while True:
        ...  # take the next newly queued video

        path = path_for(url)
        if os.path.exists(path):
            continue

        time.sleep(60)


It starts with a single pass of downloads ahead of time because the mp3 server doesn’t like missing files. If the files are already up to date, that pass goes very quickly, and the thread then enters the main loop, which goes through newly queued videos and downloads them. The download itself is actually the same call as getting the info, just for single videos and with download=True. Onwards!
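For reference, a download with mp3 conversion might look roughly like this. The post doesn’t show the contents of YDL_OPTS, so the options below are my guess at a reasonable set using standard youtube_dl option names. `build_ydl_opts`, `download`, and the `data` output directory are hypothetical.

```python
import os

def build_ydl_opts(out_dir='data'):
    """Options asking youtube_dl for the best audio stream, converted to mp3 by ffmpeg."""
    return {
        'format': 'bestaudio/best',
        # Name files after the video id (an assumption about where the mp3s live)
        'outtmpl': os.path.join(out_dir, '%(id)s.%(ext)s'),
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'quiet': True,
    }

def download(url, out_dir='data'):
    """Download one video and convert it to mp3 (needs youtube_dl and ffmpeg installed)."""
    import youtube_dl  # imported lazily so the options above can be inspected without it
    with youtube_dl.YoutubeDL(build_ydl_opts(out_dir)) as ydl:
        return ydl.extract_info(url, download=True)

print(build_ydl_opts()['postprocessors'][0]['preferredcodec'])
```

Naming the output after the video id is what would let a route like `/<id>.mp3` find the file later without any lookup table.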

First, let’s generate the XML files:

    @app.route('/<key>.xml')
    def podcast(key):
        entries = []

        for url in config[key]:
            path = path_for(url)
            if not os.path.exists(path):
                continue

            with open(path, 'r') as fin:
                info = json.load(fin)

            if 'entries' in info:
                for entry in info['entries']:
                    entries.append(entry)
            else:
                entries.append(info)

        # Newest uploads first
        entries = sorted(entries, key=lambda entry: entry['upload_date'], reverse=True)

        # Generate the XML
        return flask.Response(
            flask.render_template('podcast.xml', key=key, entries=entries, format_date=format_date),
            mimetype='application/atom+xml'
        )


The feed only includes videos newer than the cutoff (above). Most of the work is done in the Jinja template:
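The flatten, filter and sort steps can be sketched without Flask at all. `select_entries` and the sample data are hypothetical. The `oldest` argument is a YYYYMMDD string like the one the cutoff function returns.

```python
def select_entries(infos, oldest):
    """Flatten playlist and single-video info dicts, drop old uploads, newest first."""
    entries = []
    for info in infos:
        # Playlists and channels carry an 'entries' list; a single video is its own entry
        entries.extend(info['entries'] if 'entries' in info else [info])
    recent = [entry for entry in entries if entry['upload_date'] >= oldest]
    return sorted(recent, key=lambda entry: entry['upload_date'], reverse=True)

infos = [
    {'entries': [
        {'id': 'a', 'upload_date': '20210301'},
        {'id': 'b', 'upload_date': '20200101'},
    ]},
    {'id': 'c', 'upload_date': '20210310'},
]
print([entry['id'] for entry in select_entries(infos, oldest='20210213')])  # ['c', 'a']
```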

    <?xml version="1.0" encoding="UTF-8"?>
    <rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
      <channel>
        <title>{{ key }}</title>
        <language>en-us</language>
        <itunes:subtitle>Generated by yt-cast</itunes:subtitle>
        <itunes:summary>{{ key }}</itunes:summary>
        <description>{{ key }}</description>
        <itunes:owner>
          <itunes:email>me@example.com</itunes:email>
        </itunes:owner>
        <itunes:explicit>no</itunes:explicit>
        <itunes:category>Comedy</itunes:category>
        {% for entry in entries %}
        <item>
          <title>{{ entry['title'] }}</title>
          <itunes:summary></itunes:summary>
          <description>{{ entry['description'] }}</description>
          <enclosure url="{{ request.host_url }}{{ entry['id'] }}.mp3" type="audio/mpeg" length="1024"></enclosure>
          <itunes:author></itunes:author>
          <itunes:duration>00:00:01</itunes:duration>
          <itunes:explicit>no</itunes:explicit>
          <guid>{{ path }}</guid>
        </item>
        {% endfor %}
      </channel>
    </rss>


It’s only half complete, but it’s at least functional. One interesting thing I did learn is that itunes:duration and the length attribute of enclosure don’t actually have to be realistic values, but many podcast clients do require them to be set. Legacy!
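If you did want realistic values, two small helpers would cover most of it. This is only a sketch. The route passes a `format_date` function to the template, but its body isn’t shown, so the version here is one possible implementation. `format_duration` is a hypothetical helper that assumes the video info carries a `duration` in seconds.

```python
import datetime
import email.utils

def format_duration(seconds):
    """Format a length in seconds as HH:MM:SS for <itunes:duration>."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f'{hours:02d}:{minutes:02d}:{secs:02d}'

def format_date(upload_date):
    """Turn a YYYYMMDD upload date into the RFC 822 style date RSS uses for <pubDate>."""
    day = datetime.datetime.strptime(upload_date, '%Y%m%d')
    return email.utils.format_datetime(day.replace(tzinfo=datetime.timezone.utc))

print(format_duration(3725))      # 01:02:05
print(format_date('20210315'))    # Mon, 15 Mar 2021 00:00:00 +0000
```

The upload date has no time of day, so midnight UTC is used as a stand-in.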

The mp3 route is really quick. It does require that the requested id actually look like an ID (primarily to prevent a directory traversal attack, although Flask should already prevent that). Then it just sends the file back with send_file. This could be much more efficient with a reverse proxy (nginx etc.) in front of Flask to serve the static files, but in practice it seems to be working well enough.

    @app.route('/<id>.mp3')
    def episode(id):
        if not re.match(r'^[a-zA-Z0-9_-]+$', id):
            raise Exception('Close but no cigar')

        ...  # send the matching mp3 back with send_file
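The ID check is worth trying on its own. This sketch uses `re.fullmatch` rather than `re.match` with `^...$`, which is my substitution, not what the route does. `is_safe_id` and the sample inputs are made up for illustration.

```python
import re

ID_PATTERN = re.compile(r'[a-zA-Z0-9_-]+')

def is_safe_id(candidate):
    """Accept only letters, digits, underscores and hyphens, so no '/' or '.' gets through."""
    return ID_PATTERN.fullmatch(candidate) is not None

for candidate in ['abc_DEF-123', '../config', 'a/b', '', 'abc\n']:
    print(repr(candidate), is_safe_id(candidate))
```

The last input is the reason for the swap: in Python, `$` also matches just before a trailing newline, so `re.match(r'^[a-zA-Z0-9_-]+$', 'abc\n')` succeeds, while `fullmatch` requires the whole string to fit the pattern.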



# Main

Start it all up and we’re good to go:

    if __name__ == '__main__':
        ...  # start the update and download threads as daemon threads

        app.run(host='0.0.0.0')


Running the update and download threads as daemon threads means that when the server is shut down, the threads go with it.

# TODOs

• Automatically remove files that have passed the cutoff date

# Source

Full source: https://github.com/jpverkamp/yt-cast

If you have any ideas, send in a pull request or shoot me an email.