Some technical thoughts on podcasting

08 July 2017

I've been podcasting for over four years now, and I'm getting game recordings together for the Whartson Hall Æthernauts too; I hope some of this advice may be useful to people who are getting started.

Biggest single piece of advice, because even now I still meet podcasts that don't do it: add ID3 tags for title, speakers, etc., to the MP3 file. Not only is this kinder to the end user, it helps with iTunes should you push it there. I use id3v2, a command-line program for Linux; other packages have their own methods.
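
As a rough sketch of the tagging step, assuming the id3v2 command-line tool is installed and on the PATH, a small Python wrapper might look like this. The function names, file name and tag values are made up for illustration; the flags used (-t, -a, -A, -y) are id3v2's short options for title, artist, album and year.

```python
import subprocess


def build_id3v2_command(mp3_path, title, artist, album, year):
    """Build the argument list for an id3v2 invocation (nothing is run here)."""
    return [
        "id3v2",
        "-t", title,      # title of this episode
        "-a", artist,     # artist: a sensible place to list the speakers
        "-A", album,      # album: typically the name of the podcast
        "-y", str(year),  # year of release
        mp3_path,
    ]


def tag_episode(mp3_path, title, artist, album, year):
    """Run id3v2 on the file; raises CalledProcessError if tagging fails."""
    cmd = build_id3v2_command(mp3_path, title, artist, album, year)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical episode details, printed rather than executed.
    print(build_id3v2_command(
        "episode-042.mp3",
        title="Episode 42: Example Title",
        artist="Alice and Bob",
        album="Example Podcast",
        year=2017,
    ))
```

Building the command as a list rather than a single string means titles with spaces or quotes need no shell escaping.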

Some sort of music or "sting" can be helpful; I usually use about fifteen seconds, for introduction and lead-out and to make a gap between segments. (This particularly helps if you take a break while recording or move segments out of recording order, because your voice will sound different when you come back.) The canonical source for royalty-free music is (you'll see credits for "music by Kevin MacLeod" on a lot of no-budget films too); there's also the Audio Library YouTube channel, should you be able to be a STREAMING PIRATE and download the audio from there with your EVIL HACKING TOOLS.

For game recordings I use a longer intro, and usually mix the music with bits of dialogue from the first few sessions to give listeners some idea of what they'll be in for.

It's easier to chop things out than to add things. With multiple people, if you end up talking over each other, everyone should stop, wait for a bit, and start again. Same if you cough. Just deal with it, skip back in what you're saying, and have another try, rather than trying to salvage the phrase you've started.

I'm still editing in Audacity, though I hear good things about Ardour and keep meaning to give it a try some time. My workflow for editing a segment (either 10-20 minutes of chat for a podcast, or one whole game session) is:

  • run it through noise reduction, using as the noise profile the five seconds of ambient room noise I recorded before we started talking
  • edit for content, removing fluffs, clicks, etc.; this is the time-consuming bit, and for Whartson Hall I usually don't do this at all (see the name) unless someone has said something they'd rather not have in the recording.
  • run the whole thing through a compressor; I use the default settings on Compress Dynamics, available from the Audacity Podcast since the author died a while ago. This helps get the levels more consistent across time (i.e. not getting gradually louder or softer).
  • run the whole thing through Normalise (left/right channels stay linked).
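
To illustrate why the channels stay linked in the last step, here is a minimal sketch of peak normalisation with one shared gain. It is not Audacity's code: the -1 dB target, the function names and the float sample format (-1.0 to 1.0) are all assumptions for the example.

```python
def peak(samples):
    """Largest absolute sample value in a channel (0.0 for an empty channel)."""
    return max((abs(s) for s in samples), default=0.0)


def normalise_linked(left, right, target_db=-1.0):
    """Scale both channels by one shared gain so the louder peak hits target_db.

    Samples are assumed to be floats in the range -1.0 to 1.0, and
    target_db is measured relative to full scale.
    """
    loudest = max(peak(left), peak(right))
    if loudest == 0.0:
        return list(left), list(right)  # pure silence: nothing to scale
    target_amplitude = 10 ** (target_db / 20)
    gain = target_amplitude / loudest
    return [s * gain for s in left], [s * gain for s in right]


if __name__ == "__main__":
    left, right = normalise_linked([0.1, -0.5, 0.25], [0.05, 0.2, -0.1])
    print(left, right)
```

Because the same gain is applied to both sides, the balance between left and right is unchanged; normalising each channel on its own would pull a quiet channel up to match the loud one and shift the stereo image.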

For game sessions, I have one stereo track pair for me, and one for everyone else (playing remotely), so usually I chop off any chat from the start of the session first (keeping the tracks synchronised), then run through the noise reduction, compression, normalisation chain separately for each pair.

In Audacity, note that ctrl-click on Play will play a few seconds before and a few seconds after the selected region, i.e. what it'll sound like if you cut that bit out. Which is quicker than cut-play-undo.

Also note that Edit → Find Zero Crossings will do what it says, which helps reduce clicking when you're removing a chunk.
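
For the curious, the idea behind a zero-crossing search can be sketched in a few lines of Python. This is an illustration of the principle, not how Audacity implements it; the function name and the plain list of samples are assumptions.

```python
def nearest_zero_crossing(samples, index):
    """Return the position closest to index where the signal crosses zero.

    Position i counts as a crossing if samples[i] is exactly zero, or if
    samples[i - 1] and samples[i] have different signs. Returns None if
    the signal never crosses zero.
    """
    def is_crossing(i):
        if samples[i] == 0:
            return True
        return i > 0 and (samples[i - 1] < 0) != (samples[i] < 0)

    # Search outwards from index, one step further each time.
    for distance in range(len(samples)):
        for i in (index - distance, index + distance):
            if 0 <= i < len(samples) and is_crossing(i):
                return i
    return None
```

Cutting at such points means the waveform on either side of the join is near zero, so there is no sudden jump in level to be heard as a click.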

I particularly aim to remove clicks (i.e. very loud transients), which will confuse the compression and normalisation stages. Constant background noise will be taken out by noise reduction, but transients are a Pain. Hand claps, coughs, chair creaks (we actually use plastic garden chairs on the podcast to minimise this), ice clinks, a glass being put down: all these things are surprisingly noisy.

When I assemble the segments with music and any effects (each distinct thing gets its own track, much like layers in image editing), I also tweak the overall track level faders for a bit more consistency. With the setup I use, I generally drop the music about 5-10dB relative to the speech. Some podcasters like to have REALLY LOUD music between segments, in the style of TV advertisements to grab the listener's attention if it's wandered, but I don't favour this.
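
In case the decibel figures are unfamiliar: a change of N dB corresponds to multiplying the amplitude by 10^(N/20), so -5 dB is a factor of roughly 0.56 and -10 dB roughly 0.32. A minimal sketch, with hypothetical function names and samples held as plain lists of floats:

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix(speech, music, music_db=-6.0):
    """Sum speech and music sample by sample, with the music turned down.

    The shorter track is padded with silence so both are the same length.
    """
    gain = db_to_gain(music_db)
    length = max(len(speech), len(music))
    padded_speech = list(speech) + [0.0] * (length - len(speech))
    padded_music = list(music) + [0.0] * (length - len(music))
    return [s + m * gain for s, m in zip(padded_speech, padded_music)]
```

The -6 dB default here is just a value inside the 5-10 dB range mentioned above, not a recommendation.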

Usually I use the start of the music track and fade out over 5-10 seconds when the speech starts. Sometimes I fade the music in over the last few seconds of the previous segment to avoid any gaps. This is most easily done with the Envelope tool.
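
What the fade does to the samples can be sketched as below. The straight-line ramp is an assumption made for simplicity, not a claim about the curve Audacity's Envelope tool draws, and the function name and parameters are hypothetical.

```python
def fade_out(samples, sample_rate, start_seconds, fade_seconds):
    """Linearly fade to silence from start_seconds, over fade_seconds.

    Everything after the end of the fade is silent. Returns a new list.
    """
    start = int(start_seconds * sample_rate)
    fade_len = int(fade_seconds * sample_rate)
    out = []
    for i, s in enumerate(samples):
        if i < start:
            out.append(s)  # before the fade: untouched
        elif fade_len > 0 and i < start + fade_len:
            # Gain falls in a straight line from 1.0 towards 0.0.
            out.append(s * (1.0 - (i - start) / fade_len))
        else:
            out.append(0.0)  # after the fade: silence
    return out
```

For example, `fade_out(music, 44100, 15, 8)` would leave the first fifteen seconds of a 44.1 kHz track alone and fade it away over the following eight.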

Overall editing takes me very roughly 1-1.5× the run-time of the show. I've got faster at it over time.

Hardware is the least important thing, so I'm mentioning it last. I'm still using the Tascam DR-40 recorder I picked up late in 2015, but I now combine it with an anti-shock mount (which sits between recorder and tripod and cuts down on noise from table-knocking), and a "dead kitten" windgag when I'm recording outside (which I tend to do in the summer). That's really it. Everything else is free software and knowing how to use it.

  1. Posted by Owen Smith at 10:43pm on 08 July 2017

    I recommend MP3Tag for tagging MP3 and FLAC files on Windows. It may do other filetypes, those are the only ones I've wanted to tag. It's free, there's an option to donate and when I did so (because I'd had so much use out of it) the author emailed me and seemed quite surprised I'd donated.

    I use Audacity for audio editing on Windows on the odd occasion I do any. Usually that's male voice choirs or brass band live recordings that some other idiot made that I'm trying to knock into shape.

