Extracting Audio from Multimedia (Video) Files

Objective

From a multimedia (video) file, extract the audio into a separate audio file for further processing or playback using an audio player.

Assumptions, Constraints and Considerations

  1. Free/open source software.
  2. Command line tools for maximum flexibility.
  3. Portable (supported by multiple platforms).

A "Swiss Army knife" for multimedia files is the ffmpeg tool, which has ports for Linux, *BSD, macOS and Microsoft Windows.

Examining Containers

Multimedia files are stored in a "container" format that holds one or more video and audio streams. Container formats include mp4, mkv and webm. The ffprobe program (part of the ffmpeg suite) can be used to examine a container file, giving details of each media stream it contains; for example:

$ ffprobe <filename>.mkv
Input #0, matroska,webm from '<filename>.mkv':
  Metadata:
    COMPATIBLE_BRANDS: iso6avc1mp41
    MAJOR_BRAND     : dash
    MINOR_VERSION   : 0
    ENCODER         : Lavf57.56.101
  Duration: 00:05:18.00, start: -0.004000, bitrate: 245 kb/s
    Stream #0:0: Video: h264 (Main), yuv420p(progressive), ...
    Metadata:
    ...
    Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      DURATION      : 00:05:18.000000

In the example above, the audio is stream #0:1 and is encoded as Opus. (Typical audio formats include MP3, AAC, Opus and AIFF.)
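When only the audio details are of interest, the ffprobe output can be narrowed down. The following is a minimal sketch rather than part of the procedure described here: the function name list_audio_streams is hypothetical, while the ffprobe options it uses (-v error, -select_streams, -show_entries, -of) are standard ones.

```shell
# list_audio_streams (hypothetical helper name): print one CSV line,
# "index,codec_name", for each audio stream in the given container file.
list_audio_streams() {
    ffprobe -v error \
        -select_streams a \
        -show_entries stream=index,codec_name \
        -of csv=p=0 \
        "$1"
}

# Usage: list_audio_streams input.mkv
```

For a file like the one examined above, this would be expected to print a single line naming stream index 1 and the opus codec.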

Extracting the Audio

To ignore the video stream, use the -vn option. ffmpeg tries to reduce the burden on the user by making "intelligent" guesses (e.g. automatic selection of streams) where possible, so it is usually unnecessary to specify the audio stream explicitly when the file contains only a single audio stream.
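Where a file holds several audio streams, the automatic choice may not be the one wanted. As a sketch for that case, the -map option can name a stream explicitly; the specifier 0:a:0 means "first audio stream of the first input". The function name extract_audio_stream and its argument order are hypothetical.

```shell
# extract_audio_stream <infile> <n> <outfile>   (hypothetical helper)
# Write the n-th audio stream of <infile> (counting from 0) to <outfile>.
# "-c:a copy" passes the stream through unchanged. With an explicit -map,
# only the mapped stream is written, so -vn is not needed.
extract_audio_stream() {
    ffmpeg -i "$1" -map "0:a:$2" -c:a copy "$3"
}

# Usage: extract_audio_stream input.mkv 1 second-track.opus
```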

Passthrough

To simply copy the audio stream from the input file to an output file as-is, pass "copy" as the argument to the codec option:

$ ffmpeg -i <infile> -vn -c:a copy <outfile>.<extension>

where

-i <infile>
specifies the multimedia (container) input file as <infile>
-vn
tells ffmpeg to ignore the video stream
-c:a copy
is the audio codec option that tells ffmpeg to simply pass a copy of the audio stream to the output file without decoding
<outfile>.<extension>
is the output file name and extension. ffmpeg selects the output file format from the extension, so the extension must be compatible with the audio encoding format or ffmpeg will report an error.
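Because the extension has to suit the encoding, it can help to derive it from what ffprobe reports. The following sketch assumes a small, illustrative mapping from codec names to extensions; the function names ext_for_codec and copy_audio are hypothetical, and the mapping is far from exhaustive.

```shell
# ext_for_codec <codec>: suggest an output extension for an audio codec
# name as reported by ffprobe. Illustrative mapping only.
ext_for_codec() {
    case "$1" in
        opus)   echo opus ;;
        vorbis) echo ogg ;;
        aac)    echo m4a ;;
        mp3)    echo mp3 ;;
        flac)   echo flac ;;
        *)      echo mka ;;  # Matroska audio as a general-purpose fallback
    esac
}

# copy_audio <infile>: copy the first audio stream, without re-encoding,
# to a file named after the input with the suggested extension.
copy_audio() {
    codec=$(ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$1")
    ffmpeg -i "$1" -vn -c:a copy "${1%.*}.$(ext_for_codec "$codec")"
}

# Usage: copy_audio input.mkv
```

Note that the sketch does not guard against the output name being the same as the input name.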

Transcoding

ffmpeg can decode the input stream, then re-encode it with another codec or with different parameters. For example, to encode as AAC:

$ ffmpeg -i <infile> -vn -c:a aac <outfile>.aac

where

-i <infile>
specifies the multimedia (container) input file as <infile>
-vn
tells ffmpeg to ignore the video stream
-c:a aac
is the audio codec option that specifies the AAC encoder. (Other parameters such as bitrate may also be specified.)
<outfile>.<extension>
is the output file name and extension.
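As a sketch of specifying a further parameter, the audio bitrate can be set with the -b:a option. The function name to_aac and the 128k default are assumptions chosen for illustration, not recommendations.

```shell
# to_aac <infile> <outfile> [bitrate]   (hypothetical helper)
# Transcode the audio to AAC at the given bitrate; 128k is an arbitrary
# default used here for illustration.
to_aac() {
    ffmpeg -i "$1" -vn -c:a aac -b:a "${3:-128k}" "$2"
}

# Usage: to_aac input.mkv output.aac 96k
```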

Fragment Extraction

ffmpeg can extract a portion of a stream using -ss to specify the start time of the portion (or -sseof to specify the start time relative to the end of the file), and either -to to specify the end time or -t to specify the duration. To apply these options to the input stream, place them before the -i <infile> option; for example:

$ ffmpeg -ss 04:21 -to 05:45 -i <infile> -vn -c:a aac <outfile>.aac

Time is specified either in the format

[-][<HH>:]<MM>:<SS>[.<m>...]

for hours, minutes and seconds or

[-]<S>+[.<m>...][<unit>]

for just seconds, where <unit> is s for seconds (the default if omitted), ms for milliseconds or us for microseconds.
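To make the relationship between the two formats concrete, the following sketch converts the first form into plain seconds, which is handy for working out a duration for -t from a start and end time. The function name to_seconds is hypothetical, and the helper deliberately ignores the leading minus sign and the unit suffixes.

```shell
# to_seconds <time>   (hypothetical helper)
# Convert [HH:]MM:SS[.m...] (or plain seconds) to seconds.
# Does not handle a leading minus sign or the s/ms/us unit suffixes.
to_seconds() {
    echo "$1" | awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; print s }'
}

# Example: length of the fragment from 04:21 to 05:45, suitable for -t.
start=$(to_seconds 04:21)
end=$(to_seconds 05:45)
echo "$((end - start))"   # prints 84
```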

Note that the cut points may not be exact, particularly when copying rather than transcoding, since many streams do not support accurate seeking. ffmpeg errs on the side of including the selected time range, so the output will be too long rather than too short. A tool such as Audacity can be used to trim the excess afterwards.
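One way to see how far off a cut is would be to compare the requested length with the duration ffprobe reports for the output. This is a sketch only: the function name audio_duration is hypothetical, and for some formats the reported figure may itself be an estimate.

```shell
# audio_duration <file>   (hypothetical helper)
# Print the duration, in seconds, that ffprobe reports for the file.
audio_duration() {
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Usage: audio_duration fragment.m4a
```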

ID3 Tags

ID3 is a set of de facto standards for metadata that can be added to certain types of audio file. Originally intended for MP3, ID3 tags can also be added to AIFF files. The command line tool id3ed can be used to print and edit ID3 tags. Metadata include title, artist, album, year, comment and track number. In ID3v1, genre is encoded as a single byte indexing a fixed list of genres.
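The two sketches below illustrate these points under stated assumptions. The first looks up a handful of ID3v1 genre bytes (the real list is much longer). The second does not use id3ed, whose options are not covered here; instead it uses ffmpeg's -metadata option, which for MP3 output writes ID3v2 tags by default. The function names id3v1_genre and tag_mp3 are hypothetical.

```shell
# id3v1_genre <n>   (hypothetical helper)
# Name for a few ID3v1 genre bytes; only an excerpt of the full list.
id3v1_genre() {
    case "$1" in
        0)  echo "Blues" ;;
        8)  echo "Jazz" ;;
        13) echo "Pop" ;;
        17) echo "Rock" ;;
        32) echo "Classical" ;;
        *)  echo "(not in this excerpt)" ;;
    esac
}

# tag_mp3 <in.mp3> <out.mp3> <title> <artist>   (hypothetical helper)
# Copy an MP3 without re-encoding while setting two metadata fields.
tag_mp3() {
    ffmpeg -i "$1" -c:a copy -metadata title="$3" -metadata artist="$4" "$2"
}

# Usage: tag_mp3 in.mp3 tagged.mp3 "Some Title" "Some Artist"
```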
