Live audio streaming over the Internet for reception by members of the general public, with optional authentication to restrict access only to an intended audience.
Assumptions, Constraints and Considerations
- Listeners are members of the general public and not "tech savvy". The solution must work with common audio players (e.g. on mobile devices) and media players embedded in web pages.
- Low implementation and running costs.
- Must use free software components to the extent possible (full source code available, ability to inspect and modify).
- Operating the service (start, stop, control access) must require a minimum of specialist knowledge and training.
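- The audio source uses a domestic ISP (Internet Service Provider) connection, so it sits behind a NAT (Network Address Translation) device. (This means that the source does not have a public IP address and cannot easily accept incoming connections, so it must itself open the connection to the server.)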
We consider three components:
- Audio source: A source of audio signals. Encodes the input audio and transmits it to the server. Possibly behind a NAT. The physical device is assumed to be a small computer (laptop, single-board computer etc.).
- Server: Takes the audio stream from the source and makes it available to clients. For low cost, the server is assumed to be a Virtual Private Server (VPS) with modest processing power and memory.
- Audio clients: Software that runs on listeners' devices (smartphone, computer etc.) that plays the audio stream from the server.
Audio signals are typically compressed for transmission over a network and decompressed for processing by the receiver. Encoding and decoding are performed by a CODEC (COder-DECoder). There is a plethora of audio coding methods (algorithms), with different characteristics and varying degrees of support in software and audio player devices. The method, the quality of its implementation and its parameters all affect audio quality, and there is typically a trade-off between quality and required bandwidth (degree of compression). Given my requirement for wide support, I chose the venerable MP3 coding. It is practically ubiquitous and is now patent-free, so it can be implemented as free software.
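The Python sketch below puts numbers on the quality/bandwidth trade-off. It assumes mono capture at the CD sampling rate (44,100 Hz, 16-bit) and the 32 kbit/s MP3 bitrate used later with ffmpeg. The listener count in the example is a hypothetical figure.

```python
# Back-of-envelope bandwidth figures for an MP3 audio stream.
# Assumptions (not measurements): 44,100 Hz, 16-bit mono capture; 32 kbit/s MP3.


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def megabytes_per_hour(bitrate_bps: float) -> float:
    """Data volume of one hour at a constant bit rate, in megabytes (10**6 bytes)."""
    return bitrate_bps * 3600 / 8 / 1_000_000


def server_egress_bps(listeners: int, stream_bitrate_bps: int) -> int:
    """Outbound server bandwidth: every listener receives its own copy of the stream."""
    return listeners * stream_bitrate_bps


if __name__ == "__main__":
    raw = pcm_bitrate_bps(44_100, 16, 1)
    mp3 = 32_000  # corresponds to ffmpeg's -b:a 32k
    print(f"PCM: {raw / 1000:.1f} kbit/s, {megabytes_per_hour(raw):.1f} MB/hour")
    print(f"MP3: {mp3 / 1000:.1f} kbit/s, {megabytes_per_hour(mp3):.1f} MB/hour")
    print(f"Compression ratio about {raw / mp3:.0f}:1")
    print(f"50 listeners: {server_egress_bps(50, mp3) / 1e6:.1f} Mbit/s")
```

Mono 16-bit PCM at 44,100 Hz works out at 705.6 kbit/s, so a 32 kbit/s MP3 stream is roughly a 22:1 reduction. At that rate, one hour uses about 14.4 MB per listener. The server sends a separate copy to every listener, so its outbound bandwidth grows linearly. For example, 50 listeners at 32 kbit/s need about 1.6 Mbit/s, which is worth checking against a VPS bandwidth allowance.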
The choice of server is partly determined by the choice of streaming method and protocol.
Audio signals have to be transmitted using a communication protocol. There are various protocols for streaming encoded audio over the Internet in real time or near real time. Because of my first constraint above, I wanted a streaming format with widespread support in audio player software and devices, so I chose an HTTP-based protocol for server-to-client transmission. The best-known standardised example is HLS (HTTP Live Streaming), developed by Apple Inc. and published as RFC 8216. HTTP-based streaming is widely supported and, being HTTP-based, can traverse firewalls and proxy servers. This aids deployment, though that consideration is hardly relevant to general public users.
The way HLS works is a bit of a kludge. It presents a continuous stream as an unending series of small HTTP file downloads (segments), each containing a few seconds of the audio stream. The client repeatedly fetches a playlist in extended M3U format that lists the most recent segments, then downloads those segments in order. (A master playlist can also list alternative versions of the stream, e.g. at different bitrates.) Breaking the input stream into segments necessarily requires buffering, so latency is high, often tens of seconds. Low latency is not usually a requirement for streaming audio, though.
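The following Python sketch makes the playlist mechanism concrete. It builds the kind of live media playlist an HLS server publishes, using the tag names from RFC 8216. The segment file names, the 6-second duration and the three-segment window are illustrative assumptions. A real HLS server (or packager) also has to produce the segments themselves.

```python
import math
from typing import List, Tuple


def live_media_playlist(segments: List[Tuple[str, float]], first_sequence: int,
                        window: int = 3) -> str:
    """Build an RFC 8216 live media playlist listing the newest `window` segments.

    segments: (uri, duration_in_seconds) pairs, oldest first.
    first_sequence: media sequence number of segments[0].
    """
    visible = segments[-window:]
    if not visible:
        raise ValueError("need at least one segment")
    # Sequence number of the first segment still listed (older ones slid out).
    media_sequence = first_sequence + len(segments) - len(visible)
    # Target duration: an integer no smaller than any (rounded) segment duration.
    target = max(math.ceil(duration) for _, duration in visible)
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:3",  # version 3 allows decimal EXTINF durations
        f"#EXT-X-TARGETDURATION:{target}",
        f"#EXT-X-MEDIA-SEQUENCE:{media_sequence}",
    ]
    for uri, duration in visible:
        lines.append(f"#EXTINF:{duration:.3f},")
        lines.append(uri)
    # No #EXT-X-ENDLIST: the stream is live, so clients keep re-fetching this file.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    segs = [(f"segment{n}.mp3", 6.0) for n in range(100, 105)]
    print(live_media_playlist(segs, first_sequence=100))
```

The playlist has no #EXT-X-ENDLIST tag, so clients treat it as live and keep re-fetching it to discover new segments. RFC 8216 also advises clients not to start playback less than three target durations from the end of the playlist. With 6-second segments, a listener is therefore typically at least 18 seconds behind the live input, which is where the high latency comes from.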
For the server, I made a preliminary selection of Icecast.
Configuration of the Icecast server for test use is straightforward, if a little hampered by a lack of detailed explanation of what the server actually does. (The documentation simply states that it is a "streaming media server".)
The server takes inputs from audio sources and makes them available to audio clients over HTTP on a configured port. Strictly speaking, this is not HLS: Icecast sends each listener the encoded audio as a single, open-ended HTTP response rather than as a playlist of short segment files.

The Icecast server can handle multiple such streams simultaneously, where each stream is an input source and its corresponding output stream. Each stream is therefore identified by a name, called a mountpoint, which appears in the URLs of both the audio source input and the output interfaces. Stream information (name, genre, description) is specified by metadata provided with the input stream, but can also be specified in the server configuration file.

There is no "mixing" function, so multiple input streams cannot be merged into a single output stream. There is also no transcoding function: the output stream has the same characteristics as the corresponding input stream.
The output interface has an HTTP URL (Uniform Resource Locator). This URL can either be used as a link in a web page (e.g. to an audio client) or given directly to an audio client. Suppose the stream name (mountpoint) is <stream>, the server's hostname is <hostname> and it listens on <port>. The output stream then has the basic URL:
http://<hostname>:<port>/<stream>
The link can also be given with an M3U file extension (http://<hostname>:<port>/<stream>.m3u). In that case the server returns a playlist file pointing at the stream, which media players can consume directly.
The input stream is sent from audio source clients to the server using the HTTP-based Icecast protocol. The mountpoint is specified as part of the URL, and supplementary mountpoint information is specified using non-standard HTTP headers. It is important that the source sets the Content-Type header. The source is authenticated by HTTP Basic Auth, which requires a username and password.
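The source side of the protocol is simple enough to show directly. This Python sketch only builds the request line and headers that a source client would send; it does not open a connection. It is based on my reading of the Icecast protocol specification listed in the references. The header set is not exhaustive, and older servers expected a non-standard SOURCE method instead of PUT. Note that HTTP Basic Auth merely base64-encodes the credentials, so anyone who can observe the connection can recover the password.

```python
import base64
from typing import Optional


def icecast_source_request(host: str, port: int, mount: str, password: str,
                           user: str = "source",
                           content_type: str = "audio/mpeg",
                           name: Optional[str] = None,
                           description: Optional[str] = None,
                           genre: Optional[str] = None,
                           public: bool = False) -> str:
    """Return the request head an Icecast source client sends before the audio data."""
    # HTTP Basic Auth: base64("user:password"), an encoding, not encryption.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    lines = [
        f"PUT /{mount.lstrip('/')} HTTP/1.1",
        f"Host: {host}:{port}",
        f"Authorization: Basic {token}",
        f"Content-Type: {content_type}",      # the source should always set this
        f"ice-public: {1 if public else 0}",  # 1 = list in the YP directory
    ]
    # Optional, non-standard stream metadata headers.
    for header, value in (("ice-name", name), ("ice-description", description),
                          ("ice-genre", genre)):
        if value:
            lines.append(f"{header}: {value}")
    # A blank line ends the headers; the encoded audio follows as the request body.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(icecast_source_request("radio.example.org", 8000, "live", "hackme",
                                 name="Test stream"))
```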
With some audio source software (e.g. ffmpeg), the input stream interface is identified by a basic URL:
icecast://source:<password>@<hostname>:<port>/<stream>
The username is typically 'source', and the password is set in the Icecast configuration file. The default port is 8000, which can also be changed in the configuration file.
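Both URLs are assembled from the same few values, so a small helper can keep them consistent. This Python sketch assumes the default port 8000 and the 'source' username; the host name and mountpoint in the example are placeholders. It rejects passwords containing characters that have a special meaning in URLs rather than relying on percent-encoding. That is a deliberately conservative assumption, since it is not certain that every source client decodes escaped credentials.

```python
DEFAULT_PORT = 8000                    # Icecast's default listen port
URL_SPECIAL = set(":@/?#[]%")          # characters that would make the URL ambiguous


def listener_urls(hostname: str, mount: str, port: int = DEFAULT_PORT) -> dict:
    """URLs to give to listeners: the raw stream and the .m3u playlist."""
    base = f"http://{hostname}:{port}/{mount.lstrip('/')}"
    return {"stream": base, "playlist": base + ".m3u"}


def ffmpeg_source_url(hostname: str, mount: str, password: str,
                      port: int = DEFAULT_PORT, user: str = "source") -> str:
    """icecast:// URL for ffmpeg, with the source credentials embedded."""
    if URL_SPECIAL & set(password):
        raise ValueError("choose a source password without URL-special characters")
    return f"icecast://{user}:{password}@{hostname}:{port}/{mount.lstrip('/')}"


if __name__ == "__main__":
    print(listener_urls("radio.example.org", "live"))
```

Because the source URL embeds the password, it may appear in process listings and shell history on the audio source machine. That is one more reason to keep the source password separate from any other credential.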
The Icecast server can also relay (forward) streams, but I did not explore this function.
Icecast supports listing broadcast streams in a "Yellow Pages" (YP) directory hosted by Xiph.Org (the organisation behind Icecast), allowing discovery by a wider audience. Since my purpose was broadcasting to a small private audience, I did not use this feature. Marking a stream as "public" controls whether it is listed in this directory.
Configuration and Administration
Configuration of Icecast is by editing an XML-format file icecast.xml.
The administrative user interface is a set of web pages served by a built-in web server, including pages that show statistics. The same functions are also available as plain HTTP requests (under the 'admin' path) that return statistics etc., so they can be used from scripts.
The main parameters that require configuration are:
- The hostname used for stream directory listings.
- The password used for authentication of input source clients.
- The password used to authenticate the administration user for the administrative web page.
- The IP address and port on which the server listens for input streams, audio clients and the web interface. Multiple IP addresses/ports may be specified, each in its own <listen-socket> section.
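This Python sketch illustrates where these parameters live. It writes just the corresponding elements of icecast.xml: <hostname>, the <authentication> passwords, and one <listen-socket> per address/port pair. As far as I can tell, these element names match the sample configuration shipped with Icecast 2.4. A complete icecast.xml needs further sections (limits, paths, logging), so treat the output as a fragment to merge by hand, not a working configuration.

```python
import xml.etree.ElementTree as ET


def icecast_config_fragment(hostname: str, source_password: str,
                            admin_user: str, admin_password: str,
                            listen=(("0.0.0.0", 8000),)) -> str:
    """Return an <icecast> fragment holding only the parameters listed above."""
    root = ET.Element("icecast")
    # Host name used for stream directory listings.
    ET.SubElement(root, "hostname").text = hostname
    auth = ET.SubElement(root, "authentication")
    ET.SubElement(auth, "source-password").text = source_password
    ET.SubElement(auth, "admin-user").text = admin_user
    ET.SubElement(auth, "admin-password").text = admin_password
    # One <listen-socket> per address/port the server should listen on.
    for address, port in listen:
        sock = ET.SubElement(root, "listen-socket")
        ET.SubElement(sock, "port").text = str(port)
        ET.SubElement(sock, "bind-address").text = address
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(icecast_config_fragment("radio.example.org", "change-me-1",
                                  "admin", "change-me-2",
                                  listen=(("0.0.0.0", 8000), ("127.0.0.1", 8001))))
```

Generating the XML rather than editing it by hand has one small advantage: characters such as & or < in a password are escaped automatically.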
Icecast offers two ways to authenticate listeners:
- htpasswd: a simple password file lookup, storing passwords for individual user names.
- URL method: a roll-your-own method (so called in the documentation). Icecast sends an HTTP request describing each connecting listener to a separate program, which examines the details and responds with whether the listener is allowed to connect.

With the URL method, the requests and responses travel over plain HTTP with no provision for TLS. Appropriate measures are therefore needed to ensure the confidentiality and integrity of the communication. One option is to run the authentication program on the same machine as Icecast and communicate via a local-only IP connection (localhost). If the program is on another machine, an IPsec VPN can be used. Running it elsewhere is recommended, to protect authentication information in case a security vulnerability in Icecast leads to a server compromise.
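The following Python sketch shows roughly what a URL-method authentication program could look like, bound to localhost as suggested above. It rests on my understanding of the Icecast 2.4 documentation, which should be checked against the version in use:
- Icecast POSTs form fields (including action=listener_add, mount, user and pass) to the configured URL.
- Icecast admits the listener only if the response carries the header icecast-auth-user: 1.

The user table and port 8080 are hypothetical. A real deployment should also store password hashes rather than plain text.

```python
import hmac
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

# Hypothetical listener table. A real deployment would store password hashes.
LISTENERS = {"alice": "correct horse"}


def is_allowed(form: dict) -> bool:
    """Decide whether an Icecast listener_add request should be admitted."""
    if form.get("action") != "listener_add":
        return False
    expected = LISTENERS.get(form.get("user", ""))
    if expected is None:
        return False
    # Constant-time comparison avoids leaking how much of the password matched.
    return hmac.compare_digest(expected.encode(), form.get("pass", "").encode())


class IcecastAuthHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8", errors="replace")
        # The listener details arrive as a URL-encoded form.
        form = {key: values[0] for key, values in parse_qs(body).items()}
        self.send_response(200)
        if is_allowed(form):
            self.send_header("icecast-auth-user", "1")  # grants access
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(host: str = "127.0.0.1", port: int = 8080) -> None:
    """Listen on localhost only, since the URL method is unencrypted."""
    HTTPServer((host, port), IcecastAuthHandler).serve_forever()
```

Calling serve() starts the program. The listener_add option of Icecast's URL authentication would then point at http://127.0.0.1:8080/. The access decision is kept in is_allowed() so that it can be tested without running a server.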
The processing of the audio source can be represented as the following pipeline.
analogue audio -> Sample (digitise) -> Encode (CODEC) -> Stream to server
Audio acquisition hardware takes analogue signals from microphones etc. and converts them to digital signals that software can process. This hardware is not standardised, so software needs device drivers plus a layer that abstracts the underlying hardware (sound card) and provides a common interface to audio devices. On Linux computers, this role is fulfilled by the Advanced Linux Sound Architecture (ALSA). To discover audio devices, the following command
$ arecord --list-devices
will list the audio input hardware: the sound cards and the capture devices on each card (a single sound card may implement several audio capture devices). In ALSA command-line arguments, an audio input device is referred to as
hw:<a>,<b>
where <a> is the sound card number and <b> is the device number on that card (both starting at 0). Audio inputs have a number of channels (1 for mono, 2 for stereo), selected with the -c argument.
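Picking the right hw:<a>,<b> name out of the listing can itself be a stumbling block for a non-specialist operator. This Python sketch extracts the names from the output of arecord --list-devices, assuming the usual "card N: ..., device M: ..." line format. Because arecord translates its messages, the helper forces the C locale before running it.

```python
import os
import re
import subprocess

# Matches lines such as:
# card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
DEVICE_LINE = re.compile(
    r"^card (\d+): (\S+) \[(.*?)\], device (\d+): (.*) \[(.*)\]$")


def parse_capture_devices(listing: str):
    """Return (alsa_name, card_name, device_name) tuples from arecord output."""
    devices = []
    for line in listing.splitlines():
        match = DEVICE_LINE.match(line)
        if match:
            card, _card_id, card_name, device, _dev_id, device_name = match.groups()
            devices.append((f"hw:{card},{device}", card_name, device_name))
    return devices


def list_capture_devices():
    """Run arecord (from alsa-utils) and parse its device listing."""
    env = dict(os.environ, LC_ALL="C")  # untranslated output is easier to parse
    result = subprocess.run(["arecord", "--list-devices"], env=env,
                            capture_output=True, text=True, check=True)
    return parse_capture_devices(result.stdout)
```

On the audio source machine, list_capture_devices() returns (name, card, device) tuples. The first element of each can be passed to arecord --device or to ffmpeg's -i option.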
To test, the following command records mono audio from device 0 on sound card 2 (hw:2,0) and saves it in the file test.wav. It also displays a text 'VU meter'.
$ arecord -f cd \
    --device hw:2,0 -c 1 \
    --vumeter=stereo \
    test.wav
- -f cd
- specifies the sampling format (cd is shorthand for 16-bit little-endian samples at a 44,100 Hz sampling rate, stereo; the -c 1 that follows overrides the channel count)
- --device hw:2,0 -c 1
- selects device 0 on sound card 2 and records one channel (mono)
- --vumeter=stereo
- displays a text VU meter on the console to give visible feedback of the input audio signal level
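Before going further, it is worth checking that the test recording has the expected format and a sensible level. This Python sketch uses only the standard library's wave module and assumes 16-bit samples, as produced by -f cd. It reports the peak level in dBFS, where 0 dBFS is full scale; values close to 0 suggest clipping.

```python
import array
import math
import sys
import wave


def describe_wav(source) -> dict:
    """Report format and peak level of a 16-bit PCM WAV file (path or file object)."""
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    if sample_width != 2:
        raise ValueError("this sketch handles 16-bit samples only")
    samples = array.array("h", frames)  # interleaved signed 16-bit samples
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    peak = max((abs(s) for s in samples), default=0)
    # dBFS relative to full scale (32768); silence gives minus infinity.
    peak_dbfs = 20 * math.log10(peak / 32768) if peak else float("-inf")
    return {
        "channels": channels,
        "rate_hz": rate,
        "seconds": len(samples) / channels / rate,
        "peak_dbfs": peak_dbfs,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(describe_wav(sys.argv[1]))
```

Saved as, say, checkwav.py (a hypothetical name), it would be run as python3 checkwav.py test.wav.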
Encoding and Streaming to Server
Input signals from the audio source must be processed and encoded for transmission to the server, in a format that the server can handle. From the discussion so far, we chose MP3 coding, and the choice of Icecast as the server means the output needs to be a source stream using the Icecast protocol.
A command-line program that can serve this purpose is ffmpeg. (Another candidate is GStreamer, but I have not tried it yet.) ffmpeg is a sort of "Swiss army knife" multitool for audio and video manipulation and is extremely flexible, but a downside is the complexity of its command line.
On a Linux system, a suitable basic command might be as follows:
$ ffmpeg -loglevel debug \
    -ac 1 \
    -f alsa -i hw:2,0 \
    -acodec libmp3lame -b:a 32k -ac 1 \
    -f mp3 \
    -content_type audio/mpeg \
    icecast://source:<password>@<hostname>:<port>/<stream>
- -ac 1
- specifies a mono source (one channel)
- -f alsa -i hw:2,0
- specifies ALSA as the input, reading from device 0 on sound card 2
- -acodec libmp3lame -b:a 32k -ac 1
- specifies the LAME MP3 encoder as the CODEC, via ffmpeg's libmp3lame wrapper (LAME stands for "LAME Ain't an MP3 Encoder"). It also sets the audio bitrate to 32 kbit/s (default is 128k) and again specifies a single channel, this time for the encoded output.
- -f mp3
- specifies the output format as MP3
- -content_type audio/mpeg
- sets the Content-Type header in the HTTP/Icecast output to indicate the content type of the stream.
- icecast://source:<password>@<hostname>:<port>/<stream>
- specifies the URL of the Icecast server input, with authentication information and mountpoint.
Other options can be used to specify stream information, as documented in the ffmpeg-protocols man page. These include ice_genre (set stream genre), ice_name (set stream name), ice_description, ice_url and ice_public. The documentation states that content_type must be set only if it differs from audio/mpeg, implying that audio/mpeg is the default. With the version I used, however, it always had to be set.
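The "minimum of specialist knowledge" requirement suggests hiding the ffmpeg command line behind a small script with a start/stop control. This Python sketch only assembles the argument list for the command above, plus the stream-information options from the ffmpeg-protocols documentation. The device name, URL and stream name are placeholders, and the log level is lowered from debug to warning as an assumption for routine use.

```python
import shlex
import subprocess
from typing import List, Optional


def ffmpeg_icecast_args(device: str, url: str, bitrate: str = "32k",
                        name: Optional[str] = None,
                        description: Optional[str] = None,
                        genre: Optional[str] = None,
                        public: bool = False) -> List[str]:
    """Argument list for capturing mono audio from ALSA and streaming MP3 to Icecast."""
    args = [
        "ffmpeg", "-loglevel", "warning",
        "-ac", "1", "-f", "alsa", "-i", device,            # mono ALSA capture
        "-acodec", "libmp3lame", "-b:a", bitrate, "-ac", "1",
        "-f", "mp3",
        "-content_type", "audio/mpeg",                     # always set (see text)
        "-ice_public", "1" if public else "0",
    ]
    # Optional stream information, passed on to Icecast as ice-* headers.
    for option, value in (("-ice_name", name), ("-ice_description", description),
                          ("-ice_genre", genre)):
        if value:
            args += [option, value]
    args.append(url)  # output options must precede the output URL
    return args


def start_stream(args: List[str]) -> subprocess.Popen:
    """Start ffmpeg in the background; stop it later with proc.terminate()."""
    return subprocess.Popen(args)


if __name__ == "__main__":
    demo = ffmpeg_icecast_args("hw:2,0",
                               "icecast://source:<password>@<hostname>:8000/live",
                               name="Test stream")
    print(shlex.join(demo))  # shows the equivalent shell command
```

Passing a list to subprocess rather than a command string avoids shell-quoting problems with stream names that contain spaces.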
For test purposes (though not for production use), the VLC media player can be used to stream an input file to an Icecast server.
Icecast provides a web interface that lets users browse and access streams. However, novice users would need training to use it, and it would expose the administrative interface to the public Internet.
Alternatively, the link to the audio stream can be embedded in a web page. With HTML5, the following snippet in the BODY section of the HTML document will embed a player:
<AUDIO CONTROLS>
  <SOURCE src="http://<hostname>:<port>/<stream>" type="audio/mpeg" />
  Sorry, audio playback is not supported by this browser.
</AUDIO>
Building services such as this involves "gluing together" several different components. The difficulties involve understanding each component and getting it to work in isolation, then bringing the components together to work as a single system. Further work is required to make a system for "production" use secure, reliable and easy to use.
I used a few tutorials I found on the web. While these are useful to "get something going", they gloss over a lot of details that need to be researched.
Problems I found included:
- misleading ffmpeg documentation that implied that the -content_type option was not required for MP3 audio streams (it is)
- Icecast documentation not adequately explaining the architecture of the program.
- Own firewall blocking HTTP over port 8000 (d'oh)
Apart from that, it was remarkably straightforward, and being able to test with VLC as an input client using a dummy audio source made things even simpler. For a production site, three tasks will be the biggest:
- securing the Icecast server;
- making it easy for an administrative user with minimal training to set up the service and monitor it for problems without expert intervention;
- interfacing the audio stream source to the local sound system.
Musings on Security
It is possible to configure the Icecast server to listen on several ports, with multiple <listen-socket> sections in the configuration file. Unfortunately, there seems to be no way of segregating the administrative web interface from the other services (audio sources, audio clients), i.e. of having it listen on a separate port.
Ideally, the administrative web interface would have its own interface (<listen-socket>) configured to use Transport Layer Security (TLS). TLS itself can be enabled by setting <ssl>1</ssl> in the appropriate <listen-socket> section and setting <ssl-certificate> in the <paths> section. Unfortunately, there seems to be no way of configuring TLS client certificates for authentication. The URL for the administrative interface has a subdirectory 'admin'. It may therefore be possible to bind Icecast to the localhost IP address (127.0.0.1) and put a reverse proxy in front of it. The proxy would pass requests for that subdirectory through only on the appropriate port and block them on other ports.
Audio source authentication credentials are transmitted in the clear over HTTP connections. Encrypting the input stream is pointless when the output stream is publicly broadcast. Even so, it ought to be possible to use TLS to authenticate the source client (e.g. by client certificate) and the server without confidentiality (encryption) of the transmitted data, using a NULL ciphersuite. However, there appears to be no way to specify the NULL ciphersuite.
Unfortunately, I can think of no easy way to block attempts to connect as a source over the audio client interface. The only option may be a more sophisticated reverse proxy that blocks HTTP PUT and the non-standard SOURCE method used by earlier versions of the Icecast protocol.
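The filtering rule that such a reverse proxy would need on the public, listener-facing port can be stated precisely. The Python function below is a hypothetical policy sketch; in practice the rule would be written in the proxy's own configuration language. It does three things:
- It lets through only GET and HEAD requests, which rejects PUT and the legacy SOURCE method used by source clients.
- It refuses anything under /admin.
- It normalises the path first, so that variants such as //admin/ or /x/../admin are also caught.

```python
import posixpath
from urllib.parse import unquote

LISTENER_METHODS = {"GET", "HEAD"}  # all that a player needs


def public_port_allows(method: str, target: str) -> bool:
    """Hypothetical reverse-proxy policy for the public (listener) port."""
    if method.upper() not in LISTENER_METHODS:
        return False  # blocks source connections (PUT, legacy SOURCE)
    path = unquote(target.split("?", 1)[0])
    # Collapse duplicate slashes and "..". lstrip() is needed because
    # posixpath.normpath() preserves exactly two leading slashes.
    path = posixpath.normpath("/" + path.lstrip("/"))
    return not (path == "/admin" or path.startswith("/admin/"))
```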
For production use there should be rate limiting on connections, to hinder attempts to guess authentication credentials by brute force.
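Here is a minimal Python sketch of such rate limiting. This hypothetical helper counts failed authentication attempts per client (e.g. per IP address) within a sliding time window. Once a threshold is reached, it reports the client as blocked. The thresholds are illustrative assumptions. The helper would still have to be wired into whatever performs the authentication, such as a proxy or a URL-method authentication program.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class FailedAttemptLimiter:
    """Block a client after too many failed logins within a sliding time window."""

    def __init__(self, max_failures: int = 5, window_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.window_s = window_s
        self.clock = clock  # injectable for testing
        self._failures: Dict[str, Deque[float]] = {}

    def _recent_count(self, key: str, now: float) -> int:
        """Drop failures older than the window and return how many remain."""
        times = self._failures.get(key)
        if times is None:
            return 0
        while times and now - times[0] >= self.window_s:
            times.popleft()
        if not times:
            del self._failures[key]  # forget clients with no recent failures
            return 0
        return len(times)

    def is_blocked(self, key: str) -> bool:
        return self._recent_count(key, self.clock()) >= self.max_failures

    def record_failure(self, key: str) -> None:
        now = self.clock()
        self._recent_count(key, now)  # prune before appending
        self._failures.setdefault(key, deque()).append(now)
```

A caller would check is_blocked() before verifying credentials and call record_failure() after each failure. Using a monotonic clock avoids surprises when the system time is adjusted.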
References
- RFC 8216, HTTP Live Streaming, August 2017: https://tools.ietf.org/html/rfc8216
- Icecast protocol specification: https://gist.github.com/ePirat/adc3b8ba00d85b7e3870
- Linux Audio and Streaming: https://www.srevilak.net/wiki/Linux_Audio_and_Streaming
- How to set up audio streaming (internet radio) in Linux: https://blog.michael.franzl.name/2013/11/25/audio-streaming
- Icecast Server/known https restrictions: https://wiki.xiph.org/Icecast_Server/known_https_restrictions
- FFmpeg Protocols Documentation - Icecast: https://ffmpeg.org/ffmpeg-protocols.html#icecast