gstreamer

Aioli Audiostreamer: Moving the Sound

AI generated picture of an amplifier with raspberries

People need projects to consume their free time. I’ve lately felt that I want to actually try finishing a project (instead of just starting them), and that the project should be somehow related to audio, and it would be nice if it would have a real-world use, and it would use the old Raspberry Pis that have lying around. Plenty of requirements then. I think this is still better a better-formulated train wreck than an average customer project.

After considering few different options, I ended up attempting to create a multi-speaker streaming system named Aioli (so yeah, I started another project). This text is closer to a devlog than tutorial, but there will be open source code repositories in case you want to see how it’s done. Enough with the blabber, let’s move on.

Overview Of The Project

Basically, in this project I want to have one audio source, and the audio from that single source would get wirelessly transmitted to multiple speakers. To be more specific, in my case there’s one Raspberry Pi 4 connected to an external audio source, and then there would be other Raspberry Pi 2’s connected to the speakers. The Raspberry Pis in this scenario would handle at least streaming, networking, receiving the audio, and playing the audio. This graph attempts to explain the situation:

To start out the project I decided to focus on the streaming between Raspberry Pis because I didn’t feel masochistic enough to start working with Bluetooth yet. Everything is all fun and games until Bluetooth is added into the mix, and I want to have a bit of fun and games.

Today’s focus is this part to be exact

Obviously, the first thing to do is to create a custom Yocto distro, because every self-respecting hobby project needs its own Linux distribution. Perhaps further down the line this distro can contain some useful configs and other things that actually justify its existence, but for now, it’s just a renamed example Poky distro.

Creating The Network Of Raspberries

To get the Raspberry Pis talking to each other the first step is getting the devices connected to the same LAN. I wanted to use WLAN to not have cables around the house. Using ethernet cables would defeat the whole point of the system anyway as I could just use audio cables then. I considered also an ad hoc network but decided to use WLAN to keep things familiar for now. The Raspberry Pi 4 I own does have an internal Wifi chip, so that was easy to sort out, but the two Raspberry Pi 2’s did not. I had one Wifi dongle that worked out of the box, but another dongle required some extra work. You can read about it from my previous blog post if you’re interested.

After getting the hardware sorted out it was time to get the devices actually connected to WLAN. For that purpose, I added wpa_supplicant to the distro. wpa_supplicant is a program that in layman’s terms “connects the device to wifi” (or so I’ve understood as a layman). A properly configured supplicant that launches during boot should in theory automatically connect the device to WLAN. Surprisingly enough, it usually does. Following simple configuration in /etc/wpa_supplicant.conf added during a build to a Raspberry Pi does the trick:

network={
    ssid="WLANname"
    psk="SecretPassword"
}

This of course means that you have a statically defined network you want to connect to, and the password is stored in plaintext in the device. Both are bad things for different reasons, but they’ll do for now because it’s the simplest solution. This simplicity will be fixed later on in another text. If you have a WLAN network without a password or want to use a calculated key instead of a plaintext password, you can read more about wpa_supplicant from Arch wiki. It’s a good read. Pay attention to the quotation marks in psk-variable, they caused a lot of headache to me.

With quotation marks the value is a plaintext password, without them it’s a calculated key value. Makes “sense”.

After the devices wirelessly connected to the router, I gave them static IP address leases to make the development somewhat easier. I also ran a quick ping test to check that the Raspberry Pis can reach Google and each other before proceeding.

Moving The Audio Bits

Making the audio streaming work was actually fairly simple because there already is an open-source solution, as there usually is. GStreamer is an “open source multimedia framework”, which can mean many things. This is quite fitting because GStreamer does many things, and with the help of its plugins, it can do pretty much anything you can dream of. Assuming your dreams revolve around handling and processing multimedia.

My dream was to find a way to stream audio over IP network. And dreams, they sometimes do come true. Actually, a bit too much so, it was slightly difficult to find the best options for streaming the audio with all encoding options, protocols, and what-not. I’m still not sure I picked the right things. To keep the prototyping fast I worked with command line tools provided by GStreamer (as opposed to using the API, which may be worth looking into in the future).

GStreamer works with pipelines. Pipelines have sources where the media originates from, sinks where the media ends up, and parsers, encoders, and other types of things in between that manipulate input and pass it forward. For example, here’s a simple pipeline that reads an audio file, and then parses, converts, resamples, and outputs it to the appropriate default sink:

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! autoaudiosink

This command may result in a sound being output from your speakers. Quite often not. Depends on what your default ALSA output device is, if you’re using PulseAudio, and if it’s the third Tuesday of the month. In the case of Raspberry Pi, the default output device is the HDMI audio, and I’m not using PulseAudio, and it’s not the third Tuesday today, meaning that I actually got some sound out from a television connected to the HDMI port. If you want to get the audio output from the Raspberry Pi’s headphone jack, you can be a bit more specific about the sink:

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! alsasink device=hw:1
# use "aplay -l" command to list the available ALSA devices

To get the audio sent over the network we can use the RTP protocol that’s meant for delivering audio and video. Basic GStreamer functionality can be easily extended with plugins, and as it turns out, there exists a plugin for RTP. It’s weird how these things work out nicely. Almost like someone has had the same ideas before me. Now we can package the audio to 16-bit RTP payloads, and instead of using an alsasink we can use a udpsink (from another plugin) to output the stream to a target in a network instead of an audio device.

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! rtpL16pay ! udpsink host=192.168.1.182 port=5001

Then, the intended receiver of the stream can use udpsrc instead of filersrc to read the stream, decode, and deliver the contents to its own audio sink. Simple as.

gst-launch-1.0 udpsrc port=5001 ! 'application/x-rtp,media=audio,payload=96,clock-rate=44100,encoding-name=L16,channels=2' ! rtpL16depay ! audioconvert ! autoaudiosink

To get the audio sent to multiple devices, a multiudpsink can be used on the sending side. The receiving end still uses the same command:

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! rtpL16pay ! multiudpsink clients=192.168.1.182:5001,192.168.1.183:5001

In theory, we could use multicast streaming instead of multiple streams but for some reason I couldn’t get it working. Most likely it had something to with the third Tuesday of the month. I couldn’t even complete a simple multicast test on my network of Raspberry Pis, so I guess something is wrong with my setup. For the sake of completeness, AFAIK these commands (should (in theory)) work, but don’t. I’ll look into this later on because multicasting seems like a more sensible approach to this problem:

# Controller command
gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! rtpL16pay ! udpsink host=224.1.1.1 auto-multicast=true port=3000

# Speaker command
gst-launch-1.0 udpsrc multicast-group=224.1.1.1 auto-multicast=true port=3000 ! 'application/x-rtp,media=audio,payload=96,clock-rate=44100,encoding-name=L16,channels=2' ! rtpL16depay ! audioconvert ! autoaudiosink
Considering the amount of multicast memes floating around the internet, I’m not the only one having issues with it.

By using these commands we can send the audio over network from the controller device to the speaker devices. However, this is still a bit cumbersome, because we need to manually run the gst-launch-1.0 commands, figure out the intended receivers & their IP addresses, and so on. Later on I plan to introduce a manager process that’s dynamically able to find the clients in LAN and control the streaming, but that’s a topic for another text.

There’s a recipe for GStreamer and its plugins in Yocto, so to get these things installed into the new custom distro is just a matter of adding a few packages. It’s almost simpler than using a package manager. At least if you’ve spent the last five years learning the ins and outs of Yocto, and don’t need to install them during runtime. Something like this should do the trick:

IMAGE_INSTALL:append = " \
    gstreamer1.0 \
    gstreamer1.0-meta-base \
    gstreamer1.0-meta-audio \
    gstreamer1.0-plugins-good-udp \
    gstreamer1.0-plugins-good-rtp \
"

Plugins are sorted to good, bad, and ugly (I guess it’s no big surprise that bluez plugin is “bad”). To figure out which group plugin belongs to you can check the documentation. The documentation is quite good by the way, I recommend reading it. For example, udp plugin page contains information about the pipeline elements it provides, and also mention which group it belongs to.

That mostly covers all for this text. We’re now able to send sound over the network from one device to another. Next time we’ll stop this goofing off, and get painfully serious by adding Bluetooth to the system, and instead of using sample audio files we’ll actually stream something from a phone.

You can find the top-level repo-tool manifest repository from here. Please note that the progress of the project is a bit further than what’s presented in this blog text, and the progress is also “a bit” all over the place, so the manifest repository and the subrepositories contain plenty of spoilers and confusion.

One more question remains: why the name Aioli? Well, it kinda sounds like audio combined with I/O, and I like garlic flavoured condiments. That’s as good reason as any.