
Aioli Audiostreamer: Music To The People

Check out the previous part of the Aioli Audiostreamer saga here. In case you don’t want to check it out, here’s a quick recap: I started a new project in which the goal is to stream audio from one Raspberry Pi to another over an IP network. Last time we got the streaming to work in theory, but the practical part of it was (and still is) missing. This time we’re not going to address that.

Instead of focusing on getting the streaming working robustly and automatically, I chose to add a Bluetooth connection between the Raspberry Pi controller device and an external audio source. This way the system can stream something other than just the audio files present on the controller device. Here’s the graph with a chunky red line showing today’s focus:

Diagram showing the Aioli Audiostreamer system overview

Unfortunately, this means confronting my old nemesis: the BlueZ stack. Or Bluetooth in general. Something about it rubs me the wrong way. I’m not sure if it’s actually that bad. However, every time my headphones fail to connect to my phone, I curse the whole protocol to the ninth circle of hell. Which happens every single morning. And yet it’s still the best choice for this kind of project. But yeah, plenty of that coming up.

Picture of a robot pounding a "no fun allowed" sign to the ground

Bluetooth Connectivity

The first step is making our Raspberry Pi audio server advertise itself as a Bluetooth device wanting to receive audio: headphones, speakers, or anything along those lines. In theory, this sounds like a lot of work, but once again, the open-source community comes to the rescue. This bt-speaker project makes a Raspberry Pi act as a Bluetooth speaker, which is exactly what we want. A phone (or some other audio source) can connect to the bt-speaker daemon running on the Raspberry Pi and stream audio to it. bt-speaker then outputs the received audio to the desired audio device.

The program required some tweaking for cross-compilation, and some things weren’t quite as generic as they could have been, causing some QA errors in Yocto. However, it mostly worked quite nicely out of the box. I guess that because the bulk of the program is written in Python, there aren’t that many compilation issues to wrestle with. There was also one codec that needed to be compiled, and then there was the issue of figuring out the correct dependencies, but all in all, fairly simple stuff. The Yocto recipe can be found here.
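Installing the daemon into the image is then just another package to append. A minimal sketch, assuming the recipe ends up being named bt-speaker:

# Hypothetical append in the image recipe; the recipe name is an assumption
IMAGE_INSTALL:append = " bt-speaker"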

The Actual Troubles

What actually took a long time was getting the start-up script working. I’m still sticking to SysVinit for simplicity, which means that I ended up using start-stop-daemon to launch the program. However, it turned out that Busybox’s implementation of start-stop-daemon was missing the -d/--chdir option. bt-speaker loads a codec from a relative path, meaning that the program fails at start-up because it’s launched from the root directory. Because I’m not much of a Python programmer, I chose to patch the feature into Busybox instead of doing the sensible thing and installing the codec into the correct location and fixing bt-speaker. An open-source contribution to Busybox is coming soon, I hope.
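For illustration, the launch line in the SysVinit script ends up looking something like this. The paths here are assumptions, and the -d option naturally requires the patched Busybox:

# Sketch of the launch line, assuming bt-speaker lives in /opt/bt-speaker
start-stop-daemon -S -b -m -p /var/run/bt_speaker.pid \
    -d /opt/bt-speaker -x /usr/bin/python3 -- bt_speaker.py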

Well, after that came the second problem: the Bluetooth chip in the Raspberry Pi wasn’t stable during start-up. The script starting BlueZ worked well, and bt-speaker launched successfully as well. However, after a few seconds, the Bluetooth device became undiscoverable. I tried checking all the Bluetooth-related changes that happened in the system during boot: changes in the Bluetooth device information and BlueZ status, reading syslog & dmesg, but no luck. The Bluetooth chip just reset a few seconds after BlueZ launched. So I did what any sane person would do: power cycled the BT chip as a part of the start-up, added a “reasonable amount” of sleep, and moved on with my life.
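The workaround boils down to something like this in the start-up script. The exact commands and sleep durations here are assumptions, tune to taste:

# Power cycle the BT chip and give it time to settle (durations are guesses)
hciconfig hci0 down
sleep 2
hciconfig hci0 up
sleep 10
# Make the device discoverable & connectable again after the reset
hciconfig hci0 piscan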

The Less Actual Troubles

After getting the thing to start automatically during boot, there’s still a small problem. The problems never end with Bluetooth, do they? Well, even better, there are two problems. First, for some reason, my phone says that it has trouble connecting to this Frankensteinian BlueZ device. Streaming music from Spotify works nicely though, and even the volume control behaves as expected. So all in all, this already sounds like a very typical Bluetooth device: nothing works, except that it works, except when it doesn’t. I’m not yet sure if this is actually a problem for anyone other than my phone.

Screenshot from a mobile phone showing connection issue with BlueZ 5.66 device
I just wanted to flex my phone and watch with this screenshot. The ironic thing here is that if I “turn device off & back on” as suggested, it’ll be forever unable to connect again unless the BlueZ cache is cleared on the Raspberry Pi. That may be the fault of dodgy Bluetooth code and not the protocol itself, though.
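For reference, clearing the cache means deleting the phone’s pairing information from under /var/lib/bluetooth on the Raspberry Pi. The MAC addresses below are placeholders, and the SysVinit-style restart is an assumption:

# BlueZ stores per-device pairing info under the adapter's directory
rm -rf /var/lib/bluetooth/AA:BB:CC:DD:EE:FF/11:22:33:44:55:66
# Restart the daemon so it picks up the change
/etc/init.d/bluetooth restart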

The bigger issue is that we don’t want the audio to be output from the Raspberry Pi we’re connected to. Instead, we want to stream the audio to the other Raspberry Pis in the LAN and have those output the audio. GStreamer has an alsasrc source that can take input from an ALSA device and work with that. However, we have a bit of a mismatch here: GStreamer wants an input device to receive the audio from (e.g. a microphone), but bt-speaker generates audio that goes to an output device (e.g. a speaker).

Loop Devices to the Rescue!

A loop device is a virtual audio device that redirects audio from a virtual output device to a corresponding virtual input device (or vice versa). This means that we can have a virtual “microphone” that outputs the audio bt-speaker has received through Bluetooth. Maybe my explanation just made it worse; the idea is quite simple. This blog post explains the functionality quite well.

To explain more: probing the snd-aloop module creates two loopback sound cards, each appearing as both an input and an output card (so four cards in total in the listings). The virtual cards have two devices, and each device has eight subdevices, which results in a lot of devices being created. These devices are special because the output of an output device gets directed to the input of the corresponding input device. For example, if I play music with aplay to card 1, device 0, subdevice 0, the same music can be captured from input card 1, device 1, subdevice 0. Notice how the device number is flipped: output to input, and vice versa, as I’ve kept repeating for two paragraphs now.
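A quick way to see the flipping in action with aplay and arecord, assuming the loopback ends up being card 1:

# Load the loopback driver
modprobe snd-aloop

# Play into one side of the loopback (card 1, device 0, subdevice 0)...
aplay -D hw:1,0,0 /opt/sample-files/sample1.wav &

# ...and capture the same audio from the flipped device number
arecord -D hw:1,1,0 -f cd captured.wav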

Meme saying “snd-aloop transforms an input to an output and vice versa” in French
Google Translate don’t fail me now.

With this method, when bt-speaker uses aplay to output the audio it receives, we can define the output device to be a loopback device. Then, GStreamer can use the corresponding loopback input device to receive the Bluetooth audio and pass it to the LAN. A bit of patching to the bt-speaker, and something like this seems to do the trick:

# Play command that bt-speaker uses when it receives audio
aplay -D hw:2,0,1 -f cd -

# Streaming command to send audio to 192.168.1.182
gst-launch-1.0 alsasrc device=hw:2,1,1 ! audioconvert ! audioresample ! rtpL24pay ! udpsink host=192.168.1.182 port=5001

Is This a Good Idea?

I’m not sure. I think it would be possible for the bt-speaker daemon to launch GStreamer directly once the Bluetooth connection has been initialized. This approach would skip aplay altogether and wouldn’t require the loop devices. However, keeping the Bluetooth and networking sides separate should keep the system simpler: both processes do their own thing without knowledge of each other. bt-speaker can output audio without caring if anyone listens to it, and on the other end, GStreamer can stream whatever it happens to receive through the loopback device.

This also allows kicking bt-speaker and GStreamer individually when they eventually and inevitably start misbehaving. The drawback is that GStreamer streams silence if nothing is received from Bluetooth, but I think that’s an acceptable weakness for now. After all, I’ve paid for a WLAN router to route some bits, so I’m going to route them bits, even if they’re all zeros.

I noticed that there actually is a BlueZ plugin for GStreamer. However, it’s labelled as one of the “bad” plugins, and dabbling with such dark magic sounds like a bad idea. If something is labelled “bad” even in the official documentation, it’s usually better to avoid it. The label doesn’t necessarily mean that the quality itself is bad; it may also indicate a lack of testing or maintenance. But still.

Movie poster of Bad Boys
TBH I don’t exactly know what WASAPI is, but after a quick Google search, I’m not sure I even want to know.

Closing Words

This text ended up a bit shorter than anticipated, but some things in life are unexpectedly short. Next time we’ll get rid of the static WLAN configuration and instead create a mechanism for passing the SSID and password during run-time. We’ll be doing that mostly because I already started working on that feature, and now I have DHCP servers running amok in my home LAN, causing trouble and slowing down development.

One question still remains: does this system contain any code that I have written? Not really, at the moment. But in my experience, that’s the story for most embedded Linux projects: find half a dozen somewhat-working pieces of software and glue them together with some scripts. Until next time!


This blog text is a part of my Movember 2023 series. If you found this text useful, I’d ask you to do “something good”. That doesn’t necessarily mean shoving your money to charities or volunteering weekends away (although those are good ideas), it can be something as simple as asking a family member or colleague how they’re doing. Or it can mean selling your earthly possessions away and becoming a monk. It’s really up to you.

Aioli Audiostreamer: Moving the Sound

AI generated picture of an amplifier with raspberries

People need projects to consume their free time. I’ve lately felt that I want to actually try finishing a project (instead of just starting them), that the project should be somehow related to audio, that it would be nice if it had a real-world use, and that it should use the old Raspberry Pis I have lying around. Plenty of requirements then. I think this is still a better-formulated train wreck than an average customer project.

After considering a few different options, I ended up attempting to create a multi-speaker streaming system named Aioli (so yeah, I started another project). This text is closer to a devlog than a tutorial, but there will be open-source code repositories in case you want to see how it’s done. Enough with the blabber, let’s move on.

Overview Of The Project

Basically, in this project I want to have one audio source, and the audio from that single source gets wirelessly transmitted to multiple speakers. To be more specific, in my case there’s one Raspberry Pi 4 connected to an external audio source, and then there are other Raspberry Pi 2’s connected to the speakers. The Raspberry Pis in this scenario handle at least the streaming, networking, receiving the audio, and playing the audio. This graph attempts to explain the situation:

To start out the project I decided to focus on the streaming between Raspberry Pis because I didn’t feel masochistic enough to start working with Bluetooth yet. Everything is all fun and games until Bluetooth is added into the mix, and I want to have a bit of fun and games.

Today’s focus is this part to be exact

Obviously, the first thing to do is to create a custom Yocto distro, because every self-respecting hobby project needs its own Linux distribution. Perhaps further down the line this distro can contain some useful configs and other things that actually justify its existence, but for now, it’s just a renamed example Poky distro.
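For the record, renaming the example distro is only a few lines of configuration. A sketch of what the distro file might look like; the layer and file names here are assumptions, not the actual repository layout:

# conf/distro/aioli.conf in the custom layer (hypothetical naming)
require conf/distro/poky.conf
DISTRO = "aioli"
DISTRO_NAME = "Aioli"
DISTRO_VERSION = "0.1"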

Creating The Network Of Raspberries

To get the Raspberry Pis talking to each other, the first step is getting the devices connected to the same LAN. I wanted to use WLAN to avoid having cables around the house; using Ethernet cables would defeat the whole point of the system anyway, as I could then just use audio cables instead. I also considered an ad hoc network but decided to use WLAN to keep things familiar for now. The Raspberry Pi 4 I own does have an internal WiFi chip, so that was easy to sort out, but the two Raspberry Pi 2’s did not. I had one WiFi dongle that worked out of the box, but the other dongle required some extra work. You can read about it in my previous blog post if you’re interested.

After getting the hardware sorted out, it was time to get the devices actually connected to the WLAN. For that purpose, I added wpa_supplicant to the distro. wpa_supplicant is a program that, in layman’s terms, “connects the device to wifi” (or so I’ve understood as a layman). A properly configured supplicant that launches during boot should in theory automatically connect the device to the WLAN. Surprisingly enough, it usually does. The following simple configuration in /etc/wpa_supplicant.conf, added to the Raspberry Pi during the build, does the trick:

network={
    ssid="WLANname"
    psk="SecretPassword"
}

This of course means that you have a statically defined network you want to connect to, and that the password is stored in plaintext on the device. Both are bad things for different reasons, but they’ll do for now because this is the simplest solution. The simplicity will be fixed later on in another text. If you have a WLAN network without a password, or want to use a calculated key instead of a plaintext password, you can read more about wpa_supplicant in the Arch wiki. It’s a good read. Pay attention to the quotation marks in the psk variable; they caused me a lot of headache.

With quotation marks, the value is a plaintext password; without them, it’s a calculated key value. Makes “sense”.
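The calculated key can be generated with wpa_passphrase, which prints a ready-made network block (the actual key below is elided):

# wpa_passphrase computes the PSK from the SSID and the plaintext passphrase
wpa_passphrase "WLANname" "SecretPassword"
# Output:
# network={
#         ssid="WLANname"
#         #psk="SecretPassword"
#         psk=<64 hexadecimal characters, note the missing quotation marks>
# }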

After the devices were wirelessly connected to the router, I gave them static IP address leases to make development somewhat easier. I also ran a quick ping test to check that the Raspberry Pis could reach Google and each other before proceeding.

Moving The Audio Bits

Making the audio streaming work was actually fairly simple because there already is an open-source solution, as there usually is. GStreamer is an “open source multimedia framework”, which can mean many things. This is quite fitting because GStreamer does many things, and with the help of its plugins, it can do pretty much anything you can dream of. Assuming your dreams revolve around handling and processing multimedia.

My dream was to find a way to stream audio over an IP network. And dreams, they sometimes do come true. Actually, a bit too much so: with all the encoding options, protocols, and what-not, it was slightly difficult to find the best options for streaming the audio. I’m still not sure I picked the right ones. To keep the prototyping fast, I worked with the command line tools provided by GStreamer (as opposed to using the API, which may be worth looking into in the future).

GStreamer works with pipelines. Pipelines have sources where the media originates, sinks where the media ends up, and parsers, encoders, and other kinds of elements in between that manipulate the input and pass it forward. For example, here’s a simple pipeline that reads an audio file, and then parses, converts, resamples, and outputs it to the appropriate default sink:

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! autoaudiosink

This command may result in sound being output from your speakers. Quite often it doesn’t. It depends on what your default ALSA output device is, whether you’re using PulseAudio, and whether it’s the third Tuesday of the month. In the case of Raspberry Pi, the default output device is the HDMI audio, I’m not using PulseAudio, and it’s not the third Tuesday today, meaning that I actually got some sound out of a television connected to the HDMI port. If you want to get the audio output from the Raspberry Pi’s headphone jack, you can be a bit more specific about the sink:

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! alsasink device=hw:1
# use "aplay -l" command to list the available ALSA devices

To get the audio sent over the network, we can use the RTP protocol, which is meant for delivering audio and video. Basic GStreamer functionality can be easily extended with plugins, and as it turns out, there exists a plugin for RTP. It’s weird how these things work out nicely. Almost like someone has had the same ideas before me. Now we can package the audio into 16-bit RTP payloads, and instead of an alsasink we can use a udpsink (from another plugin) to output the stream to a target in the network instead of an audio device.

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! rtpL16pay ! udpsink host=192.168.1.182 port=5001

Then, the intended receiver of the stream can use udpsrc instead of filesrc to read the stream, decode it, and deliver the contents to its own audio sink. Simple as.

gst-launch-1.0 udpsrc port=5001 ! 'application/x-rtp,media=audio,payload=96,clock-rate=44100,encoding-name=L16,channels=2' ! rtpL16depay ! audioconvert ! autoaudiosink

To get the audio sent to multiple devices, a multiudpsink can be used on the sending side. The receiving end still uses the same command:

gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! rtpL16pay ! multiudpsink clients=192.168.1.182:5001,192.168.1.183:5001

In theory, we could use multicast streaming instead of multiple streams, but for some reason I couldn’t get it working. Most likely it had something to do with the third Tuesday of the month. I couldn’t even complete a simple multicast test on my network of Raspberry Pis, so I guess something is wrong with my setup. For the sake of completeness, AFAIK these commands (should (in theory)) work, but don’t. I’ll look into this later on because multicasting seems like a more sensible approach to this problem:

# Controller command
gst-launch-1.0 filesrc location=/opt/sample-files/sample1.wav ! wavparse ! audioconvert ! audioresample ! rtpL16pay ! udpsink host=224.1.1.1 auto-multicast=true port=3000

# Speaker command
gst-launch-1.0 udpsrc multicast-group=224.1.1.1 auto-multicast=true port=3000 ! 'application/x-rtp,media=audio,payload=96,clock-rate=44100,encoding-name=L16,channels=2' ! rtpL16depay ! audioconvert ! autoaudiosink
Considering the amount of multicast memes floating around the internet, I’m not the only one having issues with it.

By using these commands, we can send the audio over the network from the controller device to the speaker devices. However, this is still a bit cumbersome, because we need to manually run the gst-launch-1.0 commands, figure out the intended receivers & their IP addresses, and so on. Later on, I plan to introduce a manager process that can dynamically find the clients in the LAN and control the streaming, but that’s a topic for another text.

There’s a recipe for GStreamer and its plugins in Yocto, so getting these things installed into the new custom distro is just a matter of adding a few packages. It’s almost simpler than using a package manager. At least if you’ve spent the last five years learning the ins and outs of Yocto and don’t need to install anything during runtime. Something like this should do the trick:

IMAGE_INSTALL:append = " \
    gstreamer1.0 \
    gstreamer1.0-meta-base \
    gstreamer1.0-meta-audio \
    gstreamer1.0-plugins-good-udp \
    gstreamer1.0-plugins-good-rtp \
"

Plugins are sorted into good, bad, and ugly (I guess it’s no big surprise that the bluez plugin is “bad”). To figure out which group a plugin belongs to, you can check the documentation. The documentation is quite good by the way; I recommend reading it. For example, the udp plugin page contains information about the pipeline elements it provides and also mentions which group the plugin belongs to.
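If you’d rather check things on the device itself, gst-inspect-1.0 shows whether an element is actually available in the image and which plugin provides it:

# Prints element details, including the providing plugin and its package
gst-inspect-1.0 udpsink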

That mostly covers everything for this text. We’re now able to send sound over the network from one device to another. Next time we’ll stop this goofing off and get painfully serious by adding Bluetooth to the system, and instead of using sample audio files we’ll actually stream something from a phone.

You can find the top-level repo-tool manifest repository here. Please note that the progress of the project is a bit further along than what’s presented in this blog text, and the progress is also “a bit” all over the place, so the manifest repository and the subrepositories contain plenty of spoilers and confusion.

One more question remains: why the name Aioli? Well, it kinda sounds like audio combined with I/O, and I like garlic-flavoured condiments. That’s as good a reason as any.