“Fun” with SELinux

A few weeks ago, I had some (mis)adventures with SELinux, and after spending almost a whole week debugging weird issues, I felt like I needed to vent a bit. Fortunately, I have this blog that I can use to scream into the void of the internet. Perhaps this can even be considered informative, and some lessons can be learnt.

What Is SELinux

A few words about the star of the show: SELinux. It is a Linux security module that implements mandatory access control for a system. Traditionally, Linux systems use discretionary access control, where the owner of a file sets its read, write, and execute permissions, and the root user can override those permissions. With mandatory access control, an administrator loads a centralised policy into the kernel, and this policy applies to everyone and everything in the system.

In short, SELinux has subjects (users, processes), verbs (e.g. read, write, map) and objects (files, devices, ports). Subjects and objects are labelled with security contexts, which include a user, a role, and a type. SELinux then checks the centralised policy to see whether a requested action is allowed; anything the policy does not explicitly allow is denied. For example, a process with the httpd_t type can be restricted to accessing only resources with the httpd_sys_content_t type, so if the process goes rogue, it cannot access the whole system. It sounds simple in theory, but it is difficult to get right in practice once things like “domain transitions” come into play.
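As a rough mental model of the default-deny check described above, here is a toy sketch in Python. It is not how the kernel implements type enforcement: the `POLICY` table, the `Context` tuple and the `is_allowed` helper are all made up for illustration, and the sketch ignores users, roles, levels and domain transitions entirely.

```python
from typing import NamedTuple


class Context(NamedTuple):
    """A simplified security context: user, role and type (no level)."""
    user: str
    role: str
    type: str


# Hypothetical allow-list: (source type, target type, object class) -> verbs.
# Anything that is not listed here is denied.
POLICY = {
    ("httpd_t", "httpd_sys_content_t", "file"): {"read", "getattr", "open"},
}


def is_allowed(subject: Context, target: Context, tclass: str, perm: str) -> bool:
    """Default deny: permit only what an explicit rule grants."""
    return perm in POLICY.get((subject.type, target.type, tclass), set())


httpd = Context("system_u", "system_r", "httpd_t")
web_page = Context("system_u", "object_r", "httpd_sys_content_t")
shadow_file = Context("system_u", "object_r", "shadow_t")

print(is_allowed(httpd, web_page, "file", "read"))     # True
print(is_allowed(httpd, shadow_file, "file", "read"))  # False
```

The second lookup fails simply because no rule mentions that pair of types, which is the whole idea: the rogue web server never gets a chance to argue its case.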

I’ll admit, I don’t know that much about SELinux, but I know the two most important things: it is complicated, and it makes life complicated. Definitely more secure, yes, of course, but that security comes at the price of a harrowing headache. With that in mind, let me share how I spent a week of my life debugging two SELinux-related issues.

Problem 1: Login Password Change

The first problem made regular use of the Linux system quite difficult. I built a Yocto image where the user was forced to change the password during the first login. However, if SELinux was enabled, the password change failed with a mystical “Authentication token manipulation error”:

Fail
qemux86-64 login: serviceuser
Password:
You are required to change your password immediately (password expired).
Changing password for serviceuser.
Current password:

You can now choose the new password or passphrase.

A valid password should be a mix of upper and lower case letters, digits, and
other characters.  You can use a password containing at least 14 characters
from at least 3 of these 4 classes.
An upper case letter that begins the password and a digit that ends it do not
count towards the number of character classes used.

A passphrase should be of at least 3 words, 14 to 40 characters long, and
contain enough different characters.

Alternatively, if no one else can see your terminal now, you can pick this as
your password: "Plum3Grave6czech".

Enter new password:
Re-type new password:

Authentication token manipulation error

Sulka 0.2.0 qemux86-64 /dev/ttyS0

WARNING: This is a restricted system. Unauthorized access is strictly prohibited. All activities are monitored and recorded.
qemux86-64 login:

This created an interesting chicken-and-egg problem: it was impossible to log in to the system to debug the issue, but without debugging it was not possible to log in to the system. So, the first step in solving the problem was redirecting all the logs to /dev/console so that I could see what was going wrong. Fortunately, I was able to build and flash the firmware image as many times as required. After redirecting the audit logs to the console, a fairly typical access vector cache denial popped up when attempting to change the password:

More fail
Enter new password:
Re-type new password:
[  116.469892] audit: type=1400 audit(1764963364.887:111): avc:  denied  { write } for  pid=446 comm="login" name="etc" dev="vda" ino=87 scontext=system_u:system_r:local_login_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=dir permissive=0


Authentication token manipulation error

Sulka 0.2.0 qemux86-64 /dev/ttyS0

WARNING: This is a restricted system. Unauthorized access is strictly prohibited. All activities are monitored and recorded.
qemux86-64 login:

The access vector cache (AVC) is the cache for SELinux access decisions, and denials are written to the audit log so that they can be noticed and fixed if needed. With the help of audit2allow, I modified the SELinux policy to allow this access. I rebuilt the firmware image with the fixed policy, and then another AVC denial popped up. Then another. Then another.
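To make the denial-to-rule step a bit more concrete, here is a minimal Python sketch that turns an AVC denial line into the kind of allow rule that audit2allow suggests. This is only an illustration of the idea, not how audit2allow is implemented; the `denial_to_rule` helper and its regular expression are my own assumptions and only cover the simple log format shown above.

```python
import re
from typing import Optional

# Matches the parts of an AVC denial that are needed for an allow rule.
AVC_RE = re.compile(
    r"avc:\s+denied\s+\{ (?P<perms>[^}]+) \}"
    r".*?scontext=(?P<scontext>\S+)"
    r"\s+tcontext=(?P<tcontext>\S+)"
    r"\s+tclass=(?P<tclass>\S+)"
)


def context_type(context: str) -> str:
    """Extract the type from a user:role:type:level security context."""
    return context.split(":")[2]


def denial_to_rule(line: str) -> Optional[str]:
    """Return an allow rule for one denial line, or None if it is not a denial."""
    match = AVC_RE.search(line)
    if match is None:
        return None
    perms = match.group("perms").split()
    # A single permission is written bare, several are wrapped in braces.
    perm_text = perms[0] if len(perms) == 1 else "{ " + " ".join(perms) + " }"
    return "allow {} {}:{} {};".format(
        context_type(match.group("scontext")),
        context_type(match.group("tcontext")),
        match.group("tclass"),
        perm_text,
    )


LOG_LINE = (
    '[  116.469892] audit: type=1400 audit(1764963364.887:111): avc:  denied  '
    '{ write } for  pid=446 comm="login" name="etc" dev="vda" ino=87 '
    'scontext=system_u:system_r:local_login_t:s0 '
    'tcontext=system_u:object_r:etc_t:s0 tclass=dir permissive=0'
)

print(denial_to_rule(LOG_LINE))  # allow local_login_t etc_t:dir write;
```

A rule generated like this still deserves a critical look before it goes into the policy, since blindly allowing every denial is a good way to end up with a policy that protects nothing.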

After fixing about half a dozen of these one by one, things unfortunately took a turn for the worse. I was still getting the “Authentication token manipulation error”, but the AVC denials didn’t pop up in the log anymore, so there was still some pesky issue hiding somewhere. A quick grep through the build area revealed that the error originated from PAM, after which I started furiously fprintf’ing everything from the login process to stderr, as it seemed like the most reliable way of delivering information to the console.

Grepping PAM also showed that the “Authentication token manipulation error” was returned for pretty much any SELinux-related failure. Grepping PAM for SELinux revealed some quite promising spots where the behaviour of the login and password handling changed if SELinux was enabled. The pam_unix module in particular started to look suspicious.

The trail went a bit cold when I figured out that the setfscreatecon_raw call in pam_unix’s passverify code failed, but I could not find a definition for that function. Grepping the build area for setfscreatecon yielded no results. Some extensive guesswork later, the failing function turned out to live in libselinux, the SELinux userspace library (who would have guessed). There’s some magic going on with how the function gets declared, and in all honesty, I’m not 100% sure how it works, but granting the setfscreate permission (a process permission, not a Linux capability) to the local login domain fixed the issue.

After this, the password change started working. Hooray. You can find a patch that makes the required changes to the SELinux refpolicy in my Sulka meta-layer here. Note that this applies only to local logins; for remote logins you’ll of course need a different patch with some additional holes in the policy.

Problem 2: Shell Profile Not Getting Read

This was more of a Yocto oddity than a silent permission failure. In short, the problem was that when logging into the shell, the shell profile files were not getting read. Commonly, a login shell is launched with a hyphen-prefixed name, like -sh. In Busybox, the ash shell detects this by checking the zeroth character of the zeroth argument, and if it’s a hyphen, the shell is treated as a login shell, and the profile gets read.

This kind of trick is possible because of the way the standard C library’s execve works. If you call a command from the shell, the zeroth argument is the name of the command you used. However, when using execve, you can execute the file with an arbitrary string as the zeroth argument. So, you cannot call -sh directly from the shell to open a new login shell, but shadow is able to do that during login. (shadow uses execle, which allows defining an environment, but under the hood execle calls execve; see also how the hyphen prefix is added a few lines earlier if you’re interested.)
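The zeroth-argument trick is easy to try out without touching shadow. The following Python sketch runs /bin/sh with a caller-chosen zeroth argument and asks the shell what it thinks its name is. It assumes a POSIX-like system with a shell at /bin/sh; the helper name `shell_name_seen_by_child` is made up for this example.

```python
import subprocess


def shell_name_seen_by_child(argv0: str) -> str:
    """Run /bin/sh with a custom zeroth argument and return the shell's $0."""
    result = subprocess.run(
        # The first list item becomes argv[0]; the file that is actually
        # executed is given separately through the executable parameter.
        [argv0, "-c", 'printf "%s" "$0"'],
        executable="/bin/sh",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # The executed file is /bin/sh, but the shell believes it is called "-sh".
    print(shell_name_seen_by_child("-sh"))
```

Note that with the hyphen prefix the shell behaves as a login shell here as well, so it reads the profile files before running the command, and anything they print ends up in the output too.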

All good so far. The problem comes from the fact that the Busybox binary, which provides the shell, is installed only once, and all the programs are actually symlinks to that single binary. Busybox then checks the name it was called with (the zeroth argument) and calls the correct applet. This normally works well, but this kind of linking breaks the SELinux labelling. SELinux uses the label of the target of the link for the policy checking, which makes sense. However, the target of all the links is the single Busybox binary, which can have only one label.
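The multi-call idea can be sketched in a few lines of Python. This is a simplified model of the dispatch, not Busybox’s actual code: the `APPLETS` set and the `select_applet` helper are hypothetical, and the real lookup handles more cases.

```python
import os

# Hypothetical subset of the applets compiled into the single binary.
APPLETS = {"sh", "ls", "cat"}


def select_applet(argv0: str) -> str:
    """Pick the applet based on the name the binary was invoked with."""
    name = argv0[1:] if argv0.startswith("-") else argv0  # "-sh" -> "sh"
    name = os.path.basename(name)                         # "/bin/ls" -> "ls"
    if name not in APPLETS:
        raise LookupError(f"{name}: applet not found")
    return name


print(select_applet("/bin/ls"))  # ls
print(select_applet("-sh"))      # sh
```

Every symlink ends up running the same file, and only the invocation name decides what happens, which is exactly why one file label cannot describe all of the applets.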

To circumvent this, the meta-selinux layer in Yocto creates simple wrapper scripts with correct labelling that the links point to. These minimal scripts just invoke Busybox using a shebang, and all is good. Right? Well, unfortunately, using this wrapper loses the special zeroth argument set by shadow’s execve call, meaning that the shell doesn’t get detected as a login shell, and the profile doesn’t get read. The wrapper works as it should and the shell gets executed; it’s just not considered a login shell.
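The reason the hyphen disappears is that the kernel rewrites the argument list when the executed file starts with a shebang: the caller’s zeroth argument is dropped, and the interpreter and the script path take its place. Below is a small Python model of that rewriting. It only simulates the behaviour and executes nothing; the paths are hypothetical placeholders, and an optional interpreter argument on the shebang line is ignored.

```python
from typing import List


def argv_after_shebang(interpreter: str, script_path: str, argv: List[str]) -> List[str]:
    """Model how Linux rewrites argv when the executed file is a '#!' script.

    The caller's argv[0] is discarded; the remaining arguments are kept.
    """
    return [interpreter, script_path] + argv[1:]


def looks_like_login_shell(argv: List[str]) -> bool:
    """The convention described earlier: a leading hyphen in argv[0]."""
    return bool(argv) and argv[0].startswith("-")


# What shadow hands to execve for a login shell.
requested = ["-sh"]
# What the interpreter receives when the shell path is a wrapper script.
# Both paths are placeholders, not the real meta-selinux layout.
received = argv_after_shebang("/bin/busybox", "/bin/sh", requested)

print(looks_like_login_shell(requested))  # True
print(looks_like_login_shell(received))   # False
print(received)                           # ['/bin/busybox', '/bin/sh']
```

The shell still starts, because the script path tells the interpreter which applet is wanted, but the one character that marked it as a login shell is gone.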

To figure this out, I littered shadow, glibc and the kernel with debug tracing without any meaningful results. Eventually, I got the great idea of checking where the /bin/sh link actually points to, and that’s when I realised the special trick that meta-selinux does. Too bad it took two days to figure this out.

For this issue, I don’t really have a working solution yet. One solution could be using a shell that doesn’t require a wrapper to have a correct SELinux label, i.e. any of your regular single-binary shells like bash, dash, zsh, etc. Perhaps the ash provided by Busybox could be built as a single binary as well. I think another option could be writing a simple wrapper binary that just performs execve on the Busybox binary path with the given arguments, but I have not verified this. Also, I’m not 100% certain whether something like this can be made secure.

End

I hope you learnt something useful from this light rant. If nothing else, then hopefully at least the fact that SELinux tends to make life complicated is clear now. Thanks for reading. Feel free to ask questions; the explanations here omitted quite a lot of details. However, I think I’ll forget everything I’ve learnt in a week, so ask quickly.
