Prototyping an NFS connection to LDAP using SSSD

Blaine Gardner
Rook Blog
Aug 2, 2022


Intro

In the Rook project, we’ve been making it a priority to add more support for Ceph’s NFS capabilities. At this stage, we have been following the 80/20 rule and looking for the feature additions that provide the most benefit to users.

User ID management and user access control are some of the most fundamental parts of any real-world NFS installation. ID management lays a foundation for access control, so it’s reasonable to start there. One of the most common ID management services plugged into NFS servers is LDAP (Lightweight Directory Access Protocol), so we set that as a target.

We created a prototype design for connecting our NFS-Ganesha server to an LDAP environment that provides user ID information.

This is our first time writing a deeply technical article about a prototype feature for Rook. Normally we would create a design document to be reviewed, but in this case we weren’t even sure whether the idea was feasible, so we wanted to prototype first. The exercise was quite a challenge, and we think there is a lot of benefit to sharing our experience, especially as it relates to the System Security Services Daemon (SSSD). Even though we went through this exercise in a Kubernetes context, the details should apply to any containerized environment.

Let’s dig in

Finding inspiration

Another project that uses NFS-Ganesha is Gluster, so we started by looking at Gluster setups as a reference point. In those setups, Ganesha uses SSSD to connect to an LDAP service. SSSD can also connect to Active Directory and FreeIPA for ID management, and it can use Kerberos for authentication. That flexibility makes SSSD a great option for bringing a lot of benefit to Rook’s Ceph NFS offering all at once.

Investigating process flows

We looked at a sample NFS-Ganesha setup with Gluster to understand the process interactions that happen when the server needs to get ID information about connected users.

  1. NFS-Ganesha uses Linux NFS ID Mapper functions to request user information
  2. NFS ID Mapper uses nfsidmap to begin the lookup
  3. To pass the query on to SSSD, nfsidmap uses Linux’s Name Service Switch (NSS)
  4. Finally, SSSD queries the LDAP database for the user info

We’ll refer back to these steps in later parts of this article.

Making a general plan

Before SSSD queries the LDAP service (4), the entire process flow uses either kernel calls (1–2) or sockets (3) for communication. At any point in steps 1–3, we could try to mount the right host files into our containers to use what is running on the host system. However, when running in containers (especially in a Kubernetes environment) we can’t rely on the components we need being available or configured.

To be as flexible as possible, we decided our solution should be self-contained.

Refining the plan

Steps 1–2 rely on kernel libraries, which we have available in the Ceph container already. For step 3, passing on the query to SSSD, the NFS server Pod needs access to a running SSS Daemon instance. Because the query is passed via socket, the option of using HTTP calls to a centralized daemon is a no-go. SSSD has to run alongside our NFS-Ganesha server.

In Rook, our NFS-Ganesha pod already uses the sidecar pattern to run DBus (required by Ganesha) alongside the Ganesha server daemon with a shared socket mount for communication. The relevant diagram of our current Kubernetes Pod looks like this:

Current Ceph NFS pod architecture as of Rook v1.9

The requirements behind this design are outside the scope of this article, but for those not as familiar with Kubernetes, we’ll briefly describe the pieces (a rough sketch of the Pod spec follows the list).

  • On the right side: Rook mounts the keyring and Ganesha config file that it created for this NFS-Ganesha server into Pod volumes
  • In the init container: a minimal Ceph config file is put into a volume
  • Arrows between the Containers and Volume mounts show which volumes are mounted into which containers and whether each directory is used read-only or read-write
  • In the primary Ganesha daemon container: Ganesha is configured by files found in volumes for the Ceph config, Ceph keyring, and Ganesha config
  • The sidecar DBus application: sets up a socket in a volume which is also shared with the primary Ganesha daemon container
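To make the sidecar-plus-shared-socket pattern concrete, here is a rough, hand-written sketch of a Pod spec in that shape. All names, images, and commands below are illustrative placeholders rather than what Rook actually generates, and the keyring Secret mount is omitted for brevity.

apiVersion: v1
kind: Pod
metadata:
  name: rook-ceph-nfs-example            # illustrative name
spec:
  initContainers:
  - name: generate-minimal-ceph-conf     # writes a minimal ceph.conf into a shared volume
    image: quay.io/ceph/ceph:v17.2.1
    command: ["/bin/sh", "-c", "generate-ceph-conf > /etc/ceph/ceph.conf"]   # placeholder command
    volumeMounts:
    - name: ceph-conf
      mountPath: /etc/ceph
  containers:
  - name: nfs-ganesha                    # primary NFS-Ganesha daemon container
    image: quay.io/ceph/ceph:v17.2.1
    command: ["ganesha.nfsd", "-F"]      # placeholder arguments
    volumeMounts:
    - name: ceph-conf                    # minimal Ceph config from the init container (read)
      mountPath: /etc/ceph
    - name: ganesha-config               # Ganesha config created by Rook (read)
      mountPath: /etc/ganesha
    - name: dbus-socket                  # DBus socket directory shared with the sidecar (read-write)
      mountPath: /run/dbus
  - name: dbus                           # sidecar DBus daemon required by Ganesha
    image: quay.io/ceph/ceph:v17.2.1
    command: ["dbus-daemon", "--system", "--nofork"]
    volumeMounts:
    - name: dbus-socket
      mountPath: /run/dbus
  volumes:
  - name: ceph-conf
    emptyDir: {}
  - name: ganesha-config
    configMap:
      name: example-nfs-ganesha-config   # illustrative ConfigMap name
  - name: dbus-socket
    emptyDir: {}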

Prototyping

In practice, Rook’s current NFS server pod design has worked well, and we used similar methods for prototyping SSSD. After setting up a simple LDAP server for testing, we started working.

The first challenge was that SSSD wasn’t available in the Ceph container image. We also couldn’t find any recently-maintained SSSD images on Docker Hub or Quay. For the prototype, we created our own simple SSSD image to use, creating yet another unmaintained SSSD image in the ecosystem.

The critical element for getting our prototype working was determining the right sockets to mount between the NFS-Ganesha container and the SSSD sidecar to allow inter-communication. The information wasn’t easy to come by, but an SSSD developer helped us nail down /var/lib/sss/pipes/ as the directory where the sockets live. Many thanks to Alexey Tikhonov for supporting us.
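In Pod spec terms, the fix boils down to sharing that directory between the two containers with an emptyDir volume. A trimmed-down sketch of just the relevant pieces follows; the container and volume names are again illustrative, and the SSSD image reference is a stand-in for the image we built.

spec:
  containers:
  - name: nfs-ganesha
    volumeMounts:
    - name: sssd-sockets                 # SSSD client libraries look for sockets here
      mountPath: /var/lib/sss/pipes
  - name: sssd                           # sidecar running the SSS Daemon
    image: registry.example.com/sssd:latest   # stand-in for our home-built SSSD image
    volumeMounts:
    - name: sssd-sockets
      mountPath: /var/lib/sss/pipes
  volumes:
  - name: sssd-sockets
    emptyDir: {}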

A curious detail we found was that although SSSD uses DBus, SSSD’s NSS module was crashing when we allowed it access to the Pod’s shared DBus socket. Alexey clarified for us that SSSD only uses DBus for internal communications unless the DBus module is needed. Since nfsidmap only uses NSS for the lookup, we removed the shared DBus socket, and everything started working smoothly.

The penultimate hurdle we had to cross was yet again that SSSD wasn’t available in the Ceph container. Critically, the NFS-Ganesha container needs to have an SSSD client. For the prototype, it was simple enough to manually install the SSSD client into the container. Quick and dirty? Yes, but effective enough for proving out the design.

yum install -y sssd-client

The final challenge was resolving ID lookups that were taking far too long. Testing ID lookups from the SSSD container took well under a second, but lookups from the Ganesha container took many seconds. In the end, configuring NSS (via nsswitch.conf) to search only SSSD for IDs and to ignore local files resolved the latency.
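For reference, the nsswitch.conf content involved is tiny. Delivered as a ConfigMap (the name here is illustrative), it looks roughly like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nfs-nsswitch-conf                # illustrative name
data:
  nsswitch.conf: |
    # look up users, groups, and netgroups only via SSSD; skip local files
    passwd: sss
    group: sss
    netgroup: sss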

With all that out of the way, the diagram for the final working prototype looks like this:

Ceph NFS pod architecture highlighting prototype components
  • sssd.conf ConfigMap: LDAP servers can be configured in myriad ways, and the SSSD configuration has to be tailored to the LDAP environment, so we planned our prototype to let users provide this configuration flexibly (a sketch of such a ConfigMap follows this list)
  • nsswitch.conf ConfigMap: the configuration here eliminates the extra latency for ID lookups
  • The init container: this copies the initial content of the SSSD socket directory into the volume used for sharing sockets so SSSD doesn’t start up with expected items missing from the directory
  • Again, arrows between the Containers and Volume mounts show which volumes are mounted into which containers and whether each directory is used read-only or read-write
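As a sketch of what a user-provided sssd.conf ConfigMap might look like for a plain LDAP backend: every name, host, and DN below is hypothetical, and a real deployment should not put a bind password in a ConfigMap.

apiVersion: v1
kind: ConfigMap
metadata:
  name: nfs-sssd-conf                    # illustrative name
data:
  sssd.conf: |
    [sssd]
    config_file_version = 2
    services = nss
    domains = example

    [domain/example]
    id_provider = ldap
    # hypothetical LDAP server and search base
    ldap_uri = ldap://my-ldap-server.example.net
    ldap_search_base = dc=example,dc=net
    ldap_default_bind_dn = cn=admin,dc=example,dc=net
    ldap_default_authtok = insecure-demo-password
    cache_credentials = true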

The whole Pod looks like this:

Full Ceph NFS pod design including current and prototype components

It’s a lot, but you’ve seen all of this before.

Analyzing what we made

How does what we made stack up? What is still left dangling?

For a final design, the NSS config file likely doesn’t need the flexibility of a ConfigMap. At least with our current plans, files located in the Ganesha daemon pod shouldn’t contain user information, and the file could easily be generated by an init container instead.
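A minimal sketch of that idea, assuming we generate the file into a shared emptyDir and mount it over /etc/nsswitch.conf with a subPath (names are illustrative, and this is not the final implementation):

spec:
  initContainers:
  - name: generate-nsswitch-conf         # illustrative name
    image: quay.io/ceph/ceph:v17.2.1
    command:
    - /bin/sh
    - -c
    # write an nsswitch.conf that sends ID lookups only to SSSD
    - |
      printf 'passwd: sss\ngroup: sss\nnetgroup: sss\n' > /generated/nsswitch.conf
    volumeMounts:
    - name: nsswitch-conf
      mountPath: /generated
  containers:
  - name: nfs-ganesha
    volumeMounts:
    - name: nsswitch-conf
      mountPath: /etc/nsswitch.conf      # mount only the generated file
      subPath: nsswitch.conf
  volumes:
  - name: nsswitch-conf
    emptyDir: {}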

The problem of what container image to use is a big one. The Ceph container needs at least the SSSD client (sssd-client, ~500 kB) added. We also need an image with the full SSSD requirements installed. We could either build a stand-alone SSSD image (annoying) or install the SSSD requirements into the Ceph container. Our napkin calculations suggest the SSSD requirements add 90–150 MB onto a base image, and the Ceph container is already 1.3 GIGAbytes.

We aren’t sure what SSSD’s resource usage will look like in practice. It’s also not clear to us from the documentation how to limit its cache by size. For a simple prototype the usage is minimal, but in an LDAP environment with a very large number of entries or many client connections, SSSD could consume a large amount of system RAM.

The biggest thing to chew on is whether this prototype’s design should become our eventual implementation in Rook. Linux’s NFS ID mapper already has configuration options for connecting directly to LDAP servers, and NFS-Ganesha supports Kerberos natively. We must establish a clear enough benefit to using SSSD in order to justify the extra resource usage.
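For comparison, the direct route would mean configuring libnfsidmap itself via idmapd.conf and its umich_ldap translation method. We haven’t tested this path, but a hypothetical ConfigMap for it might look roughly like this (the server and search base are placeholders):

apiVersion: v1
kind: ConfigMap
metadata:
  name: nfs-idmapd-conf                  # illustrative name
data:
  idmapd.conf: |
    [General]
    Domain = example.net

    [Translation]
    Method = umich_ldap

    [UMICH_SCHEMA]
    # hypothetical LDAP server and search base
    LDAP_server = my-ldap-server.example.net
    LDAP_base = dc=example,dc=net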

Conclusion

Obviously, this exercise was only a prototype. We have proven that it’s possible to use SSSD in a sidecar to connect NFS-Ganesha to an ID management service, but we still have to decide whether the prototype design is the right one. There are still several unknowns and snags, and there may be other designs we want to prototype so we can make good comparisons between the options.

For now, we hope this insight into some of the ongoing work in Rook was valuable. As always, we welcome contributions and feedback from the community. You can find documentation on our webpage, check out the code on GitHub, or engage in discussion on Slack.

And again, many thanks to Alexey who helped us prove out the possibility of this design.

Appendix

This prototype was done with Rook v1.9.7 and Ceph v17.2.1.

If you want to try the prototype yourself or look at the details more closely, you can check out the files used in the gists below.
