The GPFS, VFS, XFS, and LUSTRE FSALs all use POSIX file systems that are mounted on the NFS Ganesha host. When processing exports for these FSALs, NFS Ganesha reads the mount table using the getmntent library function to discover file systems and then prepares them for NFS export.
Every NFS file system must be identified over the protocol by an fsid attribute on each object. For NFSv3 this is a single 64-bit number; for NFSv4 it is two 64-bit numbers. NFS Ganesha also inserts an fsid into the file handles so it can distinguish multiple file systems within a single export. NFS Ganesha has several ways to derive an fsid, and these may not work in all environments. Ultimately, the fsid for each file system that is to be exported must be unique. Also, NFS Ganesha must be able to discover all the file systems of interest.
The first hurdle is fitting what might be a 128-bit value (or two 64-bit values) into the 64-bit NFSv3 fsid. The NFSv4 fsid is split into a major and a minor part, and NFS Ganesha provides various ways to control how an fsid is mapped onto major and minor. To fit into the single 64-bit NFSv3 fsid, these major and minor values are folded by XORing major with minor rotated by 32 bits. If major and minor are actually only 32 bits each, nothing is lost, since the 32-bit minor is rotated into the high 32 bits.
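The folding step can be sketched in a few lines of Python. This is an illustration of the rule as described above, not a copy of the NFS Ganesha source; the function names `rotate32` and `fold_fsid` are made up for this example.

```python
MASK64 = (1 << 64) - 1  # keep values within 64 bits


def rotate32(value: int) -> int:
    """Rotate a 64-bit value by 32 bits, i.e. swap its high and low halves."""
    value &= MASK64
    return ((value << 32) | (value >> 32)) & MASK64


def fold_fsid(major: int, minor: int) -> int:
    """Fold a (major, minor) pair into one 64-bit value: major XOR rotated minor."""
    return (major & MASK64) ^ rotate32(minor)


# Both parts fit in 32 bits, so nothing is lost:
# minor lands in the high half, major stays in the low half.
folded = fold_fsid(0x11223344, 0xAABBCCDD)
print(hex(folded))  # 0xaabbccdd11223344
```

When either part uses more than 32 bits, the halves overlap in the XOR, so two different (major, minor) pairs can fold to the same NFSv3 fsid.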
If the NFS Ganesha host has blkid available, NFS Ganesha prefers to use it to get the 128-bit UUID for a file system. By default, the UUID is then split into two 64-bit values (major and minor) for NFSv4, and folded as above for NFSv3.
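As a rough illustration of the split, the sketch below cuts a UUID's 16 bytes into two 64-bit halves. The helper name `split_uuid` is hypothetical, and the big-endian byte order is an assumption chosen for readability; the byte order NFS Ganesha actually uses may differ.

```python
import struct
import uuid
from typing import Tuple


def split_uuid(fs_uuid: str) -> Tuple[int, int]:
    """Split a 128-bit UUID into (major, minor), 64 bits each.

    Assumption: the first 8 bytes form major and the last 8 form minor,
    both read as big-endian unsigned integers.
    """
    raw = uuid.UUID(fs_uuid).bytes  # always exactly 16 bytes
    major, minor = struct.unpack(">QQ", raw)
    return major, minor


major, minor = split_uuid("00112233-4455-6677-8899-aabbccddeeff")
print(hex(major), hex(minor))  # 0x11223344556677 0x8899aabbccddeeff
```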
By default, if blkid is not available or fails, NFS Ganesha derives the fsid from the f_fsid field returned by the statfs system call. By convention, f_fsid is an array of two integers, which NFS Ganesha divides into major and minor.
The final method is available only through configuration. If the NFS_CORE_PARAM option fsid_device is set to true (it defaults to false), blkid and statfs are not used; instead, NFS Ganesha uses st_dev from a stat system call and divides it into major and minor. This option may be required in some circumstances in order to get all the file systems to register properly.
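To see what a device-based fsid looks like on a given host, the following sketch reads st_dev and splits it with the standard device-number helpers. It assumes that "major and minor" here correspond to the usual device major and minor numbers; `fsid_from_device` is a name invented for this example.

```python
import os
from typing import Tuple


def fsid_from_device(path: str) -> Tuple[int, int]:
    """Return (major, minor) taken from the st_dev of the given path."""
    st = os.stat(path)
    # os.major/os.minor decode the platform's device-number encoding.
    return os.major(st.st_dev), os.minor(st.st_dev)


dev_major, dev_minor = fsid_from_device("/")
print(f"major={dev_major} minor={dev_minor}")
```

Running this against each mount point of interest shows quickly whether two file systems would end up with the same device-based fsid.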
NFSv3 has a 64-byte handle limit, and the 16 bytes of a 128-bit fsid can strain that for some local file systems. Because of this, FSAL_VFS has an export option to change the format of the fsid. It is set with the EXPORT { FSAL { fsid_type } } config option within a VFS export, and it can also be used for XFS and LUSTRE exports. In most cases, changing the fsid type also changes the fsid attribute. The available types are:
- None - no fsid is encoded in the handle, so the export can only export a single file system. The fsid reported as an attribute is unchanged.
- One64 - the fsid is reduced to a single 64-bit number using the folding process described above. The result is used as major, with minor being 0 (thus further folding has no effect).
- Major64 - the fsid attribute is maintained as 64-bit major and minor; however, only major is included in the handle and in the internal indexing of the file system. If multiple file systems have the same major, there will be problems.
- Two64 - the default 64-bit major and minor.
- Device - use st_dev instead of the derived fsid. This allows use of st_dev on an export-by-export basis; however, it does not overcome the issue where file systems are ignored because f_fsid is identical.
- Two32 - reduce the fsid to 32-bit major and minor. The two halves of the 64-bit major are XORed to make a 32-bit major, and the same is done for minor. This effectively offers a different folding algorithm, which conceivably would work better with some forms of 128-bit fsids.
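The One64 and Two32 reductions in the list above can be sketched as follows. This is an interpretation of the descriptions in this page, not the FSAL_VFS implementation, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1


def fold64(major: int, minor: int) -> int:
    """Fold major and minor into 64 bits: major XOR (minor rotated by 32)."""
    minor &= MASK64
    rotated = ((minor << 32) | (minor >> 32)) & MASK64
    return (major & MASK64) ^ rotated


def one64(major: int, minor: int) -> tuple:
    """One64: the folded value becomes major, and minor is set to 0."""
    return fold64(major, minor), 0


def xor_halves(value: int) -> int:
    """XOR the high and low 32-bit halves of a 64-bit value."""
    value &= MASK64
    return ((value >> 32) ^ value) & MASK32


def two32(major: int, minor: int) -> tuple:
    """Two32: reduce major and minor to 32 bits each."""
    return xor_halves(major), xor_halves(minor)


print([hex(v) for v in one64(0x11223344, 0xAABBCCDD)])
# ['0xaabbccdd11223344', '0x0']
print([hex(v) for v in two32(0x1234567800000001, 0xFFFF00000000FFFF)])
# ['0x12345679', '0xffffffff']
```

Folding a One64 result again returns the same value, because a zero minor contributes nothing to the XOR.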
There is an EXPORT config option, FileSystem_ID. It really should not be used: all it does is designate an fsid to be used in the attributes of all objects in the export. It will be folded to fit into NFSv3. Because it applies to the entire export, it prevents exporting multiple file systems, since there will likely be issues with collision of inode numbers on the client.
As mentioned, NFS Ganesha uses the getmntent library function to discover file systems. It skips a variety of file system types that are not expected to be exported (nfs, autofs, sysfs, proc, devtmpfs, securityfs, cgroup, selinuxfs, debugfs, hugetlbfs, mqueue, pstore, devpts, configfs, binfmt_misc, rpc_pipefs, vboxsf). It also skips file systems that seem to be the same (sharing the same st_dev, the same f_fsid, or the same UUID).
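The discovery pass can be approximated with the sketch below, which filters mount-table lines by file system type and drops entries that look like duplicates. It is a simplification: the three duplicate checks (st_dev, f_fsid, UUID) are collapsed into a single hypothetical `identity` callback, and the function name `discover` and the sample data are invented for illustration.

```python
from typing import Callable, Hashable, Iterable, List, Tuple

# File system types skipped during discovery, as listed above.
SKIPPED_TYPES = {
    "nfs", "autofs", "sysfs", "proc", "devtmpfs", "securityfs", "cgroup",
    "selinuxfs", "debugfs", "hugetlbfs", "mqueue", "pstore", "devpts",
    "configfs", "binfmt_misc", "rpc_pipefs", "vboxsf",
}


def discover(
    mount_lines: Iterable[str],
    identity: Callable[[str], Hashable],
) -> List[Tuple[str, str]]:
    """Return (mount_point, fs_type) pairs worth considering for export.

    mount_lines uses the "device mount_point fs_type ..." layout of a
    mount table; identity maps a mount point to a value that is equal
    for mounts that appear to be the same file system.
    """
    seen = set()
    found = []
    for line in mount_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # malformed line
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in SKIPPED_TYPES:
            continue  # not expected to be exported
        key = identity(mount_point)
        if key in seen:
            continue  # looks like a file system already registered
        seen.add(key)
        found.append((mount_point, fs_type))
    return found


sample = [
    "/dev/sda1 / xfs rw 0 0",
    "proc /proc proc rw 0 0",
    "/dev/sdb1 /export xfs rw 0 0",
    "/dev/sdb1 /export-again xfs rw 0 0",
]
fake_ids = {"/": 1, "/export": 2, "/export-again": 2}
result = discover(sample, fake_ids.__getitem__)
print(result)  # [('/', 'xfs'), ('/export', 'xfs')]
```

On a live Linux host, the same function could be fed the lines of /proc/mounts with an identity such as `lambda p: os.stat(p).st_dev`, which makes it easy to spot a file system that is being skipped because it appears identical to another.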
NFS Ganesha supports many-to-many relationships between exports and file systems. It also supports an export whose file system has another file system mounted below it, and an export that contains another export within its tree. There are some limits on all of this.
- A file system may only be exported by one FSAL, though there may be many exports by that FSAL of a single file system
- GPFS exports do not currently support crossing over into an export from a different FSAL (this is unlikely to be an issue in any real setup)
- A single client is not allowed to acquire byte range locks on a single file via more than one export (this can occur if two separate directories are exported separately, and a file has hard links in both directory trees).
NFS Ganesha V4.0+ supports changing the exports so that a file system, or a portion of a file system, is exported by a different export. Clients that hold handles to files under the old export will see stale file handle errors; however, a new directory traversal under the new structure will find the newly visible files.
NFS Ganesha detects file system boundaries during NFS operations when st_dev changes on a lookup from parent to child.
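The same check is easy to reproduce from user space. The sketch below compares st_dev of a directory and one of its entries; `crosses_boundary` is a hypothetical helper, not an NFS Ganesha function.

```python
import os
import tempfile


def crosses_boundary(parent_dir: str, name: str) -> bool:
    """Return True if looking up name in parent_dir changes st_dev."""
    parent = os.lstat(parent_dir)
    child = os.lstat(os.path.join(parent_dir, name))
    return parent.st_dev != child.st_dev


# A freshly created subdirectory lives on the same file system as its parent.
with tempfile.TemporaryDirectory() as top:
    os.mkdir(os.path.join(top, "child"))
    same_fs = not crosses_boundary(top, "child")

print(same_fs)  # True
```

Pointing it at a directory and a mount point beneath it would return True, which is the situation in which a lookup moves into a different file system.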
NFS Ganesha V4.0+ has support for BTRFS subvolumes. On discovering a BTRFS file system, NFS Ganesha enumerates the subvolumes and treats each subvolume as a separate file system. This may not be ideal for setups with a large number of subvolumes, since the client will create a mount for each subvolume. However, it does prevent the client from tripping over duplicate inode numbers, and it allows FSAL_VFS to cross into the subvolume, since each subvolume has a unique device ID (which FSAL_VFS uses to detect file system boundaries).