Ganesha and NFSv4.0/4.1
- all required operations
- state-bearing operations (but they may not be correct)
- in particular, sequence id management (Philippe and IBM seem to be fixing this)
- rpcsec_gss security -- supported, IBM has been testing this
- unicode (some NFSv4 prescriptions are now deprecated for 4.1 and in 3530bis)
The clientid is formed by hashing the client.id, contrary to RFC 3530's statement that the server "must take care to ensure that these values are extremely unlikely to ever be regenerated." Collisions are detected by comparing the recorded and generated clientids rather than the provided and recorded client.id arguments, making spurious errors and state disposals likely.
Mutating clientid records, rather than generating new unconfirmed records on SETCLIENTID and replacing old confirmed records on SETCLIENTID_CONFIRM, removes much of the robustness of the mechanism and violates the implementation guidelines in the spec.
State is not actually released when required.
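The sketch below shows one way to meet RFC 3530's requirement that clientids be "extremely unlikely to ever be regenerated". It assumes a clientid built from a per-boot epoch in the high 32 bits and a counter in the low 32 bits. Ownership is decided by comparing the client-supplied id strings, not derived clientids. All names (`client_record`, `make_clientid`, `same_client`) are hypothetical and not taken from Ganesha. The counter is not thread-safe as written.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical record: the opaque id the client sent plus the clientid issued for it. */
struct client_record {
    unsigned char id[1024];   /* sized like NFS4_OPAQUE_LIMIT */
    size_t id_len;
    uint64_t clientid;
};

/* Assumes the wall clock does not go backwards across restarts. */
static uint32_t server_epoch;
static uint32_t clientid_counter;   /* not thread-safe in this sketch */

void clientid_init(void)
{
    server_epoch = (uint32_t)time(NULL);
    clientid_counter = 0;
}

/* High 32 bits: boot epoch, low 32 bits: counter. A later server
 * incarnation never reissues a clientid from an earlier one. */
uint64_t make_clientid(void)
{
    return ((uint64_t)server_epoch << 32) | ++clientid_counter;
}

/* Decide "same client" from the client-provided id string, never from a
 * hash-derived clientid, so distinct clients cannot collide spuriously. */
bool same_client(const struct client_record *rec,
                 const unsigned char *id, size_t id_len)
{
    return rec->id_len == id_len && memcmp(rec->id, id, id_len) == 0;
}
```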
Callbacks are optional in NFSv4.0. A client and server must support callbacks and establish and maintain a callback path if clients are to be allowed to use specific optional protocol features, in particular, delegations.
In NFSv4.0, callback information is provided with the SETCLIENTID/SETCLIENTID_CONFIRM operations, which effect a callback registration (address, program, port, callback_ident, and GSS-API security flavor). The server initiates callback connections using the supplied information (including GSS-API secure connection establishment). An infrastructure must exist to track and test client-provided callback information, identify path-down conditions, and alert clients of path-down. By definition, an NFSv4.0 callback transport is independent of the transport(s) the client is using to send NFSv4.0 protocol requests.
- Ganesha has placeholder support for NFSv4.0 callbacks. The server does not acknowledge callback registration, and does not establish or test callback connections of any security flavor.
- Ganesha has partial but throw-away support for client callback operations and compounds; further elaboration of these interfaces is required for NFSv4.0 and NFSv4.1 callback support.
- Ganesha has no asynchronous client/lease/path garbage collection mechanism; an efficient mechanism of this type is needed for callback path-down checks (and probably also for other async state checks).
- NFSv4.1 open and lock state exist, but may not be correct for all cases.
- session, clientid and sequence management exist but are not complete in all respects, especially the clientid and sequence rules
- BIND_CONN_TO_SESSION (for callbacks [below], but also reconnection, as used by the Windows client)
- rpcsec_gss security -- supported
- but not SSV (there may be no working implementation?)
It is our understanding that Jim Wahlig of IBM intends to implement this functionality.
- Ganesha currently falls short of spec in a number of ways
- RECLAIM_COMPLETE is implemented as a no-op, although the operation is not optional.
- There is no implementation of a reclaim grace period for NFSv4 in the current Ganesha code (there is one for NLM), although one is required.
- The reclaim-type versions of OPEN and LOCK are not implemented, although they are required.
- Minimal Compliance
- Implement the reclaim grace period.
- Recognize when the grace period needs to begin; recognize when the server is in the grace period and when the grace period has expired
- At restart
- When a filesystem moves to a new server
- Implement the following "reclaim-type" operations:
- OPEN with claim = CLAIM_PREVIOUS (see 9.11)
- LOCK with reclaim = true (see 9.11)
- Implement RECLAIM_COMPLETE operation (section 18.51. Operation 58: RECLAIM_COMPLETE - Indicates Reclaims Finished)
- When within the grace period
- Process reclaims according to section 8.4.2. Server Failure and Recovery and section 11.7.7 Lock State and File System Transitions
- Record which clients have called RECLAIM_COMPLETE
- Clients that have not yet called RECLAIM_COMPLETE can only call reclaim-type operations and RECLAIM_COMPLETE; other operations result in NFS4ERR_GRACE
- Clients that have called RECLAIM_COMPLETE will receive NFS4ERR_COMPLETE_ALREADY if they call it again subsequently
- Note: from section 8.4.2.1. State Reclaim, "For a server to provide simple, valid handling during the grace period, the easiest method is to simply reject all non-reclaim locking requests and READ and WRITE operations by returning the NFS4ERR_GRACE error."
- Optional Compliance
- Implement the following "reclaim-type" operation:
- WANT_DELEGATION with wda_claim = CLAIM_PREVIOUS (see 10.2.1)
- Serialize client lock information to persistent store, which will allow the server to:
- Truncate the grace period when no further reclaims can be made
- Allow some operations to take place during the grace period if there are no possible impending reclaims that will prohibit such operations
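The Minimal Compliance grace-period rules above can be sketched as a single admission check per operation. This is a hypothetical, single-threaded illustration. The operation classes, `grace_check` and the structures are invented for the example. It follows the "simple, valid" policy quoted from 8.4.2.1, and the status values are those listed in RFC 5661.

```c
#include <stdbool.h>
#include <time.h>

/* Status values as listed in RFC 5661. */
enum nfsstat4 {
    NFS4_OK                  = 0,
    NFS4ERR_GRACE            = 10013,
    NFS4ERR_NO_GRACE         = 10033,
    NFS4ERR_COMPLETE_ALREADY = 10054,
};

/* Coarse operation classes; a real server would map opcodes onto these. */
enum op_class {
    OP_RECLAIM,           /* OPEN with CLAIM_PREVIOUS, LOCK with reclaim = true */
    OP_RECLAIM_COMPLETE,
    OP_LOCKING_OR_IO,     /* non-reclaim OPEN/LOCK, READ, WRITE */
    OP_OTHER              /* e.g. GETATTR, LOOKUP */
};

struct grace { time_t end; };                 /* grace period ends at this time */
struct client_state { bool reclaim_complete; };

/* Begin the grace period at restart (or when a filesystem migrates in). */
void grace_start(struct grace *g, time_t now, time_t duration)
{
    g->end = now + duration;
}

bool in_grace(const struct grace *g, time_t now)
{
    return now < g->end;
}

enum nfsstat4 grace_check(const struct grace *g, struct client_state *c,
                          enum op_class op, time_t now)
{
    bool grace = in_grace(g, now);

    switch (op) {
    case OP_RECLAIM_COMPLETE:
        if (c->reclaim_complete)
            return NFS4ERR_COMPLETE_ALREADY;  /* second call from this client */
        c->reclaim_complete = true;
        return NFS4_OK;
    case OP_RECLAIM:
        /* Reclaims only make sense inside grace and before RECLAIM_COMPLETE. */
        return (grace && !c->reclaim_complete) ? NFS4_OK : NFS4ERR_NO_GRACE;
    case OP_LOCKING_OR_IO:
        /* Simple policy: reject all non-reclaim locking and I/O during grace. */
        return grace ? NFS4ERR_GRACE : NFS4_OK;
    default:
        return NFS4_OK;
    }
}
```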
- The server should maintain a lease period for each client, during which the client's locks remain valid.
- The lease exists to handle the failure of a client that holds locks; the lease therefore applies to all of the client's sessions on the server.
- Each time the client submits a SEQUENCE operation to the server, the lease is automatically renewed.
- The client, if it needs to renew the lease, can submit an empty SEQUENCE operation strictly for lease renewal purposes.
- The server can renew the lease upon receipt of the SEQUENCE operation as long as it guarantees that the lease does not expire while the operation is in progress.
- The server updates the lease upon completion of the SEQUENCE operation to at least the sum of the current time and the lease period.
- The server can release locks once the lease has expired, thereby allowing other clients to claim locks that would otherwise be conflicting.
- The server could use a timer to expire the lease, or alternatively simply wait until a conflicting lock request is made.
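The lease rules above can be sketched as follows. The sketch assumes a hypothetical per-client `struct lease`, a caller-supplied lease period, and a timer-driven reaper (the first of the two options just mentioned). It is single-threaded for clarity; a real server would protect each record with a lock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical per-client lease record. */
struct lease {
    time_t expires;      /* lease is valid until this time */
    unsigned in_flight;  /* SEQUENCE-bearing requests still being processed */
    bool expired;        /* set by the reaper; the client's locks may be released */
};

/* On receipt of SEQUENCE: the lease must not lapse while the request runs. */
void lease_begin_sequence(struct lease *l)
{
    l->in_flight++;
}

/* On completion: extend the lease to at least now + lease_period. */
void lease_end_sequence(struct lease *l, time_t now, time_t lease_period)
{
    l->in_flight--;
    if (l->expires < now + lease_period)
        l->expires = now + lease_period;
}

bool lease_is_expired(const struct lease *l, time_t now)
{
    return l->in_flight == 0 && now >= l->expires;
}

/* One pass of a periodic reaper; returns how many leases it expired. */
size_t lease_reap(struct lease *leases, size_t n, time_t now)
{
    size_t reaped = 0;
    for (size_t i = 0; i < n; i++) {
        if (!leases[i].expired && lease_is_expired(&leases[i], now)) {
            leases[i].expired = true;   /* release this client's locks here */
            reaped++;
        }
    }
    return reaped;
}
```

The lazy alternative would call `lease_is_expired` only when a conflicting lock request arrives, trading timer overhead for a longer hold on stale locks.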
Where NFSv4.0 serialized on open and lock owners, NFSv4.1 serializes on the stateid. Two open requests for the same or different files may be fired off in parallel over multiple connections. The same is true of lock owners -- multiple locking calls can be in flight at the same time for the same lock owner. Currently Ganesha does not support this well, since it still serializes everything through the open or lock owners. Individual states are neither locked with a mutex nor reference counted. While this imposes unnecessary and incorrect serialization on opens and locks, it is an especially bad problem for layouts, which are intended to have a stream of LAYOUTGET and LAYOUTRETURN requests on the forechannel and LAYOUTRECALL requests on the backchannel. To support all three, we should have a mechanism that serializes NFSv4.1 stateids and does not apply to NFSv4.0.
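Here is a minimal sketch of per-stateid serialization. It assumes a hypothetical `struct nfs41_state` that carries its own mutex and atomic reference count (POSIX threads and C11 atomics). Locking the stateid rather than its owner lets independent opens and locks of one owner proceed in parallel. It still orders LAYOUTGET, LAYOUTRETURN and LAYOUTRECALL on a single layout stateid. Seqid wraparound is deliberately not handled here.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical NFSv4.1 state object with its own lock and lifetime. */
struct nfs41_state {
    pthread_mutex_t lock;   /* serializes updates to this stateid only */
    atomic_uint refcount;   /* lifetime governed by references, not the owner */
    uint32_t seqid;
};

struct nfs41_state *state_new(void)
{
    struct nfs41_state *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    pthread_mutex_init(&s->lock, NULL);
    atomic_init(&s->refcount, 1);   /* reference held by the state table */
    s->seqid = 1;
    return s;
}

void state_get(struct nfs41_state *s)
{
    atomic_fetch_add(&s->refcount, 1);
}

void state_put(struct nfs41_state *s)
{
    /* Whoever drops the last reference frees the object. */
    if (atomic_fetch_sub(&s->refcount, 1) == 1) {
        pthread_mutex_destroy(&s->lock);
        free(s);
    }
}

/* Advance the seqid under the per-stateid lock and return the new value. */
uint32_t state_bump_seqid(struct nfs41_state *s)
{
    pthread_mutex_lock(&s->lock);
    uint32_t v = ++s->seqid;
    pthread_mutex_unlock(&s->lock);
    return v;
}
```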
There is currently no check for wraparound in stateid.seqid. The check in 12.5.5.2.1.4 should be implemented at least for LAYOUT states.
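The next sketch shows wraparound-safe seqid handling. It uses generic serial-number comparison in the style of RFC 1982. It skips seqid 0 on wrap, because 0 has a special meaning in NFSv4.1 stateids. Both choices are assumptions to be checked against the exact rule in 12.5.5.2.1.4, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Next seqid, skipping the reserved value 0 when the counter wraps. */
uint32_t seqid_next(uint32_t seqid)
{
    uint32_t next = seqid + 1;   /* unsigned arithmetic wraps modulo 2^32 */
    return next == 0 ? 1 : next;
}

/* a is newer than b if it lies fewer than 2^31 steps ahead, modulo 2^32. */
bool seqid_newer(uint32_t a, uint32_t b)
{
    uint32_t d = a - b;
    return d != 0 && d < UINT32_C(0x80000000);
}
```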
- NFSv4.1 uses a finite-sized reply cache for requests to the server.
- This is implemented as a slot table, where each slot has a unique identifier (0..N-1), and each slot holds a sequence id and the cached result of a request.
- Each SEQUENCE operation designates a slot in the table along with a sequence id. The client tries to use the lowest available slot in order to minimize resource requirements on the server for the slot table.
- Using the slot number and sequence id, the server can tell if this is a new request, a resubmission of the previous, already handled, request, or a variety of error conditions and respond appropriately.
- In the case of a resubmission of the previous request, the cached response can be replayed back to the client.
- The sequence id is 32 bits and can roll over.
- The client can, for a given SEQUENCE operation, ask that the server not cache the results. The server may honor the request or cache the results anyway.
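The slot-table logic described above can be sketched as a classification of each (slot, sequence id) pair. This is a hypothetical illustration: `slot_table`, `slot_check`, `slot_complete` and the fixed 512-byte reply buffer are invented. It assumes each slot's recorded sequence id starts at 0, so the first request on a slot carries 1. It omits the case of a retry arriving while the original is still executing.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Verdict for an incoming SEQUENCE; the matching RFC 5661 error is noted. */
enum slot_verdict {
    SLOT_NEW,              /* execute, then record (and maybe cache) the reply */
    SLOT_REPLAY,           /* resend the cached reply */
    SLOT_REPLAY_UNCACHED,  /* NFS4ERR_RETRY_UNCACHED_REP */
    SLOT_MISORDERED,       /* NFS4ERR_SEQ_MISORDERED */
    SLOT_BAD               /* NFS4ERR_BADSLOT */
};

struct slot {
    uint32_t seqid;            /* last sequence id seen on this slot */
    bool cached;               /* whether reply[] holds that request's reply */
    size_t reply_len;
    unsigned char reply[512];  /* hypothetical fixed-size reply buffer */
};

struct slot_table {
    size_t nslots;             /* slot ids run from 0 to nslots - 1 */
    struct slot *slots;
};

enum slot_verdict slot_check(const struct slot_table *t, uint32_t slotid,
                             uint32_t seqid)
{
    if (slotid >= t->nslots)
        return SLOT_BAD;
    const struct slot *s = &t->slots[slotid];
    if (seqid == (uint32_t)(s->seqid + 1u))   /* 32-bit rollover is fine */
        return SLOT_NEW;
    if (seqid == s->seqid)
        return s->cached ? SLOT_REPLAY : SLOT_REPLAY_UNCACHED;
    return SLOT_MISORDERED;
}

/* After executing a new request: record its seqid and, if the client asked
 * for caching and the reply fits, keep a copy for replays. */
void slot_complete(struct slot_table *t, uint32_t slotid, uint32_t seqid,
                   bool cachethis, const void *reply, size_t len)
{
    struct slot *s = &t->slots[slotid];
    s->seqid = seqid;
    s->cached = cachethis && len <= sizeof(s->reply);
    s->reply_len = s->cached ? len : 0;
    if (s->cached)
        memcpy(s->reply, reply, len);
}
```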
NFSv4.1 allows adaptive adjustments to the slot table to optimize its resource allocation. Such adjustments do not appear to be implemented currently.
Adding support for a persistent reply cache, which could be enabled on servers with SSDs, would improve robustness.
Not all functions update or respect the current stateid. The saved stateid is never updated or restored.
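RFC 5661 reserves a special stateid (seqid 1, all-zero "other" field) that means "use the current stateid". SAVEFH and RESTOREFH save and restore the current stateid along with the current filehandle. Below is a hypothetical per-COMPOUND sketch of that bookkeeping; `compound_ctx` and the helper names are invented.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t seqid;
    unsigned char other[12];
} stateid4;

/* Hypothetical per-COMPOUND context. */
struct compound_ctx {
    stateid4 current, saved;
    bool have_current, have_saved;
};

static bool is_current_stateid_marker(const stateid4 *s)
{
    static const unsigned char zero[12];
    return s->seqid == 1 && memcmp(s->other, zero, sizeof(zero)) == 0;
}

/* Substitute the current stateid for the special marker. Returns false if
 * the marker is used before any current stateid was set (a real server
 * would return an error such as NFS4ERR_BAD_STATEID). */
bool resolve_stateid(const struct compound_ctx *c, const stateid4 *arg,
                     stateid4 *out)
{
    if (!is_current_stateid_marker(arg)) {
        *out = *arg;
        return true;
    }
    if (!c->have_current)
        return false;
    *out = c->current;
    return true;
}

/* Operations that produce a stateid (OPEN, LOCK, ...) update it. */
void set_current_stateid(struct compound_ctx *c, const stateid4 *s)
{
    c->current = *s;
    c->have_current = true;
}

/* SAVEFH and RESTOREFH move the stateid together with the filehandle. */
void on_savefh(struct compound_ctx *c)
{
    c->saved = c->current;
    c->have_saved = c->have_current;
}

void on_restorefh(struct compound_ctx *c)
{
    c->current = c->saved;
    c->have_current = c->have_saved;
}
```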
OPEN_DOWNGRADE updates the stateid.seqid but is otherwise a no-op.
Currently Ganesha does nothing, simply returning NFS4_OK whatever the client passes in.
This function returns NFS4_OK on every stateid.
NFSv4.1 requires that the verifier4 supplied with the EXCLUSIVE4_1 open flag be committed to stable storage. It recommends a dedicated location (an extended attribute, for example), but failing that, suggests repurposing recommended attributes such as the access and modification times. Currently, Ganesha stores this verifier in memory, making it unable to provide the exclusivity guarantee across server reboots.
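This sketch illustrates the attribute-repurposing fallback. It assumes the 8-byte verifier is split across the seconds fields of atime and mtime using POSIX `futimens()`. It also assumes a 64-bit `time_t` and a file system that can store the resulting timestamps. The function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

/* Persist the verifier: first 4 bytes in atime seconds, last 4 in mtime. */
int store_verifier(int fd, const unsigned char verf[8])
{
    uint32_t hi, lo;
    memcpy(&hi, verf, 4);
    memcpy(&lo, verf + 4, 4);
    struct timespec ts[2] = {
        { .tv_sec = (time_t)hi, .tv_nsec = 0 },   /* ts[0]: atime */
        { .tv_sec = (time_t)lo, .tv_nsec = 0 },   /* ts[1]: mtime */
    };
    return futimens(fd, ts);
}

/* After a reboot, compare a retried OPEN's verifier with the stored one.
 * Returns 1 on match, 0 on mismatch, -1 if fstat fails. */
int verifier_matches(int fd, const unsigned char verf[8])
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return -1;
    uint32_t hi = (uint32_t)st.st_atime;
    uint32_t lo = (uint32_t)st.st_mtime;
    unsigned char stored[8];
    memcpy(stored, &hi, 4);
    memcpy(stored + 4, &lo, 4);
    return memcmp(stored, verf, 8) == 0;
}
```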
The RPCSEC_GSS security flavor MUST be implemented (2.2.11). Ganesha has GSS support but integration is not complete.
- RPCSEC_GSS must support Kerberos V (2.2.1.1.1.2, complete)
- RPCSEC_GSS support is required for secure NFSv4.1 backchannel if clients request RPCSEC_GSS (2.10.8.2)
SSV is defined in RFC 5661 as an optional mechanism (2.10.8.3) for strong protection of session state (clientid, lock and open state).
Current status of Ganesha:
- CREATE_SESSION
- BACKCHANNEL_CTL -- is mandatory for fully compliant GSS callback security
- the Linux client currently has no code to call the operation, but Linux Documentation/filesystems/nfs/nfs41-server.txt calls the op mandatory to implement
- SSV
- Ganesha lacks SSV functionality in
- EXCHANGE_ID
- BIND_CONN_TO_SESSION (which is unimplemented)
- SECINFO
- SECINFO_NO_NAME (which is unimplemented)
- SET_SSV (present as a stub in Ganesha but unimplemented)
- BACKCHANNEL_CTL (which is unimplemented)
The operation can return the AUTH_NONE and AUTH_UNIX flavors, but not RPCSEC_GSS. Actually enforcing RPCSEC_GSS security on specific objects, or altogether, is not mandatory, but omitting it may be insufficient for the intended use.
The operation is not implemented in Ganesha (the source file is missing). The operation is mandatory (18.45).
The Ganesha server has basic support for session operations as used by the Linux kernel NFSv4.1 client.
Ganesha next currently does not correctly accept a sessionid agreed on in EXCHANGE_ID. Linux Box Ganesha has a fix for this, pushed as part of the pNFS patch.
When the client invokes CREATE_SESSION it passes in attributes to make various requests upon the server:
- General attributes:
- Whether to persist the session reply cache for EOS operations.
- This attribute is currently ignored and assumed to be false.
- Whether to use the existing connection for the back channel as well as the fore channel.
- This attribute is currently ignored and assumed to be true. That is the connection is always used for both the fore and back channels.
- Whether to upgrade the connection to an RDMA (remote direct memory access) connection if it is not one already
- This attribute is currently ignored and assumed to be false.
- Attributes for the fore channel and for the back channel
- These attributes are currently accepted without analysis.
- The spec allows the server to make appropriate adjustments to some of these attributes, which Ganesha currently does not do.
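The adjustments the spec permits can be sketched in two parts. Each requested fore-channel value is clamped to a server limit, and only the flags the server actually honors are echoed back. The structure mirrors field names from RFC 5661's `channel_attrs4` (omitting `ca_rdma_ird`) and the `CREATE_SESSION4_FLAG_*` bits. The limits and helper names are hypothetical. In this sketch, using the same connection for the back channel is accepted, while persistence and RDMA are declined rather than silently assumed.

```c
#include <stdint.h>

/* Field names follow RFC 5661's channel_attrs4 (ca_rdma_ird omitted). */
struct channel_attrs {
    uint32_t ca_headerpadsize;
    uint32_t ca_maxrequestsize;
    uint32_t ca_maxresponsesize;
    uint32_t ca_maxresponsesize_cached;
    uint32_t ca_maxoperations;
    uint32_t ca_maxrequests;      /* number of slots */
};

/* CREATE_SESSION flag bits from RFC 5661. */
#define CREATE_SESSION4_FLAG_PERSIST        0x00000001
#define CREATE_SESSION4_FLAG_CONN_BACK_CHAN 0x00000002
#define CREATE_SESSION4_FLAG_CONN_RDMA      0x00000004

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Reply with the smaller of what the client asked for and what this server
 * is prepared to allocate. */
struct channel_attrs negotiate_fore_channel(const struct channel_attrs *req,
                                            const struct channel_attrs *limit)
{
    struct channel_attrs r;
    r.ca_headerpadsize = min_u32(req->ca_headerpadsize, limit->ca_headerpadsize);
    r.ca_maxrequestsize = min_u32(req->ca_maxrequestsize, limit->ca_maxrequestsize);
    r.ca_maxresponsesize = min_u32(req->ca_maxresponsesize, limit->ca_maxresponsesize);
    r.ca_maxresponsesize_cached = min_u32(req->ca_maxresponsesize_cached,
                                          limit->ca_maxresponsesize_cached);
    r.ca_maxoperations = min_u32(req->ca_maxoperations, limit->ca_maxoperations);
    r.ca_maxrequests = min_u32(req->ca_maxrequests, limit->ca_maxrequests);
    return r;
}

/* Echo back only the flags this (hypothetical) server honors. */
uint32_t negotiate_flags(uint32_t requested)
{
    return requested & CREATE_SESSION4_FLAG_CONN_BACK_CHAN;
}
```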
BIND_CONN_TO_SESSION is mandatory to implement, and is also used by the Windows (but not the Linux) client. It is not implemented in Ganesha, even as a stub. Linux Box has a prototype implementation (see backchannel).
Ganesha's clientid is just a hash of the co_owner string, which violates the prescription that two incarnations of the same client should not have the same clientid (see Grace Period, Recovery, and Reclaim Rules). Collisions are also detected by comparing the clientid produced by this hash, creating the opportunity for spurious collisions.
Currently Ganesha supports no state protection, and nfs4_op_exchange_id is hard-coded to return SP4_NONE whatever the client requests. RFC5661 implies that a client may specify either SP4_MACH_CRED or SP4_SSV at its option.
DESTROY_CLIENTID is currently unimplemented, returning NFS4ERR_OP_ILLEGAL. It is required functionality.
NFSv4.1 supports both session trunking and clientid trunking, support for both types is mandatory (2.5.10).
- Ganesha appears to support clientid trunking
- clientid trunking is enabled using the CREATE_SESSION operation multiple times with a shared clientid, which logically is supported
- Ganesha does not support session trunking
- session trunking is enabled using the BIND_CONN_TO_SESSION operation, which is unimplemented (a prototype implementation of BIND_CONN_TO_SESSION has been produced by Linux Box, but it is incomplete)
The callback mechanism in NFSv4.1 is better integrated than in NFSv4.0, but remains optional. In NFSv4.1, the backchannel connections are initiated at the client rather than the server (for NAT traversal), using the new CREATE_SESSION and BIND_CONN_TO_SESSION operations. Although RFC 5661 provides for flexible "fore" and "back" channel management within sessions, the support is not well elaborated in the specification, and the Linux client and current Windows client support only a single, shared (bi-directional) channel configuration per-session.
- Ganesha has no explicit backchannel support. The server unconditionally accepts whatever backchannel configuration a client requests in CREATE_SESSION, but does not make use of the backchannel thereafter.
- Underlying RPC library changes are needed to support bi-directional operation
- Linux Box is working on bi-directional support for TI-RPC (a dedicated back-channel "switching" mechanism was previously implemented)
- Ganesha has partial but throw-away support for client callback operations and compounds; further elaboration of these interfaces is required for NFSv4.1 callback support. Linux Box plans to implement at least LAYOUTRECALL.
- Ganesha has no async client/lease/path garbage collection mechanism; an efficient mechanism of this type is needed for callback path-down checks (and probably also for other async state checks)
- new GSS-API (and possibly SSV) channel identity work will be needed to support callback security; this will be completed by Linux Box
A generic pNFS implementation and FSAL-based pNFS implementation has been submitted for inclusion in Ganesha.
- NFSv4.1 optionally allows information about client sessions and their associated state to be saved to persistent store, such that if the server restarts, the state can be recovered and operations can be resumed in minimal time.
- The following must minimally be stored if this is implemented (2.10.6.5):
- Session id
- Reply cache slot table
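This sketch saves the minimum state listed above (session id and reply-cache slot table) to a file. The record layout, `MAX_SLOTS`, the 256-byte reply bound and the function names are hypothetical. The raw host-endian image stands in for a proper XDR encoding. A real implementation would also `fsync()` before acknowledging requests.

```c
#include <stdint.h>
#include <stdio.h>

#define SESSIONID_SIZE 16   /* size of an NFSv4.1 sessionid */
#define MAX_SLOTS 64        /* hypothetical fixed table size */

struct saved_slot {
    uint32_t seqid;
    uint32_t reply_len;
    unsigned char reply[256];   /* hypothetical bound on a cached reply */
};

struct saved_session {
    unsigned char sessionid[SESSIONID_SIZE];
    uint32_t nslots;
    struct saved_slot slots[MAX_SLOTS];
};

/* Write sessionid, slot count, then the in-use slots. Returns 0 or -1. */
int session_save(FILE *f, const struct saved_session *s)
{
    if (s->nslots > MAX_SLOTS)
        return -1;
    if (fwrite(s->sessionid, 1, SESSIONID_SIZE, f) != SESSIONID_SIZE ||
        fwrite(&s->nslots, sizeof(s->nslots), 1, f) != 1 ||
        fwrite(s->slots, sizeof(s->slots[0]), s->nslots, f) != s->nslots)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* Read the image back at restart, rejecting an implausible slot count. */
int session_load(FILE *f, struct saved_session *s)
{
    if (fread(s->sessionid, 1, SESSIONID_SIZE, f) != SESSIONID_SIZE ||
        fread(&s->nslots, sizeof(s->nslots), 1, f) != 1 ||
        s->nslots > MAX_SLOTS ||
        fread(s->slots, sizeof(s->slots[0]), s->nslots, f) != s->nslots)
        return -1;
    return 0;
}
```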
"Servers MAY support or not support retention on any file object type" (5.13).
- There is some evidence that IBM may eventually support RDMA efforts
- Dirent cache and readdir result caching based on AVL trees implemented (finished, merged to next)
- Stability and replacement issues
- Will entail work by several parties
- Design in progress, includes upcall/events layer
- Initial implementation from IBM
- General reorganizing cleanups (in progress)
- support only the TI-RPC implementation (removing the duplicated code)
- remove requirements for Ganesha/TI-RPC layering violations
- direct access to transport array (or even that it is such)
- transport copying with changed parameters
- support plug-out request activation
- support plug-out Duplicate Request Cache
- support plug-out allocator indirection (finished)
- support plug-out log channel (finished)
- Changes required to interoperate with Linux and Windows client backchannel (in progress)
- Channel multiplexing currently done using Unix select
- Add EPOLL support to generic TI-RPC; this is planned to be merged into Ganesha in tandem with the bi-directional changes
A Linux zero-copy I/O strategy will be necessary to achieve I/O performance competitive with kernel-mode implementations.
- To support POSIX-like FSALs only, a sendfile-based mechanism may be sufficient
- To support user-client-based (e.g., Ceph) FSALs, as well as exported kernel file systems, a model based on tee() and splice() would be required
- Further unification of open, lock and pnfs state possible
- Efficiency of new state representation should be evaluated and measured (many lists)