[CRITEO] Add a property to force canonicalization of hostname with WebHdfsFileSystem #75

Open
wants to merge 2 commits into
base: branch-3.3.0
from

Commits on Sep 11, 2024

  1. [CRITEO] Add a property to force canonicalization of hostname with WebHdfsFileSystem
    
    WebHdfsFileSystem does not enforce SPNEGO itself when using connectionFactory, because the JDK automatically performs SPNEGO when it receives a 401 response with the header 'WWW-Authenticate: Negotiate'.
    
    This part works fine: WebHdfsFileSystem obtains a delegation token via SPNEGO and continues with that token.
    
    However, when hostname canonicalization is expected, the JDK imposes a restriction: the canonical hostname is used only if it is a longer form of the original hostname; otherwise it is ignored. This behavior is in the constructor of class sun.security.krb5.PrincipalName:
    
                        // RFC4120 does not recommend canonicalizing a hostname.
                        // However, for compatibility reason, we will try
                        // canonicalize it and see if the output looks better.
    
                        String canonicalized = (InetAddress.getByName(hostName)).
                                getCanonicalHostName();
    
                        // Looks if canonicalized is a longer format of hostName,
                        // we accept cases like
                        //     bunny -> bunny.rabbit.hole
                        if (canonicalized.toLowerCase(Locale.ENGLISH).startsWith(
                                    hostName.toLowerCase(Locale.ENGLISH)+".")) {
                            hostName = canonicalized;
                        }
    
    This means that when namenodes are reached through Consul, for instance (e.g. hadoop-hdfs-namenode-active-root.query.consul.preprod.crto.in), the JDK silently ignores the canonicalization, because the canonicalized hostname looks like {something}.{dc}.hpc.criteo.(pre)prod and does not start with the original hostname.
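
To make the rejection concrete, here is a minimal sketch that isolates the prefix test from the JDK excerpt quoted above and applies it to two name pairs. The class and method names are hypothetical, and `nn01.dc1.hpc.criteo.preprod` is only a made-up placeholder for the `{something}.{dc}.hpc.criteo.(pre)prod` pattern; no DNS lookup is performed.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of the JDK or of this commit.
final class CanonicalNameCheck {
    private CanonicalNameCheck() {}

    // Same comparison as in the quoted excerpt: the canonical name is accepted
    // only when it is the original name followed by "." and more labels.
    static boolean isLongerForm(String hostName, String canonicalized) {
        return canonicalized.toLowerCase(Locale.ENGLISH)
                .startsWith(hostName.toLowerCase(Locale.ENGLISH) + ".");
    }

    public static void main(String[] args) {
        // Short name expanded to a fully qualified name: accepted.
        System.out.println(isLongerForm("bunny", "bunny.rabbit.hole"));

        // Consul alias whose canonical name is unrelated (placeholder value): rejected.
        System.out.println(isLongerForm(
                "hadoop-hdfs-namenode-active-root.query.consul.preprod.crto.in",
                "nn01.dc1.hpc.criteo.preprod"));
    }
}
```

The first call prints `true` and the second prints `false`, which is the case described here: the service principal keeps the Consul alias instead of the real host name.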
    
    This commit makes it possible to canonicalize namenode addresses in WebHdfsFileSystem to work around this issue. The behavior is enabled with the property `dfs.webhdfs.host.canonicalize.enabled` (default: false).
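
The patch itself is not shown on this page, so the following is only a rough sketch of what opt-in canonicalization of a namenode URI could look like, not the code in this commit. The class name, method names and the injectable resolver are assumptions made for illustration. In Hadoop the flag would presumably be read with `Configuration.getBoolean("dfs.webhdfs.host.canonicalize.enabled", false)`; a plain boolean stands in for it here so the example has no Hadoop dependency.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;
import java.util.function.UnaryOperator;

// Hypothetical sketch; names and structure are assumptions, not the actual patch.
final class WebHdfsCanonicalizeSketch {
    private WebHdfsCanonicalizeSketch() {}

    // Default resolver: ask the JDK for the canonical name, keep the input on failure.
    static String dnsCanonicalName(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            return host;
        }
    }

    // Replaces the host of the URI with its canonical name when the flag is set.
    // "enabled" stands in for dfs.webhdfs.host.canonicalize.enabled (default: false).
    static URI maybeCanonicalize(URI uri, boolean enabled, UnaryOperator<String> resolver) {
        if (!enabled || uri.getHost() == null) {
            return uri;
        }
        try {
            return new URI(uri.getScheme(), uri.getUserInfo(), resolver.apply(uri.getHost()),
                    uri.getPort(), uri.getPath(), uri.getQuery(), uri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot canonicalize " + uri, e);
        }
    }

    public static void main(String[] args) {
        URI alias = URI.create("webhdfs://nn-alias.example:9870/tmp");
        // Stub resolver with a made-up host name, so the output does not depend on DNS.
        UnaryOperator<String> stub = host -> "nn01.example.internal";
        System.out.println(maybeCanonicalize(alias, false, stub));
        System.out.println(maybeCanonicalize(alias, true, stub));
    }
}
```

Passing `WebHdfsCanonicalizeSketch::dnsCanonicalName` as the resolver would perform a real lookup. Rewriting the host before the connection is opened means the JDK builds the SPNEGO service principal from the already-canonical name, so the prefix restriction in PrincipalName no longer matters.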
    w.montaz committed Sep 11, 2024
    a1817a9
  2. Change constant names

    w.montaz committed Sep 11, 2024
    5cbf219