[CRITEO] Add a property to force canonicalization of hostname with WebHdfsFileSystem #75

Open: wants to merge 2 commits into base `branch-3.3.0`

@Willymontaz commented Sep 11, 2024

`WebHdfsFileSystem` does not explicitly enforce SPNEGO when using `connectionFactory`, because the JDK automatically performs SPNEGO when it receives a 401 response carrying the header `WWW-Authenticate: Negotiate`.

This part actually works fine: `WebHdfsFileSystem` obtains a delegation token via SPNEGO and then continues with that token.
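The condition that makes the JDK start SPNEGO on its own can be sketched as a simple predicate. This is a hypothetical, simplified helper (`NegotiateChallenge` is not a JDK or Hadoop class): it only inspects the first auth scheme of a single `WWW-Authenticate` value, whereas the JDK's own handling inside `HttpURLConnection` does considerably more than this.

```java
/**
 * Hypothetical helper (not JDK code) describing when a response is a
 * SPNEGO challenge: HTTP 401 plus "WWW-Authenticate: Negotiate".
 * Simplification: only a single challenge per header value is handled.
 */
final class NegotiateChallenge {
    private NegotiateChallenge() {}

    static boolean isNegotiateChallenge(int status, String wwwAuthenticate) {
        if (status != 401 || wwwAuthenticate == null) {
            return false;
        }
        // The auth scheme is the first token of the header value; it may be
        // followed by a space and a token. Scheme names are case-insensitive.
        String trimmed = wwwAuthenticate.trim();
        int space = trimmed.indexOf(' ');
        String scheme = space < 0 ? trimmed : trimmed.substring(0, space);
        return scheme.equalsIgnoreCase("Negotiate");
    }
}
```

For example, a 401 with `WWW-Authenticate: Negotiate` qualifies, while a 401 carrying only a `Basic` challenge, or any non-401 status, does not.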

However, when hostname canonicalization is expected, the JDK applies a restriction: the canonical hostname is only used if it is a longer form of the original hostname; otherwise it is ignored. This behavior can be found in the constructor of `sun.security.krb5.PrincipalName`:

```java
// RFC4120 does not recommend canonicalizing a hostname.
// However, for compatibility reason, we will try
// canonicalize it and see if the output looks better.

String canonicalized = (InetAddress.getByName(hostName)).
        getCanonicalHostName();

// Looks if canonicalized is a longer format of hostName,
// we accept cases like
//     bunny -> bunny.rabbit.hole
if (canonicalized.toLowerCase(Locale.ENGLISH).startsWith(
            hostName.toLowerCase(Locale.ENGLISH)+".")) {
    hostName = canonicalized;
}
```

This means that when namenodes are reached through Consul, for instance via `hadoop-hdfs-namenode-active-root.query.consul.preprod.crto.in`, the JDK silently ignores the canonicalization: the canonical hostname looks like `{something}.{dc}.hpc.criteo.(pre)prod`, which does not start with the original hostname followed by a dot.
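To make the rule concrete, the check quoted from `PrincipalName` can be restated as a pure function. This is an illustrative sketch, not JDK code: `CanonicalizationRule` and `effectiveHost` are hypothetical names, and the DNS lookup is replaced by a parameter so both cases can be compared without a resolver. The Consul case uses the placeholder canonical name from the description.

```java
import java.util.Locale;

/**
 * Hypothetical restatement of the acceptance check in
 * sun.security.krb5.PrincipalName. The canonical name that
 * InetAddress.getCanonicalHostName() would return is passed in directly.
 */
final class CanonicalizationRule {
    private CanonicalizationRule() {}

    /** Returns the hostname that ends up in the service principal. */
    static String effectiveHost(String hostName, String canonicalized) {
        // Accepted only if the canonical name extends the original one with
        // extra labels, e.g. "bunny" -> "bunny.rabbit.hole".
        if (canonicalized.toLowerCase(Locale.ENGLISH)
                .startsWith(hostName.toLowerCase(Locale.ENGLISH) + ".")) {
            return canonicalized;
        }
        // Any other canonical name is discarded without notice.
        return hostName;
    }
}
```

With these inputs, `effectiveHost("bunny", "bunny.rabbit.hole")` returns `bunny.rabbit.hole`, but the Consul alias paired with `{something}.{dc}.hpc.criteo.preprod` comes back unchanged, so the service principal is built from the Consul alias rather than from the namenode's canonical hostname.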

This commit makes it possible to canonicalize namenode addresses in `WebHdfsFileSystem` to work around this issue. The behavior is enabled with the property `dfs.webhdfs.host.canonicalize.enabled` (default: `false`).
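A minimal sketch of what such an opt-in step could look like, assuming the namenode hostname is resolved once before the WebHDFS URL is built. `WebHdfsHostCanonicalizer`, `hostForUrl` and `dnsCanonicalName` are hypothetical names and not the code in this PR; only the property name and its default come from the description. The resolver is injected so the logic can be exercised without DNS.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch (not the PR's code) of opt-in canonicalization of
 * WebHDFS namenode hostnames.
 */
final class WebHdfsHostCanonicalizer {
    // Property name and default taken from the PR description.
    static final String CANONICALIZE_KEY = "dfs.webhdfs.host.canonicalize.enabled";
    static final boolean CANONICALIZE_DEFAULT = false;

    private WebHdfsHostCanonicalizer() {}

    /**
     * Forward lookup followed by reverse lookup via the JDK. The result may
     * be the textual IP address if the reverse lookup is not possible.
     */
    static String dnsCanonicalName(String host) {
        try {
            return InetAddress.getByName(host).getCanonicalHostName();
        } catch (UnknownHostException e) {
            // Keep the configured name if it cannot be resolved at all.
            return host;
        }
    }

    /** Returns the hostname to put in the WebHDFS URL. */
    static String hostForUrl(String configuredHost, boolean enabled,
                             UnaryOperator<String> resolver) {
        if (!enabled) {
            // Default behavior: use the configured address as-is.
            return configuredHost;
        }
        // Opt-in: use the canonical name, which presumably then also ends up
        // in the Kerberos service principal derived from the URL host.
        return resolver.apply(configuredHost);
    }
}
```

In a real client the resolver would be `WebHdfsHostCanonicalizer::dnsCanonicalName`. Once the URL already carries the canonical name, the JDK's own check should no longer matter: canonicalizing an already canonical name normally yields the same name, which the check rejects, so `hostName` keeps the canonical value either way.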

w.montaz added 2 commits September 11, 2024 14:44
…bHdfsFileSystem
