
Upgrade hadoop version and set java/hadoop home in docker image #10446

Closed
wants to merge 1 commit

Conversation

@JkSelf (Collaborator) commented Jul 11, 2024

  • Support jvm version libhdfs in velox #9835 requires JAVA_HOME in order to dynamically load the JNI APIs.
  • Additionally, the libhdfs3 library provides an hdfsGetLastError API for tracking the error stack. However, the current Java-based Hadoop version 2.10 does not include this API. Hadoop 3.3 offers the hdfsGetLastExceptionRootCause and hdfsGetLastExceptionStackTrace APIs, which allow exceptions to be traced. Therefore, we need to upgrade our Hadoop version to 3.3.
  • Setting the Hadoop CLASSPATH ensures that the JVM can find the Hadoop class libraries when libhdfs.so is loaded dynamically.
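As a rough illustration, the environment setup being proposed looks like the following Dockerfile sketch; the base image, package names, and exact paths are assumptions for illustration, not the literal contents of scripts/adapters.dockerfile:

```dockerfile
# Sketch only: base image, package names, and install paths are assumptions.
FROM quay.io/centos/centos:stream9

# A JDK is needed because libhdfs.so starts a JVM through JNI.
RUN dnf install -y java-1.8.0-openjdk-devel && dnf clean all

# JAVA_HOME lets libhdfs.so locate libjvm.so; HADOOP_HOME/bin puts the
# hadoop launcher (used by the HDFS minicluster tests) on PATH.
ENV JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk \
    HADOOP_HOME=/usr/local/hadoop
ENV PATH=${HADOOP_HOME}/bin:${JAVA_HOME}/bin:${PATH}

# CLASSPATH must point the embedded JVM at the Hadoop 3.3 jars; how it is
# populated (explicit list vs. a generated script) is discussed below.
ENV CLASSPATH=${HADOOP_HOME}/etc/hadoop:${HADOOP_HOME}/share/hadoop/common/hadoop-common-3.3.0.jar
```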

@facebook-github-bot added the CLA Signed label on Jul 11, 2024

netlify bot commented Jul 11, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: d0bfce5
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/66d7e71cb6ac500008ef7320

@JkSelf (Collaborator, Author) commented Jul 11, 2024

@mbasmanova @majetideepak @assignUser Can you help review this PR first? Thanks.

@JkSelf force-pushed the hdfs-ci branch 2 times, most recently from c94b0ac to 7dcf00a on August 1, 2024 03:36
@JkSelf (Collaborator, Author) commented Aug 1, 2024

@rui-mo Can you help review this PR? Thanks.

scripts/adapters.dockerfile (review thread — outdated, resolved)
@JkSelf changed the title from "Upgrade hadoop version and install JDK in docker image" to "Upgrade hadoop version and set java/hadoop home in docker image" on Aug 4, 2024
@JkSelf (Collaborator, Author) commented Aug 4, 2024

@mbasmanova @majetideepak @assignUser @rui-mo The CI has passed. Can you help review? Thanks.

@rui-mo (Collaborator) left a comment:

Thanks. Looks good to me.

scripts/adapters.dockerfile (review thread — outdated, resolved)
@JkSelf (Collaborator, Author) commented Aug 7, 2024

@mbasmanova @majetideepak Do you have any further comments? Thanks.

@majetideepak (Collaborator) commented:

@JkSelf For CLASSPATH, can't we use a wildcard path instead of listing everything?
ENV CLASSPATH=$LIBS_PATH"/*"

@majetideepak (Collaborator) commented:

Do the jars that have 2.10.1, e.g. hadoop-common-2.10.1.jar, correspond to the older Hadoop version? Do we need them?

@JkSelf (Collaborator, Author) commented Aug 8, 2024

@JkSelf For CLASSPATH, can't we use a wildcard path instead of listing everything? ENV CLASSPATH=$LIBS_PATH"/*"

@majetideepak Thanks for your review. I tried setting the CLASSPATH with a wildcard as follows: /usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/:/usr/local/hadoop/share/hadoop/common/lib/*.jar:/usr/local/hadoop/share/hadoop/hdfs/:/usr/local/hadoop/share/hadoop/hdfs/lib/*.jar:/usr/local/hadoop/share/hadoop/yarn/:/usr/local/hadoop/share/hadoop/yarn/lib/*.jar:/usr/local/hadoop/share/hadoop/mapreduce/:/usr/local/hadoop/share/hadoop/mapreduce/lib/*.jar:/usr/local/hadoop/contrib/capacity-scheduler/*.jar. The test still failed because the necessary jars were not found.

@JkSelf (Collaborator, Author) commented Aug 8, 2024

Do the jars that have 2.10.1, e.g. hadoop-common-2.10.1.jar, correspond to the older Hadoop version? Do we need them?

@majetideepak We don't need the jars related to Hadoop 2.10 after upgrading to Hadoop 3.3.

@majetideepak (Collaborator) commented:

We don't need the jars related to Hadoop 2.10 after upgrading to Hadoop 3.3.

@JkSelf Why are some of the 2.10 jars listed in the class path in the adapters dockerfile?

@majetideepak (Collaborator) commented Aug 8, 2024

@JkSelf I see some tips here https://stackoverflow.com/questions/1237093/how-to-use-a-wildcard-in-the-classpath-to-add-multiple-jars

The common mistake is to put "foo/*.jar"; that will not work. It will only work with "foo/*".

Can you try replacing *.jar with *?
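For reference, the wildcard behavior described in the linked answer can be illustrated as follows (paths and the main class are hypothetical):

```sh
# The JVM expands a bare "*" class-path entry to every .jar in that directory,
# but it does not glob "*.jar" patterns.
java -cp "/usr/local/hadoop/share/hadoop/common/lib/*" org.example.Main      # jars are picked up
java -cp "/usr/local/hadoop/share/hadoop/common/lib/*.jar" org.example.Main  # pattern is not expanded
```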

@JkSelf force-pushed the hdfs-ci branch 2 times, most recently from 6b2422f to 2d33969 on August 12, 2024 02:20
@JkSelf (Collaborator, Author) commented Aug 12, 2024

We don't need the jars related to Hadoop 2.10 after upgrading to Hadoop 3.3.

@JkSelf Why are some of the 2.10 jars listed in the class path in the adapters dockerfile?

@majetideepak Good catch. I have updated to the 3.3-related jars.

@JkSelf (Collaborator, Author) commented Aug 12, 2024

@JkSelf I see some tips here https://stackoverflow.com/questions/1237093/how-to-use-a-wildcard-in-the-classpath-to-add-multiple-jars

The common mistake is to put "foo/*.jar"; that will not work. It will only work with "foo/*".

Can you try replacing *.jar with *?

@majetideepak I tried it in my local environment, and it does not work with "*".

@majetideepak (Collaborator) commented:

@JkSelf Are you able to print or see the expanded class path when you run the application? That should give you a hint.
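One way to do that, assuming the Hadoop 3.3 bin directory is on PATH, is to ask the hadoop CLI for the class path it would use; --glob expands wildcard entries into the individual jars:

```sh
hadoop classpath         # print the class path with wildcard entries as-is
hadoop classpath --glob  # same, but with wildcards expanded to individual jars
```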

@JkSelf (Collaborator, Author) commented Aug 13, 2024

@majetideepak It requires the HADOOP_HOME/share/hadoop/common and HADOOP_HOME/share/hadoop/hdfs directories. I updated the class path. Please help review again. Thanks.

@JkSelf (Collaborator, Author) commented Aug 14, 2024

@majetideepak @assignUser @rui-mo
When executing velox_hdfs_file_test in a Docker environment with Hadoop 3.3, I encounter the following exception, whereas Hadoop 2.10 works without issue.

```
[root@2ad7826d7f46 velox]# ./_build/debug/velox/connectors/hive/storage_adapters/hdfs/tests/velox_hdfs_file_test
Running main() from /home/velox/_build/debug/_deps/gtest-src/googletest/src/gtest_main.cc
[==========] Running 22 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 20 tests from HdfsFileSystemTest
WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
java.lang.NoClassDefFoundError: junit/framework/TestCase
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.apache.hadoop.test.MapredTestDriver.<init>(MapredTestDriver.java:109)
        at org.apache.hadoop.test.MapredTestDriver.<init>(MapredTestDriver.java:61)
        at org.apache.hadoop.test.MapredTestDriver.main(MapredTestDriver.java:147)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.lang.ClassNotFoundException: junit.framework.TestCase
        at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 21 more
Unknown program 'minicluster' chosen.
Valid program names are:
  DFSCIOTest: Distributed i/o benchmark of libhdfs.
  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
  TestDFSIO: Distributed i/o benchmark.
  fail: a job that always fails
  gsleep: A sleep job whose mappers create 1MB buffer for every record.
  loadgen: Generic map/reduce load generator
  mapredtest: A map/reduce test check.
  mrbench: A map/reduce benchmark that can create many small jobs
  nnbench: A benchmark that stresses the namenode w/ MR.
  nnbenchWithoutMR: A benchmark that stresses the namenode w/o MR.
  sleep: A job that sleeps at each map and reduce task.
  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
  testfilesystem: A test for FileSystem read/write.
  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
  testsequencefile: A test for flat files of binary key value pairs.
  testsequencefileinputformat: A test for sequence file input format.
  testtextinputformat: A test for text input format.
  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
  timelineperformance: A job that launches mappers to test timeline service performance.
WARNING: Logging before InitGoogleLogging() is written to STDERR
```

Do you have any input?

@JkSelf (Collaborator, Author) commented Aug 15, 2024


@assignUser @majetideepak @rui-mo

The issue stems from the removal of junit-4.11.jar in Hadoop 3.3. To resolve this, we need to add it back to the /usr/local/hadoop/share/hadoop/common/lib/ directory. While running velox_hdfs_file_test in the adapters Docker environment, I also encountered compatibility issues between OpenJDK 22 and Hadoop 3.3, resulting in an HDFS connection failure. To address this, we need to switch the Java version to JDK 1.8.
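A sketch of that workaround as shell commands; the junit version, download URL, and JDK install path are assumptions rather than the exact steps taken in the PR:

```sh
# 1. Restore a junit jar where Hadoop 3.3's test driver expects to find it.
#    Version and download URL are assumptions.
curl -fsSL -o /usr/local/hadoop/share/hadoop/common/lib/junit-4.11.jar \
  https://repo1.maven.org/maven2/junit/junit/4.11/junit-4.11.jar

# 2. Point the environment at a JDK 8 install instead of a newer JDK.
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH="${JAVA_HOME}/bin:${PATH}"
```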

@assignUser (Collaborator) commented:

To address this, we need to switch the Java version to JDK 1.8.

3.3 was released in 2020, is there really a need to use 1.8 instead of a more recent LTS?

@xiaoxmeng requested a review from kgpai on August 27, 2024 04:16
@xiaoxmeng added the ready-to-merge label on Aug 27, 2024
@JkSelf (Collaborator, Author) commented Aug 28, 2024

To address this, we need to switch the Java version to JDK 1.8.

3.3 was released in 2020, is there really a need to use 1.8 instead of a more recent LTS?

@assignUser It seems the required Java version is 1.8 here.

@wForget commented Sep 2, 2024

3.3 was released in 2020, is there really a need to use 1.8 instead of a more recent LTS?

Hadoop seems to have never declared support for newer Java versions.

Ref:
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions
https://issues.apache.org/jira/browse/HADOOP-16795
https://issues.apache.org/jira/browse/HADOOP-17177

@assignUser (Collaborator) commented:

Hadoop seems to have never declared support for newer Java versions.

Wow, ok. I guess we don't have a choice ...

@JkSelf (Collaborator, Author) commented Sep 3, 2024

@kagamiori Can you help look at this PR? Thanks.

```
PATH=/usr/local/hadoop/bin:${PATH} \
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk \
PATH=/usr/lib/jvm/java-1.8.0-openjdk/bin:${PATH} \
CLASSPATH=/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/httpclient-4.5.6.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-simplekdc-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-jute-3.5.6.jar:/usr/local/hadoop/share/hadoop/common/lib/j2objc-annotations-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-recipes-4.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-security-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/checker-qual-2.5.2.jar:/usr/local/hadoop/share/hadoop/common/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr311-api-1.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-core-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-server-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-client-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/metrics-core-3.2.4.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-asn1-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/audience-annotations-0.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-config-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-identity-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/json-smart-2.3.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-auth-3.3.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-databind-2.10.3.jar:/usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar:/usr/local/hadoop/share/hadoop/common/lib/woodstox-core-5.0.3.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-servlet-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/stax2-api-3.1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar:/usr/local/hadoop/share/hadoop/common/lib/zookeeper-3.5.6.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-webapp-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/avro-1.7.7.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-3.3.0.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/accessors-smart-1.2.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-codec-1.11.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-server-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-compress-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/dnsjava-2.1.7.jar:/usr/local/hadoop/share/hadoop/common/lib/nimbus-jose-jwt-7.9.jar:/usr/local/hadoop/share/hadoop/common/lib/jsp-api-2.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-core-2.10.3.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-lang3-3.7.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-json-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-framework-4.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/token-provider-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-common-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/curator-client-4.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/commons
-configuration2-2.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/netty-3.10.6.Final.jar:/usr/local/hadoop/share/hadoop/common/lib/animal-sniffer-annotations-1.17.jar:/usr/local/hadoop/share/hadoop/common/lib/jul-to-slf4j-1.7.25.jar:/usr/local/hadoop/share/hadoop/common/lib/hadoop-shaded-protobuf_3_7-1.0.0.jar:/usr/local/hadoop/share/hadoop/common/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/common/lib/jcip-annotations-1.0-1.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-beanutils-1.9.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-util-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-logging-1.1.3.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-io-2.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-net-3.6.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-xdr-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-annotations-2.10.3.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jaxb-api-2.2.11.jar:/usr/local/hadoop/share/hadoop/common/lib/failureaccess-1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/common/lib/javax.activation-api-1.2.0.jar:/usr/local/hadoop/share/hadoop/common/lib/snappy-java-1.0.5.jar:/usr/local/hadoop/share/hadoop/common/lib/commons-text-1.4.jar:/usr/local/hadoop/share/hadoop/common/lib/jersey-core-1.19.jar:/usr/local/hadoop/share/hadoop/common/lib/jsch-0.1.55.jar:/usr/local/hadoop/share/hadoop/common/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/usr/local/hadoop/share/hadoop/common/lib/httpcore-4.4.10.jar:/usr/local/hadoop/share/hadoop/common/lib/javax.servlet-api-3.1.0.jar:/usr/local/hadoop/share/hadoop/common/lib/slf4j-api-1.7.25.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-admin-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-servlet-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-io-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/common/lib/re2j-1.1.jar:/usr/local/hadoop/share/hadoop/common/lib/kerb-crypto-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-xml-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-http-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/jetty-server-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/common/lib/kerby-pkix-1.0.1.jar:/usr/local/hadoop/share/hadoop/common/lib/jsr305-3.0.2.jar:/usr/local/hadoop/share/hadoop/common/lib/asm-5.0.4.jar:/usr/local/hadoop/share/hadoop/common/lib/htrace-core4-4.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0.jar:/usr/local/hadoop/share/hadoop/common/hadoop-registry-3.3.0.jar:/usr/local/hadoop/share/hadoop/common/hadoop-kms-3.3.0.jar:/usr/local/hadoop/share/hadoop/common/hadoop-nfs-3.3.0.jar:/usr/local/hadoop/share/hadoop/common/hadoop-common-3.3.0-tests.jar:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/httpclient-4.5.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-simplekdc-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/zookeeper-jute-3.5.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/j2objc-annotations-1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/curator-recipes-4.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/paranamer-2.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-security-9.4.20.v20190813.jar:/usr/local/hadoop/share
/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/checker-qual-2.5.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/gson-2.2.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jettison-1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr311-api-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-core-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-server-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-client-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-asn1-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-all-4.1.50.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/audience-annotations-0.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-config-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-identity-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/json-smart-2.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-auth-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-databind-2.10.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/guava-27.0-jre.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/woodstox-core-5.0.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-servlet-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/stax2-api-3.1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/zookeeper-3.5.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-webapp-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/avro-1.7.7.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-annotations-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-jaxrs-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/accessors-smart-1.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-codec-1.11.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-server-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-compress-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/okhttp-2.7.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/dnsjava-2.1.7.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/nimbus-jose-jwt-7.9.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-core-2.10.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-lang3-3.7.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-json-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jaxb-impl-2.2.3-1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/curator-framework-4.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/json-simple-1.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/token-provider-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-collections-3.2.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-common-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/curator-client-4.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-configuration2-2.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/netty-3.10.6.Final.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/animal-sniffer-annotations-1.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/hadoop-shaded-protobuf_3_7-1.0.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jcip-annotations-1.0-1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-beanutils-1.9.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-logging-1
.1.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-io-2.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-net-3.6.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-util-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-xdr-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-annotations-2.10.3.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-math3-3.1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jaxb-api-2.2.11.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/failureaccess-1.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/javax.activation-api-1.2.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/snappy-java-1.0.5.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/commons-text-1.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jersey-core-1.19.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsch-0.1.55.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/httpcore-4.4.10.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/javax.servlet-api-3.1.0.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-admin-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-servlet-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-io-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jackson-xc-1.9.13.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/re2j-1.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerb-crypto-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-xml-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-http-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-server-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/kerby-pkix-1.0.1.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jsr305-3.0.2.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/asm-5.0.4.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/htrace-core4-4.1.0-incubating.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/jetty-util-ajax-9.4.20.v20190813.jar:/usr/local/hadoop/share/hadoop/hdfs/lib/okio-1.6.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-httpfs-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-client-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-rbf-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-rbf-3.3.0-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-native-client-3.3.0-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-nfs-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-client-3.3.0-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-3.3.0.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-3.3.0-tests.jar:/usr/local/hadoop/share/hadoop/hdfs/hadoop-hdfs-native-client-3.3.0.jar:
```

@kgpai (Contributor) left a comment:

cc: @majetideepak

Is it common to hardcode the classpath this way?

@JkSelf (Collaborator, Author) replied:

@kgpai We tried to use a wildcard path instead of listing all the jars, but it doesn't work. #10446 (comment)

@kgpai (Contributor) replied:

Can you just write a script that gets the jars from /usr/local/hadoop/share/... and appends them to CLASSPATH, rather than manually hardcoding it this way?

@wForget replied:

Is it possible to use CLASSPATH=$CLASSPATH:`hadoop classpath`?

@JkSelf (Collaborator, Author) commented Sep 4, 2024

Can you just write a script that gets the jars from /usr/local/hadoop/share/... and appends them to CLASSPATH, rather than manually hardcoding it this way?

@kgpai Yes, I added setup-classpath.sh to set the classpath. Can you help review again? Thanks.
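A hypothetical sketch of what a setup-classpath.sh along these lines could look like (the actual script in the PR may differ); it collects every Hadoop jar into one colon-separated CLASSPATH so no wildcard expansion is needed:

```sh
#!/bin/bash
# Hypothetical sketch; the HADOOP_HOME location and directory layout are assumptions.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop}

CLASSPATH=${HADOOP_HOME}/etc/hadoop
# List every jar explicitly, since the wildcard entries discussed above were
# not expanded when the JVM is started through libhdfs.
for jar in $(find "${HADOOP_HOME}/share/hadoop" -name '*.jar' | sort); do
  CLASSPATH=${CLASSPATH}:${jar}
done
export CLASSPATH
```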

A collaborator replied:

Is it possible to use CLASSPATH=$CLASSPATH:`hadoop classpath`?

Not in ENV, as it will not be substituted.

@JkSelf (Collaborator, Author) replied:
Is it possible to use CLASSPATH=$CLASSPATH:`hadoop classpath`?

@wForget In a Dockerfile, the ENV instruction indeed cannot use the value of an environment variable produced by a script during the build process; the values set by ENV are fixed when the image is built, not at runtime.
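For illustration, one way to defer the computation to container startup (the file name and entrypoint wiring here are assumptions, not necessarily what this PR does):

```dockerfile
# Sketch: copy the classpath script into the image and source it when the
# container starts, instead of trying to evaluate `hadoop classpath` in ENV.
COPY setup-classpath.sh /setup-classpath.sh
ENTRYPOINT ["/bin/bash", "-c", "source /setup-classpath.sh && exec \"$@\"", "--"]
CMD ["bash"]
```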

@wForget replied:

Thank you for your explanation, I got it.

@JkSelf (Collaborator, Author) commented Sep 4, 2024

@assignUser Can you help review again? Thanks.

@JkSelf force-pushed the hdfs-ci branch 2 times, most recently from e5a5418 to d0bfce5 on September 4, 2024 04:50
@JkSelf (Collaborator, Author) commented Sep 5, 2024

@kgpai @majetideepak @assignUser Can you help merge if you have no further comments? Thanks.

@facebook-github-bot (Contributor) commented:

@xiaoxmeng has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor) commented:

@xiaoxmeng merged this pull request in b640e69.


Conbench analyzed the 1 benchmark run on commit b640e69c.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

facebook-github-bot pushed a commit that referenced this pull request Sep 10, 2024
Summary:
PR #10446 upgraded the Hadoop version from 2.10.1 to 3.3.0, but the Hadoop search path was not modified, which caused the `velox_hdfs_file_test` CI job of later-submitted PRs to turn red.
e.g. https://github.com/facebookincubator/velox/actions/runs/10752319167/job/29820507912?pr=10946
https://github.com/facebookincubator/velox/actions/runs/10735750279/job/29773707267?pr=10939
```
331/354 Test #338: velox_hdfs_file_test ...................................................***Exception: SegFault  0.20 sec
WARNING: Logging before InitGoogleLogging() is written to STDERR
I20240907 16:17:40.657701 65194 RegisterAbfsFileSystem.cpp:41] Register ABFS
Running main() from /__w/velox/velox/_build/release/_deps/gtest-src/googletest/src/gtest_main.cc
[==========] Running 22 tests from 3 test suites.
[----------] Global test environment set-up.
[----------] 20 tests from HdfsFileSystemTest
E20240907 16:17:40.668577 65194 Exceptions.h:67] Line: /__w/velox/velox/velox/connectors/hive/storage_adapters/hdfs/tests/HdfsMiniCluster.cpp:71, Function:HdfsMiniCluster, Expression:  Failed to find minicluster executable hadoop', Source: RUNTIME, ErrorCode: INVALID_STATE
unknown file: Failure
C++ exception with description "Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Failed to find minicluster executable hadoop'
Retriable: False
Function: HdfsMiniCluster
File: /__w/velox/velox/velox/connectors/hive/storage_adapters/hdfs/tests/HdfsMiniCluster.cpp
Line: 71
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 3  _ZN8facebook5velox11filesystems4test15HdfsMiniClusterC1Ev
# 4  _ZN18HdfsFileSystemTest14SetUpTestSuiteEv
# 5  _ZN7testing8internal35HandleExceptionsInMethodIfSupportedINS_9TestSuiteEvEET0_PT_MS4_FS3_vEPKc
# 6  _ZN7testing9TestSuite3RunEv
# 7  _ZN7testing8internal12UnitTestImpl11RunAllTestsEv
# 8  _ZN7testing8UnitTest3RunEv
# 9  main
# 10 __libc_start_call_main
# 11 __libc_start_main
# 12 _start
" thrown in SetUpTestSuite().
```

CC: majetideepak assignUser kgpai

Pull Request resolved: #10947

Reviewed By: pedroerp

Differential Revision: D62381299

Pulled By: mbasmanova

fbshipit-source-id: 8876cd79fdc9a7e756c02f506a79384d8b07de48
facebook-github-bot pushed a commit that referenced this pull request Sep 30, 2024
Summary:
A new script to set up the classpath was added by PR [10446](#10446), but it is called incorrectly on startup of the container.
The name needs to be setup-classpath.sh, not set_classpath.sh.

Error returned on startup:

```
[root@czentgr-foobar velox]# docker run -it 64a9eed9f771 bash
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
--: line 1: /set_classpath.sh: No such file or directory
```

Pull Request resolved: #11132

Reviewed By: pedroerp

Differential Revision: D63648777

Pulled By: kagamiori

fbshipit-source-id: 6ce0a4a32f22e5729e7a98feac5a45629c83b4b8