Replies: 12 comments 17 replies
-
cc @chenxu14
-
There are some critical bugs in libhdfs that have not been resolved yet.
-
@JkSelf Could you provide more details about this case?
-
Alternatively, there is a third option: add Kerberos authentication and viewfs support to libhdfs3, and keep accessing HDFS through native code instead of the JVM for better performance.
-
Posting the relevant libhdfs3 issue here: this. TL;DR: it was mentioned that Kerberos does not work for application <-> Hadoop KMS communication.
-
Is it possible to use the libhdfs3 maintained by ClickHouse? It still seems to be maintained, and we use it internally.
-
@JkSelf Could you list the features missing in libhdfs3, and whether they have been added to ClickHouse/libhdfs3? If they have, ClickHouse/libhdfs3 looks like the better choice.
-
@zhanglistar @wypb In Gluten we did try CK/libhdfs3 before, and we also added delegation token support (oap-project/libhdfs3@9f234ed). Based on what I learned, viewfs support is also still a gap. Thanks
-
@majetideepak @assignUser @zhanglistar @FelixYBW We ran performance tests on TPC-H Q6 with a 2 TB dataset, comparing results with HDFS Short Circuit reads enabled and disabled. According to the data we collected, with Short Circuit enabled, libhdfs3 is approximately 1.08x faster than libhdfs; with Short Circuit disabled, the two perform about the same. Note that the observed 8% degradation of libhdfs occurs only under these extreme conditions. In real-world production environments HDFS is usually accessed remotely, which corresponds to the Short Circuit-disabled case where performance is the same, so overall performance is not impacted. We also ran the 103 queries of TPC-DS at 2 TB in the same environment and found that libhdfs and libhdfs3 perform comparably, with no performance degradation. Below are the machine models used for the tests:
-
Based on the experimental results and the vote, let's remove
-
Sorry, I saw this too late.
-
Currently, Velox uses the C++ library libhdfs3 to access HDFS. However, when a customer's Hadoop environment has Kerberos authentication and viewfs enabled, Velox fails to connect to HDFS. We therefore plan to follow Arrow's approach: at runtime, Velox dynamically loads the Hadoop and JVM libraries installed on the system and calls the JVM-based HDFS implementation (libhdfs). This would let us leverage more features from the Hadoop community. Given this requirement, should we use the JVM-based libhdfs to completely replace the C++ libhdfs3?
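The runtime-loading approach described above can be sketched roughly as follows. This is a minimal C++ sketch under stated assumptions, not Arrow's or Velox's actual code: the search paths (`$JAVA_HOME/lib/server`, `$JAVA_HOME/jre/lib/amd64/server`, `$HADOOP_HOME/lib/native`), the `LibHdfsShim` struct and the `LoadLibHdfs`/`UnloadLibHdfs`/`OpenFirst`/`GetEnv` helpers are hypothetical. Only POSIX `dlopen`/`dlsym` and the `hdfsConnect(const char* nn, tPort port)` signature from libhdfs's `hdfs.h` are taken as given.

```cpp
#include <dlfcn.h>

#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Mirrors the relevant declarations from libhdfs's hdfs.h:
//   typedef uint16_t tPort;
//   hdfsFS hdfsConnect(const char* nn, tPort port);
// hdfsFS is an opaque pointer, so void* is used here to avoid needing the header.
using hdfsFS = void*;
using HdfsConnectFn = hdfsFS (*)(const char* nn, std::uint16_t port);

// Hypothetical holder for the dynamically loaded handles.
struct LibHdfsShim {
  void* jvm = nullptr;
  void* hdfs = nullptr;
  HdfsConnectFn connect = nullptr;
};

// Returns the value of an environment variable, or "" if it is unset.
static std::string GetEnv(const char* name) {
  const char* value = std::getenv(name);
  return value != nullptr ? std::string(value) : std::string();
}

// Tries each candidate path in order and returns the first handle dlopen accepts.
static void* OpenFirst(const std::vector<std::string>& candidates, int flags) {
  for (const auto& path : candidates) {
    if (void* handle = dlopen(path.c_str(), flags)) {
      return handle;
    }
  }
  return nullptr;
}

// Releases any handles that were opened and resets the shim.
void UnloadLibHdfs(LibHdfsShim& shim) {
  if (shim.hdfs != nullptr) dlclose(shim.hdfs);
  if (shim.jvm != nullptr) dlclose(shim.jvm);
  shim = LibHdfsShim{};
}

// Loads libjvm and libhdfs at runtime. Returns true only if both libraries
// and the hdfsConnect symbol were found; on failure, opened handles are closed.
bool LoadLibHdfs(LibHdfsShim& shim) {
  const std::string java_home = GetEnv("JAVA_HOME");
  const std::string hadoop_home = GetEnv("HADOOP_HOME");
  if (java_home.empty() || hadoop_home.empty()) return false;

  // Load libjvm first with RTLD_GLOBAL so that libhdfs.so's dependency on
  // libjvm.so can be satisfied by this already-loaded copy.
  shim.jvm = OpenFirst({java_home + "/lib/server/libjvm.so",            // JDK 9+
                        java_home + "/jre/lib/amd64/server/libjvm.so"},  // JDK 8
                       RTLD_NOW | RTLD_GLOBAL);
  shim.hdfs = shim.jvm != nullptr
                  ? OpenFirst({hadoop_home + "/lib/native/libhdfs.so"}, RTLD_NOW)
                  : nullptr;
  if (shim.hdfs != nullptr) {
    shim.connect = reinterpret_cast<HdfsConnectFn>(dlsym(shim.hdfs, "hdfsConnect"));
  }
  if (shim.connect == nullptr) {
    UnloadLibHdfs(shim);
    return false;
  }
  return true;
}
```

Once loaded, a caller would invoke something like `shim.connect("default", 0)` to connect using `fs.defaultFS` from the Hadoop configuration. In practice libhdfs also expects `CLASSPATH` to contain the Hadoop jars (for example, the output of `hadoop classpath --glob`), which this sketch does not set up. Keeping the libraries out of the link step means a Velox build without Hadoop installed still works, and HDFS support only fails when it is actually requested.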
21 votes