[Bug] [Source HdfsFile] Read file failed #8786

Meepoljdx opened this issue Feb 21, 2025 · 0 comments
Search before asking

  • I had searched in the issues and found no similar issues.

What happened

I use the HdfsFile source connector to read data from HDFS, but some files cannot be read. The file format is text. The files that SeaTunnel fails to read can be read without problems using the cat command.
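Since cat succeeds where the connector fails, one possible first check is whether the failing files are really plain UTF-8 text, or whether they are, for example, gzip-compressed or contain byte sequences that do not decode as UTF-8. The sketch below is only a diagnostic idea, not a confirmed cause of this bug. The class name `TextFileProbe` and its methods are hypothetical, it uses only the Java standard library, and it assumes the file has first been copied to local disk (for example with `hdfs dfs -get`).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of SeaTunnel: inspects a LOCAL copy of a file.
final class TextFileProbe {

    /** True if the data starts with the gzip magic number 0x1f 0x8b. */
    static boolean looksGzipped(byte[] data) {
        return data.length >= 2
                && (data[0] & 0xff) == 0x1f
                && (data[1] & 0xff) == 0x8b;
    }

    /** True if the whole byte array decodes as strict UTF-8. */
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java TextFileProbe.java <local-file>");
            return;
        }
        // Reads the whole file into memory; for very large files, probe a sample
        // cut on a line boundary instead.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("size (bytes):  " + data.length);
        System.out.println("gzip magic:    " + looksGzipped(data));
        System.out.println("valid UTF-8:   " + isValidUtf8(data));
    }
}
```

Running it against one readable file and one failing file, and comparing the output, may show whether the two groups of files differ in encoding or compression.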

SeaTunnel Version

2.3.9 and 2.3.8

SeaTunnel Config

env {
  parallelism = 20
  job.retry.times = 3
  job.retry.interval.seconds = 10
  job.name = "hdfs file"
  job.mode = "BATCH"
}

source {
  HdfsFile {
    hdfs_site_path = "/opt/apache-seatunnel-2.3.9/hdfs-conf/hdfs-site-old.xml"
    path = "/xxxx/p_date=2025-01-24/p_hour=0/4924080071101949"
    file_format_type = "text"
    fs.defaultFS = "hdfs://nameservice"
    remote_user = "mr"
    krb5_path = "/etc/krb5.conf"
    kerberos_principal = "mr/[email protected]"
    kerberos_keytab_path = "/opt/apache-seatunnel-2.3.9/config/mr.keytab"
  }
  # If you would like more information about how to configure seatunnel and see the full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/connector-v2/source
}

transform {
  # If you would like more information about how to configure seatunnel and see the full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/category/transform-v2
}

sink {
    HdfsFile {
      fs.defaultFS = "hdfs://nameservice"
      path = "/xxxxxx/p_date=2025-01-24/p_hour=0/"
      tmp_path = "/tmp/seatunnel"
      remote_user= "mr"
      hdfs_site_path = "/opt/apache-seatunnel-2.3.9/hdfs-conf/hdfs-site-new.xml"
      file_format_type = "text"

    }
  # If you would like more information about how to configure seatunnel and see the full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/connector-v2/sink
}

Running Command

./bin/seatunnel.sh --config ./config/hdfs-to-hdfs.config -m local

Error Exception

Exception in thread "main" org.apache.seatunnel.core.starter.exception.CommandExecuteException: SeaTunnel job executed failed
        at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:228)
        at org.apache.seatunnel.core.starter.SeaTunnel.run(SeaTunnel.java:40)
        at org.apache.seatunnel.core.starter.seatunnel.SeaTunnelClient.main(SeaTunnelClient.java:34)
Caused by: org.apache.seatunnel.engine.common.exception.SeaTunnelEngineException: org.apache.seatunnel.common.exception.SeaTunnelRuntimeException: ErrorCode:[COMMON-01], ErrorDescription:[SeaTunnel read file 'hdfs://nameservice/xxx/p_date=2025-01-24/p_hour=0/4924080071101949' failed.]
        at org.apache.seatunnel.common.exception.CommonError.fileOperationFailed(CommonError.java:68)
        at org.apache.seatunnel.connectors.seatunnel.file.source.BaseFileSourceReader.pollNext(BaseFileSourceReader.java:65)
        at org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle.collect(SourceFlowLifeCycle.java:159)
        at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.collect(SourceSeaTunnelTask.java:127)
        at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:169)
        at org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask.call(SourceSeaTunnelTask.java:132)
        at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:694)
        at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1019)
        at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:43)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException

        at org.apache.seatunnel.core.starter.seatunnel.command.ClientExecuteCommand.execute(ClientExecuteCommand.java:220)
        ... 2 more
2025-02-21 14:59:03,104 INFO  [s.c.s.s.c.ClientExecuteCommand] [SeaTunnel-CompletableFuture-Thread-6] - run shutdown hook because get close signal

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

Image

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@Meepoljdx Meepoljdx added the bug label Feb 21, 2025