Add ability to write multiple topics into single output path #91

OneCricketeer · 2018-12-03T07:24:08Z

From StackOverflow

Naturally, one might try to use RegexRouter to send multiple topics to a single directory, for example with data coming from the JDBC Source connector:

    "topics": "SQLSERVER-TEST-TABLE_TEST",

    "transforms":"dropPrefix",      
    "transforms.dropPrefix.type":"org.apache.kafka.connect.transforms.RegexRouter",  
    "transforms.dropPrefix.regex":"SQLSERVER-TEST-(.*)",  
    "transforms.dropPrefix.replacement":"$1"

But this throws an NPE:

    Caused by: java.lang.NullPointerException
        at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:188)
        at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:564)
        ... 10 more

When debugging (the S3 connector, specifically), we see that the data needed to generate the top-level folder is available, but the storage writer cannot access it from the map.

There is a HashMap keyed by the original topic name (SQLSERVER-TEST-TABLE_TEST-0), but the transform has already been applied to the record (TABLE_TEST-0), so if we look up the "new" topic name, it cannot find the S3 writer for the TopicPartition.
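To make the mismatch concrete, here is a minimal sketch in plain Java. It does not use the connector's classes: the class name, the `route` and `key` helpers, and the string-valued "writer" are all stand-ins I made up for illustration. The only things taken from the config above are the regex, the replacement, and the topic name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the writer lookup; not the connector's actual code.
class RenamedTopicLookupSketch {
    // Same regex and replacement as the dropPrefix transform in the config.
    static final Pattern REGEX = Pattern.compile("SQLSERVER-TEST-(.*)");
    static final String REPLACEMENT = "$1";

    // Rename a topic only when the whole name matches the regex.
    static String route(String topic) {
        Matcher m = REGEX.matcher(topic);
        return m.matches() ? m.replaceFirst(REPLACEMENT) : topic;
    }

    // Assumed map key format: "<topic>-<partition>".
    static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    public static void main(String[] args) {
        // The writer is registered under the original topic name.
        Map<String, String> writers = new HashMap<>();
        writers.put(key("SQLSERVER-TEST-TABLE_TEST", 0), "writer for partition 0");

        // The record arrives with the renamed topic, so the key no longer matches.
        String renamed = route("SQLSERVER-TEST-TABLE_TEST");
        String writer = writers.get(key(renamed, 0));

        System.out.println(renamed); // TABLE_TEST
        System.out.println(writer);  // null
    }
}
```

Calling any method on that `null` result would produce a NullPointerException like the one in the stack trace.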

I think adding a separate config in the storage-common module that performs the RegexRouter logic outside of the SMT pipeline would help solve this problem, and it could be patched into the Hadoop, S3, and other storage connectors.
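A rough sketch of what I have in mind, in plain Java: the regex is applied only when computing the output directory, so any lookup keyed by the original topic name is left alone. Everything here is hypothetical, including the class name, the `directoryFor` method, and the constructor parameters (which stand in for config values that do not exist today). The topic names in `main` are made up as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; no such class exists in storage-common.
class OutputPathRenameSketch {
    private final String topicsDir;
    private final Pattern regex;
    private final String replacement;

    OutputPathRenameSketch(String topicsDir, String regex, String replacement) {
        this.topicsDir = topicsDir;
        this.regex = Pattern.compile(regex);
        this.replacement = replacement;
    }

    // Map the ORIGINAL topic name to an output directory.
    // The topic itself is never renamed, only the directory it is written to.
    String directoryFor(String originalTopic) {
        Matcher m = regex.matcher(originalTopic);
        String dirName = m.matches() ? m.replaceFirst(replacement) : originalTopic;
        return topicsDir + "/" + dirName;
    }

    public static void main(String[] args) {
        OutputPathRenameSketch paths =
            new OutputPathRenameSketch("topics", "(SQLSERVER-TEST)-.*", "$1");

        // Two different topics land in the same directory.
        System.out.println(paths.directoryFor("SQLSERVER-TEST-TABLE_A")); // topics/SQLSERVER-TEST
        System.out.println(paths.directoryFor("SQLSERVER-TEST-TABLE_B")); // topics/SQLSERVER-TEST
        // A topic that does not match keeps its own directory.
        System.out.println(paths.directoryFor("OTHER_TOPIC"));            // topics/OTHER_TOPIC
    }
}
```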


dongxiaohe · 2018-12-06T03:15:55Z

Will do. I ran the whole thing using Docker directly and EKS (Kubernetes). Now I need to compile the project and set up a local environment, so it may take some time 😄

OneCricketeer mentioned this issue Dec 6, 2018

Getting NullPointerException at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:188) confluentinc/kafka-connect-storage-cloud#221

Open

OneCricketeer mentioned this issue Jan 23, 2019

NullPointerException when using RegexRouter confluentinc/kafka-connect-hdfs#236

Closed
