Redpanda connect batching confusion #23927

pmak-852 · 2024-10-28T16:00:09Z

I have a simple pipeline that pulls messages from Kafka in batches and persists them to S3. The input batch size is 50 messages and the output batch size is 20. I expected the output batch size to be 20, but that is not the case: it reports 50. When I remove the input batching policy, the output batch size is 20 as expected.

I hope I have understood the concept of batching in Redpanda Connect correctly:

Say the input has 100 records. Redpanda Connect cuts them evenly into 2 batches of 50 in this case.
Each batch passes through the pipeline processors, after which we still have 2 batches with 50 messages in each of them.
When a batch of 50 messages arrives at the output, it is cut further into smaller batches of 20 messages.
The processors in the output batching policy handle 20 messages at a time, so batch_size() should return 20 instead of 50.
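
To make the expectation above concrete, here is a minimal Python sketch of the arithmetic I have in mind. It only models my assumed behaviour (re-cutting each input batch at the output) and is not how Redpanda Connect is implemented. The function names split_into_batches and expected_output_batch_sizes are made up for illustration, and the period settings are ignored.

```python
from typing import List


def split_into_batches(records: List[int], size: int) -> List[List[int]]:
    """Cut records into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [records[i:i + size] for i in range(0, len(records), size)]


def expected_output_batch_sizes(
    total: int, input_count: int, output_count: int
) -> List[List[int]]:
    """Model the assumed two-stage batching.

    Stage 1: the input cuts `total` records into batches of `input_count`.
    Stage 2: (assumption) the output re-cuts each input batch into
    batches of `output_count`.
    Returns, per input batch, the sizes of the resulting output batches.
    """
    records = list(range(total))
    input_batches = split_into_batches(records, input_count)
    return [
        [len(out) for out in split_into_batches(batch, output_count)]
        for batch in input_batches
    ]


if __name__ == "__main__":
    # 100 records, input count 50, output count 20
    print(expected_output_batch_sizes(100, 50, 20))
```

Under this model each input batch of 50 would reach the output batching processors as batches of 20, 20 and 10, so batch_size() would never report 50.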

Many thanks!

My connect.yaml is as follows:

input:
  kafka:
    addresses: [ "redpanda-0:9092" ]
    topics: [ "example" ]
    consumer_group: "test4"
    checkpoint_limit: 1024
    auto_replay_nacks: true
    batching:
      count: 50
      period: "5s"



pipeline:
  processors:
    - mapping: |
        root.data = this
        root.meta = metadata()



output:
  label: "unittest"
  aws_s3:
    bucket: "raw-zone-general"
    path: ${!meta("kafka_topic")}-${!meta("kafka_partition")}-${!meta("kafka_offset").from(-1)}-${!timestamp_unix_nano()}.zip
    endpoint: "http://minio:9000"
    region: "local"
    force_path_style_urls: true
    batching:
      count: 10
      period: "10s"
      processors:
        - log:
            level: INFO
            fields_mapping: |
              root.kafka_topic = meta("kafka_offset").from(-1)
              root.sz = batch_size()
        - archive:
            format: tar
            path: ${!meta("kafka_partition")}-${!meta("kafka_offset")}

    credentials:
      id: "testing_account"
      secret: "testing_pwd"
