I wrote a Python producer app that was consistently getting `Local: Queue full` errors (`BufferError`) and found this code sample from Magnus (author of librdkafka) helpful:
The `while` loop just means that normally a record is produced and a `break` takes us out of the `while` loop and back into the `for` loop; if we hit `BufferError`, though, we stay in the `while` loop and retry the produce before moving on to the next record.
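In case the link ever goes stale, the pattern it describes looks roughly like this (my own minimal sketch, not Magnus's exact code; the broker address and the `records` list are placeholders):

```python
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})  # placeholder config
records = [b'example-1', b'example-2']                        # placeholder data

for record in records:
    while True:
        try:
            producer.produce('my-topic', value=record)
            producer.poll(0)   # serve delivery callbacks without blocking
            break              # produced successfully: back to the for loop
        except BufferError:
            # local queue is full: give librdkafka up to 1 second to deliver
            # some messages (and fire their callbacks), then retry the produce
            producer.poll(1)

producer.flush()  # block until everything still queued has been delivered
```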
I would also suggest including some performance tuning knobs in the example.
I have a code snippet that might help. My app reads from a CSV file to produce to two different topics. Here is a relevant snippet:
```python
import csv
import sys

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer

# configure schema registry client and avro serializer
schema_registry_client = SchemaRegistryClient(SCHEMA_REGISTRY_CONFIG)
my_avro_serializer = AvroSerializer(
    schema_registry_client=schema_registry_client,
    schema_str=MY_AVRO_SCHEMA
)

producer_config = KAFKA_CONFIG.copy()
producer_config['value.serializer'] = my_avro_serializer

# Configure other producer properties to tune performance.
# See https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
producer_config['queue.buffering.max.messages'] = 1000000
producer_config['linger.ms'] = 500
producer_config['batch.size'] = 2000000
producer_config['compression.type'] = 'snappy'

# Create producer and produce data
producer = SerializingProducer(producer_config)

records_produced = 0
with open('./my_data.csv') as data:
    reader = csv.DictReader(data, fieldnames=MY_FIELDNAMES)
    for row in reader:
        # preprocess csv row to use float values instead of scientific notation
        row_with_float_values = {field: float(value) for field, value in row.items()}
        while True:
            try:
                producer.produce(
                    topic=MY_TOPIC,
                    value=row_with_float_values,
                    on_delivery=log_kafka_errors)
                producer.poll(0)
                records_produced += 1
                break
            except BufferError as err:
                print(err, file=sys.stderr)
                print(f"produced {records_produced} records so far", file=sys.stdout)
                producer.flush()

print(f"produced a total of {records_produced} records", file=sys.stdout)
producer.flush()
```
There is some discussion about the extent to which calling `poll(0)` after each produce affects performance. Magnus says it's negligible, but here's someone who doesn't agree.
It made sense to me to call `producer.flush()` when we hit a `BufferError` to clear everything out, but Magnus's example calls `poll(1)` to wait up to one second for delivery callbacks to be served. I found that approach caused me to hit `BufferError` more often. To be honest, I don't have a good sense of how `poll()` works or when to call it just from reading the docs, so some more comments around how to use `poll()` and `flush()` would be good.
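For what it's worth, here is how I currently understand the difference, written out as comments (this is just my reading of the docs, not authoritative, and `producer` is a placeholder instance):

```python
from confluent_kafka import Producer

producer = Producer({'bootstrap.servers': 'localhost:9092'})  # placeholder config

# poll(0): serve any delivery callbacks that are already ready and return
# immediately. produce() itself is asynchronous; poll() is what actually
# runs the on_delivery callbacks on this thread.
producer.poll(0)

# poll(1): same thing, but block for up to 1 second waiting for events.
# It gives librdkafka a moment to deliver messages and drain the local
# queue, but it does not guarantee the queue has room again, which may be
# why I still hit BufferError with this approach.
producer.poll(1)

# flush(): keep serving callbacks until every queued message has been
# delivered or has failed. Heavier than poll(), but when it returns the
# local queue is empty, so the next produce() should not raise BufferError.
producer.flush()
```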
For reference, my full working example is here: