Welder 2022

Trying out different streaming jobs that read Avro from Kafka and write to targets such as:

Append only: Hive/Parquet
Full sync: Iceberg/Parquet
Full sync: Delta
Full sync: Hudi
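
The difference between the two target modes can be sketched in a few lines of plain Python. This is only an illustration, and it assumes that "full sync" means the target is kept in step with the latest record per key (an upsert); the README does not define the term. The `id` key and the record layout are hypothetical.

```python
def append_only(target, batch):
    """Append-only target: every incoming record is added, nothing is rewritten."""
    target.extend(batch)
    return target


def full_sync(target, batch, key="id"):
    """Full-sync target (assumed to mean upsert): the newest record per key wins."""
    for record in batch:
        target[record[key]] = record
    return target


# Hypothetical records: the same key arrives twice with different values.
first = [{"id": 1, "v": "a"}]
second = [{"id": 1, "v": "b"}]

log = append_only(append_only([], first), second)   # keeps both versions
table = full_sync(full_sync({}, first), second)     # keeps only the latest
```

With an append-only target the history grows with every batch, while the upsert-style target always holds one row per key.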

The goal is to learn how to read from Kafka topics with multiple partitions and offsets and spread the work across multiple workers. The name Welder is the opposite of the Shredder: the Welder makes the data "whole again" (hopefully).
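
As a rough illustration of spreading partition/offset work over workers, the sketch below splits per-partition offset ranges into bounded tasks and hands them out round-robin. It is plain Python, not the code used by the jobs in this repository; `OffsetRange`, `split_ranges`, `assign_round_robin` and the chunk size are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffsetRange:
    partition: int
    start: int  # inclusive offset
    end: int    # exclusive offset


def split_ranges(ranges, max_per_task):
    """Cut each partition's offset range into tasks of at most max_per_task offsets."""
    tasks = []
    for r in ranges:
        lo = r.start
        while lo < r.end:
            hi = min(lo + max_per_task, r.end)
            tasks.append(OffsetRange(r.partition, lo, hi))
            lo = hi
    return tasks


def assign_round_robin(tasks, n_workers):
    """Distribute tasks over workers in round-robin order."""
    buckets = [[] for _ in range(n_workers)]
    for i, task in enumerate(tasks):
        buckets[i % n_workers].append(task)
    return buckets


# Two partitions with different backlogs, split into chunks of 100 offsets.
pending = [OffsetRange(0, 0, 250), OffsetRange(1, 100, 200)]
tasks = split_ranges(pending, max_per_task=100)
workers = assign_round_robin(tasks, n_workers=2)
```

Splitting by offset range rather than by whole partition lets a partition with a large backlog be shared between workers instead of pinning it to one.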

Screenshots

A producer called the Shredder is started, reading fixed-width column data files (30 columns in this example, 2 GB per file).
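
For readers unfamiliar with fixed-width files, here is a minimal sketch of how a line can be cut into columns by position. It is not the Shredder's implementation; the three column widths and the sample line are made up, whereas the real files have 30 columns whose widths are not given here.

```python
def column_slices(widths):
    """Turn a list of column widths into slice objects over a line."""
    slices, pos = [], 0
    for width in widths:
        slices.append(slice(pos, pos + width))
        pos += width
    return slices


def parse_line(line, slices):
    """Cut one fixed-width line into trimmed column values."""
    return [line[s].strip() for s in slices]


# Hypothetical layout: three columns of width 4, 6 and 3.
slices = column_slices([4, 6, 3])
row = parse_line("ab  hello xyz", slices)
```

Precomputing the slices once keeps the per-line work to simple substring operations, which matters when a file holds well over a hundred million lines.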

Time spent in total : 2.471591492s parsing 148804290 lines from 2620609413 bytes
Throughput bytes/s total : 1011.17 MB/s
Throughput lines/s total : 57.42M Lines/s
Throughput lines/s toAvro: 4.27M Lines/s
Time spent toReadChunks : 0.7911964744166666 s
Time spent toAvro : 33.271536043333334 s
Time spent toKafka : 19.226326717666666 s
Time spent DoneKafka : 8.036e-06 s
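
The rates above can be re-derived from the raw counts and times. The check below is only a sketch: it assumes that "MB" and "M" in the output mean 2^20 rather than 10^6, which is an inference from the numbers and not something the output states.

```python
MEBI = 2 ** 20  # assumed unit behind "MB" and "M" in the output


def per_second(count, seconds, unit=MEBI):
    """Rate in units of `unit` per second."""
    return count / seconds / unit


# Figures copied from the output above.
total_s = 2.471591492
to_avro_s = 33.271536043333334
lines = 148_804_290
size_bytes = 2_620_609_413

bytes_rate = round(per_second(size_bytes, total_s), 2)  # total bytes/s
lines_rate = round(per_second(lines, total_s), 2)       # total lines/s
avro_rate = round(per_second(lines, to_avro_s), 2)      # lines/s for the toAvro stage

print(bytes_rate, lines_rate, avro_rate)  # prints: 1011.17 57.42 4.27
```

The toAvro and toKafka times are larger than the total wall-clock time. That would be consistent with stage times being summed over parallel workers, but this reading is a guess.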

Repository contents:

.mvn
infra
screenshots
spark232job-mod
spark301job-mod
spark311job-mod
spark320job-mod
spark332Icebergjob-mod
.gitignore
Jenkinsfile
LICENSE
README.md
settings.xml
