Kiba Pro provides vendor-supported ETL extensions for Kiba. Your subscription funds the open-source development; thanks for considering it!
Learn more on the Kiba website.
Documentation is available on the Wiki.
- Relax the dependency constraint to allow working with Kiba v4
- New: `SQLBulkLookup` transform makes it possible to efficiently look up values in SQL tables. This is particularly useful in data warehouse scenarios (to replace unique business keys with surrogate keys), or when writing migrations of SQL databases. Instead of looking up each row individually, it works on large batches of rows, which avoids an "N+1"-like effect.
- New: `ParallelTransform` provides an easy way to process a group of ETL rows at the same time using a pool of threads. By going multithreaded, it can speed up ETL transforms that perform IO operations such as HTTP queries.
- New: `FileLock` adds an easy way to avoid overlapping runs of ETL jobs using a local file lock.
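The batching idea behind `SQLBulkLookup` can be sketched in plain Ruby. This is a minimal sketch under stated assumptions, not Kiba Pro's API or implementation. `BatchedKeyLookup`, its keyword arguments and the `lookup` callable are all hypothetical. The `process`/`close` methods that yield rows only loosely follow Kiba's transform convention. The `lookup` callable stands in for one `SELECT ... WHERE key IN (...)` query per batch.

```ruby
# Hypothetical sketch, NOT the Kiba Pro SQLBulkLookup API: rows are buffered
# and resolved one batch at a time, so the lookup runs once per batch.
class BatchedKeyLookup
  # lookup: callable taking an Array of business keys and returning a Hash
  #   of business_key => surrogate_key (one query per batch in a real job).
  def initialize(lookup:, key:, target:, batch_size: 1_000)
    @lookup = lookup
    @key = key
    @target = target
    @batch_size = batch_size
    @buffer = []
  end

  # Buffers the row; yields resolved rows whenever a batch is full.
  def process(row, &block)
    @buffer << row
    flush(&block) if @buffer.size >= @batch_size
    nil
  end

  # Resolves whatever is left in the buffer at the end of the run.
  def close(&block)
    flush(&block) unless @buffer.empty?
    nil
  end

  private

  def flush
    keys = @buffer.map { |row| row.fetch(@key) }.uniq
    mapping = @lookup.call(keys) # a single lookup for the whole batch
    @buffer.each do |row|
      resolved = row.dup
      resolved[@target] = mapping[row.fetch(@key)] # business key -> surrogate key
      resolved.delete(@key)
      yield resolved
    end
    @buffer.clear
  end
end

surrogates = { "SKU-1" => 10, "SKU-2" => 20 }
query_count = 0
lookup = ->(keys) { query_count += 1; surrogates.slice(*keys) }
transform = BatchedKeyLookup.new(lookup: lookup, key: :sku, target: :product_id, batch_size: 2)
output = []
[{ sku: "SKU-1" }, { sku: "SKU-2" }, { sku: "SKU-1" }].each do |row|
  transform.process(row) { |resolved| output << resolved }
end
transform.close { |resolved| output << resolved }
```

With three rows and `batch_size: 2`, the lookup runs twice instead of three times. On realistic batch sizes, this per-batch querying is what removes the N+1 effect.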
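The thread-pool approach described for `ParallelTransform` can be illustrated with Ruby's built-in `Thread`, `Queue` and `Mutex`. The `process_in_parallel` helper below is hypothetical and is not Kiba Pro's API. The sketch assumes the per-row work is IO-bound (simulated here with `sleep`). That is the case where threads help, even under MRI's global lock.

```ruby
# Hypothetical helper (NOT Kiba Pro's ParallelTransform API): runs the block
# on each row using a fixed pool of threads, and returns the results in the
# original row order.
def process_in_parallel(rows, pool_size: 4, &block)
  queue = Queue.new
  rows.each_with_index { |row, index| queue << [row, index] }
  results = Array.new(rows.size)
  lock = Mutex.new

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        begin
          row, index = queue.pop(true) # non-blocking pop: raises when empty
        rescue ThreadError
          break # queue drained, this worker is done
        end
        value = block.call(row)
        lock.synchronize { results[index] = value }
      end
    end
  end
  workers.each(&:join) # join re-raises any exception raised in a worker
  results
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
output = process_in_parallel([1, 2, 3, 4], pool_size: 4) do |n|
  sleep 0.2 # stands in for an IO call such as an HTTP query
  n * 10
end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
```

The four simulated calls overlap, so the group takes roughly the time of one call rather than four.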
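A local file lock of the kind `FileLock` provides can be sketched with `File#flock` from Ruby's core library. The `with_file_lock` helper and the lock path are hypothetical, not Kiba Pro's API. The sketch assumes a non-blocking policy: an overlapping run is skipped rather than queued.

```ruby
require "tmpdir"

# Hypothetical helper (NOT Kiba Pro's FileLock API): runs the block only if
# no one else holds an exclusive lock on `path`. Returns true when the block
# ran, false when an overlapping run already holds the lock.
def with_file_lock(path)
  File.open(path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB: flock returns false immediately instead of waiting
    return false unless file.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    true
  end # closing the file releases the lock, even if the block raises
end

lock_path = File.join(Dir.tmpdir, "example_etl_job.lock")
second_attempt = nil
first_attempt = with_file_lock(lock_path) do
  # simulate an overlapping run starting while the first one is still active
  second_attempt = with_file_lock(lock_path) { raise "must not run" }
end
after_release = with_file_lock(lock_path) { :done }
```

`flock` locks are advisory, so this only guards against other runs that use the same lock file.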
- Compatibility with Kiba v3
- BREAKING CHANGE: deprecate non-live Sequel connection passing (#79). Do not use `database: "connection_string"`; instead, pass your `Sequel` connection directly. This moves connection management out of the destination, which is a better pattern and provides better (block-based) resource closing.
- Official MySQL support:
  - While compatibility was already there, it is now covered by our QA test suite.
  - MySQL 5.5-8.0 is supported & tested
  - MariaDB should be supported (although it is not tested in the QA test suite)
  - Amazon Aurora MySQL is also supposed to work (although not tested)
  - `Kiba::Pro::Sources::SQL` supports both non-streaming and streaming use
  - `Kiba::Pro::Destinations::SQLBulkInsert` supports:
    - Bulk insert
    - Bulk insert with ignore
    - Bulk upsert (including with dynamically computed columns) via `ON DUPLICATE KEY UPDATE`
  - Note that `Kiba::Pro::Destinations::SQLUpsert` (row-by-row) is not MySQL-compatible at the moment
- `SQL` source improvements:
  - Deprecate `use_cursor` in favor of the block query construct. The source could previously be configured with:

    ```ruby
    source Kiba::Pro::Sources::SQL, query: "SELECT * FROM items", use_cursor: true
    ```

    The `use_cursor` keyword is now deprecated. You can use the more powerful block query construct instead:

    ```ruby
    source Kiba::Pro::Sources::SQL, query: -> (db) { db["SELECT * FROM items"].use_cursor }
    ```
  - Avoid bogus nested SQL calls when configuring the query via a block/proc. A call such as:

    ```ruby
    source Kiba::Pro::Sources::SQL, query: -> (db) { db["SELECT * FROM items"] }
    ```

    would previously have generated `SELECT * FROM (SELECT * FROM "items")`. This is now fixed.
  - Add specs around streaming support (for both MySQL and Postgres).

    For Postgres, the author of Sequel recommended streaming over `use_cursor: true` (but do compare on your actual cases!). To enable streaming for Postgres:
    - Add `sequel_pg` to your `Gemfile`
    - Enable the extension on your `db` instance & add `.stream` to your dataset, e.g.:

      ```ruby
      Sequel.connect(ENV.fetch('DATABASE_URL')) do |db|
        db.extension(:pg_streaming)
        Kiba.run(Kiba.parse do
          source Kiba::Pro::Sources::SQL,
            db: db,
            query: -> (db) { db[:items].stream }
          # SNIP
        end)
      end
      ```

    For MySQL, just add `.stream` to your dataset as above (no extension required).
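The connection-passing change follows a common Ruby pattern. The caller opens the connection with a block that guarantees it is closed, and the destination only uses it. Below is a minimal sketch in which `FakeConnection` and `RowDestination` are hypothetical stand-ins (for a `Sequel::Database` and a destination), not Kiba Pro classes. With real Sequel, calling `Sequel.connect` with a block plays the role of `FakeConnection.connect`: it disconnects when the block exits.

```ruby
# Hypothetical stand-in for a live database connection (think
# Sequel::Database); not a Kiba Pro class.
class FakeConnection
  attr_reader :inserted

  def initialize
    @inserted = []
    @open = true
  end

  def open?
    @open
  end

  def insert(row)
    raise "connection is closed" unless @open

    @inserted << row
  end

  def disconnect
    @open = false
  end

  # Block form: the connection is always disconnected when the block exits,
  # even if the block raises.
  def self.connect
    db = new
    yield db
  ensure
    db&.disconnect
  end
end

# Hypothetical destination: it only *uses* the connection it receives and
# never opens or closes it, so the caller controls the connection lifetime.
class RowDestination
  def initialize(db:)
    @db = db
  end

  def write(row)
    @db.insert(row)
  end
end

connection = nil
begin
  FakeConnection.connect do |db|
    connection = db
    RowDestination.new(db: db).write({ id: 1 })
    raise "simulated ETL failure"
  end
rescue RuntimeError => e
  failure = e.message # the job still fails...
end
# ...but the connection has been closed by the block form
```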
- Improvement: `SQLBulkInsert` now supports Postgres `INSERT ON CONFLICT` for batch operations (bulk upsert, conditional upserts, ignore if exists, etc.) via the new `dataset` keyword. See the documentation.
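For reference, the kind of statement a Postgres bulk upsert boils down to can be built by hand. `bulk_upsert_sql` is a hypothetical helper for illustration only, not how Kiba Pro's `dataset` keyword works. It also does not quote identifiers, so only use it with trusted table and column names. In Sequel itself, this clause is exposed through `Dataset#insert_conflict`.

```ruby
# Hypothetical helper (NOT the Kiba Pro or Sequel API): builds one multi-row
# Postgres upsert statement with numbered placeholders ($1, $2, ...).
# Identifiers are interpolated as-is, so only pass trusted names.
def bulk_upsert_sql(table, rows, conflict_key:)
  columns = rows.first.keys
  params = []
  tuples = rows.map do |row|
    placeholders = columns.map do |column|
      params << row.fetch(column)
      "$#{params.size}"
    end
    "(#{placeholders.join(', ')})"
  end
  # EXCLUDED refers to the values originally proposed for insertion
  updates = (columns - [conflict_key]).map { |column| "#{column} = EXCLUDED.#{column}" }
  sql = "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{tuples.join(', ')} " \
        "ON CONFLICT (#{conflict_key}) DO UPDATE SET #{updates.join(', ')}"
  [sql, params]
end

sql, params = bulk_upsert_sql(
  :products,
  [{ id: 1, name: "Widget" }, { id: 2, name: "Gadget" }],
  conflict_key: :id
)
# sql    => "INSERT INTO products (id, name) VALUES ($1, $2), ($3, $4)
#            ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name"
# params => [1, "Widget", 2, "Gadget"]
```

Replacing `DO UPDATE SET ...` with `DO NOTHING` gives the "ignore if exists" variant. Postgres rejects a `DO UPDATE` batch that touches the same key twice, so batches should be deduplicated on the conflict key first.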
NOTE: documentation & requirements/compatibility are available on the wiki.
- New: `SQLUpsert` destination allowing row-by-row "insert or update".
- New: `SQL` source allowing efficient streaming of large volumes of SQL rows while controlling memory consumption.
- Improvement: `SQLBulkInsert` can now be used from a Sidekiq job.
- Multiple improvements to `SQLBulkInsert`:
  - New flexible `row_pre_processor` option, which makes it possible either to remove a row conditionally (useful to conditionally target a given destination among many) or to replace it with N dynamically computed target rows.
  - New callbacks: `after_initialize` & `before_flush` (useful to enforce the flush of dependent destinations & ensure required foreign key constraints are respected).
  - `logger` support.
  - Bugfix: make sure to `disconnect` on `close`.
  - Extra safety checks on row keys.
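The `row_pre_processor` and `before_flush` behaviors described above can be illustrated with a Kiba-style destination, meaning a class with `write(row)` and `close`, that buffers rows and flushes them in batches. `BufferedBulkDestination` and its keyword arguments are hypothetical. They are not the actual `SQLBulkInsert` options or internals. The "flush" here only records each batch, at the point where a real destination would issue one multi-row `INSERT`.

```ruby
# Hypothetical Kiba-style destination (write/close), NOT the actual
# SQLBulkInsert implementation or option names.
class BufferedBulkDestination
  attr_reader :batches

  # row_pre_processor: callable returning nil (skip the row) or an Array of
  #   rows that replaces it (possibly several computed rows).
  # before_flush: callable invoked right before each batch is written.
  def initialize(buffer_size:, row_pre_processor: nil, before_flush: nil)
    @buffer_size = buffer_size
    @row_pre_processor = row_pre_processor
    @before_flush = before_flush
    @buffer = []
    @batches = []
  end

  def write(row)
    rows = @row_pre_processor ? @row_pre_processor.call(row) : [row]
    return if rows.nil? # the pre-processor decided to drop this row

    @buffer.concat(rows)
    flush if @buffer.size >= @buffer_size
  end

  def close
    flush unless @buffer.empty?
  end

  private

  def flush
    @before_flush&.call
    # A real destination would issue one multi-row INSERT here;
    # this sketch only records the batch.
    @batches << @buffer.dup
    @buffer.clear
  end
end

flush_count = 0
destination = BufferedBulkDestination.new(
  buffer_size: 2,
  # drop rows without an email; otherwise emit one target row per tag
  row_pre_processor: ->(row) { row[:email] && row[:tags].map { |tag| { email: row[:email], tag: tag } } },
  before_flush: -> { flush_count += 1 }
)
destination.write({ email: "a@example.com", tags: %w[x y] })
destination.write({ email: nil, tags: %w[z] })
destination.write({ email: "b@example.com", tags: %w[w] })
destination.close
```

When destinations depend on each other, a `before_flush` callback is one place where a child destination could make its parent flush first, so that the rows its foreign keys point to already exist.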
- Initial release of the `SQLBulkInsert` destination (providing fast SQL `INSERT`).