bulk update of exchanges is slow with peewee #46

aleksandra-kim · 2016-09-07T22:22:18Z

Original report by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).

If I want to update the amounts of all the inputs of an activity (each with a different value), I have to iterate over the technosphere of the activity, update the amount field of each exchange, and call "save()" on each exchange. With n technosphere exchanges, this requires n transactions in the current implementation.

I would like to group the updates so that there is only one transaction, and hopefully less I/O to the hard drive.

The idea comes from: http://docs.peewee-orm.com/en/latest/peewee/querying.html#atomic-updates

Something along the lines of:

def bulk_update_exchanges(activity, exchanges):
    """
    exchanges is a dictionary of oldExchange:newExchange
    """
    # atomic() must be called to get the transaction context manager
    with db.atomic() as txn:
        for old, new in exchanges.items():
            old.update(data=new.data)

aleksandra-kim · 2016-09-08T12:22:13Z

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).

Something like the private method _efficient_write_many_data from the SQLite backend:
(https://bitbucket.org/cmutel/brightway2-data/src/aa5e4a8377aef097be0e694ead2a149ec04dec84/bw2data/backends/peewee/database.py?at=default&fileviewer=file-view-default#database.py-147)

aleksandra-kim · 2016-09-08T13:35:04Z

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).

So, I found a quick hack that achieves what I wanted, but I am not sure this is the right way to go, especially since it is not "generic" at all:

from brightway2 import *
from bw2data.backends.peewee import sqlite3_lci_db as db
import random

# ... project, db, activity finding
act = Database('my_db').get('myActivity')

# Wrap all the saves in a single transaction
with db.atomic() as txn:
    for e in act.technosphere():
        v = random.random()
        # Reach into the underlying ExchangeDataset of the Exchange
        e_ds = e._document
        e_ds.data.update(amount=v)
        e.save()

Of course, the raw import of sqlite3_lci_db could be done in a more elegant way, depending on the type of backend.

My main doubt is about the need to retrieve the underlying document (ExchangeDataset) of the object (Exchange).

It seems to work for now, but my request remains. ;)

aleksandra-kim · 2016-09-21T14:32:36Z

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).

Yes, this is a weakness of the current model of abstraction layer cake. Actually, I think your approach is quite reasonable, though you could do something directly with ActivityDataset objects using normal Peewee methods, e.g. what actually happens when you call .get().

The problem with ActivityDataset and ExchangeDataset is that there aren't actually any foreign keys or other automatic relationships between them. So you will have to manage these yourself, and make sure you don't create mismatches between the tables. As you have seen, you can also gain some speed by dropping down to straight SQL from Python, but this only really makes sense in special circumstances.
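
For illustration, dropping down to straight SQL with a single transaction could look roughly like the sketch below. It uses only the standard-library sqlite3 module against an in-memory database. The "exchange" table, its "id" and "amount" columns, and the bulk_update_amounts helper are all hypothetical and do not reflect the actual bw2data schema, so treat this as a pattern rather than something to run against a real project database.

```python
import sqlite3


def bulk_update_amounts(conn, new_amounts):
    """Update many rows in one transaction.

    new_amounts maps a row id to its new amount (hypothetical schema).
    """
    params = [(amount, row_id) for row_id, amount in new_amounts.items()]
    # "with conn" commits once on success and rolls back if an error is raised
    with conn:
        conn.executemany(
            "UPDATE exchange SET amount = ? WHERE id = ?", params
        )


# Stand-in table, NOT the bw2data layout
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exchange (id INTEGER PRIMARY KEY, amount REAL)")
with conn:
    conn.executemany(
        "INSERT INTO exchange (id, amount) VALUES (?, ?)",
        [(1, 1.0), (2, 2.0), (3, 3.0)],
    )

bulk_update_amounts(conn, {1: 10.0, 3: 30.0})
rows = conn.execute("SELECT id, amount FROM exchange ORDER BY id").fetchall()
```

All n updates share one commit instead of n, which is where the I/O saving would come from. As noted above, going around the abstraction layer this way means nothing checks that the tables stay consistent with each other.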

aleksandra-kim added the "minor" and "enhancement" (New feature or request) labels · 2020-03-11
