bulk update of exchanges is slow with peewee #46

Open
aleksandra-kim opened this issue Sep 7, 2016 · 3 comments
Labels
enhancement New feature or request minor

Comments

@aleksandra-kim
Member

Original report by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).


If I want to update the amounts of all the inputs of an activity (with different values), I have to iterate over the technosphere of the activity, update the amount field of each exchange, and call "save()" on each exchange. With n technosphere exchanges, the current implementation requires n transactions.

I would like to group the updates so that there is only one transaction and, hopefully, less I/O to the hard drive.

The idea comes from: http://docs.peewee-orm.com/en/latest/peewee/querying.html#atomic-updates

Something along the lines of:

def bulk_update_exchanges(activity, exchanges):
    """
    exchanges is a dictionary of oldExchange:newExchange
    """
    # atomic() must be called to get the transaction context manager
    with db.atomic() as txn:
        for old, new in exchanges.items():
            old.update(data=new.data)
@aleksandra-kim
Member Author

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).


Like the private method _efficient_write_many_data of the SQLite backend:
(https://bitbucket.org/cmutel/brightway2-data/src/aa5e4a8377aef097be0e694ead2a149ec04dec84/bw2data/backends/peewee/database.py?at=default&fileviewer=file-view-default#database.py-147)

@aleksandra-kim
Member Author

Original comment by Tomas Navarrete Gutierrez (Bitbucket: tomas_navarrete).


So, I found a quick hack for what I wanted to achieve, but I am not sure this is the right way to go, especially since it is not "generic" at all.

from brightway2 import *
from bw2data.backends.peewee import sqlite3_lci_db as db
import random

# ... project, db, activity finding
act = Database('my_db').get('myActivity')

# wrap all saves in a single transaction
with db.atomic() as txn:
    for e in act.technosphere():
        v = random.random()  # placeholder for the new amount
        e_ds = e._document  # underlying ExchangeDataset
        e_ds.data.update(amount=v)
        e.save()

Of course, the raw import of sqlite3_lci_db could be done in a more elegant way, depending on the type of backend.

My main doubt is about the need to retrieve the underlying document (ExchangeDataset) for the object (Exchange).

It seems to work for now, but my request remains. ;)
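
A more generic variant of the hack above might look like the following sketch. It assumes only that the backend's database object exposes atomic() (as Peewee databases do) and that exchanges are dict-like objects with a save() method. The function name bulk_update_amounts and its arguments are hypothetical, not part of bw2data.

```python
def bulk_update_amounts(db, new_amounts):
    """Save many exchange amounts inside one transaction.

    db: assumed to be a Peewee database object (anything with atomic()).
    new_amounts: iterable of (exchange, amount) pairs (hypothetical format).
    Returns the number of exchanges saved.
    """
    count = 0
    # One transaction around the whole loop, instead of one per save().
    with db.atomic():
        for exchange, amount in new_amounts:
            exchange["amount"] = amount  # dict-style access on the exchange
            exchange.save()
            count += 1
    return count
```

Usage would be something like bulk_update_amounts(db, [(e, random.random()) for e in act.technosphere()]), with db imported as in the snippet above.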

@aleksandra-kim
Member Author

Original comment by Chris Mutel (Bitbucket: cmutel, GitHub: cmutel).


Yes, this is a weakness of the current layer-cake model of abstraction. Actually, I think your approach is quite reasonable, though you could also work directly with ActivityDataset objects using normal Peewee methods, which is what actually happens when you call .get().

The problem with ActivityDataset and ExchangeDataset is that there aren't any foreign keys or other automatic relationships between them. So you will have to manage these yourself and make sure you don't create mismatches between the tables. As you have seen, you can also gain some speed by dropping down to straight SQL from Python, but this only really makes sense in special circumstances.
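
To illustrate the straight-SQL route, here is a minimal sketch using only the standard-library sqlite3 module. The table "exchange" with an "amount" column and the function bulk_update_amounts_sql are made up for the example; this is not the bw2data schema, and real exchange data may not be stored in a plain column like this.

```python
import sqlite3


def bulk_update_amounts_sql(conn, new_amounts):
    """Write all {row id: amount} pairs in a single transaction."""
    # `with conn` commits once on success and rolls back on an exception.
    with conn:
        conn.executemany(
            "UPDATE exchange SET amount = ? WHERE id = ?",
            [(amount, row_id) for row_id, amount in new_amounts.items()],
        )


# Toy in-memory table (hypothetical schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exchange (id INTEGER PRIMARY KEY, amount REAL)")
with conn:
    conn.executemany(
        "INSERT INTO exchange (id, amount) VALUES (?, ?)",
        [(1, 1.0), (2, 2.0), (3, 3.0)],
    )

bulk_update_amounts_sql(conn, {1: 0.5, 3: 4.0})
rows = conn.execute("SELECT id, amount FROM exchange ORDER BY id").fetchall()
```

Nothing here keeps related tables consistent, so the caveat about managing relationships yourself applies in full.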

aleksandra-kim added the minor and enhancement labels on Mar 11, 2020