This repository has been archived by the owner on Dec 31, 2024. It is now read-only.

Ease development with Docker #57

Open
wants to merge 1 commit into
base: master

aaronjwood

One command and you're good to go :)

@JPHutchins
Owner

@aaronjwood Very exciting! I am testing this out now!

FYI we'll have to maintain the original process guide in the README (move to bottom instead of replacing) since it's informative for how production is running (Linux systemd).

Hopefully I work up the courage to switch production to the docker container.

@aaronjwood
Author

aaronjwood commented Jan 22, 2023

Sounds good, I'll adjust the readme when I get some time in a few days.

When I got everything up locally and fixed some crashes in the test data parsing, I found that the UI didn't show the test data loaded into the DB anywhere, and the UI was stuck on December 1969. Are you aware of this being an existing issue? I'm guessing it's specific to the local dev env, since things work for me on your live deployment with my PGE data, but I didn't dig in enough to see exactly why it wasn't working. The test data seems to be from 2019, but the front end doesn't allow navigating anywhere besides 1969.
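For what it's worth, "December 1969" is what Unix timestamp 0 looks like when rendered in a US timezone, so one plausible reading (an assumption, nothing in this thread confirms it) is that the UI is falling back to a zero or missing timestamp rather than the 2019 test data. A minimal sketch of the effect; the fixed UTC-8 offset and the `local_month_label` helper are illustrative and not part of the project:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-8 offset stands in for US Pacific standard time.
PACIFIC_STANDARD = timezone(timedelta(hours=-8))


def local_month_label(epoch_seconds: int, tz: timezone = PACIFIC_STANDARD) -> str:
    """Format an epoch timestamp (in seconds) as a 'Month Year' label."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz).strftime("%B %Y")


# Timestamp 0 is 1970-01-01T00:00:00Z, which is still 1969 west of UTC.
print(local_month_label(0))
# 1571184000 is 2019-10-16T00:00:00Z, inside the range of the test data.
print(local_month_label(1571184000))
```

If the front end shows the first label, checking whether it ever receives a non-zero start time would be a reasonable first step.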

WORKDIR /frontend
RUN npm ci && npm run build

FROM python:3.8-slim
Author

@JPHutchins what do you think about moving to PyPy for the JIT sweetness?

@JPHutchins
Owner

JPHutchins commented Jan 28, 2023

@aaronjwood Unfortunately I am in a "how is this even working" sorta situation with the MQ and Celery tasks on the live server...

The docker container works for me up to the point of queuing the async jobs - LMK if this flow is working for you in the docker container: https://github.com/JPHutchins/open-energy-view#example-account-setup

Here's a description of what is supposed to be happening.

  • User registers for the first time and we'd like to get their historical data. PGE SMD team advised that I request 1 month of data at a time with some delay in between 🙄.
  • This starts a job "fetch historical data" that queues up to ~48 jobs that will make the requests for 1 month of data at a time to the PGE SMD API
    • real task:
      @celery.task(bind=True, name="fetch_task")
      def fetch_task(self, published_period_start, interval_block_url, headers, cert):
          four_weeks = 3600 * 24 * 28
          end = int(time.time())
          published_period_start = int(published_period_start)
          print(published_period_start, interval_block_url, headers, cert)
          while end > published_period_start:
              start = end - four_weeks + 3600
              params = {
                  "published-min": start,
                  "published-max": end,
              }
              response_text = request_url(
                  "GET",
                  interval_block_url,
                  params=params,
                  headers=headers,
                  cert=cert,
                  format="text",
              )
              save_espi_xml(response_text)
              db_insert_task = insert_espi_xml_into_db.delay(response_text)
              end = start - 3600
              sleep(2)
          retries = 0
          while not db_insert_task.ready():
              if retries > 60:
                  print("Insert into DB failed!")
                  break
              retries += 1
              sleep(1)
          return "done"
    • mock task (the one that should work in docker container):
      @celery.task(bind=True, name="fake_fetch")
      def fake_fetch(self):
          test_xml = [
              "/home/jp/open-energy-view/test/data/espi/espi_2_years.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-16.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-17.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-18.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-19.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-20.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-21.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-22.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-23.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-24.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-25.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-26.xml",
              "/home/jp/open-energy-view/test/data/espi/Single Days/2019-10-27.xml",
          ]
          test_xml.reverse()
          for xml_path in test_xml:
              time.sleep(2.5)
              with open(xml_path) as xml_reader:
                  xml = xml_reader.read()
              db_insert_task = insert_espi_xml_into_db.delay(xml)
          retries = 0
          while not db_insert_task.ready():
              if retries > 60:
                  break
              retries += 1
              sleep(1)
          return "done"
  • AFAICT the fetch task is running correctly in the highly-threaded celery-io pool.
  • Before the fetch task completes it queues up the insert_espi_xml_into_db task in the single-threaded celery-cpu pool (FIFO to write to the DB):
    @celery.task(bind=True, name="insert_espi_xml_into_db")
    def insert_espi_xml_into_db(self, xml, given_source_id=None, save=False):
        """Parse and insert the XML into the db."""
        print("CALLED")
        if not has_app_context():
            app = create_app(f"open_energy_view.{os.environ.get('FLASK_CONFIG')}")
            app.app_context().push()
        print(has_app_context())
        if save:
            try:
                save_espi_xml(xml)
            except Exception as e:
                print(e)
                save_espi_xml(xml.decode("utf-8"))
            finally:
                pass
        data_update = []
        source_id_memo = {}
        for start, duration, watt_hours, usage_point in parse_espi_data(xml):
            if usage_point not in source_id_memo:
                if given_source_id:
                    source_id_memo[usage_point] = [given_source_id]
                else:
                    sources = db.session.query(models.Source).filter_by(
                        usage_point=usage_point
                    )
                    if sources.count() == 0:
                        print(
                            f"could not find usage point {usage_point} in db, probably gas"
                        )
                        source_id_memo[usage_point] = []
                    elif sources.count() > 1:
                        print(f"WARNING: {usage_point} is associated with multiple sources")
                    source_id_memo[usage_point] = [source.id for source in sources]
            for source_id in source_id_memo[usage_point]:
                data_update.append(
                    {
                        "start": start,
                        "duration": duration,
                        "watt_hours": watt_hours,
                        "source_id": source_id,
                    }
                )
        try:
            db.session.bulk_insert_mappings(models.Espi, data_update)
            db.session.commit()
        except SQLiteException.IntegrityError:
            db.session.rollback()
            sql_statement = """
            INSERT OR REPLACE INTO espi (start, duration, watt_hours, source_id)
            VALUES (:start, :duration, :watt_hours, :source_id)
            """
            db.engine.execute(sql_statement, data_update)
        finally:
            timestamp = int(time.time() * 1000)
            for source_ids in source_id_memo.values():
                for source_id in source_ids:
                    source_row = db.session.query(models.Source).filter_by(id=source_id)
                    source_row.update({"last_update": timestamp})
            db.session.commit()
  • I'm sure that this is going into the MQ, but what's not happening for me is the celery-cpu queue getting processed. You can see the "CALLED" print at the top of the task; I think I left that in from when I was trying to get set up on AWS and ran into the same "how was this ever working" situation. Anyway, if you see "CALLED" in the stdout, that would be a good sign! 😭
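
The month-at-a-time windowing in fetch_task can be pulled out and checked on its own, without Celery or the PGE API. This is only a sketch of the same arithmetic (four weeks minus an hour per window, stepping backwards from "now"); `request_windows` is a hypothetical helper name, not something in the repo:

```python
from typing import Iterator, Tuple

HOUR = 3600
FOUR_WEEKS = HOUR * 24 * 28


def request_windows(published_period_start: int, now: int) -> Iterator[Tuple[int, int]]:
    """Yield (published-min, published-max) pairs, newest window first.

    Mirrors the arithmetic in fetch_task: each window covers four weeks
    minus one hour, and the next window ends one hour before this one starts.
    """
    end = now
    while end > published_period_start:
        start = end - FOUR_WEEKS + HOUR
        yield start, end
        end = start - HOUR


# Ten four-week periods of history produce ten requests.
windows = list(request_windows(0, 10 * FOUR_WEEKS))
print(len(windows))
print(windows[0], windows[-1])
```

Each step moves back exactly four weeks, so the ~48 jobs mentioned above would cover roughly 3.7 years of history.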

As I mentioned, in production these are all running from systemd. I've inspected my config and it does not seem to differ from what you have set up in the docker container.

LMK what you might find when you run that flow.

It's critical for development to be able to mock the PGE request/response in the development environment so that we have an efficient way to test data parsing, fetching, etc. Thank you for your help!

EDIT: just confirmed that the "fake fetch" is working in production.

  • create a new account with fake email "[email protected]", pw admin
  • select fake utility, name whatever
  • you'll see it load in the first month and add a spinner in the upper right corner. Network will show 202s coming in as it checks on the celery tasks and eventually it will prompt you to reload.
  • this is exactly what should be working in the development environment.
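
The "keep checking while the server answers 202" behaviour can be sketched as a bounded polling loop. This is an illustration only: `check_status` is a hypothetical callable that returns the HTTP status of the task-status request, and treating any non-202 status as "finished" is an assumption, since the thread only says that 202s arrive while the tasks are running.

```python
import time
from typing import Callable


def poll_until_done(
    check_status: Callable[[], int],
    max_attempts: int = 60,
    interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check_status until it stops returning 202, or give up.

    Returns True if a non-202 status arrived within max_attempts checks,
    False if every check still reported 202.
    """
    for _ in range(max_attempts):
        if check_status() != 202:
            return True
        sleep(interval_seconds)
    return False


# Simulated server: two "still working" answers, then done.
answers = iter([202, 202, 200])
print(poll_until_done(lambda: next(answers), sleep=lambda _: None))
```

Passing `sleep` in as a parameter keeps the loop testable without real delays; the same shape would work for the retry loops in the Celery tasks.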

EDIT2: if it's not clear, the architecturally f*(cky thing here is that the insert_espi_xml_into_db task needs the "flask application context" in order to set up the SQL ORM (SQLAlchemy).

@JPHutchins
Owner

JPHutchins commented Jan 28, 2023

JFC there is some embarrassing code in here

finally: 
    pass

@duhruh

duhruh commented Oct 29, 2024

Is this still being worked on?

If not, I think a good approach would be to at least make Docker optional so as not to disturb the original flow. That way people like me can use the Docker container and others can run the app directly.

for things like

celery = Celery(
    "tasks", backend="rpc://", broker="amqp://jp:admin@localhost:5672/myvhost",
)

we can convert it to

import os

rabbitHost = os.getenv("RABBIT_HOST", default="localhost:5672")
celery = Celery(
    "tasks", backend="rpc://", broker=f"amqp://jp:admin@{rabbitHost}/myvhost",
)

same thing with base paths /app vs /whatever
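
Roughly, both settings could come from the environment with the current hard-coded values as defaults, so the non-Docker flow keeps working unchanged. A sketch under assumptions: `RABBIT_HOST` is the variable suggested above, while `OEV_BASE_PATH` and both helper names are made up for illustration.

```python
import os
import posixpath
from typing import Mapping


def broker_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the AMQP broker URL, defaulting to the host hard-coded today."""
    host = env.get("RABBIT_HOST", "localhost:5672")
    return f"amqp://jp:admin@{host}/myvhost"


def espi_test_data_dir(env: Mapping[str, str] = os.environ) -> str:
    """Locate the ESPI test data under a configurable base path.

    OEV_BASE_PATH is a hypothetical variable name; the default matches the
    path currently hard-coded in fake_fetch.
    """
    base = env.get("OEV_BASE_PATH", "/home/jp/open-energy-view")
    return posixpath.join(base, "test", "data", "espi")


# Inside a container the compose file would set these; outside, defaults apply.
print(broker_url({"RABBIT_HOST": "rabbitmq:5672"}))
print(espi_test_data_dir({"OEV_BASE_PATH": "/app"}))
```

Taking the environment as a parameter makes the defaults easy to check without touching the real process environment.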

@JPHutchins
Owner

I never got docker working 😬. If you're interested in hosting a project like this, I think this repo can serve as a proof of concept, but a new website that implements Green Button Connect My Data for users should start from scratch. Something like celery shouldn't even be needed, architecture can be simplified.

@duhruh

duhruh commented Oct 29, 2024

Yea i was mostly thinking about self hosting this if it can provide me good information/dashboards on my pg&e usage. I assumed this was only a self hosted project repo since https://www.openenergyview.com/ seems to be down.

@JPHutchins
Owner

> Yea i was mostly thinking about self hosting this if it can provide me good information/dashboards on my pg&e usage. I assumed this was only a self hosted project repo since https://www.openenergyview.com/ seems to be down.

This is the repository for openenergyview.com, but I don't have time to maintain it. It uses PGE Share My Data OAuth to get Green Button data from every user that registers (with OEV).

OEV stores the data in a DB to make it easy for users to see their data. The original goal was to pipe that data to Home Assistant users so that they can have energy dashboards for free. Then use the on/off states of smart home devices to determine individual device's power consumption without needing any additional sensors.

Further, utility companies are required to implement Green Button, so this is scalable to the entire US. Auth and APIs would be handled uniquely for each utility company.

Needless to say it fell short.

If you are only trying to get your own data, then the companion SMD repo might work for you. You would still need to run a web server, unless PGE has updated the APIs to avoid that 🤣.

3 participants