Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregate values #33

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open

Conversation

roe-dl
Copy link

@roe-dl roe-dl commented Sep 9, 2022

** 2nd trial after resolving conflicts with other PRs **

Several times users asked for the possibility to output aggregated values such as maximum or minimum values by MQTT. This PR adds this feature by introducing a section [[[calculations]]]. Within this section aggregated values can be defined similar to the aggregation tags for the Cheetah generator.

Example:

[StdRESTful]
...
[[MQTT]]
...
[[[calculations]]]
maxTemp = day.outTemp.max
minTemp = day.outTemp.min
This defines the names maxTemp, which is the daily temperature maximum, and minTemp, which is the minimum. Those values are then included in the MQTT output.

Unit groups are assigned automatically.

There is also a special case: dayRain. As this observation type is originally defined to be send to WU, it is calculated to be inclusive at both ends of the interval (see remark in restx.py). That is not appropriate for the MQTT output and results in wrong values for example within the Belchertown skin. Using dayRain = day.rain.sum dayRain is re-defined and re-calculated to be exclusive at the left end, and that re-calculation is valid for MQTT only. That's why this is the default, if no section [[[calculations]]] is present.

See also poblabs/weewx-belchertown#685

Copy link

@ThomDietrich ThomDietrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey!
My humble review. Hope my comments make sense.

The addition seems well implemented but I didn't quite understand the use case. Aggregation should (arguably) happen after MQTT. I use Home Assistant for that. That said, I can see users relying more directly on e.g. MQTT driven apps and widgets, for whom this addition is useful.

Does your addition change anything for users without [[[calculations]]] in their config?

bin/user/mqtt.py Outdated
if len(tag)==3:
# example: day.rain.sum
# '$' at the beginning is possible but not necessary
if tag[0][0]=='$': tag[0] = tag[0][1:]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to have a formatter run over your addition. This is just one example

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you address with that comment? I unfortunately do not have a formatter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code editor probably has one but you could also e.g. use https://codebeautify.org/python-formatter-beautifier

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only write small extensions and PRs. And I do so using a simple text editor within a terminal session. I am no trained Python programmer. I am mechanical engineer, only. So have mercy.

bin/user/mqtt.py Outdated
@@ -390,7 +410,8 @@ def __init__(self, queue, server_url,
post_interval=None, stale=None,
log_success=True, log_failure=True,
timeout=60, max_tries=3, retry_wait=5,
max_backlog=sys.maxsize):
max_backlog=sys.maxsize,
calculations={'dayRain':'day.rain.sum'}):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether the unconditional mentioning of rain is correct here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is. See poblabs/weewx-belchertown#685 for explanation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this. Is this then a bug in the MQTT addon, which needs to be solved at the root rather than considered in your calculations contribution?? @matthewwall your opinion?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. The general rule of WeeWX is to have an interval, that is open at its left end and closed at its right end. The only exception is dayRain, and it is calculated that way because of the special needs of the the WU upload. Within the WeeWX sources there is a comment on this that is

                # NB: The WU considers the archive with time stamp 00:00
                # (midnight) as (wrongly) belonging to the current day
                # (instead of the previous day). But, it's their site,
                # so we'll do it their way.  That means the SELECT statement
                # is inclusive on both time ends:

For other use cases than the WU upload the daily rain should be calculated in the normal way, that is $day.rain.sum.

The Belchertown skin uses dayRain to display the daily rain, and so it shows a wrong value when it rains just before midnight, if the MQTT extension uses that special calculated dayRain value.

bin/user/mqtt.py Outdated
for agg_obs in self.calculations:
try:
tag = self.calculations[agg_obs].split('.')
if len(tag)==3:
Copy link

@ThomDietrich ThomDietrich Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens otherwise? When can the length of tag be other than 3? I like to log a warning for this kind of thing. Especially in this case, where the error would mainly stem from user config error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it makes sense to create some error message.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved.

bin/user/mqtt.py Outdated
# register name with unit group if necessary
weewx.units.obs_group_dict.setdefault(agg_obs,__result[2])
except (LookupError,ValueError,TypeError,weewx.UnknownType,weewx.UnknownAggregation,weewx.CannotCalculate) as e:
logerr('%s = %s: error %s' % (obs,tag,e))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of these "fuck it, let's catch all" clauses. Not saying it is wrong in your particular case. Would you be able to separate them out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me it is no use to send WeeWX to crash if there is some little problem with some little extension.

Copy link

@ThomDietrich ThomDietrich Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly, but not the point.
If you are aware that line 123 might raise a LookupError, handle it there. Rather be specific and handle problems where they arise.

Please don't take this in a demotivating way.

Copy link
Author

@roe-dl roe-dl Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stupid question: So you suggest to repeat all the error handling code (in this case one line, only) for all the places where errors can occur?

In this case I want to make sure, that the code of this function never ever crashes WeeWX. If I do not want to use that "catch all" clause, I would have to surround every single line of code by a try: - except: clause. Don't you think that is a lot of code and difficult to read?

@roe-dl
Copy link
Author

roe-dl commented Sep 9, 2022

See #31 for older comments.

@roe-dl
Copy link
Author

roe-dl commented Sep 9, 2022

@ThomDietrich

Does your addition change anything for users without [[[calculations]]] in their config?

No. Section [[[calculations]]] is not necessary. For people who do not need it, nothing changes.

@roe-dl
Copy link
Author

roe-dl commented Sep 9, 2022

@ThomDietrich

Aggregation should (arguably) happen after MQTT.

There are certainly people who would argue that. If you have some Software at the other end, that can aggregate, then be happy. But that does not apply to all people. Another thing is that MQTT is not reliable in that sense, that you can be sure that the recipient receives all messages sent. If - for example - the aggregation is "sum" then, what do you get? So I guess, aggregation should happen before MQTT to get reliable results.

See also:
#31 (comment)
#31 (comment)

@ThomDietrich
Copy link

Another thing is that MQTT is not reliable in that sense, that you can be sure that the recipient receives all messages sent. If - for example - the aggregation is "sum" then, what do you get? So I guess, aggregation should happen before MQTT to get reliable results.

Maybe I got this wrong. You want rain to be aggregated, which is why there is the published topic dayRain (aside the problem that you state it is buggy)?

@roe-dl
Copy link
Author

roe-dl commented Sep 9, 2022

There are several observation types in WeeWX that contain aggregated values. They are calculated in restx.py That is in the central part of WeeWX. Regarding rain there are hourRain, dayRain and rain24. And they are widely used in skins.

If you want to display live data on the web you use the MQTT extension to transfer the actual readings including those aggregations to the client and process them by Javascript. And I guess, you want to do as least calculations as possible at the client's side.

@matthewwall
Copy link
Owner

i find the use of 'calculations' confusing. there is a 'Calculations' section in the weewx configuration file, and is completely different from what you have implemented for 'calculations'. perhaps your implementation would be more appropriately called 'aggregations'?

my first reaction to this implementation is that it does not belong here. the mqtt extension should simply upload whatever observations are in the loop and/or archive dictionaries. so if someone wants an aggregation, s/he should do that aggregation either somewhere in weewx before mqtt uploading happens, or downstream in an mqtt consumer (i can imagine use cases for both of these).

the reason you see selected aggregation in some uploaders (e.g., some implemented in restx.py) is that the destination requires those aggregations (e.g., a very specific 'dayRain' for WU, or 'dayRain' and 'hourRain' for WOW). mqtt is generic, so there is no specification to meet.

having said that, i'm not sure what an implementation for 'do the aggregation before mqtt uploading' looks like. i would prefer to see at least one or two attempts at that before bogging down the mqtt uploader with additional complexity. note that for some hardware, you do not have to do anything - some of those aggregations already appear in the archive records!

also, you cannot assume that everyone will have 'rain' as an observation. will your implementation try to insert dayRain even if no 'rain' is recorded? what about loop vs archive? remember that no observation is not the same as an observation with value 'NULL', and that is different from an observation with value 0.

i am not a python expert, but from what i have seen, it is poor form to have a catch-all for a bunch of different exception types. better to catch the exceptions where they might occur. that also forces you to test the code and understand exactly what each part of the code might or might not do. this is particularly true for ValueError and TypeError - i can see why you might end up with the other exceptions in a single catch, but having ValueError and TypeError in there indicate that you do not really know how the code might behave. it is laudable to try to not have weewx die, but not by swallowing exceptions.

@roe-dl
Copy link
Author

roe-dl commented Sep 10, 2022

@matthewwall Thank you for commenting on this.

i find the use of 'calculations' confusing.

Ok. Done.

my first reaction to this implementation is that it does not belong here. ...

I understand the point. So what were my thoughts?

  • I see no way to insert some extension between restx.py in the WeeWX kernel and the MQTT extension.
  • And I looked for a more general solution for the Belchertown rain bug than introducing a second daily rain observation type with a different calculation.

the reason you see selected aggregation in some uploaders (e.g., some implemented in restx.py) is that the destination requires those aggregations (e.g., a very specific 'dayRain' for WU, or 'dayRain' and 'hourRain' for WOW). mqtt is generic, so there is no specification to meet.

I agree with you. I see it that way: People can use the MQTT extension to feed whatever application they use. And for me, that requires the possibility to configure the MQTT extension to meet that application's needs.

will your implementation try to insert dayRain even if no 'rain' is recorded?

No.

what about loop vs archive?

The aggregation is only added if a reading of the observation type to be aggregated is present. Otherwise nothing is changed. No None value is emitted.

better to catch the exceptions where they might occur.

I will change that as two people spoke about that

Please note: Calculating an aggregation can call other people's extensions. And I cannot know how they are implemented and whether they are free of bugs or unhandled exceptions. There is no way to find out all the exceptions the call of weewx.xtypes.get_aggregate() can produce. Everyone can extend that function by their code.

@hoetzgit
Copy link

hoetzgit commented Sep 10, 2022

@matthewwall

I am using weewx-mqtt with this PR. What would I have to do without this PR to get the following values as MQTT output in weewx-mqtt Topic?

Example, day totals: rain, sunshineDur (sunshine per interval in seconds), windrun ...

Thanks
Henry

Edit:

"the mqtt extension should simply upload whatever observations are in the loop and/or archive dictionaries."

The extension can still do this, right? [aggregations] is optional

"so if someone wants an aggregation, s/he should do that aggregation either somewhere in weewx before mqtt uploading happens, or downstream in an mqtt consumer"

Why simple when complicated also works? This PR expands the possibilities of the extension and makes it user-friendly. I always thought we Germans are complicated :-)

"the reason you see selected aggregation in some uploaders (e.g., some implemented in restx.py) is that the destination requires those aggregations"

My destination requires this aggregation too.

@bellrichm
Copy link

@roe-dl
Have you given any thoughts to writing a general service to ‘aggregate’ observations and augment either loop packets or archive records?
I am also of the mindset that this doesn’t belong with loop data. My plan is to write a service to augment archive records. My user interface/skin will pull ‘real time’ data from the MQTT loop data and the ‘aggregate’ from MQTT archive data.
I’m just wondering if I am missing something with this approach. Thanks!

@matthewwall
Copy link
Owner

matthewwall commented Sep 10, 2022

another reason for doing aggregation in a separate service is to avoid duplication. for example, if i upload to two different mqtt servers, i should not have to create duplicate aggregation stanzas. similarly, if i want day/week aggregations in both mqtt and influx uploads, you don't want those aggregation queries to happen in each uploader, since that results in double the database queries.

on the other hand, one could argue that aggregation is something that should happen as far down the pipeline as possible (e.g., in the weewx reports).

i don't have an answer yet, but it is healthy to explore the options before committing to an implementation.

@roe-dl
Copy link
Author

roe-dl commented Sep 10, 2022

@bellrichm
Yes, I thought about that

  • A general service cannot change the dayRain value calculated in restx.py and so cannot solve the Belchertown rain bug. And it is also not possible to solve that bug within the Belchertown skin.
  • A general service would influence all the skins and restful services. It cannot be restricted to MQTT output. That can be desired or can be a problem, depending on the user's needs.
  • In general there is no need to have such aggregations for the Cheetah generator. If the aggregation would be calculated within a general service it would influence the Cheetah generator, too. So that wouldn't be the right place to do the aggregation, too.
  • There is no problem with LOOP data, because only those aggregated values are updated that have the "un-aggregated" values present in the data packet, too.

@roe-dl
Copy link
Author

roe-dl commented Sep 10, 2022

@matthewwall

for example, if i upload to two different mqtt servers

I was not aware that this is even possible.

on the other hand, one could argue that aggregation is something that should happen as far down the pipeline as possible (e.g., in the weewx reports).

I would see it that way.

I had some different approaches tested with my installation before sending this PR. I found that this solution was the simplest and most general of them.

@matthewwall
Copy link
Owner

if it turns out that 'aggregations' belongs in the mqtt extension, then that implementation (and the config syntax) probably should be pushed into the weewx core so that it could be use by any other uploader. in that respect, this could be analogous to the 'sensor_map' construct for drivers, or the name/units mappings used by many uploaders with user-configurable mappings. in those instances, we have used a standard syntax for the config definitions, so that eventually we can push the implementation code into the weewx core (if appropriate).

@roe-dl
Copy link
Author

roe-dl commented Sep 10, 2022

I am not sure if Tom would like that idea. But I would agree with you.

@bellrichm
Copy link

@roe-dl Thank you. For my use, I don’t think I have those concerns. But, it something for me to think about.
Thanks again.

@matthewwall
Copy link
Owner

i just realized that 'aggregation' is already used - it indicates whether to send one observation per mqtt message, or send all observations in a single message.

so the keyword 'aggregations' would still work, but it is rather similar, and might cause confusion.

i also just realized that i already built a mechanism in the mqtt uploader (and other uploaders) for adding derivative observations: augment_record. this would be a more appropriate keyword that 'aggregations', since it can contain any derivative value, not just aggregate values. when enabled, it uses the get_record method. this is the appropriate way to augment, and @roe-dl has done exactly that. however, it could be done generically by any restful uploader by defining the list of observations in an augment_record stanza in the configuration file. that way it could be used by any uploader, and could be selectively modified per uploader, with no need to change code. xtypes syntax makes sense here.

my initial take is that it is a design mistake that a base class implementation should contain implementations specific to a derived class, i.e., the definition of dayRain should be in a leaf implementation of get_record, not in the trunk. i will have to take this up with tom, and figuring out a backward-compatible-yet-still-correct implementation could take a bit of work and testing.

@roe-dl
Copy link
Author

roe-dl commented Aug 21, 2023

@matthewwall Is there any news in that case?

May be, WeeWX version 5 is the possibility to introduce a little breaking change as you thought about in your last comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants