
Optimize calculate_data and redact_data() #361

Merged (38 commits) on Dec 1, 2024

Conversation

@Lash-L (Contributor) commented Nov 7, 2024

redact_data() likely needs more work. Gonna let my instance run for a while and then try it. With limited data it does show some pretty large improvements.

There were some merge conflicts for calculate_data and update_advertisement, so if you could give those an extra look-over that would probably be good!

@agittins (Owner) left a comment

This is great stuff :D

I'm also thinking that we should:

  • in redaction_list_update():

    • store a (errr... static local?) stamp of MONOTONIC_TIME(), and if the last one was more than... maybe several hours? Then re-initialise list so we don't end up iterating over fixes for addresses (and associated matches) that have long since been purged from the system (ie, when a user does two diags but quite some time apart). There's probably a trade-off between cache miss/size cost and regeneration-time.
    • (maybe/or) schedule a job in hass to purge the redactions[] after a certain time, to free up the memory I assume they'd take up. Maybe it's not big enough to matter.
    • never apply .upper() to address (since we always normalise the address to lowercase) and always lower() any non-known-lowered address when adding/searching keys in redactions.
  • Then in redact_data() we can remove the case-checking variants, and always look for redactions[thing.lower()] / if thing.lower() in redactions or if find in data.lower().
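The stamp-based re-init and lowercased-keys ideas above could be sketched roughly like this. Everything here (`RedactionCache`, `REDACTION_STALE_SECONDS`, the dict shape) is an illustrative stand-in for Bermuda's actual internals, not its real API:

```python
import time

REDACTION_STALE_SECONDS = 8 * 60 * 60  # assumed expiry window, tune to taste


class RedactionCache:
    """Minimal sketch: a monotonic stamp guards the redactions dict,
    and keys are always stored lowercased."""

    def __init__(self):
        self.redactions: dict[str, str] = {}
        self._stamp: float = 0.0

    def redaction_list_update(self, new_entries: dict[str, str]) -> None:
        now = time.monotonic()
        # If the cache is older than the window, rebuild from scratch so we
        # don't keep iterating over addresses long since purged from hass.
        if now - self._stamp > REDACTION_STALE_SECONDS:
            self.redactions = {}
        self._stamp = now
        for find, replace in new_entries.items():
            # Keys are lower() forms only; never .upper() an address.
            self.redactions[find.lower()] = replace
```

This trades the occasional full regeneration for never scanning stale entries, which is the cache-size vs regeneration-time trade-off mentioned above.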

Also, maybe we can ditch the re.sub altogether and use python's string.replace() since it might be a lot faster and we're not using the power of regex at all, really. Notes should be added to the update_redactions info to explain that the keys are not regex, but lower() forms of literal strings to match.
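A rough sketch of what the simplified redactor could look like with lowercased literal keys and str.replace() instead of re.sub. The name and signature mirror the real function's shape but are illustrative; it also assumes diagnostic strings are safe to emit lowercased, which is what lets it skip the case-variant checks entirely:

```python
def redact_data(data, redactions):
    """Sketch: redaction keys are lower() literal strings (not regex),
    so a plain str.replace() pass is enough. Recurses over dicts and
    lists the way a diagnostics redactor typically does."""
    if isinstance(data, str):
        lowered = data.lower()
        for find, replace in redactions.items():
            if find in lowered:
                lowered = lowered.replace(find, replace)
        return lowered
    if isinstance(data, dict):
        return {key: redact_data(val, redactions) for key, val in data.items()}
    if isinstance(data, list):
        return [redact_data(val, redactions) for val in data]
    return data
```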

Also...

This comment is an interesting idea that might apply to our use-case:

If the replacements do not change the string size, you can speed up things a lot by using bytearray / buffers or even mmap and modify the data in-place. (re.sub() and string.replace and endstring += line cause that a lot of memory is copyied around.)
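The quoted idea only works when find and replacement are the same length; a self-contained sketch (the function name and return value are made up for illustration):

```python
def replace_same_size(buf: bytearray, find: bytes, repl: bytes) -> int:
    """Patch a bytearray in place when len(find) == len(repl), avoiding
    the intermediate copies that re.sub()/str.replace() would make.
    Returns the number of replacements performed."""
    assert len(find) == len(repl), "only same-size swaps stay in place"
    count = 0
    i = buf.find(find)
    while i != -1:
        buf[i : i + len(find)] = repl  # in-place slice assignment
        count += 1
        i = buf.find(find, i + len(repl))
    return count
```

Whether the copy savings matter for diagnostics-sized payloads is an open question; it's probably only worth it if profiling shows the redaction pass is hot.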

@@ -384,12 +397,16 @@ def calculate_data(self):
)
# Discard the bogus reading by duplicating the last.
self.hist_distance_by_interval.insert(0, self.hist_distance_by_interval[0])
@agittins (Owner) commented:

This has a bug, just reported in #362 - might be worth including in this PR as a quickie? The suggestion is untested.

Suggested change
self.hist_distance_by_interval.insert(0, self.hist_distance_by_interval[0])
if len(self.hist_distance_by_interval) == 0:
self.hist_distance_by_interval = [ self.rssi_distance_raw ]
else:
self.hist_distance_by_interval.insert(0, self.hist_distance_by_interval[0])
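For reference, the suggested fix can be exercised in isolation; `push_history` and its bare arguments are stand-ins for the scanner's attributes:

```python
def push_history(hist: list[float], rssi_distance_raw: float) -> list[float]:
    """Standalone form of the suggested change: duplicate the newest
    reading, seeding from rssi_distance_raw when the history is empty
    (the empty-list crash from issue #362)."""
    if len(hist) == 0:
        hist = [rssi_distance_raw]
    else:
        hist.insert(0, hist[0])
    return hist
```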

@Lash-L (Contributor, Author) commented Nov 11, 2024

I'm also thinking that we should:

  • in redaction_list_update():

    • store a (errr... static local?) stamp of MONOTONIC_TIME(), and if the last one was more than... maybe several hours? Then re-initialise list so we don't end up iterating over fixes for addresses (and associated matches) that have long since been purged from the system (ie, when a user does two diags but quite some time apart). There's probably a trade-off between cache miss/size cost and regeneration-time.
    • (maybe/or) schedule a job in hass to purge the redactions[] after a certain time, to free up the memory I assume they'd take up. Maybe it's not big enough to matter.
    • never apply .upper() to address (since we always normalise the address to lowercase) and always lower() any non-known-lowered address when adding/searching keys in redactions.
  • Then in redact_data() we can remove the case-checking variants, and always look for redactions[thing.lower()] / if thing.lower() in redactions or if find in data.lower().

Also, maybe we can ditch the re.sub altogether and use python's string.replace() since it might be a lot faster and we're not using the power of regex at all, really. Notes should be added to the update_redactions info to explain that the keys are not regex, but lower() forms of literal strings to match.


@Lash-L (Contributor, Author) commented Nov 11, 2024

Ready for another lookover @agittins

Redactions are deleted every 8 hours.

@agittins (Owner) commented:
Just a heads-up that I've had to make some changes to get some bugfixes out, so I've rebased your PR against main - I hope I've done that right :-/ There might be some areas (particularly in coordinator) where I've made changes to main that need careful merging. I'm not 100% sure; I'm still getting to grips with handling merge conflicts (I've only been mucking around with various VCSs since the early 2000s, so I can't go too hard on myself! 😅)

Re the purging, I think there are job/task types that should cancel automatically when the config entry is blatted - I think the ones you attach via the config entry itself might fit that - but I didn't find any that let you schedule future jobs when I was altering the device registry refresh thingie, so just used a timestamp.

@Lash-L (Contributor, Author) commented Nov 13, 2024

Your merge looks fine!

Yeah, there is probably a better way to handle it, but honestly this seems fine imo. I think the tests were just being annoying.

agittins and others added 13 commits December 1, 2024 13:34
…ins#383)

* fix: Slowed global diag entity updates fixes [Rate-limit] Global visible device count [and others] agittins#377

- Added _cached_ratelimit to BermudaGlobalEntity and applied to sensors
- Added ref_power_changed stamp and tied between ref_power changes and cache invalidation in entities, so calibration that causes increased distances show up immediately.
- Allowed specifying a custom interval to the main _cached_ratelimit as well as global.
* fix: Local USB Bluetooth detection

- fixes Failed to find Scanner (local USB Bluetooth) agittins#386
- linting

* - linting
…ins#398)

* Add Dutch translation
* Update Greek translation
* Update English (base) translation
* Update config_flow.py to work better with translations
- fix: IRK and iBeacon redactions (no ":")
- Catch raw iBeacon uuid in advert hex strings
- Ensure redaction list is refreshed before
  processing data (first_run on recursive func)
- Remove redundant scanner data
- Tidy check for scanner_sends_stamps
- Removed doubled-up fields from merge conflict chaos.
- Clean last bit of dist_count
@agittins agittins merged commit f275baf into agittins:main Dec 1, 2024
6 checks passed
@agittins (Owner) commented Dec 1, 2024

Whew! Sorry it took me so long to move on this.

I got a bit wedged with stuff that I wanted finessed and got messed up with trying to merge with the latest changes, so I decided to just push the changes I wanted and get it merged :-)

Hopefully the PR attribution etc all works correctly and I haven't backed out anything in your PR while getting confused resolving merge conflicts. I have made some opinionated choices re using len() and possibly insert, but I think I've kept the core of what you have been working on, and the redaction stuff especially is amazing!
