Quickscan #269

jreadey · 2023-10-10T01:48:22Z

This change addresses issue #255 - verbose info not correct.

For the verbose param to return current info for a domain or dataset, you'll first need to do a PUT / with the rescan and flush params set. See the checkVerbose function in value_test.py for how this works.
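A minimal sketch of that request sequence, using only the Python standard library to build (not send) the two requests. The endpoint, the `domain` query parameter, the `1` values for `flush`/`rescan`/`verbose`, and the dataset id are assumptions for illustration. checkVerbose in value_test.py remains the authoritative example, and a real deployment will also need auth headers.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for illustration; not taken from the PR.
ENDPOINT = "http://localhost:5101"


def build_flush_rescan(domain: str) -> Request:
    """PUT / on the domain with flush and rescan set (param values are assumptions)."""
    query = urlencode({"domain": domain, "flush": 1, "rescan": 1})
    return Request(f"{ENDPOINT}/?{query}", method="PUT")


def build_verbose_get(domain: str, dset_id: str) -> Request:
    """GET the dataset with verbose set; issue only after the flush/rescan completes."""
    query = urlencode({"domain": domain, "verbose": 1})
    return Request(f"{ENDPOINT}/datasets/{dset_id}?{query}", method="GET")


if __name__ == "__main__":
    domain = "/home/test_user1/example.h5"  # hypothetical domain path
    for req in (build_flush_rescan(domain), build_verbose_get(domain, "d-1234")):
        # Pass each request to urllib.request.urlopen (plus auth headers) to send it.
        print(req.get_method(), req.full_url)
```

Ordering is the point: the verbose GET only reflects current chunk counts and allocated size once the flush and rescan have finished.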

mattjala · 2023-10-10T14:16:50Z

hsds/datanode.py

+            last_action = time.time()
+
+        now = time.time()
+        if (now - last_action) > async_sleep_time:


If last_action is set to the current time before the scanning loop runs (or at the end of each scan loop), won't this always be true?

If there are no root_ids, last_action doesn't get updated, so eventually the if branch will be executed.

mattjala · 2023-10-10T14:18:04Z

hsds/datanode.py

+        if (now - last_action) > async_sleep_time:
+            sleep_time = async_sleep_time  # long nap
+        else:
+            sleep_time = short_sleep_time  # short nap


What's the idea behind varying the time between scans based on how long the scan itself takes? Or is last_action tracking something else?

If there's nothing going on, use a longer sleep and save CPU cycles. It's not about how long the scan takes, but rather the last time there was a domain to scan.
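A simplified sketch of the scheduling logic discussed above, reusing the diff's names (`last_action`, `async_sleep_time`, `short_sleep_time`). The helper function names, the injectable `clock`, and the defaults (10 s long nap, borrowed from the old config value, and a made-up 0.1 s short nap) are assumptions; this is not the code in hsds/datanode.py.

```python
import time


def choose_sleep_time(now, last_action, async_sleep_time, short_sleep_time):
    """Pick the next nap length based on how long the node has been idle."""
    if (now - last_action) > async_sleep_time:
        return async_sleep_time  # long nap: no domain to scan for a while
    return short_sleep_time  # short nap: there was recent work


def scan_loop_step(root_ids, last_action, async_sleep_time=10.0,
                   short_sleep_time=0.1, clock=time.time):
    """One loop iteration (hypothetical): scan pending roots, then choose the sleep.

    last_action only advances when there is something to scan.
    """
    if root_ids:
        # ... scan each root id here (omitted in this sketch) ...
        last_action = clock()
    sleep_time = choose_sleep_time(clock(), last_action,
                                   async_sleep_time, short_sleep_time)
    return last_action, sleep_time
```

Because `last_action` only moves when there are root ids to scan, an idle node drifts past `async_sleep_time` and falls through to the long nap, while any scan keeps it on short naps.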

Here's a quick check using the value_test.py test case.
I modified testPut1DDataset to run a sequence of flushes and rescans:

```python
# check values we should get from a verbose query
# (fragment from testPut1DDataset: self, dset_id and headers come from the test)
import time

ts = time.time()
for i in range(10):
    expected = {"num_chunks": 1, "allocated_size": 40}
    self.checkVerbose(dset_id, headers=headers, expected=expected)
    now = time.time()
    elapsed = now - ts
    print(f"flush {elapsed:.2f}")
    ts = now
```

Output is:

```
$ python value_test.py ValueTest.testPut1DDataset
testPut1DDataset /home/test_user1/hsds_test/valuetest/20231010T154859_160988Z
flush 2.54
flush 0.12
flush 0.12
flush 0.13
flush 0.13
flush 0.13
flush 0.12
flush 0.12
flush 0.13
flush 0.13
```

The first round takes a couple of seconds, apparently because the bucketScan task is in long-sleep mode. After the first flush it "wakes up", and each subsequent update takes only about 0.1s.
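To put a number on the cold vs. warm difference, here is a small sketch that summarizes the elapsed times printed in the run above (the values are copied from that output; nothing else is assumed):

```python
from statistics import mean

# Elapsed times printed by the modified testPut1DDataset run.
timings = [2.54, 0.12, 0.12, 0.13, 0.13, 0.13, 0.12, 0.12, 0.13, 0.13]

cold = timings[0]         # first flush: bucketScan was in its long nap
warm = mean(timings[1:])  # later flushes: bucketScan already awake

print(f"cold {cold:.2f}s, warm mean {warm:.3f}s, ratio {cold / warm:.0f}x")
```

The first flush is roughly 20 times slower than the average of the nine that follow, which is consistent with waiting out most of a long nap.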

mattjala · 2023-10-10T14:32:48Z

admin/config/config.yml

@@ -32,7 +32,7 @@ log_prefix: null # Prefix text to append to log entries
 max_tcp_connections: 100 # max number of inflight tcp connections
 head_sleep_time: 10 # max sleep time between health checks for head node
 node_sleep_time: 10 # max sleep time between health checks for SN/DN nodes
-async_sleep_time: 10 # max sleep time between async task runs


Why the change in automatic scan frequency? I thought the idea was to allow certain requests to force a scan, in which case making automatic scans this frequent would be unnecessary.

The bucketScan task is still running independently, so the sleep is there to avoid unnecessarily consuming CPU.
It might be better to use something like asyncio.Event (see https://docs.python.org/3/library/asyncio-sync.html#asyncio.Event).

I'll leave using signals as a future enhancement since what we have should be good enough for this non-critical path.
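For reference, the asyncio.Event approach mentioned above could look roughly like the sketch below. It is a hypothetical illustration, not HSDS code: `ScanScheduler`, `request_scan`, and `run` are made-up names. The background task waits on the event with a timeout, so a flush/rescan request wakes it immediately, while an idle node still scans every `async_sleep_time` seconds.

```python
import asyncio


class ScanScheduler:
    """Hypothetical sketch: wake the background scan early when a rescan is requested."""

    def __init__(self, async_sleep_time: float):
        self.async_sleep_time = async_sleep_time
        self._wakeup = asyncio.Event()
        self.scans = 0

    def request_scan(self) -> None:
        # Would be called by a request handler, e.g. a PUT with rescan set.
        self._wakeup.set()

    async def run(self, iterations: int) -> None:
        for _ in range(iterations):
            try:
                # Wait up to async_sleep_time, but return as soon as the event is set.
                await asyncio.wait_for(self._wakeup.wait(),
                                       timeout=self.async_sleep_time)
            except asyncio.TimeoutError:
                pass  # no request arrived: fall back to the periodic scan
            self._wakeup.clear()
            self.scans += 1  # a real task would scan pending root ids here


async def demo() -> float:
    loop = asyncio.get_running_loop()
    scheduler = ScanScheduler(async_sleep_time=10.0)
    start = loop.time()
    task = asyncio.create_task(scheduler.run(iterations=1))
    await asyncio.sleep(0.05)
    scheduler.request_scan()  # simulated flush/rescan request wakes the task early
    await task
    return loop.time() - start


if __name__ == "__main__":
    print(f"scan finished after {asyncio.run(demo()):.2f}s instead of 10s")
```

The Event is created inside the running coroutine (via `demo`) so it is bound to the same event loop on older Python versions as well.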

mattjala

This enables H5Dget_storage_size to work from the REST VOL when the VOL requests a flush and rescan first. I don't fully understand the reasoning behind all the scan changes, but we can iron those out later. Approved.

jreadey added 5 commits October 5, 2023 18:18

- fix async errors in getting dset layout (2f33dca)
- first pass for quickscan (789237e)
- fix flake8 error (f257fb4)
- update for synchronous verbose info (b0c02de)
- Merge branch 'master' into quickscan (5ae437d)

jreadey assigned mattjala Oct 10, 2023

mattjala reviewed Oct 10, 2023


mattjala approved these changes Oct 10, 2023


mattjala mentioned this pull request Oct 10, 2023

Flush domain before requesting allocated bytes HDFGroup/vol-rest#84

Merged

jreadey merged commit 8fb1bb6 into master Oct 10, 2023
10 checks passed

jreadey deleted the quickscan branch April 29, 2024 21:55
