
Additional asv tests #2185

Open

wants to merge 15 commits into master
Conversation

grusev (Collaborator) commented Feb 18, 2025

Reference Issues/PRs

More tests migrated to asv with S3, using the new framework that supports LMDB, Amazon S3 and other storages:

  • batch tests
  • modification tests - update, append, delete
  • query benchmarks
  • finalized test data

ASV run on S3 without errors here: https://github.com/man-group/ArcticDB/actions/runs/13414624562/job/37472454850
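For context, asv discovers benchmarks through naming conventions: a `params`/`param_names` pair drives parameterisation, `setup` runs before timing, and `time_*` methods are timed. A minimal sketch in that style (the class and data here are illustrative, not the PR's actual benchmark classes):

```python
import numpy as np
import pandas as pd

class ReadWriteBenchmark:
    # asv runs the benchmark once per entry in `params`
    params = [1_000, 10_000]
    param_names = ["num_rows"]
    timeout = 1200  # seconds; matches the style used in this PR

    def setup(self, num_rows):
        # Called by asv before each timed run; not included in the timing
        self.df = pd.DataFrame({"col": np.arange(num_rows)})

    def time_copy_roundtrip(self, num_rows):
        # Methods prefixed with `time_` are what asv measures
        _ = self.df.copy()
```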

What does this implement or fix?

Change Type (Required)

  • Patch (Bug fix or non-breaking improvement)
  • Minor (New feature, but backward compatible)
  • Major (Breaking changes)
  • Cherry pick

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@github-actions github-actions bot added patch Small change, should increase patch version and removed patch Small change, should increase patch version labels Feb 19, 2025
@grusev grusev marked this pull request as ready for review February 19, 2025 11:53
@github-actions github-actions bot added patch Small change, should increase patch version and removed patch Small change, should increase patch version labels Feb 20, 2025
@@ -664,6 +665,322 @@ def clear_symbols_cache(self):
lib._nvs.version_store._clear_symbol_list_keys()
Collaborator

I have encountered a few confusing things related not to this PR itself but to the previous PR. They are small, but I think it would be good to address them in this PR while we're still at it. Adding them as replies here, as I can't comment on a non-edited part.

Collaborator

Here get_library_names is a bit confusing to use. We have in a few places code like
self.get_library_names(suffix)[0], which is confusing because what does the [0] stand for?

What do you think about instead passing an argument to the function indicating whether you want a modifiable or a persistent library? E.g. I think this would be more readable:

self.get_library_name(suffix, lib_type=LibraryType.PERM)
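The suggestion above could be sketched as follows; `LibraryType` and `get_library_name` are the reviewer's proposed (hypothetical) names, and the name format is purely illustrative:

```python
from enum import Enum

class LibraryType(Enum):
    # PERM: persistent library reused across benchmark runs
    # MODIFIABLE: scratch library that tests are free to mutate
    PERM = "permanent"
    MODIFIABLE = "modifiable"

def get_library_name(suffix: str,
                     lib_type: LibraryType = LibraryType.MODIFIABLE) -> str:
    # An explicit enum replaces the opaque [0]/[1] indexing into a name list
    return f"{lib_type.value}_lib_{suffix}"
```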

Collaborator Author

I like the idea! It makes a lot of sense and will be clearer.


def setup(self, cache, num_chunks: int):
self.df_cache: CachedDFGenerator = cache["df_cache"]
self.set_env = GeneralUseCaseNoSetup.from_storage_info(cache["storage_info"])
Collaborator

Use AWSFinalizeStagedData.SETUP_CLASS instead

Collaborator Author

We cannot use it. Actually, we must not use it, due to the way asv works. See the last paragraph here for details on why: https://github.com/man-group/ArcticDB/wiki/Framework-for-setting-up-environment-needed-for-ASV-tests


timeout = 1200

SETUP_CLASS = (GeneralUseCaseNoSetup(storage=Storage.LMDB,
Collaborator

class name suggests AWS but we use LMDB?

Collaborator Author

Will change it. I run the tests with LMDB as they are faster locally, so this is a clear mistake. Thanks for spotting it.

@@ -553,3 +553,129 @@ def generate_random_dataframe(cls, rows: int, cols: int, indexed: bool = True, s
gen.add_timestamp_index("index", "s", pd.Timestamp(0))
return gen.generate_dataframe()


@classmethod
def generate_random_int_dataframe(seclslf, start_name_prefix: str,
Collaborator

Typo in cls

if round_at is None:
data = np.random.uniform(min_value, max_value, size=(num_rows, num_cols)).astype(dtype)
else :
data = np.round(np.random.uniform(min_value, max_value,
Collaborator

Instead of writing the np.random.uniform expression twice we can:

data = np.random.uniform(something)
if round_at is not None:
    data = np.round(data, round_at)
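A runnable version of that refactor might look like this; the function name and parameters are illustrative, reconstructed from the snippet under review rather than taken from the PR:

```python
import numpy as np

def random_float_frame_data(min_value, max_value, num_rows, num_cols,
                            round_at=None, dtype=np.float64):
    # Generate once, then round conditionally -- avoids duplicating
    # the np.random.uniform call in both branches
    data = np.random.uniform(min_value, max_value,
                             size=(num_rows, num_cols)).astype(dtype)
    if round_at is not None:
        data = np.round(data, round_at)
    return data
```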

assert len(sequence_df_list) > 0
start = sequence_df_list[0].head(1).index.array[0]
last = sequence_df_list[-1].tail(1).index.array[0]
return [start, last]
Collaborator

Why not return a tuple?
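The tuple version could look like the sketch below; the function name is assumed, and the body mirrors the snippet under review:

```python
import pandas as pd

def time_range(sequence_df_list):
    # Returns an immutable (start, last) tuple rather than a list,
    # signalling a fixed-size pair of timestamps
    assert len(sequence_df_list) > 0
    start = sequence_df_list[0].head(1).index.array[0]
    last = sequence_df_list[-1].tail(1).index.array[0]
    return start, last
```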

return next


class GeneralSetupOfLibrariesWithSymbols(EnvConfigurationBase):
Collaborator

I personally find the names of all these EnvironmentConfigurations a bit confusing.

All of them start with General, which I think just makes the names longer.

Also, they live inside environment_setup, so when importing environment_setup.General... it's clear that it is for a setup, so maybe let's also drop the Setup from the naming.

So what do you think about the following renames:
GeneralSetupLibraryWithSymbols -> SingleLibrary
GeneralSetupSymblsVersionsSnapshots -> LibrariesWithVersionAndSnapshots
GeneralUseCaseNoSetup -> NoSetup
GeneralAppendSetup -> LibraryWithAppendData
GeneralSetupOfLibrariesWithSymbols -> LibrariesPerNumSymbols, to indicate each library is for a specific number of symbols.

I haven't thought too much about the names, but it would be good for a name to convey e.g. that this class will maintain many libraries and that different libraries will have different numbers of symbols.

(list_rows, list_cols) = self._get_symbol_bounds()

for num_symbols in self._params[self.param_index_num_symbols]:
lib = self.get_library(num_symbols)
Collaborator

This will raise if the library does not exist? Should we have a try/except to return False if something in the check raises?
I think that holds for all check_ok steps, so maybe the try/except can live in setup_environment?
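Centralising that error handling in the base class might be sketched like this; the method names mirror the discussion, but the real framework API may differ:

```python
class EnvConfigurationBase:
    def check_ok(self) -> bool:
        # Subclasses verify their environment here and may raise freely,
        # e.g. when get_library fails because the library does not exist
        raise NotImplementedError

    def _create_environment(self):
        # Subclasses (re)build libraries and symbols here
        raise NotImplementedError

    def setup_environment(self):
        # Any exception in the check is treated as "environment not ok",
        # so individual check_ok implementations need no try/except
        try:
            ok = self.check_ok()
        except Exception:
            ok = False
        if not ok:
            self._create_environment()
        return self
```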

return self

def check_symbol(self, lib: Library, symbols_number: int):
pass
Collaborator

Since we're not checking the actual symbol data, should we remove this function?

start = time.time()
for sym_num in range(symbols_number):
for row in list_rows:
for col in list_cols:
Collaborator

This looks like surprising behavior to me. Won't we generate num_symbols * num_rows * num_cols symbols instead of just num_symbols?
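The concern can be checked by counting iterations of the nested loops in isolation (a standalone sketch; the write itself is stubbed out):

```python
def count_symbols_written(symbols_number, list_rows, list_cols):
    # Mirrors the loop nesting in the snippet under review:
    # one write per (sym_num, row, col) combination
    written = 0
    for sym_num in range(symbols_number):
        for row in list_rows:
            for col in list_cols:
                written += 1  # stands in for the actual lib.write(...)
    return written
```

With 3 symbols and two row/col sizes each, this yields 12 writes rather than 3, which is the surprise the comment points out.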
