
Additional asv tests #2185

Open

wants to merge 15 commits into master
Conversation

grusev (Collaborator) commented Feb 18, 2025

Reference Issues/PRs

More tests migrated to asv with S3, using the new framework that supports LMDB, Amazon S3 and other storages:

  • batch tests
  • modification tests - update, append, delete
  • query benchmarks
  • finalized test data

ASV run on S3 without errors here: https://github.com/man-group/ArcticDB/actions/runs/13414624562/job/37472454850
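For context, asv discovers benchmarks through naming conventions: a `params`/`param_names` pair drives parameterisation, `setup` runs before timing, and `time_*` methods are timed. A minimal sketch in that style (the class and data here are illustrative, not the PR's actual benchmark classes):

```python
import numpy as np
import pandas as pd

class ReadWriteBenchmark:
    # asv runs the benchmark once per entry in `params`
    params = [1_000, 10_000]
    param_names = ["num_rows"]
    timeout = 1200  # seconds; matches the style used in this PR

    def setup(self, num_rows):
        # Called by asv before each timed run; not included in the timing
        self.df = pd.DataFrame({"col": np.arange(num_rows)})

    def time_copy_roundtrip(self, num_rows):
        # Methods prefixed with `time_` are what asv measures
        _ = self.df.copy()
```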

What does this implement or fix?

Change Type (Required)

  • Patch (Bug fix or non-breaking improvement)
  • Minor (New feature, but backward compatible)
  • Major (Breaking changes)
  • Cherry pick

Any other comments?

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@github-actions github-actions bot added patch Small change, should increase patch version and removed patch Small change, should increase patch version labels Feb 19, 2025
@grusev grusev marked this pull request as ready for review February 19, 2025 11:53
@github-actions github-actions bot added patch Small change, should increase patch version and removed patch Small change, should increase patch version labels Feb 20, 2025
@@ -664,6 +665,322 @@ def clear_symbols_cache(self):
lib._nvs.version_store._clear_symbol_list_keys()
Collaborator

I have encountered a few confusing things related not to this PR itself but to the previous PR. They are small, but I think it would be good to address them in this PR while we're still at it. Adding them as replies here, as I can't comment on a non-edited part.

Collaborator

Here get_library_names is a bit confusing to use. We have in a few places code like
self.get_library_names(suffix)[0], which is confusing because what does the [0] stand for?

What do you think about instead passing an argument to the function indicating whether you want a modifiable or a persistent library? E.g. I think this would be more readable:

self.get_library_name(suffix, lib_type=LibraryType.PERM)
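The suggestion above could be sketched as follows; `LibraryType` and `get_library_name` are the reviewer's proposed (hypothetical) names, and the name format is purely illustrative:

```python
from enum import Enum

class LibraryType(Enum):
    # PERM: persistent library reused across benchmark runs
    # MODIFIABLE: scratch library that tests are free to mutate
    PERM = "permanent"
    MODIFIABLE = "modifiable"

def get_library_name(suffix: str,
                     lib_type: LibraryType = LibraryType.MODIFIABLE) -> str:
    # An explicit enum replaces the opaque [0]/[1] indexing into a name list
    return f"{lib_type.value}_lib_{suffix}"
```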

Collaborator Author

I like the idea! It makes a lot of sense and will be clearer.


def setup(self, cache, num_chunks: int):
self.df_cache: CachedDFGenerator = cache["df_cache"]
self.set_env = GeneralUseCaseNoSetup.from_storage_info(cache["storage_info"])
Collaborator

Use AWSFinalizeStagedData.SETUP_CLASS instead

Collaborator Author

We cannot use it. Actually, we must not use it, due to the way asv works. See the last paragraph here for details on why: https://github.com/man-group/ArcticDB/wiki/Framework-for-setting-up-environment-needed-for-ASV-tests


timeout = 1200

SETUP_CLASS = (GeneralUseCaseNoSetup(storage=Storage.LMDB,
Collaborator

class name suggests AWS but we use LMDB?

Collaborator Author

Will change it. I run the tests with LMDB as they are faster locally, so this is a clear mistake. Thanks for spotting it.

@@ -553,3 +553,129 @@ def generate_random_dataframe(cls, rows: int, cols: int, indexed: bool = True, s
gen.add_timestamp_index("index", "s", pd.Timestamp(0))
return gen.generate_dataframe()


@classmethod
def generate_random_int_dataframe(seclslf, start_name_prefix: str,
Collaborator

Typo in cls

if round_at is None:
data = np.random.uniform(min_value, max_value, size=(num_rows, num_cols)).astype(dtype)
else :
data = np.round(np.random.uniform(min_value, max_value,
Collaborator

Instead of writing the np.random.uniform expression twice we can:

data = np.random.uniform(something)
if round_at is not None:
    data = np.round(data, round_at)
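A runnable version of that refactor might look like this; the function name and parameters are illustrative, reconstructed from the snippet under review rather than taken from the PR:

```python
import numpy as np

def random_float_frame_data(min_value, max_value, num_rows, num_cols,
                            round_at=None, dtype=np.float64):
    # Generate once, then round conditionally -- avoids duplicating
    # the np.random.uniform call in both branches
    data = np.random.uniform(min_value, max_value,
                             size=(num_rows, num_cols)).astype(dtype)
    if round_at is not None:
        data = np.round(data, round_at)
    return data
```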

assert len(sequence_df_list) > 0
start = sequence_df_list[0].head(1).index.array[0]
last = sequence_df_list[-1].tail(1).index.array[0]
return [start, last]
Collaborator

Why not return a tuple?
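The tuple version could look like the sketch below; the function name is assumed, and the body mirrors the snippet under review:

```python
import pandas as pd

def time_range(sequence_df_list):
    # Returns an immutable (start, last) tuple rather than a list,
    # signalling a fixed-size pair of timestamps
    assert len(sequence_df_list) > 0
    start = sequence_df_list[0].head(1).index.array[0]
    last = sequence_df_list[-1].tail(1).index.array[0]
    return start, last
```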

return next


class GeneralSetupOfLibrariesWithSymbols(EnvConfigurationBase):
Collaborator

I personally find the names of all these EnvironmentConfigurations a bit confusing.

All of them start with General, which I think just makes the names longer.

Also, they live inside environment_setup, so when importing environment_setup.General... it's clear that it is for a setup, so maybe let's also drop the Setup from the naming.

So what do you think about the following renames:
GeneralSetupLibraryWithSymbols -> SingleLibrary
GeneralSetupSymblsVersionsSnapshots -> LibrariesWithVersionAndSnapshots
GeneralUseCaseNoSetup -> NoSetup
GeneralAppendSetup -> LibraryWithAppendData
GeneralSetupOfLibrariesWithSymbols -> LibrariesPerNumSymbols, to indicate each library is for a specific number of symbols.

I haven't thought too much about the names, but it would be good for a name to convey e.g. that this class will maintain many libraries and that different libraries will have different numbers of symbols.

(list_rows, list_cols) = self._get_symbol_bounds()

for num_symbols in self._params[self.param_index_num_symbols]:
lib = self.get_library(num_symbols)
Collaborator

This will raise if the library does not exist? Should we have a try/except to return False if something in the check raises?
I think that holds for all check_ok steps, so maybe the try/except can live in setup_environment?
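Centralising that error handling in the base class might be sketched like this; the method names mirror the discussion, but the real framework API may differ:

```python
class EnvConfigurationBase:
    def check_ok(self) -> bool:
        # Subclasses verify their environment here and may raise freely,
        # e.g. when get_library fails because the library does not exist
        raise NotImplementedError

    def _create_environment(self):
        # Subclasses (re)build libraries and symbols here
        raise NotImplementedError

    def setup_environment(self):
        # Any exception in the check is treated as "environment not ok",
        # so individual check_ok implementations need no try/except
        try:
            ok = self.check_ok()
        except Exception:
            ok = False
        if not ok:
            self._create_environment()
        return self
```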

return self

def check_symbol(self, lib: Library, symbols_number: int):
pass
Collaborator

Since we're not checking the actual symbol data, should we remove this function?

start = time.time()
for sym_num in range(symbols_number):
for row in list_rows:
for col in list_cols:
Collaborator

This looks like surprising behavior to me. Won't we generate num_symbols * num_rows * num_cols symbols instead of just num_symbols?
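The concern can be checked by counting iterations of the nested loops in isolation (a standalone sketch; the write itself is stubbed out):

```python
def count_symbols_written(symbols_number, list_rows, list_cols):
    # Mirrors the loop nesting in the snippet under review:
    # one write per (sym_num, row, col) combination
    written = 0
    for sym_num in range(symbols_number):
        for row in list_rows:
            for col in list_cols:
                written += 1  # stands in for the actual lib.write(...)
    return written
```

With 3 symbols and two row/col sizes each, this yields 12 writes rather than 3, which is the surprise the comment points out.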
