Additional asv tests #2185
base: master
@@ -664,6 +665,322 @@ def clear_symbols_cache(self):
        lib._nvs.version_store._clear_symbol_list_keys()
I have encountered a few confusing things that are not related to this PR itself but to the previous PR. They are small, but I think it would be good to address them in this PR while we're at it. Adding them as replies here, as I can't add a comment to a non-edited part :/
The get_library_names method is a bit confusing to use here. In a few places we have code like:
self.get_library_names(suffix)[0]
which is confusing because it is not clear what the [0] stands for.
What do you think about instead passing an argument to the function that says whether you want a modifiable or a persistent library? E.g. I think this would be more readable:
self.get_library_name(suffix, lib_type=LibraryType.PERM)
I like the idea! Makes lots of sense and will be clear.
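A minimal sketch of what the suggested signature could look like. The LibraryType members other than PERM, the "asv" prefix and the naming scheme are assumptions made for illustration and are not taken from the PR:

```python
from enum import Enum


class LibraryType(Enum):
    """Hypothetical enum; only PERM appears in the review comment."""
    PERM = "PERM"              # persistent library, kept between runs
    MODIFIABLE = "MODIFIABLE"  # library that tests are allowed to change


def get_library_name(suffix, lib_type: LibraryType = LibraryType.PERM,
                     prefix: str = "asv") -> str:
    """Return one library name for the requested library type.

    The naming scheme is invented for this sketch; the point is that the
    caller states the library type explicitly instead of indexing with [0].
    """
    return f"{prefix}_{lib_type.value}_{suffix}"


print(get_library_name(10, lib_type=LibraryType.PERM))
print(get_library_name(10, lib_type=LibraryType.MODIFIABLE))
```

With a signature like this, the call site documents which library it wants, and adding a third library type later would not shift any positional index.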

    def setup(self, cache, num_chunks: int):
        self.df_cache: CachedDFGenerator = cache["df_cache"]
        self.set_env = GeneralUseCaseNoSetup.from_storage_info(cache["storage_info"])
Use AWSFinalizeStagedData.SETUP_CLASS instead
We cannot use it. In fact, we must not use it because of the way asv works. See the last paragraph here for details: https://github.com/man-group/ArcticDB/wiki/Framework-for-setting-up-environment-needed-for-ASV-tests

    timeout = 1200

    SETUP_CLASS = (GeneralUseCaseNoSetup(storage=Storage.LMDB,
The class name suggests AWS, but we use LMDB?
Will change it. I run the tests with LMDB because they are faster locally, so this is clearly a mistake. Thanks for spotting this.
python/arcticdb/util/utils.py
Outdated
@@ -553,3 +553,129 @@ def generate_random_dataframe(cls, rows: int, cols: int, indexed: bool = True, s
        gen.add_timestamp_index("index", "s", pd.Timestamp(0))
        return gen.generate_dataframe()


    @classmethod
    def generate_random_int_dataframe(seclslf, start_name_prefix: str,
Typo in cls
python/arcticdb/util/utils.py
Outdated
        if round_at is None:
            data = np.random.uniform(min_value, max_value, size=(num_rows, num_cols)).astype(dtype)
        else :
            data = np.round(np.random.uniform(min_value, max_value,
Instead of writing the np.random.uniform expression twice, we can do:
data = np.random.uniform(something)
if round_at is not None:
    data = np.round(data, round_at)
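The suggestion above can be spelled out as a runnable sketch. The function name, the default arguments, the use of np.random.default_rng (chosen here for reproducibility instead of the module-level np.random.uniform in the PR) and the placement of the astype cast after rounding are all assumptions, since the quoted hunk is truncated:

```python
import numpy as np


def random_float_block(num_rows, num_cols, min_value=0.0, max_value=1.0,
                       round_at=None, dtype=np.float64, seed=None):
    """Hypothetical helper: generate the data once, round only if requested."""
    rng = np.random.default_rng(seed)
    # Single call to uniform, shared by both the rounded and unrounded paths.
    data = rng.uniform(min_value, max_value, size=(num_rows, num_cols))
    if round_at is not None:
        data = np.round(data, round_at)
    # Assumption: the cast applies to both paths.
    return data.astype(dtype)


block = random_float_block(3, 4, min_value=-5.0, max_value=5.0, round_at=2, seed=0)
print(block.shape)
```

Generating first and rounding afterwards keeps the range and shape arguments in one place, so a later change to them cannot make the two branches diverge.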
        assert len(sequence_df_list) > 0
        start = sequence_df_list[0].head(1).index.array[0]
        last = sequence_df_list[-1].tail(1).index.array[0]
        return [start, last]
Why not return a tuple?
        return next


class GeneralSetupOfLibrariesWithSymbols(EnvConfigurationBase):
I personally find the names of all these EnvironmentConfigurations a bit confusing.
All of them start with General, which I think just makes the names longer.
Also, they live inside environment_setup, so when importing environment_setup.General... it's already clear that it is for a setup, so maybe let's also drop Setup from the naming.
So what do you think about the following renames:
GeneralSetupLibraryWithSymbols -> SingleLibrary
GeneralSetupSymblsVersionsSnapshots -> LibrariesWithVersionAndSnapshots
GeneralUseCaseNoSetup -> NoSetup
GeneralAppendSetup -> LibraryWithAppendData
GeneralSetupOfLibrariesWithSymbols -> LibrariesPerNumSymbols (to indicate that each library is for a specific number of symbols)
I haven't thought too much about the names, but it would be good for a name to tell us, e.g., that this class maintains many libraries and that different libraries have different numbers of symbols.
        (list_rows, list_cols) = self._get_symbol_bounds()

        for num_symbols in self._params[self.param_index_num_symbols]:
            lib = self.get_library(num_symbols)
Will this raise if the library does not exist? Should we have a try/except that returns False if something in the check raised?
I think that holds for all check_ok steps, so maybe the try/except can live in setup_environment?
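One way the idea could look, as a sketch only. The class below is a hypothetical stand-in for the PR's base class: check_ok and setup_environment are named in the comment above, while setup_all, _check_ok_safely and the logging are assumptions for illustration:

```python
import logging

logger = logging.getLogger(__name__)


class EnvConfigurationSketch:
    """Hypothetical stand-in; only the control flow is the point here."""

    def check_ok(self) -> bool:
        raise NotImplementedError

    def setup_all(self) -> "EnvConfigurationSketch":
        raise NotImplementedError

    def _check_ok_safely(self) -> bool:
        # Any exception from the check (for example a missing library)
        # is logged and treated as "environment is not set up".
        try:
            return bool(self.check_ok())
        except Exception:
            logger.exception("check_ok raised; treating environment as not set up")
            return False

    def setup_environment(self) -> "EnvConfigurationSketch":
        # The guard lives here once, so every subclass check is covered.
        if not self._check_ok_safely():
            self.setup_all()
        return self
```

Putting the guard in one place means individual check_ok implementations can stay simple and raise freely. The trade-off is that a genuine bug inside a check would also trigger a full setup, which is why the sketch logs the exception rather than swallowing it silently.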
        return self

    def check_symbol(self, lib: Library, symbols_number: int):
        pass
Since we're not checking the actual symbol data, should we remove this function?
        start = time.time()
        for sym_num in range(symbols_number):
            for row in list_rows:
                for col in list_cols:
This looks like surprising behavior to me. Won't we generate num_symbols * num_rows * num_cols instead of just num_symbols?
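The arithmetic behind this question can be checked with a small sketch. It assumes that the innermost loop body writes one symbol per iteration, which the quoted hunk does not show. The function name and the example sizes are made up:

```python
def count_generated_symbols(symbols_number, list_rows, list_cols):
    """Count iterations of the triple loop from the hunk above."""
    generated = 0
    for sym_num in range(symbols_number):
        for row in list_rows:
            for col in list_cols:
                # Assumption: one symbol is written per (sym_num, row, col).
                generated += 1
    return generated


# 2 requested symbols, 3 row sizes, 2 column sizes
total = count_generated_symbols(2, [10, 20, 30], [5, 6])
print(total)  # 12, i.e. 2 * 3 * 2 rather than 2
```

If the intent is exactly symbols_number symbols per library, the row and column sizes would need to be chosen per symbol rather than looped over in full.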
Reference Issues/PRs
More tests migrated to asv with S3 and the new framework that allows LMDB, Amazon S3 and others:
ASV run on S3 without errors here: https://github.com/man-group/ArcticDB/actions/runs/13414624562/job/37472454850
What does this implement or fix?
Change Type (Required)
Any other comments?
Checklist
Checklist for code changes...