Supersedes / extends #48.
Currently, we have a `ConsoleReporter` and a `FileReporter` in place, which, as their names suggest, report the contents of benchmark records, i.e., the results of and context around a benchmark run for a set of parameters. "Reporting" in this case means presenting the data contained in the benchmark records in a compelling way, for example in a table as seen in the README:
```python
import nnbench


@nnbench.benchmark
def product(a: int, b: int) -> int:
    return a * b


@nnbench.benchmark
def power(a: int, b: int) -> int:
    return a**b


reporter = nnbench.ConsoleReporter()
# first, collect the above benchmarks directly from the current module...
benchmarks = nnbench.collect("__main__")
# ... then run the benchmarks with the parameters `a=2, b=10`...
record = nnbench.run(benchmarks, params={"a": 2, "b": 10})
reporter.display(record)  # ...and print the results to the terminal.

# results in a table look like the following:
# ┏━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
# ┃ Benchmark ┃ Value ┃ Wall time (ns) ┃ Parameters        ┃
# ┡━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
# │ product   │    20 │           1917 │ {'a': 2, 'b': 10} │
# │ power     │  1024 │            583 │ {'a': 2, 'b': 10} │
# └───────────┴───────┴────────────────┴───────────────────┘
```
The console reporter renders the data as a `rich` table on stdout, while the file reporter writes it to a local file (currently JSON, YAML, CSV, Parquet, and ndjson formats are supported) and afterwards optionally copies it to a remote location using fsspec.
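For illustration, the write-then-copy flow amounts to something like the following (a minimal sketch of the idea, not the actual `FileReporter` implementation; the record dict and the S3 target are invented):

```python
import json

import fsspec

# A hypothetical record, shaped like one row of the table above.
record = {"benchmark": "product", "value": 20, "parameters": {"a": 2, "b": 10}}

# Write the record to a local JSON file...
with open("record.json", "w") as f:
    json.dump(record, f)

# ...then copy it to a remote location via fsspec (here: S3, needs s3fs).
fs = fsspec.filesystem("s3")
fs.put("record.json", "s3://my-bucket/benchmarks/record.json")
```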
I'm currently working to extend the reporting capabilities to databases (streaming records to/from DBs like Postgres or SQLite) and web services (GET/POSTing records in JSON format, e.g. to mlflow), but a coherent design has so far eluded me.
While the obvious choice would be a general read/write interface like this:
```python
import json

# Example implementation, taken from issue 48.
class FileIOReporter:
    def write_record(self, r):
        ...

    def write_record_batched(self, rb):
        ...

    def read_record(self):
        ...

    def open(self, fp):
        ...

    def close(self, fp):
        ...


class JSONFileReporter(FileIOReporter):
    def open(self, fp):
        with open(fp, "r") as f:
            return json.load(f)

    def close(self, fp):
        fp.close()


class YAMLFileReporter(FileIOReporter):
    ...  # same thing as for JSON, but use YAML read/write APIs.
```
One of the problems is that database reads in general require a query to be useful, while file reads don't - that messes up at least the `read()` interface:
```python
class FileReporter:
    def read(self, file: str | os.PathLike[str]) -> BenchmarkRecord:
        ...


class DatabaseReporter:
    # incompatible with the above.
    def read(self, db: DatabaseInstance, query: str) -> BenchmarkRecord:
        ...
```
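One conceivable workaround, sketched below under the assumption that driver-specific arguments can be folded into keyword options (the `**options` convention and the default query are made up, and the placeholder types from above are reused), keeps a single `read()` signature across reporters:

```python
import os
from typing import Any


class UnifiedFileReporter:
    def read(self, source: str | os.PathLike[str], **options: Any) -> BenchmarkRecord:
        # Files need no extra options, so any passed here are ignored.
        ...


class UnifiedDatabaseReporter:
    def __init__(self, db: DatabaseInstance):
        self.db = db

    def read(self, source: str, **options: Any) -> BenchmarkRecord:
        # For databases, ``source`` names a table, and the query travels
        # in the options, e.g. read("runs", query="SELECT * FROM runs").
        query = options.get("query", f"SELECT * FROM {source}")
        ...
```

Whether hiding a required argument in an options dict beats two incompatible signatures is exactly the kind of trade-off to decide here.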
Another way (which the previous issue already alluded to in its title) would be to create an N-way taxonomy of IO, say, console | files | databases | web, and implement one interface for each.
A potential drawback for this is that some tools fit multiple of these categories, like DuckDB, which supports SQL query-based analysis of local or remote files like Parquet/ndjson, so a DuckDB reporter would need access to both file and database methods to be useful.
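To make that drawback concrete, here is a sketch of what a per-category taxonomy could look like (the protocol names and method signatures are hypothetical), with DuckDB straddling two categories:

```python
from typing import Protocol


class BenchmarkRecord: ...  # placeholder type


class FileIO(Protocol):
    def read_file(self, file: str) -> BenchmarkRecord: ...
    def write_file(self, record: BenchmarkRecord, file: str) -> None: ...


class DatabaseIO(Protocol):
    def read_query(self, query: str) -> BenchmarkRecord: ...
    def write_table(self, record: BenchmarkRecord, table: str) -> None: ...


# DuckDB runs SQL directly over local or remote Parquet/ndjson files,
# so a useful reporter has to implement methods from both interfaces.
class DuckDBReporter(FileIO, DatabaseIO):
    ...
```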
Suggested action:

This is to be understood in the context of our current project.

- Decide on a way to handle different kinds of sinks (write) and sources (read) for benchmark data.
- Decide on the IO vs. reporting responsibilities.
- Implement benchmark record IO for a single database and/or web service (top prio: mlflow; see the sketch after this list).
- Explore composability of multiple IOs for a single reporter (like the DuckDB example above).
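For the mlflow item, a first cut could go through the mlflow Python client instead of raw HTTP POSTs. The sketch below assumes a record shaped like `{"params": {...}, "benchmarks": [{"name": ..., "value": ...}]}`; the `MLflowReporter` name and that record shape are made up, but the `mlflow.*` calls exist as written:

```python
import mlflow


class MLflowReporter:
    """Hypothetical reporter logging one mlflow run per benchmark record."""

    def __init__(self, tracking_uri: str):
        mlflow.set_tracking_uri(tracking_uri)

    def write(self, record: dict) -> None:
        # Log the run parameters once, then each benchmark result as a metric.
        with mlflow.start_run():
            mlflow.log_params(record["params"])
            for bm in record["benchmarks"]:
                mlflow.log_metric(bm["name"], bm["value"])
```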
cc @AdrianoKF @schroedk @janwillemkl - feel free to comment, create your own action items, or reach out in case of any unclear points.