---
title: "Metrics As Scores [![DOI](https://zenodo.org/badge/524333119.svg)](https://zenodo.org/badge/latestdoi/524333119) [![status](https://joss.theoj.org/papers/eb549efe6c0111490395496c68717579/status.svg)](https://joss.theoj.org/papers/eb549efe6c0111490395496c68717579) [![codecov](https://codecov.io/github/MrShoenel/metrics-as-scores/branch/master/graph/badge.svg?token=HO1GYXVEUQ)](https://codecov.io/github/MrShoenel/metrics-as-scores)"
top-level-division: section
jupyter: python3
bibliography:
- refs.bib
- paper/refs.bib
- src/metrics_as_scores/datasets/known-datasets.bib
format:
pdf:
title: "Metrics As Scores [![DOI](./zenodo.eps)](https://zenodo.org/badge/latestdoi/524333119)"
table-of-contents: true
toc-depth: 3
papersize: a4
documentclass: scrartcl
gfm:
citeproc: true
toc: true
toc-depth: 3
number-sections: true
---
```{python}
#| echo: false
from IPython.display import display, Markdown
from pathlib import Path
from sys import path
from os import getcwd
root = Path(getcwd())
path.append(str(root.joinpath('./src').resolve()))
from metrics_as_scores.__version__ import __version__ as MAS_VERSION
dm = lambda s: display(Markdown(s))
```
-----------------
```{python}
#| echo: false
dm(f'''
**Please Note**: ___Metrics As Scores___ (`MAS`) changed considerably between versions [**`v1.0.8`**](https://github.com/MrShoenel/metrics-as-scores/tree/v1.0.8) and **`v2.x.x`**.
The current version is `v{MAS_VERSION}`.
'''.strip())
```
As of version **`v2.x.x`**, it offers the following new features:
- [Text-based User Interface (TUI)](#text-based-user-interface-tui)
- Proper documentation and testing
- New version on PyPI. Install the package and run the command-line interface by typing **`mas`**!
[Metrics As Scores Demo.](https://user-images.githubusercontent.com/5049151/219892077-58854478-b761-4a3d-9faf-2fe46c122cf5.webm)
-----------------
This repository contains the data and scripts needed for the application __`Metrics as Scores`__; check out <https://mas.research.hönel.net/>.
```{python}
#| echo: false
dm(f'''
This package accompanies the paper entitled "_Contextual Operationalization of Metrics As Scores: Is My Metric Value Good?_" [@honel2022mas].
It seeks to answer the question of whether the domain in which a software metric was captured matters.
It enables the user to compare domains and to understand their differences.
In order to answer the question of whether a metric value is actually good, we need to transform it into a **score**.
Scores are normalized **and rectified** distances that can be compared in an apples-to-apples manner across domains.
The same metric value might be good in one domain, while it is not in another.
To borrow an example from the domain of software: It is much more acceptable (or common) to have large applications (in terms of lines of code) in the domains of games and databases than it is for the domains of IDEs and SDKs.
Given an *ideal* value for a metric (which may also be user-defined), we can transform observed metric values to distances from that value and then use the cumulative distribution function to map distances to scores.
'''.strip())
```
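To make the last step concrete, here is a minimal, self-contained sketch of how the same metric value can receive very different scores in two groups. It is not the package's actual implementation; the two groups, the ideal values, and the use of the complementary empirical CDF are illustrative assumptions only.
```python
# Minimal sketch (not the package's implementation): map a metric value to a
# score via its rectified distance from an ideal value and the group's ECDF.
import numpy as np

def score(observed: np.ndarray, value: float, ideal: float) -> float:
    """Score of `value` within one group's sample, given an ideal value."""
    dists = np.abs(observed - ideal)  # rectified distances of the sample
    d = np.abs(value - ideal)         # distance of the value in question
    ecdf = np.mean(dists <= d)        # empirical CDF evaluated at that distance
    return float(1.0 - ecdf)          # larger distance -> lower score

rng = np.random.default_rng(1)
# Two hypothetical groups (e.g., lines of code in two domains); values are made up.
sdk = rng.normal(loc=20_000, scale=5_000, size=500)
games = rng.normal(loc=80_000, scale=20_000, size=500)

# The same metric value receives very different scores in the two groups:
print(score(sdk, 60_000, ideal=float(np.median(sdk))))      # close to 0 (uncommon)
print(score(games, 60_000, ideal=float(np.median(games))))  # noticeably higher
```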
--------------
# Usage
You may install Metrics As Scores directly from PyPI.
For users who wish to [**contribute**](https://github.com/MrShoenel/metrics-as-scores/blob/master/CONTRIBUTING.md) to Metrics As Scores, a [development setup](#development-setup) is recommended.
In either case, after the installation, [**you have access to the text-based user interface**](#text-based-user-interface-tui).
```shell
# Installation from PyPI:
pip install metrics-as-scores
```
You can **bring up the TUI** simply by typing the following after installing or cloning the repo (see next section for more details):
```shell
mas
```
## Text-based User Interface (TUI)
Metrics As Scores features a text-based command line user interface (TUI).
It offers a number of workflows/wizards that help you work and interact with the application.
There is no need to modify any source code if you want to do one of the following:
- Show installed datasets
- Show the list of known datasets available online that can be downloaded
- Download and install a known or existing dataset
- Create your own dataset to be used with Metrics As Scores
- Fit parametric distributions for your own dataset
- Pre-generate distributions for usage in the [**Web Application**](#web-application)
- Bundle your own dataset so it can be published
- Run the local, interactive Web Application using a selected dataset
![Metrics As Scores Text-based User Interface (TUI).](./TUI.png "Metrics As Scores Text-based User Interface (TUI).")
## Web Application
Perhaps the main feature of Metrics As Scores is the Web Application.
It can be run directly and locally from the TUI using a selected dataset (you may download a known dataset or use your own).
The Web Application allows you to visually inspect each *feature* across all the defined *groups*.
It features the PDF/PMF, CDF and CCDF, as well as the PPF for each feature in each group.
It offers five different principal types of densities: Parametric, Parametric (discrete), Empirical, Empirical (discrete), and (approximate) Kernel Density Estimation.
The Web Application includes a detailed [Help section](#) that should answer most of your questions.
![Metrics As Scores Interactive Web Application.](./WebApp.png "Metrics As Scores Interactive Web Application.")
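To clarify this terminology, the following minimal sketch contrasts a parametric fit, an empirical CDF, and a kernel density estimate for a single feature. It uses SciPy on a made-up sample and does not mirror the web application's code.
```python
# Illustrative sketch of three of the density types for one feature; the
# sample and the choice of a Gamma fit are assumptions for this example only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sample = rng.gamma(shape=2.0, scale=3.0, size=1_000)

# Parametric: fit a candidate distribution and use its CDF.
a, loc, scale = stats.gamma.fit(sample)
parametric_cdf = lambda x: stats.gamma.cdf(x, a, loc=loc, scale=scale)

# Empirical: step-function CDF built directly from the sample.
empirical_cdf = lambda x: float(np.mean(sample <= x))

# Kernel Density Estimation: smooth, non-parametric estimate of the PDF.
kde = stats.gaussian_kde(sample)

x = 5.0
print(parametric_cdf(x), empirical_cdf(x), kde(x)[0])
```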
## Development Setup
This project was developed using and requires Python `>=3.10`.
The development documentation can be found at <https://mrshoenel.github.io/metrics-as-scores/>.
Steps:
1. Clone the Repository,
2. Set up a virtual environment,
3. Install packages.
### Setting Up a Virtual Environment
It is recommended to use a virtual environment.
To use a virtual environment, follow these steps (Windows specific; activation of the environment might differ).
```shell
virtualenv --python=C:/Python310/python.exe venv # Use specific Python version for virtual environment
venv/Scripts/activate
```
Here is a Linux example that assumes you have Python `3.10` installed (this may also require installing `python3.10-venv` and/or `python3.10-dev`):
```shell
python3.10 -m venv venv
source venv/bin/activate # Linux
```
### Installing Packages
The project is managed with `Poetry`.
To install the required packages, simply run the following.
```shell
venv/Scripts/activate
# First, update pip:
(venv) C:\metrics-as-scores>python -m pip install --upgrade pip
# Then, install Poetry v1.3.2 using pip:
(venv) C:\metrics-as-scores>pip install poetry==1.3.2
# Install the project and its dependencies
(venv) C:\metrics-as-scores> poetry install
```
The same in Linux:
```shell
source venv/bin/activate # Linux
(venv) ubuntu@vm:/tmp/metrics-as-scores$ python -m pip install --upgrade pip
(venv) ubuntu@vm:/tmp/metrics-as-scores$ pip install poetry==1.3.2
(venv) ubuntu@vm:/tmp/metrics-as-scores$ poetry install
```
### Running Tests
Tests are run using `poethepoet`:
```shell
# Runs the tests and prints coverage
(venv) C:\metrics-as-scores>poe test
```
You can also generate coverage reports:
```shell
# Writes reports to the local directory htmlcov
(venv) C:\metrics-as-scores>poe cov
```
--------------
# Example Usage
_Metrics As Scores_ can be thought of as an *interactive*, *multiple-ANOVA* analysis and explorer.
The analysis of variance (ANOVA; @chambers2017statistical) is usually used to analyze the differences among *hypothesized* group means for a single *feature*.
An ANOVA might be used to estimate the goodness-of-fit of a statistical model.
Beyond ANOVA, `MAS` seeks to answer the question of whether a sample of a certain quantity (feature) is more or less common across groups.
For each group, we can determine what might constitute a common/ideal value, and how distant the sample is from that value.
This is expressed in terms of a percentile (a standardized scale of `[0,1]`), which we call **score**.
## Concrete Example Using the Qualitas.class Corpus Dataset
The notebook [`notebooks/Example-webapp-qcc.ipynb`](https://github.com/MrShoenel/metrics-as-scores/blob/master/notebooks/Example-webapp-qcc.ipynb)
holds a concrete example for using the web application to interactively obtain **scores**.
In this example, we create a hypothetical application that ought to be in the application domain *SDK*.
Using a concrete metric, *Number of Packages*, we find out that our hypothetical new SDK application scores poorly for what it is intended to be.
This example illustrates the point that software metrics, when captured out of context, are meaningless [@gil2016software].
For example, typical values for complexity metrics are vastly different, depending on the type of application.
We find that, for example, applications of type SDK have a much lower *expected* complexity compared to Games (`1.9` vs. `3.1`) [@honel2022mas].
Software metrics are often used in software quality models.
However, without knowledge of the application's context (here: its domain), the quality deduced by such models is at least misleading, if not completely off.
This becomes apparent if we examine how an application's complexity scores across certain domains.
Since there are many software metrics that are captured simultaneously, we can also compare domains in their entirety:
How many metrics are statistically significantly different from each other?
Is there a set of domains that are not distinguishable from each other?
Are there metrics that are always different across domains and must be used with care?
In this example, we use a known and downloadable dataset [@dataset_qcc].
It is based on software metrics and application domains of the "Qualitas.class corpus" [@terra2013qualitas; @tempero2010qualitas].
## Concrete Example Using the Iris Dataset
The notebook [`notebooks/Example-create-own-dataset.ipynb`](https://github.com/MrShoenel/metrics-as-scores/blob/master/notebooks/Example-create-own-dataset.ipynb)
holds a concrete example for creating/importing/using one's own dataset.
Although all necessary steps can be achieved using the **TUI**, this notebook demonstrates a complete example of implementing this in code.
## Diamonds Example
The diamonds dataset [@ggplot2] holds prices of over 50,000 round cut diamonds.
It records a number of attributes for each diamond, such as its price, length, depth, or weight.
The dataset, however, features three quality attributes: the quality of the cut, the clarity, and the color.
Suppose we are interested in examining properties of diamonds of the highest quality only, across colors.
Therefore, we select only those diamonds from the dataset that have an *ideal* cut and the best (*IF*) clarity.
Now only the color quality gives a context to each diamond and its attributes (i.e., diamonds are now *grouped* by color).
This setup allows us to examine differences across differently colored diamonds.
For example, there are considerable differences in price.
We find that only the group of diamonds of the best color is significantly different from the other groups.
This example is available as a downloadable dataset [@dataset_diamonds-ideal-if].
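For illustration, the selection described above can be sketched as follows. This assumes pandas and seaborn (whose bundled copy of the diamonds dataset is used here); the downloadable MAS dataset itself is prepared via the TUI wizard, not by this snippet.
```python
# Rough sketch of the selection described above, using seaborn's bundled copy
# of the diamonds dataset; the published MAS dataset is prepared separately.
import seaborn as sns

diamonds = sns.load_dataset("diamonds")

# Keep only diamonds with an ideal cut and the best (IF) clarity, so that
# color remains the only quality attribute (i.e., the grouping context).
best = diamonds[(diamonds["cut"] == "Ideal") & (diamonds["clarity"] == "IF")]

# Compare the price distribution across the color groups.
print(best.groupby("color")["price"].describe())
```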
--------------
# Datasets
Metrics As Scores can use existing datasets as well as your own. Keep reading to learn how.
## Use Your Own
Metrics As Scores has a built-in wizard that lets you import your own dataset!
There is another wizard that bundles your dataset so that it can be shared with others.
You may [**contribute your dataset**](https://github.com/MrShoenel/metrics-as-scores/blob/master/CONTRIBUTING.md) so we can add it to the curated list of known datasets (see next section).
If you do not have your own dataset, you can use the built-in wizard to download any of the known datasets, too!
Note that Metrics As Scores provides all the tools necessary to create a publishable dataset.
For example, it carries out the common statistical tests:
- ANOVA [@chambers2017statistical]: Analysis of variance of your data across the available groups.
- Tukey's Honest Significance Test (TukeyHSD; @Tukey1949): This test is used to gain insights into the results of an ANOVA test. While the former only allows obtaining the amount of corroboration for the null hypothesis, TukeyHSD performs all pairwise comparisons (for all possible combinations of any two groups).
- Two-sample T-test: Compares the means of two samples to give an indication of whether they appear to come from the same distribution. Again, this is useful for comparing groups.
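A minimal sketch of these three tests, run with SciPy on made-up groups, is shown below. It only illustrates the tests themselves; the reports MAS generates are produced by the dataset wizard. Note that `scipy.stats.tukey_hsd` requires a recent SciPy version.
```python
# Sketch of the three tests above with SciPy, on made-up groups; MAS runs and
# reports these automatically when you create a dataset.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(10.0, 2.0, size=200)
g2 = rng.normal(10.5, 2.0, size=200)
g3 = rng.normal(13.0, 2.0, size=200)

# ANOVA: are the group means plausibly all equal?
print(stats.f_oneway(g1, g2, g3))

# Tukey's HSD (recent SciPy versions): all pairwise group comparisons.
print(stats.tukey_hsd(g1, g2, g3))

# Two-sample t-test: compare the means of two specific groups.
print(stats.ttest_ind(g1, g2))
```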
It also creates an **automatic report** based on these tests that you can simply render into a PDF using Quarto.
A publishable dataset must contain parametric fits and pre-generated densities (please check the wizard for these two).
Metrics As Scores can fit approximately **120** continuous and discrete random variables using `Pymoo` [@pymoo].
Note that Metrics As Scores also automatically carries out a number of goodness-of-fit tests.
Which tests apply also depends on the data (for example, not every test is valid for discrete data, such as the two-sample KS test).
These tests are then used to select the best-fitting random variable for display in the web application.
* Cramér-von Mises- [@cramer1928] and Kolmogorov–Smirnov one-sample [@Stephens1974] tests: After fitting a distribution, the sample is tested against the fitted parametric distribution. Since the fitted distribution cannot usually accommodate all of the sample's subtleties, the test will indicate whether the fit is acceptable or not.
* Cramér-von Mises- [@Anderson1962], Kolmogorov–Smirnov-, and Epps–Singleton [@Epps1986] two-sample tests: After fitting, we create a second sample by uniformly sampling from the `PPF`. Then, both samples can be used in these tests. The Epps–Singleton test is also applicable for discrete distributions.
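As a rough sketch, these tests can be invoked with SciPy as shown below. The sample and the Normal fit are made up for illustration; this is not the exact fitting procedure MAS uses.
```python
# Sketch of the goodness-of-fit tests with SciPy on a made-up sample; MAS
# applies them automatically and only uses tests valid for the data type.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
sample = rng.normal(loc=5.0, scale=2.0, size=500)

# Fit a parametric distribution to the sample (here: a Normal distribution).
loc, scale = stats.norm.fit(sample)
fitted = stats.norm(loc=loc, scale=scale)

# One-sample tests: the sample against the fitted parametric CDF.
print(stats.cramervonmises(sample, fitted.cdf))
print(stats.kstest(sample, fitted.cdf))

# Two-sample tests: the sample against a second sample drawn via the fitted PPF.
second = fitted.ppf(rng.uniform(size=sample.size))
print(stats.cramervonmises_2samp(sample, second))
print(stats.ks_2samp(sample, second))
print(stats.epps_singleton_2samp(sample, second))
```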
## Known Datasets
The following is a curated list of known, publicly available datasets that can be used with Metrics As Scores.
These datasets can be downloaded using the text-based user interface.
```{python}
#| echo: false
from metrics_as_scores.distribution.distribution import KnownDataset
from metrics_as_scores.cli.helpers import get_known_datasets, format_file_size
def format_dataset(ds: KnownDataset) -> str:
    return f'''
- {ds["name"]} [@dataset_{ds["id"]}]. {format_file_size(num_bytes=ds["size_extracted"], digits=0)}. <{ds["info_url"]}>.
'''.strip()
dm('\n'.join([format_dataset(ds) for ds in get_known_datasets(use_local_file=False)]))
```
--------------
# Personalizing the Web Application
The web application _"[Metrics As Scores](#)"_ is located in the directory [`src/metrics_as_scores/webapp/`](https://github.com/MrShoenel/metrics-as-scores/blob/master/src/metrics_as_scores/webapp/).
The app itself has three vertical blocks: a header, the interactive part, and a footer.
Header and footer can be easily edited by modifying the files [`src/metrics_as_scores/webapp/header.html`](https://github.com/MrShoenel/metrics-as-scores/blob/master/src/metrics_as_scores/webapp/header.html) and [`src/metrics_as_scores/webapp/footer.html`](https://github.com/MrShoenel/metrics-as-scores/blob/master/src/metrics_as_scores/webapp/footer.html).
Note that when you create your own dataset, you get to add sections to the header and footer using two HTML fragments.
This is recommended over modifying the web application directly.
If you want to change the title of the application, you will have to modify the file [`src/metrics_as_scores/webapp/main.py`](https://github.com/MrShoenel/metrics-as-scores/blob/master/src/metrics_as_scores/webapp/main.py) at the very end:
```python
# Change this line to your desired title.
curdoc().title = "Metrics As Scores"
```
**Important**: If you modify the web application, you must always maintain two links: one to <https://mas.research.hönel.net/> and one to this repository, that is, <https://github.com/MrShoenel/metrics-as-scores>.
# References {-}