
How to represent various dependency metrics #103

nasifimtiazohi opened this issue Jul 19, 2021 · 0 comments
nasifimtiazohi commented Jul 19, 2021

Depdive currently generates metrics under the dimensions below:

U indicates that higher is better.
D indicates that lower is better.
B indicates binary: the value is either 0 or 1.
All metrics are non-negative numbers.

Note that the term better may be debatable in some cases. However, such a binary categorization makes any cumulative representation simpler (see the sketch below).
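
To make the categorization concrete, here is a minimal Python sketch of how a metric and its direction could be modeled; the names (`Direction`, `Metric`) are hypothetical and not part of Depdive's actual code:

```python
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    U = "higher is better"
    D = "lower is better"

@dataclass
class Metric:
    name: str
    value: float            # all metrics are non-negative numbers
    direction: Direction
    binary: bool = False    # B: the value is either 0 or 1

# Illustrative examples from the Usage and Unsafe dimensions below.
downloads = Metric("crates.io downloads", 1_250_000, Direction.U)
forbids_unsafe = Metric("forbids unsafe?", 1, Direction.U, binary=True)
```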

Usage

  1. crates.io downloads U
  2. crates.io dependents U
  3. GitHub stars U
  4. GitHub forks U
  5. GitHub subscribers count U

Activity metrics

  1. Days since last commit D
  2. Days since the last time an issue was opened D
  3. Number of commits in the last six months U
  4. Number of contributors in the last six months U

Code analysis

  1. Total lines of code D
  2. Total Rust lines of code D
  3. Total Rust code pulled in through its own dependencies D
  4. Total Rust code pulled in through its exclusive dependencies, i.e., dependencies introduced only by this package in the whole dep graph D
  5. Has a custom build script? B D
  6. How many of its deps have a custom build script (percentage)? D

Unsafe analysis

  1. Forbids unsafe? B U
  2. Unsafe expressions? D
  3. Unsafe functions? D
  4. Unsafe traits? D
  5. Unsafe impls? D
  6. Unsafe methods? D
  7. How many of its deps use unsafe? D
  8. Total unsafe code pulled in through dependencies [summation of exprs + functions + traits + impls + methods] D (see the sketch after this list)
  9. Number of open issues labelled bug D
  10. Number of open issues labelled security D
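
For metric 8, here is a minimal sketch of the summation across dependencies; the per-dependency counts and field names are illustrative, not Depdive's actual output format:

```python
# Hypothetical per-dependency unsafe counts, broken down as metric 8 defines them.
deps_unsafe = {
    "dep_a": {"exprs": 120, "functions": 4, "traits": 0, "impls": 2, "methods": 6},
    "dep_b": {"exprs": 0, "functions": 0, "traits": 0, "impls": 0, "methods": 0},
}

# Total unsafe code pulled in through dependencies (a D metric).
total_unsafe = sum(sum(counts.values()) for counts in deps_unsafe.values())
print(total_unsafe)  # 132
```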

How to communicate the metrics?

Below are my proposals:

  • For D metrics, apply the inverse transformation 1/(n+1) so that all metrics point in the same direction, i.e., the higher the better. The +1 in the denominator prevents division by zero when n=0. (See the sketch after this list.)
  • Normalize each metric into the [0,1] range.
    This can help us in amalgamating various metrics into one metric or one visual representation (explained later).
    Normalization will be done based on all the existing direct dependencies. For example, the dep with the highest downloads will have a 1 rating for downloads, and the others will have a rating relative to that.
    Now, this method will certainly not work for a project with a single dependency. However, it can be argued that such a use case does not demand a depdive analysis to begin with. Additionally, we can create a mock super_package that will have the best possible values for some metrics, such as D metrics, where we know the best value is 0.
    However, another downside is that some metrics, like downloads, may have long-tailed distributions dominated by a few crates with very high download counts, e.g., libc. In these cases, we can use the log scale of the metrics before normalization.
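
Here is a minimal Python sketch of the proposed pipeline, assuming the raw metric values have already been collected; the function names are hypothetical:

```python
import math

def inverse(n: float) -> float:
    """Flip a D metric so that higher is better; the +1 avoids division by zero at n = 0."""
    return 1.0 / (n + 1.0)

def normalize(values, log_scale=False):
    """Scale one metric into [0, 1] relative to the best-scoring direct dependency."""
    if log_scale:
        # Damp long-tailed metrics (e.g., downloads dominated by libc).
        values = [math.log1p(v) for v in values]
    best = max(values)
    if best == 0:
        return [0.0 for _ in values]  # degenerate case: every dep scores 0
    return [v / best for v in values]

# D metric example: days since last commit for three direct dependencies.
days_since_last_commit = [3, 45, 400]
ratings = normalize([inverse(d) for d in days_since_last_commit])
print(ratings)  # the most recently committed dep rates 1.0, the others relative to it
```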

Now, to amalgamate all these metrics into some easily digestible high-level format, two options come to mind:

  • A weighted sum of the metrics. Cons: determining the weights is a key challenge here. Pros: we can rank the dependencies. Whether anybody actually wants such a ranking can be a valid question, though! Another use case is that for each dimension, we can fix some threshold; if a crate falls below that threshold, someone should probably take a look at what's happening.
  • Radar/spider chart. As our metrics are distributed over four dimensions, a radar chart visually representing how a dependency is doing in all four dimensions can be useful. We'll probably have to introduce some Python tooling for this. (Both options are sketched below.)
    A quick sample from Google Sheets looks like this: (This chart puts all three deps into one, but I was thinking of having an individual chart for each dep so that a quick look is sufficient to tell which dimension a crate may be lacking in.)

[image: sample radar chart from Google Sheets]
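
Here is a minimal sketch of both options in Python, assuming the per-dimension scores have already been normalized into [0,1]; the weights, scores, and crate name are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

dimensions = ["Usage", "Activity", "Code analysis", "Unsafe analysis"]
scores = [0.8, 0.4, 0.6, 0.9]  # illustrative normalized scores for one dep

# Option 1: weighted sum (equal weights here; choosing them is the open question).
weights = [0.25, 0.25, 0.25, 0.25]
overall = sum(w * s for w, s in zip(weights, scores))
print(f"overall score: {overall:.2f}")

# Option 2: one radar chart per dependency.
angles = np.linspace(0, 2 * np.pi, len(dimensions), endpoint=False).tolist()
angles += angles[:1]            # repeat the first angle to close the polygon
closed = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, closed)
ax.fill(angles, closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(dimensions)
ax.set_ylim(0, 1)
ax.set_title("my_crate")        # hypothetical dep name; one chart per dep
fig.savefig("my_crate_radar.png")
```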

When to generate such a report?

  • A weekly run: there are two use cases here: i) having an updated overall dep report each week; ii) if some crate is falling below a threshold(!) in some dimension, highlighting it for developers to decide if a review is needed.
  • Each time a dependency is added: we can post a comment on the PR with these statistics to help decide whether the dependency is welcome or not! (A minimal sketch of the PR-comment call follows.)
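
For the PR-comment flow, here is a minimal Python sketch that posts a report as a comment via GitHub's REST API (PR comments go through the issues comments endpoint); the org, repo, PR number, and report text are placeholders:

```python
import os
import requests

def post_pr_comment(owner: str, repo: str, pr_number: int, body: str) -> None:
    """Post a comment on a PR (the REST API treats PR comments as issue comments)."""
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments",
        headers={
            "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github.v3+json",
        },
        json={"body": body},
    )
    resp.raise_for_status()

# Hypothetical usage from a CI job triggered when a dependency is added.
post_pr_comment("some-org", "some-repo", 123, "depdive report for the new dependency: ...")
```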

cc @bmwill , @metajack, @xvschneider
