
How to represent various dependency metrics #103

nasifimtiazohi opened this issue Jul 19, 2021 · 0 comments
nasifimtiazohi commented Jul 19, 2021

Depdive currently generates metrics under the dimensions below:

U indicates that higher is better.
D indicates that lower is better.
B indicates binary: the value is either 0 or 1.
All metrics are non-negative numbers.

Note that the term better may be debatable in some cases. However, such a binary categorization makes any cumulative representation simpler (see the sketch below).
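
To make the categorization concrete, here is a minimal Python sketch of how a metric and its direction could be modeled; the names (`Direction`, `Metric`) are hypothetical and not part of Depdive's actual code:

```python
from dataclasses import dataclass
from enum import Enum

class Direction(Enum):
    U = "higher is better"
    D = "lower is better"

@dataclass
class Metric:
    name: str
    value: float            # all metrics are non-negative numbers
    direction: Direction
    binary: bool = False    # B: the value is either 0 or 1

# Illustrative examples from the Usage and Unsafe dimensions below.
downloads = Metric("crates.io downloads", 1_250_000, Direction.U)
forbids_unsafe = Metric("forbids unsafe?", 1, Direction.U, binary=True)
```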

Usage

  1. crates.io downloads U
  2. crates.io dependents U
  3. GitHub stars U
  4. GitHub forks U
  5. GitHub subscribers count U

Activity metrics

  1. Days since last commit D
  2. Days since the last time an issue was opened D
  3. Number of commits in the last six months U
  4. Number of contributors in the last six months U

Code analysis

  1. Total lines of code D
  2. Total Rust lines of code D
  3. Total Rust code pulled in through its own dependencies D
  4. Total Rust code pulled in through its exclusive dependencies, i.e., dependencies introduced only by this package in the whole dep graph D
  5. Has a custom build script? B D
  6. How many of its deps have a custom build script (percentage)? D

Unsafe analysis

  1. Forbids unsafe? B U
  2. Unsafe expressions? D
  3. Unsafe functions? D
  4. Unsafe traits? D
  5. Unsafe impls? D
  6. Unsafe methods? D
  7. How many of its deps use unsafe? D
  8. Total unsafe code pulled in through dependencies [summation of exprs + functions + traits + impls + methods] D (see the sketch after this list)
  9. Number of open issues labelled bug D
  10. Number of open issues labelled security D
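
For metric 8, here is a minimal sketch of the summation across dependencies; the per-dependency counts and field names are illustrative, not Depdive's actual output format:

```python
# Hypothetical per-dependency unsafe counts, broken down as metric 8 defines them.
deps_unsafe = {
    "dep_a": {"exprs": 120, "functions": 4, "traits": 0, "impls": 2, "methods": 6},
    "dep_b": {"exprs": 0, "functions": 0, "traits": 0, "impls": 0, "methods": 0},
}

# Total unsafe code pulled in through dependencies (a D metric).
total_unsafe = sum(sum(counts.values()) for counts in deps_unsafe.values())
print(total_unsafe)  # 132
```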

How to communicate the metrics?

Below are my proposals:

  • For D metrics, apply the inverse transformation 1/(n+1) so that all metrics point in the same direction, i.e., the higher the better. The +1 in the denominator prevents division by zero when n=0. (See the sketch after this list.)
  • Normalize each metric into the [0,1] range.
    This can help us in amalgamating various metrics into one metric or one visual representation (explained later).
    Normalization will be done based on all the existing direct dependencies. For example, the dep with the highest downloads will have a 1 rating for downloads, and the others will have a rating relative to that.
    Now, this method will certainly not work for a project with a single dependency. However, it can be argued that such a use case does not demand a depdive analysis to begin with. Additionally, we can create a mock super_package that will have the best possible values for some metrics, such as D metrics, where we know the best value is 0.
    However, another downside is that some metrics, like downloads, may have long-tailed distributions dominated by a few crates with very high download counts, e.g., libc. In these cases, we can use the log scale of the metrics before normalization.
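
Here is a minimal Python sketch of the proposed pipeline, assuming the raw metric values have already been collected; the function names are hypothetical:

```python
import math

def inverse(n: float) -> float:
    """Flip a D metric so that higher is better; the +1 avoids division by zero at n = 0."""
    return 1.0 / (n + 1.0)

def normalize(values, log_scale=False):
    """Scale one metric into [0, 1] relative to the best-scoring direct dependency."""
    if log_scale:
        # Damp long-tailed metrics (e.g., downloads dominated by libc).
        values = [math.log1p(v) for v in values]
    best = max(values)
    if best == 0:
        return [0.0 for _ in values]  # degenerate case: every dep scores 0
    return [v / best for v in values]

# D metric example: days since last commit for three direct dependencies.
days_since_last_commit = [3, 45, 400]
ratings = normalize([inverse(d) for d in days_since_last_commit])
print(ratings)  # the most recently committed dep rates 1.0, the others relative to it
```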

Now, to amalgamate all these metrics into some easily digestible high-level format, two options come to mind:

  • A weighted sum of the metrics. Cons: determining the weights is a key challenge here. Pros: we can rank the dependencies. Whether anybody actually wants such a ranking can be a valid question, though! Another use case is that for each dimension, we can fix some threshold; if a crate falls below that threshold, someone should probably take a look at what's happening.
  • Radar/spider chart. As our metrics are distributed over four dimensions, a radar chart visually representing how a dependency is doing in all four dimensions can be useful. We'll probably have to introduce some Python tooling for this. (Both options are sketched below.)
    A quick sample from Google Sheets looks like this: (This chart puts all three deps into one, but I was thinking of having an individual chart for each dep so that a quick look is sufficient to tell which dimension a crate may be lacking in.)

[image: sample radar chart from Google Sheets]
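
Here is a minimal sketch of both options in Python, assuming the per-dimension scores have already been normalized into [0,1]; the weights, scores, and crate name are illustrative:

```python
import numpy as np
import matplotlib.pyplot as plt

dimensions = ["Usage", "Activity", "Code analysis", "Unsafe analysis"]
scores = [0.8, 0.4, 0.6, 0.9]  # illustrative normalized scores for one dep

# Option 1: weighted sum (equal weights here; choosing them is the open question).
weights = [0.25, 0.25, 0.25, 0.25]
overall = sum(w * s for w, s in zip(weights, scores))
print(f"overall score: {overall:.2f}")

# Option 2: one radar chart per dependency.
angles = np.linspace(0, 2 * np.pi, len(dimensions), endpoint=False).tolist()
angles += angles[:1]            # repeat the first angle to close the polygon
closed = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, closed)
ax.fill(angles, closed, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(dimensions)
ax.set_ylim(0, 1)
ax.set_title("my_crate")        # hypothetical dep name; one chart per dep
fig.savefig("my_crate_radar.png")
```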

When to generate such a report?

  • A weekly run: there are two use cases here: i) having an updated overall dep report each week; ii) if some crate is falling below a threshold(!) in some dimension, highlighting it for developers to decide if a review is needed.
  • Each time a dependency is added: we can post a comment on the PR with these statistics to help decide whether the dependency is welcome or not! (A minimal sketch of the PR-comment call follows.)
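
For the PR-comment flow, here is a minimal Python sketch that posts a report as a comment via GitHub's REST API (PR comments go through the issues comments endpoint); the org, repo, PR number, and report text are placeholders:

```python
import os
import requests

def post_pr_comment(owner: str, repo: str, pr_number: int, body: str) -> None:
    """Post a comment on a PR (the REST API treats PR comments as issue comments)."""
    resp = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/issues/{pr_number}/comments",
        headers={
            "Authorization": f"token {os.environ['GITHUB_TOKEN']}",
            "Accept": "application/vnd.github.v3+json",
        },
        json={"body": body},
    )
    resp.raise_for_status()

# Hypothetical usage from a CI job triggered when a dependency is added.
post_pr_comment("some-org", "some-repo", 123, "depdive report for the new dependency: ...")
```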

cc @bmwill , @metajack, @xvschneider
