The root of the problem is the root: How to get a root at all times in a codebase (or no root at all) #25

Open
pombredanne opened this issue Jun 11, 2021 · 5 comments


@pombredanne
Member

The root problem is this:

  1. ScanCode TK demands a single root named Resource in a Codebase
  2. ScanCode TK creates a fake "virtual_root" name if there is no common root across several scans (--from-json / VirtualCodebase)
  3. ScanCode.io does not have a single root for a project (or rather it has one, which might be the "codebase" directory, but never reports it).
  4. On Windows there is no single root: each "drive" is its own root.
  5. On POSIX there is a single root, but it has no name: it is named only "/" and it can have files and directories as children.
  6. ScanCode TK and CommonCode generally ignore or even strip a leading slash, therefore making the POSIX root moot.
  7. When you "--strip-root" in ScanCode TK (leaving aside the possible loss of data attached to a root dir), it is potentially problematic to read the result back with --from-json because we have no root anymore.
  8. Scanning a subset of paths or a collection of paths in ScanCode is problematic because of this need for a root.
  9. The Codebase classes expect some sort order when creating Resources, which may not make sense in all cases and may be overly restrictive, as we cannot predict this sort order at all times.
  10. Some things are not entirely clear:
  • what is the difference between a Codebase and a root Resource?
  • why could a Codebase not be just a collection of paths? And why do we even need a root?

We need to define a clean and well-specified way to handle this across all projects.
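To make points 4 to 6 concrete, here is a small sketch using only the Python standard library (not ScanCode or CommonCode code). It shows that the POSIX root has an empty name, that Windows paths on different drives have different anchors, and that relative paths collected from several scans may share no common directory at all. The sample paths are made up for illustration.

```python
import posixpath
from pathlib import PurePosixPath, PureWindowsPath

# POSIX: there is one root, "/", and its name is the empty string.
posix_root = PurePosixPath("/")
posix_root_name = posix_root.name
posix_anchor = PurePosixPath("/usr/lib").anchor

# Windows: each drive is its own root, so two paths can have two roots.
anchors = {
    PureWindowsPath(p).anchor
    for p in [r"C:\src\a.py", r"D:\data\b.py"]
}

# Relative paths (e.g. after a leading slash or a root dir was stripped)
# may have no common directory: commonpath returns an empty string.
common = posixpath.commonpath(["samples/a.c", "docs/readme.txt"])

print(repr(posix_root_name), repr(posix_anchor), sorted(anchors), repr(common))
```

In all three cases there is no single named directory that could serve as "the" root, which is the situation the list above describes.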

@pombredanne changed the title from "The root of the problem is the root: How to get a root at all times in resource" to "The root of the problem is the root: How to get a root at all times in a codebase (or no root at all)" on Jun 11, 2021

@Pratikrocks
Contributor

Generally, when we have a single root, the rid of 0 is assigned to the first resource and we treat that resource as the root.

@pombredanne
Member Author

@Pratikrocks I think we likely do not need a root at all

@Pratikrocks
Contributor

@pombredanne I believe we rely on the root to some extent when we walk the codebase: it is basically the resource we visit first.
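For discussion, a walk does not strictly need a root Resource as its starting point. The sketch below is a hypothetical stand-in, not the commoncode walk: it treats a codebase as a plain collection of POSIX-style paths, uses the empty string as an implicit parent for all top-level entries, and yields paths depth-first with siblings sorted by name. The function names `build_children` and `walk` are made up for this example.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def build_children(paths):
    """Map each parent path to the set of its direct child paths.

    The empty string "" is the implicit parent of all top-level entries,
    so no root Resource is ever created.
    """
    children = defaultdict(set)
    for path in paths:
        parts = PurePosixPath(path.strip("/")).parts
        for i in range(len(parts)):
            parent = "/".join(parts[:i])
            child = "/".join(parts[: i + 1])
            children[parent].add(child)
    return children


def walk(paths):
    """Yield every file and directory path depth-first, siblings sorted."""
    children = build_children(paths)

    def _walk(parent):
        for child in sorted(children.get(parent, ())):
            yield child
            yield from _walk(child)

    yield from _walk("")


walked = list(walk(["b/x.c", "a/y.c", "a/sub/z.c", "README"]))
print(walked)
```

With several top-level entries ("README", "a" and "b" here) the walk still has a well-defined first element and a deterministic order, even though nothing plays the role of a root.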

priv-kweihmann pushed a commit to priv-kweihmann/commoncode that referenced this issue Sep 15, 2021
@JonoYang
Member

As stated above, we need to align commoncode's Codebase/VirtualCodebase class and Resource class with scancode.io's Project and CodebaseResource model. I would like this done such that code written to work with commoncode's Codebase and Resource classes can work on scancode.io's Project and CodebaseResource model. This is important in the case of fingerprinting Package Resources using the matchcode-toolkit fingerprinting functions (https://github.com/nexB/purldb/blob/main/matchcode-toolkit/src/matchcode_toolkit/fingerprinting.py#L69) because the fingerprints depend on the order of Resources when we walk a codebase. We want the codebase walk to be the same across both classes.
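To illustrate why the walk order matters, here is a toy example. It is not the matchcode-toolkit fingerprinting algorithm; `toy_fingerprint` is a hypothetical helper that simply hashes resource names in the order they are visited. Two walks over the same resources in a different order give different digests, and they only agree once both sides use the same ordering rule.

```python
import hashlib


def toy_fingerprint(names):
    """Hash names in iteration order (illustration only)."""
    digest = hashlib.sha1()
    for name in names:
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") != ("a", "bc")
    return digest.hexdigest()


# Same resources, visited in two different orders.
walk_a = ["README", "src", "src/a.py", "src/b.py"]
walk_b = ["src", "src/b.py", "src/a.py", "README"]

same_as_is = toy_fingerprint(walk_a) == toy_fingerprint(walk_b)
same_when_sorted = toy_fingerprint(sorted(walk_a)) == toy_fingerprint(sorted(walk_b))
print(same_as_is, same_when_sorted)
```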

One of the issues we need to reconcile is that commoncode creates a root resource for a Codebase whereas scancode.io does not create one for a Project. The root is used to determine where the root of the codebase is and helps us perform depth-first traversal into the Codebase.

An idea would be to treat the Codebase/VirtualCodebase class as the root resource, since the Codebase represents the entirety of the codebase being scanned. This would involve adding the Resource class fields to the Codebase/VirtualCodebase class, so we can return the Codebase like a Resource when we are walking a Codebase or Resource. Resources that are in the root of the project would have the Codebase as the parent Resource.
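A minimal sketch of that idea, under the assumption that the Codebase can simply share the Resource fields by subclassing. `ToyResource` and `ToyCodebase` are hypothetical classes written for this comment, not the commoncode classes, and the real models carry many more fields.

```python
class ToyResource:
    """A file or directory with a parent and sorted, depth-first walk."""

    def __init__(self, name, parent=None, is_file=False):
        self.name = name
        self.parent = parent
        self.is_file = is_file
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def path(self):
        # Children of the codebase (whose path is "") get a root-less path.
        if self.parent is None or not self.parent.path:
            return self.name
        return f"{self.parent.path}/{self.name}"

    def walk(self):
        yield self
        for child in sorted(self.children, key=lambda r: r.name):
            yield from child.walk()


class ToyCodebase(ToyResource):
    """The codebase itself acts as the nameless root resource."""

    def __init__(self):
        super().__init__(name="", parent=None, is_file=False)


codebase = ToyCodebase()
src = ToyResource("src", parent=codebase)
ToyResource("b.py", parent=src, is_file=True)
ToyResource("a.py", parent=src, is_file=True)
ToyResource("README", parent=codebase, is_file=True)

paths = [resource.path for resource in codebase.walk()]
print(paths)
```

The first item of the walk is the codebase itself, with an empty path, and top-level resources report paths without any root segment. Whether the codebase should be yielded or skipped during the walk is one of the choices that would need to match on the scancode.io side.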

Some issues:

  • In scancode, the Resource model is created at runtime, where fields from active plugins are added to either the Codebase or Resource model. How would we add these fields to the Codebase? Would we have to do something similar to what we do with Codebase.attributes, or maybe use a specific Resource object created as an attribute on Codebase?
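One possible shape for the last option, sketched with the standard library only. Here `dataclasses.make_dataclass` stands in for whatever mechanism builds the Resource model at runtime, and the plugin field names, `ResourceModel` and `ToyCodebase` are all assumptions made for this example. The codebase holds one instance of the runtime model as its `root` attribute, so plugin fields are available on it without being copied onto the codebase class.

```python
from dataclasses import field, make_dataclass

# Base fields every resource has (simplified for illustration).
base_fields = [
    ("path", str),
    ("is_file", bool, field(default=False)),
]

# Fields contributed by "active plugins" (hypothetical names).
plugin_fields = [
    ("licenses", list, field(default_factory=list)),
    ("copyrights", list, field(default_factory=list)),
]

# The resource model is only known at runtime, once plugins are loaded.
ResourceModel = make_dataclass("ResourceModel", base_fields + plugin_fields)


class ToyCodebase:
    """Holds resources plus a root object built from the same model."""

    def __init__(self, resource_model):
        self.resource_model = resource_model
        # The root carries the same plugin fields as any other resource.
        self.root = resource_model(path="")
        self.resources = {}

    def add(self, path, **attributes):
        resource = self.resource_model(path=path, **attributes)
        self.resources[path] = resource
        return resource


codebase = ToyCodebase(ResourceModel)
added = codebase.add("src/a.py", is_file=True, licenses=["mit"])
codebase.root.licenses.append("apache-2.0")
print(codebase.root, added)
```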
