FAQ
The author and contributors to lice-comb are not lawyers, and neither they nor lice-comb itself provides legal advice. This is nothing more than logic to assist in finding license information. If you need a primer on the legal aspects of open source software, the author has found the Blue Oak Council to be a useful resource.
Q. lice-comb is not detecting a license for something that I know has a license, or is getting the license wrong. What should I do?
A. Please raise an issue here!
Q. How comprehensive and reliable is the license detection logic?
A. While it makes a pretty good effort to find license information and detect which license(s) it represents (especially in the Clojure/Java/Maven ecosystems), this library is no substitute for a forensic software license compliance tool or service (such as ort, fossology, fossa, SourceAuditor, nexB/scancode etc.). It is, however, stateless (does not require databases or other infrastructure), substantially cheaper than some of those services (i.e. free), substantially easier to use (it's just a library, not an entire application that has to be downloaded and installed), more optimised for the Clojure/Java/Maven packaging ecosystem, and substantially better supported through existing Clojure tooling.
Q. How does it work?
A. At its core, lice-comb provides license detection based on 3 fundamentally different types of input:
- A license name
- A license URI
- A license text
License names are matched via a 3 step process:
- Checking whether the name is an SPDX license expression
- Checking whether the name is a URI, and if so performing license URI matching on it
- Attempting to construct an SPDX license expression by parsing the name (this is where most of the "secret sauce" of lice-comb is)
License URIs are matched via a 2 step process:
- Checking whether the URI is contained in the SPDX License List or SPDX Exception List
- Attempting to retrieve the plain text content of the URI, and if successful performing license text matching on it
  - Note that if the URL references an HTML resource, lice-comb will convert the returned HTML to plain text and then perform license text matching on that
License texts are matched using the SPDX Matching Guidelines, specifically as implemented by the Spdx-Java-Library. Note that this form of matching is computationally expensive, and tends to be slow, so it is a last resort for lice-comb.
lice-comb also wraps these fundamental primitive operations in a series of convenience functions that support license detection within files, directories, ZIP archives, Maven GAVs and pom.xml files, and tools.deps and Leiningen style dependencies (Clojure data structures). However, these convenience functions all ultimately boil down to one or more of the 3 fundamental operations (license name matching, license URI matching, or license text matching).
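As a rough illustration of the name matching order described above, here is a minimal Python sketch. It is not lice-comb's implementation (lice-comb is a Clojure library): every function name and lookup table below is hypothetical, the "SPDX list" is a three-entry stand-in, and the URI step omits the fallback of retrieving the URI's content and text-matching it.

```python
import re
from typing import Optional, Set

# Hypothetical, tiny stand-ins for the SPDX License List and its URIs.
KNOWN_IDS = {"Apache-2.0", "MIT", "EPL-2.0"}
KNOWN_URIS = {"https://www.apache.org/licenses/LICENSE-2.0": "Apache-2.0"}


def as_spdx_expression(name: str) -> Optional[Set[str]]:
    """Step 1 (toy version): accept the name if it is already a listed identifier."""
    return {name} if name in KNOWN_IDS else None


def match_uri(name: str) -> Optional[Set[str]]:
    """Step 2 (toy version): if the name looks like a URI, look it up in the URI table."""
    if not re.match(r"^https?://", name):
        return None
    identifier = KNOWN_URIS.get(name)
    return {identifier} if identifier else None


def parse_name(name: str) -> Optional[Set[str]]:
    """Step 3 (toy version): rough free-text parsing of names like 'Apache License, Version 2.0'."""
    m = re.search(r"apache.*?(\d\.\d)", name, re.IGNORECASE)
    return {f"Apache-{m.group(1)}"} if m else None


def match_name(name: str) -> Set[str]:
    """Try each step in order and return the first non-empty result."""
    for step in (as_spdx_expression, match_uri, parse_name):
        result = step(name.strip())
        if result:
            return result
    return set()
```

The point of the ordering is that the cheapest and most reliable checks run first, and free-text parsing is only attempted once they have failed.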
Q. Why does lice-comb return sets of SPDX license expressions, instead of a single expression that includes conjunctions (AND, OR)?
A. Because there are cases where lice-comb has no way to infer whether AND or OR is intended by the author of the project, and rather than guess, it provides the detected license expressions as a set. Following the implementation of issue #54, this is expected to be a relatively rare case.
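To make the distinction concrete, here is a small Python sketch of the idea (not lice-comb's code; the function name and the two example inputs are assumptions made for illustration). A declaration that already states the operator stays a single expression, while independently detected declarations are kept as separate set members rather than being joined with a guessed AND or OR.

```python
from typing import Iterable, Set


def collect_expressions(detected: Iterable[str]) -> Set[str]:
    """Gather independently detected expressions without guessing an operator."""
    return {expr.strip() for expr in detected if expr.strip()}


# One declaration that states the relationship explicitly: a single expression.
explicit = collect_expressions(["Apache-2.0 OR MIT"])

# Two separate declarations with no stated relationship: two set members.
ambiguous = collect_expressions(["Apache-2.0", "MIT"])
```

A result with more than one member is therefore a signal that the relationship between the licenses was not declared, and needs a human to resolve.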
Q. I'm seeing LicenseRef-lice-comb-[other text] or AdditionRef-lice-comb-[other-text] SPDX license identifiers in the returned data - what are those?
A. SPDX offers "escape hatch" mechanisms for licenses and exceptions that aren't in the official SPDX lists, and these are called LicenseRefs (for non-standard licenses) and AdditionRefs (for non-standard license exceptions). These allow developers to create custom identifiers for licenses and exceptions that haven't been standardised in the official SPDX license list or exception list; for example a common use is to reference commercial/proprietary licenses used by non-open-source software.
lice-comb uses LicenseRefs and AdditionRefs for things it finds that are not in the SPDX license or exception lists, including:
- Public Domain attestations (LicenseRef-lice-comb-PUBLIC-DOMAIN). "Public Domain" does not have an SPDX license identifier (for legal reasons described here), and so lice-comb falls back on using a LicenseRef when it finds license declarations that attempt to place the associated work in the public domain.
- Commercial/Proprietary/All Rights Reserved attestations (LicenseRef-lice-comb-PROPRIETARY-COMMERCIAL). lice-comb groups all of these types of licenses together into a single LicenseRef, and leaves the specifics of what that means to the user to figure out (there are effectively an infinite variety of proprietary licenses out there, and lice-comb is primarily intended to be used with SPDX-listed open source licenses).
- License and exception names that can't be identified by lice-comb (LicenseRef-lice-comb-UNIDENTIFIED[-optional-suffix], AdditionRef-lice-comb-UNIDENTIFIED[-optional-suffix]). This means that the logic found a license or exception name, but couldn't figure out which SPDX listed license(s) or exception(s) it represented. If you see these and think lice-comb should have detected an SPDX listed license or exception, please raise an issue here, including the full LicenseRef or AdditionRef value, and ideally also the source (license name, file, URL, pom file, artifact coordinate or whatever) that resulted in that output, as this may be a bug in lice-comb's license name parsing logic.
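If you post-process the returned expressions yourself, the identifiers listed above can be picked out by prefix. The following Python sketch shows one way to do that; the function names, category labels, and the `-xyz` suffix in the usage example are hypothetical, and the tokeniser is deliberately naive (it does not validate SPDX syntax).

```python
import re
from typing import Dict, Iterable, List

# Identifier strings as described in the list above.
PUBLIC_DOMAIN = "LicenseRef-lice-comb-PUBLIC-DOMAIN"
PROPRIETARY = "LicenseRef-lice-comb-PROPRIETARY-COMMERCIAL"
UNIDENTIFIED_PREFIXES = (
    "LicenseRef-lice-comb-UNIDENTIFIED",
    "AdditionRef-lice-comb-UNIDENTIFIED",
)
OPERATORS = {"AND", "OR", "WITH"}


def identifiers_in(expression: str) -> List[str]:
    """Naively split an expression into identifiers, dropping operators and parentheses."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return [t for t in tokens if t.upper() not in OPERATORS]


def classify(identifier: str) -> str:
    """Map one identifier to a coarse category (labels are illustrative only)."""
    if identifier == PUBLIC_DOMAIN:
        return "public-domain"
    if identifier == PROPRIETARY:
        return "proprietary-commercial"
    if identifier.startswith(UNIDENTIFIED_PREFIXES):
        return "unidentified"
    return "spdx-listed-or-other"


def review_report(expressions: Iterable[str]) -> Dict[str, List[str]]:
    """Group every lice-comb-specific identifier by category for manual review."""
    report: Dict[str, List[str]] = {}
    for expression in expressions:
        for identifier in identifiers_in(expression):
            category = classify(identifier)
            if category != "spdx-listed-or-other":
                report.setdefault(category, []).append(identifier)
    return report
```

For example, `review_report(["GPL-2.0-only WITH AdditionRef-lice-comb-UNIDENTIFIED-xyz"])` groups that AdditionRef under "unidentified", which is the case worth raising an issue about.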
Q. Are files containing SPDX documents read by lice-comb?
A. Not yet. While I consider this "low-hanging fruit" to implement, the sad reality is that the use of such documents is not widespread in the JVM ecosystem so the benefit is also low. Please 👍 or comment on that issue if you'd like to see this implemented.
Q. It looks like lice-comb is multi-threaded internally. Is that correct?
A. Yes. lice-comb's logic has no shared mutable state, and reads from a large number of input sources that require I/O (from disk, from the internet), so it's a great use case for multi-threading. On JVMs that support virtual threads (JDK 21+), lice-comb will use those instead of platform threads for even better resource usage (being I/O bound, most of lice-comb's threads are parked waiting for I/O for much of their elapsed runtime, and virtual threads have notable advantages over platform threads in this context).
Q. Does lice-comb monkey patch virtual threads into Clojure core, which might affect my code?
A. No. lice-comb uses a micro library called embroidery that is opt-in (doesn't mess with core Clojure or your own code), and supports graceful degradation on JVMs that don't support virtual threads.
Q. Why should I license my project?
A. tl;dr - to protect yourself and to protect downstream users, and make it clear how both are achieved. Simon Phipps wrote in detail about this a decade ago, and more recently JohnnyJayJay wrote an excellent blog post on the topic.
Q. What license(s) should I choose for my project?
A. This is well beyond the scope of lice-comb, but there are sites that can help, including:
- choosealicense
- The EU's JLA
- The FSF
- GitHub
- fossa
- ...and many others...
With that said, the author of lice-comb is of the opinion that open source licenses can be crudely grouped into these categories:
- Permissive licenses that place minimal restrictions on usage (e.g. Apache-2.0, MIT, BSD-3-Clause, CC0, Unlicense, etc.)
- Weak copyleft licenses that place restrictions on use of the project, but without "infecting" other code that is not part of the project (e.g. MPL-2.0, EPL-2.0, LGPL-3.0, etc.)
- Strong copyleft licenses that place restrictions on use of the project AND any code that happens to "link" with the project (e.g. GPL-3.0)
- Network copyleft licenses that place restrictions on use of the project AND any code that might be co-hosted with the project as part of a network accessible service (e.g. AGPL-3.0, EUPL-1.2)
Picking a license is first and foremost a choice of picking one of these categories, and for each one the author likes these specific license choices:
- Permissive license: Apache-2.0 - widely adopted, has better patent protections than other popular licenses in this category (e.g. MIT), and is compatible with downstream GPL-3.0 licensed code
- Weak copyleft license: MPL-2.0 - widely adopted, compatible with downstream Apache-2.0 and GPL-3.0 licensed code (unlike other popular licenses in this category e.g. LGPL-3.0, EPL-2.0)
- Strong copyleft license: GPL-3.0 - widely adopted, not many alternatives
- Network copyleft license: AGPL-3.0 - widely adopted, not many alternatives
The author of lice-comb also strongly discourages anyone from "placing their code in the public domain". This is a nebulous legal concept that varies from non-existent (e.g. in continental Europe) to vague (e.g. in the United States). If you want to achieve the equivalent of placing your code in the public domain, please choose a "public domain equivalent" permissive license instead (e.g. 0BSD, CC0, Unlicense, MIT-0 etc.), or better yet license your code with Apache-2.0 thereby protecting downstream users from patent trolls (something 0BSD, CC0, Unlicense, MIT-0 etc. don't do).
Q. How should I configure my Clojure project to make it easy for lice-comb (and other tools) to correctly detect my project's license(s)?
A. There are several things you should do:
- If you're deploying your project (e.g. to a Maven artifact repository - Clojars, Maven Central, etc.), ensure your pom.xml file includes a single <licenses><license><name> element and place an SPDX License Expression in that element that describes your chosen license(s). Note that Maven Central has long required licensing information in POMs published to it, and Clojars recently adopted this change too.
- If you wish to include a <licenses><license><url> element in your POM (which is completely optional), and have chosen a single license for your project, have the URL point to the official URL for that license as found in the SPDX License List. If you've multi-licensed your project (i.e. using an SPDX license expression), have the URL point to the plain text LICENSE file in your source repository (see next point).
- Include a plain text LICENSE file in the root of your source repository, containing the official, unedited (except where required by the license) license text of your chosen license. If you've chosen multiple licenses, concatenate those texts, and delimit them with whitespace and a phrase such as "OR, AT YOUR DISCRETION" or "AND ALSO".
- Include the standard license headers for your license(s) as a block comment in the top of every source file. The forensic tools mentioned above usually scan source-code in addition to deployed artifacts.
- Consider adding an SPDX license document to the root of your source repository as well. While lice-comb doesn't use those (yet - see above), many tools do, and they're the best way to unambiguously communicate your chosen license(s). JohnnyJayJay wrote a small CLI tool that helps with this process.
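As a sketch of the first two recommendations, the Python snippet below embeds a minimal single-license POM and reads the license declaration back out of it. The coordinates (`com.example`, `my-library`) are placeholders, the helper function is hypothetical, and the URL is given on the assumption that it is the one the SPDX License List records for Apache-2.0; check the list for your own license.

```python
import xml.etree.ElementTree as ET

# A minimal POM with a single <license> whose <name> is an SPDX expression.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>my-library</artifactId>
  <version>1.0.0</version>
  <licenses>
    <license>
      <name>Apache-2.0</name>
      <url>https://www.apache.org/licenses/LICENSE-2.0</url>
    </license>
  </licenses>
</project>
"""

# Maven POM elements live in this XML namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def declared_licenses(pom_xml: str):
    """Return (name, url) pairs for every <license> element in the POM."""
    root = ET.fromstring(pom_xml)
    pairs = []
    for lic in root.findall("m:licenses/m:license", NS):
        name = lic.findtext("m:name", default="", namespaces=NS).strip()
        url = lic.findtext("m:url", default="", namespaces=NS).strip()
        pairs.append((name, url))
    return pairs
```

A quick check like this in your build can confirm that exactly one license element is present and that its name is the expression you intended to publish.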
Q. What if an artifact includes licenses from other software it depends upon / incorporates? Won't those be misreported as licenses of the artifact itself?
A. If those licenses are placed in probable license files, then yes they will. Sadly the vast majority of JVM ecosystem libraries (including Clojure libraries) don't use things like SPDX documents to unambiguously express cross-dependency licensing, so there's no deterministic way for a tool (any tool) to detect whether any given license fragment found in an artifact refers to that artifact or something upstream. All I can suggest is that you treat dependencies that have multiple detected licenses as needing further manual investigation.
Q. I'm using an upstream dependency that requires me to include their license in my derivative work; how should I handle this?
A. For starters make it easy for someone to manually figure out that that's what's going on; you might have some explanatory text in your README, or place the upstream dependency's license text in some kind of "namespaced" folder structure, or wot-not. You might also consider putting those license texts in a file that lice-comb doesn't scan for license fragments (NOTICES is a good choice, for example).
Q. Do you accept contributions?
A. Absolutely - I'm a strong believer in the true value of open source being two-way collaboration, not just a shallow one-way publishing mechanism (IMO the latter is better described as "free software", not "open source"). That said, open source collaboration isn't some kind of anarchistic "wild west"; contributions are expected to meet the contributing guidelines and I reserve the right to reject any contribution (in which case I will endeavour to provide a rationale for the rejection).
Q. How do I build and/or test lice-comb?
A. lice-comb uses tools.build. You can get a list of available tasks by running:
clj -A:deps -T:build help/doc
Of particular interest are:
- clj -T:build test - run the unit tests
- clj -T:build lint - run the linters (clj-kondo and eastwood)
- clj -T:build ci - run the full CI suite (check for outdated dependencies, run the unit tests, run the linters)
- clj -T:build install - build the JAR and install it locally (e.g. so you can test it with downstream code)
Q. The tests take a long time - is this expected?
A. Yes, especially the first time you run them. The Spdx-Java-Library (used by clj-spdx, which in turn is used by lice-comb) downloads the SPDX license and exceptions lists, plus all of the various templates for all of the licenses contained therein, the first time it's used. These files are cached locally, but downloading them the first time takes several minutes, even with a good quality internet connection. The unit tests are also pretty exhaustive - as of v2.0 lice-comb includes over 1,200 tests, and some of those tests (those that perform SPDX license matching) tend to be computationally expensive.