
Naming Things

Nilesh edited this page Nov 29, 2024 · 17 revisions

This is part of my notes for attempting to build the domain model of humanity's universal learning map using first principles.


Names (in our case, names of topics or people) are very useful as identifiers but are not quite as simple as we programmers would like.

  • To have names act as identifiers, we want them to be URL-safe, hashtaggable for mentions, preferably unique, and case-insensitive (when using English/Latin characters).
  • But real-world human-readable names require case-preservation (eg: "AT&T") and special characters (eg: "C++ 20"). Sometimes they use non-Latin characters or even emojis.

This is how others have attempted to solve this:

  • Wikipedia uses URL escape codes that make names/identifiers hard to manually write. Eg: Zorn%27s_lemma or C%2B%2B (for "C++")
  • Usenet groups used a custom naming scheme of short names (like comp.lang.python) with some special characters allowed.
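To see why escape codes are awkward to type by hand, here is a minimal sketch using generic percent-encoding from Python's standard library. The function name `wiki_style_identifier` is hypothetical, and this is not a claim about Wikipedia's exact rules; it only reproduces the two examples above.

```python
from urllib.parse import quote, unquote


def wiki_style_identifier(title: str) -> str:
    """Turn a human-readable title into a percent-encoded identifier.

    Assumption: spaces become underscores and every other reserved
    character is percent-encoded (safe="" means nothing is exempt).
    """
    return quote(title.replace(" ", "_"), safe="")


print(wiki_style_identifier("Zorn's lemma"))  # Zorn%27s_lemma
print(wiki_style_identifier("C++"))           # C%2B%2B

# Decoding recovers the title (with underscores instead of spaces).
print(unquote("C%2B%2B"))                     # C++
```

The encoding is reversible and unambiguous, but nobody would want to type `C%2B%2B` in a mention or hashtag.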

Currently, our Topic and People schemas have two attributes: name (unique, lowercase, URL-safe identifier) and hname (case-preserving, human-preferable, duplicate-allowing names with special characters).
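As an illustration of how a name could be derived from an hname, here is a rough sketch. The function `to_name` and its symbol table are assumptions made for this example, not the actual rule used in our schema, and the result is not guaranteed to be unique.

```python
import re
import unicodedata

# Hypothetical mapping: spell out symbols that would otherwise be dropped.
SYMBOL_WORDS = {"+": " plus ", "&": " and ", "#": " sharp "}


def to_name(hname: str) -> str:
    """Derive a lowercase, URL-safe, hashtaggable name from an hname."""
    text = hname
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, word)
    # Decompose accented letters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


print(to_name("C++ 20"))  # c_plus_plus_20
print(to_name("AT&T"))    # at_and_t
```

Two different hnames can still collapse to the same name, so uniqueness has to be enforced separately (for example, by a unique constraint in the data store).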

Another problem of disambiguation arises because the real-world relationships between things and names can be many-to-many.

  • Some things have more than one name: Soccer vs Football, Graph vs Network. Emojis, which are pictures as characters, make this even more complex. Did you know that https://en.wikipedia.org/wiki/🤔 redirects to https://en.wikipedia.org/wiki/Thought?
  • Some things have no well-defined name and may require an entire phrase to indicate. For example, libraries identify the subject of "research in the cure of tuberculosis of lungs by x-ray conducted in India in 1950" with the identifier "Medicine,Lungs;Tuberculosis:Treatment;X-ray:Research.India'1950" when using the Colon Classification System.
  • Some names refer to more than one thing. See how wide a variety of things is named Lua on Wikipedia.
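The many-to-many relationship between labels and things can be sketched as a small lookup index. The class `NameIndex` and the topic identifiers below are hypothetical and only illustrate the idea: one thing may carry several labels, and one label may point to several things.

```python
from collections import defaultdict


class NameIndex:
    """Maps case-insensitive labels to the set of topic names they may mean."""

    def __init__(self) -> None:
        self._by_label = defaultdict(set)

    def add(self, name: str, *labels: str) -> None:
        # The unique name itself is always a valid label for the topic.
        for label in (name, *labels):
            self._by_label[label.casefold()].add(name)

    def lookup(self, label: str) -> list:
        # More than one result means the label needs disambiguation.
        return sorted(self._by_label.get(label.casefold(), set()))


index = NameIndex()
index.add("association_football", "Soccer", "Football")
index.add("american_football", "Football")

print(index.lookup("SOCCER"))    # ['association_football']
print(index.lookup("football"))  # ['american_football', 'association_football']
print(index.lookup("cricket"))   # []
```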

Then there are other issues with human-recognizable names:

  • Should we support multiple languages and scripts/characters/math symbols?
    • For now, we are taking the easy way out and limiting ourselves to building this knowledge map in English only.
  • Who assigns and maintains these names and what are their incentives?

Then there are the issues of taxonomy which, in our case, apply to the naming of topics or concepts, but not to the naming of people. The simplest taxonomy is a hierarchy (eg: comp.lang.python). We currently keep a "parent" attribute in the Topic schema. But this too raises many questions:

  • Should topic names always be fully specified, like math.algebra.quadratics, or can they be just quadratics? Short names are briefer and easier to use but make disambiguation harder.
  • Taxonomy maintenance, even for a hierarchy, is not easy:
    • When does a concept/subtopic deserve its own topic?
    • What happens when topics get merged or retired?
    • Is the parent-child relationship an "is-a" relationship (like nations/india) or an "includes" relationship (like math/algebra)?
    • What separator should we use - period or slash? Why or why not?
    • What if a topic belongs under two separate parent topics, eg: statistics.machine_learning as well as computer_science.machine_learning? Will we need symlinks in our topic hierarchy?
    • There are already too many existing standards to choose from.
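A few of these questions become more concrete with a small sketch. It assumes each topic stores only its short name and a single parent, builds the fully specified path on demand, and models a second parent as a symlink-like alias. The `PARENTS` and `SYMLINKS` tables and both function names are hypothetical.

```python
# Hypothetical data: short topic name -> parent name (None for a root).
PARENTS = {
    "math": None,
    "algebra": "math",
    "quadratics": "algebra",
    "statistics": None,
    "computer_science": None,
    "machine_learning": "computer_science",
}

# Hypothetical symlinks: alternative full path -> canonical full path.
SYMLINKS = {
    "statistics.machine_learning": "computer_science.machine_learning",
}


def full_path(name: str, sep: str = ".") -> str:
    """Walk up the parent chain to build the fully specified name."""
    parts, seen = [], set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in hierarchy at {current!r}")
        seen.add(current)
        parts.append(current)
        current = PARENTS[current]
    return sep.join(reversed(parts))


def resolve(path: str) -> str:
    """Follow a symlink if one exists, otherwise return the path as given."""
    return SYMLINKS.get(path, path)


print(full_path("quadratics"))                 # math.algebra.quadratics
print(full_path("quadratics", sep="/"))        # math/algebra/quadratics
print(resolve("statistics.machine_learning"))  # computer_science.machine_learning
```

Keeping the separator as a parameter defers the period-versus-slash decision to display time. The sketch does assume that short names are unique across the whole hierarchy, which is exactly the disambiguation cost mentioned above.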

The Usenet newsgroups' approach to naming topics is quite nice, but the taxonomy is not big or granular enough for us to build a universal knowledge map. For example, there is no name yet for quadratic equations. Also, everything outside the Big-8 (comp, humanities, misc, news, rec, sci, soc, and talk) gets shoved into the alt hierarchy (btw, the historical reason for that is that European networks did not want to pay for groups about religion or racism).

In a hierarchy, we would often like to preserve a sort order (for example, chapter names in a book). This can be achieved with names like 100-physics, 200-chemistry, 300-biology etc. However, this quickly becomes unwieldy (eg: 100-math.400-algebra.200-polynomials). This is why I decided to keep a separate rank attribute for topics, which is not part of the topic's name itself.

This leads me to defining a Topic as: (name, hname, parent, rank) and People (Creators) as (name, hname, links[]).
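As a minimal sketch, these two definitions could be written as Python dataclasses. The field types, the defaults, the `children` helper, and the sample "science" topics are assumptions made for illustration; only the attribute names come from the definitions above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Topic:
    name: str                     # unique, lowercase, URL-safe identifier
    hname: str                    # case-preserving, human-readable name
    parent: Optional[str] = None  # name of the parent topic, None for a root
    rank: int = 0                 # sort order among siblings, kept out of the name


@dataclass
class Person:
    name: str
    hname: str
    links: List[str] = field(default_factory=list)


def children(topics: List[Topic], parent: Optional[str]) -> List[Topic]:
    """Return the topics under `parent`, ordered by rank and then by name."""
    siblings = [t for t in topics if t.parent == parent]
    return sorted(siblings, key=lambda t: (t.rank, t.name))


topics = [
    Topic("biology", "Biology", parent="science", rank=300),
    Topic("physics", "Physics", parent="science", rank=100),
    Topic("chemistry", "Chemistry", parent="science", rank=200),
]
print([t.name for t in children(topics, "science")])
# ['physics', 'chemistry', 'biology']
```

Because rank lives outside the name, reordering chapters only changes an integer and never breaks an identifier that someone has already linked to.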

Next, we have to deal with the problem of links, as I don't find URLs good enough.
