tweaks
kaycebasques authored Oct 20, 2024
1 parent 09e3088 commit f231a98
Showing 1 changed file with 16 additions and 10 deletions.
data/embeddings.rst
@@ -5,7 +5,7 @@ Embeddings are underrated
 =========================
 
 Machine learning (ML) has the potential to greatly advance the state of the
-art in technical writing. No, I'm not talking about Claude Opus, Gemini Pro,
+art in technical writing. No, I'm not talking about text generation models like Claude Opus, Gemini Pro,
 LLaMa, etc. The ML technology that might end up having the biggest impact
 on technical writing is **embeddings**.
 
@@ -198,16 +198,18 @@ texts.
 .. _Word2vec paper: https://arxiv.org/pdf/1301.3781
 
 The concept of positioning items in a multi-dimensional
-space like this goes by the wonderful name of `latent space`_.
+space like this, where related items are clustered near each other, goes by the wonderful name of `latent space`_.
 
 The most famous example of the weird utility of this technology comes from
-the `Word2vec paper`_, the foundational research that got more people
-interested in embeddings 11 years ago. In the paper they shared an anecdote
-where they started with the embedding for ``king``, then subtracted the embedding
-for ``man``, and then added the embedding for ``woman``. When they looked around
-that area of the latent space, they found that the word for ``queen`` was close-by.
+the `Word2vec paper`_, the foundational research that kickstarted interest in embeddings 11 years ago. In the paper they shared this anecdote:
 
-The ``king - man + woman = queen`` anecdote must always be followed by this
+.. code-block:: text
+
+   embedding("king") - embedding("man") + embedding("woman") ≈ embedding("queen")
+
+Starting with the embedding for ``king``, subtract the embedding for ``man``, then add the embedding for ``woman``. When you look around this vicinity of the latent space, you find the embedding for ``queen`` nearby.
+
+There appears to be an unspoken rule in ML culture that this anecdote must always be followed by this
 quote from John Rupert Firth:
 
 You shall know a word by the company it keeps!
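
To make the anecdote's arithmetic concrete, here is a minimal sketch in Python. The four-dimensional toy vectors are invented for illustration (real Word2vec embeddings have hundreds of dimensions), and ``queen`` is deliberately placed where the arithmetic lands. Cosine similarity is a common way to compare embeddings because it measures direction rather than magnitude.

.. code-block:: python

   import numpy as np

   # Toy 4-dimensional embeddings, invented for illustration.
   # Real Word2vec vectors have hundreds of dimensions.
   vectors = {
       "king":  np.array([0.9, 0.8, 0.1, 0.3]),
       "man":   np.array([0.5, 0.1, 0.0, 0.3]),
       "woman": np.array([0.5, 0.1, 0.9, 0.3]),
       "queen": np.array([0.9, 0.8, 1.0, 0.3]),
   }

   def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
       """Compare direction, not magnitude: 1.0 means 'pointing the same way'."""
       return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

   # king - man + woman lands somewhere in the latent space...
   target = vectors["king"] - vectors["man"] + vectors["woman"]

   # ...and the nearest known word to that spot is queen.
   nearest = max(vectors, key=lambda word: cosine_similarity(vectors[word], target))
   print(nearest)  # queen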
@@ -236,8 +238,12 @@ Applications
 ------------
 
 I could tell you exactly how I think we can advance the state of the
-art in technical writing with embeddings, but where's the fun in that?
-Let's just cover a basic example to put the ideas into practice and then
+art in technical writing with embeddings, but where's the fun in that? Here are two gigantic hints:
+
+* A lot of documentation tasks revolve around detecting *discrepancies*.
+* You can generate embeddings for *any* type of text, not just *documentation*.
+
+Let's cover a basic example to put the intuition-building exercise into practice and then
 wrap up this post.
 
 Related pages
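
The two hints above point at one possible recipe: embed a documentation page and the artifact it claims to describe, then compare the vectors. Here is a minimal sketch of that discrepancy check; the ``get_embedding()`` function is a hypothetical stand-in for whatever embedding model you use, the file paths are made up, and the 0.5 threshold is arbitrary.

.. code-block:: python

   import numpy as np

   def get_embedding(text: str) -> np.ndarray:
       """Hypothetical stand-in for any embedding model's API."""
       raise NotImplementedError("wire this up to an embedding provider")

   def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
       return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

   # Made-up pairs of a docs page and the source file it claims to describe.
   pairs = [
       ("docs/install.rst", "setup.py"),
       ("docs/api/auth.rst", "src/auth.py"),
   ]

   THRESHOLD = 0.5  # arbitrary; tune against pairs you know are in sync

   for doc_path, code_path in pairs:
       with open(doc_path) as doc, open(code_path) as code:
           score = cosine_similarity(get_embedding(doc.read()),
                                     get_embedding(code.read()))
       # A low score hints that the docs and the code may have drifted apart.
       if score < THRESHOLD:
           print(f"possible discrepancy: {doc_path} vs. {code_path} ({score:.2f})")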
