Finish (?) 2024 review
Kayce Basques committed Jan 3, 2025
1 parent 41294a8 commit b0810b3
Showing 5 changed files with 177 additions and 57 deletions.
11 changes: 8 additions & 3 deletions index.rst
@@ -56,7 +56,8 @@ Docs-as-Data (DaD)
Machine learning
----------------

* :ref:`ml-outlooks-2023`. My initial thoughts on how GenAI might affect
* :ref:`ml-reviews-2024`. The sequel to :ref:`ml-reviews-2023`.
* :ref:`ml-reviews-2023`. My initial thoughts on how GenAI might affect
technical writing.
* :ref:`stateful-assistants`. GenAI chatbot assistants might be very useful if
they can serve as companions for the entire journey that readers take when
@@ -94,6 +95,9 @@ Search engine optimization
Strategy
--------

* :ref:`challenges`. There are 3 intractable challenges in technical writing.
I do not believe we will ever be able to completely solve these challenges
using only the practices and technologies of the 2010s.
* :ref:`decisions`. Docs should aim to help people decide what to do.
Only documenting procedures is usually not enough.

@@ -122,8 +126,8 @@ User experience
data/intertwingularity
embeddings/index
ml/evals
ml/outlooks/2023
ml/outlooks/2024
ml/reviews/2023
ml/reviews/2024
ml/huggingface
ml/playing-nicely
ml/plugins
@@ -134,6 +138,7 @@ User experience
seo/sentry-overflow
src/link-text-automation
src/verbatim-wrangling
strategy/challenges
strategy/decisions
ux/methodology
ux/offline
2 changes: 1 addition & 1 deletion ml/outlooks/2023.rst → ml/reviews/2023.rst
@@ -1,4 +1,4 @@
.. _ml-outlooks-2023:
.. _ml-reviews-2023:

==================
GenAI Outlook 2023
153 changes: 104 additions & 49 deletions ml/outlooks/2024.rst → ml/reviews/2024.rst
Original file line number Diff line number Diff line change
@@ -1,36 +1,43 @@
.. _ml-outlooks-2024:
.. _ml-reviews-2024:

====================================================
2024 Machine Learning Review (For Technical Writers)
====================================================

2024 Dec 31

Back in March 2023 I published :ref:`ml-outlooks-2023`.
Back in March 2023 I published :ref:`ml-reviews-2023`.
With less than 12 hours remaining in 2024 I have managed to keep
my yearly streak going. This post recaps how much (or little)
the ideas mentioned in :ref:`ml-outlooks-2023` have panned
the ideas mentioned in :ref:`ml-reviews-2023` have panned
out, and then discusses potential future trends for 2025.

**Caution**: Nothing here is backed up with hard data, which
means anything and everything could be wildly wrong. These
are just my general impressions, based on anecdotal
conversations with other technical writers.

------------------------------
"GenAI Outlook" => "ML Review"
------------------------------

.. _expert systems: https://en.wikipedia.org/wiki/Expert_system

This year's post is called ``Machine Learning Review (For Technical Writers)``
rather than ``GenAI Outlook`` to reflect the widening scope of discussion that
I want to have. Generative AI (GenAI) is a subset of machine learning (ML), and
ML is a subset of artificial intelligence (AI). Over the past year I've realized
that there are many ways that ML and technical writing (TW) might potentially
interact beyond the relatively narrow subfield of GenAI. Maybe next year I'll
become aware of other AI fields outside of ML and I'll have to update the title again to
``AI Review (For Technical Writers)``, but for now the only field on my radar
is machine learning.
become aware of other AI fields outside of ML (e.g. `expert systems`_) and
I'll have to update the title again to ``AI Review (For Technical Writers)``,
but for now the only field on my radar is machine learning.

----------------------
Review of 2023 outlook
----------------------

First, status updates on the ideas mentioned in :ref:`ml-outlooks-2023`.
First, status updates on the ideas mentioned in :ref:`ml-reviews-2023`.

Job loss
========
@@ -45,13 +52,15 @@ Automation

.. _LLMs: https://en.wikipedia.org/wiki/Large_language_model

In the early days of the GenAI explosion, remember how many blog posts
In the early days of the GenAI explosion, remember how seemingly every blog post
included verbatim Q&A discussion with ChatGPT? "Here's what
ChatGPT has to say on the matter." My 2023 outlook was a victim of that
unfortunate trend. I asked GPT-4 to list out what parts of technical
writing are potentially automatable with `LLMs`_. Here's a quick summary
of how much each of those ideas has been adopted to date.

.. _ml-reviews-2024-content:

Basic content generation
------------------------

@@ -94,10 +103,12 @@ Formatting and template creation
ensuring they adhere to specific guidelines or templates.

.. _feature engineering: https://builtin.com/articles/feature-engineering
.. _toil: https://sre.google/sre-book/eliminating-toil/

I personally worked on this a lot in 2023. My current opinion is that
it's feasible but requires a fine-tuned model, which means a lot of
`feature engineering`_, which means a lot of upfront toil and careful design.
I personally worked on automated style guide editing a lot in 2023. My current
opinion is that it's feasible but requires a fine-tuned model, which means a lot of
`feature engineering`_, which means a lot of upfront `toil`_ and careful design.
Also, it's tough to get the UX right.

Grammar and spell-checking
--------------------------
@@ -108,10 +119,11 @@ Grammar and spell-checking
I have heard of technical writers using LLMs for one-off editing tasks.
E.g. they were given the first draft of a new doc written by a software
engineer (or product manager, or whatever) and were told that the doc
must be published in a couple hours. The first draft was riddled with grammatical
errors and typos. To meet the ridiculous deadline (pro tip: don't give your
technical writers ridiculous deadlines) the writers fed the first draft through
an LLM to quickly fix the most flagrant issues.
must be published in a couple hours. The first draft had a lot of errors and typos.
To meet the ridiculous deadline\ :sup:`1` the writers fed the first draft through
an LLM to quickly fix the major issues.

:sup:`1` Pro tip: don't give your technical writers ridiculous deadlines.

Terminology consistency
-----------------------
@@ -121,12 +133,17 @@ Terminology consistency

This still sounds feasible, but I haven't heard of anyone using LLMs for this task.
It may require a lot of upfront work around defining the preferred terms and
phrases. Simply identifying potential terminology inconsistency might be a lower
hanging fruit. E.g. the model produces a report telling you that you used
``foo`` on line 32 and ``bar`` on line 64 yet ``foo`` and ``bar`` seem to
relate to the same concept.
phrases.\ :sup:`2`

:sup:`2` On the other hand, it would be pretty trivial for me to provide each
section of my docs to a model and ask it to extract terms and create a concise
definition for each term. I'll try it later today. It's moments like these that
keep me motivated to keep blogging. When I blog, new ideas just float up to
the surface in a really natural and effortless way. The writing itself is hard,
as always. But it's amazing how new ideas just naturally float to the surface
as a byproduct of the writing.

.. _ml-outlooks-2024-summarization:
.. _ml-reviews-2024-summarization:

Content summarization
---------------------
@@ -142,7 +159,7 @@ LLM-generated summaries and I'm not aware of many teams using LLMs to
systematically generate summary-like content behind-the-scenes, such as the
opening or closing paragraphs of docs.

.. _ml-outlooks-2024-translation:
.. _ml-reviews-2024-translation:

Content translation
-------------------
@@ -186,9 +203,8 @@ Plagiarism detection
AI can identify potential plagiarism cases in technical
writing and suggest alternative content to maintain originality.

Ditto again, not aware of anyone doing this. Plagiarism doesn't come
up often in corporate technical writing. It seems to be more of an
academia concern.
Ditto again, not aware of anyone doing this in corporate technical
writing. I have heard about stuff like this in academia.

----------------------
Review of other trends
@@ -203,7 +219,7 @@ RAG chatbots have not taken over the docs world
.. _retrieval-augmented generation: https://en.wikipedia.org/wiki/Retrieval-augmented_generation

Gather a list of 1000 docs sites from any domain (or a mix of domains). You will find
that a supermajority (+75%) of them have not shipped a companion `retrieval-augmented generation`_
that a supermajority (+80%) of them have not shipped a companion `retrieval-augmented generation`_
(RAG) chatbot to supplement the traditional web-based docs experience. Even the
OpenAI docs don't have one.

@@ -217,7 +233,8 @@ Policy is a nightmare

For the minority of technical writers that are interested in seriously adopting GenAI
into their workflows, confusing policy seems to be a significant
obstacle to adoption. Questions like this seem to be coming up for everyone:
obstacle to adoption for everyone, across companies and across industries.
Questions like these are the current blockers:

* "What GenAI services are we even approved to use?"
* "Can we really trust GenAI service XYZ with our non-public data?"
@@ -230,53 +247,91 @@ obstacle to adoption. Questions like this seem to be coming up for everyone:
Continued lack of interest in GenAI
===================================

It seems that most (~55%) technical writers (TWs) are not interested in
It seems that most (~60%) technical writers (TWs) are not interested in
integrating GenAI into their work practices for a variety of reasons:

* Fear of accidentally automating themselves out of a job
* Environmental concerns
* Copyright issues

I expect adoption of GenAI in technical writing to continue to be slow
for this reason alone.
in 2025, because I don't think the three concerns listed above will be
solved in 2025.

Jobs still safe for another year
================================

Progress on intractable challenges through supervised learning
==============================================================
I'm not seeing the type of massive, systematic automation that would be
needed to eliminate the role of technical writer. There are faint hints
of it in :ref:`ml-reviews-2024-content` but this is only 1 of like 10
or more things that would need to be extensively and reliably automated.
This extensive automation is still possible for 2026 and beyond.

There are a few widely recognized intractable challenges :sup:`1` in technical
writing:
Progress on the intractable challenges
======================================

.. _four kinds of documentation: https://diataxis.fr/start-here/
There are a few widely recognized :ref:`intractable challenges <challenges>`\ :sup:`3`
in technical writing:

* Completeness. It's hard to comprehensively document all new features
in a timely fashion and it's hard to comprehensively deliver all
`four kinds of documentation`_ for all parts of the system.
.. _all the kinds of documentation: https://diataxis.fr/start-here/

* Correctness. As the system changes, it's hard to keep the docs
synchronized with the new reality of system.
* **Completeness**. It's hard to comprehensively document *all* new features
in a timely fashion and it's hard to comprehensively deliver
`all the kinds of documentation`_ that users may actually need to succeed.

* Discoveryness. Even if the needed content exists, it's hard to
guarantee that users will find it.
* **Correctness**. As the system changes, it's hard to keep the docs *always*
synchronized with the new reality of the system.

* **Discoveryness**. Even if the needed content exists, it's hard to
*guarantee* that users will find it.

.. _supervised learning: https://cloud.google.com/discover/what-is-supervised-learning
.. _fine-tuning: https://platform.openai.com/docs/guides/fine-tuning

There are *so many* areas in technical writing where a `supervised learning`_
approach may provide significant improvement in our ability to keep our
docs complete and up-to-date. Hint: `fine-tuning`_ is a form of supervised learning.
I have come to the conclusion that these challenges are not solvable with
the practices and technologies of the 2010s. The only way we'll make
further progress on the intractable challenges is to augment what we
figured out in the 2010s with new tools and approaches. I think these
ML technologies and approaches will help us make more progress:

* `Supervised learning`_ (`fine-tuning`_ is a form of supervised learning)
* :ref:`Embeddings <underrated>`
* Generation models (Gemini 1.5 Pro, Claude 3.5 Sonnet, etc.)

.. _defensive publication: https://www.tdcommons.org/

Also, it's possible that technical writers are potentially very well
positioned to create supervised learning datasets for their company's
ML teams.
I have a `defensive publication`_ in the works that demonstrates how
we can use these new tools to make progress on the correctness problem.
Hopefully it will be published in the next few months.

:sup:`1` Rhiona McNamara
.. _Riona McNamara: https://www.linkedin.com/in/rionam
.. _Aaron Gillies: https://www.linkedin.com/in/aaron-gillies-19a3755

:sup:`3` Many people have discussed these intractable challenges over
the years. The most recent ones I'm aware of are `Riona McNamara`_ and
`Aaron Gillies`_.

Q&A renaissance
===============

Q&A is a very natural fit for supervised learning. RAG chatbots also
This is a primordial soup of an idea.

I have a hunch that Q&A (questions & answers) will become more and more important.
When language models are trained or fine-tuned, the data is often
structured as question and answer. When I interact with Gemini,
Claude, etc. through a chat UI, the conversation is often Q&A-style.
Stack Overflow was an invaluable resource for human developers in
the 2010s, and it's all about Q&A. Reddit threads often take the
form of Q&A, where the OP provides a prompt (the question) and the
follow-up comments are basically answers. The theme of Q&A keeps
coming up.

Translation pipelines solved for us
===================================

As mentioned in :ref:`ml-outlooks-2024-translation`
As mentioned in :ref:`ml-reviews-2024-translation`, I think static
site generators (SSG) and content management systems (CMS) should solve machine
translation for us. E.g. just provide an API key to a GenAI service and
the SSG takes care of translating each doc, updating the translation when
the doc changes, etc. This seems like it should be solved at the level of
the SSG or CMS provider.
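
A minimal sketch of the "update the translation when the doc changes" step, assuming the SSG keeps a small cache of content hashes. Everything here is hypothetical: ``sync_translations``, the cache layout, and ``fake_translate`` are illustrative names, and ``fake_translate`` stands in for whatever GenAI service the SSG would actually call. It is not how any particular SSG or CMS works today.

```python
import hashlib
from typing import Callable, Dict, List


def content_hash(text: str) -> str:
    """Return a stable fingerprint of a doc's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def sync_translations(
    docs: Dict[str, str],
    cache: Dict[str, Dict[str, str]],
    translate: Callable[[str], str],
) -> List[str]:
    """Translate only the docs that are new or have changed.

    docs maps a doc path to its source text.
    cache maps a doc path to {"hash": ..., "translation": ...} and is
    updated in place (an assumed layout, not a real SSG format).
    Returns the paths that were (re)translated on this run.
    """
    retranslated = []
    for path, text in docs.items():
        digest = content_hash(text)
        entry = cache.get(path)
        # Skip the translation call when the source text is unchanged.
        if entry is None or entry["hash"] != digest:
            cache[path] = {"hash": digest, "translation": translate(text)}
            retranslated.append(path)
    return retranslated


def fake_translate(text: str) -> str:
    """Placeholder for a real machine translation call (assumption)."""
    return f"[es] {text}"
```

On each build the SSG would call ``sync_translations`` with the current docs and a persisted cache, so only new or edited docs trigger a call to the translation service.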
29 changes: 25 additions & 4 deletions rss.xml
@@ -3,12 +3,33 @@
<channel>
<title>technicalwriting.dev</title>
<link>https://technicalwriting.dev</link>
<description>
<description>
A blog about technical writing by Kayce Basques.
</description>
</description>
<language>en-us</language>
<pubDate>Mon, 21 Oct 2024 09:09:03 -0700</pubDate>
<atom:link href="https://technicalwriting.dev/rss.xml" rel="self" type="application/rss+xml"/>
<pubDate>Fri, 3 Jan 2025 12:41:04 -0700</pubDate>
<atom:link href="https://technicalwriting.dev/rss.xml" rel="self" type="application/rss+xml"/>
<item>
<title>2024 Machine Learning Review (For Technical Writers)</title>
<link>https://technicalwriting.dev/ml/reviews/2024.html</link>
<description>
A review of how much (or little) the ideas from GenAI Outlook 2023 have panned
out, and a discussion of potential future trends in 2025.
</description>
<pubDate>Fri, 3 Jan 2025 12:41:04 -0700</pubDate>
<guid>https://technicalwriting.dev/ml/reviews/2024.html</guid>
</item>
<item>
<title>The intractable challenges of technical writing</title>
<link>https://technicalwriting.dev/strategy/challenges.html</link>
<description>
There are 3 intractable challenges in technical writing.
I do not believe we will ever be able to completely solve these challenges
using only the practices and technologies of the 2010s.
</description>
<pubDate>Fri, 3 Jan 2025 12:41:04 -0700</pubDate>
<guid>https://technicalwriting.dev/strategy/challenges.html</guid>
</item>
<item>
<title>Embeddings are underrated</title>
<link>https://technicalwriting.dev/embeddings/overview.html</link>