diff --git a/a11y/skip.rst b/a11y/skip.rst
index ecc4075..cce7206 100644
--- a/a11y/skip.rst
+++ b/a11y/skip.rst
@@ -1,8 +1,8 @@
 .. _skip-to-main-content:
 
-=======================================================
-Please support "skip to main content" on your docs site
-=======================================================
+=====================================
+Please support "skip to main content"
+=====================================
 
 2024 Jun 3
 
diff --git a/build/bazel.rst b/build/bazel.rst
index 37351b4..222859f 100644
--- a/build/bazel.rst
+++ b/build/bazel.rst
@@ -1,8 +1,8 @@
 .. _bazel:
 
-===============================================
-Development log: Migrating pigweed.dev to Bazel
-===============================================
+==============================
+Migrating pigweed.dev to Bazel
+==============================
 
 I have been tasked with the fascinating and somewhat-overwhelming project of
 migrating pigweed.dev's build system from GN to Bazel.
diff --git a/data/intertwingularity.rst b/data/intertwingularity.rst
index 5e698a4..f595bb8 100644
--- a/data/intertwingularity.rst
+++ b/data/intertwingularity.rst
@@ -26,7 +26,7 @@ links enable us to get closer to the intertwingled nature of knowledge:
    acknowledged—people keep pretending they can make things hierarchical,
    categorizable and sequential when they can't.
 
-After reading those quotes, it was clear what I must do:
+After reading those quotes, my first thought was:
 
 .. figure:: /_static/boat.png
    :alt: I should build a web crawler.
@@ -53,8 +53,7 @@ broadly. If a lot of my docs pages link to some particular page, then that
 page is probably important. `PageRank`_ Lite, basically, except with much more
 focus on intra-site `backlinks`_.
 
-(Also, I'm building a web crawler because it's a deeply fun and satisfying
-programming challenge. You should try it!)
+(Also, I'm building a web crawler because it's fun. Try it!)
 
 Importance
 ==========
@@ -71,8 +70,8 @@ I can't give every page on my docs site the same level of
 tender loving care. I must decide which pages get more of my time and energy
 and which ones get less.
 
-There is no single approach that provides a *full* answer to the fundametal
-question. There are, however, lots of approaches that provide *partial*
+There is no single approach that provides a *full* answer to this question.
+There are, however, lots of approaches that provide *partial*
 answers.
 
 Pageviews
@@ -108,14 +107,15 @@ pages have the most `load-bearing`_ content.
 Suppose that Pages A, B, and C all link to Page D. There's probably some
 important content on Page D. Pages with more backlinks (e.g. Page D) should
 probably be prioritized above pages with less backlinks. Think about it in terms of user journey.
-Users on Pages A, B, and C all have a chance of ending up on Page D.
+Users on Pages A, B, and C all have a chance of ending up on Page D. There's
+some idea (or data, or knowledge, or whatever) on Page D that Pages A, B, and
+C all depend on.
 
 .. _triangulate: https://en.wikipedia.org/wiki/Triangulation_(social_science)
 
-Revisiting the "top 10 pages all have the same amount of pageviews and
-I only have time to review 5" problem, when I `triangulate`_ the pageview
-data with the backlink data, it becomes easier to decide which 5 to
-prioritize:
+Returning to the docs review prioritization problem, when I `triangulate`_
+the pageview data with the backlink data, it becomes easier to decide which 5
+to prioritize:
 
 .. csv-table::
    :header: "Page ID", "Pageviews", "Backlinks"
@@ -136,29 +136,39 @@ Networked knowledge
 
 .. _Too Big To Know: https://en.wikipedia.org/wiki/Too_Big_to_Know
 
-Beyond the "docs review prioritization" problem I have another motivation
-for studying backlinks. I simply want to know more about how the ideas
-within my docs site relate and connect to each other. The concept of
-*networked knowledge* from `Too Big To Know`_ fascinates me:
+I have another motivation for studying backlinks. I simply want to know
+more about how the ideas within my docs site relate and connect to each
+other. The concept of *networked knowledge* from `Too Big To Know`_
+fascinates me:
 
-   The chance in the infrastructure of knowledge is altering knowledge's
+   The change in the infrastructure of knowledge is altering knowledge's
    shape and nature. As knowledge becomes networked, the smartest person in
    the room isn't the person at the front lecturing us, and isn't the
    collective wisdom of those in the room. The smartest person in the room
    is the room itself: the network that joins the people and ideas in the
    room, and connects to those outside of it. It's not that the
-   network is becoming a conscious super-brain. Rather knowledge is
+   network is becoming a conscious super-brain\ :sup:`2`. Rather knowledge is
    becoming inextricable from—literally unthinkable without— the network
    that enables it. Our task is to learn how to build smart rooms—that is,
    how to build networks that make us smarter...
 
-How exactly does my web crawler relate to networked knowledge? I don't
-know. My thinking here is still very much in the "primordial soup of
-opportunities" phase.
+:sup:`2` David Weinberger
 
------------------------------
-Anatomy of a backlink crawler
------------------------------
+---------------------
+Anatomy of my crawler
+---------------------
+
+I got the core crawler logic working in about 200 lines of Python code,
+depending on ``requests`` for HTTP stuff and ``beautifulsoup4`` for scraping
+stuff. Here's the gist of the logic:
+
+* Designate an entrypoint URL, e.g. ``https://technicalwriting.dev``. URLs
+  that start with this entrypoint are considered intra-site. URLs that don't
+  start with the entrypoint are considered external.
+* Grab all links that are found within the main content of the entrypoint.
+* Whenever an intra-site URL is found, visit and scrape that page's links.
+* External URLs just need to be noted and tracked. They don't need to be
+  visited or scraped.
 
 ---------------
 To be continued
diff --git a/index.rst b/index.rst
index afc89f2..2d27200 100644
--- a/index.rst
+++ b/index.rst
@@ -33,7 +33,7 @@ Docs-as-Code (DaC)
 Docs-as-Data (DaD)
 ------------------
 
-* :ref:`intertwingularity`
+* :ref:`intertwingularity`.
 
 .. _ml:
 
diff --git a/ml/evals.rst b/ml/evals.rst
index aa81f3e..a22a750 100644
--- a/ml/evals.rst
+++ b/ml/evals.rst
@@ -1,8 +1,8 @@
 .. _evals:
 
-==================================================================
-Evaluating the quality of my retrieval-augmented generation system
-==================================================================
+=================================
+Evaluating quality in RAG systems
+=================================
 
 .. _Strategy\: Test changes systematically: https://platform.openai.com/docs/guides/gpt-best-practices/strategy-test-changes-systematically
 .. _OpenAI's Evals framework: https://github.com/openai/evals
diff --git a/ml/huggingface.rst b/ml/huggingface.rst
index 7cb9102..3b44b4f 100644
--- a/ml/huggingface.rst
+++ b/ml/huggingface.rst
@@ -1,8 +1,8 @@
 .. _huggingface:
 
-==========================================================
-Generating summaries with HuggingFace summarization models
-==========================================================
+============================================
+Generating summaries with HuggingFace models
+============================================
 
 .. _HuggingFace Transformers: https://huggingface.co/docs/transformers/index
 .. _biodigitaljazz.net: https://biodigitaljazz.net
diff --git a/ml/outlook-2023.rst b/ml/outlook-2023.rst
index d9a15fc..4ce9292 100644
--- a/ml/outlook-2023.rst
+++ b/ml/outlook-2023.rst
@@ -1,13 +1,13 @@
 .. _genai-outlook-2023:
 
-==========================================
-GenAI Outlook (For Technical Writers) 2023
-==========================================
+==================
+GenAI Outlook 2023
+==================
 
 2023 Mar 24
 
 A lot of people are talking about how generative AI is a gamechanger for
-documentation. This post summarizes what's going on.
+the technical writing industry. This post summarizes what's going on.
 
 ----------------
 Potential threat
diff --git a/ml/principles.rst b/ml/principles.rst
index 6a65db9..0a30a52 100644
--- a/ml/principles.rst
+++ b/ml/principles.rst
@@ -1,8 +1,8 @@
 .. _principles:
 
-==============================================
-Response to "10 principles for writing for AI"
-==============================================
+======================================
+Re: "10 principles for writing for AI"
+======================================
 
 2023 Apr 21
 
diff --git a/ml/stateful-assistants.rst b/ml/stateful-assistants.rst
index 91bc589..4198962 100644
--- a/ml/stateful-assistants.rst
+++ b/ml/stateful-assistants.rst
@@ -1,8 +1,8 @@
 .. _stateful-assistants:
 
-===========================================
-Stateful docs site assistants are promising
-===========================================
+=============================
+Stateful docs site assistants
+=============================
 
 2024 Feb 8
 
diff --git a/src/verbatim-wrangling.rst b/src/verbatim-wrangling.rst
index 814417c..a74d17d 100644
--- a/src/verbatim-wrangling.rst
+++ b/src/verbatim-wrangling.rst
@@ -1,8 +1,8 @@
 .. _verbatim-wrangling:
 
-=================================================
-Wrangling verbatim text in Doxygen+Breathe+Sphinx
-=================================================
+=======================
+Wrangling verbatim text
+=======================
 
 .. _small CLs: https://google.github.io/eng-practices/review/developer/small-cls.html
 .. _verbatim: https://www.doxygen.nl/manual/commands.html#cmdverbatim
diff --git a/ux/offline.rst b/ux/offline.rst
index 8981922..a6c3b7e 100644
--- a/ux/offline.rst
+++ b/ux/offline.rst
@@ -1,8 +1,8 @@
 .. _offline:
 
-====================================
-Software engineers want offline docs
-====================================
+============
+Offline docs
+============
 
 2023 May 31
 
diff --git a/ux/pdf.rst b/ux/pdf.rst
index 997c393..563d5ff 100644
--- a/ux/pdf.rst
+++ b/ux/pdf.rst
@@ -1,8 +1,8 @@
 .. _pdf:
 
-=======================================
-You can deeplink to a specific PDF page
-=======================================
+==============================
+Deeplink to specific PDF pages
+==============================
 
 2024 Jul 11
 
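The crawler logic described in the new "Anatomy of my crawler" section of ``data/intertwingularity.rst`` could be sketched roughly as follows. This is not the author's implementation, which is about 200 lines and uses ``requests`` and ``beautifulsoup4``. This sketch instead uses the standard library's ``urllib.request`` and ``html.parser`` so that it runs without extra dependencies. It assumes that a page's main content is wrapped in a ``<main>`` element and falls back to every link on the page when it is not. The names ``LinkParser``, ``fetch_html``, and ``crawl`` are hypothetical. Along the way, the sketch also counts backlinks, which here means how many distinct intra-site pages link to each intra-site page.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag, remembering which ones sit inside <main>."""

    def __init__(self):
        super().__init__()
        self.main_depth = 0  # > 0 while the parser is inside a <main> element
        self.saw_main = False
        self.main_links = []
        self.all_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.main_depth += 1
            self.saw_main = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.all_links.append(href)
                if self.main_depth > 0:
                    self.main_links.append(href)

    def handle_endtag(self, tag):
        if tag == "main" and self.main_depth > 0:
            self.main_depth -= 1

    def links(self):
        # Assumption: "main content" means <main>; without one, use every link.
        return self.main_links if self.saw_main else self.all_links


def fetch_html(url):
    """Hypothetical stdlib stand-in for requests-based fetching."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(entrypoint, get_html=fetch_html):
    """Crawls intra-site pages; returns (backlink counts, external URLs)."""
    backlinks = Counter()  # intra-site URL -> number of distinct pages linking to it
    external = set()  # external URLs are recorded but never visited
    visited = set()
    queue = [entrypoint]
    while queue:
        page = queue.pop()
        if page in visited:
            continue
        visited.add(page)
        parser = LinkParser()
        parser.feed(get_html(page))
        # Resolve relative links and drop #fragments; the set counts each target once per page.
        targets = {urldefrag(urljoin(page, href)).url for href in parser.links()}
        for target in targets:
            if target == page:
                continue  # ignore self-links
            if target.startswith(entrypoint):
                backlinks[target] += 1
                if target not in visited:
                    queue.append(target)
            else:
                external.add(target)
    return backlinks, external
```

After ``backlinks, external = crawl("https://technicalwriting.dev")``, calling ``backlinks.most_common(10)`` lists the ten most-linked pages. Those backlink counts are what gets triangulated with the pageview data. The ``get_html`` parameter lets the crawl run against canned HTML instead of the network, which makes the logic easy to test.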