diff --git a/html.md b/html.md
index edb5b17..aee9205 100644
--- a/html.md
+++ b/html.md
@@ -1,72 +1,81 @@
-[book]: https://web.dev/learn/html
-[book_hello_world]: https://web.dev/learn/html/document-structure
-[ref]: https://developer.mozilla.org/en-US/docs/Web/HTML/Reference
-[repo]: https://github.com/whatwg/html
-[spec]: https://html.spec.whatwg.org/
+# HTML: Hypertext Markup Language

-[language]: https://en.wikipedia.org/wiki/Writing_system
-[lang]: https://en.wikipedia.org/wiki/IETF_language_tag
-[zuerich_german]: https://en.wikipedia.org/wiki/Z%C3%BCrich_German
-[enchanting_table]: https://minecraft.wiki/w/Enchanting_Table#Standard_Galactic_Alphabet
+## Table of Contents
+1. [Introduction](#introduction)
+2. [What is Markup?](#what-is-markup)
+3. [Hello World Example](#hello-world-example)
+4. [HTML Syntax](#html-syntax)
+5. [Key Attributes](#key-attributes)
+6. [Important Notes for Professionals](#important-notes-for-professionals)
+7. [Additional Resources](#additional-resources)

-[title]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title
-[h1]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements
-[p]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/p
-[a]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a
-[input]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input
-[img]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img
-[audio]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio
-[video]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video
-[canvas]: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/canvas
+## Introduction

-# [/whatwg/html][repo] - [MDN][ref]
+HTML (Hypertext Markup Language) is the backbone of the web. It extends any document, in any language, with structured, multimedia, and interactive markup, transforming it into a dynamic digital document, webpage, website, or app.

-HTML - _Hypertext Markup Language_, extends any document, in any language, with structured ([`<title>`][title]), multimedia ([`<video>`][video]), interactive ([`<input>`][input]), markup, into a, _magic_, digital document, or (web)page, (web)site, or app.
+Key features of HTML include:
+- Structured content (e.g., [`<title>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/title), [`<h1>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements))
+- Multimedia support (e.g., [`<video>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video), [`<audio>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/audio), [`<img>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/img))
+- Interactive elements (e.g., [`<input>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/input), [`<canvas>`](https://developer.mozilla.org/en-US/docs/Web/HTML/Element/canvas))

-### What is Markup
+## What is Markup?

-A markup language, is, a declarative programming language, that is a superset of [written natural language][language]. Markup languages have **semantics** because natural language does.
+A markup language is a declarative programming language that extends written natural language. Markup languages have **semantics** because they build upon the inherent meaning in natural language.

-### Note for pros
-
-HTML is not XHTML, XHTML is HTML in XML. HTML is not JSX, JSX is HTML in JS. HTML is not W3C/HTML, W3C made HTML4. Only WHATWG/HTML, HTML5, is HTML. Until Mozilla merges with WHATWG into HTML6.
-
-## Hello World
+## Hello World Example
 ```html
 <!-- ./hello-world.html -->
 <!DOCTYPE html>
 <meta charset="UTF-8">
 <meta name="viewport" content="width=device-width">
- Hello, World!
 ```
-Read [web.dev/learn/html/document-structure][book_hello_world] to understand this ~~mess~~ history.
+For a detailed explanation of this structure, refer to [web.dev's HTML document structure guide](https://web.dev/learn/html/document-structure).
+
+## HTML Syntax
+
+Basic HTML syntax includes:

-## Syntax
+- Comments: `<!-- comment -->`
+- Unicode character references: `&#x` + $UNICODE + `;`
+- Styling: `<style>` for CSS
+- Scripting: `<script type="module">` for JavaScript
+- Elements: `<tagname>content</tagname>` or `<tagname />`

-- `<!-- comment -->` - Comment
-- `&#x` + $UNICODE + `;` - Escape
-- `<style>` - CSS
-- `<script type="module">` - JS
+## Key Attributes
 ### `lang` Attribute
-The markup's language is set with the [IETF language tag][lang]. It includes fantasy languages prefixed with `x-`.
+The `lang` attribute sets the language of the markup using [IETF language tags](https://en.wikipedia.org/wiki/IETF_language_tag). It can include real and fantasy languages:
 - `lang="en-GB"` - British English
-- `lang="gsw-u-sd-chzh"` - [Zuerich German][zuerich_german]
-- `lang="x-sga"` - [Minecraft Enchanting Table][enchanting_table]
+- `lang="gsw-u-sd-chzh"` - [Zürich German](https://en.wikipedia.org/wiki/Z%C3%BCrich_German)
+- `lang="x-sga"` - [Minecraft Enchanting Table](https://minecraft.wiki/w/Enchanting_Table#Standard_Galactic_Alphabet)
 ### `class` Attribute
-Extends elements with semantic meaning.
+The `class` attribute extends elements with semantic meaning:
+
+```html
+<button class="signup">Sign Up</button>
+<button class="signin">Sign In</button>
+<button class="signout">Sign Out</button>
+```
+
+## Important Notes for Professionals
+
+- HTML is not XHTML (HTML in XML syntax)
+- HTML is not JSX (HTML-like syntax in JavaScript)
+- The current HTML standard is WHATWG's HTML5, not W3C's HTML4
+- A merger between Mozilla and WHATWG may lead to HTML6 in the future

-- `<button class="signup">`
-- `<button class="signin">`
-- `<button class="signout">`
+## Additional Resources

-## See also
+- [Official WHATWG HTML Specification](https://html.spec.whatwg.org/)
+- [MDN Web Docs HTML Reference](https://developer.mozilla.org/en-US/docs/Web/HTML/Reference)
+- [web.dev Learn HTML Course](https://web.dev/learn/html)
+- [WHATWG HTML GitHub Repository](https://github.com/whatwg/html)

-Learn HTML [web.dev/learn/html](https://web.dev/learn/html)
+This README provides an overview of HTML. For in-depth information and implementation details, please consult the official documentation and resources linked above.
diff --git a/utf8.md b/utf8.md
index 507dfca..5d35564 100644
--- a/utf8.md
+++ b/utf8.md
@@ -1,69 +1,94 @@
-# [`UTF-8`](https://en.wikipedia.org/wiki/UTF-8) and [unicode.org](https://unicode.org/)
+# UTF-8 and Unicode: A Comprehensive Guide

-Unicode - _Universal Coded Character Set_, is an [ASCII](https://en.wikipedia.org/wiki/ASCII) superset of roughly 150 thousand characters. UTF-8 - _Unicode Transformation Format 8-bit_, maps binary numbers, or bytes, to Unicode characters.
+## Table of Contents
+1. [Introduction](#introduction)
+2. [What is a Character?](#what-is-a-character)
+3. [Character Encoding](#character-encoding)
+4. [UTF-8 Explained](#utf-8-explained)
+5. [Glyph Rendering](#glyph-rendering)
+6. [Historical Context: Morse Code](#historical-context-morse-code)
+7. [Unicode Updates](#unicode-updates)
+8. [Additional Resources](#additional-resources)

-```sh
-# Hardware
-8 bits : 1 byte
+## Introduction

-# UTF-8
-1 byte : 1 character # ASCII
-2 bytes : 1 character
-3 bytes : 1 character
-4 bytes : 1 character
+Unicode, short for Universal Coded Character Set, is a comprehensive standard for encoding and representing text in computer systems. UTF-8 (Unicode Transformation Format 8-bit) is the most widely used encoding method for Unicode characters.

-# Font
-X characters : 1 glyph
-```
+> **Note:** The term "Unicode" comes from "en-cod-ing" and is not related to programming. A more fitting name might have been "Unilang" (Unified Language).
+
+## What is a Character?
+
+Characters are the fundamental elements of written language and digital communication. They include:
+
+1. Writing system characters (e.g., `1234`, `abcd`, `𓁨𓎆𓏤𓏤𓏤`)
+2. Control characters (e.g., `\n` for newline, `\b` for backspace)
+3. Signs and symbols (e.g., `✝` Latin cross, `︻デ═一` ASCII art)
+4. Optical Character Recognition (OCR) symbols
+5. Emojis
+
+## Character Encoding
+
+Character encoding is the process of converting characters into a format that computers can store and process. Since the 1970s, a byte (8 bits) has been the smallest unit of data in computer hardware. Character encodings are software libraries that perform two main functions:
+
+1. Encode: Write characters into bytes
+2. Decode: Read bytes into characters

-The term _Unicode_, from _en-cod-ing_, is not related to programming, and _Unilang_, meaning _lang_ - language and _uni_ - unification or internationalization, would have been a more fitting name.
+## UTF-8 Explained

-## What's a character
+UTF-8 is a variable-width character encoding that can represent all Unicode characters. It uses between 1 and 4 bytes per character:

-Characters are elements of [language](https://en.wikipedia.org/wiki/language), and invisible [control characters](https://en.wikipedia.org/wiki/C0_and_C1_control_codes).
+```
+1 byte : ASCII characters
+2 bytes : Additional Latin, Greek, Cyrillic, etc.
+3 bytes : Chinese, Japanese, Korean, etc.
+4 bytes : Rare characters, emojis, etc.
+```

-- [writing system](https://en.wikipedia.org/wiki/Writing_system) characters
- - `1234` and `abcd`
- - `𓁨𓎆𓏤𓏤𓏤` `𓁨` `𓎆` `𓏤` `𓏤` `𓏤` is `1000013` in ancient egyptian
- control [`\n` newline](https://en.wikipedia.org/wiki/Newline), [`\b` backspace](https://en.wikipedia.org/wiki/Backspace)
-- [signs](https://en.wikipedia.org/wiki/Sign) and symbols
- - `✝` latin cross
- - `︻デ═一` ASCII art
-- [Optical Character Recognition](<https://en.wikipedia.org/wiki/Optical_Character_Recognition_(Unicode_block)>)
-- [emoji](https://en.wikipedia.org/wiki/Emoji)
+This design allows UTF-8 to be backward compatible with ASCII while supporting the full range of Unicode characters.

-see: [unicode characters that are not writing system characters](https://en.wikipedia.org/wiki/Unicode_symbol)
+## Glyph Rendering

-_If some of these error, or [mojibake](https://en.wikipedia.org/wiki/Mojibake), then you're either, not on Unicode 2022 version 15 or later, or you don't have a supporting font loaded._
+A font (also called a typeface or font family) is responsible for displaying characters as visible glyphs. Some interesting features of fonts include:

-## Character encoding
+- Alternate glyphs for the same character (common in handwriting fonts)
+- Ligatures: Merging two or more characters into a single glyph
+- Variable fonts: Capable of animations and dynamic adjustments
+
+Popular font resources:
+- [Google Fonts](https://fonts.google.com/)
+- [Microsoft Typography](https://learn.microsoft.com/en-us/typography/)
+- [Adobe Fonts](https://fonts.adobe.com/)

-Since the 1970s, a [byte](https://en.wikipedia.org/wiki/Byte), or eight digit binary number, is the smallest amount of data that can exist in computer hardware. A character is one or more bytes. Character encodings are software libraries, to encode - _write characters into bytes_, and decode - _read bytes into characters_.
+## Historical Context: Morse Code

-1844 [Morse code](https://en.wikipedia.org/wiki/Morse_code). Latin characters => Ternary interpretation => International Morse code
+Before digital encoding, Morse code was used to transmit text over telegraph lines. It's an interesting precursor to modern character encoding:

-```sh
-SOS
-00021112000
+```
+SOS in Morse code: ... --- ...

-# There is no lowercase in Morse
-HELLO, WORLD!
-000020201002010021112110011222011211120102010021002101011
+HELLO, WORLD! in Morse code: .... . .-.. .-.. --- --..-- .-- --- .-. .-.. -.. -.-.--
 ```
-## Glyph rendering
+## Unicode Updates

-A [font, font family, or typeface](https://en.wikipedia.org/wiki/Typeface), displays characters as glyphs.
+Unicode is regularly updated to include new characters and writing systems. The 2022 update (Unicode 15.0) included:

-- [fonts.google.com](https://fonts.google.com/)
-- [Microsoft Typography](https://learn.microsoft.com/en-us/typography/)
-- [fonts.adobe.com](https://fonts.adobe.com/)
-<!-- https://www.ibm.com/plex/ -->
+- New writing systems (Kawi script, Mundari language)
+- Control characters for Egyptian hieroglyphs
+- Additional astrological symbols
+- Thousands of new CJK (Chinese, Japanese, Korean) characters
+- New emojis
+
+## Additional Resources

-A font can have multiple [alternate glyphs](<https://en.wikipedia.org/wiki/Swash_(typography)>) for the same character. This is sometimes used in [handwriting fonts](https://fonts.google.com/?classification=Handwriting). A font can also merge, two or more characters, into a [ligature glyph](<https://en.wikipedia.org/wiki/Ligature_(writing)>). This is used to create [icons on the web](https://fonts.google.com/icons), and [programming ligatures](https://github.com/microsoft/cascadia-code). [Variable fonts are even capable of animations](https://fonts.google.com/knowledge/using_variable_fonts_on_the_web/interactive_animations_with_variable_fonts).
+- [Official Unicode Website](https://unicode.org/)
+- [UTF-8 on Wikipedia](https://en.wikipedia.org/wiki/UTF-8)
+- [ASCII on Wikipedia](https://en.wikipedia.org/wiki/ASCII)
+- [Writing Systems](https://en.wikipedia.org/wiki/Writing_system)
+- [Emoji Charts](https://unicode.org/emoji/charts/)

-## Patch notes
+---

-[Unicode 2022](https://unicode.org/versions/Unicode15.0.0/) or version 15, includes, [one](https://en.wikipedia.org/wiki/Kawi_script), [two](https://en.wikipedia.org/wiki/Mundari_language) writing systems, [control characters for egyptian hieroglyphs](https://www.unicode.org/L2/L2021/21248-egyptian-controls.pdf), [8 astrology symbols](https://www.unicode.org/charts/PDF/Unicode-15.0/U150-1F700.pdf#page=5), [4000 chinse-japanese-korean characters](https://www.unicode.org/charts/PDF/Unicode-15.0/U150-31350.pdf), [some emojis](https://unicode.org/emoji/charts-15.0/emoji-released.html), [and more](https://www.unicode.org/charts/PDF/Unicode-15.0/).
+This README provides an overview of UTF-8 and Unicode. For specific implementation details or more in-depth information, please consult the official documentation and resources linked above.