[Build status](https://github.com/dart-lang/core/actions/workflows/characters.yaml)
[pub package](https://pub.dev/packages/characters)
[package publisher](https://pub.dev/packages/characters/publisher)

[`Characters`][Characters] are strings viewed as
sequences of **user-perceived character**s,
also known as [Unicode (extended) grapheme clusters][Grapheme Clusters].

The [`Characters`][Characters] class allows access to
the individual characters of a string,
and a way to navigate back and forth between them
using a [`CharacterRange`][CharacterRange].

## Unicode characters and representations

There is no such thing as plain text.

Computers only know numbers,
so any "text" on a computer is represented by numbers,
which are again stored as bytes in memory.

The meaning of those bytes is provided by layers of interpretation,
building up to the *glyph*s that the computer displays on the screen.

| Abstraction | Dart Type | Usage | Example |
| --------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Bytes | [`ByteBuffer`][ByteBuffer],<br />[`Uint8List`][Uint8List] | Physical layout: Memory or network communication. | `file.readAsBytesSync()` |
| [Code units][] | [`Uint8List`][Uint8List] (UTF‑8)<br />[`Uint16List`][Uint16List], [`String`][String] (UTF‑16) | Standard formats for<br /> encoding code points in memory.<br />Each code unit is stored in memory using one (UTF‑8) or two (UTF‑16) bytes. One or more code units encode a code point. | `string.codeUnits`<br />`string.codeUnitAt(index)`<br />`utf8.encode(string)` |
| [Code points][] | [`Runes`][Runes] | The Unicode unit of meaning. | `string.runes` |
| [Grapheme Clusters][] | [`Characters`][Characters] | Human-perceived character. One or more code points. | `string.characters` |
| [Glyphs][] | | Visual rendering of grapheme clusters. | `print(string)` |

A Dart `String` is a sequence of UTF-16 code units,
just like strings in JavaScript and Java.
The runtime system decides on the underlying physical representation.
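
As a small illustration of these layers, the sketch below counts the units of one string at each level. The helper name `layerLengths` is made up for this example and is not part of any package.

```dart
import 'dart:convert';

import 'package:characters/characters.dart';

/// Hypothetical helper: how many units [text] has at each layer.
Map<String, int> layerLengths(String text) => {
      'UTF-8 bytes': utf8.encode(text).length,
      'UTF-16 code units': text.length,
      'code points': text.runes.length,
      'grapheme clusters': text.characters.length,
    };

void main() {
  // 'A', a space, and a flag built from two regional indicator code points.
  layerLengths('A 🇬🇧').forEach((layer, length) => print('$layer: $length'));
  // UTF-8 bytes: 10
  // UTF-16 code units: 6
  // code points: 4
  // grapheme clusters: 3
}
```

Only the last count matches what a reader would call the number of characters.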

That makes plain strings inadequate
when you need to manipulate the text that a user is viewing or entering,
because string operations do not work at the grapheme cluster level.

For example, to abbreviate a text to, say, the first 15 characters or glyphs,
a string like "A 🇬🇧 text in English"
should abbreviate to "A 🇬🇧 text in Eng…" when counting characters,
but will become "A 🇬🇧 text in …"
if counting code units using [`String`][String] operations.
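
The abbreviation example can be sketched as follows. Both function names are hypothetical and only serve to contrast the two ways of counting.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: truncates after [max] UTF-16 code units.
String abbreviateByCodeUnits(String text, int max) =>
    text.length <= max ? text : '${text.substring(0, max)}…';

/// Hypothetical helper: truncates after [max] grapheme clusters.
String abbreviateByCharacters(String text, int max) {
  final characters = text.characters;
  return characters.length <= max
      ? text
      : '${characters.take(max).string}…';
}

void main() {
  const text = 'A 🇬🇧 text in English';
  print(abbreviateByCodeUnits(text, 15)); // A 🇬🇧 text in …
  print(abbreviateByCharacters(text, 15)); // A 🇬🇧 text in Eng…
}
```

The flag occupies four code units but is a single grapheme cluster, which is why the two results differ by three letters.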

Whenever you need to manipulate strings at the character level,
you should be using the [`Characters`][Characters] type,
not the methods of the [`String`][String] class.

## The Characters class

The [`Characters`][Characters] class exposes a string
as a sequence of grapheme clusters.
All operations on [`Characters`][Characters] operate
on entire grapheme clusters,
so it removes the risk of splitting combined characters or emojis
that is inherent in the code-unit based [`String`][String] operations.

You can get a [`Characters`][Characters] object for a string using either
the constructor [`Characters(string)`][Characters constructor]
or the extension getter `string.characters`.

At its core, the class is an [`Iterable<String>`][Iterable]
where the element strings are single grapheme clusters.
This allows sequential access to the individual grapheme clusters
of the original string.
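
A minimal sketch of both ways to obtain a `Characters` object, and of iterating over it like any other `Iterable<String>`. The helper `splitIntoCharacters` is a made-up name for illustration.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: the grapheme clusters of [text] as a list of strings.
List<String> splitIntoCharacters(String text) => text.characters.toList();

void main() {
  const text = 'Hi 🇬🇧!';

  // Two equivalent ways of viewing the string as characters.
  final viaConstructor = Characters(text);
  final viaGetter = text.characters;
  print(viaConstructor.length == viaGetter.length); // true

  // Each element is a String holding exactly one grapheme cluster.
  for (final character in viaGetter) {
    print(character);
  }

  print(splitIntoCharacters(text).length); // 5
  print(viaGetter.elementAt(3)); // 🇬🇧
}
```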

On top of that, there are operations mirroring the operations
of [`String`][String] that are not index, code-unit or code-point based,
like [`startsWith`][Characters.startsWith]
or [`replaceAll`][Characters.replaceAll].
There are some differences between these and the [`String`][String] operations.
For example, the replace methods only accept characters as the pattern.
Regular expressions are not grapheme cluster aware,
so they cannot be used safely on a sequence of characters.
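
To illustrate how these operations differ from their `String` counterparts, the sketch below uses a decomposed "é" (the letter `e` followed by the combining acute accent U+0301), which is two code units but one grapheme cluster. The two helper names are hypothetical.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: code-unit based replacement using String.replaceAll.
String replaceByCodeUnits(String text, String from, String to) =>
    text.replaceAll(from, to);

/// Hypothetical helper: replacement that only matches whole grapheme clusters.
String replaceByCharacters(String text, String from, String to) =>
    text.characters.replaceAll(from.characters, to.characters).string;

void main() {
  // 'e' + combining acute accent, a space, then a plain 'e'.
  const text = 'e\u0301 e';

  // String sees a leading 'e' code unit; Characters sees the cluster 'é'.
  print(text.startsWith('e')); // true
  print(text.characters.startsWith('e'.characters)); // false

  // String.replaceAll rewrites the base letter under the accent as well.
  print(replaceByCodeUnits(text, 'e', 'a') == 'a\u0301 a'); // true
  // Characters.replaceAll leaves the accented cluster untouched.
  print(replaceByCharacters(text, 'e', 'a') == 'e\u0301 a'); // true
}
```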

Grapheme clusters have varying length in the underlying representation,
so operations on a [`Characters`][Characters] sequence cannot be index based.
Instead, the [`CharacterRange`][CharacterRange] *iterator*
provided by [`Characters.iterator`][Characters.iterator]
has been greatly enhanced.
It can move both forwards and backwards,
and it can span a *range* of grapheme clusters.
Most operations that can be performed on a full [`Characters`][Characters]
can also be performed on the grapheme clusters
in the range of a [`CharacterRange`][CharacterRange].
The range can be contracted, expanded or moved in various ways,
not restricted to using [`moveNext`][CharacterRange.moveNext]
to move to the next grapheme cluster.

Example:

```dart
import 'package:characters/characters.dart';

// Using String indices.
String? firstTagString(String source) {
  var start = source.indexOf('<') + 1;
  if (start > 0) {
    var end = source.indexOf('>', start);
    if (end >= 0) {
      return source.substring(start, end);
    }
  }
  return null;
}

// Using CharacterRange operations.
Characters? firstTagCharacters(Characters source) {
  var range = source.findFirst('<'.characters);
  if (range != null && range.moveUntil('>'.characters)) {
    return range.currentCharacters;
  }
  return null;
}
```
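
Moving and expanding a range can be sketched in the same spirit. The function `walk` below is a hypothetical helper that records the range contents after each step; it only uses `moveNext`, `moveBack`, `expandNext` and `current` from `CharacterRange`.

```dart
import 'package:characters/characters.dart';

/// Hypothetical helper: records the current range after each movement.
List<String> walk(String text) {
  final steps = <String>[];
  final range = text.characters.iterator; // Starts as an empty range.

  range.moveNext(); // First grapheme cluster.
  steps.add(range.current);
  range.moveNext(); // Second grapheme cluster.
  steps.add(range.current);
  range.moveNext(); // Third grapheme cluster.
  steps.add(range.current);
  range.moveBack(); // Back to the cluster before the current range.
  steps.add(range.current);
  range.expandNext(); // Grow the range to also cover the next cluster.
  steps.add(range.current);
  return steps;
}

void main() {
  print(walk('a🇬🇧b')); // [a, 🇬🇧, b, 🇬🇧, 🇬🇧b]
}
```

The range never ends up in the middle of the flag, whichever direction it moves.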

[ByteBuffer]: https://api.dart.dev/dart-typed_data/ByteBuffer-class.html "ByteBuffer class"
[CharacterRange.moveNext]: https://pub.dev/documentation/characters/latest/characters/CharacterRange/moveNext.html "CharacterRange.moveNext"
[CharacterRange]: https://pub.dev/documentation/characters/latest/characters/CharacterRange-class.html "CharacterRange class"
[Characters constructor]: https://pub.dev/documentation/characters/latest/characters/Characters/Characters.html "Characters constructor"
[Characters.iterator]: https://pub.dev/documentation/characters/latest/characters/Characters/iterator.html "Characters.iterator"
[Characters.replaceAll]: https://pub.dev/documentation/characters/latest/characters/Characters/replaceAll.html "Characters.replaceAll"
[Characters.startsWith]: https://pub.dev/documentation/characters/latest/characters/Characters/startsWith.html "Characters.startsWith"
[Characters]: https://pub.dev/documentation/characters/latest/characters/Characters-class.html "Characters class"
[Code Points]: https://unicode.org/glossary/#code_point "Unicode Code Point"
[Code Units]: https://unicode.org/glossary/#code_unit "Unicode Code Units"
[Glyphs]: https://unicode.org/glossary/#glyph "Unicode Glyphs"
[Grapheme Clusters]: https://unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries "Unicode (Extended) Grapheme Cluster"
[Iterable]: https://api.dart.dev/dart-core/Iterable-class.html "Iterable class"
[Runes]: https://api.dart.dev/dart-core/Runes-class.html "Runes class"
[String]: https://api.dart.dev/dart-core/String-class.html "String class"
[Uint16List]: https://api.dart.dev/dart-typed_data/Uint16List-class.html "Uint16List class"
[Uint8List]: https://api.dart.dev/dart-typed_data/Uint8List-class.html "Uint8List class"