String parsing is the method of turning a string into a char array (i.e. an array of strings with a length of one).
For example, string parsing would turn the string "abc" into ["a","b","c"].
This repository hosted the first string parser; however, you should now use Suso's general string parser instead.
On 2019-12-02, in Minecraft version 1.15 Pre-release 4, the bug MC-157426 was fixed, which finally made string parsing possible.
Shortly afterwards, the first string parser was made and published in this repo.
As of 2022-01-19, the string parser in this repo is no longer relevant and has been replaced by the faster Suso string parser. Instead, you can now find general info about string parsing here.
The original string parser (strings, which is featured in this repo) is no longer recommended. Instead, Suso's general string parser should be used.
suso.str
link: This implements all the features of the 4 deprecated string parsers and more. You should use this string parser for anything that requires string parsing.
These string parsers are deprecated and should no longer be used.
suso.str (Old)
link: fast, can parse anything except ", ! and space
suso.str_heads
link: fastest, can parse only skulls
suso.str (Ts Fork)
link: average, can parse anything (including ", ! and space)
strings
(deprecated): slow, parses only a set of 96 characters, including ", ! and space
Here are some utilities that are useful in combination with string parsing:
Here are some examples of what this could be or has been used for:
- Parsing Player Input¹ (showcase video)
- Detecting Player Skin Changes²
- Detecting Player's Capes / Model² (showcase video)
- Getting unix timestamp² (showcase video)
- Parsing JSON (old parser, old explanation video)
- TTS (explanation video)
- Skull DLC² (showcase video)
- Simplified Skull DLC² (showcase video)
- Putting Player Names on shields³ (explanation video)
- Syncing to Real Time⁴ (explanation video)
- Informing a datapack when it's outdated²
- Determining the width of a string⁵
- Getting the seed⁴
- Offbrand Players³ (image)
- Replacing player names⁵
String parsing provides an easy way to parse player inputs, especially by using signs and books. You can simply copy the string from these into the string parser. It should be noted that server-side books and signs only update once you close them.
When you create a player skull, it will load some player data as a Base64-encoded string. For example, the Base64 value in my skull would be:
ewogICJ0aW1lc3RhbXAiIDogMTY0MzA2MzE3Njc1MywKICAicHJvZmlsZUlkIiA6ICJhNDM5YzU5YjI5ZDQ0MzkwOWM5OTJlYTk1ODlhZmI0YyIsCiAgInByb2ZpbGVOYW1lIiA6ICJfdHN0c18iLAogICJ0ZXh0dXJlcyIgOiB7CiAgICAiU0tJTiIgOiB7CiAgICAgICJ1cmwiIDogImh0dHA6Ly90ZXh0dXJlcy5taW5lY3JhZnQubmV0L3RleHR1cmUvMWJiZGI3ZTNkMTg2YzE0OGZkM2Y2MGNmODM0YjlmMmE5NzNlNTkwNjEyODg3YzgxMmUxN2I1OGRiZWEyOTUxOCIKICAgIH0sCiAgICAiQ0FQRSIgOiB7CiAgICAgICJ1cmwiIDogImh0dHA6Ly90ZXh0dXJlcy5taW5lY3JhZnQubmV0L3RleHR1cmUvMTc5MTI3OTBmZjE2NGI5MzE5NmYwOGJhNzFkMGU2MjEyOTMwNDc3NmQwZjM0NzMzNGY4YTZlYWU1MDlmOGE1NiIKICAgIH0KICB9Cn0=
By first string parsing this value and then Base64 decoding it, we can obtain the following data:
{
"timestamp" : 1643063176753,
"profileId" : "a439c59b29d443909c992ea9589afb4c",
"profileName" : "_tsts_",
"textures" : {
"SKIN" : {
"url" : "http://textures.minecraft.net/texture/1bbdb7e3d186c148fd3f60cf834b9f2a973e590612887c812e17b58dbea29518",
"metadata" : {
"model" : "slim"
}
},
"CAPE" : {
"url" : "http://textures.minecraft.net/texture/17912790ff164b93196f08ba71d0e62129304776d0f347334f8a6eae509f8a56"
}
}
}
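The decoding step can be sketched outside of Minecraft in Python. This is only an illustration of what the decoded data looks like and how the interesting fields are reached. The function names and the sample payload (including its placeholder URLs) are made up for this example and are not part of any string parser.

```python
import base64
import json


def decode_skull_value(value: str) -> dict:
    """Base64-decode a skull texture value and parse the JSON inside it."""
    return json.loads(base64.b64decode(value).decode("utf-8"))


def extract_fields(profile: dict) -> dict:
    """Pick out the interesting fields; missing ones come back as None."""
    textures = profile.get("textures", {})
    skin = textures.get("SKIN", {})
    cape = textures.get("CAPE", {})
    return {
        "timestamp": profile.get("timestamp"),
        "skin_url": skin.get("url"),
        # metadata is only present for the slim (Alex) model
        "model": skin.get("metadata", {}).get("model"),
        "cape_url": cape.get("url"),
    }


# Hypothetical sample payload, encoded here so the example is self-contained.
sample = {
    "timestamp": 1643063176753,
    "profileName": "_tsts_",
    "textures": {
        "SKIN": {"url": "http://example.invalid/skin", "metadata": {"model": "slim"}},
        "CAPE": {"url": "http://example.invalid/cape"},
    },
}
encoded = base64.b64encode(json.dumps(sample).encode("utf-8")).decode("ascii")
fields = extract_fields(decode_skull_value(encoded))
print(fields)
```

In a datapack, the same steps have to be done with a string parser followed by a Base64 and JSON parser, which is far more involved than this sketch suggests.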
profileId and profileName aren't very useful, as we can obtain these values more easily. However, this decoded data gives us access to four interesting things:
- timestamp, the Unix timestamp
- textures.SKIN.url, the URL of the player's skin (the same value for players with the same skin)
- textures.SKIN.metadata.model, the model of the player's skin (metadata is only set for the Alex model; for the Steve model, no value is present)
- textures.CAPE.url, the URL of the player's currently enabled cape (the same value for players with the same cape)
Outside of the obvious uses, this can be used to detect when a player's skin changes. For example, a creator could change their skin when a new version comes out, which lets older versions of maps/datapacks detect that they're outdated. You can go further and change skins a bunch of times, save the skin URLs, and then build a look-up table where each skin means something specific. In Skull DLC we use a very advanced version of this method to allow 2^32 (4294967296) different values. You can also use both the cape (everyone has at least the migrator cape, so you can always turn that on and off) and the model as 2 easy bits of data you can transmit.
Because skulls load asynchronously, you can't just place one and immediately parse it. You need to wait until it's loaded; however, Suso's General String Parser provides a built-in function to do this for you.
You can obtain player names as a char array by using the player head loot table and then copying the name from the created skull into the string parser. You can then do various things with the player's name.
You can also use string parsing to get data from command block outputs you can't otherwise access. Examples include the time at the start of every command output (e.g. from /help me) and the output of /seed. Note that the command block time is not the same as the Unix timestamp: it is a) local time and b) only hours/minutes/seconds. To do this, you would first execute the command in a command block, then copy the LastOutput of this command into the string parser. This process can be optimized a lot by using a prep string, max chars, and limiting the char set it searches for.
Getting the width (in pixels) of a non-JSON string is just a matter of parsing it and then using a look-up table for all characters you want to support. You then add up the widths of all characters (while respecting that the gap between characters is 1 pixel).
You can, for example, use this to determine the width of a player name, then use negative spaces in team prefixes/suffixes to replace the player name with a different text.
Getting the width of a JSON string is a fairly convoluted process. You first need to string parse the string, then JSON parse it, then extract all characters from the JSON, while separating bold and non-bold characters. You use the same look-up table process as for normal strings, except that all bold characters are 1 pixel wider than their non-bold variants.
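The width calculation described above can be sketched in Python as follows. The width table here is a small, hypothetical stand-in (a real table would need an entry for every supported character), and the function name and the "runs" input format are assumptions made for this example.

```python
# Hypothetical look-up table: pixel width per character, without the gap.
CHAR_WIDTHS = {"i": 1, "l": 2, "t": 3}
DEFAULT_WIDTH = 5  # assumed width for characters missing from the table
GAP = 1            # gap between two neighbouring characters
BOLD_EXTRA = 1     # bold characters are 1 pixel wider


def string_width(runs):
    """Sum the width of (text, is_bold) runs, adding a gap between characters."""
    widths = [
        CHAR_WIDTHS.get(ch, DEFAULT_WIDTH) + (BOLD_EXTRA if bold else 0)
        for text, bold in runs
        for ch in text
    ]
    if not widths:
        return 0
    return sum(widths) + GAP * (len(widths) - 1)


plain_width = string_width([("it", False)])
bold_width = string_width([("it", True)])
mixed_width = string_width([("a", False), ("b", True)])
```

A plain string is a single non-bold run, while a parsed JSON string would be split into several runs depending on which parts are bold.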
To parse a string into a char array, string parsing abuses the fact that the tag list command sorts its output alphabetically. By "guessing" a string, then comparing it to the string we are trying to parse, we can slowly binary search our way to the original input string.
Since the tag list command's return value is the number of tags, we can't easily access the sorted tag list. The only way to access this sorted list is by executing the command from a command block, which will then return the sorted tag list as part of its LastOutput, albeit inside a bunch of JSON.
The tag list LastOutput of an entity called _tsts_, with the tags x, y and z, would look something like this to the user:
[13:29:32] _tsts_ has 3 tags: x, y, z
but in reality, the actual value of the LastOutput looks more like this:
In the middle of this are the three tags, sorted:
Unfortunately, since the sorted output is only available from command blocks, string parsing has to use them. To be able to go through as many iterations of the tag list (and other) commands as we want, we have to use a Command Block Loop.
With the help of the sorted tag list output from the previous step, we can now binary search our way to the actual string.
For this, we put our input string as well as a guess onto an entity, then run tag list on the entity.
The output lists the tags in sorted order, so we can tell whether our guess is too high or too low in the alphabet.
The guess always consists of all already found characters (at the start, none) and a single character we are guessing.
For every character we are guessing, we pick the character in the middle of the range of possible characters.
For example, if the whole range is all the lowercase characters, we would start by guessing n, as it is in the middle of the range.
If our input string is "xyz" and we start by guessing "n", tag list will sort "n" before "xyz" and we will know that we need to guess higher. The range of possible characters is now all characters above n, and in the middle of that is t, our next guess.
Eventually, only a single character will remain in the range of possible characters, and we will know that we have found the right character (i.e. x). We then add this character to our already found characters and continue guessing with "xn", then "xt", etc.
This way, we can slowly get the entire input string.
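The search described above can be sketched in Python, with ordinary string comparison standing in for the sorted tag list output. This is a simplified illustration under several assumptions: the input only contains lowercase letters, the comparison function is a stand-in for the command block machinery, and the way the range bounds are updated is just one possible choice. All names are made up for this example.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed range of possible characters


def compare_like_tag_list(guess: str, target: str) -> int:
    """Stand-in for tag list: -1 if the guess sorts first, 1 if the target
    sorts first, 0 if both are identical (duplicate tags collapse)."""
    if guess == target:
        return 0
    return -1 if sorted([guess, target])[0] == guess else 1


def parse_string(target: str, max_chars: int = 64) -> list:
    """Recover the target one character at a time using only comparisons."""
    found = []
    for _ in range(max_chars):
        lo, hi = 0, len(ALPHABET) - 1
        while lo < hi:
            mid = (lo + hi + 1) // 2  # middle of the remaining range
            result = compare_like_tag_list("".join(found) + ALPHABET[mid], target)
            if result == 0:
                lo = hi = mid   # exact match, character found
            elif result < 0:
                lo = mid        # guess sorted first: guess higher
            else:
                hi = mid - 1    # guess sorted last: guess lower
        found.append(ALPHABET[lo])
        if compare_like_tag_list("".join(found), target) == 0:
            return found
    raise ValueError("no match within max_chars")


print(parse_string("xyz"))  # ['x', 'y', 'z']
```

For the input "xyz", the first guesses in this sketch are "n" and then "t", matching the walkthrough above.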
Unfortunately, there is an issue with the previous step. Our input string would be, well, a string (e.g. "xyz"), while our found characters would be an array of characters (e.g. ["x","y"] after finding the first two). This means we can't actually compare these two and our binary search won't work.
To work around this, the array needs to be flattened into a string (["x","y"] => "xy"). Unfortunately, Minecraft doesn't offer a convenient way to flatten arrays into strings.
The only way to do so is to use Enchant Flattening.
To do this, we first convert our NBT array of characters into JSON (by putting it on a sign with {"nbt":"<path>","storage":"<storage>","interpret":true}), which will return something like {"text":"x","extra":[{"text":"y"},{"text":"z"}]}. Then, we copy this onto an entity's name (signs support Component Resolution, but entities do not). After using enchant flattening, our string will have turned into something like:
Our array has now been flattened into a string, though it is in the middle of a JSON string and we cannot extract it.
To solve that, we simply also enchant flatten the original input string, which will return a result in the same format.
We can now copy both enchant flattened strings onto an entity, use tag list as described before, and binary search this way.
One more problem arises when trying to read the output of tag list. Since the output value is also a string, we don't actually know which of our values sorted first.
To figure out which value sorted first, we have to create a fake tag list output that matches one of the two possible results we expect. We enchant flatten our fake output as well as the real output, and then we copy the real output on top of the fake output. If they are the same, the operation will fail; otherwise, it will succeed. This way, we can know which way our strings sorted.
Because the output of the tag list command contains the name of the entity, its UUID, and the current time, constructing the fake output is fairly convoluted.
First, we need to add a tag "a, b, c" (it's a single tag, trust me) to our entity in the binary search step. This tag will always sort first (since the strings we are comparing are JSON strings and both start with {) and thereby doesn't affect the result. The exact name of this "a, b, c" tag is irrelevant; what matters is that it shows up in the output as if it were three tags (when it really is just one).
Now, the output of binary search will look something like: <entity> has 3 tags: a, b, c, <string #1>, <string #2>
To create our fake output, we first remove all the tags from the entity used by the binary search (it has to be the same one!), then add 3 separate tags, a, b and c, and then run the tag list command for it. We will receive an output that looks something like this: <entity> has 3 tags: a, b, c
You might already see where this is going. We now append ", ", string #1, ", " and string #2 to it, to get a string that looks the same as our real output.
Then, we use enchant flattening (on both the fake and real output) and our fake output will have the exact same format as the real one.
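The fake output trick can be sketched in Python as follows. This ignores the JSON wrapper, UUID and timestamp of a real LastOutput and only models the visible text. The flatten helper is a hypothetical stand-in that merely makes every compared string start with {, and all function names are assumptions made for this example.

```python
def flatten(text: str) -> str:
    """Hypothetical stand-in for an enchant flattened string (starts with '{')."""
    return '{"text":"' + text + '"}'


def tag_list_output(entity: str, tags: list) -> str:
    """Model the visible tag list text: duplicates removed, tags sorted."""
    unique = sorted(set(tags))
    return f"{entity} has {len(unique)} tags: " + ", ".join(unique)


def input_sorted_first(entity: str, input_string: str, guess: str) -> bool:
    s1, s2 = flatten(input_string), flatten(guess)
    # Real output: the single tag "a, b, c" plus both strings, sorted by the game.
    real = tag_list_output(entity, ["a, b, c", s1, s2])
    # Fake output: three separate tags a, b, c, then both strings appended by hand.
    fake = tag_list_output(entity, ["a", "b", "c"]) + ", " + s1 + ", " + s2
    # In-game, copying real onto fake fails exactly when both are identical.
    return real == fake


guess_too_low = not input_sorted_first("e", "xyz", "n")
guess_too_high = input_sorted_first("e", "xyz", "z")
```

Here the fake output always lists the input string first, so the two outputs are identical only when the input string really did sort before the guess.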
When the full string has been found, the input string and the current guess are identical. Because duplicate tags are removed, the return value (the integer) of tag list will be 2 (one of those being a, b, c) instead of 3. In this case, we know our current guess is right and that we are done string parsing.
The LastOutput of commands always contains the current time. If the current time changes, two LastOutputs that should be the same may no longer be the same. To avoid this, we use help's feedback to see if the time has changed. If the time has changed during one iteration, we ignore anything found during this iteration and restart it.
When a quote is in the input, it messes up the list of found characters. For example, when we find " as the second character, our list of found characters might be ["a",'"']. The important thing here is that " characters are automatically surrounded by single quotes instead of double quotes, unlike every other character. This NBT array is no longer valid JSON and can't be interpreted anymore (which is required to put it on an entity name, to enchant flatten it, or to use tag list on it).
Originally, we used the help of a sign to solve this issue; however, I finally found a much better method. Instead of using interpret we simply use: {"nbt":"array[]","storage":"storage","interpret":true,"separator":""}
Additionally, quotes are represented as \", meaning we always have to search for \" instead. As a result, \" does not sort like you would expect from "; instead, it sorts right before \\.
! and space are the only two printable characters that sort before ". Since each string ends with a ", these characters appearing in the string can make the parser continue the binary search toward a lower value in cases where it would normally go toward a higher value. This can cause the correct character to be left out of the search if it's the one being checked at the middle of the range.
To solve this issue, we add the last possible Unicode character, with code \uFFFF, to the end of both the input string and our search string. This makes it so that, in every case where the sorting would depend on the character that delimits the end of our string, the search continues going lower. To accommodate this change, we also changed the character that is selected for the comparisons to be the last one of the first half of the possible characters (it used to be the first one of the second half). This ensures the correct character isn't wrongfully excluded.
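The effect of the closing quote and of the \uFFFF sentinel can be sketched in Python. This is a simplified model: each string is just followed by a closing ", and plain string comparison stands in for the tag list sorting. The function name and the chosen test strings are assumptions made for this example.

```python
END_QUOTE = '"'      # every compared string ends with a quote
SENTINEL = "\uffff"  # appended to both strings before the closing quote


def guess_sorts_first(guess: str, target: str, sentinel: str = "") -> bool:
    """True if the guess sorts before the target once both are terminated."""
    return (guess + sentinel + END_QUOTE) < (target + sentinel + END_QUOTE)


# The guess "a" is a prefix of both targets, so the result depends on the
# character that ends the guess.
without_sentinel = [guess_sorts_first("a", t) for t in ("ab", "a!")]
with_sentinel = [guess_sorts_first("a", t, SENTINEL) for t in ("ab", "a!")]

print(without_sentinel)  # [True, False]
print(with_sentinel)     # [False, False]
```

Without the sentinel, the same prefix guess sorts differently depending on whether the next input character is above or below ", while with the sentinel a prefix guess consistently sorts after the input.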
To search for \, you have to search for \\ instead. This sorts after \" (see quotes).
You can search for newlines by searching for \n (sorts after \\). However, you can't easily obtain a string with a value of \n. To obtain it, open a writable book, press Enter, close the book, and copy the contents of the first page of the book into storage. Now, when you want to search for a newline, copy this newline string into the right spot in your input character array.
Suso's general string parser contains a loot table to generate such a book automatically.
Command blocks can be arranged in a loop (e.g. a 2x2 loop of chain command blocks, with an impulse command block pointing into them to start the loop). Usually, command blocks will only try to run once per tick. However, by setting their UpdateLastExecution value to 0, they will "forget" that they've already run a command and will be able to run another one.
So, if you reset the UpdateLastExecution of all command blocks in the loop at the end of every loop iteration, you will be able to have a basically infinite command block loop.
This setup is really convenient when combined with functions inside some of the command blocks, since they can change the contents of their neighbors and get output from them on each iteration of the loop.
When the enchant command fails, the LastOutput of a command block executing the command will contain a flattened version of the entity name.
For example, using enchant 00000000-0073-9653-0000-000000000002 minecraft:aqua_affinity on an entity called "xyz" will return '{"extra":[{"color":"red","extra":[{"translate":"commands.enchant.failed.entity","with":["xyz"]}],"text":""}],"text":"[02:13:53] "}'
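To show where the flattened name sits inside such an output, here is a small Python sketch that parses the example value from above. It is only meant to illustrate the structure: inside Minecraft the name cannot simply be extracted like this, which is why the comparison tricks described earlier are needed.

```python
import json

# Example LastOutput value, taken from the text above.
last_output = (
    '{"extra":[{"color":"red","extra":[{"translate":'
    '"commands.enchant.failed.entity","with":["xyz"]}],"text":""}],'
    '"text":"[02:13:53] "}'
)

component = json.loads(last_output)
timestamp_prefix = component["text"]                     # time of execution
failure = component["extra"][0]["extra"][0]              # translated failure message
flattened_name = failure["with"][0]                      # the flattened entity name

print(timestamp_prefix, failure["translate"], flattened_name)
```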
Resolution of nbt components can only easily be done by signs.
help subcommands always return the same output. help me returns the shortest LastOutput (i.e. [<timestamp>] /me <action>), and should thereby be the most efficient.
Tags can be added to and removed from entities by using the data command to modify their Tags value. Tags added this way can contain characters that aren't valid in the tag command.
- McTsts > Original String Parser
- Suso > String Parser
- gibbsly > Base64 & JSON Parser (included in this repo)
- Zonteek > Time from Command Block Concept
- SirBenet > "a, b, c" tag step explanation
- Onnowhere > /help step idea
- Oskar > Asking the devs about the tag list bug (MC-157426)
- Asometric > Moral Support