Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate our own unicode block constants #6

Merged
merged 1 commit into from
Aug 30, 2024
Merged

Conversation

gregoire-mullvad
Copy link
Member

@gregoire-mullvad gregoire-mullvad commented Jul 26, 2024

Add a script to generate a custom unicode_block module with constants
representing unicode blocks. This replaces the blocks from
unic-ucd-blocks.

Also refactor CharacterType::Range to use a RangeInclusive so we can
also remove the dependency on unic-char-range.

@gregoire-mullvad gregoire-mullvad force-pushed the genblocks branch 5 times, most recently from 1203f23 to b5942d7 Compare July 26, 2024 14:54
Base automatically changed from hackday2 to main August 30, 2024 08:18
Copy link
Member

@faern faern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice!

Nitpicks:

  1. Add a doc comment to UNICODE_BLOCKS and some newlines before it. Makes that generated Rust file a bit easier to read.
  2. Make the Python script download and save Blocks.txt so we can commit it. Makes it possible to trace back what version of unicode_blocks.rs comes from what version of Blocks.txt

@gregoire-mullvad gregoire-mullvad force-pushed the genblocks branch 2 times, most recently from de67b06 to 01bf99c Compare August 30, 2024 09:47
Copy link
Member

@faern faern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpick/docs improvement suggestions.

I also think we can have a more descriptive name than genblocks. It does not say much what it's about. How about generate-unicode-block-bindings?

genblocks Outdated
for line in blocksdata.decode().splitlines():
if match := BLOCKDEF.match(line.strip()):
name = match.group("name")
if name in {"Low Surrogates", "High Surrogates", "High Private Use Surrogates"}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice with a comment on why these are ignored.

genblocks Outdated
@@ -0,0 +1,51 @@
#!/usr/bin/env python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate a comment at the start of this script saying approximately what it does. Makes things so much more easy to follow.

genblocks Outdated

with RSBLOCKFILE.open("w") as file:
print("#![cfg_attr(rustfmt, rustfmt_skip)]", file=file)
print("// Code generated by hack/genblocks. DO NOT EDIT.", file=file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The path to self here is outdated now when it's moved out of hack/

@gregoire-mullvad gregoire-mullvad force-pushed the genblocks branch 2 times, most recently from d7a630c to 508cec6 Compare August 30, 2024 10:15
Add a script to generate a custom unicode_block module with constants
representing unicode blocks. This replaces the blocks from
unic-ucd-blocks.

Also refactor CharacterType::Range to use a RangeInclusive so we can
also remove the dependency on unic-char-range.
@gregoire-mullvad gregoire-mullvad merged commit 35053be into main Aug 30, 2024
25 checks passed
@gregoire-mullvad gregoire-mullvad deleted the genblocks branch August 30, 2024 11:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants