feat: refactor truncate string package #1097

steff456 · 2023-09-13T21:26:06Z

Part of #1062

Description

What is the purpose of this pull request?

This pull request:

Refactors the string package @stdlib/string/truncate
Adds @stdlib/string/base/truncate
Adds @stdlib/string/base/truncate-code-point
Adds @stdlib/string/base/truncate-grapheme-cluster
Adds @stdlib/string/num-code-points

Related Issues

Does this pull request have any related issues?

This pull request:

part of Tracking issue: refactor string packages handling grapheme clusters in terms of "base" packages #1062

Questions

Any questions for reviewers of this pull request?

No.

Other

Any other information relevant to this pull request? This may include screenshots, references, and/or implementation notes.

No.

Checklist

Please ensure the following tasks are completed before submitting this pull request.

Read, understood, and followed the contributing guidelines.

@stdlib-js/reviewers

stdlib-bot · 2023-09-13T21:28:32Z

Coverage Report

Package	Statements	Branches	Functions	Lines
string/base/truncate-code-point	$\color{green}165/165$ $\color{green}+100.00\%$	$\color{green}15/15$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}165/165$ $\color{green}+100.00\%$
string/base/truncate-grapheme-cluster	$\color{green}142/142$ $\color{green}+100.00\%$	$\color{green}14/14$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}142/142$ $\color{green}+100.00\%$
string/base/truncate	$\color{green}113/113$ $\color{green}+100.00\%$	$\color{green}8/8$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}113/113$ $\color{green}+100.00\%$
string/num-code-points	$\color{green}137/137$ $\color{green}+100.00\%$	$\color{green}11/11$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}137/137$ $\color{green}+100.00\%$
string/truncate	$\color{green}187/187$ $\color{green}+100.00\%$	$\color{green}22/22$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}187/187$ $\color{green}+100.00\%$

The above coverage report was generated for the changes in this PR.

kgryte

@steff456 Similar to string/reverse (pr: #1082), would you mind renaming truncate-code-point and truncate-grapheme-cluster to truncate-code-points and truncate-grapheme-clusters, respectively? As in that PR, it makes sense to use the plural; otherwise, the package name suggests that one truncates a single code point or grapheme cluster, which is not the case.

stdlib-bot · 2023-11-07T23:14:03Z

Coverage Report

Package	Statements	Branches	Functions	Lines
string/base/truncate-code-points	$\color{green}165/165$ $\color{green}+100.00\%$	$\color{green}15/15$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}165/165$ $\color{green}+100.00\%$
string/base/truncate-grapheme-clusters	$\color{green}142/142$ $\color{green}+100.00\%$	$\color{green}14/14$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}142/142$ $\color{green}+100.00\%$
string/base/truncate	$\color{green}113/113$ $\color{green}+100.00\%$	$\color{green}8/8$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}113/113$ $\color{green}+100.00\%$
string/num-code-points	$\color{green}137/137$ $\color{green}+100.00\%$	$\color{green}11/11$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}137/137$ $\color{green}+100.00\%$
string/truncate	$\color{green}187/187$ $\color{green}+100.00\%$	$\color{green}22/22$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}187/187$ $\color{green}+100.00\%$

The above coverage report was generated for the changes in this PR.

lib/node_modules/@stdlib/string/base/truncate-code-points/README.md

stdlib-bot · 2023-11-10T16:15:19Z

Coverage Report

Package	Statements	Branches	Functions	Lines
string/base/truncate-code-points	$\color{green}162/162$ $\color{green}+100.00\%$	$\color{green}14/14$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}162/162$ $\color{green}+100.00\%$
string/base/truncate-grapheme-clusters	$\color{green}139/139$ $\color{green}+100.00\%$	$\color{green}13/13$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}139/139$ $\color{green}+100.00\%$
string/base/truncate	$\color{green}112/112$ $\color{green}+100.00\%$	$\color{green}7/7$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}112/112$ $\color{green}+100.00\%$
string/num-code-points	$\color{green}137/137$ $\color{green}+100.00\%$	$\color{green}11/11$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}137/137$ $\color{green}+100.00\%$
string/truncate	$\color{green}187/187$ $\color{green}+100.00\%$	$\color{green}22/22$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}187/187$ $\color{green}+100.00\%$

The above coverage report was generated for the changes in this PR.

lib/node_modules/@stdlib/string/base/truncate-code-points/README.md

lib/node_modules/@stdlib/string/base/truncate-code-points/benchmark/benchmark.js

lib/node_modules/@stdlib/string/base/truncate-code-points/docs/repl.txt

lib/node_modules/@stdlib/string/num-code-points/README.md

stdlib-bot · 2023-11-10T21:47:55Z

Coverage Report

Package	Statements	Branches	Functions	Lines
string/base/truncate-code-points	$\color{green}161/161$ $\color{green}+100.00\%$	$\color{green}14/14$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}161/161$ $\color{green}+100.00\%$
string/base/truncate-grapheme-clusters	$\color{green}139/139$ $\color{green}+100.00\%$	$\color{green}13/13$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}139/139$ $\color{green}+100.00\%$
string/base/truncate	$\color{green}112/112$ $\color{green}+100.00\%$	$\color{green}7/7$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}112/112$ $\color{green}+100.00\%$
string/num-code-points	$\color{green}137/137$ $\color{green}+100.00\%$	$\color{green}11/11$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}137/137$ $\color{green}+100.00\%$
string/truncate	$\color{green}187/187$ $\color{green}+100.00\%$	$\color{green}22/22$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}187/187$ $\color{green}+100.00\%$

The above coverage report was generated for the changes in this PR.

lib/node_modules/@stdlib/string/num-code-points/benchmark/benchmark.js

kgryte · 2023-11-10T21:51:15Z

lib/node_modules/@stdlib/string/num-code-points/bin/cli

+		}
+		data = data.toString();
+		if ( flags.lines ) {
+			lines = data.split( RE_EOL );


This should be refactored. See https://github.com/stdlib-js/stdlib/blob/30019b1fc159eb3fcf59b182e62ef53997a7675f/lib/node_modules/%40stdlib/string/camelcase/bin/cli.

Please open an issue about num-grapheme-clusters.

lib/node_modules/@stdlib/string/num-code-points/lib/main.js

lib/node_modules/@stdlib/string/num-code-points/test/test.js

stdlib-bot · 2023-11-30T00:00:23Z

Coverage Report

Package	Statements	Branches	Functions	Lines
string/base/truncate-code-points	$\color{green}161/161$ $\color{green}+100.00\%$	$\color{green}14/14$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}161/161$ $\color{green}+100.00\%$
string/base/truncate-grapheme-clusters	$\color{green}139/139$ $\color{green}+100.00\%$	$\color{green}13/13$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}139/139$ $\color{green}+100.00\%$
string/base/truncate	$\color{green}112/112$ $\color{green}+100.00\%$	$\color{green}7/7$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}112/112$ $\color{green}+100.00\%$
string/num-code-points	$\color{green}132/132$ $\color{green}+100.00\%$	$\color{green}11/11$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}132/132$ $\color{green}+100.00\%$
string/truncate	$\color{green}186/186$ $\color{green}+100.00\%$	$\color{green}22/22$ $\color{green}+100.00\%$	$\color{green}1/1$ $\color{green}+100.00\%$	$\color{green}186/186$ $\color{green}+100.00\%$

The above coverage report was generated for the changes in this PR.

lib/node_modules/@stdlib/string/base/truncate-code-points/README.md

lib/node_modules/@stdlib/string/base/truncate-code-points/docs/types/index.d.ts

Signed-off-by: Athan <[email protected]>

lib/node_modules/@stdlib/string/base/truncate-code-points/lib/main.js

Signed-off-by: Athan <[email protected]>

kgryte · 2023-12-29T18:11:42Z

lib/node_modules/@stdlib/string/base/truncate-code-points/lib/main.js

+			// Check for an unpaired surrogate at the end of the input string...
+			if ( i === len - endingLength - 1 ) {
+				// We found an unpaired surrogate...
+				break;


@steff456 Given that we break here, does this mean we could end up with a string having fewer than the desired number of Unicode code points?

The comment on L96 says "the end of the input string", but this is not precise: the check is at the end of the "eligible" string. There could be a low surrogate at the next code unit, but we don't process that character, and I believe we should.

Is my understanding correct?
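To make the concern concrete, here is a minimal sketch, not the PR's implementation, in which the pairing decision looks at the next code unit of the full string instead of stopping at the "eligible" boundary. The helper names `codePointWidth` and `endIndex` are hypothetical and exist only for illustration.

```javascript
'use strict';

var RE_HIGH = /[\uD800-\uDBFF]/;
var RE_LOW = /[\uDC00-\uDFFF]/;

// Hypothetical helper: number of UTF-16 code units (1 or 2) occupied by the code point starting at index `i`.
function codePointWidth( str, i ) {
	// Look ahead into the full string, not only the "eligible" portion:
	if ( i + 1 < str.length && RE_HIGH.test( str[ i ] ) && RE_LOW.test( str[ i+1 ] ) ) {
		return 2;
	}
	// BMP character or unpaired surrogate: count it as one code point.
	return 1;
}

// Hypothetical helper: code unit index immediately after the first `n` code points.
function endIndex( str, n ) {
	var i = 0;
	while ( n > 0 && i < str.length ) {
		i += codePointWidth( str, i );
		n -= 1;
	}
	return i;
}

console.log( endIndex( 'a𐒻b', 2 ) );
// => 3
```

With this approach, a high surrogate sitting at the boundary is only treated as unpaired when the following code unit really is not a low surrogate, so the loop never has to break early.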

kgryte · 2023-12-29T18:17:34Z

lib/node_modules/@stdlib/string/base/truncate-code-points/test/test.js

+
+	str = '𐒻𐓟𐒻𐓟𐒻𐓟𐒻𐓟𐒻𐓟';
+	len = 5;
+	expected = '𐒻...';


This test case seems odd to me. Isn't the 𐒻 character composed of a high and a low surrogate? If so, it is a single Unicode code point, but then the expected output contains only 4 code points (visual characters), which is fewer than len. Is this the intended behavior?
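The counts in question can be checked with a short sketch. The helper `countCodePoints` is hypothetical and not part of this PR; it relies on the fact that string iteration visits a surrogate pair as a single element.

```javascript
'use strict';

// Hypothetical helper: count Unicode code points by iterating the string.
function countCodePoints( str ) {
	return Array.from( str ).length;
}

var ch = '𐒻';

console.log( ch.length );
// => 2 (UTF-16 code units: one high and one low surrogate)

console.log( countCodePoints( ch ) );
// => 1

console.log( countCodePoints( '𐒻...' ) );
// => 4
```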

kgryte · 2023-12-29T18:19:51Z

lib/node_modules/@stdlib/string/base/truncate-code-points/test/test.js

+
+	str = '\uD800𐓟𐒻𐓟𐒻𐓟𐒻𐓟𐒻𐓟';
+	len = 4;
+	expected = '...';


In this case, we are omitting the unpaired surrogate entirely, resulting in a string of 3 Unicode code points, which is less than len. This also doesn't make sense to me.

I would think that we should treat unpaired high surrogates as simply another code unit, in which case, for this test, I'd expect '\uD800...'.
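A minimal sketch of the behavior described above, assuming an unpaired surrogate counts as one code point and the ending counts toward len. The function `truncateCodePoints` below is a hypothetical stand-in, not the implementation under review, and its handling of a len shorter than the ending is also an assumption.

```javascript
'use strict';

// Hypothetical sketch: truncate `str` to `len` code points, including `ending`.
function truncateCodePoints( str, len, ending ) {
	var keep;
	var cps;

	// String iteration yields surrogate pairs as one element and unpaired surrogates as their own element:
	cps = Array.from( str );
	if ( cps.length <= len ) {
		return str;
	}
	keep = len - Array.from( ending ).length;
	if ( keep < 0 ) {
		// Assumption: when the ending alone exceeds `len`, truncate the ending.
		return Array.from( ending ).slice( 0, len ).join( '' );
	}
	return cps.slice( 0, keep ).join( '' ) + ending;
}

var out = truncateCodePoints( '\uD800𐓟𐒻𐓟𐒻𐓟𐒻𐓟𐒻𐓟', 4, '...' );
console.log( out === '\uD800...' );
// => true

out = truncateCodePoints( '𐒻𐓟𐒻𐓟𐒻𐓟𐒻𐓟𐒻𐓟', 5, '...' );
console.log( out === '𐒻𐓟...' );
// => true
```

Under these assumptions, both test inputs produce exactly len code points instead of fewer.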

headlessNode · 2025-02-25T20:04:04Z

After a discussion during project office hours on unblocking #5414, I will split this PR into separate PRs in order to unblock the related ongoing work.

cc @kgryte

kgryte · 2025-02-25T20:06:19Z

@headlessNode That sounds good! Will be good to move this work forward. During review and merge, I'll be sure to give @steff456 proper attribution.

PR-URL: #5456
Ref: #1062
Ref: #1097
Co-authored-by: Stephannie Jiménez Gacha <[email protected]>
Reviewed-by: Athan Reines <[email protected]>
Signed-off-by: Muhammad Haris <[email protected]>

feat: add truncate string packages

9a94c69

steff456 requested a review from kgryte September 13, 2023 21:26

steff456 self-assigned this Sep 13, 2023

steff456 mentioned this pull request Sep 4, 2023

Tracking issue: refactor string packages handling grapheme clusters in terms of "base" packages #1062

Open

11 tasks

This was referenced Oct 30, 2023

feat: add next-code-point-index string package #1117

Merged

feat: add truncate-middle-code-point string package #1119

Open

kgryte requested changes Oct 31, 2023

fix: rename base packages

c85a6df

docs: fix readme import

9a428dc

docs: fix readme example import

e306508

docs: add missing import in readme example

31cb844

kgryte reviewed Nov 8, 2023

lib/node_modules/@stdlib/string/base/truncate-code-points/README.md (outdated)

kgryte reviewed Nov 8, 2023

lib/node_modules/@stdlib/string/base/truncate-code-points/README.md (outdated)

steff456 added 2 commits November 10, 2023 10:50

docs: update truncate-code-points description

dbc41cc

fix: remove variadic interfaces in base truncate packages

ed5c70e