Web of Science Tagged: More robust against trailing space characters. #3053

zoe-translates · 2023-06-15T10:30:21Z

If there are trailing whitespace characters after the EF "tag", the script fails because it interprets "EF" as an unknown tag rather than end of file.

This is fixed by interpreting EF as intended; see the table of export field tags at Clarivate:
https://webofscience.help.clarivate.com/en-us/Content/export-records.htm#mc-dropdown-body6dc16a03-4cb8-43e1-b72a-ff5a3197f8f0

Bug reported by gxy1932582, see
https://forums.zotero.org/discussion/comment/436794/#Comment_436794
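
To illustrate the failure mode, here is a minimal sketch that applies the tag-matching pattern from `doImport()` to an `EF` line with and without trailing whitespace. The helper name `describeLine` is hypothetical, and how the rest of the loop reacts to each result is an assumption based on the description above, not a copy of the translator's code.

```javascript
// Tag-matching pattern as it appears in the translator's doImport() loop.
const TAG_LINE = /^([A-Z0-9]{2})\s(?:([^\n]*))?/;

// Hypothetical helper: report how a raw line is seen by the pattern.
function describeLine(rawLine) {
	const split = rawLine.match(TAG_LINE);
	return split ? { tag: split[1], value: split[2] } : null;
}

// A bare "EF" has no whitespace after the two characters, so the pattern
// does not match and the line is not treated as a tagged field.
console.log(describeLine("EF"));

// With trailing spaces the pattern matches, so "EF" looks like an ordinary
// two-letter field tag instead of the end-of-file marker.
console.log(describeLine("EF  "));
```

Under these assumptions, only the padded form reaches the code path for regular tags, which is consistent with the "unknown tag" failure reported in the forum thread.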


zoe-translates · 2023-06-15T10:45:31Z

Note that the check failed (timed out) because the Selenium test doesn't run for non-Web translators (zotero/zotero-connectors#429). The fix is a bit complicated, and it needs to be done in multiple places.

dstillman · 2023-06-15T10:49:26Z

Web of Science Tagged.js

@@ -229,6 +229,11 @@ function doImport(text) {
 	while ((rawLine = Zotero.read()) !== false) {    // until EOF
 		//Z.debug("line: " + rawLine);
 		let split = rawLine.match(/^([A-Z0-9]{2})\s(?:([^\n]*))?/);
+		// EF is equivalent to the end of file.
+		if (/^EF\s*/.test(rawLine)) {


The \s* isn't doing anything here.

Ah, right. I probably meant /^EF\s*$/, which matches "EF" with or without trailing whitespace, but not "EFF" or something similar. But even this may be unnecessary? Simply terminating on rawLine.startsWith("EF") should be all right? There are no three-letter tags yet.

Yeah, I think that's fine.

At least assuming every line is guaranteed to start with a tag… How are multi-line fields (e.g., multi-line notes) handled?

No reason not to do /^EF\s*$/, I suppose.
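
For reference, the three checks discussed in this thread can be compared side by side. This is a small sketch, with `compareChecks` as a hypothetical helper name and the sample lines made up for illustration.

```javascript
const unanchored = /^EF\s*/;  // as written in the diff
const anchored = /^EF\s*$/;   // as proposed in the review

// Hypothetical helper: run all three candidate checks on one line.
function compareChecks(line) {
	return {
		unanchored: unanchored.test(line),
		anchored: anchored.test(line),
		startsWith: line.startsWith("EF"),
	};
}

for (const line of ["EF", "EF   ", "EFF", "EFSTATHIOU, G"]) {
	console.log(JSON.stringify(line), compareChecks(line));
}
```

Because `\s*` can match zero characters, the unanchored pattern accepts exactly the same lines as `startsWith("EF")`. Only the anchored form rejects lines with non-whitespace text after `EF`.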

How are multi-line fields (e.g., multi-line notes) handled?

The 2nd line in a multi-line value will start with three spaces. The same goes for "array" values such as a list of authors. So a wrapped line in the abstract that happens to start with "EF", or an author name like EFSTATHIOU, will not be confused with end-of-file (in a properly formatted file).
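
The continuation rule described above can be sketched as a line classifier. The function `classifyLine` and its return labels are hypothetical and are not taken from the translator. The sketch assumes a properly formatted file in which continuation lines always begin with three spaces.

```javascript
// Hypothetical classifier for one raw line of a tagged export.
function classifyLine(rawLine) {
	// Continuation lines are checked first, so their content is never
	// inspected for tags.
	if (rawLine.startsWith("   ")) return "continuation";
	// End-of-file marker, tolerating trailing whitespace.
	if (/^EF\s*$/.test(rawLine)) return "end-of-file";
	// Any other two-character tag, followed by whitespace or end of line.
	if (/^[A-Z0-9]{2}(\s|$)/.test(rawLine)) return "tag";
	return "other";
}

console.log(classifyLine("AU Smith, J"));       // a tagged line
console.log(classifyLine("   EFSTATHIOU, G"));  // a wrapped author entry
console.log(classifyLine("EF  "));              // end marker with padding
```

Checking for the three-space prefix before looking at the tag is what keeps a wrapped value that happens to start with "EF" from ending the import early.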

dstillman · 2023-06-15T10:49:31Z

Web of Science Tagged.js

@@ -229,6 +229,11 @@ function doImport(text) {
 	while ((rawLine = Zotero.read()) !== false) {    // until EOF
 		//Z.debug("line: " + rawLine);
 		let split = rawLine.match(/^([A-Z0-9]{2})\s(?:([^\n]*))?/);
+		// EF is equivalent to the end of file.
+		if (/^EF\s*/.test(rawLine)) {
+			Z.debug("Encountered EF (equivalent to End of File)");


Comment out

Hah, right, there's already a debug message.


zoe-translates · 2023-06-15T11:39:18Z

There's more to the error the user is encountering. There are many more ways completeItem() may fail because of missing checks.

zoe-translates · 2023-06-20T04:49:53Z

To be honest, the logic of the existing code is a bit baffling to me. I think it's worthwhile to first improve the underlying structure of the code before fixing the errors that arose from that structure's lack of robustness. This is continued in #3062.

I am going to close this one because the effort here has been superseded. Thank you @dstillman for the comments -- those were not in vain, because the ideas will turn up in different forms in the new PR.

dstillman · 2023-07-17T03:13:18Z

trimEnd() isn't available in Zotero 6, so we can't use it yet
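
One possible workaround, sketched here as an assumption rather than as the change that was actually made, is to strip trailing whitespace with a regular expression, which does not rely on `trimEnd()`. The helper name `trimTrailingWhitespace` is hypothetical.

```javascript
// Hypothetical helper: remove trailing whitespace without String.prototype.trimEnd.
function trimTrailingWhitespace(s) {
	// \s covers spaces, tabs, and line terminators; $ anchors at the end of the string.
	return s.replace(/\s+$/, "");
}

console.log(JSON.stringify(trimTrailingWhitespace("EF  \t")));
console.log(JSON.stringify(trimTrailingWhitespace("  AU Smith, J ")));
```

Leading whitespace is left untouched, which matters here because continuation lines are identified by their three-space prefix.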

cc7c3c7

dstillman reviewed Jun 15, 2023


Web of Science Tagged: [Minor] Readability fixes.

7b81808

- Remove comment that doesn't add to info.
- Make the EF short-circuiting next to the while condition.

zoe-translates closed this Jun 20, 2023

zoe-translates mentioned this pull request Jun 20, 2023

WoS Tagged: Reimplement the core algorithms. #3062

Merged
