-
Notifications
You must be signed in to change notification settings - Fork 774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Web of Science Tagged: More robust against trailing space characters. #3053
Web of Science Tagged: More robust against trailing space characters. #3053
Conversation
If there are trailing whitespace characters after the EF "tag", the script fails because it interprets "EF" as an unknown tag rather than end of file. This is fixed by interpreting EF as intended, see the table of export field tags at Clarivate: https://webofscience.help.clarivate.com/en-us/Content/export-records.htm#mc-dropdown-body6dc16a03-4cb8-43e1-b72a-ff5a3197f8f0 Bug reported by gxy1932582, see https://forums.zotero.org/discussion/comment/436794/#Comment_436794
Note that the check failed (timed out) because the Selenium test doesn't run for non-Web translators. (zotero/zotero-connectors#429). The fix is a bit complicated, and it needs to be done in multiple places. |
@@ -229,6 +229,11 @@ function doImport(text) { | |||
while ((rawLine = Zotero.read()) !== false) { // until EOF | |||
//Z.debug("line: " + rawLine); | |||
let split = rawLine.match(/^([A-Z0-9]{2})\s(?:([^\n]*))?/); | |||
// EF is equivalent to the end of file. | |||
if (/^EF\s*/.test(rawLine)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The \s*
isn't doing anything here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. I probably meant /^EF\s*$/
which includes EF
or EF
but not EFF
or something. But even this may be unnecessary? Simply terminating on rawLine.startsWith("EF")
should be alright? There are no three-letter tags yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least assuming every line is guaranteed to start with a tag… How are multi-line fields (e.g., multi-line notes) handled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason not to do /^EF\s*$/
, I suppose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are multi-line fields (e.g., multi-line notes) handled?
The 2nd line in a multi-line value will start with three spaces. Same with "array" values such as list of authors. So a wrapped line in the abstract that happens to start with "EF", or an author name like EFSTATHIOU
will not be confused with end-of-file (in a properly formatted file).
@@ -229,6 +229,11 @@ function doImport(text) { | |||
while ((rawLine = Zotero.read()) !== false) { // until EOF | |||
//Z.debug("line: " + rawLine); | |||
let split = rawLine.match(/^([A-Z0-9]{2})\s(?:([^\n]*))?/); | |||
// EF is equivalent to the end of file. | |||
if (/^EF\s*/.test(rawLine)) { | |||
Z.debug("Encountered EF (equivalent to End of File)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, right, there's already a debug message.
- Remove comment that doesn't add to info. - Make the EF short-circuiting next to the while condition.
There's more to this error the user is encountering. There's a lot more ways |
To be honest, the logic of the existing code is a bit baffling to me. I think it's worthwhile to do first improve the underlying structure of the code before fixing these errors that arose from the lack of robustness of that structure. This is continued in #3062. I am going to close this one because the effort here has been superseded. Thank you @dstillman for the comments -- those were not in vain, because the ideas will turn out in different ways in the new PR. |
trimEnd() isn't available in Zotero 6, so we can't use it yet |
If there are trailing whitespace characters after the EF "tag", the script fails because it interprets "EF" as an unknown tag rather than end of file.
This is fixed by interpreting EF as intended, see the table of export field tags at Clarivate:
https://webofscience.help.clarivate.com/en-us/Content/export-records.htm#mc-dropdown-body6dc16a03-4cb8-43e1-b72a-ff5a3197f8f0
Bug reported by gxy1932582, see
https://forums.zotero.org/discussion/comment/436794/#Comment_436794