-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue #8, convert hunspell output to spellchecker JSON format #9
base: master
Are you sure you want to change the base?
Conversation
done | ||
shift $((OPTIND-1)) | ||
|
||
if [ "$#" != 1 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any sense to surround 1
with double-quotes, to compare string and string?
fixed hunspell options to interpret input as latex file
INFILE=$1 | ||
|
||
JSON=$( | ||
cat $INFILE | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enquote $INFILE
, otherwise you're going have troubles with spaces in the path
One interesting question is duplicate keys: This script will produce JSON with three equals keys. I suggest filtering duplicates with |
options_number=split(split_string[2], options, ", "); | ||
|
||
for (i = 1; i <= options_number; i++) | ||
{print "\t\t\""options[i]"\","} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comma after the last element of the array produces invalid JSON, at least from jq
perspective.The same applies to the commma after the last entry
removed "pretty-printing efforts"
fixed it with an additional variable in awk
) | ||
|
||
echo $JSON | ||
echo "$JSON" | | ||
sed ':a;N;$!ba;s/,\n}/\n}/g' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably be better solved by introducing a global variable which is empty in the beginning and "," after the first use. Use it like a prefix of the serialized array:
print $2 + PREFIX + [
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heck, that's a really good point. Shame on me for such a dummy solution.
Merge branch 'tkt_8_convert_hunspell_to_json' of https://github.com/bardsoftware/papeeria into tkt_8_convert_hunspell_to_json
} | ||
|
||
message Suggestions { | ||
string json = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one of the purposes of protocol buffers is to provide typed interface for data exchange between servers. Please replace this json with typed interface, like map<string, Replacements> suggestions
, use it in the spellchecker and remove your own json-serialization
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes a lot of sense, right.
# note: these fields' names doesn't start with "_" since deleting of this | ||
# object isn't so straightforward -- for some reason interpreter destroys | ||
# these "private" fields before __del__ is invoked. | ||
self.parser_lib_ = cdll.LoadLibrary("../libparser/build/libparser.so") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the purpose of the trailing underscores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, please parameterize this class with the path to the shared library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I thought I removed them.
Yep, that's what I forgot to do.
|
||
class SpellcheckerServicer(spellchecker_pb2.SpellcheckServicer): | ||
""" | ||
gRPC service to check text that comes from stubs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is service client who sends you the text, not a stub. Stub is a purely technical thing which provides typed interface on the client side.
@@ -11,5 +11,9 @@ message Text { | |||
} | |||
|
|||
message Suggestions { | |||
string json = 1; | |||
message Suggestion { | |||
string key = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the key anymore
[bard@bardtop3 build (tkt_8_convert_hunspell_to_json)]$ make |
@dbarashev meh, didn't realize that CMake generates such Makefiles, I'll remove the Makefile. Of course you want to use cmake. |
type=str, | ||
metavar="PATH", | ||
help="path to .dic and .aff files") | ||
required.add_argument("-L", "--language", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
language is defined in the request, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request from client? Nope, in the request we ask spellchecker to use those languages and that demand is satisfyed only if spellchecker already has hunspell instance initialized with a dictionaries of that languages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which means that we're restricted to one language, since this argument is single-valued.
Why don't you initialize hunspell instance with the languagw requested bythe client on demand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alrighty then, I'll work that out.
No description provided.