Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #8, convert hunspell output to spellchecker JSON format #9

Open
wants to merge 22 commits into
base: master
Choose a base branch
from

Conversation

teners
Copy link

@teners teners commented Aug 10, 2016

No description provided.

@teners teners self-assigned this Aug 10, 2016
done
shift $((OPTIND-1))

if [ "$#" != 1 ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any sense to surround 1 with double-quotes, to compare string and string?

fixed hunspell options to interpret input as latex file
INFILE=$1

JSON=$(
cat $INFILE |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enquote $INFILE, otherwise you're going have troubles with spaces in the path

@dbarashev
Copy link
Contributor

One interesting question is duplicate keys:
echo "documeent documeent documeent" | hunspell -a

This script will produce JSON with three equals keys. I suggest filtering duplicates with | jq '.'

options_number=split(split_string[2], options, ", ");

for (i = 1; i <= options_number; i++)
{print "\t\t\""options[i]"\","}
Copy link
Contributor

@dbarashev dbarashev Aug 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comma after the last element of the array produces invalid JSON, at least from jq perspective.The same applies to the commma after the last entry

root and others added 2 commits August 11, 2016 16:34
fixed it with an additional variable in awk
)

echo $JSON
echo "$JSON" |
sed ':a;N;$!ba;s/,\n}/\n}/g'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be better solved by introducing a global variable which is empty in the beginning and "," after the first use. Use it like a prefix of the serialized array:

print $2 + PREFIX + [

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heck, that's a really good point. Shame on me for such a dummy solution.

}

message Suggestions {
string json = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one of the purposes of protocol buffers is to provide typed interface for data exchange between servers. Please replace this json with typed interface, like map<string, Replacements> suggestions, use it in the spellchecker and remove your own json-serialization

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes a lot of sense, right.

# note: these fields' names doesn't start with "_" since deleting of this
# object isn't so straightforward -- for some reason interpreter destroys
# these "private" fields before __del__ is invoked.
self.parser_lib_ = cdll.LoadLibrary("../libparser/build/libparser.so")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of the trailing underscores?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, please parameterize this class with the path to the shared library

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I thought I removed them.

Yep, that's what I forgot to do.


class SpellcheckerServicer(spellchecker_pb2.SpellcheckServicer):
"""
gRPC service to check text that comes from stubs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is service client who sends you the text, not a stub. Stub is a purely technical thing which provides typed interface on the client side.

@@ -11,5 +11,9 @@ message Text {
}

message Suggestions {
string json = 1;
message Suggestion {
string key = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the key anymore

@dbarashev
Copy link
Contributor

[bard@bardtop3 build (tkt_8_convert_hunspell_to_json)]$ make
CMake Error: The source directory "/home/tener/git/papeeria/backend/spellchecker/libparser" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.

@teners
Copy link
Author

teners commented Aug 31, 2016

@dbarashev meh, didn't realize that CMake generates such Makefiles, I'll remove the Makefile. Of course you want to use cmake.

type=str,
metavar="PATH",
help="path to .dic and .aff files")
required.add_argument("-L", "--language",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

language is defined in the request, no?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request from client? Nope, in the request we ask spellchecker to use those languages and that demand is satisfyed only if spellchecker already has hunspell instance initialized with a dictionaries of that languages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which means that we're restricted to one language, since this argument is single-valued.

Why don't you initialize hunspell instance with the languagw requested bythe client on demand?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alrighty then, I'll work that out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants