-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wip/translation api #806
base: master
Are you sure you want to change the base?
Wip/translation api #806
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of comments, some changes necessary.
lib/prediction/localparserclient.ts
Outdated
@@ -262,4 +279,24 @@ export default class LocalParserClient { | |||
}; | |||
}); | |||
} | |||
|
|||
async translateUtterance(input : string[], contextEntities : EntityMap|undefined, translationOptions : Record<string, any>) : Promise<GenerationResult[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contextEntities
contains only the entities that are identified by the tokenizer (dates, times, hashtags, urls, etc). Do you need all other entities?
Also, the LocalParserClient has a locale property: how do you handle that?
I suppose the locale will be the target language?
We should add a source language parameter as well.
Also, we should make an allowlist of valid options and their type, rather than allowing everything, because some option might be unsafe to pass down to genienlp and cause us to crash. And we should validate the options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
contextEntities contains only the entities that are identified by the tokenizer (dates, times, hashtags, urls, etc). Do you need all other entities?
I thought it was human-provided. In that case, I can add another parameter e.g. "entities" which contains the strings user wants to preserve from input in the output translation.
Also, the LocalParserClient has a locale property: how do you handle that?
Right now in genie, there's only one locale and is used for both input and output. We need to change that behavior in multiple files. I suggest differing that to another PR that just does this. Right now translateUtterance doesn't go through the rest of the code which uses genie's internal locale.
Also, we should make an allowlist of valid options and their type, rather than allowing everything, because some option might be unsafe to pass down to genienlp and cause us to crash. And we should validate the options.
Yeah should we do this in genienlp? Cause we need to guard against direct HTTP requests not attempted from genie too.
3190b84
to
367e922
Compare
lib/prediction/remoteparserclient.ts
Outdated
|
||
const data = { | ||
input: input.join(' '), | ||
tgt_locale: translationOptions.tgt_locale, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tgt_locale is the same as this.locale right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remoteClient is instantiated once with the internal locale and doesn't change. tgt_locale changes with every request so should read from the options.
- fix syntax - add transaltion interface for remoteParserClient
457507e
to
df6dcbc
Compare
do_alignment ?: boolean | ||
align_preserve_input_quotation ?: boolean | ||
align_remove_output_quotation ?: boolean | ||
translate_example_split ?: boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need so many options? Seriously, let's cut this down to nothing, and we hardcode whatever is meaningful for Genie.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ignoring "yagni" here cause I foresee using translation api for translating po and other stuff too so it's better to add support for modifying all generation args in genienlp right now once for all.
This is just the interface. In (local|remote)_predictor and server I changed it to read only the necessary options.
lib/prediction/localparserclient.ts
Outdated
|
||
async translateUtterance(input : string[], entities : string[]|undefined, generationOptions : GENERATION_OPTIONS) : Promise<GenerationResult[]> { | ||
input = Utils.qpisEntities(input, entities); | ||
const candidates = await this._predictor.predict('', input.join(' '), undefined, TRANSLATION_TASK, 'id-null', generationOptions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is passing the input as question. I thought our tests showed that you can't do that, and you need to pass it as context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed this in genienlp. It can be passed as either now.
- declare a proper type for generation arguments that user can override when calling the genienlp parser - some refactoring for better reading
df6dcbc
to
417d7d2
Compare
@gcampax Also is there anything else here ? Can we merge it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need this PR, given we're not doing translation at test time?
I think having a translation endpoint wouldn't hurt even if we don't use it immediately. |
The problem is that once you merge an API, you have to support it forever (until the next major API break). You have to compare the small amount of work you did so far, with the future amount of work. |
No description provided.