Wip/translation api #806

Mehrad0711 · 2021-10-05T21:34:44Z

No description provided.

gcampax

Lots of comments, some changes necessary.

lib/prediction/localparserclient.ts

lib/prediction/predictor.ts

gcampax · 2021-10-06T15:52:42Z

lib/prediction/localparserclient.ts

@@ -262,4 +279,24 @@ export default class LocalParserClient {
            };
        });
    }
+
+    async translateUtterance(input : string[], contextEntities : EntityMap|undefined, translationOptions : Record<string, any>) : Promise<GenerationResult[]> {


contextEntities contains only the entities that are identified by the tokenizer (dates, times, hashtags, urls, etc). Do you need all other entities?

Also, the LocalParserClient has a locale property: how do you handle that?
I suppose the locale will be the target language?
We should add a source language parameter as well.

Also, we should make an allowlist of valid options and their types, rather than allowing everything, because some options might be unsafe to pass down to genienlp and cause us to crash. And we should validate the options.

> contextEntities contains only the entities that are identified by the tokenizer (dates, times, hashtags, urls, etc). Do you need all other entities?

I thought it was human-provided. In that case, I can add another parameter, e.g. "entities", which contains the strings the user wants to preserve from the input in the output translation.

> Also, the LocalParserClient has a locale property: how do you handle that?

Right now in genie, there's only one locale, and it is used for both input and output. We need to change that behavior in multiple files. I suggest deferring that to another PR that does just this. Right now translateUtterance doesn't go through the rest of the code that uses genie's internal locale.

> Also, we should make an allowlist of valid options and their types, rather than allowing everything, because some options might be unsafe to pass down to genienlp and cause us to crash. And we should validate the options.

Yeah, should we do this in genienlp? Because we also need to guard against direct HTTP requests that don't come from genie.
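
For illustration, a minimal sketch of the allowlist idea on the genie side. The option names are the ones that appear in this PR's diffs, except `src_locale`, which is a hypothetical name for the suggested source language parameter. Which options are actually safe to forward is an assumption, and `validateOptions` is not a function from the PR.

```typescript
// Hypothetical allowlist mapping each accepted option to its expected type.
// Which options are really safe to forward to genienlp is an assumption.
type OptionType = 'boolean' | 'number' | 'string';

const ALLOWED_OPTIONS : Record<string, OptionType> = {
    do_alignment: 'boolean',
    align_preserve_input_quotation: 'boolean',
    align_remove_output_quotation: 'boolean',
    translate_example_split: 'boolean',
    src_locale: 'string', // hypothetical: source language parameter
    tgt_locale: 'string',
};

// Returns a copy containing only validated options; throws on anything
// that is not in the allowlist or has the wrong type.
function validateOptions(raw : Record<string, unknown>) : Record<string, boolean|number|string> {
    const clean : Record<string, boolean|number|string> = {};
    for (const [key, value] of Object.entries(raw)) {
        // own-property check so keys like "constructor" are rejected too
        if (!Object.prototype.hasOwnProperty.call(ALLOWED_OPTIONS, key))
            throw new Error(`Unknown translation option: ${key}`);
        const expected = ALLOWED_OPTIONS[key];
        if (typeof value !== expected)
            throw new Error(`Option ${key} must be a ${expected}, got ${typeof value}`);
        clean[key] = value as boolean|number|string;
    }
    return clean;
}
```

A check like this only protects requests that go through genie; direct HTTP requests to genienlp would still need a guard on that side, as noted above.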

lib/prediction/predictor.ts

tool/server.ts

package.json

lib/utils/misc-utils.ts

lib/prediction/remoteparserclient.ts

gcampax · 2021-10-07T01:17:09Z

lib/prediction/remoteparserclient.ts

+
+        const data = {
+            input: input.join(' '),
+            tgt_locale: translationOptions.tgt_locale,


tgt_locale is the same as this.locale, right?

remoteClient is instantiated once with the internal locale, which doesn't change. tgt_locale changes with every request, so it should be read from the options.
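
A small sketch of the point above: the request body takes `tgt_locale` from the per-request options rather than from the client. The `TranslationRequest` type and `buildTranslationRequest` function are hypothetical names, and falling back to the client's locale when no `tgt_locale` is given is an assumption, not behavior confirmed in this PR.

```typescript
// Hypothetical request shape, mirroring the fields visible in the diff.
interface TranslationRequest {
    input : string;
    tgt_locale : string;
}

function buildTranslationRequest(input : string[],
                                 options : { tgt_locale ?: string },
                                 clientLocale : string) : TranslationRequest {
    return {
        input: input.join(' '),
        // Per-request target locale; the fallback to the client's fixed
        // locale is an assumption made for this sketch.
        tgt_locale: options.tgt_locale ?? clientLocale,
    };
}
```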


lib/prediction/types.ts

gcampax · 2021-10-07T23:08:01Z

lib/prediction/types.ts

+    do_alignment ?: boolean
+    align_preserve_input_quotation ?: boolean
+    align_remove_output_quotation ?: boolean
+    translate_example_split ?: boolean


Why do we need so many options? Seriously, let's cut this down to nothing, and hardcode whatever is meaningful for Genie.

I'm ignoring "YAGNI" here because I foresee using the translation API for translating PO files and other things too, so it's better to add support for modifying all generation arguments in genienlp now, once and for all.
This is just the interface. In (local|remote)_predictor and server I changed it to read only the necessary options.

gcampax · 2021-10-07T23:09:12Z

lib/prediction/localparserclient.ts

+
+    async translateUtterance(input : string[], entities : string[]|undefined, generationOptions : GENERATION_OPTIONS) : Promise<GenerationResult[]> {
+        input = Utils.qpisEntities(input, entities);
+        const candidates = await this._predictor.predict('', input.join(' '), undefined, TRANSLATION_TASK, 'id-null', generationOptions);


This is passing the input as question. I thought our tests showed that you can't do that, and you need to pass it as context.

I fixed this in genienlp. It can be passed as either now.
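
The body of `Utils.qpisEntities` is not shown in this thread. As a rough illustration of marking entities in the input before translation, here is a hypothetical stand-in that wraps each entity in quotation-mark tokens. The name `quoteEntities` and the quoting convention are assumptions, loosely suggested by the `align_preserve_input_quotation` option, and this is not the helper from the PR.

```typescript
// Hypothetical stand-in, NOT Utils.qpisEntities: wraps every occurrence of
// an entity (matched token by token) in standalone quotation-mark tokens.
function quoteEntities(tokens : string[], entities : string[] = []) : string[] {
    // Split each entity into tokens and drop empty entities.
    const entityTokens = entities
        .map((e) => e.split(' '))
        .filter((e) => e[0] !== '');
    const out : string[] = [];
    let i = 0;
    while (i < tokens.length) {
        // First entity (in the given order) that matches at position i.
        const match = entityTokens.find((e) => e.every((tok, j) => tokens[i + j] === tok));
        if (match) {
            out.push('"', ...match, '"');
            i += match.length;
        } else {
            out.push(tokens[i]);
            i++;
        }
    }
    return out;
}
```

For example, `quoteEntities(['play', 'hey', 'jude', 'now'], ['hey jude'])` yields the tokens `play " hey jude " now`. Matching whole tokens avoids quoting an entity that only appears as a substring of a longer word.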


Mehrad0711 · 2021-10-15T06:26:22Z

@gcampax Is there anything else here? Can we merge it?

gcampax

Do we even need this PR, given we're not doing translation at test time?

lib/prediction/remoteparserclient.ts

Mehrad0711 · 2021-10-15T22:46:20Z

> Do we even need this PR, given we're not doing translation at test time?

I think having a translation endpoint wouldn't hurt, even if we don't use it immediately.
For PO translations, I'm planning to move the genienlp calls from the Makefile to JS code, for which the translateUtterance method is useful.
Also, we spent so much time on it, we might as well merge it 🤣

gcampax · 2021-10-15T23:04:07Z

The problem is that once you merge an API, you have to support it forever (until the next major API break). You have to weigh the small amount of work you did so far against the future amount of work needed to support it.

Mehrad0711 marked this pull request as draft October 5, 2021 21:42

Mehrad0711 mentioned this pull request Oct 5, 2021

Wip/translation api stanford-oval/genienlp#211

Merged

gcampax suggested changes Oct 6, 2021

Mehrad0711 force-pushed the wip/translation-api branch 3 times, most recently from 3190b84 to 367e922, October 7, 2021 00:08

gcampax reviewed Oct 7, 2021

gcampax suggested changes Oct 7, 2021

Mehrad0711 added 2 commits October 7, 2021 10:57

Add support for translation API

dbaa1a1

Address PR comments

58edbfb

- fix syntax - add translation interface for remoteParserClient

Mehrad0711 force-pushed the wip/translation-api branch 3 times, most recently from 457507e to df6dcbc, October 7, 2021 18:00

gcampax reviewed Oct 7, 2021

Address PR comments (2)

417d7d2

- declare a proper type for generation arguments that user can override when calling the genienlp parser - some refactoring for better reading

Mehrad0711 force-pushed the wip/translation-api branch from df6dcbc to 417d7d2, October 8, 2021 00:10

Mehrad0711 marked this pull request as ready for review October 13, 2021 17:26

gcampax suggested changes Oct 15, 2021
