Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wip/translation api #806

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Wip/translation api #806

wants to merge 3 commits into from

Conversation

Mehrad0711
Copy link
Member

No description provided.

@Mehrad0711 Mehrad0711 marked this pull request as draft October 5, 2021 21:42
Copy link
Contributor

@gcampax gcampax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of comments, some changes necessary.

lib/prediction/localparserclient.ts Outdated Show resolved Hide resolved
lib/prediction/predictor.ts Outdated Show resolved Hide resolved
@@ -262,4 +279,24 @@ export default class LocalParserClient {
};
});
}

async translateUtterance(input : string[], contextEntities : EntityMap|undefined, translationOptions : Record<string, any>) : Promise<GenerationResult[]> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contextEntities contains only the entities that are identified by the tokenizer (dates, times, hashtags, urls, etc). Do you need all other entities?

Also, the LocalParserClient has a locale property: how do you handle that?
I suppose the locale will be the target language?
We should add a source language parameter as well.

Also, we should make an allowlist of valid options and their type, rather than allowing everything, because some option might be unsafe to pass down to genienlp and cause us to crash. And we should validate the options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contextEntities contains only the entities that are identified by the tokenizer (dates, times, hashtags, urls, etc). Do you need all other entities?

I thought it was human-provided. In that case, I can add another parameter e.g. "entities" which contains the strings user wants to preserve from input in the output translation.

Also, the LocalParserClient has a locale property: how do you handle that?

Right now in genie, there's only one locale and is used for both input and output. We need to change that behavior in multiple files. I suggest differing that to another PR that just does this. Right now translateUtterance doesn't go through the rest of the code which uses genie's internal locale.

Also, we should make an allowlist of valid options and their type, rather than allowing everything, because some option might be unsafe to pass down to genienlp and cause us to crash. And we should validate the options.

Yeah should we do this in genienlp? Cause we need to guard against direct HTTP requests not attempted from genie too.

lib/prediction/predictor.ts Outdated Show resolved Hide resolved
tool/server.ts Show resolved Hide resolved
@Mehrad0711 Mehrad0711 force-pushed the wip/translation-api branch 3 times, most recently from 3190b84 to 367e922 Compare October 7, 2021 00:08
package.json Outdated Show resolved Hide resolved
lib/utils/misc-utils.ts Outdated Show resolved Hide resolved
lib/prediction/remoteparserclient.ts Outdated Show resolved Hide resolved
lib/prediction/remoteparserclient.ts Outdated Show resolved Hide resolved

const data = {
input: input.join(' '),
tgt_locale: translationOptions.tgt_locale,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tgt_locale is the same as this.locale right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remoteClient is instantiated once with the internal locale and doesn't change. tgt_locale changes with every request so should read from the options.

- fix syntax
- add transaltion interface for remoteParserClient
@Mehrad0711 Mehrad0711 force-pushed the wip/translation-api branch 3 times, most recently from 457507e to df6dcbc Compare October 7, 2021 18:00
lib/prediction/types.ts Outdated Show resolved Hide resolved
do_alignment ?: boolean
align_preserve_input_quotation ?: boolean
align_remove_output_quotation ?: boolean
translate_example_split ?: boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need so many options? Seriously, let's cut this down to nothing, and we hardcode whatever is meaningful for Genie.

Copy link
Member Author

@Mehrad0711 Mehrad0711 Oct 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ignoring "yagni" here cause I foresee using translation api for translating po and other stuff too so it's better to add support for modifying all generation args in genienlp right now once for all.
This is just the interface. In (local|remote)_predictor and server I changed it to read only the necessary options.


async translateUtterance(input : string[], entities : string[]|undefined, generationOptions : GENERATION_OPTIONS) : Promise<GenerationResult[]> {
input = Utils.qpisEntities(input, entities);
const candidates = await this._predictor.predict('', input.join(' '), undefined, TRANSLATION_TASK, 'id-null', generationOptions);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is passing the input as question. I thought our tests showed that you can't do that, and you need to pass it as context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in genienlp. It can be passed as either now.

- declare a proper type for generation arguments that user can override when calling the genienlp parser
- some refactoring for better reading
@Mehrad0711 Mehrad0711 force-pushed the wip/translation-api branch from df6dcbc to 417d7d2 Compare October 8, 2021 00:10
@Mehrad0711 Mehrad0711 marked this pull request as ready for review October 13, 2021 17:26
@Mehrad0711
Copy link
Member Author

@gcampax Also is there anything else here ? Can we merge it?

Copy link
Contributor

@gcampax gcampax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need this PR, given we're not doing translation at test time?

lib/prediction/remoteparserclient.ts Show resolved Hide resolved
@Mehrad0711
Copy link
Member Author

Do we even need this PR, given we're not doing translation at test time?

I think having a translation endpoint wouldn't hurt even if we don't use it immediately.
For po translations, I'm planning to move the genienlp calls from makefile to JS code for which the translateUtterance method is useful.
Also we spent so much time on it, we might as well merge it 🤣

@gcampax
Copy link
Contributor

gcampax commented Oct 15, 2021

The problem is that once you merge an API, you have to support it forever (until the next major API break). You have to compare the small amount of work you did so far, with the future amount of work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants