Terminology #448

Open
wants to merge 45 commits into
base: main
c43c57a
Basic terminology API
XapaJIaMnu May 9, 2023
06ec79b
reference to the code
XapaJIaMnu May 9, 2023
867fc6c
Update marian with gcc 12
kpu May 9, 2023
9153041
WiP python iface
XapaJIaMnu May 25, 2023
e5977e6
More WiP
XapaJIaMnu May 26, 2023
b1cb3bc
Works except stdin
XapaJIaMnu May 26, 2023
7a93f7c
Python interface
XapaJIaMnu May 26, 2023
5be7b96
Merge branch 'main' into terminology
XapaJIaMnu May 26, 2023
c1a659e
Small fixes, removes pybind submodule
XapaJIaMnu May 26, 2023
1f8ba76
Allow dictionary maps. Work in progress
XapaJIaMnu May 26, 2023
cc44014
Convert the map to python map
XapaJIaMnu May 26, 2023
6c7fe75
Allow dictionary terminology set up
XapaJIaMnu May 26, 2023
c586e09
Attempt to install pybind11 for the wheel build
XapaJIaMnu May 26, 2023
26529dc
Merge branch 'main' into terminology
XapaJIaMnu Jun 6, 2023
82cc687
Add support for different terminology format
XapaJIaMnu Jun 13, 2023
5c9161b
Try to update the workflows.
XapaJIaMnu Jun 14, 2023
7d6f4e5
Refactor terminology replace
jelmervdl Jun 15, 2023
f53879d
Fix formatting
jelmervdl Jun 15, 2023
a95001d
Update marian dev which should allow for compilation on newer platforms
XapaJIaMnu Jun 18, 2023
316c5dd
Fix for latest argparse
XapaJIaMnu Jun 28, 2023
58e5363
technology -> terminology
kpu Jun 28, 2023
0a6be45
Buffer input for efficiency
kpu Jun 28, 2023
ca37e8f
Pass terminology_form from CLI to Translator
graemenail Jul 4, 2023
4011f88
Leave USE_STATIC_LIBS off by default
kpu Jul 9, 2023
19ca40d
Enable cuda compilation
XapaJIaMnu Aug 1, 2023
1a8b90c
Merge branch 'main' into terminology
XapaJIaMnu Aug 1, 2023
1e80e79
Working, except in python
XapaJIaMnu Aug 2, 2023
3d37edf
Simplify invocation a bit
XapaJIaMnu Aug 2, 2023
e5d4ed0
Formatting fixes
XapaJIaMnu Aug 2, 2023
72ade1d
Update the terminology format
XapaJIaMnu Aug 4, 2023
5f9858f
Merge branch 'main' into terminology
XapaJIaMnu Aug 8, 2023
168d589
Use 0 GPU workers by default
XapaJIaMnu Aug 9, 2023
3eab045
Attempt to fix tests
XapaJIaMnu Aug 9, 2023
88e7f28
Fix error in workflow syntax
XapaJIaMnu Aug 9, 2023
1db9d09
Fix typing error
XapaJIaMnu Aug 9, 2023
537f4e1
I hate python linters
XapaJIaMnu Aug 9, 2023
042acc2
pytype can't access C++ modules
XapaJIaMnu Aug 9, 2023
e3b4a7c
Small fixes
XapaJIaMnu Aug 11, 2023
05a7379
Merge branch 'main' into terminology
XapaJIaMnu Oct 2, 2023
5479c20
Merge with main
XapaJIaMnu Oct 2, 2023
d2356a6
Merge branch 'main' into terminology
kpu Dec 7, 2023
97c8da4
Pull in submodule fixing clang compilation
kpu Dec 7, 2023
095d602
Update marian-dev with newer fbgemm for clang
kpu Dec 7, 2023
007b578
Merge branch 'main' into terminology
kpu Dec 7, 2023
2417225
Merge branch 'main' into terminology
kpu Dec 7, 2023
Small fixes, removes pybind submodule
XapaJIaMnu committed May 26, 2023

commit c1a659e1ff794d513784f9069ec787369ab216ff
3 changes: 0 additions & 3 deletions .gitmodules
@@ -7,6 +7,3 @@
 [submodule "bergamot-translator-tests"]
 path = bergamot-translator-tests
 url = https://github.com/browsermt/bergamot-translator-tests
-[submodule "3rd_party/pybind11"]
-path = 3rd_party/pybind11
-url = https://github.com/pybind/pybind11.git
4 changes: 0 additions & 4 deletions 3rd_party/CMakeLists.txt
@@ -30,7 +30,3 @@ get_directory_property(CMAKE_C_FLAGS DIRECTORY marian-dev DEFINITION CMAKE_C_FLA
 get_directory_property(CMAKE_CXX_FLAGS DIRECTORY marian-dev DEFINITION CMAKE_CXX_FLAGS)
 set(CMAKE_C_FLAGS ${CMAKE_C_FLAGS} PARENT_SCOPE)
 set(CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS} PARENT_SCOPE)
-
-#if(COMPILE_PYTHON)
-# add_subdirectory(pybind11)
-#endif(COMPILE_PYTHON)
1 change: 0 additions & 1 deletion 3rd_party/pybind11
Submodule pybind11 deleted from 9ec112
2 changes: 1 addition & 1 deletion README.md
@@ -135,5 +135,5 @@ Bergamot translator interfacing with the C++ code.

 translator = Translator("/path/to/model.npz.best-bleu.npz.decoder.brg.yml", terminology="/path/to/terminology.tsv")
 translator.translate(["text"])
-output
+[output]
 ```
47 changes: 22 additions & 25 deletions bindings/python/translator.py
@@ -19,11 +19,11 @@ class Translator:
         _responseOpts What to include in the response (alignment, html restoration, etc..)
         _service The translation service
     """
-    num_workers: int
-    cache: int
-    logging: str
-    terminology: str
-    force_terminology: bool
+    _num_workers: int
+    _cache: int
+    _logging: str
+    _terminology: str
+    _force_terminology: bool

     _config: bergamot.ServiceConfig
     _model: bergamot.TranslationModel
@@ -41,13 +41,13 @@ def __init__(self, model_conifg_path: str, num_workers: int=1, cache: int=0, \
         :param terminology: Path to terminology file, TSV format
         :param force_terminology: Force terminology to appear on the target side. May impact translation quality.
         """
-        self.num_workers = num_workers
-        self.cache = cache
-        self.logging = logging
-        self.terminology = terminology
-        self.force_terminology = force_terminology
+        self._num_workers = num_workers
+        self._cache = cache
+        self._logging = logging
+        self._terminology = terminology
+        self._force_terminology = force_terminology

-        self._config = bergamot.ServiceConfig(self.num_workers, self.cache, self.logging, self.terminology, self.force_terminology)
+        self._config = bergamot.ServiceConfig(self._num_workers, self._cache, self._logging, self._terminology, self._force_terminology)
         self._service = bergamot.Service(self._config)
         self._responseOpts = bergamot.ResponseOptions() # Default false for all, if we want to enable HTML later, from here
         self._model = self._service.modelFromConfigPath(model_conifg_path)
@@ -58,34 +58,31 @@ def resetTerminology(self, terminology: str="", force_terminology: bool=False) -
         :param force_terminology: force terminology
         :return: None
         """
-        self.terminology = terminology
-        self.force_terminology = force_terminology
-        self._config = bergamot.ServiceConfig(self.num_workers, self.cache, self.logging, self.terminology, self.force_terminology)
+        self._terminology = terminology
+        self._force_terminology = force_terminology
+        self._config = bergamot.ServiceConfig(self._num_workers, self._cache, self._logging, self._terminology, self._force_terminology)
         self._service = bergamot.Service(self._config)

     def resetNumWorkers(self, num_workers) -> None:
         """Resets the number of workers
         :param num_workers: number of parallel CPU threads.
         :return: None
         """
-        self.num_workers = num_workers
-        self._config = bergamot.ServiceConfig(self.num_workers, self.cache, self.logging, self.terminology, self.force_terminology)
+        self._num_workers = num_workers
+        self._config = bergamot.ServiceConfig(self._num_workers, self._cache, self._logging, self._terminology, self._force_terminology)
         self._service = bergamot.Service(self._config)

-    def translate(self, sentences: List[str]) -> str:
+    def translate(self, sentences: List[str]) -> List[str]:
         """Translates a list of strings
         :param sentences: A List of strings to be translated.
-        :return: Translation output.
+        :return: A list of translation outputs.
         """
         responses = self._service.translate(self._model, bergamot.VectorString(sentences), self._responseOpts)
-        ret = ""
-        for response in responses:
-            ret = ret + response.target.text
-        return ret
+        return [response.target.text for response in responses]
     #@TODO add async translate with futures

 def main():
-    parser = argparse.ArgumentParser(description="bergamot-translator interfance")
+    parser = argparse.ArgumentParser(description="bergamot-translator interface")
     parser.add_argument("--config", '-c', required=True, type=str, help='Model YML configuration input.')
     parser.add_argument("--num-workers", '-n', type=int, default=1, help='Number of CPU workers.')
     parser.add_argument("--logging", '-l', type=str, default="off", help='Set verbosity level of logging: trace, debug, info, warn, err(or), critical, off. Default is off')
@@ -100,10 +97,10 @@ def main():
     if args.path_to_input is not None:
         with open(args.path_to_input, 'r', encoding='utf-8') as infile:
             lines = infile.readlines()
-            print(translator.translate(lines))
+            print("".join(translator.translate(lines)))
     else:
         for line in stdin:
-            print(translator.translate([line.strip()]))
+            print("".join(translator.translate([line.strip()])))

 if __name__ == '__main__':
     main()
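
Note on the `translate` change in this commit: the method now returns `List[str]` instead of one concatenated `str`, so callers can keep each output paired with its input. Below is a minimal sketch of calling code written against that contract. `StubTranslator` and `translate_lines` are hypothetical names used only for illustration; the stub upper-cases its input so the example runs without the compiled `bergamot` module, and it is not the real `Translator`.

```python
from typing import List


class StubTranslator:
    """Hypothetical stand-in for Translator; upper-cases instead of translating."""

    def translate(self, sentences: List[str]) -> List[str]:
        # Mirrors the new return type: one output string per input string.
        return [sentence.upper() for sentence in sentences]


def translate_lines(translator, lines: List[str]) -> List[str]:
    """Strip trailing newlines, translate, and return one output per input."""
    stripped = [line.rstrip("\n") for line in lines]
    return translator.translate(stripped)


sources = ["hello\n", "world\n"]
outputs = translate_lines(StubTranslator(), sources)

# Because the result is a list, each output can be paired with its source line.
for source, target in zip(sources, outputs):
    print(f"{source.rstrip()} -> {target}")
```

Callers that still want a single string, as `main()` does, can join the list themselves with `"".join(...)` or `"\n".join(...)`.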