This repository has been archived by the owner on Jun 8, 2020. It is now read-only.

Matecat Segmentation Issues #30

Open
uhallac opened this issue Dec 1, 2018 · 2 comments

Comments

@uhallac

uhallac commented Dec 1, 2018

We have been experiencing segmentation issues with Matecat. Please see the following examples:

  1. When a number followed by a dot appears inside a sentence. This happens in both general and paragraph segmentation most of the time:
  2. When there is a quoted sentence inside the sentence:
    https://www.matecat.com/translate/quotesodt/tr-TR-en-GB/1710109-904edd6b46db#966996041
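
To make the first case concrete, here is a minimal Python sketch. It is not Matecat's actual segmentation logic: it assumes a naive rule that breaks on whitespace after `.`, `!` or `?`, then adds one exception for a dot that directly follows a digit. The function names, both regexes and the sample sentence are made up for illustration.

```python
import re

# Assumed naive rule: break on whitespace that follows ., ! or ?
NAIVE_BREAK = re.compile(r"(?<=[.!?])\s+")

# Same rule plus one exception: no break when the dot directly follows a digit
NUMBER_AWARE_BREAK = re.compile(r"(?<=[.!?])(?<!\d\.)\s+")


def naive_split(text):
    """Split text into segments using the naive rule only."""
    return [part for part in NAIVE_BREAK.split(text) if part]


def number_aware_split(text):
    """Split text into segments, keeping 'digit + dot' inside the segment."""
    return [part for part in NUMBER_AWARE_BREAK.split(text) if part]


# Made-up Turkish sample with an ordinal number ("3.") inside the first sentence
sample = "Sözleşmenin 3. maddesi geçerlidir. Ödeme yapıldı."
print(naive_split(sample))         # three segments, broken after "3."
print(number_aware_split(sample))  # two segments, one per sentence
```

The exception has a cost: a sentence that really ends with a number (for example "Paid in 2018. Done.") is no longer split, so a rule like this reduces the problem but does not remove the need for a manual override.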

I'm aware that segmentation is not an easy task, but bad segmentation leads to messed-up TMs when translating between two syntactically different languages such as Turkish and English. For these 2 segments to be reflected properly in the translated document, we have to improvise with the segment translations, as you can see below:
image

I believe the easiest and most efficient way to solve this problem is to add the ability to merge multiple segments into one from the UI. At the moment, Matecat doesn't let you merge segments unless they were split before. This doesn't seem to be a complex technical task, and it would bring serious benefits in return. What do you think?
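
As a rough illustration of the merge being asked for, here is a minimal Python sketch. It is only an assumption about how such an operation could behave, not Matecat's data model or API: the `Segment` class, its fields and `merge_segments` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical segment: source text plus an optional translation."""
    source: str
    target: str = ""


def merge_segments(segments, start, end):
    """Return a new list in which segments[start..end] (inclusive) are one segment."""
    if not 0 <= start < end < len(segments):
        raise ValueError("start and end must select at least two existing segments")
    group = segments[start:end + 1]
    merged = Segment(
        source=" ".join(s.source for s in group),
        # Keep any translations already entered, skipping empty ones
        target=" ".join(s.target for s in group if s.target),
    )
    # The input list is left untouched; a new list is returned
    return segments[:start] + [merged] + segments[end + 1:]


# Made-up segments, as a wrong break after "3." might produce them
segments = [Segment("Sözleşmenin 3."), Segment("maddesi geçerlidir."), Segment("Ödeme yapıldı.")]
for seg in merge_segments(segments, 0, 1):
    print(seg.source)
```

Returning a new list rather than editing in place keeps the original segmentation available, which is one possible way to support undoing a merge.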

@giusilvano
Contributor

Hi @uhallac! I definitely see the problems in the examples you provided, and I will try to prevent them in future releases by working on the segmentation rules. I also completely agree that automatic segmentation can't be perfect all the time, because it cannot grasp the semantics of the segments, and that the CAT tool should provide a way to override it. The MateCat team already has this feature in their backlog; its development is just a matter of time and priorities.

@uhallac
Author

uhallac commented Dec 13, 2018

Hello @giusilvano, thank you for the information. It'd be great to see this feature added in the near future.
