LearningFromEuroparl

alexr edited this page Jan 18, 2013 · 5 revisions

using the pre-computed Europarl intersections

  • provided by Els
  • The intersections are already sentence-aligned, so we apparently don't need to do sentence alignment ourselves. Convenient.
  • We might want to do word alignment. At the very least, we're going to have to figure out which sentences are good training data.
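One possible first pass at picking training sentences, sketched in Python. The notes don't yet say what counts as "good" training data, so the criterion here (the source side contains one of the words we care about) is an assumption, as are the function name `select_training_pairs` and the idea that the text is already whitespace-tokenized. No lemmatization is attempted.

```python
from typing import Iterable, List, Tuple


def select_training_pairs(
    pairs: Iterable[Tuple[str, str]], target_words: Iterable[str]
) -> List[Tuple[str, str]]:
    """Keep (source, target) sentence pairs whose source side mentions a target word.

    Assumption: sentences are already tokenized, so splitting on whitespace
    is enough to recover the tokens.
    """
    targets = {w.lower() for w in target_words}
    selected = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        # Skip pairs where either side is empty; they can't be used for training.
        if not src or not tgt:
            continue
        tokens = {tok.lower() for tok in src.split()}
        if tokens & targets:
            selected.append((src, tgt))
    return selected


# Hypothetical example data, not taken from Europarl.
example_pairs = [
    ("I went to the bank .", "Je suis allé à la banque ."),
    ("Hello world", "Bonjour le monde"),
    ("", "vide"),
]
kept = select_training_pairs(example_pairs, ["bank"])
```

A real filter would probably also want to check, after word alignment, that the target word is aligned to something on the other side.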

downloading Europarl

alignment

  • TODO(alexr)
  • We're going to have to do this to get training data at all. What's the best/easiest aligner to use on Europarl?

baseline we should try: just run Joshua on the source context...

This would make an interesting argument against treating WSD as a separate task in MT at all: what if we got better results by just calling an MT system on the input text? "Oh no, Joshua does better than your carefully-crafted classifiers!"

how to train Joshua

using more data

There's a lot more text in the full Europarl v7 corpus than what we get in the sentence-aligned intersections...

So maybe we could sentence-align all the available text, train on that, and get the best answers we can. Then, if an answer isn't one of the senses in the intersection used by Els, we fall back to the best-scoring sense that is.
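The fallback step could look roughly like this in Python. This is only a sketch: it assumes the system trained on the larger corpus gives us scored candidate translations for a word, and that we have the set of senses from the intersections as plain strings. The name `best_allowed_sense` and the example data are made up for illustration.

```python
from typing import Iterable, Optional, Set, Tuple


def best_allowed_sense(
    candidates: Iterable[Tuple[str, float]], allowed: Set[str]
) -> Optional[str]:
    """Return the highest-scoring candidate that is in `allowed`, or None.

    `candidates` is a collection of (translation, score) pairs in any order;
    higher scores are better. On ties, the first candidate seen wins.
    """
    best = None
    best_score = float("-inf")
    for sense, score in candidates:
        if sense in allowed and score > best_score:
            best, best_score = sense, score
    return best


# Hypothetical scores for one source word; not real system output.
scored = [("banque", 0.6), ("rive", 0.3), ("banc", 0.1)]
senses_in_intersection = {"rive", "bord"}
choice = best_allowed_sense(scored, senses_in_intersection)
```

If the top answer is already in the allowed set, this just returns it, so the same function covers both the normal case and the fallback.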