This directory contains scripts to download and process the Wikipedia DPR data created by Facebook Research. Although the code is fully functional, detailed instructions are not available yet. In a nutshell, one needs to:
- Download the data: passages and queries. We suggest placing them in a collection sub-directory such as `download` (see the first sketch after this list).
- Each DPR dataset comes with a training set, which we split into three subsets: `bitext` (regular training data), `dev` (development), and `train_fusion` (a set used to learn a fusion model). Splitting and processing the queries can be done using the following script (an illustrative split is also sketched after this list).
- Finally, passages need to be converted using `convert_pass.py` (see the last sketch after this list).