Commit b84162c

update data
actions-user committed Sep 15, 2023
1 parent 151804e commit b84162c
Showing 3 changed files with 121 additions and 0 deletions.
17 changes: 17 additions & 0 deletions database/database.json
@@ -33258,5 +33258,22 @@
"tags": [
"python"
]
},
"http://arxiv.org/abs/2309.06131": {
"extra-tags": [
"data",
"fine-tuning",
"strategies",
"training",
"annotation"
],
"title": "Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection",
"summary": "Search methods based on Pretrained Language Models (PLM) have demonstrated great effectiveness gains compared to statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a great amount of annotated training data. Annotating data involves a large manual effort and thus is expensive, especially in domain specific tasks. In this paper we investigate fine-tuning PLM-based rankers under limited training data and budget. We investigate two scenarios: fine-tuning a ranker from scratch, and domain adaptation starting with a ranker already fine-tuned on general data, and continuing fine-tuning on a target dataset. We observe a great variability in effectiveness when fine-tuning on different randomly selected subsets of training data. This suggests that it is possible to achieve effectiveness gains by actively selecting a subset of the training data that has the most positive effect on the rankers. This way, it would be possible to fine-tune effective PLM rankers at a reduced annotation budget. To investigate this, we adapt existing Active Learning (AL) strategies to the task of fine-tuning PLM rankers and investigate their effectiveness, also considering annotation and computational costs. Our extensive analysis shows that AL strategies do not significantly outperform random selection of training subsets in terms of effectiveness. We further find that gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs) and AL strategies underperform random selection when comparing effectiveness given a fixed annotation cost. Our results highlight that ``optimal'' subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.",
"date": "2023-09-14",
"tags": [
"computer science - computation and language",
"computer science - information retrieval",
"colbert"
]
}
}
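
Each entry in database/database.json is keyed by the paper's arXiv URL and carries its title, abstract ("summary"), date, arXiv categories ("tags"), and extracted keywords ("extra-tags"). A minimal sketch of querying the file by tag, assuming only the structure visible in this diff (the helper name `papers_with_tag` is hypothetical, not part of the repository's API):

```python
import json

# Minimal sketch, assuming only the schema visible in this diff.
# The helper name is illustrative, not the project's actual API.
def papers_with_tag(path: str, tag: str) -> list[str]:
    """Return titles of entries whose tags or extra-tags include `tag`."""
    with open(path, encoding="utf-8") as f:
        database = json.load(f)
    return [
        entry["title"]
        for entry in database.values()
        if tag in entry.get("tags", []) or tag in entry.get("extra-tags", [])
    ]

# The entry added above would match both of these queries.
print(papers_with_tag("database/database.json", "colbert"))
print(papers_with_tag("database/database.json", "fine-tuning"))
```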
Binary file modified database/pipeline.pkl
104 changes: 104 additions & 0 deletions database/triples.json
@@ -151650,5 +151650,109 @@
{
"head": "tensorflow",
"tail": "learning"
},
{
"head": "computer science - computation and language",
"tail": "colbert"
},
{
"head": "computer science - computation and language",
"tail": "data"
},
{
"head": "computer science - computation and language",
"tail": "fine-tuning"
},
{
"head": "computer science - computation and language",
"tail": "strategies"
},
{
"head": "computer science - computation and language",
"tail": "training"
},
{
"head": "computer science - computation and language",
"tail": "annotation"
},
{
"head": "computer science - information retrieval",
"tail": "colbert"
},
{
"head": "computer science - information retrieval",
"tail": "data"
},
{
"head": "computer science - information retrieval",
"tail": "fine-tuning"
},
{
"head": "computer science - information retrieval",
"tail": "strategies"
},
{
"head": "computer science - information retrieval",
"tail": "training"
},
{
"head": "computer science - information retrieval",
"tail": "annotation"
},
{
"head": "colbert",
"tail": "data"
},
{
"head": "colbert",
"tail": "fine-tuning"
},
{
"head": "colbert",
"tail": "strategies"
},
{
"head": "colbert",
"tail": "training"
},
{
"head": "colbert",
"tail": "annotation"
},
{
"head": "data",
"tail": "fine-tuning"
},
{
"head": "data",
"tail": "strategies"
},
{
"head": "data",
"tail": "training"
},
{
"head": "data",
"tail": "annotation"
},
{
"head": "fine-tuning",
"tail": "strategies"
},
{
"head": "fine-tuning",
"tail": "training"
},
{
"head": "fine-tuning",
"tail": "annotation"
},
{
"head": "strategies",
"tail": "annotation"
},
{
"head": "training",
"tail": "annotation"
}
]
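
The 104 lines added to triples.json correspond to 26 head/tail records, which are exactly the pairwise combinations of the new entry's eight labels (its three "tags" followed by its five "extra-tags"), minus two pairs that plausibly already exist in the file from earlier papers. A minimal sketch of that generation step, assuming this de-duplication behaviour (the function name and the set of existing pairs are hypothetical; the actual logic presumably lives in the pipeline serialized as database/pipeline.pkl):

```python
import itertools

# Minimal sketch, assuming the pattern visible in this diff: every pair of
# the new entry's labels becomes a {"head": ..., "tail": ...} record, and
# pairs already present in triples.json are skipped. The function name and
# the de-duplication rule are assumptions, not the project's confirmed API.
def cooccurrence_triples(entry: dict, existing: set[frozenset]) -> list[dict]:
    labels = entry["tags"] + entry["extra-tags"]
    triples = []
    for head, tail in itertools.combinations(labels, 2):
        if frozenset((head, tail)) in existing:
            continue  # pair already recorded for an earlier paper
        triples.append({"head": head, "tail": tail})
    return triples
```

Under these assumptions, running this over the entry above with an existing-pair set that already contains the two arXiv category labels paired with each other and the ("strategies", "training") pair yields the 26 records shown, in the same order, and 26 records × 4 pretty-printed lines each accounts for the 104 additions reported for this file.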
