1.1.0 New languages, default cluster setting & default error raising
Updates
- 🇩🇰🇳🇴🇸🇪 New Danish, Norwegian and Swedish BitextMining & Classification tasks `AngryTweetsClassification`, `BornholmBitextMining`, `DKHateClassification`, `DalajClassification`, `LccSentimentClassification`, `NordicLangClassification`, `NorwegianParliament`, `ScalaDaClassification`, `ScalaNbClassification` & `ScalaSvClassification` thanks to @KennethEnevoldsen
- 🇩🇪 New German Clustering tasks `BlurbsClusteringP2P`, `BlurbsClusteringS2S`, `TenKGnadClusteringP2P` & `TenKGnadClusteringS2S` thanks to @slvnwhrl
- Change in cluster initialization from `3` to the sklearn recommended default of `auto`. This leads to tiny changes in clustering scores going forward and hence makes this release not backwards-compatible. See here for a discussion. Thanks to @stephantul for this change.
- Errors are now directly raised by default. This behavior can be deactivated by passing a kwarg at evaluation. Previously, they were just written to a `.txt` file. Thanks to @KennethEnevoldsen for introducing this change.
- Code cleanups thanks to @stephantul @izhx @permutohedra
- The leaderboard has also improved a lot with new task-based rankings, better caching and many new models
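To illustrate why the cluster initialization change can shift scores slightly, here is a minimal pure-Python sketch. It assumes the setting in question is the number of k-means restarts (scikit-learn calls this `n_init`; the notes above do not name it). The `kmeans_once` and `kmeans` helpers and the toy data are hypothetical and are not the code used by the benchmark.

```python
import random


def kmeans_once(points, k, rng, n_iter=20):
    """Run one k-means fit from a random initialization; return (inertia, labels)."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    inertia = sum(
        sum((a - b) ** 2 for a, b in zip(p, centers[lab]))
        for p, lab in zip(points, labels)
    )
    return inertia, labels


def kmeans(points, k, n_init, seed=0):
    """Keep the lowest-inertia result out of n_init independent initializations."""
    runs = [kmeans_once(points, k, random.Random(seed + i)) for i in range(n_init)]
    return min(runs, key=lambda run: run[0])


# Toy data: three noisy 2-D blobs (illustrative only).
data_rng = random.Random(42)
points = [
    (cx + data_rng.gauss(0, 1.0), cy + data_rng.gauss(0, 1.0))
    for cx, cy in [(0, 0), (4, 0), (2, 3)]
    for _ in range(30)
]

inertia_single, _ = kmeans(points, k=3, n_init=1)
inertia_best_of_3, _ = kmeans(points, k=3, n_init=3)
print(f"n_init=1: {inertia_single:.2f}  n_init=3: {inertia_best_of_3:.2f}")
```

Because the restarts share a seed sequence, the three-restart run always includes the single-restart run, so its inertia can only be equal or lower. Any difference between the two is the kind of small score drift the note describes.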
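The new error behavior can be sketched as a fail-fast default with an opt-out that logs and continues. This is an illustration only: `run_tasks`, the `raise_error` kwarg name and the log file name are assumptions made for this example, since the notes above do not name the kwarg or the file.

```python
import tempfile
import traceback
from pathlib import Path


def run_tasks(tasks, raise_error=True, error_log=Path("error_logs.txt")):
    """Run each (name, callable) task; raise on failure or log it and continue.

    Hypothetical helper for illustration only, not the library's API.
    """
    results = {}
    for name, task in tasks:
        try:
            results[name] = task()
        except Exception:
            if raise_error:
                raise  # default: fail fast with the original traceback
            # Opt-out behavior: append the traceback to a text file and move on.
            with error_log.open("a") as fh:
                fh.write(f"{name}\n{traceback.format_exc()}\n")
    return results


def broken_task():
    raise ValueError("dataset could not be loaded")


tasks = [("ok_task", lambda: 0.5), ("broken_task", broken_task)]

# Opt out: the failure is written to the log and the other result is kept.
log_path = Path(tempfile.mkdtemp()) / "error_logs.txt"
results = run_tasks(tasks, raise_error=False, error_log=log_path)

# Default: the ValueError propagates to the caller.
try:
    run_tasks(tasks)
    raised = False
except ValueError:
    raised = True
```

Raising by default surfaces broken tasks immediately, while the opt-out suits long runs where one failing task should not discard the remaining results.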
What's Changed
- Fix kNN Multiclass by @Muennighoff in #92
- Fix SemEval description by @ahoho in #97
- Make inputs always List[str] & call in one by @Muennighoff in #99
- Fix clustering warning by @stephantul in #104
- Fix the extending of language pairs in `MTEB` by @izhx in #106
- Add `@property` annotation to description method of AbsTask by @permutohedra in #111
- Add German clustering datasets by @slvnwhrl in #116
- Added support for Scandinavian Languages by @KennethEnevoldsen in #124
- Bump version ID and update PyPI by @KennethEnevoldsen in #128
New Contributors
- @ahoho made their first contribution in #97
- @stephantul made their first contribution in #104
- @izhx made their first contribution in #106
- @permutohedra made their first contribution in #111
- @slvnwhrl made their first contribution in #116
- @KennethEnevoldsen made their first contribution in #124
Full Changelog: 1.0.1...1.1.0