BM25 tokenizer lowercase (#9745)
hatianzhang authored Dec 29, 2023
1 parent 7f32ca4 commit 9c30dbe
Showing 1 changed file with 2 additions and 0 deletions.
2 changes: 2 additions & 0 deletions llama_index/retrievers/bm25_retriever.py
@@ -15,6 +15,8 @@


def tokenize_remove_stopwords(text: str) -> List[str]:
    # lowercase and stem words
    text = text.lower()
    stemmer = PorterStemmer()
    words = list(simple_extract_keywords(text))
    return [stemmer.stem(word) for word in words]
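
To illustrate why lowercasing before keyword extraction can matter for term matching, here is a minimal, dependency-free sketch. It is not the llama_index implementation: `STOPWORDS`, `tokenize`, and the regex splitting are hypothetical stand-ins for `simple_extract_keywords`, and stemming is omitted. It assumes a case-sensitive stopword filter and exact token comparison.

```python
import re
from typing import List

# Hypothetical stopword list; the real one is not shown in this diff.
STOPWORDS = {"the", "a", "of", "is"}


def tokenize(text: str, lowercase: bool) -> List[str]:
    """Split text into word tokens and drop stopwords (illustrative only)."""
    if lowercase:
        text = text.lower()
    words = re.findall(r"\w+", text)
    # The stopword check is case-sensitive, so "The" survives unless lowercased.
    return [w for w in words if w not in STOPWORDS]


doc = "The Retriever ranks documents"
query = "the retriever"

# Without lowercasing, "Retriever" and "retriever" are different tokens.
overlap_raw = set(tokenize(doc, lowercase=False)) & set(tokenize(query, lowercase=False))
# With lowercasing, the query term matches the document term.
overlap_lower = set(tokenize(doc, lowercase=True)) & set(tokenize(query, lowercase=True))

print(overlap_raw)
print(overlap_lower)
```

In this sketch the raw overlap is empty while the lowercased overlap contains `retriever`, which is the kind of mismatch the `text.lower()` call guards against.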
