Does "termFreq" have the capability to search for word frequencies in two corpora simultaneously? #507

Closed
yua5 opened this issue Apr 6, 2024 · 1 comment

yua5 commented Apr 6, 2024

Hello, I'm currently using BlackLab for corpus retrieval. I'd like to know whether the BlackLab API operation "termFreq" can retrieve the frequencies of the most common words in two corpora at once: for example, the top 20 words in Corpus A and the top 20 words in Corpus B, plus the total frequency of the top 20 words across both corpora (the top 20 words in Corpus A, in Corpus B, and in A and B combined may not be the same). I considered counting the frequencies of all words in Corpus A and Corpus B separately in my script and then calculating the totals, but I'm concerned that this would slow down retrieval. Thank you very much!

@jan-niestadt
Member

BlackLab's search operations always work on a single corpus only.

You can perform two termfreq operations, one on each corpus, at the same time, and then combine the results using a script. Depending on the hardware you're running on, you might run into bottlenecks (I/O, CPU or memory) if you run multiple requests at once.
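A rough sketch of that approach in Python, in case it helps. The endpoint path (`<base_url>/<corpus>/termfreq`), the `annotation`, `number` and `outputformat` parameters, and the `termFreq` key in the JSON response are assumptions about your BlackLab Server version, so check them against your own instance. The function names are made up for this example. Note that the combined top 20 cannot be derived reliably from the two per-corpus top 20 lists alone, since a word just outside both lists may still rank high in the total, so the sketch requests a longer list from each corpus before merging.

```python
import json
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def fetch_term_freqs(base_url, corpus, annotation="word", number=1000):
    """Fetch term frequencies for one corpus.

    ASSUMPTION: the URL layout, the query parameters and the "termFreq"
    response key are guesses; adjust them to match your server.
    """
    query = urlencode(
        {"annotation": annotation, "number": number, "outputformat": "json"}
    )
    url = f"{base_url.rstrip('/')}/{quote(corpus)}/termfreq?{query}"
    with urlopen(url) as response:
        payload = json.load(response)
    return Counter(payload["termFreq"])


def fetch_both(base_url, corpus_a, corpus_b, **kwargs):
    """Send the two termfreq requests at the same time, one per corpus."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(fetch_term_freqs, base_url, corpus_a, **kwargs)
        future_b = pool.submit(fetch_term_freqs, base_url, corpus_b, **kwargs)
        return future_a.result(), future_b.result()


def combine_term_freqs(freq_a, freq_b, top_n=20):
    """Return the top-N terms for corpus A, corpus B, and A and B combined."""
    counts_a, counts_b = Counter(freq_a), Counter(freq_b)
    total = counts_a + counts_b  # sums the counts term by term
    return {
        "a": counts_a.most_common(top_n),
        "b": counts_b.most_common(top_n),
        "combined": total.most_common(top_n),
    }
```

Something like `combine_term_freqs(*fetch_both("http://localhost:8080/blacklab-server", "corpusA", "corpusB"))` would then give the three lists (the URL and corpus names are placeholders). The merge itself is cheap; the two termfreq requests are where the time goes.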
