-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slow when specify word length. #14
Comments
A working code snippet: from math import floor
from time import perf_counter
from random import randint, random
from wonderwords import RandomWord
sorted_nouns = sorted(RandomWord()._categories['nouns'], key=len)
def bisect_by_length(arr: list, target_length: int):
l, r = 0, len(arr) - 1
while l <= r:
m = l + (r - l) // 2
if len(arr[m]) < target_length:
l = m + 1
else:
r = m - 1
return l
def get_word_by_length(min_length: int, max_length: int):
start = bisect_by_length(sorted_nouns, min_length)
end = bisect_by_length(sorted_nouns, max_length + 1) - 1
i = start + floor(random() * (end - start))
return sorted_nouns[i] Evaluate it: def test_get_word_by_length(number: int = 1000):
start_ts = perf_counter()
for _ in range(number):
min_len, max_len = randint(len(sorted_nouns[0]), len(
sorted_nouns[-1])), randint(len(sorted_nouns[0]), len(sorted_nouns[-1]))
if min_len > max_len:
min_len, max_len = max_len, min_len
w = get_word_by_length(min_len, max_len)
if min_len == max_len == 18:
...
# no words in the array has length == 18
# print('min: {}, max: {}, actuall: {}'.format(
# min_len, max_len, len(w)))
else:
assert min_len <= len(w) <= max_len, 'min: {}, max: {}, actuall: {}'.format(
min_len, max_len, len(w))
end_ts = perf_counter()
t = end_ts - start_ts
print(f'{number} loops took {round(t, 4)} seconds')
print(f'each loop took {round(t * 1000 / number, 4)} milliseconds')
test_get_word_by_length(1000000)
# output:
# 1000000 loops took 10.2348 seconds
# each loop took 0.0102 milliseconds |
Thank you for your thorough analysis and contribution. I agree; there is much to optimize.
|
I don't have a good solution for Regex. Regex is not very efficient in certain scenarios. Maybe give it its own API and the user will decide to use it when they don't need to worry about performance. |
Thanks for the advice. I created a new branch where I'm going to push optimizations. Only implementing the bisect method gave astronomically better results. On my computer, it used to go at 5.106667959000333, now timeit shows 0.06474015199637506. |
Implemented the length filter optimization suggested in #14.
I just implemented a trie. Currently, the system is clunky because for |
Implemented the length filter optimization suggested in #14.
Describe the bug
RandomWord is slow.
To Reproduce
run this:
Output:
Expected behavior
A thousand calls should finish in 1 second.
Additional context
I think the code here is not very efficient.
wonderwordsmodule/wonderwords/random_word.py
Line 183 in 80dcdf5
It copies all words from the category, then removing them if they don't match the length criteria.
also, removing an item from an array takes O(n) time where n is the length of the array, making the whole operation O(n^2) where n is the numbers of words in the category.
IMO, using
words = set()
instead ofwords = []
could make it O(n) time, but it's still not fast enough.When just requesting one word, we shouldn't even read all words in the category. It's better to pre-sort words in all categories by length, then use bisect to find
start_index
andend_index
that match the length criteria, and just generate a random integer to access read the specific word. This way the overall time complexity is O(log N).The above approach didn't optimize
start_with
andend_with
, which requires introducingTrie
.The text was updated successfully, but these errors were encountered: