Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quote Engine Non-CDN Retrieval #48

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions tests/test_commands.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ def test_with_multiword_term(self, mock_commands):

def test_quote():
with patch('tululbot.commands.quote_engine') as mock_engine:
mock_engine.refresh_cache_if_applicable.return_value = False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? the return value is not used right?

mock_engine.retrieve_random.return_value = 'some random quote'
rv = quote()
assert rv == 'some random quote'
Expand Down
124 changes: 124 additions & 0 deletions tests/test_quoteengine.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
from datetime import datetime
from unittest.mock import patch
from unittest.mock import MagicMock

import pytest

from tululbot.utils import QuoteEngine


class FakeResponse:
def get_json(self):
return self.json
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the value of self.json? you don't create a constructor for this class so I suspect this must be None. is this expected?


pass
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here?



@pytest.fixture
def qe():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please use a more descriptive name? it's not obvious what qe stands for.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

before that, why did you put this as fixture? as I understand it, this is your system under test. shouldn't you explicitly instantiate QuoteEngine inside your test case and not provide this as fixture? this is not going to be used anywhere else (at least for now).

return QuoteEngine()


def cdn_with_branch(branch):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add a verb in the function name? this is minor but a good practice nonetheless.

return 'https://cdn.rawgit.com/tulul/tulul-quotes/{}/quote.yaml'.format(branch) # noqa


def test_refresh_time(qe):
with patch('tululbot.utils.TimeHelper') as mock_dt:
with patch('tululbot.utils.requests') as mock_requests:
response1 = FakeResponse()
response1.status_code = 200
response1.json = {
"commit": {
"sha": "beefbeef"
}
}

response1.ok = True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. so you're assigning the properties here. can you please put them as instance variable and pass them to the constructor instead? that way it's easier to understand what a fake response has.

in general I'm not a fan of dynamically attaching properties like this. often times this leads to confusion like "where the hell this property comes from if it's not defined inside the class definition?"


mock_requests.get.return_value = response1

qe._refresh_interval = 5
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing this into instance variable is probably a good idea. I can't see why refresh interval should be private. or do you have reason for this?


mock_dt.now.return_value = datetime(2009, 1, 6, 15, 8, 30)
qe._last_refresh = datetime(2000, 1, 6, 15, 8, 24)

assert qe._quote_url == cdn_with_branch("master")

result = qe.refresh_url_if_applicable()
assert result is True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use assert result or assert not result. is is used for comparison with None.

result = qe.refresh_url_if_applicable()
assert result is False
result = qe.refresh_url_if_applicable()
assert result is False

assert qe._quote_url == cdn_with_branch("beefbeef")

qe._last_refresh = datetime(2000, 1, 6, 15, 8, 24)

result = qe.refresh_url_if_applicable()
assert result is True
result = qe.refresh_url_if_applicable()
assert result is False


def test_refresh_response(qe):
with patch('tululbot.utils.requests') as mock_requests:

# Test: stay empty if no need to refresh
qe.refresh_url_if_applicable = MagicMock()
qe.refresh_url_if_applicable.return_value = False

qe.refresh_cache_if_applicable()

assert qe._cache == []

# Test: should refresh
qe.refresh_url_if_applicable.return_value = True
mock_requests.get.return_value = generic_quote_1
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put generic_quote definition inside this test case? it's not used anywhere else right?

qe.refresh_cache_if_applicable()
result = qe.retrieve_random()
assert result == 'Yuzu kok lucu banget sih - Waza, Pejuang'

# Test: Remote content changed but no need to refresh
qe.refresh_url_if_applicable.return_value = False
mock_requests.get.return_value = generic_quote_2
qe.refresh_cache_if_applicable()
result = qe.retrieve_random()
assert result == 'Yuzu kok lucu banget sih - Waza, Pejuang'

# Test: Remote content changed and it does refresh
qe.refresh_url_if_applicable.return_value = True
mock_requests.get.return_value = generic_quote_2
qe.refresh_cache_if_applicable()
result = qe.retrieve_random()
assert result == 'Miki Hoshii is the best girl - Waza, Rocket Builder'


generic_quote_1 = FakeResponse()
generic_quote_1.text = '''
---
quotes: \n
- q_no: 1
quote: "Yuzu kok lucu banget sih"
author: Waza
author_bio: Pejuang
tags:
- cinta
- sahur
- ramadhan
'''

generic_quote_2 = FakeResponse()
generic_quote_2.text = '''
---
quotes:
- q_no: 3
quote: "Miki Hoshii is the best girl"
author: Waza
author_bio: Rocket Builder
tags:
- cinta
- geek
- matematika
'''
2 changes: 1 addition & 1 deletion tululbot/commands.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,6 @@
dispatcher = CommandDispatcher()

quote_engine = QuoteEngine()
quote_engine.refresh_cache()


@dispatcher.command(r'^/leli (?P<term>.+)$')
Expand Down Expand Up @@ -95,6 +94,7 @@ def search_on_google():

@dispatcher.command(r'^/quote$')
def quote():
quote_engine.refresh_cache_if_applicable()
return quote_engine.retrieve_random()


Expand Down
52 changes: 49 additions & 3 deletions tululbot/utils.py
Original file line number Diff line number Diff line change
@@ -1,28 +1,74 @@
from datetime import datetime
import random

import requests
import yaml


class TimeHelper:

@classmethod
def now():
return datetime.datetime.now()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this class needed? why not simply importing datetime and use .now() directly?



class QuoteEngine:

def __init__(self):
self._quote_url = 'https://cdn.rawgit.com/tulul/tulul-quotes/master/quote.yaml' # noqa
# Note: rawgit does not have 100% uptime, but at
# least they're not throttling us.
self._quote_url = 'https://cdn.rawgit.com/tulul/tulul-quotes/master/quote.yaml' # noqa

# Why not using rawgit's development endpoint? Because
# they can't promise 100% uptime on the development endpoint.
# Meanwhile, production endpoint's uptime is better because
# it is served by MaxCDN
self._git_branch_check_url = 'https://api.github.com/repos/tulul/tulul-quotes/branches/master' # noqa

self._cache = []

# URI refresh per interval in seconds
self._refresh_interval = 5 * 60
# Dummy date. Must be old enough (just to trigger)
# the uri must be refreshed on the first call
self._last_refresh = datetime(2009, 1, 6, 15, 8, 24, 78915)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. this is so you don't have to handle the first call as special call like if self._last_refresh is None. cool.


def retrieve_random(self):
cache = self._cache
return self.format_quote(random.choice(cache))

def format_quote(self, q):
return '{q[quote]} - {q[author]}, {q[author_bio]}'.format(q=q)

def refresh_cache(self):
def refresh_cache_if_applicable(self):
if (not self.refresh_url_if_applicable()):
return False
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this method has to return boolean? I look at the usage and this return value is ignored by the caller anyway.

as a side note, my preference is to separate pure functions (no side-effect) and impure functions. pure functions should return a meaningful value to the caller while impure ones shouldn't. this is like procedures in Pascal. I find this separation makes the functions or methods easier to reason about.


body = requests.get(self._quote_url).text
# What if previosuly we have the cache, but this time
# What if previously we have the cache, but this time
# when we try to get new cache, the network occurs error?
# We will think about "don't refresh if error" later.
self._cache = yaml.load(body)['quotes']

return True

def refresh_url_if_applicable(self):
now = TimeHelper.now()
delta = now - self._last_refresh

if (delta.seconds < self._refresh_interval):
return False

res = requests.get(self._git_branch_check_url)

# Don't care broken request
if (not res.ok):
return False

json = res.get_json()
sha = json['commit']['sha']
self._quote_url = 'https://cdn.rawgit.com/tulul/tulul-quotes/{}/quote.yaml'.format(sha)

self._last_refresh = now

return True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm now this function has side-effect, but its return value is also needed. quite problematic. how about changing the flow to something like this:

def refresh_cache_if_applicable(self):
    if self.need_refresh():
        self.refresh_url()
        self.refresh_cache()

def need_refresh(self):
    return delta.seconds < self._refresh_interval

def refresh_url(self):
    # do refresh url here and don't return

def refresh_cache(self):
    # do refresh cache here and don't return

this way it's clear that need_refresh is pure while the others aren't.