Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function name fix, using uuid1mc, time format fix, TaskRegistry fix #60

Merged
merged 42 commits into from
Jun 23, 2017
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
152c153
function name fix, using uuid1mc, time format fix
May 2, 2017
039cdbd
deleting keys from registry when deleting a file
May 2, 2017
2f5df3b
Merge branch 'imgee-v2' of github.com:hasgeek/imgee into code-review
May 9, 2017
6b8e07f
restructured TaskRegistry to use Redis Keys with TTL set to 60 seconds
May 9, 2017
221dcde
re-adding nose and coverage to travis file, removed it by mistake
May 9, 2017
ede59b9
using runtests.py file to run tests in travis
May 9, 2017
dfdf09a
reordered functions and better string formating
May 11, 2017
df77ffc
made the redis key expiry time an argument
May 31, 2017
2d7ff92
assigning error handler the right way
May 31, 2017
95dce28
not resizing if the gif image is animated
May 31, 2017
5cead4e
added pillow as dependency
May 31, 2017
d6766a7
fixed conflict
Jun 1, 2017
b5906f9
Merge branch 'imgee-v2' into code-review
Jun 5, 2017
7a5bcb9
using redis pipeline to delete multiple keys
Jun 6, 2017
28e656f
missed one place to use redis pipeline
Jun 6, 2017
83128dd
deleted runtests.py, not needed anymore
Jun 6, 2017
144d4ac
fixed typo
Jun 6, 2017
dc961d6
fixed some import issues
Jun 8, 2017
907c7fe
moved upload directory creation to manage.py
Jun 8, 2017
a1372fb
added some comments
Jun 8, 2017
127aa4c
moved importing app inside the only function that uses it
Jun 8, 2017
e3228e4
Update runtests.sh
Jun 15, 2017
3ca82e9
Update runtests (testing jenkins)
Jun 15, 2017
b581138
testing jenkins
Jun 15, 2017
395b862
removed pinned versions
Jun 15, 2017
85b505c
testing jenkins
Jun 15, 2017
9d17938
test jenkins
Jun 15, 2017
e42e750
testing jenkins
Jun 15, 2017
fd8737c
testing jenkins
Jun 15, 2017
0cc523e
added manage.py init to runtests
Jun 19, 2017
64da948
removed mkdir from travis config
Jun 19, 2017
eeb5b96
trying to fix travis build
Jun 20, 2017
81ccc8d
trying to fix travis build
Jun 20, 2017
7a71a7a
trying to debug travis
Jun 20, 2017
f2b4b26
trying to fix travis build
Jun 20, 2017
154b23c
fixed upload path issue
Jun 21, 2017
e7f46fa
moar fixes to registry, cleanup, moved back to uuid4
Jun 21, 2017
5ceb68c
fixed typo
Jun 21, 2017
bb6a350
setting test environment variable in travis file
Jun 21, 2017
78bd8d3
setting env in existing block, otherwise it gets overwritten
Jun 21, 2017
993287c
moved the init command out of if block
Jun 23, 2017
481701a
rearranged travis config, cleanup of upload directory path
Jun 23, 2017
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,11 +16,11 @@ services:
before_script:
- mkdir imgee/static/test_uploads
script:
- nosetests --with-coverage
- python runtests.py
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this to a ./runtests.sh. Copy that from hgapp's template. Basically:

#!/bin/sh
export FLASK_ENV="TESTING"
coverage run `which nosetests --with-timer`
coverage report

install:
- sudo apt-get -qq install zlib1g-dev libfreetype6-dev liblcms1-dev libwebp-dev libjpeg-dev libpng-dev libfreetype6-dev libtiff4-dev librsvg2-dev ghostscript imagemagick pandoc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where have these dependencies gone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line has been removed. Don't we need these libraries anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm. It's not been removed. They are there.

- sudo apt-get -qq install zlib1g-dev libfreetype6-dev liblcms1-dev libwebp-dev libjpeg-dev libpng-dev libfreetype6-dev libtiff4-dev librsvg2-dev ghostscript imagemagick pandoc

- pip install -r requirements.txt
- pip install nose coverage BeautifulSoup4 Pillow
- pip install nose coverage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add these to a test_requirements.txt and use pip install -r test_requirements.txt

notifications:
email: false
slack:
Expand Down
8 changes: 3 additions & 5 deletions imgee/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,16 +20,12 @@

from . import models, views
from .models import db
from .tasks import TaskRegistry

registry = TaskRegistry(os.getenv('ENV', 'production'))


def mkdir_p(dirname):
if not os.path.exists(dirname):
os.makedirs(dirname)


# Configure the app
coaster.app.init_app(app)
migrate = Migrate(app, db)
Expand All @@ -48,4 +44,6 @@ def error403(error):
app.config['MEDIA_DOMAIN'] = app.config['MEDIA_DOMAIN'].split(':', 1)[1]
mkdir_p(app.config['UPLOADED_FILES_DEST'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this mkdir call. The should be in manage.py as say manage.py init. It should not be a runtime check.


registry.set_connection()
from .tasks import TaskRegistry

registry = TaskRegistry()
4 changes: 2 additions & 2 deletions imgee/models/stored_file.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,9 @@ def extn(self):
def dict_data(self):
return dict(
title=self.title,
uploaded=self.created_at.strftime('%B %d, %Y'),
uploaded=self.created_at.isoformat() + 'Z',
filesize=app.jinja_env.filters['filesizeformat'](self.size),
imgsize='%s x %s' % (self.width, self.height),
imgsize=u'%s×%s px' % (self.width, self.height),
url=url_for('view_image', profile=self.profile.name, image=self.name),
thumb_url=url_for('get_image', image=self.name, size=app.config.get('THUMBNAIL_SIZE'))
)
Expand Down
41 changes: 26 additions & 15 deletions imgee/storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,9 +14,9 @@
from imgee import app
from imgee.models import db, Thumbnail, StoredFile
from imgee.utils import (
newid, guess_extension, get_file_type,
newid, guess_extension, get_file_type, is_animated_gif,
path_for, get_s3_folder, get_s3_bucket,
download_frm_s3, get_width_height, ALLOWED_MIMETYPES,
download_from_s3, get_width_height, ALLOWED_MIMETYPES,
exists_in_s3, THUMBNAIL_COMMANDS
)

Expand All @@ -29,17 +29,23 @@ def get_resized_image(img, size, is_thumbnail=False):
"""
registry = imgee.registry
Copy link
Member

@jace jace Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a circular import. Can you possibly define the registry before other modules are imported, to avoid the circular import?

One way is to move the registry's __init__ code into a distinct init_app method and call that separately after the app is configured (__init__ should also take an optional app parameter and if present call init_app).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, an init_app() function in the tasks.py (where TaskRegistry exists), and initialize the registry there and assign it to let's say app.registry, and call it from imgee/__init__.py so that it can be used from anywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, an init_app method on the class. Not in the module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the convention in Flask is that if you want to add something to the app, add it under app.extensions (a dictionary). We already have app.extensions['lastuser']. You could do app.extensions['taskregistry']. I will not recommend this however as the task registry is specific to Imgee and is not a shared extension. It's better being referenced as a direct Python import.

img_name = img.name

if img.mimetype == 'image/gif':
# if the gif file is animated, no need to resize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do need to resize with animation, but if this is non-trivial, file a ticket for this to be addressed separately. (Preserving animation is higher priority than resizing.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #55.

src_path = download_from_s3(img.name + img.extn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does img.extn include a period? If so, it seems a bit redundant to store a period in the database.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does include the period. and the period is stored in db too. and .extn is a function property https://github.com/hasgeek/imgee/blob/master/imgee/models/stored_file.py#L58

if is_animated_gif(src_path):
return img.name

size_t = parse_size(size)
if (size_t and size_t[0] != img.width and size_t[1] != img.height) or ('thumb_extn' in ALLOWED_MIMETYPES[img.mimetype] and ALLOWED_MIMETYPES[img.mimetype]['thumb_extn'] != img.extn):
w_or_h = or_(Thumbnail.width == size_t[0], Thumbnail.height == size_t[1])
scaled = Thumbnail.query.filter(w_or_h, Thumbnail.stored_file == img).first()
if scaled and exists_in_s3(scaled):
img_name = scaled.name
else:
size = get_fitting_size((img.width, img.height), size_t)
original_size = (img.width, img.height)
size = get_fitting_size(original_size, size_t)
resized_filename = get_resized_filename(img, size)
registry = imgee.registry

try:
if resized_filename in registry:
# file is still being processed
Expand All @@ -48,12 +54,10 @@ def get_resized_image(img, size, is_thumbnail=False):
else:
registry.add(resized_filename)
img_name = resize_and_save(img, size, is_thumbnail=is_thumbnail)
except Exception as e:
# something broke while processing the file
raise e
finally:
# file has been processed, remove from registry
registry.remove(resized_filename)
if resized_filename in registry:
registry.remove(resized_filename)
return img_name


Expand All @@ -74,7 +78,7 @@ def save_file(fp, profile, title=None):

stored_file = save_img_in_db(name=id_, title=title, local_path=local_path,
profile=profile, mimetype=content_type, orig_extn=extn)
s3resp = save_on_s3(img_name, content_type=content_type)
save_on_s3(img_name, content_type=content_type)
return title, stored_file


Expand Down Expand Up @@ -121,7 +125,7 @@ def save_on_s3(filename, remotename='', content_type='', bucket='', folder=''):
headers = {
'Cache-Control': 'max-age=31536000', # 60*60*24*365
'Content-Type': get_file_type(fp, filename),
'Expires': datetime.now() + timedelta(days=365)
'Expires': str(datetime.utcnow() + timedelta(days=365))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this output in the correct format for an Expires header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does. Passing datetime object like it was before also worked. The boto library converted it to string itself.

Copy link
Member

@jace jace Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, these are S3 headers, not HTTP headers. Makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If Boto works with datetimes, I'll say just pass that in and don't bother about casting to string.

}
k.set_contents_from_file(fp, policy='public-read', headers=headers)
return filename
Expand All @@ -146,7 +150,7 @@ def parse_size(size):
return tuple(map(int, size))


def get_fitting_size((orig_w, orig_h), size):
def get_fitting_size(original_size, size):
"""
Return the size to fit the image to the box
along the smaller side and preserve aspect ratio.
Expand All @@ -171,6 +175,8 @@ def get_fitting_size((orig_w, orig_h), size):
>>> get_fitting_size((200, 500), (400, 600))
[240, 600]
"""
orig_w, orig_h = original_size

if orig_w == 0 or orig_h == 0:
# this is either a cdr file or a zero width file
# just go with target size
Expand Down Expand Up @@ -216,7 +222,7 @@ def resize_and_save(img, size, is_thumbnail=False):
Get the original image from local disk cache, download it from S3 if it misses.
Resize the image and save resized image on S3 and size details in db.
"""
src_path = download_frm_s3(img.name + img.extn)
src_path = download_from_s3(img.name + img.extn)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the recurring use of img.name + img.extn, this can be a filename property on the model that returns this.


if 'thumb_extn' in ALLOWED_MIMETYPES[img.mimetype]:
format = ALLOWED_MIMETYPES[img.mimetype]['thumb_extn']
Expand Down Expand Up @@ -258,7 +264,8 @@ def resize_img(src, dest, size, mimetype, format, is_thumbnail):

def clean_local_cache(expiry=24):
"""
Remove files from local cache which are NOT accessed in the last `expiry` hours.
Remove files from local cache
which are NOT accessed in the last `expiry` hours.
"""
cache_path = app.config.get('UPLOADED_FILES_DEST')
cache_path = os.path.join(cache_path, '*')
Expand All @@ -275,8 +282,12 @@ def clean_local_cache(expiry=24):
def delete(stored_file, commit=True):
"""
Delete all the thumbnails and images associated with a file, from local cache and S3.
Wait for the upload/resize to complete if queued for the same image.
"""
registry = imgee.registry

# remove from registry
registry.remove_keys_starting_with(stored_file.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is name a UUID here? Meaning we're removing everything that has that UUID as prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, UUID. It removes all the keys matching imgee:registry:<name>*. So if there are thumbnails of the image being deleted in the registry, they are removed too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be …:<name>:*? The lack of a separator is asking for future accidents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I'm sorry. it removes all keys matching imgee:registry:default:<name>*.

File names are <name>.<extn> for original file and <name>_wNN_hNN.<extn> for thumbnails. the redis keys dont have extensions. just the <name> part. so they are like imgee:registry:default:<name> or imgee:registry:default:<name>_wNN_hNN, We're deleting keys starting with <name> here.

in __init__() function,

def init_app(self, name='default', connection=None):
        self.connection = connection or self.set_connection_from_url()
        self.name = name
        self.key_prefix = 'imgee:registry:%s' % name
        ...

The <name> in that key_prefix is the name of the registry, which is set to default. i'm not sure why that name is there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of deleting <name>*, we could do two separate deletes for <name> and <name>_* as a safety measure. Same with filesystem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default name makes sense because it's always possible to have two registries. We just don't have a need for that right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'll do that.


# remove locally
cache_path = app.config.get('UPLOADED_FILES_DEST')
cached_img_path = os.path.join(cache_path, '%s*' % stored_file.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise here. Shouldn't there be an underscore separator?

Expand Down
64 changes: 45 additions & 19 deletions imgee/tasks.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,32 +4,58 @@

class TaskRegistry(object):
def __init__(self, name='default', connection=None):
self.connection = redis.from_url(connection) if connection else None
self.connection = connection or self.set_connection_from_url()
self.name = name
self.key = 'imgee:registry:%s' % name
self.key_prefix = 'imgee:registry:%s' % name

def set_connection(self, connection=None):
connection = connection or app.config.get('REDIS_URL')
self.connection = redis.from_url(connection)
def set_connection_from_url(self, url=None):
url = url or app.config.get('REDIS_URL')
self.connection = redis.from_url(url)
return self.connection
Copy link
Member

@jace jace Jun 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't return it if you're setting it under self. This entire method's code could be inside the init_app method.


def add(self, taskid):
self.connection.sadd(self.key, taskid)

def remove(self, taskid):
self.connection.srem(self.key, taskid)

def remove_all(self):
for k in self.get_all_keys():
self.remove(k)
def _make_key(self, taskid):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be named key_for. Use verbs like make to indicate state changing methods, and get for non-state changing but heavy processes. Since this is neither, it doesn't need a verb.

return "{key_prefix}:{taskid}".format(key_prefix=self.key_prefix, taskid=taskid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes and unicode string, as .format always enforces the type that it's being used with.


def __contains__(self, taskid):
return self.connection.sismember(self.key, taskid)
return len(self.connection.keys(self._make_key(taskid))) > 0

def keys_starting_with(self, query):
return filter(lambda k: k.startswith(query), self.connection.smembers(self.key))
def add(self, taskid, expire=60):
self.connection.set(self._make_key(taskid), taskid)
# setting TTL of 60 seconds for the key
# if the file doesn't get processed within 60 seconds,
# it'll be removed from the eregistry
self.connection.expire(self._make_key(taskid), expire)

def search(self, query):
return filter(lambda k: str(query) in k, self.connection.smembers(self.key))
# >> KEYS imgee:registry:default:*query*
return self.connection.keys(self._make_key("*{}*".format(query)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes, and please ensure there's validation for whatever this query is.


def get_all_keys(self):
return list(self.connection.smembers(self.key))
# >> KEYS imgee:registry:default:*
return self.connection.keys(self._make_key("*"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes


def keys_starting_with(self, query):
# >> KEYS imgee:registry:default:query*
return self.connection.keys(self._make_key("{}*".format(query)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single quotes and query validation.


def remove(self, taskid):
# remove a key with the taskid
self.remove_by_key(self._make_key(taskid))

def remove_by_key(self, key):
# remove a redis key directly
self.connection.delete(key)

def remove_keys_matching(self, query):
keys = self.search(query)
for k in keys:
self.remove_by_key(k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a Redis pipe and push all commands at once, unless there's a way to do a wildcard remove directly in Redis.


def remove_keys_starting_with(self, query):
keys = self.keys_starting_with(query)
for k in keys:
self.remove_by_key(k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a pipe.


def remove_all(self):
for k in self.get_all_keys():
self.remove_by_key(k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a pipe

17 changes: 14 additions & 3 deletions imgee/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,15 +4,16 @@
import os.path
from subprocess import check_output, CalledProcessError
from urlparse import urljoin
from uuid import uuid4

from boto import connect_s3
from boto.s3.bucket import Bucket
from boto.s3.key import Key
import defusedxml.cElementTree as elementtree
from flask import request
import magic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If magic is a Python standard lib, it should be imported above, before third party libs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's from a 3rd party lib python-magic. https://github.com/ahupp/python-magic

from PIL import Image

from coaster.utils import uuid1mc
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've switched back from uuid1mc to uuid4 as of hasgeek/coaster#120.

import imgee
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this import for?

Copy link
Contributor Author

@iambibhas iambibhas Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imgee.storage.get_resized_image() was being used somewhere below. Fixing by just importing that function instead of the whole module.

from imgee import app

Expand Down Expand Up @@ -84,7 +85,7 @@


def newid():
return unicode(uuid4().hex)
return unicode(uuid1mc().hex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to uuid4



def get_media_domain(scheme=None):
Expand Down Expand Up @@ -131,6 +132,16 @@ def is_svg(fp):
return tag == '{http://www.w3.org/2000/svg}svg'


def is_animated_gif(local_path):
is_animated = True
gif = Image.open(local_path)
try:
gif.seek(1)
except EOFError:
is_animated = False
return is_animated


def get_file_type(fp, filename=None):
fp.seek(0)
data = fp.read()
Expand Down Expand Up @@ -186,7 +197,7 @@ def exists_in_s3(thumb):
return True


def download_frm_s3(img_name):
def download_from_s3(img_name):
local_path = path_for(img_name)
if not os.path.exists(local_path):
bucket = get_s3_bucket()
Expand Down
1 change: 1 addition & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -13,3 +13,4 @@ xml4h
python-magic
defusedxml
Flask-Migrate
pillow