Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Function name fix, using uuid1mc, time format fix, TaskRegistry fix #60

Merged
merged 42 commits into from
Jun 23, 2017
Merged
Show file tree
Hide file tree
Changes from 41 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
152c153
function name fix, using uuid1mc, time format fix
May 2, 2017
039cdbd
deleting keys from registry when deleting a file
May 2, 2017
2f5df3b
Merge branch 'imgee-v2' of github.com:hasgeek/imgee into code-review
May 9, 2017
6b8e07f
restructured TaskRegistry to use Redis Keys with TTL set to 60 seconds
May 9, 2017
221dcde
re-adding nose and coverage to travis file, removed it by mistake
May 9, 2017
ede59b9
using runtests.py file to run tests in travis
May 9, 2017
dfdf09a
reordered functions and better string formating
May 11, 2017
df77ffc
made the redis key expiry time an argument
May 31, 2017
2d7ff92
assigning error handler the right way
May 31, 2017
95dce28
not resizing if the gif image is animated
May 31, 2017
5cead4e
added pillow as dependency
May 31, 2017
d6766a7
fixed conflict
Jun 1, 2017
b5906f9
Merge branch 'imgee-v2' into code-review
Jun 5, 2017
7a5bcb9
using redis pipeline to delete multiple keys
Jun 6, 2017
28e656f
missed one place to use redis pipeline
Jun 6, 2017
83128dd
deleted runtests.py, not needed anymore
Jun 6, 2017
144d4ac
fixed typo
Jun 6, 2017
dc961d6
fixed some import issues
Jun 8, 2017
907c7fe
moved upload directory creation to manage.py
Jun 8, 2017
a1372fb
added some comments
Jun 8, 2017
127aa4c
moved importing app inside the only function that uses it
Jun 8, 2017
e3228e4
Update runtests.sh
Jun 15, 2017
3ca82e9
Update runtests (testing jenkins)
Jun 15, 2017
b581138
testing jenkins
Jun 15, 2017
395b862
removed pinned versions
Jun 15, 2017
85b505c
testing jenkins
Jun 15, 2017
9d17938
test jenkins
Jun 15, 2017
e42e750
testing jenkins
Jun 15, 2017
fd8737c
testing jenkins
Jun 15, 2017
0cc523e
added manage.py init to runtests
Jun 19, 2017
64da948
removed mkdir from travis config
Jun 19, 2017
eeb5b96
trying to fix travis build
Jun 20, 2017
81ccc8d
trying to fix travis build
Jun 20, 2017
7a71a7a
trying to debug travis
Jun 20, 2017
f2b4b26
trying to fix travis build
Jun 20, 2017
154b23c
fixed upload path issue
Jun 21, 2017
e7f46fa
moar fixes to registry, cleanup, moved back to uuid4
Jun 21, 2017
5ceb68c
fixed typo
Jun 21, 2017
bb6a350
setting test environment variable in travis file
Jun 21, 2017
78bd8d3
setting env in existing block, otherwise it gets overwritten
Jun 21, 2017
993287c
moved the init command out of if block
Jun 23, 2017
481701a
rearranged travis config, cleanup of upload directory path
Jun 23, 2017
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,19 +8,23 @@ env:
SQ9lTuSD5jg3NSM8yMfMU58ppAYBgd+6VQP01wUPFrV/JSsbfgnVjMTm/Nbk
EGeHYvQUiyAg7zM4KdNJr6txj+jBE8MAeh7EwYNHoh9B7Vx//GxmXFnWyjXV
9cJkFroDW1Zfs2SZjLtzMQC8YXE30jmMxg+XHCQewKzRa4u1320=
# this is already being set in runtests.sh,
# but need to set it for `./manage.py init` too.
# could set it before running manage.py, but chose to put it here
- FLASK_ENV=TESTING
language: python
python:
- 2.7
services:
- redis-server
before_script:
- mkdir imgee/static/test_uploads
- ./manage.py init # creates the test upload directory
script:
- nosetests --with-coverage
- ./runtests.sh
install:
- sudo apt-get -qq install zlib1g-dev libfreetype6-dev liblcms1-dev libwebp-dev libjpeg-dev libpng-dev libfreetype6-dev libtiff4-dev librsvg2-dev ghostscript imagemagick pandoc
- sudo apt-get -qq install zlib1g-dev libfreetype6-dev liblcms1-dev libwebp-dev libjpeg-dev libpng-dev libfreetype6-dev libtiff4-dev librsvg2-dev ghostscript imagemagick pandoc inkscape
- pip install -r requirements.txt
- pip install nose coverage BeautifulSoup4 Pillow
- pip install -r test_requirements.txt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment on the order of sections, to make this file more readable. It should be in install, before_script, script order.

notifications:
email: false
slack:
Expand Down
18 changes: 4 additions & 14 deletions imgee/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,30 +22,20 @@
from .models import db
from .tasks import TaskRegistry

registry = TaskRegistry(os.getenv('ENV', 'production'))
registry = TaskRegistry()


def mkdir_p(dirname):
if not os.path.exists(dirname):
os.makedirs(dirname)


# Configure the app
# Configure the application
coaster.app.init_app(app)
migrate = Migrate(app, db)
baseframe.init_app(app, requires=['baseframe', 'picturefill', 'imgee'])
lastuser.init_app(app)
lastuser.init_usermanager(UserManager(db, models.User))
registry.init_app(app)


@app.errorhandler(403)
def error403(error):
return redirect(url_for('login'))

if app.config.get('MEDIA_DOMAIN') and (
app.config['MEDIA_DOMAIN'].startswith('http:') or
app.config['MEDIA_DOMAIN'].startswith('https:')):
if app.config.get('MEDIA_DOMAIN', '').lower().startswith(('http://', 'https://')):
app.config['MEDIA_DOMAIN'] = app.config['MEDIA_DOMAIN'].split(':', 1)[1]
mkdir_p(app.config['UPLOADED_FILES_DEST'])

registry.set_connection()
8 changes: 6 additions & 2 deletions imgee/models/stored_file.py
Original file line number Diff line number Diff line change
Expand Up @@ -58,12 +58,16 @@ def __repr__(self):
def extn(self):
return guess_extension(self.mimetype, self.orig_extn) or ''

@property
def filename(self):
return self.name + self.extn

def dict_data(self):
return dict(
title=self.title,
uploaded=self.created_at.strftime('%B %d, %Y'),
uploaded=self.created_at.isoformat() + 'Z',
filesize=app.jinja_env.filters['filesizeformat'](self.size),
imgsize='%s x %s' % (self.width, self.height),
imgsize=u'%s×%s px' % (self.width, self.height),
url=url_for('view_image', profile=self.profile.name, image=self.name),
thumb_url=url_for('get_image', image=self.name, size=app.config.get('THUMBNAIL_SIZE'))
)
Expand Down
52 changes: 34 additions & 18 deletions imgee/storage.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
from datetime import datetime, timedelta
from glob import glob
import os.path
import os
import re
from subprocess import check_call, CalledProcessError
import time
Expand All @@ -14,9 +15,9 @@
from imgee import app
from imgee.models import db, Thumbnail, StoredFile
from imgee.utils import (
newid, guess_extension, get_file_type,
newid, guess_extension, get_file_type, is_animated_gif,
path_for, get_s3_folder, get_s3_bucket,
download_frm_s3, get_width_height, ALLOWED_MIMETYPES,
download_from_s3, get_width_height, ALLOWED_MIMETYPES,
exists_in_s3, THUMBNAIL_COMMANDS
)

Expand All @@ -29,17 +30,25 @@ def get_resized_image(img, size, is_thumbnail=False):
"""
registry = imgee.registry
Copy link
Member

@jace jace Jun 6, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a circular import. Can you possibly define the registry before other modules are imported, to avoid the circular import?

One way is to move the registry's __init__ code into a distinct init_app method and call that separately after the app is configured (__init__ should also take an optional app parameter and if present call init_app).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, an init_app() function in the tasks.py (where TaskRegistry exists), and initialize the registry there and assign it to let's say app.registry, and call it from imgee/__init__.py so that it can be used from anywhere?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, an init_app method on the class. Not in the module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the convention in Flask is that if you want to add something to the app, add it under app.extensions (a dictionary). We already have app.extensions['lastuser']. You could do app.extensions['taskregistry']. I will not recommend this however as the task registry is specific to Imgee and is not a shared extension. It's better being referenced as a direct Python import.

img_name = img.name

if img.mimetype == 'image/gif':
# if the gif file is animated, not resizing it for now
# but we will need to resize the gif, keeping animation intact
# https://github.com/hasgeek/imgee/issues/55
src_path = download_from_s3(img.filename)
if is_animated_gif(src_path):
return img.name

size_t = parse_size(size)
if (size_t and size_t[0] != img.width and size_t[1] != img.height) or ('thumb_extn' in ALLOWED_MIMETYPES[img.mimetype] and ALLOWED_MIMETYPES[img.mimetype]['thumb_extn'] != img.extn):
w_or_h = or_(Thumbnail.width == size_t[0], Thumbnail.height == size_t[1])
scaled = Thumbnail.query.filter(w_or_h, Thumbnail.stored_file == img).first()
if scaled and exists_in_s3(scaled):
img_name = scaled.name
else:
size = get_fitting_size((img.width, img.height), size_t)
original_size = (img.width, img.height)
size = get_fitting_size(original_size, size_t)
resized_filename = get_resized_filename(img, size)
registry = imgee.registry

try:
if resized_filename in registry:
# file is still being processed
Expand All @@ -48,12 +57,10 @@ def get_resized_image(img, size, is_thumbnail=False):
else:
registry.add(resized_filename)
img_name = resize_and_save(img, size, is_thumbnail=is_thumbnail)
except Exception as e:
# something broke while processing the file
raise e
finally:
# file has been processed, remove from registry
registry.remove(resized_filename)
if resized_filename in registry:
registry.remove(resized_filename)
return img_name


Expand All @@ -74,7 +81,7 @@ def save_file(fp, profile, title=None):

stored_file = save_img_in_db(name=id_, title=title, local_path=local_path,
profile=profile, mimetype=content_type, orig_extn=extn)
s3resp = save_on_s3(img_name, content_type=content_type)
save_on_s3(img_name, content_type=content_type)
return title, stored_file


Expand Down Expand Up @@ -121,7 +128,8 @@ def save_on_s3(filename, remotename='', content_type='', bucket='', folder=''):
headers = {
'Cache-Control': 'max-age=31536000', # 60*60*24*365
'Content-Type': get_file_type(fp, filename),
'Expires': datetime.now() + timedelta(days=365)
# once cached, it is set to expire after a year
'Expires': datetime.utcnow() + timedelta(days=365)
}
k.set_contents_from_file(fp, policy='public-read', headers=headers)
return filename
Expand All @@ -146,7 +154,7 @@ def parse_size(size):
return tuple(map(int, size))


def get_fitting_size((orig_w, orig_h), size):
def get_fitting_size(original_size, size):
"""
Return the size to fit the image to the box
along the smaller side and preserve aspect ratio.
Expand All @@ -171,6 +179,8 @@ def get_fitting_size((orig_w, orig_h), size):
>>> get_fitting_size((200, 500), (400, 600))
[240, 600]
"""
orig_w, orig_h = original_size

if orig_w == 0 or orig_h == 0:
# this is either a cdr file or a zero width file
# just go with target size
Expand Down Expand Up @@ -216,7 +226,7 @@ def resize_and_save(img, size, is_thumbnail=False):
Get the original image from local disk cache, download it from S3 if it misses.
Resize the image and save resized image on S3 and size details in db.
"""
src_path = download_frm_s3(img.name + img.extn)
src_path = download_from_s3(img.filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noticed this is for download. Since img.extn is calculated, does this mean we're not actually saving the exact filename as stored in S3? That is a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are storing the the filename as stored in S3. I'm not sure why this function exists https://github.com/hasgeek/imgee/blob/master/imgee/utils.py#L88

if the file name is abcd.png, in db, we're storing name=abcd, orig_extn=.png

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I get it. We're not storing the exact name in s3 in db. orig_extn is the original extension of the file. but as the thumbnail extension can be different from original extension, that target extension is stored in the ALLOWED_MIMETYPES dict, and taken from there every time we request for the extension.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as #69.


if 'thumb_extn' in ALLOWED_MIMETYPES[img.mimetype]:
format = ALLOWED_MIMETYPES[img.mimetype]['thumb_extn']
Expand Down Expand Up @@ -258,9 +268,10 @@ def resize_img(src, dest, size, mimetype, format, is_thumbnail):

def clean_local_cache(expiry=24):
"""
Remove files from local cache which are NOT accessed in the last `expiry` hours.
Remove files from local cache
which are NOT accessed in the last `expiry` hours.
"""
cache_path = app.config.get('UPLOADED_FILES_DEST')
cache_path = os.path.join(app.static_folder, app.config.get('UPLOADED_FILES_DIR'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.path.join will break out of app.static_folder if the config setting has .. or / in it. I recall there's a secure version of path.join that checks for this. Investigate it? (Not particularly serious though since it's only in server-side config.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this out and make changes when I find it.

cache_path = os.path.join(cache_path, '*')
min_atime = time.time() - expiry*60*60

Expand All @@ -275,11 +286,16 @@ def clean_local_cache(expiry=24):
def delete(stored_file, commit=True):
"""
Delete all the thumbnails and images associated with a file, from local cache and S3.
Wait for the upload/resize to complete if queued for the same image.
"""
registry = imgee.registry
# remove all the keys related to the given file name
# this is delete all keys matching `imgee:registry:<name>*`
registry.remove_keys_starting_with(stored_file.name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is name a UUID here? Meaning we're removing everything that has that UUID as prefix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, UUID. It removes all the keys matching imgee:registry:<name>*. So if there are thumbnails of the image being deleted in the registry, they are removed too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't it be …:<name>:*? The lack of a separator is asking for future accidents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I'm sorry. it removes all keys matching imgee:registry:default:<name>*.

File names are <name>.<extn> for original file and <name>_wNN_hNN.<extn> for thumbnails. the redis keys dont have extensions. just the <name> part. so they are like imgee:registry:default:<name> or imgee:registry:default:<name>_wNN_hNN, We're deleting keys starting with <name> here.

in __init__() function,

def init_app(self, name='default', connection=None):
        self.connection = connection or self.set_connection_from_url()
        self.name = name
        self.key_prefix = 'imgee:registry:%s' % name
        ...

The <name> in that key_prefix is the name of the registry, which is set to default. i'm not sure why that name is there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of deleting <name>*, we could do two separate deletes for <name> and <name>_* as a safety measure. Same with filesystem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default name makes sense because it's always possible to have two registries. We just don't have a need for that right now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. I'll do that.


# remove locally
cache_path = app.config.get('UPLOADED_FILES_DEST')
cached_img_path = os.path.join(cache_path, '%s*' % stored_file.name)
cache_path = os.path.join(app.static_folder, app.config.get('UPLOADED_FILES_DIR'))
os.remove(os.path.join(cache_path, '%s' % stored_file.filename))
cached_img_path = os.path.join(cache_path, '%s_*' % stored_file.name)
for f in glob(cached_img_path):
os.remove(f)

Expand Down
88 changes: 69 additions & 19 deletions imgee/tasks.py
Original file line number Diff line number Diff line change
@@ -1,35 +1,85 @@
import re
import redis
from imgee import app


class InvalidRedisQueryException(Exception):
pass


class TaskRegistry(object):
def __init__(self, name='default', connection=None):
self.connection = redis.from_url(connection) if connection else None
def __init__(self, name='default', connection=None, app=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually app is the first parameter after self.

self.name = name
self.key = 'imgee:registry:%s' % name
self.connection = connection
self.key_prefix = 'imgee:registry:%s' % name
self.filename_pattern = '^[a-z0-9\_\.]+$'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to use re.compile(r'^[a-z0-9_\.]+$')? It needs to be a raw r string to remove ambiguity around the \, and the . needs escaping in regex, but not _.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making it a raw string and not escaping _. And we dont need to compile because as you mentioned before, no point compiling during runtime. passing the pattern directly to re.match().


def set_connection(self, connection=None):
connection = connection or app.config.get('REDIS_URL')
self.connection = redis.from_url(connection)
if app:
self.init_app(app)
else:
self.app = None

def add(self, taskid):
self.connection.sadd(self.key, taskid)
def init_app(self, app):
self.app = app

def remove(self, taskid):
self.connection.srem(self.key, taskid)
if not self.connection:
url = self.app.config.get('REDIS_URL')
self.connection = redis.from_url(url)
self.pipe = self.connection.pipeline()

def remove_all(self):
for k in self.get_all_keys():
self.remove(k)
def is_valid_query(self, query):
return bool(re.match(self.filename_pattern, query))

def key_for(self, taskid):
return u'{key_prefix}:{taskid}'.format(key_prefix=self.key_prefix, taskid=taskid)

def __contains__(self, taskid):
return self.connection.sismember(self.key, taskid)
return len(self.connection.keys(self.key_for(taskid))) > 0

def keys_starting_with(self, query):
return filter(lambda k: k.startswith(query), self.connection.smembers(self.key))
def add(self, taskid, expire=60):
self.connection.set(self.key_for(taskid), taskid)
# setting TTL of 60 seconds for the key
# if the file doesn't get processed within 60 seconds,
# it'll be removed from the registry
self.connection.expire(self.key_for(taskid), expire)

def search(self, query):
return filter(lambda k: str(query) in k, self.connection.smembers(self.key))
# >> KEYS imgee:registry:default:*query*
if not self.is_valid_query(query):
raise InvalidRedisQueryException(u'Invalid query for searching redis keys: {}'.format(query))
return self.connection.keys(self.key_for('*{}*'.format(query)))

def get_all_keys(self):
return list(self.connection.smembers(self.key))
# >> KEYS imgee:registry:default:*
return self.connection.keys(self.key_for('*'))

def keys_starting_with(self, query):
# >> KEYS imgee:registry:default:query_*
# chances are that we'll use this function to find the
# thumbnail keys, which look like "name_wNN_hNN", hence the _
if not self.is_valid_query(query):
raise InvalidRedisQueryException(u'Invalid query for searching redis keys, starting with: {}'.format(query))
return self.connection.keys(self.key_for('{}_*'.format(query)))

def remove(self, taskid):
# remove a key with the taskid
self.remove_by_key(self.key_for(taskid))

def remove_by_key(self, key):
# remove a redis key directly
self.connection.delete(key)

def remove_keys(self, keys):
for k in keys:
self.pipe.delete(k)
self.pipe.execute()

def remove_keys_matching(self, query):
self.remove_keys(self.search(query))

def remove_keys_starting_with(self, query):
# query will most likely be stored_file.name, So
# Delete keys for only `<query>` and `<query>_*`
self.remove_keys([self.key_for(query)] + self.keys_starting_with(query))

def remove_all(self):
self.remove_keys(self.get_all_keys())
Loading