-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to maintain quotas and file logging #9
base: master
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -22,10 +22,14 @@ | |
import contextlib | ||
import errno | ||
import fnmatch | ||
import json | ||
import hashlib | ||
import hmac | ||
import json | ||
import logging | ||
import os | ||
import pathlib | ||
import shutil | ||
import stat | ||
import typing | ||
|
||
import flask | ||
|
@@ -34,6 +38,14 @@ | |
app.config.from_envvar("XMPP_HTTP_UPLOAD_CONFIG") | ||
application = app | ||
|
||
if app.config['LOGDIR']: | ||
logging.basicConfig( | ||
filename='{}/xhu.log'.format(app.config['LOGDIR']), | ||
filemode='a', | ||
level=logging.INFO | ||
) | ||
logger = logging.getLogger(__name__) | ||
|
||
if app.config['ENABLE_CORS']: | ||
from flask_cors import CORS | ||
CORS(app) | ||
|
@@ -49,7 +61,6 @@ def sanitized_join(path: str, root: pathlib.Path) -> pathlib.Path: | |
def get_paths(base_path: pathlib.Path): | ||
data_file = pathlib.Path(str(base_path) + ".data") | ||
metadata_file = pathlib.Path(str(base_path) + ".meta") | ||
|
||
return data_file, metadata_file | ||
|
||
|
||
|
@@ -65,12 +76,45 @@ def get_info(path: str, root: pathlib.Path) -> typing.Tuple[ | |
path, | ||
pathlib.Path(app.config["DATA_ROOT"]), | ||
) | ||
|
||
data_file, metadata_file = get_paths(dest_path) | ||
|
||
return data_file, load_metadata(metadata_file) | ||
|
||
|
||
def apply_quota(root: pathlib.Path, quota: int): | ||
""" Get the files, sorted by last modification date and the sum of their | ||
sizes. | ||
""" | ||
if not quota: | ||
return | ||
|
||
file_list = [] | ||
total_size = 0 | ||
# We assume a file structure whereby files are are stored inside | ||
# uuid() directories inside the root dir and that there aren't any files in | ||
# the root dir itself. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do not make assumptions about the directory structure. While there is currently only an implementation of this in Prosody, I don’t see why we should rely on only prosody being used on the other side. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm using Prosody and I'm trying to get something working and running sooner rather than later. |
||
for uuid_dir in os.listdir(root): | ||
for path, dirnames, filenames in os.walk(root/uuid_dir): | ||
for name in [n for n in filenames if n.endswith('.data')]: | ||
fp = os.path.join(path, name) | ||
size = os.path.getsize(fp) | ||
total_size += size | ||
modified = os.stat(fp)[stat.ST_MTIME] | ||
file_list.append((modified, path, name, size)) | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From my reading, this sums the files of all users. Also this is O(n) in the number of stored files, which I don’t like at all. This must be condensed into an O(1) operation. See the main review comment for details. |
||
bytes = total_size - quota | ||
if (bytes > 0): | ||
# Remove files (oldest first) until we're under our quota | ||
file_list.sort(key=lambda a: a[0]) | ||
while (bytes >= 0): | ||
modified, path, name, size = file_list.pop() | ||
logger.info( | ||
"Removing file {}/{} to maintain quota of {}" | ||
.format(path, name, quota) | ||
) | ||
shutil.rmtree(path) | ||
bytes -= size | ||
|
||
|
||
@contextlib.contextmanager | ||
def write_file(at: pathlib.Path): | ||
with at.open("xb") as f: | ||
|
@@ -135,6 +179,11 @@ def put_file(path): | |
) | ||
|
||
dest_path.parent.mkdir(parents=True, exist_ok=True, mode=0o770) | ||
|
||
quota = flask.request.args.get("q", "") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This allows a client to provide an arbitrary quota, or leave out the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, very good point.
A quota of 0 means that no files get deleted. But that's also a potential avenue of attack. |
||
if (quota): | ||
apply_quota(dest_path.parent.parent, int(quota)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don’t make assumptions about the directory structure. This should also take into account the size of the uploaded file. The amount of data used must still be below the quota even after the upload. It also needs to handle the case where the file is larger than the quota allows for: in that case, the upload should be refused without any files being deleted. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Depends on whether you're enforcing a hard quota or a soft quota. I'm OK with usage slightly going over the quota. Prosody already enforces file size via There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I don't yet see how I can do it otherwise. Putting storage usage in a file like you suggested provides some decoupling, but you still need to know what files to remove. |
||
|
||
data_file, metadata_file = get_paths(dest_path) | ||
|
||
try: | ||
|
@@ -183,7 +232,8 @@ def generate_headers(response_headers, metadata_headers): | |
|
||
response_headers["X-Content-Type-Options"] = "nosniff" | ||
response_headers["X-Frame-Options"] = "DENY" | ||
response_headers["Content-Security-Policy"] = "default-src 'none'; frame-ancestors 'none'; sandbox" | ||
response_headers["Content-Security-Policy"] = \ | ||
"default-src 'none'; frame-ancestors 'none'; sandbox" | ||
|
||
|
||
@app.route("/<path:path>", methods=["HEAD"]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to log to standard output and have the service manager deal with it. Alternatively, support systemd journal (optionally) directly. Configuring and managing log directories is a PITA, and always opens up the question of logrotate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What service manager are you talking about here? I use Circus to run the processes, but I don't see my logs ending up in its log, so I'm not sure what you mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stuff like systemd. Even sending to syslog is better than having to work with logfiles.