Feature/80 new upload widget #83

Open
wants to merge 45 commits into
base: master
Changes from 44 commits
Commits
1c984dd
Add new action to infer a tabular resource schema
aivuk Dec 6, 2022
154d529
Update ckanext/validation/logic.py
aivuk Dec 6, 2022
3b1773c
Remove coverage report
aivuk Dec 6, 2022
a167acb
Merge branch 'feature/76-add-resource-table-schema-infer' of github.c…
aivuk Dec 6, 2022
670d896
correct toolkit imported name
aivuk Dec 6, 2022
1dd1a8a
use new action endpoint with upload widget to create a resource
aivuk Dec 9, 2022
0f45263
remove default resource upload file field
aivuk Dec 10, 2022
124a0f7
Update widget and logic to replace already existing resource files
aivuk Dec 12, 2022
f2af792
Pass the url_type parameter to the ckan-uploader component
aivuk Dec 13, 2022
52ca754
get variables from resource form from hidden inputs
aivuk Dec 13, 2022
e8b5b42
add ckan_uploader snippet
aivuk Dec 13, 2022
6c167a6
Update logic to add another resource after saving one
aivuk Dec 14, 2022
a3c6e27
Add some comments to ckan-uploader-module
aivuk Dec 14, 2022
a225742
Corrects behaviour for uploaded file is not tabular
aivuk Dec 15, 2022
0644e0a
use custom resource_create and resource_update instead of new actions…
aivuk Dec 15, 2022
4c61a0b
remove check for content_length in uploaded resource schema file
aivuk Dec 16, 2022
6ece563
Remove unused actions
amercader Dec 16, 2022
61ff6d4
Use helper to get package id from url
aivuk Jan 12, 2023
d744d2f
Import ckan.model on helpers
aivuk Jan 12, 2023
9c16f8f
Fix blueprints missing imports
aivuk Jan 12, 2023
fb8f7f7
add resource edit new endpoint
aivuk Jan 30, 2023
28eec48
Create resource update endpoint and update ckan-uploader widget
aivuk Feb 1, 2023
547c5ac
Correct default return values for helpers
aivuk Feb 1, 2023
5fb0ec0
Get initialization variables for ckan-uploade its template.
aivuk Feb 1, 2023
8ac15f1
Remove custom resource_form.html template.
aivuk Feb 1, 2023
bfd400a
Get the schema from the returned schema from the action
aivuk Feb 1, 2023
036262e
Stop infering the schema as default in resource create and update
aivuk Feb 1, 2023
25e39ad
Add helpers that are used by ckan_uploader.html template
aivuk Feb 1, 2023
93e1956
Fix template for ckan_uploader adding correctly the variables intial …
aivuk Feb 1, 2023
431beeb
shorter form to compare schema_url value
aivuk Feb 6, 2023
63d8bcd
add logic to switch between file upload and resource url
aivuk Feb 8, 2023
930bb35
Merge branch 'feature/80-new-upload-widget' of github.com:frictionles…
aivuk Feb 8, 2023
a9e5459
remove erroneous test for schema file uploaded size
aivuk Feb 8, 2023
56ab8bb
Remove scrolling to schema json on resource edit
aivuk Feb 8, 2023
ebcc71a
Add basic create/update tests
amercader Feb 23, 2023
78a3351
Bump frictionless to fix schema infer errors
amercader Feb 23, 2023
25dbed6
Revert accidental changes in ckan-uploader.js
amercader Feb 23, 2023
95f31e4
Revert changes made to resource_create / resource_update actions
amercader Feb 23, 2023
9cbb754
Add turn_off_validation context manager
amercader Feb 23, 2023
d9598c5
Don't run validations when creating the draft resource
amercader Feb 23, 2023
3af20e2
Specify format in tests because of ckan/ckan#7415
amercader Feb 23, 2023
9236615
Remove duplicated test
amercader Feb 28, 2023
0198797
Bump responses
amercader Feb 28, 2023
d0f780d
Revert file size check removal in a9e5459
amercader Mar 29, 2023
f990969
Remove debugger
amercader Mar 29, 2023
11 changes: 6 additions & 5 deletions .github/workflows/test.yml
@@ -58,9 +58,10 @@ jobs:
run: |
ckan -c test.ini db init
- name: Run tests
run: pytest --ckan-ini=test.ini --cov=ckanext.validation --cov-report=xml --cov-append --disable-warnings ckanext/validation/tests -vv
# run: pytest --ckan-ini=test.ini --cov=ckanext.validation --cov-report=xml --cov-append --disable-warnings ckanext/validation/tests -vv
run: pytest --ckan-ini=test.ini --disable-warnings ckanext/validation/tests -vv

- name: Upload coverage report to codecov
uses: codecov/codecov-action@v1
with:
file: ./coverage.xml
#- name: Upload coverage report to codecov
# uses: codecov/codecov-action@v1
# with:
# file: ./coverage.xml
81 changes: 80 additions & 1 deletion ckanext/validation/blueprints.py
@@ -2,7 +2,23 @@

from flask import Blueprint

from ckantoolkit import c, NotAuthorized, ObjectNotFound, abort, _, render, get_action
from ckan.lib.navl.dictization_functions import unflatten
from ckan.logic import tuplize_dict, clean_dict, parse_params
from ckanext.validation.logic import is_tabular
from ckanext.validation.utils import turn_off_validation

from ckantoolkit import (
c, g,
NotAuthorized,
ObjectNotFound,
abort,
_,
render,
get_action,
request,
config,
)


validation = Blueprint("validation", __name__)

@@ -40,6 +56,69 @@ def read(id, resource_id):

abort(404, _(u"No validation report exists for this resource"))

def _get_data():
data = clean_dict(
unflatten(tuplize_dict(parse_params(request.form)))
)
data.update(clean_dict(
unflatten(tuplize_dict(parse_params(request.files)))
))
return data


def resource_file_create(id):

data_dict = _get_data()

context = {
'user': g.user,
}
data_dict["package_id"] = id

with turn_off_validation():
resource = get_action("resource_create")(context, data_dict)

# If it's tabular (local OR remote), infer and store schema
if is_tabular(filename=resource['url']):
update_resource_schema = get_action('resource_table_schema_infer')(
context, {'resource_id': resource['id'], 'store_schema': True}
)
resource['schema'] = update_resource_schema['schema']

return resource


def resource_file_update(id, resource_id):
# Get data from the request
data_dict = _get_data()

# Call resource_update
context = {
'user': g.user,
}
data_dict["id"] = resource_id
data_dict["package_id"] = id

with turn_off_validation():
resource = get_action("resource_update")(context, data_dict)

# If it's tabular (local OR remote), infer and store schema
if is_tabular(resource['url']):
resource_id = resource['id']
update_resource_schema = get_action('resource_table_schema_infer')(
context, {'resource_id': resource_id, 'store_schema': True}
)
resource['schema'] = update_resource_schema['schema']

return resource

validation.add_url_rule(
"/dataset/<id>/resource/<resource_id>/file", view_func=resource_file_update, methods=["POST"]
)

validation.add_url_rule(
"/dataset/<id>/resource/file", view_func=resource_file_create, methods=["POST"]
)

validation.add_url_rule(
"/dataset/<id>/resource/<resource_id>/validation", view_func=read
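
Both new views wrap the core action call in `turn_off_validation()`, which is imported from `ckanext.validation.utils` but not shown in this diff. The following is only a rough sketch of the general pattern such a context manager might follow, not the extension's implementation: the `settings` dict, its keys, and the name `validation_disabled` are all hypothetical stand-ins.

```python
from contextlib import contextmanager

# Hypothetical in-memory settings; the real extension reads its options from CKAN config.
settings = {"run_on_create": True, "run_on_update": True}


@contextmanager
def validation_disabled(flags=settings):
    """Temporarily switch every flag off, restoring the previous values on exit."""
    saved = dict(flags)
    try:
        for key in flags:
            flags[key] = False
        yield flags
    finally:
        # Restore the saved values even if the wrapped block raised.
        flags.update(saved)
```

The `try`/`finally` matters here: if the wrapped action call raises, the flags are still put back, so a failed upload would not leave validation switched off for later requests.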
3 changes: 2 additions & 1 deletion ckanext/validation/examples/ckan_default_schema.json
@@ -86,7 +86,8 @@
{
"field_name": "url",
"label": "URL",
"preset": "resource_url_upload"
"preset": "resource_url_upload",
"form_snippet": "ckan_uploader.html"
},
{
"field_name": "name",
40 changes: 37 additions & 3 deletions ckanext/validation/helpers.py
@@ -1,9 +1,10 @@
# encoding: utf-8
import json

from ckan.lib.helpers import url_for_static
from ckantoolkit import url_for, _, config, asbool, literal, h
from ckan import model
from ckantoolkit import url_for, _, config, asbool, literal, h, request

import json
import re

def get_validation_badge(resource, in_listing=False):

@@ -96,6 +97,39 @@ def bootstrap_version():
else:
return '2'

def get_package_id_from_resource_url():
match = re.match("/dataset/(.*)/resource/", request.path)
if match:
return model.Package.get(match.group(1)).id
else:
return ''

def get_resource_from_resource_url():
match = re.match("/dataset/(.*)/resource/(.*)/edit", request.path)
if match:
return model.Resource.get(match.group(2))
else:
return None

def get_resource_id_from_resource_url():
match = re.match("/dataset/(.*)/resource/(.*)/edit", request.path)
Review comment (Contributor):
This is okay, but since it's a constant pattern, perhaps it should be precompiled at the module level with re.compile?

if match:
return model.Resource.get(match.group(2)).id
else:
return ''

def get_url_type():
match = re.match("/dataset/(.*)/resource/(.*)/edit", request.path)
if match:
return model.Resource.get(match.group(2)).url_type

def get_current_url():
match = re.match("/dataset/(.*)/resource/(.*)/edit", request.path)
if match:
return model.Resource.get(match.group(2)).url
else:
return ''


def use_webassets():
return int(h.ckan_version().split('.')[1]) >= 9
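
The review comment on `get_resource_id_from_resource_url` suggests compiling the pattern once at module level. A minimal sketch of that idea follows. The names `RESOURCE_EDIT_PATH` and `resource_id_from_path` are hypothetical, and unlike the helpers above this version takes the path as an argument and returns the raw URL segment without looking the resource up in the database.

```python
import re

# Compiled once at import time; the pattern text mirrors the one used in the helpers above.
RESOURCE_EDIT_PATH = re.compile(r"/dataset/(.*)/resource/(.*)/edit")


def resource_id_from_path(path):
    """Return the resource id segment of an edit URL path, or '' if it does not match."""
    match = RESOURCE_EDIT_PATH.match(path)
    return match.group(2) if match else ''
```

Sharing one compiled pattern would also let the four helpers that currently repeat the same `re.match` call agree on a single definition of an edit URL.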
62 changes: 62 additions & 0 deletions ckanext/validation/logic.py
@@ -5,6 +5,7 @@
import json

from sqlalchemy.orm.exc import NoResultFound
from frictionless import system, Resource, FrictionlessException

import ckan.plugins as plugins
import ckan.lib.uploader as uploader
@@ -24,6 +25,23 @@

log = logging.getLogger(__name__)

ACCEPTED_TABULAR_FORMATS = set([
'text/csv',
'application/vnd.ms-excel',
'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
])

ACCEPTED_TABULAR_EXTENSIONS = set([
'csv',
'tsv',
'xls',
'xlsx'
])

def is_tabular(filename = '', mimetype = ''):
uploaded_file_extension = filename.split('.')[-1].lower()
return mimetype in ACCEPTED_TABULAR_FORMATS or \
uploaded_file_extension in ACCEPTED_TABULAR_EXTENSIONS

def enqueue_job(*args, **kwargs):
try:
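
`is_tabular` above takes everything after the last dot as the extension. That treats a bare name such as `csv` as tabular and misses URLs with a query string such as `data.csv?download=1`. One possible alternative, offered only as a sketch, parses the path component first; `has_tabular_extension` and `TABULAR_EXTENSIONS` are hypothetical names, and the mimetype branch is left out.

```python
import posixpath
from urllib.parse import urlparse

TABULAR_EXTENSIONS = {"csv", "tsv", "xls", "xlsx"}


def has_tabular_extension(url_or_filename):
    """True when the path component ends in one of the tabular extensions."""
    # urlparse separates the query string and fragment from the path.
    path = urlparse(url_or_filename).path
    # splitext returns '' as the extension when the name contains no dot.
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    return extension in TABULAR_EXTENSIONS
```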
@@ -174,6 +192,50 @@ def resource_validation_show(context, data_dict):

return _validation_dictize(validation)

def resource_table_schema_infer(context, data_dict):
'''
Use frictionless framework to infer a resource schema
'''

t.check_access('resource_create', context, data_dict)

t.get_or_bust(data_dict, 'resource_id')

store_schema = data_dict.get('store_schema', True)

resource = t.get_action('resource_show')(
{}, {u'id': data_dict['resource_id']})

source = None
if resource.get('url_type') == 'upload':
upload = uploader.get_resource_uploader(resource)
if isinstance(upload, uploader.ResourceUpload):
source = upload.get_path(resource['id'])

if not source:
source = resource['url']

with system.use_context(trusted=True):
if is_tabular(filename=resource['url']):
try:
fric_resource = Resource({'path': source, 'format': resource['format'].lower()})
fric_resource.infer()
resource['schema'] = fric_resource.schema.to_json()

if store_schema:
t.get_action('resource_update')(
context, resource)

return {u'schema': fric_resource.schema.to_dict()}
except FrictionlessException as e:
log.warning(
u'Error trying to infer schema for resource %s: %s',
resource['id'], e)

return {u'schema': ''}
else:
return {u'schema': ''}


def resource_validation_delete(context, data_dict):
u'''
55 changes: 38 additions & 17 deletions ckanext/validation/plugin/__init__.py
@@ -6,6 +6,7 @@

from werkzeug.datastructures import FileStorage as FlaskFileStorage
import ckan.plugins as p
import ckan.lib.uploader as uploader
import ckantoolkit as t

from ckanext.validation import settings
@@ -17,6 +18,7 @@
auth_resource_validation_delete, auth_resource_validation_run_batch,
resource_create as custom_resource_create,
resource_update as custom_resource_update,
resource_table_schema_infer,
)
from ckanext.validation.helpers import (
get_validation_badge,
@@ -25,6 +27,11 @@
bootstrap_version,
validation_dict,
use_webassets,
get_package_id_from_resource_url,
get_resource_id_from_resource_url,
get_resource_from_resource_url,
get_url_type,
get_current_url
)
from ckanext.validation.validators import (
resource_schema_validator,
@@ -34,6 +41,7 @@
get_create_mode_from_config,
get_update_mode_from_config,
)

from ckanext.validation.interfaces import IDataValidation
from ckanext.validation import blueprints, cli

@@ -89,6 +97,7 @@ def get_actions(self):
u'resource_validation_run_batch': resource_validation_run_batch,
u'resource_create': custom_resource_create,
u'resource_update': custom_resource_update,
u'resource_table_schema_infer': resource_table_schema_infer,
}

return new_actions
@@ -107,12 +116,17 @@ def get_auth_functions(self):

def get_helpers(self):
return {
u'get_validation_badge': get_validation_badge,
u'validation_extract_report_from_errors': validation_extract_report_from_errors,
u'dump_json_value': dump_json_value,
u'bootstrap_version': bootstrap_version,
u'validation_dict': validation_dict,
u'use_webassets': use_webassets,
'get_validation_badge': get_validation_badge,
'validation_extract_report_from_errors': validation_extract_report_from_errors,
'dump_json_value': dump_json_value,
'bootstrap_version': bootstrap_version,
'validation_dict': validation_dict,
'use_webassets': use_webassets,
'get_package_id_from_resource_url': get_package_id_from_resource_url,
'get_resource_id_from_resource_url': get_resource_id_from_resource_url,
'get_resource_from_resource_url': get_resource_from_resource_url,
'get_url_type': get_url_type,
'get_current_url': get_current_url,
}

# IResourceController
@@ -133,23 +147,30 @@ def _process_schema_fields(self, data_dict):
All the 3 `schema_*` fields are removed from the data_dict.
Note that the data_dict still needs to pass validation
'''
schema = None

schema_upload = data_dict.pop(u'schema_upload', None)
schema_url = data_dict.pop(u'schema_url', None)
schema_json = data_dict.pop(u'schema_json', None)

if isinstance(schema_upload, ALLOWED_UPLOAD_TYPES):
uploaded_file = _get_underlying_file(schema_upload)
data_dict[u'schema'] = uploaded_file.read()
if isinstance(data_dict["schema"], (bytes, bytearray)):
data_dict["schema"] = data_dict["schema"].decode()
elif schema_url:

if (not isinstance(schema_url, str) or
not schema_url.lower()[:4] == u'http'):
raise t.ValidationError({u'schema_url': 'Must be a valid URL'})
data_dict[u'schema'] = schema_url
elif schema_json:
data_dict[u'schema'] = schema_json
file_contents = uploaded_file.read()
if len(file_contents):
schema = file_contents
if isinstance(schema, (bytes, bytearray)):
schema = schema.decode()
if not schema:
if schema_url not in ('', None):
if (not isinstance(schema_url, str) or
not schema_url.lower()[:4] == u'http'):
raise t.ValidationError({u'schema_url': 'Must be a valid URL'})
schema = schema_url
if schema_json:
schema = schema_json
import ipdb; ipdb.set_trace()
Review comment (Contributor):
Should this line be here?

if schema:
data_dict["schema"] = schema

return data_dict
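
Because the page has lost the hunk's indentation, the exact nesting of the new `schema_url` and `schema_json` branches cannot be recovered here. As a rough illustration of the selection logic only, the sketch below assumes the pre-change precedence (non-empty upload, then URL, then JSON text). `pick_schema` is a hypothetical name, and `ValueError` stands in for `t.ValidationError`.

```python
def pick_schema(upload_bytes=None, schema_url=None, schema_json=None):
    """Return the schema value chosen from the three form fields, or None.

    Precedence assumed here: non-empty upload, then URL, then JSON text.
    """
    if upload_bytes:
        # Uploaded content arrives as bytes; decode to text before storing.
        if isinstance(upload_bytes, (bytes, bytearray)):
            return upload_bytes.decode()
        return upload_bytes
    if schema_url not in ('', None):
        if not isinstance(schema_url, str) or not schema_url.lower().startswith('http'):
            raise ValueError('schema_url: Must be a valid URL')
        return schema_url
    return schema_json or None
```

An empty upload falls through to the other fields, which matches the `len(file_contents)` check in the new hunk.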

@@ -0,0 +1,15 @@
{% set package_id = data.package_id or h.get_package_id_from_resource_url() %}
{% set resource_id = data.resource_id or h.get_resource_id_from_resource_url() %}
{% set resource = data.resource or h.get_resource_from_resource_url() %}
{% set url_type = h.get_url_type() %}
{% set current_url = h.get_current_url() %}

{% asset 'ckanext-validation/ckan-uploader-js' %}
{% asset 'ckanext-validation/ckan-uploader-css' %}
<input id="resource_id" type="hidden" value="{{ resource_id }}">
<input id="url_type" type="hidden" name="current_url_type" value="{{ h.get_url_type() }}">
<div class="form-group control-medium">
<label class="control-label" for="ckan_uploader">{{ _('Data') }}</label>
<div id="ckan_uploader" data-module="ckan-uploader" data-module-url_type="{{ url_type }}" data-module-current_url="{{ current_url }}" data-module-resource_id="{{ resource_id }}" data-module-dataset_id="{{ package_id }}" data-module-upload_url="{{ config.get('ckan.site_url', '') }}"></div>
</div>

@@ -1,7 +1,7 @@
{% import 'macros/form.html' as form %}

{% set value = data[field.field_name] %}
{% set is_url = value and value[4:]|lower == 'http' %}
{% set is_url = value.__class__ == "<class 'str'>" and value[4:]|lower == 'http' %}
{% set is_json = not is_url and value %}

<div class="image-upload"
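
The `is_url` expressions above can be read in plain Python terms, assuming Jinja2 evaluates slicing and `==` with Python semantics. Under that assumption, `value[4:]` drops the first four characters rather than keeping them, and comparing a class object with the text `"<class 'str'>"` is never true. The sketch below shows both points; `looks_like_url` is a hypothetical helper, not part of the PR.

```python
def looks_like_url(value):
    """True for strings whose first four characters are 'http' (case-insensitive)."""
    return isinstance(value, str) and value[:4].lower() == 'http'


value = 'https://example.com/schema.json'
tail = value[4:]   # everything after the first four characters
head = value[:4]   # only the first four characters
# A class object is compared with a str here, so the result is False.
class_matches_text = (value.__class__ == "<class 'str'>")
```

In a template, Jinja2's built-in `string` test (`value is string`) is one way to express the type check without touching `__class__`.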