Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize queries on JSONField in database #20

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

megies
Copy link
Collaborator

@megies megies commented Sep 28, 2017

As I understand, getting filtered sets of data (e.g. events in quakeml) uses django's QuerySet.extra(where=[...]) with manually built up query strings. This is pretty hard to extend and there's quite a lot of code involved to build those query strings.
Django docs mention that..

Use this method as a last resort

This is an old API that we aim to deprecate at some point in the future. [...]

and..

Warning

You should be very careful whenever you use extra(). Every time you use it, you should escape any parameters that the user can control by using params in order to protect against SQL injection attacks.

Maybe I'm missing something but couldn't this part be simplified by using the API on querying JSONFields?

Also.. django 1.9 now ships JSONField, maybe this should be used instead of the additional jsonfield dependency?

@megies megies changed the title Sanititze queries on JSONField in database Sanitize queries on JSONField in database Apr 30, 2016
@barsch
Copy link
Collaborator

barsch commented May 3, 2016

keep in mind that the project started while django 1.6 was the most recent version - at this time the json datatype within postgresql was bleeding edge technology - therefore the usage of extra with costum SQL - so no reason to quote the Django documentation as we were aware of its implications ;)

anyway, I agree that we have to switch to the internal JSONField of Django, but this requires to make Jane compatible with Django 1.9 - currently its based on 1.8

@megies
Copy link
Collaborator Author

megies commented May 3, 2016

keep in mind that the project started while django 1.6

can't keep in mind what I don't know, all i see is that it's based on 1.8 as it stands now ;-)

so no reason to quote the Django documentation as we were aware of its implications

didn't intend to be offensive, just summarizing the current state

but this requires to make Jane compatible with Django 1.9

Totally clear, I tried to give it a shot but had to give up after working around a couple of issues. Just way too little knowledge about django and the specific setup..

For the time being, I could work around current limitations in the filtering introducing a very ugly negation of the whole where SQL part (see #21).

@krischer
Copy link
Owner

krischer commented May 3, 2016

I can take care of this if we want to do this - it will greatly simplify the whole thing.

The only thing to keep in mind that it currently works with django 1.8 which is LTS. The json field was only introduced in django 1.9 which has a fairly short life cycle so somebody will need to take care to update it to newer django versions.

@barsch
Copy link
Collaborator

barsch commented May 3, 2016

the JSONField alone is a reason to go for Django 1.9 - also I'm not a big fan of Django LTS versions as upgrading them later is a huge pain

@megies
Copy link
Collaborator Author

megies commented Sep 28, 2017

I've started work on this.. currently this is partially hampered by django/django#6929, the regex filtering doesn't work due to that bug. The patch is super simple, maybe we can work around it somehow..

@krischer
Copy link
Owner

or someone figures out to either update django-plugins or get rid of it and use some other plugin system and we can just use the latest django.

@krischer
Copy link
Owner

or you could just monkey patch the one function I guess

@megies
Copy link
Collaborator Author

megies commented Sep 29, 2017

This should be pretty close to being ready now.. but I'm stuck with some problem in the station index querying.. no clue why this is not working as expected:

(Pdb) for i in query: print(i.json['station'])
ALTM
ALTM
ALTM
(Pdb) query.filter(json__station__exact='ALTM')
[]
(Pdb) print(type(query))
<class 'django.contrib.gis.db.models.query.GeoQuerySet'>

fdsn stationxml seems to be pretty much untested though, so passing
tests don't say much..
somehow still doesn't work, not the slightest idea why, in debugger the
simplest queries seem to fail somehow:

(Pdb) for i in query: print(i.json['station'])
ALTM
ALTM
ALTM
(Pdb) query.filter(json__station__exact='ALTM')
[]
(Pdb) print(type(query))
<class 'django.contrib.gis.db.models.query.GeoQuerySet'>
@megies megies force-pushed the sanitize_json_queries branch from a5f9a37 to 5e76d35 Compare October 27, 2017 05:33
@megies
Copy link
Collaborator Author

megies commented Oct 27, 2017

Tests pass, only some flake8 fails that I don't get locally. I think this is ready for merging after a review by @krischer or @barsch.


This removes internal usage of where low-level queries (that are deprecated in newer Django) with builtin queries on the new Django builtin jsonfield.

All existing tests pass. Only one test was changed, but I think that it should be changed (also returning null elements when a specific string is excluded, see https://github.com/krischer/jane/pull/20/files#diff-90539ca4a8392804c680442de74b8ee3R396). So there's a slight change in results for exclusion type queries but I think this is closer to the expected result.

The only ugly thing about this PR is that we need to patch Django because of a bug in django <1.11, see 894824c. But the patching seems to work alright, so I think it might be the lesser evil.

Also, this PR should enable having nested information in the json field (i.e. dictionaries), which is currently not supported. (CC @bch0w)

@megies megies requested review from krischer and barsch October 27, 2017 11:26
barsch
barsch previously approved these changes Oct 27, 2017
Copy link
Collaborator

@barsch barsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't run the tests - but looks much cleaner now

@megies
Copy link
Collaborator Author

megies commented Oct 27, 2017

Ok, so I finally found out what's going on with the comma-separated-lists station query.. I thought it never was implemented, but turns out it was, but it never was tested, so I accidentally removed it in this PR. So don't merge yet..

Also I think event queries are completely untested??

1 similar comment
@megies
Copy link
Collaborator Author

megies commented Oct 27, 2017

Ok, so I finally found out what's going on with the comma-separated-lists station query.. I thought it never was implemented, but turns out it was, but it never was tested, so I accidentally removed it in this PR. So don't merge yet..

Also I think event queries are completely untested??

@megies megies dismissed barsch’s stale review October 27, 2017 20:25

found out there was untested code that was broken by this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants