Contributor

@adzshaf adzshaf commented Nov 3, 2024

Trac ticket number

ticket-32519, supersedes #18489

Branch description

This PR introduces the JSONSet and JSONRemove functions to make partial updates to JSONFields directly on the database. Consider the following example, which updates a JSONField using JSONSet.

>>> from django.db.models.functions import JSONSet
>>> user_preferences = UserPreferences.objects.create(
...     settings={"font": {"name": "Arial", "size": 10}, "notifications": True}
... )
>>> UserPreferences.objects.update(settings=JSONSet("settings", font__name="Comic Sans", font__size=20))
1
>>> user_preferences = UserPreferences.objects.get(pk=user_preferences.pk)
>>> print(user_preferences.settings)
{'font': {'name': 'Comic Sans', 'size': 20}, 'notifications': True}

You can also remove a key by using JSONRemove.

>>> from django.db.models.functions import JSONRemove
>>> user_preferences = UserPreferences.objects.create(
...     settings={"font": {"name": "Arial", "size": 10}, "notifications": True}
... )
>>> UserPreferences.objects.update(settings=JSONRemove("settings", "font__name", "font__size"))
1
>>> user_preferences = UserPreferences.objects.get(pk=user_preferences.pk)
>>> print(user_preferences.settings)
{'font': {}, 'notifications': True}
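
For readers without a database at hand, the path semantics of the two examples above can be mimicked in plain Python. This is only a sketch: json_set and json_remove are hypothetical helper names, not part of Django, and the real functions run in the database rather than on a Python dict. Creating missing intermediate objects is a choice made by this sketch; database backends may behave differently.

```python
import copy

LOOKUP_SEP = "__"  # the separator used in the examples above


def json_set(data, **fields):
    """Return a copy of data with each a__b path set to its value."""
    result = copy.deepcopy(data)
    for key, value in fields.items():
        *parents, last = key.split(LOOKUP_SEP)
        target = result
        for part in parents:
            # Sketch-only assumption: create missing intermediate objects.
            target = target.setdefault(part, {})
        target[last] = value
    return result


def json_remove(data, *paths):
    """Return a copy of data with each a__b path removed, if present."""
    result = copy.deepcopy(data)
    for path in paths:
        *parents, last = path.split(LOOKUP_SEP)
        target = result
        for part in parents:
            target = target.get(part)
            if not isinstance(target, dict):
                break  # the path does not exist; nothing to remove
        else:
            target.pop(last, None)
    return result


settings = {"font": {"name": "Arial", "size": 10}, "notifications": True}
updated = json_set(settings, font__name="Comic Sans", font__size=20)
trimmed = json_remove(settings, "font__name", "font__size")
```

Both helpers leave the input untouched and only change the named keys, which is the property the database functions are meant to provide for a stored column.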

Checklist

  • This PR targets the main branch.
  • The commit message is written in past tense, mentions the ticket number, and ends with a period.
  • I have checked the "Has patch" ticket flag in the Trac system.
  • I have added or updated relevant tests.
  • I have added or updated relevant docs, including release notes if applicable.
  • I have attached screenshots in both light and dark modes for any UI changes.

Member

@charettes charettes left a comment

Thanks for the distinct MR @adzshaf.

Member

We should pass this as a parameterized text[]; otherwise, this is a SQL injection vector.

Try

SELECT '{foo",bar}'::text[];
Suggested change
- key_paths = key.split(LOOKUP_SEP)
- key_paths_join = ",".join(key_paths)
- new_source_expressions.append(Value(f"{{{key_paths_join}}}"))
+ key_paths = key.split(LOOKUP_SEP)
+ new_source_expressions.append(Value(key_paths))

If it doesn't work by itself you might need to pass an explicit output_field=ArrayField(TextField()).

We also likely want to add tests for using JSONSet with keys containing commas and quotes

JSONSet("data", **{"key',\"}": "foo"})
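
The fragility of building the array literal by hand can be seen without a database. In this sketch, naive_array_literal is a hypothetical stand-in for the comma-join approach being replaced; it shows that a single key containing a comma produces exactly the same literal text as two separate keys, so the database has no way to recover the original path.

```python
def naive_array_literal(key_paths):
    # Mirrors the comma-join approach: no quoting or escaping of elements.
    return "{" + ",".join(key_paths) + "}"


one_key = ["foo,bar"]       # a single key that happens to contain a comma
two_keys = ["foo", "bar"]   # a genuine two-level path

literal_one = naive_array_literal(one_key)
literal_two = naive_array_literal(two_keys)
ambiguous = literal_one == literal_two
```

Passing the list itself as a query parameter, as suggested above, leaves element quoting to the database driver instead of to string formatting.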

Contributor Author

Thank you. Done in 85d3a44.

Member

Any reason not to make it a Value here instead of in each implementation?

Suggested change
- else value
+ else Value(value, output_field=self.output_field)

Contributor Author

I tried to do this in 75dd7b5.

However, for some reason, Value(None) is interpreted as SQL NULL on PostgreSQL, so I had to wrap it in Value again. Before this change, I am pretty sure it was only Value(None) and not Value(Value(None)). Do you have any idea why? @charettes

Member

Strange, Value(Value(...)) should not even work, as only non-expressions are allowed to be passed to Value.

I think you'll have to special case value is None until there's a way to explicitly differentiate between both.

I guess you could give a shot at defining

class JSONNull(BaseExpression):
    output_field = JSONField()

    def as_sql(self, compiler, connection):
        # Emit a JSON null parameter (not SQL NULL); no custom encoder is used.
        return "%s", [connection.ops.adapt_json_value(None, None)]

And then using it like

JSONNull() if value is None else Value(value, output_field=self.output_field)
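
The distinction being discussed can be illustrated in plain Python. This is a minimal sketch, assuming a driver that sends a bare None parameter as SQL NULL; to_json_param is a hypothetical helper, and how each backend actually adapts JSON values is backend-specific.

```python
import json


def to_json_param(value):
    # Always serialize, so Python None becomes the JSON text "null"
    # instead of being passed through as a bare None parameter.
    return json.dumps(value)


sql_null_param = None                  # typically sent as SQL NULL
json_null_param = to_json_param(None)  # a JSON document containing null
```

The two parameters mean different things to the database: the first is the absence of a value, the second is a valid JSON value that can be stored under a key.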

Member

Ditto this is a SQL injection vector as keys are not sanitized.

Contributor Author

I am not sure how this can be a SQL injection vector. The compile_json_path() function uses json.dumps() for each key in the path, so it will escape quotes and backslashes properly. Then, we use Value which will separate the expression into the SQL template and parameters in its as_sql().

I added some tests in 08f07dc. Although, I just noticed it seems SQLite cannot handle double quotes in the path. See also https://stackoverflow.com/questions/67993982. I think we can address this with a feature flag.

However, if there is a better way to do it, please let me know. Thanks!
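
The quoting argument made in this comment can be sketched as follows. The function below is a simplified illustration of a json.dumps-based path builder; the name compile_json_path_sketch is hypothetical, and it is not claimed to match Django's compile_json_path() line for line.

```python
import json


def compile_json_path_sketch(keys, include_root=True):
    """Build a JSON path string, quoting each non-integer key with json.dumps."""
    path = ["$"] if include_root else []
    for key in keys:
        try:
            num = int(key)
        except ValueError:
            # Object key: json.dumps escapes quotes and backslashes.
            path.append(".")
            path.append(json.dumps(key))
        else:
            # Integer-like key: treat it as an array index.
            path.append("[%s]" % num)
    return "".join(path)


plain = compile_json_path_sketch(["font", "name"])
quoted = compile_json_path_sketch(['a"b'])
indexed = compile_json_path_sketch(["items", "0"])
```

Because the resulting path is then passed as a query parameter rather than interpolated into the SQL text, the key never becomes part of the statement itself.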

Member

You're right about the key JSON quoting in this case; I thought it was the same thing as in the as_postgres method.

The issue with double quotes on SQLite, and likely with single quotes on Oracle, is generalized in ticket-32213 and ticket-35842.

Member

Should we point out why JSONSet is preferable here to avoid similar problems as the one described in why F objects should be used?

The current docs point out that it can be used as an alternative but not why; mainly to avoid race conditions that would overwrite the full settings instead of specific keys.
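
The race condition mentioned here can be sketched with an in-memory stand-in for the database. Everything in this block (the db dicts and the full_overwrite and partial_update helpers) is hypothetical and only models the read-modify-write pattern; it does not use Django.

```python
import copy


def full_overwrite(store, new_settings):
    # Models saving the whole JSON value back, as a plain save() would.
    store["settings"] = new_settings


def partial_update(store, key, value):
    # Models a key-level update applied by the database itself.
    store["settings"][key] = value


# Scenario 1: two clients read the same snapshot, then each writes it back.
db = {"settings": {"font": "Arial", "notifications": True}}
client_a = copy.deepcopy(db["settings"])
client_b = copy.deepcopy(db["settings"])
client_a["font"] = "Comic Sans"
client_b["notifications"] = False
full_overwrite(db, client_a)
full_overwrite(db, client_b)  # silently discards client A's font change
after_overwrites = db["settings"]

# Scenario 2: the same two changes expressed as key-level updates.
db2 = {"settings": {"font": "Arial", "notifications": True}}
partial_update(db2, "font", "Comic Sans")
partial_update(db2, "notifications", False)
after_partials = db2["settings"]
```

In the first scenario the last writer wins and the font change is lost; in the second, both changes survive because neither write touches keys it did not intend to change.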

Contributor Author

I think that is a good idea! I'll get to it later.

Contributor Author

Done in 6bbc16f3c1668fd3d571816b7ee7774960d50be8.

@adzshaf adzshaf force-pushed the ticket_32519_jsonset_jsonremove_funcs branch from 08f07dc to fe9c715 Compare December 7, 2024 15:48
@adzshaf adzshaf force-pushed the ticket_32519_jsonset_jsonremove_funcs branch 2 times, most recently from 36f5018 to 6be9101 Compare December 15, 2024 14:02
@adzshaf adzshaf force-pushed the ticket_32519_jsonset_jsonremove_funcs branch 3 times, most recently from 34f8dba to f6ae03d Compare December 20, 2024 03:35
Contributor Author

adzshaf commented Dec 20, 2024

The PR is ready for another review. Please feel free to share your comments or suggestions 😊 @laymonage @charettes @sarahboyce

Tagging @felixxm in case you are interested and have some time to take a look.

Contributor

@sarahboyce sarahboyce left a comment

Thank you @adzshaf ⭐ added a couple of small docs comments before I go on vacation
Hope to review this more thoroughly in the new year 🎄

@adzshaf adzshaf force-pushed the ticket_32519_jsonset_jsonremove_funcs branch 2 times, most recently from 0854e45 to c6b60da Compare January 4, 2025 02:57
Contributor Author

adzshaf commented Jan 4, 2025

Thank you for the suggestions! I already fixed the comments on docs. Feel free to share more comments! @sarahboyce

Member

@pauloxnet pauloxnet left a comment

Added some code suggestions

Member

Suggested change
- elif output_type == "JSONField":
+ if output_type == "JSONField":

Superfluous elif

The elif statement is not needed, as the return statement will always break out of the enclosing function. Removing the elif will reduce nesting and make the code more readable.

Member

Suggested change
- key, value = list(self.fields.items())[0]
+ key, value = next(iter(self.fields.items()))

Calling list(...) will create a new list of the entire collection ... If you only need the first element of the collection, you can use next(iter(...)) to lazily fetch the first element.

Member

Suggested change
- elif connection.vendor in ["postgresql", "mysql"]:
+ elif connection.vendor in {"postgresql", "mysql"}:

Member

Suggested change
- "theme": {"type": "dark", "opacity": decimal.Decimal(100.0)},
+ "theme": {"type": "dark", "opacity": decimal.Decimal("100.0")},

Member

Suggested change
- theme__opacity=decimal.Decimal(50.0),
+ theme__opacity=decimal.Decimal("50.0"),
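
The reviewer does not state a reason for these Decimal suggestions; one plausible motivation, shown here as a small standard-library sketch, is that constructing a Decimal from a float can carry over binary floating-point error and drops the written precision, while constructing from a string keeps the value exactly as typed.

```python
import decimal

# A float that is not exactly representable in binary.
from_float = decimal.Decimal(0.1)
from_string = decimal.Decimal("0.1")
differs = from_float != from_string

# 100.0 is exactly representable, so the values compare equal,
# but the string form keeps the trailing ".0" while the float form does not.
whole_from_float = decimal.Decimal(100.0)
whole_from_string = decimal.Decimal("100.0")
```

For the values used in these tests (100.0 and 50.0) the numeric value is the same either way, so the change is mainly about consistency and the serialized representation.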

Member

Suggested change
- "theme": {"type": "dark", "opacity": decimal.Decimal(100.0)},
+ "theme": {"type": "dark", "opacity": decimal.Decimal("100.0")},

Member

Suggested change
- theme__opacity=decimal.Decimal(50.0),
+ theme__opacity=decimal.Decimal("50.0"),

Contributor

@laymonage laymonage left a comment

I have yet to review the docs and tests, but the implementation seems close! Not sure about some of the feature flags, though. I think having supports_partial_json_update is fair (given that we need it for Oracle). The others may be a bit too specific, but they also help us in testing.

Contributor

I know it's probably done for consistency with the other DB vendors, but I think in this case we can make a small optimization by skipping copy() for the base case:

Suggested change
- copy = self.copy()
- all_items = self.paths
- path, *rest = all_items
- if rest:
-     copy.paths = (path,)
-     return JSONRemove(copy, *rest).as_oracle(
-         compiler, connection, **extra_context
-     )
- return super(JSONRemove, copy).as_sql(
+ path, *rest = self.paths
+ if rest:
+     copy = self.copy()
+     copy.paths = (path,)
+     return JSONRemove(copy, *rest).as_oracle(
+         compiler, connection, **extra_context
+     )
+ return super().as_sql(
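
The recursion in this suggestion peels off the first path, wraps the expression so far, and recurses on the remaining paths. The sketch below shows only the shape of the resulting nesting; nested_remove_sql and the JSON_REMOVE_ONE function name are hypothetical, and the string interpolation is for illustration only and must not be used to build real SQL.

```python
def nested_remove_sql(column, paths, func="JSON_REMOVE_ONE"):
    """Render one single-path call per path, nested from the inside out."""
    path, *rest = paths
    # Illustration only: real code must pass paths as query parameters.
    inner = f"{func}({column}, '{path}')"
    if rest:
        # The call built so far becomes the input of the next removal.
        return nested_remove_sql(inner, rest, func)
    return inner


single = nested_remove_sql("settings", ["$.font.name"])
nested = nested_remove_sql("settings", ["$.font.name", "$.font.size"])
```

With a single path there is nothing to nest, which corresponds to the base case where the suggestion skips the copy() call.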

Contributor

This doesn't seem necessary

Suggested change
- if isinstance(value, Value):
-     # We do not need Cast() because we use the FORMAT JSON clause instead.
-     value = Value(value, output_field=self.output_field)
- new_source_expressions.append(value)
+ new_source_expressions.append(value)

Contributor

Suggested change
- copy = self.copy()
- all_items = list(self.fields.items())
- key, value = all_items[0]
- rest = all_items[1:]
+ copy = self.copy()
+ (key, value), *rest = self.fields.items()
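
The starred unpacking in this suggestion is equivalent to indexing and slicing a list of the items. A short sketch with a hypothetical fields dict:

```python
fields = {"font__name": "Comic Sans", "font__size": 20, "notifications": False}

# Original approach: materialize a list, then index and slice it.
all_items = list(fields.items())
first_old = all_items[0]
rest_old = all_items[1:]

# Suggested approach: unpack the first pair and collect the rest in one step.
(key, value), *rest = fields.items()
```

Both forms raise an error on an empty dict (IndexError for the first, ValueError for the second), so the unpacking form does not change behavior for the non-empty case handled here.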

Contributor

Suggested change
- copy = self.copy()
- all_items = list(self.fields.items())
- key, value = all_items[0]
- rest = all_items[1:]
+ copy = self.copy()
+ (key, value), *rest = self.fields.items()

Contributor

We should probably be a bit stricter in the checks to ensure we don't make assumptions for Values that don't have a JSONField output_field.

Suggested change
- if isinstance(value, Value):
+ if isinstance(value, Value) and isinstance(value.output_field, JSONField):

Contributor

Same here, so that we don't do Cast() if the user does not want to

Suggested change
- # If it's a Value, assume it to be a JSON-formatted string.
- # Use Cast to ensure the string is treated as JSON on the database.
- if isinstance(value, Value):
+ # Use Cast to ensure the string is treated as JSON on the database.
+ if isinstance(value, Value) and isinstance(value.output_field, JSONField):

Contributor

Suggested change
- return {**super().get_repr_options(), **self.fields}
+ return {**super()._get_repr_options(), **self.fields}

Contributor

This doesn't work, because we use positional arguments for the paths, not kwargs.

We had to resolve the paths separately instead of passing them as *expressions to Func (because some DBs need to use recursive calls instead of an arbitrary number of args). Unfortunately there's no equivalent of _get_repr_options for the positional args, so we have to override __repr__ if we really want the nice representation, e.g.

Suggested change
- def _get_repr_options(self):
-     return {**super().get_repr_options(), **self.fields}
+ def __repr__(self):
+     args = self.arg_joiner.join(str(arg) for arg in self.source_expressions)
+     paths = self.arg_joiner.join(str(path) for path in self.paths)
+     return f"{self.__class__.__name__}({args}, {paths})"
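
To see what the suggested __repr__ would print, here is a standalone sketch. RemoveSketch is a hypothetical stand-in that only mimics the three attributes the method reads (arg_joiner, source_expressions and paths); it is not the PR's JSONRemove class.

```python
class RemoveSketch:
    arg_joiner = ", "  # assumed to match the joiner used by the real class

    def __init__(self, expression, *paths):
        self.source_expressions = [expression]
        self.paths = paths

    def __repr__(self):
        args = self.arg_joiner.join(str(arg) for arg in self.source_expressions)
        paths = self.arg_joiner.join(str(path) for path in self.paths)
        return f"{self.__class__.__name__}({args}, {paths})"


shown = repr(RemoveSketch("settings", "font__name", "font__size"))
```

The positional paths appear after the source expression, which is the representation the comment describes as not reachable through _get_repr_options alone.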

Contributor

I think we also need supports_partial_json_update to depend on supports_json_field on SQLite.

@jacobtylerwalls jacobtylerwalls force-pushed the ticket_32519_jsonset_jsonremove_funcs branch from 179cde5 to 00ea5b9 Compare September 26, 2025 22:26
@jacobtylerwalls
Member

Hi @adzshaf, thanks for being so diligent responding to feedback. I rebased just to see how CI looked. Do you have time to push this forward? This would now target 6.1.
