Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_EncodingMixin.encode types defined too narrow #3552

Closed
dangotbanned opened this issue Aug 21, 2024 · 10 comments · Fixed by #3567
Closed

_EncodingMixin.encode types defined too narrow #3552

dangotbanned opened this issue Aug 21, 2024 · 10 comments · Fixed by #3567
Labels

Comments

@dangotbanned
Copy link
Member

What happened?

Noticed this regression since #3515 while writing the test for #3394 (comment)

image

Minimal repro

import altair as alt

alt.Chart().encode(color=alt.when(x=1).then(alt.value("grey")))

What would you like to happen instead?

The types @binste and I defined for EncodeKwds were too strict and didn't account for alt.(condition|when).

The simplest fix would be to add SchemaBase to each annotation - as that is what Then falls into

Which version of Altair are you using?

c984002

@binste
Copy link
Contributor

binste commented Aug 22, 2024

Nice catch!

Looking a bit into it, I think it's not a regression as color (and other channels) were not typed with SchemaBase before as well. It worked/still works for alt.condition if it returns a dict, which it does in some cases, and not SchemaBase.

Adding SchemaBase as a hint sounds pragmatic to me. Feel free to go for it! I'm mostly without reception the next 2 days but can check in again on Sunday. If you want to merge this and release before that, that's ok with me. We can narrow the type hint again in the future if we have a better idea.

@dangotbanned
Copy link
Member Author

dangotbanned commented Aug 23, 2024

Nice catch!

Looking a bit into it, I think it's not a regression as color (and other channels) were not typed with SchemaBase before as well. It worked/still works for alt.condition if it returns a dict, which it does in some cases, and not SchemaBase.

Huh you're right @binste.
I spent so much time with the @overload(s) on alt.(condition|when) that I was sure SchemaBase was here before.

Adding SchemaBase as a hint sounds pragmatic to me. Feel free to go for it! I'm mostly without reception the next 2 days but can check in again on Sunday. If you want to merge this and release before that, that's ok with me. We can narrow the type hint again in the future if we have a better idea.

There are a few things we can try (will update tomorrow with some alternatives)
but I think we risk making .encode() too noisy in terms of annotations again

Update

Below represent color, renamed just to keep the same width for name here

  • col_0 the current annotation for color.
  • col_1 my original suggestion
  • col_2 narrower, but I think this would promote using alt.Then(...) directly
    • I'd want to avoid that ideally
  • col_3 Schema (Schema_3)
  • col_4 Schema (Schema_4)
  • col_5 Schema (Schema_5)
# New alias def
Schema_3: TypeAlias = Map | SchemaBase

# Alternative idea I had for `Then`
# Instead of inheriting from `SchemaBase`
@runtime_checkable
class SchemaLike(Protocol):
    _schema: ClassVar[dict[Literal["type"], Literal["object"]]] = {"type": "object"}

    def to_dict(self, *args, **kwds) -> Any: ...

# How we'd use `SchemaLike` in annotations
Schema_4: TypeAlias = SchemaBase | SchemaLike

# Combining 4 & 5 essentially
Schema_5: TypeAlias = Map | SchemaBase | SchemaLike

class _EncodingMixin
	def encode(
		self, 
		col_0: Optional[str | Color | Map | ColorDatum | ColorValue] = Undefined,
		col_1: Optional[str | Color | Map | SchemaBase | ColorDatum | ColorValue] = Undefined,
		col_2: Optional[str | Color | Map | Then[Any] | ColorDatum | ColorValue] = Undefined,
		col_3: Optional[str | Color | Schema_3 | ColorDatum | ColorValue] = Undefined,
		col_4: Optional[str | Color | Map | Schema_4 | ColorDatum | ColorValue] = Undefined,
		col_5: Optional[str | Color | Schema_5 | ColorDatum | ColorValue] = Undefined,
	): ...

Specifically on the SchemaLike idea, I'm referring to a stash I have left over.
Would be quite easy to integrate, and would also be useful in simplifying this check in _todict() 3275c5e (#3501):

Screenshot

image

@binste
Copy link
Contributor

binste commented Sep 4, 2024

  • col_2: Agree with you that we don't want to promote alt.Then directly
  • col_3: No preference if we have Map | SchemaBase directly in the type hints or use an alias.
  • col_4 and col_5: If I understand it correctly, all SchemaBase would pass as SchemaLike. Hence, if we use SchemaLike, we are as permissive as if we would just use SchemaBase directly, even a bit more. As SchemaBase is already too broad of a type hint, I'm not sure if we get any mileage out of the protocol.

For the above reasons, I currently prefer col_1. It solves the type errors, although the hints will be too broad. But I don't see a simple way of fixing that without changing the return types of alt.condition which sounds like a larger issue. Not sure if that's worth tackling right now. What do you think?

dangotbanned added a commit to dangotbanned/altair that referenced this issue Sep 4, 2024
Purely demonstrating type checking behaviour for vega#3552 (comment)
@dangotbanned
Copy link
Member Author

dangotbanned commented Sep 4, 2024

Really appreciate you taking the time to think this over @binste

Static/type checking

col_4 and col_5: If I understand it correctly, all SchemaBase would pass as SchemaLike.
Hence, if we use SchemaLike, we are as permissive as if we would just use SchemaBase directly, even a bit more.
As SchemaBase is already too broad of a type hint, I'm not sure if we get any mileage out of the protocol.

I've revised the original idea a little and added some examples over on #3567 to demo usage.

So far, that covers what mypy|pyright think - but the other component is the runtime checking.


Runtime

With @runtime_checkable, you are right that any SchemaBase will still pass.
However, we do already work around this for other protocols e.g. for SupportsGeoInterface

alt.utils.data.to_values

altair/altair/utils/data.py

Lines 311 to 332 in 5207768

def to_values(data: DataType) -> ToValuesReturnType:
"""Replace a DataFrame by a data model with values."""
check_data_type(data)
# `strict=False` passes `data` through as-is if it is not a Narwhals object.
data_native = nw.to_native(data, strict=False)
if isinstance(data_native, SupportsGeoInterface):
return {"values": _from_geo_interface(data_native)}
elif _is_pandas_dataframe(data_native):
data_native = sanitize_pandas_dataframe(data_native)
return {"values": data_native.to_dict(orient="records")}
elif isinstance(data_native, dict):
if "values" not in data_native:
msg = "values expected in data dict, but not present."
raise KeyError(msg)
return data_native
elif isinstance(data, nw.DataFrame):
data = sanitize_narwhals_dataframe(data)
return {"values": data.rows(named=True)}
else:
# Should never reach this state as tested by check_data_type
msg = f"Unrecognized data type: {type(data)}"
raise ValueError(msg)

It is mostly just thinking about which might need to take priority for a particular use.

Reusing the _to_dict example from before, SchemaBase already takes priority.
SchemaLike would be replacing the hasattr(obj, "to_dict") branch - or both could be merged - since all we need is the method found in the protocol.

tools.schemapi.schemapi._todict

def _todict(obj: Any, context: dict[str, Any] | None, np_opt: Any, pd_opt: Any) -> Any: # noqa: C901
"""Convert an object to a dict representation."""
if np_opt is not None:
np = np_opt
if isinstance(obj, np.ndarray):
return [_todict(v, context, np_opt, pd_opt) for v in obj]
elif isinstance(obj, np.number):
return float(obj)
elif isinstance(obj, np.datetime64):
result = str(obj)
if "T" not in result:
# See https://github.com/vega/altair/issues/1027 for why this is necessary.
result += "T00:00:00"
return result
if isinstance(obj, SchemaBase):
return obj.to_dict(validate=False, context=context)
elif isinstance(obj, (list, tuple)):
return [_todict(v, context, np_opt, pd_opt) for v in obj]
elif isinstance(obj, dict):
return {
k: _todict(v, context, np_opt, pd_opt)
for k, v in obj.items()
if v is not Undefined
}
elif (
hasattr(obj, "to_dict")
and (module_name := obj.__module__)
and module_name.startswith("altair")
):
return obj.to_dict()
elif pd_opt is not None and isinstance(obj, pd_opt.Timestamp):
return pd_opt.Timestamp(obj).isoformat()
elif _is_iterable(obj, exclude=(str, bytes)):
return _todict(_from_array_like(obj), context, np_opt, pd_opt)
else:
return obj

Additional POC

An earlier version of Then worked like this using a _ThenLike protocol

Edit

I'm realising now that I forgot to include another motivating factor for this change.

In this comment I gave some examples of how inheriting from SchemaBase made the autocompletion for methods less helpful.

If we instead inherited from SchemaLike, most of the additional methods would not appear.
I think they aren't particularly helpful to have - when we only need to_dict().

As a user, the only ones that are important are Then.when and Then.otherwise.

@binste
Copy link
Contributor

binste commented Sep 6, 2024

The draft PR is very helpful! Trying to summarise/reword it for myself:

  • Then should not inherit from SchemaBase as it muddies autocompletion
  • We need a type hint which lets a SchemaBase pass due to alt.condition and alt.when().then().otherwise(). Can be SchemaBase` or a protocol
  • It's unclear to me when, as a developer, I'd use SchemaLike. If I understand it correctly, we mainly introduce it now to make these types work.
  • Purpose:
    • Fix type errors users might get right now, keeping types as narrow as possible so they are more useful
    • Improving UX: keep/improve autocompletion + provide a good visual indication that we mean conditions can be passed

How about

@runtime_checkable
class Condition(Protocol):
   """We use this protocol to signal to a user that they can pass any condition created by either `alt.condition` or `alt.when`.

   Other SchemaBase instances will pass as `Condition` as well and so for type checkers it won't matter but using this protocol when only conditions are expected at least gives a better visual indication to users.
   """
    _schema: ClassVar[_SchemaLikeDict] = {"type": "object"}

    def to_dict(self, *args, **kwds) -> Any: ...


def chart_encode(
   col_7: Optional[str | Color | Condition | Map | ColorDatum | ColorValue] = Undefined,
)

Condition is not yet used anywhere in the codebase and it makes it clearer what we intend to do with the protocol. What do you think?

@dangotbanned
Copy link
Member Author

dangotbanned commented Sep 6, 2024

Feedback

The draft PR is very helpful! Trying to summarise/reword it for myself:
...

No problem @binste, yeah you've got the assignment summarized well 😉

How about

Code block
@runtime_checkable
class Condition(Protocol):
   """We use this protocol to signal to a user that they can pass any condition created by either `alt.condition` or `alt.when`.

   Other SchemaBase instances will pass as `Condition` as well and so for type checkers it won't matter but using this protocol when only conditions are expected at least gives a better visual indication to users.
   """
    _schema: ClassVar[_SchemaLikeDict] = {"type": "object"}

    def to_dict(self, *args, **kwds) -> Any: ...


def chart_encode(
   col_7: Optional[str | Color | Condition | Map | ColorDatum | ColorValue] = Undefined,
)

Now this is interesting ...
Big +1 for the visual clarity - I think that should be a high priority.

Condition is not yet used anywhere in the codebase and it makes it clearer what we intend to do with the protocol. What do you think?

So it looks like we approached this from different angles.
Where I went for SchemaLike to represent SchemaBase, your proposal is more narrowly focused on conditions themselves.

My immediate thought was combining the two, where internally all we care about is .to_dict() but externally we are more specific:

Condition(SchemaLike)
from typing import Any, ClassVar, Dict, Literal
from typing_extensions import Protocol, TypeAlias, runtime_checkable, LiteralString

_SchemaLikeDict: TypeAlias = Dict[Literal["type"], LiteralString]

@runtime_checkable
class SchemaLike(Protocol):
    _schema: ClassVar[_SchemaLikeDict]

    def to_dict(self, *args, **kwds) -> Any: ...

@runtime_checkable
class Condition(SchemaLike):
    _schema: ClassVar[dict[Literal["type"], Literal["object"]]] = {"type": "object"}

We could also reuse SchemaLike elsewhere in OperatorMixin|Expression|DatumType|Parameter

Technical

Have you had a chance to look at the implementation of when-then-otherwise yet?

api.py

class _ConditionClosed(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
# Parameter {"param", "value", "empty"}
# Predicate {"test", "value"}
empty: Optional[bool]
param: Parameter | str
test: _TestPredicateType
value: Any
class _ConditionExtra(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
# Likely a Field predicate
empty: Optional[bool]
param: Parameter | str
test: _TestPredicateType
value: Any
__extra_items__: _StatementType | OneOrSeq[_LiteralValue]
_Condition: TypeAlias = _ConditionExtra
"""A singular, non-chainable condition produced by ``.when()``."""
_Conditions: TypeAlias = t.List[_ConditionClosed]
"""Chainable conditions produced by ``.when()`` and ``Then.when()``."""
_C = TypeVar("_C", _Conditions, _Condition)
class _Conditional(TypedDict, t.Generic[_C], total=False):
condition: Required[_C]
value: Any
class _Value(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
value: Required[Any]
__extra_items__: Any
def _reveal_parsed_shorthand(obj: Map, /) -> dict[str, Any]:
# Helper for producing error message on multiple field collision.
return {k: v for k, v in obj.items() if k in utils.SHORTHAND_KEYS}
def _is_extra(*objs: Any, kwds: Map) -> Iterator[bool]:
for el in objs:
if isinstance(el, (SchemaBase, t.Mapping)):
item = el.to_dict(validate=False) if isinstance(el, SchemaBase) else el
yield not (item.keys() - kwds.keys()).isdisjoint(utils.SHORTHAND_KEYS)
else:
continue
def _is_condition_extra(obj: Any, *objs: Any, kwds: Map) -> TypeIs[_Condition]:
# NOTE: Short circuits on the first conflict.
# 1 - Originated from parse_shorthand
# 2 - Used a wrapper or `dict` directly, including `extra_keys`
return isinstance(obj, str) or any(_is_extra(obj, *objs, kwds=kwds))
def _parse_when_constraints(
constraints: dict[str, _FieldEqualType], /
) -> Iterator[BinaryExpression]:
"""
Wrap kwargs with `alt.datum`.
```py
# before
alt.when(alt.datum.Origin == "Europe")
# after
alt.when(Origin="Europe")
```
"""
for name, value in constraints.items():
yield _expr_core.GetAttrExpression("datum", name) == value
def _validate_composables(
predicates: Iterable[Any], /
) -> Iterator[_ComposablePredicateType]:
for p in predicates:
if isinstance(p, (_expr_core.OperatorMixin, SelectionPredicateComposition)):
yield p
else:
msg = (
f"Predicate composition is not permitted for "
f"{type(p).__name__!r}.\n"
f"Try wrapping {p!r} in a `Parameter` first."
)
raise TypeError(msg)
def _parse_when_compose(
predicates: tuple[Any, ...],
constraints: dict[str, _FieldEqualType],
/,
) -> BinaryExpression:
"""
Compose an `&` reduction predicate.
Parameters
----------
predicates
Collected positional arguments.
constraints
Collected keyword arguments.
Raises
------
TypeError
On the first non ``_ComposablePredicateType`` of `predicates`
"""
iters = []
if predicates:
iters.append(_validate_composables(predicates))
if constraints:
iters.append(_parse_when_constraints(constraints))
r = functools.reduce(operator.and_, itertools.chain.from_iterable(iters))
return t.cast(_expr_core.BinaryExpression, r)
def _parse_when(
predicate: Optional[_PredicateType],
*more_predicates: _ComposablePredicateType,
empty: Optional[bool],
**constraints: _FieldEqualType,
) -> _ConditionType:
composed: _PredicateType
if utils.is_undefined(predicate):
if more_predicates or constraints:
composed = _parse_when_compose(more_predicates, constraints)
else:
msg = (
f"At least one predicate or constraint must be provided, "
f"but got: {predicate=}"
)
raise TypeError(msg)
elif more_predicates or constraints:
predicates = predicate, *more_predicates
composed = _parse_when_compose(predicates, constraints)
else:
composed = predicate
return _predicate_to_condition(composed, empty=empty)
def _parse_literal(val: Any, /) -> dict[str, Any]:
if isinstance(val, str):
return utils.parse_shorthand(val)
else:
msg = (
f"Expected a shorthand `str`, but got: {type(val).__name__!r}\n\n"
f"From `statement={val!r}`."
)
raise TypeError(msg)
def _parse_then(statement: _StatementType, kwds: dict[str, Any], /) -> dict[str, Any]:
if isinstance(statement, SchemaBase):
statement = statement.to_dict()
elif not isinstance(statement, dict):
statement = _parse_literal(statement)
statement.update(kwds)
return statement
def _parse_otherwise(
statement: _StatementType, conditions: _Conditional[Any], kwds: dict[str, Any], /
) -> SchemaBase | _Conditional[Any]:
selection: SchemaBase | _Conditional[Any]
if isinstance(statement, SchemaBase):
selection = statement.copy()
conditions.update(**kwds) # type: ignore[call-arg]
selection.condition = conditions["condition"]
else:
if not isinstance(statement, t.Mapping):
statement = _parse_literal(statement)
selection = conditions
selection.update(**statement, **kwds) # type: ignore[call-arg]
return selection
class _BaseWhen(Protocol):
# NOTE: Temporary solution to non-SchemaBase copy
_condition: _ConditionType
def _when_then(
self, statement: _StatementType, kwds: dict[str, Any], /
) -> _ConditionClosed | _Condition:
condition: Any = _deepcopy(self._condition)
then = _parse_then(statement, kwds)
condition.update(then)
return condition
class When(_BaseWhen):
"""
Utility class for ``when-then-otherwise`` conditions.
Represents the state after calling :func:`.when()`.
This partial state requires calling :meth:`When.then()` to finish the condition.
References
----------
`polars.when <https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.when.html>`__
"""
def __init__(self, condition: _ConditionType, /) -> None:
self._condition = condition
def __repr__(self) -> str:
return f"{type(self).__name__}({self._condition!r})"
@overload
def then(self, statement: str, /, **kwds: Any) -> Then[_Condition]: ...
@overload
def then(self, statement: _Value, /, **kwds: Any) -> Then[_Conditions]: ...
@overload
def then(
self, statement: dict[str, Any] | SchemaBase, /, **kwds: Any
) -> Then[Any]: ...
def then(self, statement: _StatementType, /, **kwds: Any) -> Then[Any]:
"""
Attach a statement to this predicate.
Parameters
----------
statement
A spec or value to use when the preceding :func:`.when()` clause is true.
.. note::
``str`` will be encoded as `shorthand<https://altair-viz.github.io/user_guide/encodings/index.html#encoding-shorthands>`__.
**kwds
Additional keyword args are added to the resulting ``dict``.
Returns
-------
:class:`Then`
Examples
--------
Simple conditions may be expressed without defining a default::
import altair as alt
from vega_datasets import data
source = data.movies()
predicate = (alt.datum.IMDB_Rating == None) | (alt.datum.Rotten_Tomatoes_Rating == None)
alt.Chart(source).mark_point(invalid=None).encode(
x="IMDB_Rating:Q",
y="Rotten_Tomatoes_Rating:Q",
color=alt.when(predicate).then(alt.value("grey")),
)
"""
condition = self._when_then(statement, kwds)
if _is_condition_extra(condition, statement, kwds=kwds):
return Then(_Conditional(condition=condition))
else:
return Then(_Conditional(condition=[condition]))
class Then(SchemaBase, t.Generic[_C]):
"""
Utility class for ``when-then-otherwise`` conditions.
Represents the state after calling :func:`.when().then()`.
This state is a valid condition on its own.
It can be further specified, via multiple chained `when-then` calls,
or finalized with :meth:`Then.otherwise()`.
References
----------
`polars.when <https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.when.html>`__
"""
_schema = {"type": "object"}
def __init__(self, conditions: _Conditional[_C], /) -> None:
super().__init__(**conditions)
self.condition: _C
@overload
def otherwise(self, statement: _TSchemaBase, /, **kwds: Any) -> _TSchemaBase: ...
@overload
def otherwise(self, statement: str, /, **kwds: Any) -> _Conditional[_Condition]: ...
@overload
def otherwise(
self, statement: _Value, /, **kwds: Any
) -> _Conditional[_Conditions]: ...
@overload
def otherwise(
self, statement: dict[str, Any], /, **kwds: Any
) -> _Conditional[Any]: ...
def otherwise(
self, statement: _StatementType, /, **kwds: Any
) -> SchemaBase | _Conditional[Any]:
"""
Finalize the condition with a default value.
Parameters
----------
statement
A spec or value to use when no predicates were met.
.. note::
Roughly equivalent to an ``else`` clause.
.. note::
``str`` will be encoded as `shorthand<https://altair-viz.github.io/user_guide/encodings/index.html#encoding-shorthands>`__.
**kwds
Additional keyword args are added to the resulting ``dict``.
Examples
--------
Points outside of ``brush`` will not appear highlighted::
import altair as alt
from vega_datasets import data
source = data.cars()
brush = alt.selection_interval()
color = alt.when(brush).then("Origin:N").otherwise(alt.value("grey"))
alt.Chart(source).mark_point().encode(
x="Horsepower:Q",
y="Miles_per_Gallon:Q",
color=color,
).add_params(brush)
"""
conditions: _Conditional[Any]
is_extra = functools.partial(_is_condition_extra, kwds=kwds)
if is_extra(self.condition, statement):
current = self.condition
if isinstance(current, list) and len(current) == 1:
# This case is guaranteed to have come from `When` and not `ChainedWhen`
# The `list` isn't needed if we complete the condition here
conditions = _Conditional(condition=current[0]) # pyright: ignore[reportArgumentType]
elif isinstance(current, dict):
if not is_extra(statement):
conditions = self.to_dict()
else:
cond = _reveal_parsed_shorthand(current)
msg = (
f"Only one field may be used within a condition.\n"
f"Shorthand {statement!r} would conflict with {cond!r}\n\n"
f"Use `alt.value({statement!r})` if this is not a shorthand string."
)
raise TypeError(msg)
else:
# Generic message to cover less trivial cases
msg = (
f"Chained conditions cannot be mixed with field conditions.\n"
f"{self!r}\n\n{statement!r}"
)
raise TypeError(msg)
else:
conditions = self.to_dict()
return _parse_otherwise(statement, conditions, kwds)
def when(
self,
predicate: Optional[_PredicateType] = Undefined,
*more_predicates: _ComposablePredicateType,
empty: Optional[bool] = Undefined,
**constraints: _FieldEqualType,
) -> ChainedWhen:
"""
Attach another predicate to the condition.
The resulting predicate is an ``&`` reduction over ``predicate`` and optional ``*``, ``**``, arguments.
Parameters
----------
predicate
A selection or test predicate. ``str`` input will be treated as a test operand.
.. note::
Accepts the same range of inputs as in :func:`.condition()`.
*more_predicates
Additional predicates, restricted to types supporting ``&``.
empty
For selection parameters, the predicate of empty selections returns ``True`` by default.
Override this behavior, with ``empty=False``.
.. note::
When ``predicate`` is a ``Parameter`` that is used more than once,
``alt.when().then().when(..., empty=...)`` provides granular control for each occurrence.
**constraints
Specify `Field Equal Predicate <https://vega.github.io/vega-lite/docs/predicate.html#equal-predicate>`__'s.
Shortcut for ``alt.datum.field_name == value``, see examples for usage.
Returns
-------
:class:`ChainedWhen`
A partial state which requires calling :meth:`ChainedWhen.then()` to finish the condition.
Examples
--------
Chain calls to express precise queries::
import altair as alt
from vega_datasets import data
source = data.cars()
color = (
alt.when(alt.datum.Miles_per_Gallon >= 30, Origin="Europe")
.then(alt.value("crimson"))
.when(alt.datum.Horsepower > 150)
.then(alt.value("goldenrod"))
.otherwise(alt.value("grey"))
)
alt.Chart(source).mark_point().encode(x="Horsepower", y="Miles_per_Gallon", color=color)
"""
condition = _parse_when(predicate, *more_predicates, empty=empty, **constraints)
conditions = self.to_dict()
current = conditions["condition"]
if isinstance(current, list):
conditions = t.cast(_Conditional[_Conditions], conditions)
return ChainedWhen(condition, conditions)
elif isinstance(current, dict):
cond = _reveal_parsed_shorthand(current)
msg = (
f"Chained conditions cannot be mixed with field conditions.\n"
f"Additional conditions would conflict with {cond!r}\n\n"
f"Must finalize by calling `.otherwise()`."
)
raise TypeError(msg)
else:
msg = (
f"The internal structure has been modified.\n"
f"{type(current).__name__!r} found, but only `dict | list` are valid."
)
raise NotImplementedError(msg)
def to_dict(self, *args: Any, **kwds: Any) -> _Conditional[_C]: # type: ignore[override]
m = super().to_dict(*args, **kwds)
return _Conditional(condition=m["condition"])
def __deepcopy__(self, memo: Any) -> Self:
return type(self)(_Conditional(condition=_deepcopy(self.condition)))
class ChainedWhen(_BaseWhen):
"""
Utility class for ``when-then-otherwise`` conditions.
Represents the state after calling :func:`.when().then().when()`.
This partial state requires calling :meth:`ChainedWhen.then()` to finish the condition.
References
----------
`polars.when <https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.when.html>`__
"""
def __init__(
self,
condition: _ConditionType,
conditions: _Conditional[_Conditions],
/,
) -> None:
self._condition = condition
self._conditions = conditions
def __repr__(self) -> str:
return (
f"{type(self).__name__}(\n"
f" {self._conditions!r},\n {self._condition!r}\n"
")"
)
def then(self, statement: _StatementType, /, **kwds: Any) -> Then[_Conditions]:
"""
Attach a statement to this predicate.
Parameters
----------
statement
A spec or value to use when the preceding :meth:`Then.when()` clause is true.
.. note::
``str`` will be encoded as `shorthand<https://altair-viz.github.io/user_guide/encodings/index.html#encoding-shorthands>`__.
**kwds
Additional keyword args are added to the resulting ``dict``.
Returns
-------
:class:`Then`
Examples
--------
Multiple conditions with an implicit default::
import altair as alt
from vega_datasets import data
source = data.movies()
predicate = (alt.datum.IMDB_Rating == None) | (alt.datum.Rotten_Tomatoes_Rating == None)
color = (
alt.when(predicate)
.then(alt.value("grey"))
.when(alt.datum.IMDB_Votes < 5000)
.then(alt.value("lightblue"))
)
alt.Chart(source).mark_point(invalid=None).encode(
x="IMDB_Rating:Q", y="Rotten_Tomatoes_Rating:Q", color=color
)
"""
condition = self._when_then(statement, kwds)
conditions = self._conditions.copy()
conditions["condition"].append(condition)
return Then(conditions)
def when(
predicate: Optional[_PredicateType] = Undefined,
*more_predicates: _ComposablePredicateType,
empty: Optional[bool] = Undefined,
**constraints: _FieldEqualType,
) -> When:
"""
Start a ``when-then-otherwise`` condition.
The resulting predicate is an ``&`` reduction over ``predicate`` and optional ``*``, ``**``, arguments.
Parameters
----------
predicate
A selection or test predicate. ``str`` input will be treated as a test operand.
.. note::
Accepts the same range of inputs as in :func:`.condition()`.
*more_predicates
Additional predicates, restricted to types supporting ``&``.
empty
For selection parameters, the predicate of empty selections returns ``True`` by default.
Override this behavior, with ``empty=False``.
.. note::
When ``predicate`` is a ``Parameter`` that is used more than once,
``alt.when(..., empty=...)`` provides granular control for each occurrence.
**constraints
Specify `Field Equal Predicate <https://vega.github.io/vega-lite/docs/predicate.html#equal-predicate>`__'s.
Shortcut for ``alt.datum.field_name == value``, see examples for usage.
Returns
-------
:class:`When`
A partial state which requires calling :meth:`When.then()` to finish the condition.
Notes
-----
- Directly inspired by the ``when-then-otherwise`` syntax used in ``polars.when``.
References
----------
`polars.when <https://docs.pola.rs/py-polars/html/reference/expressions/api/polars.when.html>`__
Examples
--------
Setting up a common chart::
import altair as alt
from vega_datasets import data
source = data.cars()
brush = alt.selection_interval()
points = (
alt.Chart(source)
.mark_point()
.encode(x="Horsepower", y="Miles_per_Gallon")
.add_params(brush)
)
points
Basic ``if-then-else`` conditions translate directly to ``when-then-otherwise``::
points.encode(color=alt.when(brush).then("Origin").otherwise(alt.value("lightgray")))
Omitting the ``.otherwise()`` clause will use the channel default instead::
points.encode(color=alt.when(brush).then("Origin"))
Predicates passed as positional arguments will be reduced with ``&``::
points.encode(
color=alt.when(
brush, (alt.datum.Miles_per_Gallon >= 30) | (alt.datum.Horsepower >= 130)
)
.then("Origin")
.otherwise(alt.value("lightgray"))
)
Using keyword-argument ``constraints`` can simplify compositions like::
verbose_composition = (
(alt.datum.Name == "Name_1")
& (alt.datum.Color == "Green")
& (alt.datum.Age == 25)
& (alt.datum.StartDate == "2000-10-01")
)
when_verbose = alt.when(verbose_composition)
when_concise = alt.when(Name="Name_1", Color="Green", Age=25, StartDate="2000-10-01")
"""
condition = _parse_when(predicate, *more_predicates, empty=empty, **constraints)
return When(condition)
# ------------------------------------------------------------------------
# Top-Level Functions
def value(value: Any, **kwargs: Any) -> _Value:
"""Specify a value for use in an encoding."""
return _Value(value=value, **kwargs) # type: ignore[typeddict-item]

Implementation notes

The particular issue is that

  • .when() is always an intermediate step
  • .then() returns an object that can represent a condition or an intermediate step
  • .otherwise() is always a final step

To pull this off (currently)

  • When|ChainedWhen are not recognised by any of the API
  • Then is converted to the wrapped dict via .to_dict() if used in an encoding
  • .otherwise() returns the final wrapped dict

Apologies if you already understood the above, but if not, does it cause you to reconsider?

I really like the idea of annotating as Condition - but I think that would need to be the name of a TypeAlias - to reflect the result of .otherwise():

@runtime_checkable
class _ConditionLike(SchemaLike):
    _schema: ClassVar[dict[Literal["type"], Literal["object"]]] = {"type": "object"}

Condition: TypeAlias = _ConditionLike | api._Conditional[Any]

This definition could then also reflect the return of alt.condition() - since that can return a dict.
I know that they would already be covered by Map - but I think a name as broad as Condition should capture all the acceptable states.

Edit

Alternative names: Conditional, IntoCondition


Apologies for the essay @binste! 😅
But your idea gave me a lot to think about and I'm glad we didn't rush this into v5.4.1

@binste
Copy link
Contributor

binste commented Sep 6, 2024

I thought it was ok to just use Map to cover otherwise and alt.condition (in the usual cases) but agree that it's even better to also have it in Condition -> You're TypeAlias idea sounds great!

@dangotbanned
Copy link
Member Author

I thought it was ok to just use Map to cover otherwise and alt.condition (in the usual cases) but agree that it's even better to also have it in Condition -> You're TypeAlias idea sounds great!

Thanks happy to hear it @binste

I'll loop back to this but definitely want to have this fixed for v5.5.0.
For reference on the name of this alias, these were why I'd be leaning away from Condition:

Using Into... would align with IntoExpression:

IntoExpression: TypeAlias = Union[bool, None, str, OperatorMixin, Dict[str, Any]]

Maybe this annotation represents:

"anything that can convert into a conditional encoding/property"


None of these directly collide with Condition, but there is somewhat of a hierarchy to components of conditions defined:

class _ConditionClosed(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
# Parameter {"param", "value", "empty"}
# Predicate {"test", "value"}
empty: Optional[bool]
param: Parameter | str
test: _TestPredicateType
value: Any
class _ConditionExtra(TypedDict, closed=True, total=False): # type: ignore[call-arg]
# https://peps.python.org/pep-0728/
# Likely a Field predicate
empty: Optional[bool]
param: Parameter | str
test: _TestPredicateType
value: Any
__extra_items__: _StatementType | OneOrSeq[_LiteralValue]
_Condition: TypeAlias = _ConditionExtra
"""A singular, non-chainable condition produced by ``.when()``."""
_Conditions: TypeAlias = t.List[_ConditionClosed]
"""Chainable conditions produced by ``.when()`` and ``Then.when()``."""
_C = TypeVar("_C", _Conditions, _Condition)
class _Conditional(TypedDict, t.Generic[_C], total=False):
condition: Required[_C]
value: Any

_StatementType: TypeAlias = Union[SchemaBase, Map, str]
"""Permitted types for `if_true`/`if_false`.
In python terms:
```py
if _PredicateType:
return _StatementType
elif _PredicateType:
return _StatementType
else:
return _StatementType
```
"""
_ConditionType: TypeAlias = t.Dict[str, Union[_TestPredicateType, Any]]

@binste
Copy link
Contributor

binste commented Sep 7, 2024

I don't have a strong preference here. IntoCondition is also fine. It could also be a type alias which we do not expose externally and then we should just aim for what provides the best UX and can always change it later if we want to broaden it's scope.

@dangotbanned
Copy link
Member Author

Will be moving onto this issue next, just noting here another internal typing issue for when-then-otherwise that I'll attempt to fix in that PR:

Screenshot

image

I'm only getting this warning after correctly enabling support for __extra_items__ in #3585.

I used a different benefit of the feature in fix: Support hyphenated keys in TypedDict(s) (0bdfa72 (#3536))

@dangotbanned dangotbanned linked a pull request Sep 14, 2024 that will close this issue
5 tasks
dangotbanned added a commit to dangotbanned/altair that referenced this issue Sep 15, 2024
The warning raised by `pyright` here was actually quite helpful.
Essentially, a shorthand string is never allowed here

vega#3552 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants