More convenient syntax instead of FieldOneOfPredicate #3239

joelostblom · 2023-10-26T01:41:30Z

Writing a condition that matches a single value in relatively straightforward:

import altair as alt
from vega_datasets import data


source = data.cars()
alt.Chart(source).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
    color=alt.condition(
        alt.datum.Origin == 'Japan',
        alt.value('red'),
        alt.value('grey')
    )
)

But when we want to match multiple values, the alt.datum.Origin == 'Japan', needs to be replaced with the notably more compllicated:

        alt.FieldOneOfPredicate(
            'Origin',
            ['Japan', 'Europe']
        ),

It would be nice to support the in operator here and allow the syntax to just be alt.datum.Origin in ['Japan', 'Europe']. I haven't looked into the change necessary to make this happen but noting it down as something to revisit in the future.

The text was updated successfully, but these errors were encountered:

mattijn · 2023-10-28T22:16:48Z

I think that you would need a custom Altair list instead of using a python list to control this behaviour in the predicate.
You could add a custom magic method for the membership operator (eg. __contains__) to the OperatorMixin, but the equivalent in and not in membership operators will only be applied to the right-hand-side operand and not on the left-hand-side operand. Basically the list decide what it contains and since this is a python list it will return a boolean.

joelostblom · 2023-10-28T22:27:11Z

...the equivalent in and not in membership operators will only be applied to the right-hand-side operand and not on the left-hand-side operand. Basically the list decide what it contains and since this is a python list it will return a boolean.

I was wondering about this. What made me think that it might be possible was that we do use the standard Python operators for the other comparisons, but I hear that this might not work the same for the in operator. I don't fully understand why it would only be applied to the right-hand side from just skimming the code, but I will try to have a closer look at that section of the code base in the future. Thanks for pointing this out and linking to where the logic is!

dangotbanned · 2024-07-11T15:27:33Z

I have a couple of alternative ideas for this one.

1 - A wrapper function (`one_of`)

Note: the ...TimeUnit_T aliases are from https://github.com/dangotbanned/altair/blob/ac23fd51460e69b1a7189da28b9a10e28eda2ac7/altair/vegalite/v5/schema/core.py#L16331-L16333

Code block

from __future__ import annotations

from typing import TYPE_CHECKING

from altair import FieldOneOfPredicate, Undefined

if TYPE_CHECKING:
    from altair import Optional, SchemaBase
    from altair.vegalite.v5.schema._typing import (
        BinnedTimeUnit_T,
        Map,
        MultiTimeUnit_T,
        SingleTimeUnit_T,
    )


def one_of(
    field: str,
    /,
    *oneOf: bool | float | Map | SchemaBase,
    timeUnit: Optional[
        dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
    ] = Undefined,
) -> FieldOneOfPredicate:
    return FieldOneOfPredicate(field=field, oneOf=oneOf, timeUnit=timeUnit)

Compared

This has a very similar appearance to @joelostblom's original:

alt.datum.Origin in ['Japan', 'Europe'] # original

alt.one_of("Origin", "Japan", "Europe") # proposed

Pros

field is positional-only
- so alt.one_of(field=origin,...) would raise a TypeError
- This would be implicit without the /, I'm just using that to highlight
oneOf is variadic
- No need to spell out a list
timeUnit
- Specifying this wouldn't have been possible in a hypothetical operator-based solution
Doesn't require modifying any existing code

Cons

Without reading the parameter names, it might be unintuitive to figure out what this function is one_of-of
Given the current number of symbols in the top level namespace, alt.one_of could be difficult to find among hundreds
This only addresses 1/8 of the Field...Predicate classes https://vega.github.io/vega-lite/docs/predicate.html#field-predicate

2 - Utility class for all field predicates (`field`)

Note: This is enough to demo the idea, but I've left it as stubs to avoid repeating parts

Code block

class field:
    @classmethod
    def one_of(
        cls,
        field: str,
        /,
        *oneOf: bool | float | Map | SchemaBase,
        timeUnit: Optional[
            dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
        ] = Undefined,
    ) -> FieldOneOfPredicate:
        return FieldOneOfPredicate(field=field, oneOf=oneOf, timeUnit=timeUnit)

    @classmethod
    def eq(
        cls,
        field: str,
        value: Any,
        /,
        *,
        timeUnit: Optional[
            dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
        ] = Undefined,
    ) -> FieldEqualPredicate:
        return FieldEqualPredicate(equal=value, field=field, timeUnit=timeUnit)

    @classmethod
    def lt(
        cls,
        field: str,
        value: Any,
        /,
        *,
        timeUnit: Optional[
            dict | SchemaBase | MultiTimeUnit_T | BinnedTimeUnit_T | SingleTimeUnit_T
        ] = Undefined,
    ) -> FieldLTPredicate:
        return FieldLTPredicate(lt=value, field=field, timeUnit=timeUnit)

    @classmethod
    def lte(cls, field: str, value: Any, /, **kwds) -> FieldLTEPredicate: ...
    @classmethod
    def gt(cls, field: str, value: Any, /, **kwds) -> FieldGTPredicate: ...
    @classmethod
    def gte(cls, field: str, value: Any, /, **kwds) -> FieldGTEPredicate: ...
    @classmethod
    def valid(cls, field: str, valid: bool, **kwds) -> FieldValidPredicate: ...
    @classmethod
    def range(
        cls,
        field: str,
        range: Optional[
            dict
            | Parameter
            | SchemaBase
            | Sequence[dict | None | float | Parameter | SchemaBase]
        ] = Undefined,
        /,
        **kwds,
    ) -> FieldRangePredicate: ...

Compared

alt.datum.Origin in ['Japan', 'Europe'] # original

alt.one_of("Origin", "Japan", "Europe") # proposed (1)

alt.field.one_of("Origin", "Japan", "Europe") # proposed (2)

Autocomplete

Pros

More explicit, without being overly verbose
All 8 of the field predicate types are accessible under a single namespace
Using positional-only parameters avoids lots of verbose repetition
- alt.FieldValidPredicate(field=..., valid=,...)
Would align closer to the proposed expr changes in feat: Reimplement expr as a class that is understood by IDEs #3466

Cons

Still may be difficult to find, but easy to teach like alt.expr, alt.datum
Some of these operations overlap with alt.datum

Summary

Personally think alt.field.one_of would be the least controversial and, in the future, could allow removing those wrapped classes from the top-level.
Rather than special-casing one_of alone.

I'd be happy to work on this if there's any interest

joelostblom · 2024-07-17T12:37:50Z

Thanks for these great ideas @dangotbanned ! I think I agree that a submodule is better than top module to avoid cluttering the top level api and allow easy discovery of related functionality. However, of the existing field predicates, most are already covered by the regular comparison operators >, ==, etc, so there is really only a need to introduce one_of and range (and maybe valid) which makes the grouping/discoverability advantages from the submodule less noticeable.

I wonder if we could be closer to how the syntax looks in Python (since this is what is used for the already existing comparison operators). I'm a bit short of time at the moment to dive deep into things, but do you think something along these lines would be possible or at they both susceptible to the constraints Mattijn mentioned above?

# 1
alt.datum.Origin in alt.list('Japan', 'Europe')
alt.datum.Origin in alt.range(1, 10)

# 2
alt.datum.Origin alt.in ['Japan', 'Europe']
alt.datum.Origin alt.in range(1, 10)

dangotbanned · 2024-07-17T15:25:48Z

Thanks for the response @joelostblom

However, of the existing field predicates, most are already covered by the regular comparison operators >, ==, etc, so there is really only a need to introduce one_of and range (and maybe valid) which makes the grouping/discoverability advantages from the submodule less noticeable.

Yeah I agree. I've personally not used any of them before, so mostly just included them for completeness.

Continuation of Utility class for all field predicates (field)

One additional thing I thought of, but seemed a bit much for the initial idea.
Changing the classmethod definitions to wrap the result in api.SelectionPredicateComposition.

So this allows using ~, &, | with compositions of selection objects.

# Previously ref as (2)
class field_old:
    @classmethod
    def one_of(
        cls,
        field: str,
        /,
        *oneOf: bool | float | Map | SchemaBase,
        timeUnit= Undefined,
    ) -> FieldOneOfPredicate:
        return FieldOneOfPredicate(field=field, oneOf=oneOf, timeUnit=timeUnit)
	@classmethod
    def range(
        cls,
        field: str,
        range: Optional[
            dict
            | Parameter
            | SchemaBase
            | Sequence[dict | None | float | Parameter | SchemaBase]
        ] = Undefined,
        /,
        **kwds,
    ) -> FieldRangePredicate: ...

# ----------------------------------------------------
# Extending for composition (3)
class field:
	@classmethod
    def one_of(
        cls,
        field: str,
        /,
        *values: bool | float | dict[str, Any] | SchemaBase,
        timeUnit: TimeUnitType = Undefined,
    ) -> SelectionPredicateComposition:
		m = FieldOneOfPredicate(field=field, oneOf=values, timeUnit=timeUnit).to_dict()
		return SelectionPredicateComposition(m)

	@classmethod
    def range(
        cls, field: str, value: RangeType, /, *, timeUnit: TimeUnitType = Undefined
    ) -> SelectionPredicateComposition:
        m = FieldRangePredicate(field=field, range=value, timeUnit=timeUnit).to_dict()
        return SelectionPredicateComposition(m)

I was curious on if this would actually render, and surpisingly it did!

altair.utils._shorthand_ns isn't part of any public branch at the moment.

Feedback on @joelostblom's idea

I wonder if we could be closer to how the syntax looks in Python (since this is what is used for the already existing comparison operators). I'm a bit short of time at the moment to dive deep into things, but do you think something along these lines would be possible or at they both susceptible to the constraints Mattijn mentioned above?
# 1
alt.datum.Origin in alt.list('Japan', 'Europe')
alt.datum.Origin in alt.range(1, 10)

My first thoughts on # 1 were issues like:

Since both list and range are builtins

But apparently shadowing a builtin with a function isn't an issue?

Personally, I think it would be error prone since a user could do this and accidentally change the meaning of their code:

from altair import list, range

...

list(...)
range(...)

# 2
alt.datum.Origin alt.in ['Japan', 'Europe']
alt.datum.Origin alt.in range(1, 10)

For # 2 there are some major issues:

Cell In[11],   [line 22](vscode-notebook-cell:?execution_count=11&line=22)
    alt.datum.Origin alt.in range(1, 10)
                     ^
SyntaxError: invalid syntax

Also VSCode yelling at me:

SyntaxError: Simple statements must be separated by newlines or semicolons (Ruff)
Statements must be separated by newlines or semicolons (Pylance)
Expected attribute name after "." (Pylance)
SyntaxError: Expected an identifier, but found a keyword 'in' that cannot be used here (Ruff)
Statements must be separated by newlines or semicolons (Pylance)
Expected expression (Pylance)
SyntaxError: Simple statements must be separated by newlines or semicolons (Ruff)
Missing whitespace around operator (Ruff)

in is a reserved keyword https://docs.python.org/3/reference/lexical_analysis.html#keywords

Which is the same reason you'll see as_ used in the TopLevelMixin.transform_ methods:

altair/altair/vegalite/v5/api.py

Lines 1624 to 1628 in 9afd374

    
           def transform_bin( 
        
               self, 
        
               as_: Optional[str | FieldName | list[str | FieldName]] = Undefined, 
        
               field: Optional[str | FieldName] = Undefined, 
        
               bin: Literal[True] | BinParams = True,

mattijn · 2024-07-17T16:09:59Z

I quite like the alt.field.* approach! Thanks for this exploration @dangotbanned!

joelostblom · 2024-07-17T19:14:02Z

Personally, I think it would be error prone since a user could do this and accidentally change the meaning of their code:

Yup, definitely. We would have to append an underscore like we do with alt.expr.if_ but I've always thought that trailing underscore looks a bit unofficial/private, so I'm not super sold on alt.list_/alt.range_.

So this allows using ~, &, | with compositions of selection objects

I like this! Maybe just go ahead with the field submodule approach then as mattijn is on board too. One modification I wonder if we can do is that instead of passing the items of the list as parameters, could we just pass the list? So the syntax would be alt.field.one_of("Origin", ["Japan", "Europe"]). I think that is more intuitive, aligns with the syntax for field.range in your screenshot, and it would make it easier if the list is predefined as well since we avoid unpacking it, ie avoid alt.field.one_of("Origin", *predefined_list). Unless I'm missing some key functionality enables by having these as params instead?

dangotbanned · 2024-07-17T20:09:02Z

I quite like the alt.field.* approach! Thanks for this exploration @dangotbanned!

Thanks @mattijn

So this allows using ~, &, | with compositions of selection objects

I like this! Maybe just go ahead with the field submodule approach then as mattijn is on board too. One modification I wonder if we can do is that instead of passing the items of the list as parameters, could we just pass the list? So the syntax would be alt.field.one_of("Origin", ["Japan", "Europe"]). I think that is more intuitive,

@joelostblom You can absolutely have both for field.one_of, if you feel its more intuitive!

Most of the polars API that accepts *args does so like LazyFrame.select.

aligns with the syntax for field.range in your screenshot, and it would make it easier if the list is predefined as well since we avoid unpacking it, ie avoid alt.field.one_of("Origin", *predefined_list). Unless I'm missing some key functionality enables by having these as params instead?

So for field.range, which wraps FieldRangePredicate - the range parameter type is a bit more complex:

https://github.com/vega/altair/blob/f8c9776a72afa433f3e6b311bdaf5133d38f127b/altair/vegalite/v5/schema/core.py#L16545-L16576

But I think for most cases it would be tuple[float, float].
The most important part being 2 elements (min/max), so *args might be misleading as a specific length is expected.

mattijn · 2024-07-25T21:33:06Z

I saw #3440 (comment) again.
How would the proposed syntax looks like here?

import altair as alt
from vega_datasets import data

source = data.disasters()
columns_sorted = ['Drought', 'Epidemic', 'Earthquake', 'Flood']

alt.Chart(source, height=200).transform_filter(
    {'and': [
        alt.FieldOneOfPredicate(field='Entity', oneOf=columns_sorted), # Filter data to show only disasters in columns_sorted
        alt.FieldRangePredicate(field='Year', range=[1900, 2000]) # Filter data to show only 20th century
    ]}
).mark_line(interpolate='monotone').encode(x=alt.X('Year:Q').axis(format='d'), color='Entity',column='Entity', y='sum(Deaths)')

Will it be as such?

alt.field.one_of(field='Entity', one_of=columns_sorted) & alt.field.range(field='Year', range=[1900, 200])

Also linking #695 here. Is this applicable here too?

`field` proposed in vega#3239 (comment) `agg` was developed during vega#3427 (comment) as a solution to part of vega#3476

vega#3239 (comment)

dangotbanned · 2024-07-26T12:00:53Z

I saw #3440 (comment) again. How would the proposed syntax looks like here?

import altair as alt
from vega_datasets import data

source = data.disasters()
columns_sorted = ['Drought', 'Epidemic', 'Earthquake', 'Flood']

alt.Chart(source, height=200).transform_filter(
    {'and': [
        alt.FieldOneOfPredicate(field='Entity', oneOf=columns_sorted), # Filter data to show only disasters in columns_sorted
        alt.FieldRangePredicate(field='Year', range=[1900, 2000]) # Filter data to show only 20th century
    ]}
).mark_line(interpolate='monotone').encode(x=alt.X('Year:Q').axis(format='d'), color='Entity',column='Entity', y='sum(Deaths)')

@mattijn I've opened #3505 to allow you (and anyone really) to test out what I've presented so far.

Will it be as such?

alt.field.one_of(field='Entity', one_of=columns_sorted) & alt.field.range(field='Year', range=[1900, 2000])

It would be the slightly shorter:

# Also valid
# alt.field.one_of('Entity', 'Drought', 'Epidemic', 'Earthquake', 'Flood')
alt.field.one_of('Entity', columns_sorted) & alt.field.range('Year', [1900, 2000])

Each field argument is positional-only, and what would be one_of and range are as well.

See #3239 (comment) for extra detail on that choice

joelostblom added the enhancement label Oct 26, 2023

dangotbanned added a commit to dangotbanned/altair that referenced this issue Jul 26, 2024

feat(RFC): Adds agg, field utility classes

fe7e77e

`field` proposed in vega#3239 (comment) `agg` was developed during vega#3427 (comment) as a solution to part of vega#3476

dangotbanned added a commit to dangotbanned/altair that referenced this issue Jul 26, 2024

feat: Support non-variadic input to field.one_of

7a4451a

vega#3239 (comment)

dangotbanned linked a pull request Jul 26, 2024 that will close this issue

feat(RFC): Adds agg, field utility classes #3505

Draft

dangotbanned added the v6 Aim to be completed for version 6 label Aug 13, 2024

dangotbanned mentioned this issue Oct 31, 2024

feat: Support Chart.transform_filter(*predicates, **constraints) #3664

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More convenient syntax instead of FieldOneOfPredicate #3239

More convenient syntax instead of FieldOneOfPredicate #3239

joelostblom commented Oct 26, 2023

mattijn commented Oct 28, 2023 •

edited

Loading

joelostblom commented Oct 28, 2023

dangotbanned commented Jul 11, 2024 •

edited

Loading

joelostblom commented Jul 17, 2024

dangotbanned commented Jul 17, 2024 •

edited

Loading

mattijn commented Jul 17, 2024

joelostblom commented Jul 17, 2024

dangotbanned commented Jul 17, 2024 •

edited

Loading

mattijn commented Jul 25, 2024 •

edited by joelostblom

Loading

dangotbanned commented Jul 26, 2024 •

edited

Loading

More convenient syntax instead of FieldOneOfPredicate #3239

More convenient syntax instead of FieldOneOfPredicate #3239

Comments

joelostblom commented Oct 26, 2023

mattijn commented Oct 28, 2023 • edited Loading

joelostblom commented Oct 28, 2023

dangotbanned commented Jul 11, 2024 • edited Loading

1 - A wrapper function (one_of)

Compared

Pros

Cons

2 - Utility class for all field predicates (field)

Compared

Autocomplete

Pros

Cons

Summary

joelostblom commented Jul 17, 2024

dangotbanned commented Jul 17, 2024 • edited Loading

Continuation of Utility class for all field predicates (field)

Feedback on @joelostblom's idea

mattijn commented Jul 17, 2024

joelostblom commented Jul 17, 2024

dangotbanned commented Jul 17, 2024 • edited Loading

mattijn commented Jul 25, 2024 • edited by joelostblom Loading

dangotbanned commented Jul 26, 2024 • edited Loading

mattijn commented Oct 28, 2023 •

edited

Loading

dangotbanned commented Jul 11, 2024 •

edited

Loading

1 - A wrapper function (`one_of`)

2 - Utility class for all field predicates (`field`)

dangotbanned commented Jul 17, 2024 •

edited

Loading

dangotbanned commented Jul 17, 2024 •

edited

Loading

mattijn commented Jul 25, 2024 •

edited by joelostblom

Loading

dangotbanned commented Jul 26, 2024 •

edited

Loading