Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Union Type Confusion #71

Open
billy1kaplan opened this issue Aug 1, 2022 · 11 comments
Open

Union Type Confusion #71

billy1kaplan opened this issue Aug 1, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@billy1kaplan
Copy link

Describe the bug:
I would expect the following to be a "coercable" union type:

class Demo
  extend T::Sig

  class Test < T::Struct
    const :key, String
    const :value, String
  end

  sig { void }
  def self.help
    TypeCoerce[T.any(String, Test)].new.from(
      {
        "key" => "Hello",
        "value" => "World",
      },
    )
  end
end

Actual result:

TypeError: T.let: Expected type T.any(Demo::Test, String), got type Hash with value {"key"=>"Hello", "value"=>"World" }

This does seem to work with the other half of the union type (i.e. for a String) where we're able to return a String directly:

sig { returns(String) }
def self.help
  TypeCoerce[T.any(String, Test)].new.from(
    "Hi",
  )
end

Expected behavior:

Naively, I would expect the above to return the Test struct with the properly initialized values. Is this a known limitation? And would this be a possible feature to support?

Versions:

  • Ruby: 2.7.2p137
  • Sorbet: 0.5.9449
  • Sorbet-Coerce: 0.6.0
@billy1kaplan billy1kaplan added the bug Something isn't working label Aug 1, 2022
@billy1kaplan
Copy link
Author

This might be a bit trickier than I thought, e.g. which of these is coerced to? Maybe the user would need to supply the type of the struct in order for something like this to work:

class Test < T::Struct
  const :key, String
  const :value, String
end

class Test1 < T::Struct
  const :key, String
  const :value, String
  const :other, String
end

@mattxwang
Copy link
Contributor

mattxwang commented Aug 3, 2022

Hi @billy1kaplan, thanks for submitting an issue! I'll write out this response with the caveat that I'm only interning at @chanzuckerberg and am not the original creator of this repo.

First, to get it out of the way, I can replicate this locally:

class TestClass < T::Struct
  const :a, String
end
TypeCoerce[T.any(String, TestClass)].new.from({
  a: 'hello'
})
# TypeError: T.let: Expected type T.any(String, TestClass), got type Hash with value {:a=>"hello"}

However, this is actually a bit complicated! If you're alright with it, join me on a deeper dive!

Naively, I would expect the above to return the Test struct with the properly initialized values. Is this a known limitation?

Ah, this is - it just hasn't been documented particularly well. If you peek at the code for converting a Union:

elsif type.is_a?(T::Types::Union)
true_idx = T.let(nil, T.nilable(Integer))
false_idx = T.let(nil, T.nilable(Integer))
nil_idx = T.let(nil, T.nilable(Integer))
type.types.each_with_index do |t, i|
nil_idx = i if t.is_a?(T::Types::Simple) && t.raw_type == NilClass
true_idx = i if t.is_a?(T::Types::Simple) && t.raw_type == TrueClass
false_idx = i if t.is_a?(T::Types::Simple) && t.raw_type == FalseClass
end
return value unless (
(!true_idx.nil? && !false_idx.nil? && !nil_idx.nil?) || # T.nilable(T::Boolean)
(type.types.length == 2 && (
!nil_idx.nil? || (!true_idx.nil? && !false_idx.nil?) # T.nilable || T::Boolean
))
)
if !true_idx.nil? && !false_idx.nil?
_convert_simple(value, T::Boolean, raise_coercion_error, coerce_empty_to_nil)
else
_convert(value, type.types[nil_idx == 0 ? 1 : 0], raise_coercion_error, coerce_empty_to_nil)
end

The only "explicitly" supported unions are of types T.nilable and T::Boolean; prior to version 0.6.0 (i.e. before #53 and #54 was merged in), this was clearer from a specific error message. It 'forwards' other unions / leaves them "as-is". I don't think we've done a great job of documenting this, though I'm happy to take that up.

In particular, the way the code is written:

  • we'll reach the last else statement, which assumes that the type is T.nilable (the other if statements knock out the T::Bolean statements)
  • it then assumes that the two types are one of NilClass and one of the enclosed type (as T.nilable(SomeClass) = T.any(NilClass, SomeClass)
  • so, it picks the first type as the type to convert to, unless the first type is NilClass - then, it'll pick the second.

So, in your case, it picks the first type and converts to String (failing). That's also why when you switch the convert value to something Stringable, it works! Unfortunately, I don't think switching the order of the classes works - it seems like they get sorted somewhere (I imagine by sorbet).

Note that this still fails if the union is part of the struct:

class TestClass < T::Struct
  const :a, String
end
class WithUnsupportedComplexUnion < T::Struct
  const :union, T.any(String, TestClass)
end
TypeCoerce[WithUnsupportedComplexUnion].new.from({
  union: {
    a: 'hello'
  }
}) 
# TypeError: Parameter 'union': Can't set WithUnsupportedComplexUnion.union to {:a=>"hello"} (instance of Hash) - need a T.any(String, TestClass)

What I can definitely do is:

  • add some documentation indicating that this doesn't work
  • add a test case that makes it more clear that this is unsupported behaviour

And would this be a possible feature to support?

That's a great question. Right now, I don't know, though I'm happy to dig in a bit more if I have some free time. Of course, a PR is welcome if you find a way to make this work (even only for situations like your first example; the second can be "undefined behaviour" to some extent). The first step would be to relax the assumption (in the above code) that unions are only nilable or booleans, and add some more control flow logic. The else case currently assumes it's a nilable type, and clearly that's wrong in this case.

I'm also not sure if T.any is a first-class type and is intended for this purpose!

I'm certainly no expert in this / safe_type (an underlying gem we use) / sorbet-runtime, and I could be completely wrong. @donaldong and @datbth implemented this original feature - if they have any thoughts, feel free to chime in!

@donaldong
Copy link
Contributor

donaldong commented Aug 4, 2022

@mattxwang thanks!

The only "explicitly" supported unions are of types T.nilable and T::Boolean

+1. However this does not match what the README page says: T.any(<supported type>, ...).

And would this be a possible feature to support?

I haven't looked at the code for a while, but it feels doable to me.

With

class Test < T::Struct
  const :key, String
  const :value, String
end

class Test1 < T::Struct
  const :key, String
  const :value, String
  const :other, T.nilable(String)
end

should we expect the following?

  TypeCoerce[T.any(String, Test)].new.from(
    "Hi",
  ) # => "Hi"

Can be casted to String already, return String.

  TypeCoerce[T.any(Test1, Test)].new.from(
          {
        "key" => "Hello",
        "value" => "World",
      },
  ) # => Test1

Cast to Test1 first, because that's the first valid try

  TypeCoerce[T.any(Test, Test1)].new.from(
          {
        "key" => "Hello",
        "value" => "World",
      },
  ) # => Test

Cast to Test first, because that's the first valid try

Similarly, TypeCoerce[T.any(String, Test)] will convert the hash to String, and TypeCoerce[T.any(Test, String)] will try to convert to T::Struct first if valid

@mattxwang
Copy link
Contributor

Thanks @donaldong for the clarification! I think what you proposed makes sense. It does seem like the types in T.any are sorted, so the first try will be consistent - which may be a good thing.

However this does not match what the README page says: T.any(, ...).

Should I update the docs? I think that was merged in by #54.


@billy1kaplan any thoughts? If you'd like to contribute, I think this is a great opportunity to do so (and I can review your code, etc. if it's in the next 5 weeks). If not, no worries - I can take a stab at it.

@billy1kaplan
Copy link
Author

@donaldong @mattxwang thank you both for the detailed replies!

I'm happy to take a shot at this hopefully within the next week or so. My understanding of a possible approach would be to try coercing to the types defined by T.any(...) in the order that they appear, returning the first successful match if one exists or raising a coercion error if not.

@mattxwang
Copy link
Contributor

I'm happy to take a shot at this hopefully within the next week or so.

Great! Let me know how I can support you - happy to lend a hand however I can.

My understanding of a possible approach would be to try coercing to the types defined by T.any(...) in the order that they appear, returning the first successful match if one exists or raising a coercion error if not.

That sounds right to me!

@leifg
Copy link
Contributor

leifg commented Aug 5, 2022

My understanding of a possible approach would be to try coercing to the types defined by T.any(...) in the order that they appear, returning the first successful match if one exists or raising a coercion error if not.

We implemented a monkey patch for this based on version 0.5.0. It caught the "old" ArgumentError. Not a great approach but it worked.

Now that it is a TypeError nested Union types don't seem to work so great anymore.

I also submitted #72 that illustrates the error.

@datbth
Copy link
Contributor

datbth commented Aug 19, 2022

I think automatically trying to coerce union types is indeed convenient.
However, it comes with these costs:

  1. Performance. Try coercing and rescuing through every type within the union type is very costly.
  2. This is a significant behavioral change and might break the existing usages due to new unexpected coercion.

The performance cost is the main concern to me.
If we are going to support this, I think we should at least make it togglable.

@mattxwang
Copy link
Contributor

Thanks @datbth. @billy1kaplan has done some work in #73, I think it's reasonable to add a toggle (since this is a breaking change). Do you have any additional context on the original PRs that you wrote with Donald?

(for context: wrapping up an internship at CZI and doing some maintenance for some infra repos on the side)

@datbth
Copy link
Contributor

datbth commented Aug 20, 2022

Context of #53

Okay let me try to recall the context of #53.
I believe I was doing a nested coercion like this:

class Owner < T::Struct
  const :name, String
  const :pet_id, T.any(Integer, String)
end

owner_data = {
  name: 'Alice',
  pet_id: 'abcdef',
}
TypeCoerce[Owner].new.from(owner_data)

Then I got the error 'the only supported union types are T.nilable and T::Boolean' in both of these cases:

  1. When pet_id is already an Integer or String
  2. When pet_id is of an invalid type, not an Integer nor String

I don't remember exactly how nested Owner was or how complex pet_id was, but I believe they were pretty simple (like in the above example).

So by implementing #53, I was trying to achieve:

  1. Passing case (pet_id is of a valid type already): the coercion should keep it as-is
  2. Non-passing case (pet_id is of an invalid type): the coercion should raise an informative error about the expected type

Some additional comments on coercing union types

I myself would try to stay away from actually coercing union types because:

  1. Performance
  2. The behavior is tricky (and too magical?) and might cause unexpected results

One example for the tricky behavior issue:

class Cat < T::Struct
  const :name
end

class Dog < T::Struct
  const :name
end

class Owner < T::Struct
  const :name, String
  const :pet, T.any(Cat, Dog)
end

data = {
  name: 'Alice',
  pet: {
    type: 'dog',
    name: 'Goofy',
  }
}

TypeCoerce[Owner].new.from(data)

In this case, even though the pet data is coercible into Cat, I would prefer raising an error. Because the error helps me detect the poor handling of the pet type info.

For the case where the user explicitly tries to coerce a union type (e.g. the union type is the root type) like TypeCoerce[T.any(Cat, Dog)].new.from(pet_data) then it might make sense to carry out the automatic coercion.
But I myself have not ran into the need for this. I would always do a check on a type info, wherever it is.

@mattxwang
Copy link
Contributor

That is great context, thank you @datbth. I don't have too much context into the vision of this gem (we still use it at CZI, but much of that is out of scope of my direct work). I can do two things:

  • I'll briefly ask internally if we have any strong thoughts on this
  • @billy1kaplan, I want to see if we can address your needs as well. Given this new context, what are your thoughts on the feature request? I see
    • we continue the existing path (i.e. coerce everything), optionally with a flag to delimit behaviour
    • we only do this behaviour if the union type is the root type
    • we stop the work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants