Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is the Text instance guaranteed to generate legal text? #9

Closed
saurabhnanda opened this issue Jun 16, 2017 · 24 comments
Closed

Is the Text instance guaranteed to generate legal text? #9

saurabhnanda opened this issue Jun 16, 2017 · 24 comments

Comments

@saurabhnanda
Copy link

Is this line guaranteed to generate legal text values? My naive understanding of legal text values is stuff that has something to do with legality in a UTF sense.

Is the test failure at lpsmith/postgresql-simple#218 (comment) somehow related to the underlying text values being generated?

@saurabhnanda
Copy link
Author

If what is written at https://hackage.haskell.org/package/quickcheck-text is correct, should those instances be merged in this package?

The usual Arbitrary instance for Text (in quickcheck-instances) only has single-byte instances and so isn't an ideal representation of a valid UTF-8 character. This package has generators for one-, two- and three-byte UTF-8 characters (all that are currently in use).

@phadej
Copy link
Collaborator

phadej commented Jun 16, 2017

as far as i can tell converting String to Text using Data.Text.pack is correct. It generates legal Text values, which often aren't human readable though.

@saurabhnanda
Copy link
Author

Still puzzling me though:

  • What is https://hackage.haskell.org/package/quickcheck-text trying to solve in that case? Will /cc @olorin send the maintainer some sort of notification? Probably he/she can chime-in.
  • If these instances are being using to generate a [(Text, Text)] value, which is what pg-simple serializes into an HSTORE, why would that be generating an illegal HSTORE representation? If you notice the generated HSTORE string is missing the closing double quote " Could the random string being generated actually have a backspace/delete character somewhere? Is that what's happening here?

@phadej
Copy link
Collaborator

phadej commented Jun 16, 2017

I was not aware of quickcheck-text before. Most likely it generated more strings working around previously limited Arbitrary Char: nick8325/quickcheck#164 which is solved in QuickCheck-2.10: http://hackage.haskell.org/package/QuickCheck-2.10/docs/src/Test-QuickCheck-Arbitrary.html#line-647

About second point: I don't know. By reading postgresql-simple issue it seems, that you have to escape HSTORE representation yourself.

Well, after playing with it a bit, the problem seems to be that you didn't unescape the string that you are supplying to postgresql-simple.

But I don't know for sure, I have no experience with HSTORE, so don't take that as an advice.

@saurabhnanda
Copy link
Author

About second point: I don't know. By reading postgresql-simple issue it seems, that you have to escape HSTORE representation yourself.

Actually there are two separate issues being discussed in that thread. The first one is where I was pumping a raw HSTORE string, where this is true. The second is where I'm generating an HStoreMap via arbtirary and going through the pre-defined hstore API. The expectation is, that if one is going through a well-defined API then HSTORE/PG-Simple should be able to handle any kind of (Text, Text) pairs, which doesn't seem to be happening in this case.

Therefore, digging deeper to understand if it is possible to generate illegal Text values in the first place.

@olorin
Copy link

olorin commented Jun 18, 2017

What is https://hackage.haskell.org/package/quickcheck-text trying to solve in that case? Will /cc @olorin send the maintainer some sort of notification? Probably he/she can chime-in.

@phadej's guess is correct:

Most likely it generated more strings working around previously limited Arbitrary Char: nick8325/quickcheck#164

I wrote quickcheck-text because I needed more complete coverage of the UTF-8 character space, it wasn't an issue with the correctness of the existing instance. (I can't remember why I didn't put the gen up as a pull request here at the time instead of a separate package, I probably should have.)

@saurabhnanda
Copy link
Author

I'm back again. What could explain the following error from PG:

� quickCheck (testClientDB conn)
ld: warning: directory not found for option '-L/opt/local/lib/'
*** Failed! Exception: 'SqlError {sqlState = "22021", sqlExecStatus = FatalError, sqlErrorMsg = "invalid byte sequence for encoding \"UTF8\": 0x00", sqlErrorDetail = "", sqlErrorHint = ""}' (after 7 tests): 
FreshClient {fcEmail = EmailAddress (Address {addressName = Nothing, addressEmail = "[email protected]"}), fcContactName = "\DLE$\a\SYNN\163", fcPhone = "p\156?", fcName = "e\DC1\175\&7\175", fcCity = "\RSI\170\&2", fcNumberOfEmployees = "\161\\\255uV$", fcHomePage = Just "", fcHasWebsite = True, fcNatureOfBusiness = "\f\NUL", fcRequirements = "\DELJ5^"}

@phadej
Copy link
Collaborator

phadej commented Jun 20, 2017

@saurabhnanda
Copy link
Author

Many C-libraries don't like nulls in strings though.

Could this explain lpsmith/postgresql-simple#218 (comment) and lpsmith/postgresql-simple#218 (comment) ?

Also, is this a good use-case for a PR around generating Text values without \NUL characters for the common use-case of passing them to Postgres? The behaviour to restrict certain characters in the Text could be non-default and would require an explicit opt-in.

@phadej
Copy link
Collaborator

phadej commented Jun 21, 2017

Could this explain ...

Maybe. I have no time to investigate. Have you tried generating Text without \NULs?t

About the PR. For now, adding modifier newtypes is out-of-scope of this package. I'd like to prevent the feature creep. I'd suggest making a domain specific newtype, e.g. PostgresText, and use it in your tests. See http://hackage.haskell.org/package/QuickCheck-2.10.0.1/docs/Test-QuickCheck-Modifiers.html

@phadej
Copy link
Collaborator

phadej commented Jun 21, 2017

Could this explain ...

Maybe. I have no time to investigate. Have you tried generating Text without \NULs?

About the PR. For now, adding modifier newtypes is out-of-scope of this package. I'd like to prevent the feature creep. I'd suggest making a domain specific newtype, e.g. PostgresText, and use it in your tests. See http://hackage.haskell.org/package/QuickCheck-2.10.0.1/docs/Test-QuickCheck-Modifiers.html for examples.

@saurabhnanda
Copy link
Author

Okay - let me confirm that \NUL is the actual problem and report back.

@saurabhnanda
Copy link
Author

saurabhnanda commented Jun 21, 2017

Possible Solution

Confirmed that this is a \NUL issue. I've forked and published a patched version of this package at https://github.com/vacationlabs/qc-instances for anyone who has a similar problem in the future. The fix is just a one-liner, but I'm not sure how to make this an easy opt-in without having to resort to newtypes all over the place.

@saurabhnanda
Copy link
Author

One possible solution could be to break this package into submodules -- one for each type -- so that users can selectively import the exact instances they need.

@phadej
Copy link
Collaborator

phadej commented Jun 21, 2017

Won't do that. That would impose huge maintainment cost. Just do a newtype in your code, if you need it more than. Having a Text instance which won't generate \NUL will hide some bugs from you.

The fact that postgres requires Text without NUL should be documented in postgresql-simple, as it should be documented everywhere else when the type-signature isn't precise. Think if we always generated non-empty lists for [] because foldr1 fails with empty one!

@saurabhnanda
Copy link
Author

saurabhnanda commented Jun 21, 2017

Just do a newtype in your code, if you need it more than.

The practical side is that this has a huge cost in development. All Text fields in all records will have to be changed to newtype PGCompatibleText = PGCompatibleText Text. Passing a PGCompatibleText to any other library would require a (un)wrapping, or lift/fmap operations.

@tomjaguarpaw
Copy link

I think you're missing @phadej's point. I think he is suggesting that you use the newtype for generating random Texts without NUL but keep Text in all your client code. This is what I would suggest too.

@tomjaguarpaw
Copy link

Here's a random relevant StackOverflow question I found, in case anyone finds it useful: https://stackoverflow.com/questions/28813409/are-null-bytes-allowed-in-unicode-strings-in-postgresql-via-python

@saurabhnanda
Copy link
Author

saurabhnanda commented Jun 21, 2017

I think you're missing @phadej's point. I think he is suggesting that you use the newtype for generating random Texts without NUL but keep Text in all your client code. This is what I would suggest too.

You are right - this reduces the dev overhead, but there is overhead nevertheless. Every arbitrary instance that was earlier a one-liner:

instance Arbitrary MyRecord where
  arbitrary = genericArbitrary uniform

will now need to be

unwrapText :: PGCompatibleText -> Text

instance Arbitrary MyRecord where
  arbitrary = MyRecord 
    <$> arbitrary
    <*> (unwrapText <$> arbitrary)
    <*> arbitrary
    <*> (unwrapText <$> arbitrary)
    -- and so on

Unless one writes a custom generic function to do this automagically, I guess.

@tomjaguarpaw
Copy link

Yes, I can see that can become a pain. The alternative is to patch quickcheck-text and just put up with the fact that you have to use a non-standard package and not-fully-general Arbitrary instance for all your non-Postgres Texts.

The bug lies in Postgres, of course.

@ocharles
Copy link

You could have

arbitraryNotNullText :: Gen Text
arbitraryNotNullText = ...

then

instance Arbitrary MyRecord where
  arbitrary = MyRecord 
    <$> arbitrary
    <*> arbitraryNotNullText
    <*> arbitrary
    <*> arbitraryNotNullText
    -- and so on

You don't have to use arbitrary.

@saurabhnanda
Copy link
Author

This increases friction / boilerplate unnecessarily. Unless NonNullText is a core type in your app, you can't do this without introducing friction. For example, now I can't use a generics or TH library to manage all the Arbitrary instance boilerplate.

I concede that it is possible, but if everyone is forced to do this, then either it is not useful, or we need to fix the bug that this uncovers, or we need to change the String/Text types being used by foundational libraries.

@Lysxia
Copy link

Lysxia commented Dec 31, 2017

I think this is a limitation of Generics/TH libraries that can be overcome with an option to insert your own implementation of arbitrary/whatever for specific types/fields.

@Lysxia
Copy link

Lysxia commented Jan 1, 2018

@saurabhnanda How's this, to patch generators for specific types (or fields), with generic-random-1.1.0.0:

-- write once
genPG :: GenList '[Text]
genPG = (genPGCompatibleText :: Gen Text) :@ Nil
 
instance Arbitrary MyRecord where
  arbitrary = genericArbitraryUG genPG

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants