multiline matching of stdout/stderr doesn't work as expected (cannot achieve it) #29

Open
ppenguin opened this issue Feb 14, 2021 · 15 comments

@ppenguin

ppenguin commented Feb 14, 2021

When testing this, it matches:

# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1.*/
>= 0

but when testing this, it doesn't (returns failure):

# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1.*Line 2.*/
>= 0

The regex-tdfa matcher is supposed to default to multiline, so I'd expect that one to work. But even if I explicitly try to include newlines in the catch-all, it doesn't work either (it also returns failure):

# test multiline matches
$ echo -e "Line 1 blabla\nLine 2 haha\nLine 3 hihihi"
> /.*Line 1(.|\n)*Line 2.*/
>= 0

Is there a syntax I can use to achieve multiline matching as-is, or does it require a change to the code?

@simonmichael
Owner

Sorry I'm not sure - needs debugging. Perhaps we are not calling it in multiline mode.

@obfusk
Contributor

obfusk commented Feb 19, 2021

It is called in multiline mode. But regex-tdfa has a non-standard multiline mode that combines what is usually known as "multiline" with inverse "dotall" and also disables matching newlines in inverted character classes (so you can't even use e.g. [^!]).

You can match a newline using "(.|\n)", but only with an actual newline in the pattern (since \n is just n to regex-tdfa). I don't think that shelltestrunner can currently do that for you.
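
For contrast, here is how the two flags behave in an engine that keeps them separate. This is a minimal sketch using Python's `re` module, not regex-tdfa, so it only illustrates the conventional semantics that the paragraph above compares against; the variable names are made up for the example.

```python
import re

text = "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"

# No flags: "." stops at a newline, so the pattern cannot span two lines.
no_flags = re.search(r"Line 1.*Line 2", text)

# DOTALL ("s"): "." also matches "\n", so the pattern can span lines.
dotall = re.search(r"Line 1.*Line 2", text, re.DOTALL)

# MULTILINE ("m"): "^" and "$" match at every line boundary ...
multiline_anchor = re.search(r"^Line 2 haha$", text, re.MULTILINE)
# ... but "." still does not match "\n".
multiline_span = re.search(r"Line 1.*Line 2", text, re.MULTILINE)

# In Python, a negated character class matches "\n" regardless of flags.
negated_class = re.search(r"Line 1[^!]*Line 2", text)

print(no_flags is not None)          # False
print(dotall is not None)            # True
print(multiline_anchor is not None)  # True
print(multiline_span is not None)    # False
print(negated_class is not None)     # True
```

According to the description above, regex-tdfa's multiline mode behaves like the `MULTILINE` cases here, except that the negated class would not match the newline either.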

@obfusk
Contributor

obfusk commented Feb 19, 2021

regex-tdfa does recognise [[:space:]] though, so this works:

# test multiline matches
$ printf "Line 1 blabla\nLine 2 haha\nLine 3 hihihi\n"
> /.*Line 1(.|[[:space:]])*Line 2.*/
>= 0

@obfusk
Contributor

obfusk commented Feb 19, 2021

Also: fyi echo -e doesn't usually work with /bin/sh.

# with bash
$ echo "foo\nbar" 
foo\nbar
$ echo -e "foo\nbar" 
foo
bar
$ printf "foo\nbar\n"
foo
bar
# with /bin/sh (on my system)
$ echo "foo\nbar" 
foo
bar
$ echo -e "foo\nbar"
-e foo
bar
$ printf "foo\nbar\n"
foo
bar

The behaviour of echo regarding escapes and options differs greatly between systems.
I recommend using printf instead (though you need to manually add a \n at the end).

@ppenguin
Author

@obfusk Thanks for the info, very useful.
On my sh, echo -e does work as expected, but since in some cases I need compatibility with e.g. busybox, it's still a valuable comment which I will use.

As for the [[:space:]] workaround, very useful! I guess this greatly lessens at least the urgency of this issue.

What I didn't 100% understand is whether we should abandon the expectation to handle newlines in a standard way completely due to inherent limitations of regex-tdfa, or whether it would be possible to configure it in such a way that the behaviour is possible?

@obfusk
Contributor

obfusk commented Feb 21, 2021

> What I didn't 100% understand is whether we should abandon the expectation to handle newlines in a standard way completely due to inherent limitations of regex-tdfa, or whether it would be possible to configure it in such a way that the behaviour is possible?

You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).

It's unfortunate that regex-tdfa has chosen such non-standard behaviour: merging "multiline" and "dotall" into one option + not matching newlines in complementing character classes (which AFAIK no other regex implementation does). Thus (optionally, if you want backwards compatibility) using a different regex implementation might be preferable.

Another option would be to (have an option to) "preprocess" the regex and replace . with (.|[[:space:]]) (though this is non-trivial); e.g. using a syntax like /.../s (similar to e.g. Perl and JavaScript).
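
The preprocessing idea can be sketched roughly as follows. This is only an illustration in Python, assuming POSIX-style pattern syntax (a backslash escapes the next character outside brackets, and bracket expressions may contain nested classes such as `[:space:]`); the names `expand_dots` and `DOT_OR_SPACE` are hypothetical and not part of shelltestrunner or regex-tdfa. It shows why the rewrite is non-trivial: escaped dots and dots inside bracket expressions have to be left alone.

```python
DOT_OR_SPACE = "(.|[[:space:]])"  # replacement suggested in the thread


def expand_dots(pattern: str) -> str:
    """Replace each unescaped "." outside a bracket expression."""
    out = []
    i = 0
    n = len(pattern)
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            # Escaped character (e.g. "\."): copy both characters unchanged.
            out.append(pattern[i:i + 2])
            i += 2
        elif ch == "[":
            # Bracket expression: copy it verbatim up to its closing "]".
            j = i + 1
            if j < n and pattern[j] == "^":
                j += 1
            if j < n and pattern[j] == "]":
                j += 1  # a leading "]" is a literal member of the set
            while j < n and pattern[j] != "]":
                if pattern.startswith(("[:", "[.", "[="), j):
                    # Nested class such as [:space:]: skip to its terminator.
                    close = pattern.find(pattern[j + 1] + "]", j + 2)
                    j = n if close == -1 else close + 2
                else:
                    j += 1
            out.append(pattern[i:j + 1])
            i = j + 1
        elif ch == ".":
            out.append(DOT_OR_SPACE)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(expand_dots(".*Line 1.*Line 2.*"))
# (.|[[:space:]])*Line 1(.|[[:space:]])*Line 2(.|[[:space:]])*
```

Note that Python's own `re` module does not understand `[[:space:]]`, so the sketch only produces the rewritten pattern as a string; it would still have to be handed to a POSIX-style engine such as regex-tdfa.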

@simonmichael
Owner

simonmichael commented Feb 21, 2021 via email

@obfusk
Contributor

obfusk commented Feb 21, 2021

> Worth raising in regex-tdfa's issue tracker maybe?

haskell-hvr/regex-tdfa#11

@ppenguin
Author

ppenguin commented Feb 24, 2021

> You could (add an option to shelltestrunner to) turn multiline mode off; this allows you to match newlines with ., but no longer allows you to match the start/end of a line with ^/$ (they only match at the start/end of the whole string).

@simonmichael
This might actually be a nice option to have as a command line option to shelltest which is probably easy to implement?
Then one could simply choose the behaviour based on the use case.
Multi-line off would be perfectly suitable for cli-testing where e.g. a program feedback is checked (e.g. the contents of a help or error message), since in many cases the keywords/patterns will be more important than the lines they're on.

@obfusk
Contributor

obfusk commented Feb 24, 2021

@ppenguin fwiw I recently quickly hacked together a Python implementation of something similar to shelltest. It's unfinished, not entirely compatible, only implements part of the functionality, hasn't been documented yet, and probably has some bugs. But it does support proper multiline tests (and uses Python's more extensive regex capabilities):

# test multiline matches
$ printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
> /^line 1.*^line 2/ims

Note the /.../ims to enable case insensitive matching (i), multiline (m) & dotall (s).
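
For readers unfamiliar with these flags, the following sketch shows what the three letters correspond to in Python's standard `re` module. It is not taken from the implementation mentioned above; it simply assumes that `/.../ims` maps onto `re.IGNORECASE`, `re.MULTILINE` and `re.DOTALL`.

```python
import re

output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
pattern = r"^line 1.*^line 2"

# i = IGNORECASE, m = MULTILINE, s = DOTALL
ims = re.IGNORECASE | re.MULTILINE | re.DOTALL
match = re.search(pattern, output, ims)
print(repr(match.group(0)))  # 'Line 1 foo\nLine 2'

# Without "s", "." cannot cross the newline, so the same pattern fails.
without_s = re.search(pattern, output, re.IGNORECASE | re.MULTILINE)
print(without_s)  # None
```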

@teto

teto commented Mar 30, 2021

I would very much like this. I am currently porting my application from Python to Haskell and I dearly miss the integrated test generation ("transcript") of https://github.com/python-cmd2/cmd2, where you can save the output of the application in order to test it at a later date. I've just written one test with shelltestrunner, but due to the size of the output it would be too impractical to maintain those transcripts manually. I mention this because it can be an inspiration for line handling too.

NB: I also find the expected output/command/get output quite hard to notice.

@simonmichael
Owner

Would anybody like to propose/work on some improvements?

@iustin
Collaborator

iustin commented Jun 5, 2021

Couldn't this problem (easy multiline matching) be solved by allowing multiple regexes per file descriptor? At least assuming that order of lines is not important.

I.e. I'm thinking of

printf "Line 1 foo\nLine 2 bar\nLine 3 baz\n"
>>> /Line 1/
>>> /Line 2 bar$/
>>>= 0

And one would need to resort to proper multiline matching only if specific order is needed.
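
The semantics proposed here (every listed regex must match somewhere in the output, in any order) could look roughly like this. It is a sketch in Python rather than shelltestrunner's Haskell, and the helper names `unmatched_patterns` and `all_patterns_match` are hypothetical; `re.MULTILINE` is assumed so that `$` matches at the end of each line, as in the example above.

```python
import re


def unmatched_patterns(output, patterns):
    """Return the patterns that match nowhere in the output."""
    return [p for p in patterns if re.search(p, output, re.MULTILINE) is None]


def all_patterns_match(output, patterns):
    """True if every pattern matches somewhere, regardless of order."""
    return not unmatched_patterns(output, patterns)


output = "Line 1 foo\nLine 2 bar\nLine 3 baz\n"

print(all_patterns_match(output, [r"Line 1", r"Line 2 bar$"]))  # True
print(all_patterns_match(output, [r"Line 3", r"Line 1"]))       # True
print(unmatched_patterns(output, [r"Line 1", r"Line 4"]))       # ['Line 4']
```

Returning the unmatched patterns rather than a bare boolean makes it possible to report which expectation failed.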

@simonmichael
Owner

Some years ago, regex-tdfa was the best compromise of power and portability. Is there anything better (more standard, more robust) nowadays?
