Support Multiple, Newline-Separated Patterns #646

Open: wants to merge 1 commit into dev

DabeDotCom

Per POSIX 1003.1:

> The pattern_list's value shall consist of one or more patterns separated by <newline> characters

So, for example, you could do:

bash%  ls -l /etc/ | grep $'passwd\ngroup'

Now you can do the same with ack:

bash% ls -l /etc/ | ack $'passwd\ngroup'
bash% ls -l /etc/ | ack $'p(asswd|rofile)\ns(hells|udoers)'

This is especially handy with command substitution [backticks], as it means you don't have to manually perform any alice|bob|carol shenanigans; the alternation is automatically performed for you:

bash% ack /etc --match "`/bin/ls /home/`" 
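To make the POSIX behavior concrete, here is a minimal sketch in POSIX sh, assuming a POSIX-conforming grep: a pattern argument containing a literal newline is treated as a list of patterns, and a line is selected if any of them matches. For these two plain words that gives the same result as the ERE alternation passwd|group. The sample lines are made up and only stand in for ls -l /etc/ output.

```sh
# Made-up lines standing in for `ls -l /etc/` output.
input='root
passwd
group
hosts'

# A POSIX pattern_list: two patterns separated by a literal newline.
pats='passwd
group'

# Lines matching either pattern in the list...
printf '%s\n' "$input" | grep -c "$pats"             # prints 2

# ...are the same lines the single ERE alternation selects.
printf '%s\n' "$input" | grep -c -E 'passwd|group'   # prints 2
```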

@DabeDotCom
Author

(Personally, I think this is a more useful fix than #522)

@petdance
Collaborator

Thanks for this, Dabe. Interesting idea that I think is worth talking about. The idea of multiple regexes is not a new one. If we do adopt it, it will go into ack3, not ack2. ack2 is not getting any new development. ack3 is getting ready for its first alpha in the next week or two. If we do add this, it would be a great thing to have on the "new in ack 3!" features list.

Some thoughts, in no particular order:

  • This is a big change in behavior that I don't want to just drop in without discussion about what other effects this might have.

  • Have you done any measurements on what this may do to execution speed?

  • We'll need to also test that --regex=foo\nbar works.

  • I'm not sure that this is the best (or only) way to get multiple regexes in.

  • Let's think of all the ways a simple join( '|', @pat ) could go wrong.

@n1vux

n1vux commented Aug 10, 2017

Ack already has the ability to do multiple patterns with ls -l /etc/ | ack 'passwd|group'.

We have tried to maintain some degree of plug-compatibility with (f)grep, but since the RE language is different, it can't be exact. An ability to read a bunch of -wQ words from a file would not be bad.

> ways a simple join( '|', @pat ) could go wrong

@DabeDotCom
Author

I didn't even realize there was an ack3. Mea culpa.

I'm all for a more elegant solution than join( '|', @pat )... (I'm sure you could end up in backtracking hell.)

But I haven't looked to see what, e.g., GNU grep does to maximize efficiency.

In terms of measuring speed, though, it's kind of tough, since currently there's NO way to include multiple regexes. (So does that make this "infinitely" faster? «wink»)

@petdance
Collaborator

> I didn't even realize there was an ack3. Mea culpa

I haven't really announced it, and there's not really a central place to announce it. :-/

@petdance
Collaborator

@DabeDotCom What prompted you to want this bit of functionality?

@DabeDotCom
Author

> Ack already has the ability to do multiple patterns with ls -l /etc/ | ack 'passwd|group'.

Yup, I acknowledged that.

But this is just gross, IMHO:

ack /etc --match "`/bin/ls /home/ | tr '\n' '|' | sed -e 's/|$//'`"

AND it opens you up to injection bugs: cd /home ; touch "|", and now you've included an empty regex.

In the PR, one of the tests in t/split-newline.t verifies that -Q Does What I Mean: it quotes each regex individually, not the "meta" regex characters that glue the patterns together:

ack -Q /etc --match "`/bin/ls /home/`"
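A rough shell analogue of the -Q point, as a sketch only: it is not the PR's code (ack is Perl, and the patch itself is not shown here). The hypothetical helper quote_and_join escapes each line separately and only then joins the lines with |, so a directory named | becomes the literal \| instead of an empty alternative. The escaped character set targets POSIX ERE as used by grep -E, not Perl regex syntax.

```sh
# quote_and_join: hypothetical helper, not part of ack or grep.
# sed backslash-escapes ERE metacharacters on each line separately,
# then paste -s joins the escaped lines with '|' (no trailing separator).
quote_and_join() {
    sed 's/[][\.|$(){}?+*^]/\\&/g' | paste -sd '|' -
}

# A made-up "directory listing" that includes a name consisting of just "|".
names='alice
|'
printf '%s\n' "$names" | quote_and_join          # prints alice|\|

# The result matches "alice" and a literal "|" line, but not "bob".
pat=$(printf '%s\n' "$names" | quote_and_join)
printf 'alice\nbob\n|\n' | grep -c -E "$pat"     # prints 2
```

Because the escaping happens before the join, the | characters added by paste stay live alternation operators, while any | inside a name is neutralized.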

@DabeDotCom
Author

DabeDotCom commented Aug 10, 2017

@petdance I wanted to be able to do:

ls -ltr | ack --passthru "`grep -l SOME_TAG *`"

to get a list of all files, sorted by date, highlighting the ones that matched.

(Admittedly, there are still some pathological cases, e.g. a filename that contains a newline. But POSIX grep would fail on that, too...)

@n1vux

n1vux commented Aug 10, 2017

We have tried to maintain some degree of plug-compatibility with (f)grep, but since the RE language is different, it can't be exact. In this case, we do it as egrep does, so we needn't do it as grep does.

While not supporting this GNUism may violate least-surprise for folks who discovered this quirk in GNU grep before finding egrep, I don't see that as a driving reason to change.

Encouraging embedding newlines in command-line arguments doesn't make me happy either.

If anything, if we wish to provide a $pat = join( '|', @pat ) service, rather than splitting on \n, I could support multiple occurrences of --regex=foo --regex=bar, as that could simplify a script using ack (and give it a choice of doing its own infix join or generating 1 to N "--regex='$pat'" options).

(However, an ability to read a bunch of -wQ or -Q words from a file, as fgrep -f words does, would be very good, and that would also be a join. While we could accept that without an implied -Q, hewing to the fgrep -f convention and saying they're words, and thus -Qw, seems simplest and least-surprise-y to me.)

> ways a simple join( '|', @pat ) could go wrong

For one, any | in the @pat needs escaping; is that on the user or on us?
(One reason to prefer -Q or -Qw for a hypothetical --file-of-patterns, aka --fgrep-f.)

> But this is just gross, IMHO:
>
> ack /etc --match "`/bin/ls /home/ | tr '\n' '|' | sed -e 's/|$//'`"

> AND it opens you up to injection bugs: cd /home ; touch "|", and now you've included an empty regex.

That isn't ack doing that; that's the calling script trusting its input.
Since we allow '|' in patterns, unless your calling script escapes all metachars in the /bin/ls output, touch '|' will give you two (not one) empty alternatives, even if \n is taken as an alias for OR.

> In the PR, one of the tests in t/split-newline.t verifies that -Q Does What I Mean; it quotes each regex individually, not the "meta" regex characters that glue each pattern together:

Nice touch, I like that.

@DabeDotCom
Author

> For one, any | in the @pat needs escaping; is that on the user or on us?

@n1vux Actually, it wouldn't. From t/split-newline.t:

MULTIPLE_REGEXES: {
    my @expected = split( /\n/, <<'EOF' );
I was playin' soft while Bobbie sang the blues
From the Kentucky coal mines to the California sun
Bobbie shared the secrets of my soul
Bobbie baby kept me from the cold
One day up near Salinas, Lord, I let her slip away
EOF

    my @files = qw( t/text/me-and-bobbie-mcgee.txt );
    my @results = run_ack( "co(?:ld|al)\nso(?:ft|ul)\nSalinas", @files );

    lists_match( \@results, \@expected, 'Multiple regexes' );
}

There, I WANT the alternation...

> I could support multiple occurrences of --regex=foo --regex=bar

I had drafted a different issue for that, too! «grin»

POSIX grep allows for multiple -e flags (though that doesn't address the backtick use case).
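For comparison, a minimal POSIX sh sketch of how a calling script could turn a newline-separated list into repeated -e options, the multiple-pattern form POSIX grep already accepts (and close to the 1-to-N --regex idea above). The helper name grep_each is hypothetical, and it wraps grep, not ack.

```sh
# grep_each: hypothetical helper, not part of grep or ack.
# Usage: grep_each PATTERN_LIST [grep options and files...]
# Each non-empty line of PATTERN_LIST becomes its own "-e PATTERN" argument.
# POSIX sh has no "local", so list, p and old_ifs are global variables.
grep_each() {
    list=$1
    shift
    old_ifs=$IFS
    IFS='
'                            # split on newlines only, so spaces inside a pattern survive
    set -f                   # no filename globbing of the split words
    for p in $list; do
        set -- -e "$p" "$@"  # prepend one "-e PATTERN" pair per pattern
    done
    set +f
    IFS=$old_ifs
    grep "$@"
}

# Runs: grep -e group -e passwd -c
printf 'root\npasswd\ngroup\nhosts\n' | grep_each 'passwd
group' -c                    # prints 2
```

Empty lines in the list are skipped because newline counts as IFS whitespace during field splitting; passing an empty pattern instead would make grep select every line.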

> (However, an ability to read a bunch of -wQ or -Q words from a file, as fgrep -f words does, would be very good, and that would also be a join. While we could accept that without an implied -Q, hewing to the fgrep -f convention and saying they're words, and thus -Qw, seems simplest and least-surprise-y to me.)

Yet another issue I started to open was to include --regex-from=<file> à la grep -f.

This would let me use process substitution to do:

bash% ls -ltr | ack -Q --passthru --pattern-from <(grep -l SOME_TAG *)

@petdance
Collaborator

> Yet another issue I started to open was to include --regex-from=<file> à la grep -f.

That one I'm much less keen to pursue.

Aside: If you haven't looked at DESIGN.md in the ack3 repo, take a look at that.

@n1vux

n1vux commented Aug 10, 2017

> I wanted to be able to do:
>
> ls -ltr | ack --passthrough "`grep -l SOME_TAG *`"
>
> to get a list of all files, sorted by date, highlighting the ones that matched.

Is there a reason GNU grep doesn't work here? It has highlighting now. Does it lack a --passthrough equivalent?

(On my system, (cd /etc; grep -l $USER *) generates a mess of "grep: X11: Is a directory" and "$fn: cannot open file for reading" errors ... I hope your real application is in a leaf directory.)

OK, we like compound queries with ls; the Ack3 cookbook has recommended compound queries. Ack has -f, -g, -l, and --passthrough modes, but they don't combine to do exactly this.

I know I can list just the matching ones in date order with full ls -lart thusly ...

ack -l $USER /etc 2>/dev/null | xargs ls -lart

(Use the null-separator options, ack --print0 and xargs -0, if afflicted with spaces in paths/names.)
But that doesn't get you the matches highlighted in the context of the non-matching files.

I will take this as a challenge to consider alternative solutions for the Ack3 Cookbook that are less ugly than the tr/sed example, as well as variations that get close with elegance.

(Although if GNU grep or some other tool can do some "it" better than ack, we are willing, in the Cookbook section, to recommend GNU grep for doing "it".)
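On the GNU grep question, one commonly used approximation of --passthru, offered as a sketch rather than an ack feature: add $ as an extra pattern. Since $ matches the empty string at the end of every line, every input line is printed, while only the real patterns show up as highlighted text when --color is enabled (supported by GNU and BSD grep, but not part of POSIX). The wrapper name grep_passthru is hypothetical.

```sh
# grep_passthru: hypothetical wrapper, not part of grep or ack.
# Usage: grep_passthru PATTERN_LIST [grep options...]   (reads stdin)
# The extra pattern '$' matches at the end of every line, so every input
# line is printed, whether or not it contains one of the real patterns.
grep_passthru() {
    pats=$1
    shift
    # Options go before the -e patterns so non-GNU greps parse them too.
    grep "$@" -e "$pats" -e '$'
}

printf 'a\nfoo\nb\n' | grep_passthru foo      # prints all three lines
printf 'a\nfoo\nb\n' | grep_passthru foo -c   # prints 3

# The use case above, with GNU or BSD grep:
#   ls -ltr | grep_passthru "$(grep -l SOME_TAG *)" --color=always
```

The file names returned by grep -l are still treated as regexes here, so a dot in a name matches any character.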

@n1vux

n1vux commented Aug 10, 2017

> This would let me use process substitution to do:

I love process substitution filehandles in recent bash!

@DabeDotCom
Author

> Yet another issue I started to open was to include --regex-from=<file> à la grep -f.
>
> That one I'm much less keen to pursue.
> Aside: If you haven't looked at DESIGN.md in the ack3 repo, take a look at that.

Ironically, ignoring grep -f goes against the one-and-only Guiding principle: When deciding on ack's behavior, try to be grep-compatible if possible.

(And BTW — I'm not trying to stir the pot, here! I can't even begin to express how appreciative I am for you guys' hard work!! I'm just one of those pesky users who's always asking for more features than I'm capable of actually implementing myself... «sigh»)

@DabeDotCom
Author

> Is there a reason GNU grep doesn't work here? It has highlighting now. Does it lack a --passthrough equivalent?

Correct.

(And as a very trivial nit: ack spells it --passthru. I was bitten by that earlier, too! ag happens to include an alias for --passthrough, which is a little more "liberal in what you accept" ... Postel's Law)

@n1vux

n1vux commented Aug 10, 2017

> "liberal in what you accept" ... Postel's Law

Amen. RIP Jon, and also his acolyte, my mentor MAP.

So, to summarize: from a quick scan of the beyondgrep.com Other Tools tab, no tool offers a total solution:

  • grep has newlines as OR, both inline and with -f, but no passthrough (which is odd; GNU has added a kitchen sink and a shower to the formerly Do One Thing Well command)
  • ack has | as OR in a single inline pattern, and passthrough, but no --fgrep-f FILEofPATs equivalent, and the fixup with | on the bash command line is ugly
  • ag has passthrough but not --fgrep-f, and
  • glark has -f but not passthrough.

Not that I'm convinced that this is a use case that must be possible without writing some real code, but it is so tantalizingly close to what we can do with just ack/grep/ls/bash, while being a little outlandish, that as a Cookbook challenge I'm game to explore how close I can get!

Thanks for the example. I expect I can do it somewhat less inelegantly than your sed/tr bad example (that exact sort of ugliness is why I learned AWK and then Perl 4+5 when I first got on Unix back in %DECADE%CENSORED%); it'll be a lovely bit of chrome for the Ack3 cookbook. Stay tuned: I'll try to post my improvement back here, but it should be in the Ack3 docs when released.

@n1vux

n1vux commented Oct 20, 2017

The best I've got so far is ....

Ack doesn't have "--fgrep-f", nor does it otherwise accept newlines as OR,
as newer grep does. But grep has no "--passthru". The requestor would like to
view whole files but highlight any of several words in each, which
needs both. The workaround is ugly:

ack /etc --match "`/bin/ls /home/ | tr '\n' '|' | sed -e 's/|$//'`"

Longer but more readable: use $() instead of backticks, and Perl instead of tr and sed, which lets us insert | only between items, with no extra trailing | to remove:

ack /etc --match "$(/bin/ls /home/ | perl -E '@u=<>; chomp for @u; say join q(|), @u')"

or invert the "ls",

ack /etc --match "$(perl -E '@u=`ls /home/`; chomp for @u; say join q(|), @u')"

or keep it in one process,

ack /etc --match "$(perl -E 'chdir q(/home/); @u=<*>; chomp for @u; say join q(|), @u')"
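One more variant of the same join, sketched with POSIX paste instead of tr/sed or Perl: paste -s -d '|' joins all input lines with | and, like the Perl versions, adds no trailing separator. By contrast, tr '\n' '|' turns the final newline into a trailing |, i.e. an empty alternative that matches every line, which is exactly what the sed -e 's/|$//' step strips. Like the versions above, this does not escape regex metacharacters in the names, and the user names are made up.

```sh
# Made-up names standing in for `ls /home/` output.
users='alice
bob
carol'

# tr maps every newline, including the last one, to '|': a trailing empty alternative.
printf '%s\n' "$users" | tr '\n' '|'; echo   # prints alice|bob|carol|

# paste -s joins all lines serially, putting the -d character only between them.
printf '%s\n' "$users" | paste -sd '|' -     # prints alice|bob|carol

# Plugged into the workaround (quoted so the shell does not word-split it):
#   ack /etc --match "$(/bin/ls /home/ | paste -sd '|' -)"
```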
