Skip to content

zherczeg/pcresp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Pcresp is a stream processing tool which can be used to update
stream data using PCRE2 regular expressions and custom scripts.

Some examples:

  echo ABCD | pcresp '[BC]+'
     By default pcresp prints the matched strings.
     The unmatched characters are discarded by default.
     Result: BC

  echo ABCD | pcresp -s '' -p '[BC]+'
     An empty program prints nothing.
     The unmatched characters are printed because of -p.
     Result: AD

  echo ABCD | pcresp -p '[BC]+' -s '/bin/echo -n {#0}'
     The echo program prints its arguments.
     There is no newline because of the -n parameter.
     Result: A{BC}D

  echo A7 BB29 | pcresp '([A-Z]+)(\d+)' -s "*print -#1- :#2:"
     The script arguments can be printed by the *print flag.
     This is considerably faster than using the echo program.
     Result:
       -A- :7:
       -BB- :29:

  echo 29 30 31 32 33 | pcresp '(\d+)(*SKIP)(?C^ *null /usr/bin/expr #1 % 3 = 0 ^)'
     This command prints those decimal numbers whose are divisible by 3.
     This condition is checked by 'expr number % 3 = 0'.
     The return value of expr influence the matching: zero - continue, non-zero fail.
     Result: 30 33

  SRC='
    INC=`expr $2 + 1`
    echo $1 : $INC
  '
  echo A7 BB29 | pcresp -d prog "$SRC" -s "/bin/bash -c #[prog] arg0 #1 #2" '([A-Z]+)(\d+)'
    This example shows an easy way to embed strings (usually source codes).
    The source referenced #[0] is not processed by pcresp.
    Result:
      A : 8
      BB : 30

  SRC='
    INC=`expr $2 + 1`
    echo $1 : $INC
  '
  echo A7 BB29 | pcresp --def-string prog "$SRC" --shell "/bin/bash -c ? arg0" \
      -s "#[prog] #1 #2" '([A-Z]+)(\d+)'
    Same as above

  export PCRESP_SHELL="/bin/bash -c ? arg0"
  SRC='
    INC=`expr $2 + 1`
    echo $1 : $INC
  '
  echo A7 BB29 | pcresp -d prog "$SRC" -s "#[prog] #1 #2" '([A-Z]+)(\d+)'
    Same as above

Usage pcresp [options] pcre_pattern [file list]

  -h, --help
          Prints help and exit
  -s, --script script
          Executing this script after each successul match
  -p, --print
          Prints characters between matched strings
  -d, --def-string name string
          Define a constant string (see #[name])
  --shell default-shell
          Specify the default shell for each script
  [--pattern] pcre2_pattern
          Specify the pattern. The --pattern can be omitted
          if the pattern does not start with dash
  --limit n
          Stop after n successful match (0 - unlimited)
  -i
          Enable caseless matching
  -m
          Both ^ and $ match newlines
  -x
          Ignore white space and # comments (extended regex)
  -u, --utf
          Enable UTF-8
  --dot-all
          Dot matches to any character
  --newline-[type]
          [type] can be: lf (linefeed), cr (carriage return)
          crlf (CR followed by LF), anycrlf (any combination of
          CR and LF), unicode (any Unicode newline sequence)
  --bsr-[type]
          [type] can be: anycrlf (any combination of CR and LF),
          unicode (any Unicode newline sequence)
  --verbose
          Display executed commands (useful for debugging)
  --end
          End of parameter list. If the pattern has not
          specified yet it must be the next argument

  Reads from stdin if file list is empty

Return value:
  0 - pattern matched at least once
  1 - pattern does not match
  2 - error occured

Pattern format:

  Patterns must follow PCRE2 regular expression syntax
  Scripts can be executed during matching using (?C) callouts

Script format:

  A list of arguments separated by white space(s)
  The first argument must be an executable file

Recognized # (hash mark) sequences:

  Sequences marked by [*] are not accepted by --shell

  #idx         - string value of capture block idx (0-65535) [*]
  #{idx}       - string value of capture block idx (0-65535) [*]
  #[name]      - insert constant string by name [*]
  #{idx,name}  - same as #{idx} if capture block is not empty
                 same as #[name] otherwise [*]
  #M           - current MARK value [*]
  ##           - # (hash mark)
  #<           - less-than sign character
  #>           - greater-than sign character
  #n           - newline (\n) character

Script arguments can be preceeded by control flags:

  *print    - print arguments
  *!nl      - no newline after arguments are printed
  *null     - discard output
  *!sh      - default-shell (--shell) is not used

Arguments enclosed in <> brackets:

  Arguments can be enclosed in <> brackets. These enclosed
  arguments are never recognised as special arguments such
  as control flags but # sequences are still recognized.

  Examples: <argument with spaces> <!null> <?>
            a '/bin/echo <##>' script prints a # sign

Setting the default shell:

  The default shell can be set by the --shell option or by the
  PCRESP_SHELL environment variable.

  Example: --shell '/bin/bash -c ? my_script'

  The arguments defined by the default shell are added before
  the arguments provided by the currently executed script.
  If a question mark argument is present in the default script
  it will be replaced by the first argument of the script and
  this first argument is not appended after the default shell.

About

Stream processing tool

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published