Using the latest pyre2 and RE2, Python's re is far quicker than pyre2 when processing a big regex for web server logs.
Weblog structure: '$domain $remote_addr $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" "$request_time"'
One can reproduce the bug by adding this to tests/performance.py:
def getweblogdata():
    # Returns an open file object; mine is about 9 MB.
    return open('/var/log/apache2/access.log')

@register_test("weblog scan",
               r'^(([\w\d]|[\w][\w\d\-]*[\w\d])\.)*([\w]|[\w][\w\d\-]*[\w\d])[\s]([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})[\s]*(.+)[\s]\[(.*)\][\s]"(GET|HEAD|POST|PUT|DELETE|OPTIONS)[\s](.*)[\s](HTTP.*)"[\s]([\d]*)[\s]([\d]*)[\s]"(.*)"[\s]"(.+)"[\s]"(.+)"',
               1,
               data=getweblogdata())
def weblog_matches(pattern, data):
    """
    Match weblog data line by line.
    """
    total = 0
    for line in data:
        p = pattern.match(line)
        if p is None:
            # Skip lines that do not match the log format.
            continue
        total += len(p.groups())
    # Rewind so the same file object can be read again on the next run.
    data.seek(0)
    return total
| test | Description | # total runs | re time(s) | re2 time(s) | % regex time |
|---|---|---|---|---|---|
| weblog scan | search and extract weblog data. | 1 | 8.678 | 94.259 | 1086.13% |
I can't tell whether this is related to RE2, pyre2, or my code.
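A standalone timing sketch, outside tests/performance.py, might help narrow this down. It is only a rough sketch under a few assumptions: that pyre2 is importable as `re2` (it is skipped if not), and that the hypothetical `SAMPLE_LINE` below, which I made up to fit the weblog structure, stands in for real lines from access.log. `time_matches` is a helper name introduced here, not part of pyre2.

```python
import re
import time

try:
    import re2  # assumption: pyre2 is installed and importable as "re2"
except ImportError:
    re2 = None

# Same pattern as in the test above.
PATTERN = r'^(([\w\d]|[\w][\w\d\-]*[\w\d])\.)*([\w]|[\w][\w\d\-]*[\w\d])[\s]([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})[\s]*(.+)[\s]\[(.*)\][\s]"(GET|HEAD|POST|PUT|DELETE|OPTIONS)[\s](.*)[\s](HTTP.*)"[\s]([\d]*)[\s]([\d]*)[\s]"(.*)"[\s]"(.+)"[\s]"(.+)"'

# Hypothetical log line, invented to fit the weblog structure described above.
SAMPLE_LINE = (
    'www.example.com 192.0.2.1 - [10/Oct/2012:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/" "Mozilla/5.0" "0.005"\n'
)


def time_matches(module, lines):
    """Compile PATTERN with the given module and time a per-line match loop."""
    compiled = module.compile(PATTERN)
    total = 0
    start = time.perf_counter()
    for line in lines:
        m = compiled.match(line)
        if m is None:
            continue
        total += len(m.groups())
    return total, time.perf_counter() - start


# Replace this with the lines of a real access.log to compare on real data.
lines = [SAMPLE_LINE] * 200

re_total, re_secs = time_matches(re, lines)
print("re : groups=%d time=%.4fs" % (re_total, re_secs))

if re2 is not None:
    re2_total, re2_secs = time_matches(re2, lines)
    print("re2: groups=%d time=%.4fs" % (re2_total, re2_secs))
else:
    print("re2 not importable; skipped")
```

If both modules report the same group total but re2 is still much slower, that would point away from my matching loop and toward pyre2 or RE2 itself.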