Queries are slow (v1) #101

EgorBu · 2018-05-23T06:05:29Z

Hi,
I measured the time for the same query that I used in #100, and I think it's very suspicious that it's so slow.
Measurements:

egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 100 sourced/ml/__main__.py 
Memory increased by: 2099%

real	0m32.773s
user	0m25.209s
sys	0m0.425s
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 200 sourced/ml/__main__.py 
Memory increased by: 4130%

real	1m22.348s
user	1m5.700s
sys	0m0.776s
egor@egor-sourced:~/workspace/ml$ time python3 test_filter_libuast.py 400 sourced/ml/__main__.py 
Memory increased by: 8275%

real	2m49.173s
user	2m19.193s
sys	0m1.528s

juanjux · 2018-05-30T09:59:18Z

These times are not from a single query: every iteration runs several levels of nested for-each loops, each doing queries, plus a complete parse.

Timing the queries and parsing individually in the same script gives 0.07 secs for the initial parsing and 0.05-0.06 for each individual xpath query.
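A minimal sketch of how the parse step and each query can be timed separately, so that loop overhead is not attributed to the query itself. `parse_stub` and `query_stub` are hypothetical stand-ins, not the bblfsh client API; swap in the real parse and filter calls from the benchmark script.

```python
import time
from contextlib import contextmanager

# label -> list of elapsed seconds, one entry per timed block
timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)

def parse_stub(path):
    # Hypothetical stand-in for the real parse call: returns a flat list of nodes.
    return [
        {"token": "n%d" % i, "roles": ["identifier"] if i % 3 == 0 else []}
        for i in range(1000)
    ]

def query_stub(tree):
    # Hypothetical stand-in for a single xpath query.
    return [node for node in tree if "identifier" in node["roles"]]

# Time the parse once, outside the query loop.
with timed("parse"):
    tree = parse_stub("sourced/ml/__main__.py")

# Time every query individually instead of timing the whole loop.
for _ in range(5):
    with timed("query"):
        result = query_stub(tree)

for label, samples in timings.items():
    mean = sum(samples) / len(samples)
    print(f"{label}: n={len(samples)} min={min(samples):.6f}s mean={mean:.6f}s")
```

Reporting the minimum next to the mean helps separate the cost of the call from noise on the machine.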

The memleak is real though (I'll add more info soon on #100).

juanjux · 2018-05-30T10:00:36Z

I don't know if 0.05 secs per query, in this case over a non-trivial UAST, can be considered slow or not, so I'll leave this open until we decide. @smola, what do you think?

EgorBu · 2018-05-30T10:13:08Z

If an xpath query is ~20 times slower than the filtering in pure Python, that's definitely strange. Measurements can be found in the closed issue #92, and the same can be measured for a new query. Maybe we need to write some kind of guide on how to query efficiently, like this one.

juanjux · 2018-05-30T10:32:17Z

I don't think it's a problem of query efficiency so much as libuast calling some function pointers (callbacks) that are implemented on the CPython side and which, even though they are written in C, use Python data structures and the Python runtime, which are slow. But I could be wrong, and this merits further investigation.

smola · 2018-06-07T08:24:03Z

XPath might be slower than well-written ad hoc code; this is usually true for SQL too.
If we want to get an idea of how well it is performing, we should probably generate an actual XML document from a UAST and compare libuast XPath against pure libxml XPath and Python lxml XPath.

@EgorBu I guess the guide you linked is actually applicable directly.
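A rough sketch of the comparison suggested above: convert a tree to XML, then time an XPath-style query against an ad hoc Python filter over the same data. Everything here is an assumption for illustration. The nested-dict tree is a toy and not the real UAST shape, and the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) stands in for lxml or libxml so that the sketch has no dependencies.

```python
import timeit
import xml.etree.ElementTree as ET

def make_tree(depth, fanout):
    """Build a toy nested-dict tree; leaves are 'Identifier', inner nodes 'Block'."""
    node = {"type": "Identifier" if depth == 0 else "Block", "children": []}
    if depth > 0:
        node["children"] = [make_tree(depth - 1, fanout) for _ in range(fanout)]
    return node

def to_xml(node):
    """Convert the toy tree into an ElementTree element, one element per node."""
    elem = ET.Element(node["type"])
    for child in node["children"]:
        elem.append(to_xml(child))
    return elem

def filter_python(node, wanted):
    """Ad hoc recursive filter: collect every node whose type equals `wanted`."""
    found = [node] if node["type"] == wanted else []
    for child in node["children"]:
        found.extend(filter_python(child, wanted))
    return found

tree = make_tree(depth=6, fanout=3)
xml_root = to_xml(tree)

# Both approaches should select the same number of nodes before timing them.
n_python = len(filter_python(tree, "Identifier"))
n_xpath = len(xml_root.findall(".//Identifier"))

t_python = timeit.timeit(lambda: filter_python(tree, "Identifier"), number=50)
t_xpath = timeit.timeit(lambda: xml_root.findall(".//Identifier"), number=50)

print(f"matches: python={n_python} xpath={n_xpath}")
print(f"50 runs: python={t_python:.4f}s xpath={t_xpath:.4f}s")
```

The XML conversion is kept outside the timed region here; timing `to_xml` separately would show how much of the cost comes from building the XML representation rather than from the query.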

juanjux · 2018-06-07T08:36:21Z

@smola, definitely. libxml does a lot of allocations when starting up, and so does libuast when creating the pseudo-XML. Ad hoc code in Python works on already allocated and initialized nodes, so it's not so strange that it is faster (especially when using filter() et al., which are implemented directly in C).

Also, as explained above, libuast calls some callback-style functions that in the case of Python are implemented in CPython but use py_list, py_dict, etc., which run at the speed of the Python interpreter (this is applicable to all clients).

@EgorBu could you try this benchmark again after the memleak fix? I'm curious about the results, but I don't have your system to get comparable times. It would also be nice if you had timings for individual xpath queries.

bzz · 2019-01-24T14:24:25Z

@EgorBu thank you for putting together the benchmark!

We will re-run this test as soon as #128 lands, and hopefully the new libxml-less version will improve the performance!

dennwc · 2019-08-06T14:07:51Z

The new client has been released; we need to check the performance again.

EgorBu · 2019-08-07T10:08:08Z

Would be glad to hear news about the changes :)

EgorBu changed the title ~~[slow] queries are very slow~~ [performance] queries are slow May 23, 2018

juanjux closed this as completed May 30, 2018

juanjux reopened this May 30, 2018

EgorBu mentioned this issue Jan 3, 2019

Umbrella issue for ML team bblfsh/bblfshd#231

Closed

36 tasks

dennwc changed the title ~~[performance] queries are slow~~ Queries are slow (v1) Aug 6, 2019
