created optimized versions, added unit tests, new benchmarks #2

Open
wants to merge 21 commits into base: master

@hubtheflow
Contributor

Created optimized versions of the ArrayQuery class:

  • one with fewer function calls (made possible with goto statements)
  • a second with micro-optimizations (replaced the is_array function call with a language construct that is faster for small arrays)
  • created unit tests to make sure all classes work as intended
  • created new benchmarks to compare the optimized versions
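
The second bullet does not name the language construct that replaced is_array. One candidate is the array-cast comparison; the sketch below assumes that construct and wraps it in a hypothetical helper, isArrayByCast, only so it can be checked against is_array. In a hot loop the expression would be written inline, and on recent PHP versions is_array is typically compiled to a dedicated opcode, so any gain should be measured rather than assumed.

```php
<?php
// Hypothetical helper, not taken from the pull request.
// Assumption: the "language construct" is the array-cast comparison.
function isArrayByCast($value): bool
{
    // Casting an array to array yields the same array, so the strict
    // comparison holds; any other type is changed by the cast and fails it.
    return (array) $value === $value;
}

$samples = [[1, 2], [], 'text', 42, null, new stdClass()];
foreach ($samples as $sample) {
    // Both checks are expected to agree for every sample.
    var_dump(isArrayByCast($sample) === is_array($sample));
}
```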

@olekukonko
Owner

Here are some observations:

  • This is tight coupling and micro-optimization, and the way you used goto makes the code more unreadable.
  • Removing $addParent && $paths[$px . $key] = $items; would mean that only the tail value can be searched, and it renders $all and $has useless.
  • Are you planning to get only one record at a time? goto evaluate would break the loop, which means every additional record would need to start the loop foreach ( $data as $key => $value ) { again.

I'm not sure this is best practice. Also, do you have Xdebug installed? It might be affecting your numbers, because I can see much gain.

Thanks for the unit tests. They were much needed; I'm always too lazy to implement them.

removed line that was added for testing purposes
@hubtheflow
Contributor Author

  • Regarding goto and the other micro-optimizations: these classes are created for the production environment.
    Because goto and other micro-optimizations are hard to debug, the ArrayQuery class should always be the bleeding-edge development class. If changes are made to the original ArrayQuery class, they will be ported to the ArrayQuery_goto and ArrayQuery_goto_micro classes, so the latter can be used in cases where speed matters a lot.
  • About $addParent && $paths[$px . $key] = $items;: that was a mix-up. I added this line to check the speed without encoding to JSON. I'm removing it in a new commit.
  • The unit tests cover the case where two records are returned, and it works the same in both classes. The loop continues because I unset the value that has already been evaluated.
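
The control flow discussed in this exchange (leave the loop with goto evaluate, unset the evaluated value, then scan again) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the pull request: collectMatches, the $min threshold, and the scan label are hypothetical; only the evaluate label name comes from the discussion.

```php
<?php
// Hypothetical function, not taken from ArrayQuery_goto.
function collectMatches(array $data, int $min): array
{
    $found = [];

    scan:
    foreach ($data as $key => $value) {
        if ($value >= $min) {
            goto evaluate; // jumps out of the loop; PHP allows this
        }
    }
    return $found; // a full pass found nothing new

    evaluate:
    $found[$key] = $value;
    unset($data[$key]); // the evaluated value is not visited again
    goto scan;          // restart the scan over the remaining values
}

print_r(collectMatches(['a' => 1, 'b' => 5, 'c' => 7], 5));
```

Note that in this sketch every match restarts the foreach from the first remaining element, so non-matching elements are visited once per match. That is the cost raised in the review comment about restarting the loop for each additional record.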

Yes, I have Xdebug. What numbers are you talking about?

The Enhance PHP unit testing framework is compact and easy to get started with, and thanks to its similar API you can easily switch to the much heavier PHPUnit when you really need it.

@olekukonko
Owner

  • Instead of having an unreadable script or something difficult to manage, the solution would be to design a multithreaded version using pthreads. I demonstrated its power here: [pcntl runs the same code several times, assistance required]
  • Disadvantages:
    • There would be no performance gain on small data sets, but a significant gain on large data sets
    • You would need to install pthreads (which is very easy), but it only works on a thread-safe build of PHP
  • Advantages:
    • Very fast, which is exactly what you want
    • The data set can be broken into parts that are worked on independently
    • Clean code that is easy to maintain, without those micro-optimizations

@hubtheflow
Contributor Author

pthreads may be useful in some cases, but it is still a high-level optimization; the modifications I want are low-level optimizations.
I see the ArrayQuery class as a low-level building block for apps that may already be multithreaded. You are right that multithreading gives a huge performance gain, and that's why it should be implemented thoughtfully.

Btw, I prefer ZMQ for most threading tasks, as it's very scalable and easy to implement.

@olekukonko
Owner

ZMQ is a transport layer that you can use to implement your own message queue with multiple workers. This is not multithreading, and going this route is as good as saving the data to MongoDB or Redis.

  • What do you mean by high-level and low-level optimizations? I think you need to see this: SOLID Principles. It fully describes what you are doing.
  • "I prefer ZMQ for most threading tasks as its very scalable and easy to implement." That is totally wrong, unless you are just implementing ZMQ::SOCKET_REP and ZMQ::SOCKET_REQ, which is not what workers are about.

What you describe would be a broker model using ZMQ::SOCKET_ROUTER and ZMQ::SOCKET_DEALER, which would definitely take longer to implement than using threads.

@hubtheflow
Contributor Author

About multithreading with MQ: I don't think pthreads would give much of a performance gain compared to ZMQ in the use case you are describing (the big data set).
I'm not using ZMQ for this class. I was saying that it makes sense to use threads at a higher application level, not just for optimizing array queries. I generally need the ArrayQuery class (and I think other users would also consider this use case) for querying small arrays many times.

What do I mean by low-level optimization? Imagine a case where 3 functions are each called 10,000 times, i.e. 30,000 calls in total. A function call in PHP takes a while, so in a production environment (with much higher call counts) we can use the optimized version of the code to cut the number of function calls at least threefold.
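
A rough way to check this kind of claim is to time the same loop with and without the helper calls. The sketch below is illustrative only: the three helper functions and the arithmetic are made up for the example and are unrelated to ArrayQuery. Timings depend heavily on the PHP version and on extensions such as Xdebug, so no result is assumed here.

```php
<?php
// Hypothetical helpers standing in for "3 functions called 10,000 times".
function addOne(int $n): int { return $n + 1; }
function timesTwo(int $n): int { return $n * 2; }
function isMultipleOfThree(int $n): bool { return $n % 3 === 0; }

// Three function calls per iteration.
function viaCalls(int $iterations): int
{
    $hits = 0;
    for ($i = 0; $i < $iterations; ++$i) {
        if (isMultipleOfThree(timesTwo(addOne($i)))) {
            ++$hits;
        }
    }
    return $hits;
}

// Same logic with the helper bodies written inline.
function inlined(int $iterations): int
{
    $hits = 0;
    for ($i = 0; $i < $iterations; ++$i) {
        if ((($i + 1) * 2) % 3 === 0) {
            ++$hits;
        }
    }
    return $hits;
}

$iterations = 10000;

$start = microtime(true);
$viaCallsResult = viaCalls($iterations);
$callSeconds = microtime(true) - $start;

$start = microtime(true);
$inlinedResult = inlined($iterations);
$inlineSeconds = microtime(true) - $start;

printf("calls:   %d hits in %.6f s\n", $viaCallsResult, $callSeconds);
printf("inlined: %d hits in %.6f s\n", $inlinedResult, $inlineSeconds);
```

A single run like this is noisy; repeating the measurement several times and comparing medians gives a more trustworthy picture.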

Lol, why are you mentioning STUPID for a pull request in which no Singletons are introduced (I don't find them usable either), all code is covered by tests, the optimization is made on top of a working class (I consider that mature optimization in this case), and the naming is fixed?

Yes, the class should remain easily readable and modifiable, but not at the cost of performance. All additional features can be implemented in other application classes, since this started (and you greatly helped me with it, you even created the whole thing) as a simple array-querying class with MongoDB-like syntax.

A good next step would be to implement all the basic MongoDB operators, such as $set and $inc.

I also wanted to ask (I haven't tested it myself yet): is it possible to use $and and $or with more than two operands?
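
For reference, MongoDB's $and and $or each take an array of expressions of any length. The sketch below shows one way such n-ary operators could be evaluated against a flat array. It is a hypothetical evaluator written for illustration (the matches function and its equality-only conditions are assumptions) and says nothing about what ArrayQuery currently supports.

```php
<?php
// Hypothetical evaluator; not ArrayQuery's actual implementation.
// Supports only strict equality plus n-ary '$and' / '$or'.
function matches(array $doc, array $query): bool
{
    foreach ($query as $field => $expected) {
        if ($field === '$and') {
            // Every sub-query must match, however many there are.
            foreach ($expected as $sub) {
                if (!matches($doc, $sub)) {
                    return false;
                }
            }
        } elseif ($field === '$or') {
            // At least one sub-query must match.
            $any = false;
            foreach ($expected as $sub) {
                if (matches($doc, $sub)) {
                    $any = true;
                    break;
                }
            }
            if (!$any) {
                return false;
            }
        } elseif (!array_key_exists($field, $doc) || $doc[$field] !== $expected) {
            return false;
        }
    }
    return true;
}

$doc = ['a' => 1, 'b' => 2, 'c' => 3];
var_dump(matches($doc, ['$and' => [['a' => 1], ['b' => 2], ['c' => 3]]]));
var_dump(matches($doc, ['$or' => [['a' => 9], ['b' => 9], ['c' => 3]]]));
```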

minor improvements
* changed var++ to ++var, as it's faster due to the implementation in PHP core
minor improvements
* changed var++ to ++var, as it's faster due to the implementation in PHP core
* removed commented out code
@hubtheflow
Contributor Author

"I'm not sure this is best practice. Also, do you have Xdebug installed? It might be affecting your numbers, because I can see much gain."
Lol, I get what you mean. I've tested the performance with Xdebug both enabled and disabled, and it doesn't make a big difference in this case (ArrayQuery_goto_micro is the fastest, followed by ArrayQuery_goto and then the reference ArrayQuery class). I know that Xdebug has a terrible impact on speed when its profiling tools are used, which is why I switched to Forp.
Btw, read about the performance of PHP functions: 1 2.
