Expression Optimization via Distributive Law
This documentation is no longer valid. The API/interfaces presented here have been removed from the source. It is kept for record purposes.
From Britannica:
Distributive law: the law relating the operations of multiplication and addition, stated symbolically as
a*( b + c ) = a*b + a*c ;
that is, the monomial factor a is distributed, or separately applied, to each term of the binomial factor b + c, resulting in the product ab + ac. From this law, it is easy to show that the result of first adding several numbers and then multiplying the sum by some number is the same as first multiplying each separately by the number and then adding the products.
We optimize at the element level rather than at the tensor level. This has many benefits, as it adapts well to other tensor types such as tensor_sparse
and others. It is also this approach that lets us optimize operations that involve scalars.
Consider this example:
tensor<int> s{shape{5,5}, 6};
tensor<int> k{shape{5,5}, 3};
auto expr = 3*s + k*7; // Still optimizes
In the evaluation of this expression, the tensor k is just a tensor with all values 3. The optimizer uses this as a common operand and optimizes the above expression into expr = 3*(s + 7). In this case, every element of expr would be 3*(6+7) = 39.
If k were a tensor in which not all values are 3, or only some values are 3, the optimizer would still optimize the expression partially.
Using execution time as a performance metric was not a good idea: depending on the scheduler and process count, the CPU may not do the binary justice. So instead, I have relied on the number of instructions executed during the run. The more instructions, the poorer the performance.
We have deliberately chosen the expression a*b + a*c for evaluation. Each operand in the expression is a tensor of extent {500, 5000}.
#include <boost/numeric/ublas/tensor.hpp>
#include <numeric>
#include <vector>

int main() {
    using namespace boost::numeric::ublas;
    using tensor_type = tensor<int>;
    std::vector<int> sa(5000*500), sb(2500000);
    std::iota(sa.begin(), sa.end(), 1);
    std::iota(sb.begin(), sb.end(), 1);
    tensor_type a{shape{500, 5000}, sa}, b{shape{500, 5000}, sb}, c{shape{500, 5000}, 1};
    auto expr = a*b + a*c; // Build the expression
    tensor_type x = expr;  // Evaluate the expression
}
Optimization is enabled by default; we compiled the program as follows:
g++ test/tensor/test_basic.cpp -I./include/ -std=c++17 -O3 -o a.out
For the unoptimized binary, we compile with the macro that disables the optimization defined:
g++ test/tensor/test_basic.cpp -I./include/ -std=c++17 -O3 -o b.out \
-DBOOST_UBLAS_NO_OPTIMIZATION
valgrind --tool=callgrind ./a.out # Optimized binary
==14545== Callgrind, a call-graph generating cache profiler
==14545== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==14545== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==14545== Command: ./a.out
==14545==
==14545== For interactive control, run 'callgrind_control -h'.
==14545==
==14545== Events : Ir
==14545== Collected : 59792202
==14545==
==14545== I refs: 59,792,202
valgrind --tool=callgrind ./b.out # Unoptimized binary
==14569== Callgrind, a call-graph generating cache profiler
==14569== Copyright (C) 2002-2017, and GNU GPL'd, by Josef Weidendorfer et al.
==14569== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==14569== Command: ./b.out
==14569==
==14569== For interactive control, run 'callgrind_control -h'.
==14569==
==14569== Events : Ir
==14569== Collected : 102292217
==14569==
==14569== I refs: 102,292,217
While I do not present a percentage improvement in execution time, needing fewer instructions to evaluate a tensor is certainly better than needing many: in the run shown above, the optimized binary executes roughly 40% fewer instructions (59,792,202 versus 102,292,217). If, after checking for optimization, no optimization can take place, the check itself does not degrade performance by a large factor. In most of my runs, it added only about 2-5% more instructions.
We offer you control over the optimization using the following macros.
- BOOST_UBLAS_NO_EXPRESSION_OPTIMIZATION: If defined, this macro causes expressions to have no optimization applied. If you do not want us to optimize your expressions, please define it during the compilation of your sources.
- BOOST_UBLAS_NO_RECURSIVE_OPTIMIZATION: If defined, this macro prevents the optimizer from recursing into the left and right operands when they are separated by + or -. Informally:
tensor<int> a, b, c; // Assume each holds some values and that all have the same extent.

#define BOOST_UBLAS_NO_RECURSIVE_OPTIMIZATION
auto expr  = (a*b - b*c) - (c*a - a*b); // No optimization: the top-level operator is - and the optimizer does not recurse into its left and right sides.
auto expr2 = (a*b - b*c) + (c*a - a*b); // No optimization, for the same reason.
auto expr3 = (a*b - b*c) * (c*a - a*b); // Optimizes after recursing even though the macro is defined, because this expression is a single term. Optimizes to (b*(a-c))*(a*(c-b)).
#undef BOOST_UBLAS_NO_RECURSIVE_OPTIMIZATION
auto expr4 = (a*b - b*c) - (c*a - a*b); // Recurses and optimizes to the form (b*(a-c)) - (a*(c-b)).
WARNING: The tensor's data type must provide an == overload for optimization to take place. If your custom type does not provide this operator overload, please disable the optimization using the configuration macro; otherwise, you will get a compile-time error.
We would both like to thank our mentor Cem for his constant support and help in achieving our goals. We always found him helpful, and he was always easy to reach for help or discussion regarding the work. We would also like to thank Google for the Google Summer of Code programme, without which none of this would have been possible. Lastly, we express our gratitude to our parents for providing all the direct and indirect support we needed to carry out our work from home.