Recently, I dove into a complicated piece of code whose aim was to compute some mathematical functions used by more tangled subsystems. The code was very cool, and I immediately thought it had been written by one (or more) scientific computing experts, given its sophistication and correctness. You may be wondering why I dove into that code at all, given the old saying “what works should never be changed”. Once again, efficiency cooked our goose!
I didn’t know whether the code was very slow or terribly slow; either way, it could certainly be improved.
To give you a taste of what mathematical computations cost, here are the average clock cycles required by an IBM Power3 to execute common operations (thanks to CASPUR):
Instruction            32 bit    64 bit
Integer Product        3-4       3-9
Integer Division       21        34
Real Sum and Product   3-4       3-4
Real Division          14-21     18-25
Real Square Root       14-23     22-31
It’s clear we have to avoid expensive operations like division and try to reduce them to cheaper ones, like sums and products. I’ll give you an example:
// Matrix normalization: N*N divisions
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        c[i][j] = a[i][j] * b[i][j] / norm;
// Matrix normalization: one division, N*N extra products
double rev_norm = 1.0 / norm;
for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j)
        c[i][j] = a[i][j] * b[i][j] * rev_norm;
This time we need 2*N*N products but only one division, instead of N*N divisions. Try it on your architecture and measure the results, and take potential approximation errors into account: multiplying by the reciprocal is not always bit-for-bit identical to dividing. Once again, efficiency is about tradeoffs.
This kind of optimization is called Strength Reduction, because the goal is to replace an expensive operation with a cheaper one, reducing the cost (time and/or memory) of the computation.
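A classic instance of this idea, beyond division: a multiplication performed on every loop iteration can be replaced by a running addition. A minimal sketch (the function names are hypothetical, chosen for illustration):

```cpp
#include <cstddef>

// Naive version: one integer multiplication per element.
void scale_mul(int* out, std::size_t n, int k) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = static_cast<int>(i) * k;
}

// Strength-reduced version: the per-iteration multiplication
// becomes an addition carried across iterations.
void scale_add(int* out, std::size_t n, int k) {
    int acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = acc;   // acc == i * k at this point
        acc += k;
    }
}
```

Modern optimizing compilers often perform this transformation for you, but it is worth recognizing the pattern in hand-written numeric kernels.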
Other useful considerations about integer division (the same rules apply to modulo calculations):
- integer division by a power of 2 can be done with a shift operation, which is much faster;
- integer division by a constant is faster than division by a variable;
- integer division by a constant is faster if the dividend is unsigned.
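The first point can be checked directly: for unsigned operands, division and modulo by a power of two reduce to a shift and a mask (the helper names below are mine, for illustration):

```cpp
// For unsigned integers, division by 2^k is a right shift by k,
// and modulo 2^k is a bitwise AND with 2^k - 1. Compilers apply
// this automatically when the divisor is a compile-time constant,
// which is why constant, unsigned divisions are the cheapest case.
unsigned div_by_8(unsigned x) { return x >> 3; } // same result as x / 8
unsigned mod_by_8(unsigned x) { return x & 7u; } // same result as x % 8
```

For signed operands the shift alone is not equivalent (negative dividends round differently), which is part of why unsigned division is cheaper.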
Another notable example is the computation of powers:
pow(a,4) <=> a * a * a * a;
If you execute two simple loops like the following:
for (int i = 0; i < 10000000; ++i) toCompute = a * a * a * a;
for (int i = 0; i < 10000000; ++i) toCompute = pow(a, 4);
the big difference in time should be quite evident: I measured something like 0.02 seconds for the former loop and 0.586 for the latter!
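The pow(a,4) rewrite generalizes: any integer power can be computed with O(log n) multiplications by repeated squaring. A minimal sketch (ipow is a hypothetical helper, not from the original code):

```cpp
// Exponentiation by squaring: computes a^n using O(log n)
// multiplications instead of n-1 (e.g. a^4 costs 2 products).
double ipow(double a, unsigned n) {
    double result = 1.0;
    while (n > 0) {
        if (n & 1u)       // lowest exponent bit set:
            result *= a;  // fold the current power of a in
        a *= a;           // square the base
        n >>= 1;          // move to the next exponent bit
    }
    return result;
}
```

For a small fixed exponent known at compile time, writing the products out by hand (as above) is still the simplest and fastest option.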
The code I told you about at the beginning of this post was full of pows and other similarly “expensive” functions. It was easy to replace them and achieve better results.
I’m not going to show you wagonloads of tricks; you can search the web and read a good book/manual, like Optimizing software in C++ by Agner Fog (which is free).
Final considerations: sometimes it can be convenient to rewrite some mathematical functions by hand. Something like sin() is coded to minimize the error evenly over the whole domain of the real function. But if you are interested only in a relatively small range, or in lower precision, you can reduce computation time by using:
- Series,
- Approximate Formulas,
- Polynomial and Rational Fitting,
- Trigonometric Formulas.
When doing so, pay attention to:
- Accuracy,
- Error accumulation,
- Error distribution over the function’s domain.
For example, approximating sin() with truncated series of increasing order, I measured:

                   Time     Error     Max Gain
Runtime Library    0.41’’   –         –
III order          0.19’’   8.3e-7    54%
V order            0.24’’   1.5e-10   42%
VII order          0.28’’   5.0e-11   32%
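The orders above map naturally onto truncated Taylor series. As a sketch (assuming a Taylor expansion around 0, which matches the III/V/VII-order pattern; the original code may well have used a different fit), a VII-order approximation of sin() looks like:

```cpp
#include <cmath>

// VII-order Taylor polynomial for sin(x) around 0, evaluated with
// Horner's scheme in x^2: accurate for small |x|, with the error
// growing toward the ends of the chosen range.
double sin_taylor7(double x) {
    const double x2 = x * x;
    // sin(x) ~ x - x^3/6 + x^5/120 - x^7/5040
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 - x2 / 5040.0)));
}
```

Lower orders drop the trailing terms: cheaper to evaluate, but with a larger error, exactly the tradeoff the table shows.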