c - Which exponentiation algorithms do CPU/programming languages use? -
i've been learning faster exponentiation algorithms (k-ary, sliding door etc.), , wondering ones used in cpus/programming languages? (i'm fuzzy on whether or not happens in cpu or through compiler)
and kicks, fastest?
edit regarding broadness: it's intentionally broad because know there bunch of different techniques this. checked answer had looking for.
i assume interest in implementation of exponentiation functions can found in standard math libraries hlls, in particular c/c++. these include functions exp(), exp2(), exp10(), , pow(), single-precision counterparts expf(), exp2f(), exp10f(), , powf().
the exponentiation methods mention (such k-ary, sliding window) typically employed in cryptographic algorithms, such rsa, exponentiation based. not typically used exponentiation functions provided via math.h or cmath. implementation details standard math functions exp() differ, common scheme follows three-step process:
- reduction of function argument primary approximation interval
- approximation of suitable base function on primary approximation interval
- mapping result primary interval entire range of function
an auxiliary step handling of special cases. these can pertain special mathematical situations such log(0.0), or special floating-point operands such nan (not number).
the c99 code expf(float) below shows in exemplary fashion steps concrete example. argument a first split such exp(a) = er * 2i, i integer , r in [log(sqrt(0.5), log(sqrt(2.0)], primary approximation interval. in second step, approximate er polynomial. such approximations can designed according various design criteria such minimizing absolute or relative error. polynomial can evaluated in various ways including horner's scheme , estrin's scheme.
the code below uses common approach employing minimax approximation, minimizes maximum error on entire approximation interval. standard algorithm computing such approximations remez algorithm. evaluation via horner's scheme; numerical accuracy of evaluation enhanced use of fmaf().
this standard math function implements known fused multiply-add or fma. computes a*b+c using full product a*b during addition , applying single rounding @ end. on modern hardware, such gpus, ibm power cpus, recent x86 processors (e.g. haswell), recent arm processors (as optional extension), maps straight hardware instruction. on platforms lack such instruction, fmaf() map slow emulation code, in case not want use if interested in performance.
the final computation multiplication 2i, c , c++ provide function ldexp(). in "industrial strength" library code 1 typically uses machine-specific idiom here takes advantage of use of ieee-754 binary arithmetic float. lastly, code cleans cases of overflow , underflow.
the x87 fpu inside x86 processors has instruction f2xm1 computes 2x-1 on [-1,1]. can used second step of computation of exp() , exp2(). there instruction fscale used multiply by2i in third step. common way of implementing f2xm1 microcode utilizes rational or polynomial approximation. note x87 fpu maintained legacy support these days. on modern x86 platform libraries typically use pure software implementations based on sse , algorithms similar 1 shown below. combine small tables polynomial approximations.
pow(x,y) can conceptually implemented exp(y*log(x)), suffers significant loss of accuracy when x near unity , y in large in magnitude, incorrect handling of numerous special cases specified in c/c++ standards. 1 way around accuracy issue compute log(x) , product y*log(x)) in form of extended precision. details fill entire, lengthy separate answer, , not have code handy demonstrate it. in various c/c++ math libraries, pow(double,int) , powf(float, int) computed separate code path applies "square-and-multiply" method bit-wise scanning of the binary representation of integer exponent.
#include <math.h> /* import fmaf(), ldexpf() */ /* rintf(), -0.0f -> +0.0f, , |a| must < 2**22 */ float quick_and_dirty_rintf (float a) { float cvt_magic = 0x1.800000p+23f; return (a + cvt_magic) - cvt_magic; } /* approximate exp(a) on interval [log(sqrt(0.5)), log(sqrt(2.0))]. maximum ulp error = 0.70951 */ float expf_poly (float a) { float r; r = 0x1.6ab95ep-10f; r = fmaf (r, a, 0x1.126d28p-07f); r = fmaf (r, a, 0x1.5558f8p-05f); r = fmaf (r, a, 0x1.55543ap-03f); r = fmaf (r, a, 0x1.fffffap-02f); r = fmaf (r, a, 0x1.000000p+00f); r = fmaf (r, a, 0x1.000000p+00f); return r; } /* approximate exp2() on interval [-0.5,+0.5] maximum ulp error = 0.79961 */ float exp2f_poly (float a) { float r; r = 0x1.4308f2p-13f; r = fmaf (r, a, 0x1.5f0722p-10f); r = fmaf (r, a, 0x1.3b2bd4p-07f); r = fmaf (r, a, 0x1.c6af80p-05f); r = fmaf (r, a, 0x1.ebfbdep-03f); r = fmaf (r, a, 0x1.62e430p-01f); r = fmaf (r, a, 0x1.000000p+00f); return r; } /* approximate exp10(a) on [log(sqrt(0.5))/log(10), log(sqrt(2.0))/log(10)]. maximum ulp error = 0.77879 */ float exp10f_poly (float a) { float r; r = 0x1.a65694p-3f; r = fmaf (r, a, 0x1.158a00p-1f); r = fmaf (r, a, 0x1.2bda9ap+0f); r = fmaf (r, a, 0x1.046f72p+1f); r = fmaf (r, a, 0x1.535248p+1f); r = fmaf (r, a, 0x1.26bb1cp+1f); r = fmaf (r, a, 0x1.000000p+0f); return r; } /* compute exponential base e. maximum ulp error = 0.88790 */ float my_expf (float a) { float t, r; int i; t = * 0x1.715476p+0f; // 1/log(2) t = quick_and_dirty_rintf (t); = (int)t; r = fmaf (t, -0x1.62e430p-1f, a); // log_2_hi r = fmaf (t, 0x1.05c610p-29f, r); // log_2_lo t = expf_poly (r); r = ldexpf (t, i); if (a < -105.0f) r = 0.0f; if (a > 105.0f) r = 1.0f/0.0f; // +inf return r; } /* compute exponential base 2. maximum ulp error = 0.87922 */ float my_exp2f (float a) { float t, r; int i; t = quick_and_dirty_rintf (a); = (int)t; r = - t; t = exp2f_poly (r); r = ldexpf (t, i); if (a < -152.0f) r = 0.0f; if (a > 152.0f) r = 1.0f/0.0f; // +inf return r; } /* compute exponential base 10. maximum ulp error = 0.95588 */ float my_exp10f (float a) { float r, t; int i; t = * 0x1.a934f0p+1f; t = quick_and_dirty_rintf (t); = (int)t; r = fmaf (t, -0x1.344136p-2f, a); // log10(2)_hi r = fmaf (t, 0x1.ec10c0p-27f, r); // log10(2)_lo t = exp10f_poly (r); r = ldexpf (t, i); if (a < -46.0f) r = 0.0f; if (a > 46.0f) r = 1.0f/0.0f; // +inf return r; }
Comments
Post a Comment