Kahan summation algorithm

In numerical analysis, the Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This algorithm is attributed to William Kahan.

The algorithm
In pseudocode, the algorithm is:

function kahanSum(input, n) var sum = input[1] var c = 0.0         //A running compensation for lost low-order bits. for i = 2 to n  y = input[i] - c    //So far, so good: c is zero. t = sum + y        //Alas, sum is big, y small, so low-order digits of y are lost. c = (t - sum) - y  //(t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y) sum = t            //Algebraically, c should always be zero. Beware eagerly optimising compilers! next i              //Next time around, the lost low part will be added to y in a fresh attempt. return sum

An example in six-digit floating-point decimal arithmetic. Suppose sum has attained the value 100000 and the next value of input(i) is 3.14159 (a six-digit floating point number) and c has the value zero. y = 3.14159 - 0 t = 100000 + 3.14159 = 100003                     Many digits have been lost! c = (100003 - 100000) - 3.14159 This must be evaluated as written! = 3.00000 - 3.14159          The assimilated part of y recovered, vs. the original full y. = -0.141590                  The trailing zero because this is six-digit arithmetic. sum = 100003                     Thus, few digits from input(i) met those of sum.

The sum is so large that only the high-order digits of the input numbers are being accumulated. But on the next step, suppose input(i) has the value 2.71828, and this time, c is not zero...  y = 2.71828 - -0.141590         The shortfall from the previous stage has another chance. = 2.85987                    It is of a size similar to y: most digits meet. t = 100003 + 2.85987           But few meet the digits of sum. = 100006                     Rounding is good, but even with truncation, c = (100006 - 100003) - 2.85987 This extracts whatever went in. = 3.00000 - 2.85987          In this case, too much. = .140130                    But no matter, the excess will be subtracted off next time. sum = 100006

So the summation is performed with two accumulators: sum holds the sum, and c accumulates the parts not assimilated into sum, to nudge the low-order part of sum the next time around. Thus the summation proceeds with "guard digits" in c which is better than not having any but is not as good as performing the calculations with double the precision of the input. However, if input is already in double precision, few systems supply quadruple precision, and if they did, what if input were quadruple precision...

Another approach is to perform the summation on differences from a working mean (in the hope that the value of sum never becomes much larger than individual differences), except that the values might be quite different from the working mean and thus suffer significant truncation. Alternatively, sort the values and pair positive and negative values so that the accumulated sum remains as close to zero as possible, at great cost in computational effort.

Optimising compilers
One of the optimizations performed by some compilers is to attempt to spot and remove redundant code. A sophisticated optimizer might perform symbolic expression simplification and notice that the code t:=sum + y; c:=(t - sum) - y; can be reduced to  c = 0 However, the granular nature of floating-point types means that it is not possible to apply the symbolic simplification that are applied to integer types.

The algorithm's execution can also be affected by non-mathematical optimisations. For instance, it is quite common for the floating-point computations to be carried out in machine registers that have a precision higher than that of the variables held in main storage, as in the IBM-PC and clones where some of the the floating-point registers hold 80-bit floating-point numbers while in main storage they might be held only as 32-bit, or 64-bit as well as 80-bit. The sequence y:=input[i] - c; t:=sum + y;  c:=(t - sum) - y; might be compiled without any of the unwanted mathematical transformations, but, notice that after the code for the first statement is executed, the register holding the result that was stored in y still has (or could still have: the registers might be organised as a stack with overflow to memory) that result and as the next statement refers to y, perhaps the code for it could be arranged so that the value of y need not be fetched from memory; similarly for t in the third statement. If the stored values are held to the same precision as the registers, then there will be no problem: the registers and main storage are thus equivalent except for the speed of access. But if not, the working of the algorithm will be ruined. Optimisation options helpful for some parts of the program will not necessarily be good for all parts of a program.

Computer language features
Some computer languages offer summation features: +/input      APL (read this as "Plus, over input") SUM(input)   Fortran, a semi-function SUM supplied as a compiler intrinsic. And it might be that the implementation will indeed use the best method. Alas, the Fortran manual to hand offers no details of its internal workings, merely that the result will be in the same precision as input. Inspection of the code generated by the Compaq Visual Fortran v6.6 compiler for a simple usage reveals the equivalent of Load sum Add input(i) Store sum There is no attempt to hold the value of sum in a register, therefore, even if the addition were performed with a precision greater than that of sum, that will be lost when the result is stored into sum and later retrieved. So there is nothing beyond a straight summation.

In some programming languages (C,C++,D), there exist a volatile keyword which ensures that all registers are written/read again to/from memory, so one can use this to disable optimisations for particular section of code.

When all else fails
Via careful testing and scrutiny of the code generated, it may be found that no rearrangement of the algorithm nor selection of compiler options delivers good results. In this situation the final word may be obtained through the manual preparation of embedded assembler-like statements to ensure the generation of exactly the machine code desired, if this facility is offered by the compiler, or of course the invocation of a utility routine written separately in machine code.