Monday, February 4, 2019

Fixed Point Failure

A Fixed Point math library and Neural Net demo
for the Arduino...

Or: Multiple cascading failures all in one place!


Last year I found a simple self-contained Artificial Neural Net demo written for the Arduino at: robotics.hobbizine.com/arduinoann.html and spent a goodly amount of time futzing around with it. I now, almost, understand HOW they work, but have only a glimmering of insight into WHY. The demo does something really silly: The inputs are an array of bit patterns used to drive a 7-segment numeric display and the outputs are the binary bit pattern for that digit (basically the reverse of a binary-to-7-segment display driver). Someone not totally under the influence of ANNs could do this with a simple 10-byte lookup table. But that is not us. On the plus side it _learns_ how to do the decoding by torturous example, so we don't have to bother our tiny brains with the task of designing the lookup table.
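
For the record, here is roughly what that lookup table could look like -- just a sketch, and the gfedcba segment bit order is my assumption, not necessarily the encoding the hobbizine demo uses:

    #include <stdint.h>

    // Ten bytes: 7-segment patterns for the digits 0-9 (gfedcba order assumed).
    const uint8_t SEG_PATTERN[10] = {
      0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F
    };

    // "Decode" a segment pattern back to its digit with a linear scan.
    int8_t segmentsToDigit(uint8_t pattern) {
      for (uint8_t d = 0; d < 10; d++) {
        if (SEG_PATTERN[d] == pattern) return d;
      }
      return -1;   // not a valid digit pattern
    }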

HOW ANNs work on the Arduino is:
  • a) Extremely slowly, because they use a metric shit-ton of floating-point arithmetic; and,
  • b) Not very interestingly, because each weight takes up 4 bytes of RAM and there is only about 1KB kicking around after the locals, stack, and whatever else are accounted for -- the simple demo program illustrated here uses about half of that 1KB just for the forwardProp() node-weights, and then the backProp() demo uses the other half for temporary storage. Leaving just about nothing with which to implement an actually interesting network.
But. I thought I could make a small contribution by replacing the floating point -- all of it emulated in software -- with an integer-based Fixed Point implementation, whose basic arithmetic is directly supported by the ATMEGA hardware. This would also halve the number of bytes used by each weight value. Brilliant, yes?

And in fact. My FPVAL class works (see the zip file linked below). Except, err, well, it doesn't save any execution time. But more on that later....

Anyway. The FPVAL implementation uses a 2-byte int16_t as the basic storage element (half the size of a float) and pays for this with a very limited range and resolution. The top byte of the int16_t is used as the "integer" portion of the value -- so the range is +/-128. The bottom byte is used as the fraction portion -- so the resolution is 1/256, or about 0.0039 per step. At first blush, and seemingly also in fact, this is just about all you need for ANN weights.
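
In conventional terms this is a Q8.8 format. A minimal sketch of the conversions -- with made-up helper names, not the actual FPVAL interface -- looks like this:

    #include <stdint.h>

    // Q8.8: high byte = signed integer part, low byte = fraction in 1/256 steps.
    typedef int16_t fp16;

    static inline fp16  fpFromFloat(float f) { return (fp16)(f * 256.0f); }
    static inline float fpToFloat(fp16 v)    { return (float)v / 256.0f; }
    static inline fp16  fpFromInt(int8_t i)  { return (fp16)(i * 256); }

So, for example, 1.0 is stored as 256, 0.5 as 128, and -1.25 as -320.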

As it turns out, simple 16-bit integer arithmetic Just Works(TM) for manipulating the values, with the proviso that some judicious up and down shifting is used to maintain Engineering Tolerances. This is wrapped in a C++ class which overloads all the common arithmetic and comparison operators, so that FPVALs can be dropped into slots where floats were used without changing (much of) the program syntax. This is illustrated in the neuralNetFP.cpp file, where you can switch between using real floats and FPVALs with the "USEFLOATS" define in netConfig.h.
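
To give the flavor of the thing, here is a minimal sketch of the operator-overloading idea -- illustrative only, not the real FPVAL class from fpval.cpp:

    #include <stdint.h>

    class FP {
      int16_t v;                                   // raw Q8.8 bits
    public:
      FP() : v(0) {}
      FP(float f) : v((int16_t)(f * 256.0f)) {}
      static FP raw(int16_t r) { FP x; x.v = r; return x; }

      FP operator+(FP o) const { return raw((int16_t)(v + o.v)); }  // same scale: plain add
      FP operator-(FP o) const { return raw((int16_t)(v - o.v)); }
      FP operator*(FP o) const {                   // product lands in Q16.16,
        int32_t p = (int32_t)v * o.v;              // so shift back down to Q8.8
        return raw((int16_t)(p >> 8));
      }
      bool operator<(FP o) const { return v < o.v; }
      float toFloat() const { return (float)v / 256.0f; }
    };

With something like that in place, FP w = FP(0.5f) * FP(-1.25f); comes out to exactly -0.625, and the surrounding ANN code barely notices the swap.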

Unfortunately it appears that a lot of buggering around is also needed to do the shifting, check for overflow, and handle rounding errors. This can all be seen in the fpval.cpp implementation file. An interesting(?) aside: I found that I had to do value rounding in the multiply and divide methods -- presumably because plain truncation biases every result toward negative infinity; without rounding the backProp() functions just hit the negative rail instead of converging.
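
The rounding fix amounts to adding half an LSB before the down-shift. A sketch on raw Q8.8 integers (again, not the literal FPVAL code):

    #include <stdint.h>

    // Q8.8 multiply with round-to-nearest: add 128 (half of the 256 scale)
    // before shifting down, so truncation doesn't drag every product toward
    // negative infinity.
    static inline int16_t q88_mul_rounded(int16_t a, int16_t b) {
      int32_t p = (int32_t)a * (int32_t)b;   // full-precision product
      return (int16_t)((p + 128) >> 8);      // round, then drop back to Q8.8
    }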

I also replaced the exponential in the ANN sigmoid activation function with a piecewise linear approximation, which rids the code of float dependencies.
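
One common way to do that -- and this is only a sketch of the general technique, not necessarily what is in the zip -- is a small table of sigmoid samples with linear interpolation between them, all in Q8.8:

    #include <stdint.h>

    // sigmoid(x) sampled at x = -4..+4 in steps of 1, values in Q8.8.
    static const int16_t SIG_TAB[9] = { 5, 12, 31, 69, 128, 187, 225, 244, 251 };

    int16_t q88_sigmoid(int16_t x) {          // x and result are Q8.8
      if (x <= -4 * 256) return SIG_TAB[0];   // clamp the tails
      if (x >=  4 * 256) return SIG_TAB[8];
      int16_t t    = x + 4 * 256;             // shift the range to 0..2047
      uint8_t idx  = (uint8_t)(t >> 8);       // which segment (0..7)
      uint8_t frac = (uint8_t)(t & 0xFF);     // position within it (0..255)
      int16_t y0 = SIG_TAB[idx];
      int16_t y1 = SIG_TAB[idx + 1];
      return y0 + (int16_t)(((int32_t)(y1 - y0) * frac) >> 8);
    }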

I forged ahead and got the danged ANN demo to work with either floats or FPVALs. And that's when I found that I wasn't saving any execution time. (Except that, for some as-yet-unexplained reason, the number of FPVAL backprop learning cycles seems to be about 1/4 of the number needed when using floats [??].)

After a lot of quite painful analysis I determined that calling the functions which implement the FPVAL arithmetic entails enough overhead that they end up almost equal in execution time to the optimized GCC float library used on the ATMEGA. Most of the painful part of the analysis was fighting the optimizer, tooth and nail, but I will not belabor that process.

On the other hand, if you are careful NOT to use any floating-point values or functions, you can save two bytes per value and around 1KB of program space. Which might be useful, to someone, sometime.


So. What's in this bolus then is the result of all this peregrination. It is not entirely coherent because I just threw in the towel as described above. But. Here it is:

http://www.etantdonnes.com/DATA/schipAANN.zip
