80387+ Coprocessor Programming
Written by : Peter Quiring (Dec 5/96)
Updated : (Dec 12/96)
The description of the status
words has been greatly updated.
This tutorial will teach you how to use the FPU (floating point unit) on 80386+
systems. I may use instructions only available on the 387 because I hope all
your projects target this processor.
I'll go over how the stack works, loading/saving values and executing the
instructions. I'll also show some FPU detection code.
First off is the stack. The 387 has a small stack of its own that is totally
separate from the rest of the CPU and RAM; the only way to access data on the
stack is with FPU instructions. The stack holds eight 80bit floating point
values. They are labeled st, st(1) thru st(7). st is the top of the stack.
Each stack element is 80bits, where bits 0-63 are the magnitude (mantissa),
bits 64-78 are the exponent, and bit 79 is the sign bit (although you don't
need to worry about this).
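To make the 80bit layout a bit more concrete, here is a small sketch that builds the value 1.0 by hand and loads it. The label one_raw is made up for this illustration, and the exponent bias of 16383 (3fffh) comes from the Intel extended real format rather than from anything stated above.

```asm
; 1.0 as a raw 80bit real, lowest word first (little-endian):
;   bits 0-63  = 8000000000000000h  (magnitude, top bit is the integer 1)
;   bits 64-78 = 3fffh              (exponent, biased by 16383)
;   bit 79     = 0                  (sign, 0 = positive)
one_raw dw 0000h,0000h,0000h,8000h,3fffh   ;hypothetical label

fld tbyte ptr [one_raw]   ;st = 1.0, the same value FLD1 would push
```

In practice you would let the assembler or the FPU build these bytes for you; the point is only to show where each field lives.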
FPU instructions such as FSIN, FCOS, etc. work on values stored within the
stack (some arithmetic instructions can also take one operand directly from
memory). So anytime you want to use the FPU, you normally load values onto
the stack, execute your FPU operation, and then save the result back into
memory.
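The load / operate / save pattern might look like the sketch below. The variables angle and result are hypothetical names picked for this example.

```asm
angle  dd 0.5            ;hypothetical input, in radians
result dd ?              ;hypothetical output

fld  dword ptr [angle]   ;push angle onto the FPU stack (st = angle)
fsin                     ;st = sin(st)
fstp dword ptr [result]  ;store st to memory and pop it off again
```

After the last instruction the FPU stack is back to the state it was in before the load.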
This tutorial will briefly show you some instructions, but I suggest you get
an Intel reference before programming the FPU so you know all the instructions.
Loading values into the stack
When you are loading a value into the FPU you are actually pushing a value
onto the FPU stack. Whenever you load something it is stored in st.
But first st is moved to st(1), st(1) is moved to st(2) and so on, until st(6)
is moved to st(7). If st(7) already held a value, that value is lost and the
FPU signals a stack overflow (which is bad: with the default masked exceptions
st ends up holding a NaN instead of the value you loaded).
When loading the FPU you can not use any registers in the CPU, all references
must be either memory or another FPU stack element.
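So to get a CPU register into the FPU you have to bounce it through memory first. A minimal sketch, assuming a scratch variable called tmp_dword (a made-up name):

```asm
tmp_dword dd ?                ;hypothetical scratch variable

mov  [tmp_dword],eax          ;CPU register -> memory
fild dword ptr [tmp_dword]    ;memory -> st (signed 32bit integer is converted)
```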
There are many forms of data that can be loaded into the FPU stack. You
can load any of the following:
- signed 16,32 or 64 bit integer
- signed 32,64 or 80 bit real (floating point)
- packed BCD 80 bit number (18 digits plus sign)
- other FPU stack elements (ie: st(3) )
Here are some examples of each case:
fld st(4) ;loads st(4) into st
fld dword ptr [ebx] ;loads a real 32bit
fild dword ptr [ebp] ;loads an integer 32bit
fbld tbyte ptr [ebx] ;loads a packed BCD integer 80bit
There are also other instructions that load constants onto the stack, such as
pi (3.14...), 0, 1, log2(e), etc.
After every load the number is converted to the 80bit real format.
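Here is a short sketch using four of the constant loading instructions, with the comments tracking how each load pushes the earlier values further down the stack. It assumes the stack is empty to begin with.

```asm
fld1      ;st = 1.0
fldz      ;st = 0.0,       st(1) = 1.0
fldpi     ;st = pi,        st(1) = 0.0,  st(2) = 1.0
fldl2e    ;st = log2(e),   st(1) = pi,   st(2) = 0.0,  st(3) = 1.0
```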
Saving a value from the FPU stack
After completing the FPU operation you desire, you will need to get
the value from the FPU stack. This works exactly the same as loading the value,
except for a few differences. You may choose whether or not you
want the value popped off the stack after it is stored. If the instruction
ends with a 'p' then the value is popped off (discarded) after the store.
Here are some examples of each case:
fst st(4) ;st remains on stack after store operation
fstp st(4) ;stores to st(4) (st is popped off) (*)
fst dword ptr [ebx] ;stores to a real 32bit
fstp dword ptr [ebx] ;stores to a real 32bit (st is popped off)
fist dword ptr [ebp] ;stores to an integer 32bit
fistp dword ptr [ebp] ;stores to an integer 32bit (st is popped off)
fbstp tbyte ptr [ebx] ;stores to a packed BCD integer 80bit (st is popped off)
Note : BCD stores always pop (there is no FBST) and the operand is always 80bit.
After the pop the entire stack moves back up towards st (exact opposite of
loading a value).
(*) in this case st(4) gets the value of st, then st is popped off
and the stack shifts back up, so after the operation st(3) will
hold the value.
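The store-then-pop shuffle is easier to follow with real values. The sketch below assumes an empty stack at the start and tracks the contents in the comments. It also shows fstp st(0), a common trick for throwing away the top of the stack.

```asm
fld1            ;st = 1.0
fldz            ;st = 0.0, st(1) = 1.0
fldpi           ;st = pi,  st(1) = 0.0, st(2) = 1.0
fstp st(2)      ;st(2) = pi, then pop: st = 0.0, st(1) = pi
fstp st(0)      ;store st onto itself and pop: discards the 0.0
                ;now st = pi and nothing else is on the stack
```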
Because the FPU runs totally independent of the CPU you may at times need to
WAIT (or FWAIT which is the same) until the FPU is finished before
continuing. For example if you use a store instruction and try to use the
stored data immediately, the FPU may not have completed the store instruction
before you read the data. But each time you start an FPU instruction the CPU
will WAIT automatically till the FPU is done. Therefore usually the only
time you will ever need to WAIT is after using a store instruction.
Just a note : with the 8087 a WAIT was required before
each FPU instruction, but from the 80287 on that is no longer necessary.
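A minimal sketch of the one case described above, a store followed right away by a CPU read. The variable result is a made-up name.

```asm
result dd ?                  ;hypothetical output variable

fistp dword ptr [result]     ;FPU stores st as a 32bit integer and pops
fwait                        ;make sure the store has completed
mov   eax,[result]           ;now safe to read with the CPU
```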
The FPU also has status words which contain flags about the last operation
completed (like the zero and carry flag in the CPU). To view these flags
you must save them into a memory word (16bit). There are 2 words. One
indicates the flags based on the last operation and the other controls how
the FPU operates. They are:
bit #s      : 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
Control word: xx xx xx IC --RC- --PC- IE ?? PM UM OM ZM DM IM
Status word :  B C3 ---ST--- C2 C1 C0 ES SF PE UE OE ZE DE IE
Note : ?? - MASM documentation left this blank, gotta love M$!
Most bits are unimportant, but the C0-C3 are most important.
ST = stack ptr (a number from 0 to 7)
RC = rounding technique to use
PC = precision control
All other bits control exceptions which you should just ignore.
The instructions to load/save the FPU status/control words are:
fldcw [mem_word] ;load control word from mem_word
fstcw [mem_word] ;save control word into mem_word
fstsw [mem_word] ;save status word into mem_word
Note : there is no fldsw, the status word can only be saved (on a 287+ you
can also save it straight into ax with fstsw ax).
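To know what the C0-C3 mean look at this little example:
fstsw [tmp_word] ;save status word to memory
mov ah,byte ptr[tmp_word+1] ;the high byte holds C0-C3
sahf ;copy ah into the CPU flags
Now the CPU flags reflect the last operation from the FPU.
Each FPU flag corresponds to the following CPU flags:
C0 = carry flag (CF)
C2 = parity flag (PF)
C3 = zero flag (ZF)
C1 = no CPU flag (it does not land in a usable flag)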
This applies only when the last operation could have resulted in zero or carry.
There are many other operations that return certain other status conditions
within C0-C3 which your reference will tell you.
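Comparing two numbers is probably the most common use of C0-C3. The sketch below assumes two made-up variables, val_a and val_b, and made-up jump labels that you would define elsewhere. It uses the fstsw ax form, which needs a 287 or better.

```asm
val_a dd 1.5                 ;hypothetical operands
val_b dd 2.5

fld   dword ptr [val_a]      ;st = val_a
fcomp dword ptr [val_b]      ;compare st with val_b, then pop st
fstsw ax                     ;status word -> ax
sahf                         ;C0 -> CF, C2 -> PF, C3 -> ZF
jp    is_unordered           ;C2 set: one of the operands was a NaN
jb    a_below_b              ;C0 set: val_a < val_b
je    a_equals_b             ;C3 set: val_a = val_b
;falls through to here when val_a > val_b
```

The NaN check comes first because an unordered compare sets C0, C2 and C3 all at once.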
The control word defines operation of the FPU as follows:
RC = rounding control
00 = round to nearest or even # (default)
01 = round towards -infinity
10 = round towards +infinity
11 = round towards 0(zero)
PC = precision control (size of mantissa)
11 = 64bit - long double (default) (80bit float)
10 = 53bit - double (64bit float)
00 = 24bit - float (32bit float)
IE = undefined on 387+ (was used to enable ints on 8087)
?M = mask exceptions. (by default these are all set = disable exceptions)
So by default the control word is 037fh on a 387+. Remember that the PC is
the size you are using on the FPU stack, not the size of operands loaded into
and saved from the stack (so just keep it at 64bit).
If you need to round towards zero use the following code.
new_cw dw 0f7fh ;default control word with RC = 11
old_cw dw 037fh ;default control word
fldcw new_cw ;switch to round towards zero
... use frndint or any other load/save operation that uses rounding
fldcw old_cw ;restore to default state
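If you can't be sure the control word is still at its default (another library may have changed it), a slightly more careful variation is to save whatever is active and only change the RC bits. This is just a sketch; all four variable names are made up.

```asm
saved_cw dw ?                ;hypothetical variables
trunc_cw dw ?
value    dd 2.75             ;hypothetical input
int_val  dd ?

fstcw [saved_cw]             ;save whatever control word is active
mov   ax,[saved_cw]
or    ax,0c00h               ;set RC (bits 10-11) to 11 = round towards zero
mov   [trunc_cw],ax
fldcw [trunc_cw]             ;activate round towards zero

fld   dword ptr [value]      ;st = 2.75
fistp dword ptr [int_val]    ;stores 2 (round to nearest would store 3)

fldcw [saved_cw]             ;put the original control word back
```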
FPU-DETECT
Here is some code that will detect the presence of an 80387 co-processor.
This code is part of QLIB and is always run during QLIB init.
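For readers without the QLIB source at hand, here is a sketch of one widely published way to do the detection. It is NOT the QLIB routine, just an illustration: fpu_tmp, no_fpu and fpu_is_287 are made-up names, and the two labels would have to be defined elsewhere in your program.

```asm
fpu_tmp dw ?                 ;hypothetical scratch word

fninit                       ;reset the FPU (no-wait form, can't hang waiting)
mov   [fpu_tmp],5a5ah        ;junk that a real status word would overwrite
fnstsw [fpu_tmp]             ;after FNINIT the status word low byte is 0
cmp   byte ptr [fpu_tmp],0
jne   no_fpu
fnstcw [fpu_tmp]             ;after FNINIT all six masks are set, IC is clear
mov   ax,[fpu_tmp]
and   ax,103fh               ;keep IC and the six exception masks
cmp   ax,003fh
jne   no_fpu

;an FPU exists, now tell a 387 from a 287 by comparing +inf with -inf
fld1                         ;st = 1.0
fldz                         ;st = 0.0, st(1) = 1.0
fdivp st(1),st               ;st = 1/0 = +infinity (exception is masked)
fld   st                     ;duplicate it
fchs                         ;st = -infinity, st(1) = +infinity
fcompp                       ;compare, then pop both
fstsw [fpu_tmp]
mov   ah,byte ptr [fpu_tmp+1]
sahf
je    fpu_is_287             ;8087/287 treat both infinities as equal
;falls through to here on a 387 or better
```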
FPU Examples
Here are some examples taken out of QLIB and well documented.
After you see them look into your reference and you'll notice that most
instructions work only on the top part of the stack (st). For example
the FSIN instruction takes the sin() of what's on top, and
replaces it with the answer (other parts of stack are unaffected). The
FSINCOS instruction is more complicated because it takes
the sin and cos of st and replaces the st with the sin result and then
pushes the cos result onto the stack.
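A small sketch of FSINCOS in use, with made-up variable names. Note the order of the stores: the cos result is the one sitting on top.

```asm
angle   dd 1.0               ;hypothetical input, in radians
sin_val dd ?                 ;hypothetical outputs
cos_val dd ?

fld   dword ptr [angle]      ;st = angle
fsincos                      ;st = cos(angle), st(1) = sin(angle)
fstp  dword ptr [cos_val]    ;store the cos result and pop
fstp  dword ptr [sin_val]    ;store the sin result and pop
```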
Note FCOS, FSIN and FSINCOS are available only on the 387+ FPUs so you
can see that using older processors was much more difficult.
A few last notes about the FPU. Emulation used to be done through the process
of trapping the FPU exception handler in real mode and then executing
the desired FPU instruction using just the CPU. Although the CPU can
do floating point math with just integers it is very hard and VERY slow
compared to the FPU.
But this can not be done with PMODE programs, so that's why many users
complained a lot when FRANKE no longer worked with ACAD 386 (ie: me!).
Oh well, an FPU for the 80386 cost me 50 bucks (3 years ago).
When C compilers use FPU emulation, they simply don't use the FPU but instead
use a separate math LIB that uses simple integers to do the math stuff.
Copyright © 1995-2005 Nexus Systems