DOS Extender Tutorials
Learn to create your own DOS extender.

UNDER CONSTRUCTION !!!


This page will soon contain all you need to create your own DOS extender.
But... this is currently under construction, so for now you will just have to wait. As each part is added I'll give notice on my Homepage.

Currently there is...

Overview - general overview of what this will all be about.
Part #1 - Selectors and Descriptors.
Part #2 - Interrupts.
Part #3 - Control Registers.
Part #4 - 1st example, INT 15h, A20 Gate.
Part #5 - 2nd example, XMS, mode switching.
Part #6 - 3rd example, IRQ redir, DPMI funcs.
Part #7 - 4th example, extending DOS.
Part #8 - 5th example, CALLBACKs.
Part #9 - 6th example, VCPI support.
Part #10 - 7th example, DPMI support.
Part #11 - 8th example, DPMI/DOS funcs.


Download Page - download each stage from here.


Overview

These tutorials are here to help those who want to make their own DOS extender or want to learn lots about the inner workings of PMODE. I'll go over things such as Real Mode, PMODE, V86 Mode, VCPI, DPMI, XMS, INTs, IRQs, Exceptions and all those other wonderful things Intel screwed up. First things first: here is some shorthand notation I'm going to use a lot.

RMODE = Real Mode
PMODE = Protected Mode
V86 = Virtual 8086 Mode
INT = Interrupt
EXC = Exception

CPL = current privilege level
DPL = descriptor privilege level
RPL = requested privilege level
GDT = global descriptor table
LDT = local descriptor table
IDT = interrupt descriptor table

PIC = programmable interrupt controller

I also assume you know a little about how things work under real mode. (If not, start somewhere else; this section is just for advanced users who need to find resources to program their DOS extender.)
As I am writing these tutorials I will be creating my own little DOS extender which when done will be comparable to PMODEv3.07 but maybe a little better.
I also want to go over things such as multi-tasking, virtual memory and other important things not commonly found in DOS extenders. And when I'm done maybe I'll create the best FREE DOS extender, who knows?

It would really help you a lot if you read my other PMODE tutorials first but this is not necessary.

One last thing, I never built a DOS extender that will run under VCPI or DPMI nor have I done any multi-tasking or virtual memory before, but I've got specs, ideas and help from anyone out there that will help. And if you see something said in here that's wrong, please tell me ASAP.

WARNING : These pages will keep changing a lot until I get most of them done, so I suggest you don't read them too much now because you'll have to read them again later. Ahh... maybe not, read them anyways; you'll have to read them 100 times to get it all.

Part #1 - Selectors and Descriptors.

The Beginning


Since the 80286, Intel has introduced 2 new processor modes. These are simply PMODE (added on the 80286) and V86 (added on the 80386). Under RMODE you can only access 1MB (1MB on 8086 systems; 286+ can also access another 64k of extended memory, called the HMA, thanks to the famous seg:off style). Although EMS cards did allow you to install another 32MB, this RAM could only be accessed about 64k at a time. Intel wanted to be able to access more memory directly, and they also wanted something that could multi-task and protect each task in the new environments. Hence PMODE was born, although on the 286 it was only a 16bit PMODE which was simply a stepping stone to 32bit PMODE.

The 80386 Advancements

I feel it is necessary to describe the new features of the 80386 here. Even if you are already familiar with them I suggest you quickly read this anyways.
32bit Registers: The 80386 introduced new 32bit registers. AX, BX, CX, DX, BP, IP, SI, DI, SP and the flags have all been extended to 32bits wide. The names of these new registers are EAX, EBX, ECX, EDX, EBP, EIP, ESI, EDI, ESP and EFLAGS. There are also 2 new segment registers called FS and GS, and they work exactly like ES in that you can use them however you like.
The opcodes in the 80x86 have not changed to support these new 32bit regs. When operating in a 16bit code segment, a new prefix is added to the instruction to access the new 32bit registers. If the data being transferred is 32bits then 066h is prefixed. If the address being used is 32bit then 067h is prefixed. If both the operand and the address are 32bit then both are prefixed (067h is shown before 066h here, but the CPU accepts them in either order). When operating in a 32bit code segment these prefixes are used to access 16bit values. Confused? Here's an example:
If "mov ax,bx" was compiled by a 16bit compiler it would look like this if it was in a 16bit code segment:
  89,D8  mov ax,bx 

And if the EXACT same code was viewed in a 32bit code segment it would be:
  89,D8  mov eax,ebx 

And if in the 16bit code segment a 066h was prefixed it would look like:
  66,89,D8  mov eax,ebx 

And if in the 32bit segment that would look like:
  66,89,D8  mov ax,bx 

For 8bit data transfers nothing changes between 16 and 32bit code segments.
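As a rough sketch of the prefix rule above (written in C so it can be checked on any machine; the names insn_sizes and scan_size_prefixes are made up for this illustration and are not part of the tutorial source), the effective operand and address sizes follow from the code segment's default size and the presence of 066h/067h:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper type: what the size prefixes of one instruction select. */
typedef struct {
    int    operand_bits;  /* 16 or 32 */
    int    address_bits;  /* 16 or 32 */
    size_t prefix_len;    /* number of 66h/67h bytes consumed */
} insn_sizes;

/* default_bits is 16 or 32 and comes from the D bit of the code segment.
   Only 66h and 67h are handled here; segment overrides, LOCK and REP
   prefixes are left out to keep the sketch short. */
static insn_sizes scan_size_prefixes(const uint8_t *code, size_t len,
                                     int default_bits)
{
    insn_sizes s = { default_bits, default_bits, 0 };
    int other = (default_bits == 16) ? 32 : 16;

    while (s.prefix_len < len) {
        uint8_t b = code[s.prefix_len];
        if (b == 0x66)
            s.operand_bits = other;   /* operand-size override */
        else if (b == 0x67)
            s.address_bits = other;   /* address-size override */
        else
            break;                    /* first real opcode byte */
        s.prefix_len++;
    }
    return s;
}
```

Feeding it the bytes 66,89,D8 with default_bits=16 reports a 32bit operand (mov eax,ebx), and with default_bits=32 it reports a 16bit operand (mov ax,bx), matching the four cases listed above.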
32bit Flags: New mnemonics were created for using the 32bit flags. PUSHFD and POPFD will push/pop the 32bit flags.
Here is the EFLAGS:
  bit : description
   0     CF - carry flag
   2     PF - parity flag
   4     AF - auxiliary carry flag
   6     ZF - zero flag
   7     SF - sign flag
   8     TF - trap flag  (discussed in debugging)
   9     IF - enable IRQs
   10    DF - direction flag
   11    OF - overflow flag
   12-13 IOPL - IO privilege level  (discussed in protection)
   14    NT - nested task  (discussed in multi-tasking)
   16    RF - resume flag  (discussed in debugging)
   17    VM - V86 Mode flag
   18    AC - enable alignment check (only in PL3)
All other bits not listed are reserved.
Pushing Segment Regs: In 32bit PMODE when you push or pop a segment register a DWORD is moved to/from the stack. This is done to keep DWORD alignment; the other word pushed/popped along with the segment register is usually zero and is ignored.
In 16bit PMODE the usual WORD is pushed/popped for segment registers.
When in PMODE you must use IRETD (not IRET).
SIB: A new addressing mode called SIB (scaled index base) which is VERY versatile has been added. Now inside your square brackets you can have the following:
 [ reg + reg * #1 + #2] 

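Where reg can be any of the 32bit general registers (eax,ebx,ecx,edx,esi,edi,ebp,esp), except that esp can not be the scaled register, and #1 can be 1,2,4 or 8. And #2 can be an 8 or 32bit displacement. Either register may be omitted and/or the #2. (16bit addressing keeps the old 8086 forms built from bx, bp, si and di.)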
The new addressing mode (SIB) and the new registers (32bit) can be used in RMODE, PMODE and in V86 mode.

Selectors

When in PMODE everything changes: the segment:offset was finally scrapped and the new selector:offset now exists.
The segment registers themselves have not been changed (still 16bit) but their implementation is totally new. The segment registers now hold what is called a selector. This selector is used to index into system tables called the GDT (global descriptor table) or the LDT (local descriptor table). These tables hold descriptors which describe the segment's attributes, location and such. This 16bit selector is split up into 3 parts.
 bit # : description
  15-3 = offset into GDT or LDT
  2    = set if descriptor is in LDT (else it's in GDT)
  1-0  = RPL (requested privilege level)
The offset (bits 15-3) selects a descriptor in one of the new system tables. Because each descriptor is 8 bytes, the selector with bits 2-0 cleared is the byte offset of the descriptor within the table. Bits 2-0 describe special attributes of the selector. If bit 2 is set then the descriptor is found in the LDT else it is in the GDT. Bits 1-0 are the RPL which I will describe later in protection.
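To make the three selector fields concrete, here is a small C sketch that splits a selector apart and builds one back up. The names selector_parts, split_selector and make_selector are invented for this example only.

```c
#include <stdint.h>

/* Hypothetical view of the three fields packed into a 16bit selector. */
typedef struct {
    uint16_t index;     /* bits 15-3: descriptor number within the table */
    int      in_ldt;    /* bit 2: 1 = LDT, 0 = GDT */
    int      rpl;       /* bits 1-0: requested privilege level */
    uint16_t byte_off;  /* index * 8 = byte offset of the descriptor */
} selector_parts;

static selector_parts split_selector(uint16_t sel)
{
    selector_parts p;
    p.index    = (uint16_t)(sel >> 3);
    p.in_ldt   = (sel >> 2) & 1;
    p.rpl      = sel & 3;
    p.byte_off = (uint16_t)(sel & 0xFFF8);  /* same as index * 8 */
    return p;
}

static uint16_t make_selector(uint16_t index, int in_ldt, int rpl)
{
    return (uint16_t)((index << 3) | ((in_ldt & 1) << 2) | (rpl & 3));
}
```

For example, selector 0008h is descriptor #1 in the GDT with RPL 0, and selector 001Fh is descriptor #3 in the LDT with RPL 3.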

Descriptors

The GDT and LDT can hold up to 8192 descriptors each (due to selector bits 15-3). The 1st descriptor in the GDT is reserved and can never be used for anything (it's NULL, why they did that I don't know). Each descriptor in the tables is laid out as follows:
descriptor struct
  limit_lo dw ?    ;limit bits 15-0
  base_lo dw ?     ;base bits 15-0
  base_mid db ?    ;base bits 23-16
  type1 db ?       ;type and access rights of the segment
  limit_hi db ?    ;limit bits 19-16 and other info
  base_hi db ?     ;base bits 31-24
descriptor ends
When all the BASE bits are put together they form a 32bit value. This represents the beginning of the segment in memory. With a 32bit address you can access up to 4gigs of RAM (although on the 286, BASE bits 31-24 did not exist yet because the 286 only had a 24bit address bus so these bits were not used, but since we are not building a 286 DOS extender, who cares). The limit_hi field is broken up as follows:
  bit #  :  Name : Description
   0-3      limit   limit bits 19-16
   4         AVL    available for programmer use (not used by CPU)
   5         -      reserved (must be 0)
   6         D      default size of segment
   7         G      granularity
Now the limit in total is 20bits (which can access up to 1MB) but if the G bit is set then the limit is counted in units of 4096 bytes (4k), which raises the maximum to four gigabytes (4 GB). The D bit defines the default size of the code segment: if set it's 32bit else it's 16bit.

The type1 of the descriptor struct defines more attributes of the segment.
  bit # : Name : Description
   4       S     Defines what type of descriptor this is
 If S=1 then it is a standard code/data segment
 and the rest of the struct is as follows:
   3       T     defines if this is a code or data segment
 If T=0 it is a standard data segment
 and the rest of the struct is as follows:  
   0       A     Accessed
   1       W     writable
   2       E     Expand down
 else if T=1 it is a standard code segment
 and the rest of the struct is as follows:
   0       A     accessed
   1       R     readable
   2       C     conforming
 Now if S=0 it is a special system descriptor
 and the rest of the struct is as follows:
   0-3    TYPE   Defines what type it is (LDT, call, INT or trap gate, etc.)
 The rest does not depend on T or S:
   5-6     DPL   descriptor privilege level
   7       P     present
Well that's very complex! Here is what most fields mean:
S (bit4) : Defines if it is a code/data segment or a special descriptor.
T (bit3) : Defines if it is a code or data segment.
A (bit0) : This bit is set by the CPU whenever this segment is accessed by a program.
R (bit1) : This bit defines if code segments may be read from; this will permit the following (mov ax,cs:[ebx])
W (bit1) : This bit defines if data segments may be written to.
C (bit2) : This will be described later in protection.
E (bit2) : This defines if the data segment expands down.
DPL (bits5-6) : descriptor privilege level
P (bit7) : If set the descriptor is present (valid), else only the descriptor's type1 byte must hold info and the rest is ignored by the CPU. Loading a selector whose descriptor is not present is invalid.
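Putting the descriptor struct, the limit_hi bits and the type1 bits together, a descriptor can be packed byte by byte. The following C sketch is only an illustration; the DESC_* names and the helper functions are assumptions made for this example, not names from the tutorial source.

```c
#include <stdint.h>

/* type1 (access byte) bits, named here for illustration only */
#define DESC_A      0x01              /* accessed */
#define DESC_RW     0x02              /* R for code, W for data */
#define DESC_CE     0x04              /* C for code, E for data */
#define DESC_CODE   0x08              /* T: 1 = code, 0 = data */
#define DESC_S      0x10              /* 1 = code/data, 0 = system */
#define DESC_DPL(n) (((n) & 3) << 5)  /* descriptor privilege level */
#define DESC_P      0x80              /* present */
/* upper nibble of limit_hi */
#define DESC_AVL    0x10
#define DESC_D      0x40              /* default size: 1 = 32bit */
#define DESC_G      0x80              /* granularity: 1 = 4k units */

/* Pack an 8 byte descriptor in the field order of the struct above. */
static void pack_descriptor(uint8_t out[8], uint32_t base, uint32_t limit20,
                            uint8_t type1, uint8_t flags)
{
    out[0] = (uint8_t)(limit20 & 0xFF);          /* limit_lo */
    out[1] = (uint8_t)((limit20 >> 8) & 0xFF);
    out[2] = (uint8_t)(base & 0xFF);             /* base_lo  */
    out[3] = (uint8_t)((base >> 8) & 0xFF);
    out[4] = (uint8_t)((base >> 16) & 0xFF);     /* base_mid */
    out[5] = type1;                              /* type1    */
    out[6] = (uint8_t)(((limit20 >> 16) & 0x0F) | (flags & 0xF0));
    out[7] = (uint8_t)((base >> 24) & 0xFF);     /* base_hi  */
}

static uint32_t descriptor_base(const uint8_t d[8])
{
    return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
           ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}

/* Highest valid offset of an expand-up segment, with G applied. */
static uint32_t descriptor_limit(const uint8_t d[8])
{
    uint32_t limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
                     ((uint32_t)(d[6] & 0x0F) << 16);
    if (d[6] & DESC_G)
        limit = (limit << 12) | 0xFFF;   /* 4k units, low 12 bits all ones */
    return limit;
}
```

A flat 4 GB readable code segment (base 0, limit 0fffffh, G and D set, type1 = 9Ah) packs to the bytes FF FF 00 00 00 9A CF 00.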

The TYPE field of the descriptor (bits 0-3) defines what type of descriptor it is, possible values are:
  0 =  reserved
  1 = avail. 286 TSS
  2 = LDT
  3 = busy 286 TSS
  4 = 286 call gate
  5 = 286/386 task gate
  6 = 286 INT gate
  7 = 286 trap gate
  8 =  reserved
  9 = avail. 3/486 TSS
  A =  reserved
  B = busy 3/486 TSS
  C = 3/486 call gate
  D =  reserved
  E = 3/486 INT gate
  F = 3/486 trap gate
Notes:
The LIMIT of a segment defines the maximum value an offset may be when accessing RAM (ie: mov al,[1000h] when LIMIT=0999h is invalid)
The BASE of a segment defines where in memory the segment starts (ie: mov al,[1000h] when the BASE=10000h will access RAM at location 10000h+1000h=11000h)
Data segments can always be read from.
SS must be loaded with a data selector that must be writable.
CS can only be loaded with a code selector.
DS,ES,FS,GS can be loaded with a data selector or a readable code selector.
If E (expand down) is set then the interpretation of LIMIT changes. Now LIMIT defines the lowest address that can be accessed within the segment (valid offsets start at LIMIT+1). Because stack segments grow towards lower addresses the LIMIT can be decreased to give more room to your stack.
The LDT is the local descriptor table. The LDT is exactly the same as the GDT except that each task (or program) has its own LDT and the GDT is shared by the whole system. The LDT is located by an entry in the GDT that describes where this LDT is. The LDT can contain almost anything the GDT can (TSS and LDT descriptors must stay in the GDT). So if a selector you need to use is in the LDT then bit 2 of the selector is set and the descriptor comes from the LDT.
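The LIMIT rules in the notes above, including the expand down case, can be sketched as a single check in C. The function name and parameters are made up for this example. One detail not covered in the notes is assumed here: for expand down segments the upper bound is 0ffffh, or 0ffffffffh when the D bit of the data descriptor is set.

```c
#include <stdint.h>

/* Returns 1 if an access of `size` bytes at `offset` is inside the segment.
   limit       : LIMIT with the G bit already applied
   expand_down : the E bit of a data segment
   big         : the D bit of the descriptor (assumption: it only matters
                 for the upper bound of expand-down segments) */
static int offset_in_segment(uint32_t offset, uint32_t size, uint32_t limit,
                             int expand_down, int big)
{
    uint32_t last, upper;

    if (size == 0)
        return 0;
    last = offset + (size - 1);
    if (last < offset)              /* the access wrapped past 4 GB */
        return 0;
    if (!expand_down)
        return last <= limit;       /* normal segment: 0..LIMIT is valid */
    upper = big ? 0xFFFFFFFFu : 0xFFFFu;
    return offset > limit && last <= upper;  /* valid: LIMIT+1..upper */
}
```

With LIMIT=0999h a byte read at offset 1000h fails the check, which is the case that would raise an exception on the real CPU.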

If any of the rules above are broken then special INTs occur (pretty much the same way as an IRQ is triggered). These are called exceptions and there are many different ones depending on what rule was broken.

Part #2 - Interrupts.
In PMODE the old RMODE table of 256 INTs at 0:0 is no longer used (gee, they scrapped pretty much EVERYTHING they did before). Now we have a new table of 256 INTs, the IDT, which is set up much like the GDT. It contains 8 byte descriptors just like the GDT. The only difference is that it can only hold certain types of descriptors.
The PMODE INT Table can only hold INT gates, trap gates and task gates.
The first 32 INTs are reserved and are called exceptions. These exceptions are called by the CPU to signal certain events, such as invalid code running, access beyond the LIMIT or anything else that is not valid.
The exceptions are:
  int : description
   0    divide by zero
   1    debug exception
   2    NMI (non-maskable interrupt)
   3    breakpoint
   4    overflow
   5    bounds check
   6    invalid code
   7    device not available
   8    double fault
   9    80486+ = reserved (80386- = co-pro segment overrun)
  10    invalid TSS (task state segment)
  11    segment not present
  12    stack exception
  13    The General Protection Fault
  14    page fault
  15    reserved
  16    FPU error
  17    alignment check
  18-31 reserved exceptions
  32-255  available to software
But the most wonderful thing about these exceptions is that the normal INTs are the same ones. That is to say that IRQ#5, which is INT 13, is the same INT as the General Protection Fault. Now the PIC (programmable interrupt controller) can be remapped to set up the IRQs in different locations within the INT table. Older DOS extenders and OS's used to do this while in PMODE but it was considered too slow. Each time the DOS extender had to return to RMODE for whatever reason the PICs had to be reprogrammed back to the normal style (ie: IRQ#0 = INT#8 and IRQ#8 = INT #70h) and then once control was returned they had to be reprogrammed again to return to PMODE. This is no longer done by recent DOS extenders and OS's (At least I hope they don't do this anymore, I know that Desqview did).
So how does an INT handler determine if the event that triggered an INT was from an IRQ, from the user calling the INT or from an exception?
Well it's quite simple. First you look at the PICs to determine if they are waiting for an IRQ to be serviced. If so you branch on to the IRQ handler. Then you check the code segment and see if the last operation executed was INT x (where x is your current INT handler). If yes then service that INT call for what it is (ie: INT 10h is the same as the FPU error exception and if the program executed INT 10h to call the video services you should branch to the INT 10h handler). If these 2 cases were false then the INT must have been caused by an exception.
Now the problem with this is that every IRQ now has additional time taken away from it before it gets CPU control. This latency sucks and I consider this a design flaw! Some DOS extenders allow you to set up an IRQ handler that will get direct control of the INT, which means if an exception does occur it will be erroneously directed to the IRQ handler and can cause the CPU to trigger a double fault and could cause the CPU to crash.
Here is a small explanation of each exception:
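The three-way decision described above can be sketched as plain logic in C. This is only an illustration under stated assumptions: the caller has already read the PIC's in-service register and fetched the two bytes sitting just before the return CS:EIP, and the names int_source and classify_int are made up here. Only the two byte "INT n" form (0CDh, n) is checked, so the single byte INT 3 is not covered.

```c
#include <stdint.h>

typedef enum { SRC_IRQ, SRC_SOFT_INT, SRC_EXCEPTION } int_source;

/* vector     : INT number whose handler got control
   irq_base   : first vector used by this PIC (8 for the master by default)
   pic_isr    : in-service register of that PIC, read by the caller
   before_ret : the 2 code bytes just before the return address */
static int_source classify_int(uint8_t vector, uint8_t irq_base,
                               uint8_t pic_isr, const uint8_t before_ret[2])
{
    /* 1) Is the PIC servicing the IRQ that maps onto this vector? */
    if (vector >= irq_base && vector < irq_base + 8) {
        if (pic_isr & (1u << (vector - irq_base)))
            return SRC_IRQ;
    }
    /* 2) Was the last instruction "INT vector" (0CDh, vector)? */
    if (before_ret[0] == 0xCD && before_ret[1] == vector)
        return SRC_SOFT_INT;
    /* 3) Neither, so it must have been an exception. */
    return SRC_EXCEPTION;
}
```

Step 2 is a heuristic: for a fault the saved CS:EIP points at the faulting instruction itself, so the bytes before it could by chance look like an INT instruction.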
Exc 0 : Divide by zero : Called whenever a div/idiv instruction attempts to divide by 0
Exc 1 : Debug exception : I'll describe this in debugging later.
Exc 2 : NMI : This is called by any hardware device that has failed and is essential for the system's operation (ie: parity fail on RAM)
Exc 3 : breakpoint : Called whenever INT 3 is executed, which has a special opcode form (0cch). This single byte INT 3 opcode can be easily inserted into code for debugging.
Exc 4 : overflow : Called when the INTO instruction is executed and the overflow flag was set.
Exc 5 : bounds check : Called when the BOUND instruction is executed and the operand was outside the given bounds.
Exc 6 : invalid opcode : Called whenever the next instruction at CS:(E)IP is not a valid 80x86 instruction. This could also be called if the instruction is greater than 15bytes long which can occur if too many prefixes are used.
Exc 7 : device not avail. : this is generally used to optimize usage of the FPU and will be discussed later in the FPU section.
Exc 8 : double fault : this is generated when another exception occurs while the system is processing an exception. This naturally can happen but the system must be very careful now because a third exception will cause the CPU to reset.
Exc 9 : On the 80486 this is never generated. On the 80386 it signalled a co-pro segment overrun.
Exc 10 : invalid TSS : this is generated whenever an invalid Task is loaded, I'll discuss this in multi-tasking.
Exc 11 : segment not present : this is generated whenever a descriptor is used that has the P bit (present bit) not set. This means the segment is not present; the faulting instruction can be restarted. This is usually used in virtual memory management.
Exc 12 : stack exception : this can occur in 2 ways. a) when SS is loaded with a descriptor that is not present but is otherwise valid. b) when the stack grows outside of the LIMIT or an instruction tries to access an operand outside the LIMIT of the stack segment. (ie: mov al,ss:[ebx])
Exc 13 : The General Protection Fault : This is what is known as the catch everything else fault. If something invalid happens that does not fit into the category of any of the other exceptions it comes here instead.
Exc 14 : Page fault : this is used in the paging system which will be discussed later.
Exc 15 : reserved.
Exc 16 : FPU error : Occurs when the FPU detects an error in the current FPU instruction.
Exc 17 : Alignment check (80486 only) : the 486 added a new feature that forces the programmer to align data. Here's a little discussion, but you can ignore it since you'll never use alignment checking anyways unless you are crazy. Anyways...if you know anything, you'll know that most RAM's access time is around 70ns which is VERY slow. This slowness has been kind of a bottleneck for the 80x86 processor's speed ever since the 8086. And whenever RAM is accessed that is not aligned on certain addresses it takes 2 reads or writes to perform the operation, which is SO slow. One read operation on a Pentium 200MHz with 70ns RAM takes about 14 cycles (I think). And most operations on the Pentium run within 1 cycle. So you can see if the operation was not aligned it would require another 14 cycles (but it's not like the CPU is twiddling its thumbs waiting, it has other things it can do like calculating paging, linear=>physical, DMA, IRQs, etc.) So alignment check basically causes an exc whenever data is accessed that is not aligned. But you know, I could be all wrong about why this is used.

Anyways...when any of these exceptions is triggered the CS:EIP that is on the stack will point to the instruction that caused the problem (or not, depending upon what exc. it is). Some exc handlers may return to the problem maker after fixing some condition (ie: a virtual memory manager reads swapped out memory from the hard drive to RAM and then continues).
Whenever an INT is jumped to through a 386 gate, the CS, EIP and Flags that are saved on the stack are always 32bit regardless of whether the caller of the INT was in a 16 or 32bit code segment.

Error Codes

When most of these exceptions are called an error code is pushed onto the stack before the exception handler is started. This error code is pushed if the exception relates to a specific segment. This code looks much like a selector except the RPL field is filled with other info.
The error code looks like this:
  bits : description
  31-16   undefined
  15-3    offset into GDT or LDT
  2       1=LDT 0=GDT
  1       set if offset is into IDT instead of the GDT/LDT
  0       ext bit.  set if the exception was caused by something external
          to the program
Notes:
Exc 6 (invalid code) does not push an error code.
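As a small illustration of the error code layout above, this C sketch pulls the fields apart. The names exc_error and decode_error_code are invented for the example, and it only applies to the selector style error code shown in the table.

```c
#include <stdint.h>

/* Hypothetical decoded form of a selector-style exception error code. */
typedef struct {
    uint16_t index;     /* bits 15-3: descriptor (or IDT entry) number */
    int      in_ldt;    /* bit 2: 1 = LDT, 0 = GDT (only if in_idt == 0) */
    int      in_idt;    /* bit 1: index refers to the IDT */
    int      external;  /* bit 0: caused by something external to the program */
} exc_error;

static exc_error decode_error_code(uint32_t code)
{
    exc_error e;
    e.external = (int)(code & 1);
    e.in_idt   = (int)((code >> 1) & 1);
    e.in_ldt   = (int)((code >> 2) & 1);
    e.index    = (uint16_t)((code & 0xFFFF) >> 3);  /* bits 31-16 undefined */
    return e;
}
```

An error code of 006Ah, for instance, decodes to entry 13 of the IDT.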

INT, TRAP and TASK Gates

These 3 special system descriptors are the only ones allowed in the IDT.
The INT Gate:
int_gate struct
  offset_lo dw ?     ;offset bits 0-15
  selector dw ?
  r1 db 0            ;bits 0-4 are ignored, bits 5-7 must be 0
  t1 db 0eh          ;bits 0-4 must be 01110b, bits 5-6 is the DPL, bit 7 is
                     ; the P (present bit)
  offset_hi dw ?     ;offset bits 16-31
int_gate ends
The Task Gate:
task_gate struct
  i1 dw ?            ;ignored
  selector dw ?
  i2 db ?            ;ignored
  t1 db 5            ;bits 0-4 must be 00101b, bits 5-6 is the DPL, bit 7 is
                     ; the P (present bit)
  i3 dw ?            ;ignored
task_gate ends
The Trap Gate:
trap_gate struct
  offset_lo dw ?     ;offset bits 0-15
  selector dw ?
  r1 db 0            ;bits 0-4 are ignored, bits 5-7 must be 0
  t1 db 0fh          ;bits 0-4 must be 01111b, bits 5-6 is the DPL, bit 7 is
                     ; the P (present bit)
  offset_hi dw ?     ;offset bits 16-31
trap_gate ends
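The three gate structs share one byte layout, so a single routine can fill any of them. Here is a hedged C sketch; GATE_TASK, GATE_INT32, GATE_TRAP32 and pack_gate are names chosen for this example, not taken from the tutorial source.

```c
#include <stdint.h>

/* Values for bits 0-4 of the t1 byte (names are for illustration only). */
#define GATE_TASK   0x05   /* 00101b */
#define GATE_INT32  0x0E   /* 01110b */
#define GATE_TRAP32 0x0F   /* 01111b */

/* Pack an 8 byte IDT entry. For a task gate the offset is ignored by the
   CPU, so pass 0. */
static void pack_gate(uint8_t out[8], uint16_t selector, uint32_t offset,
                      uint8_t type, int dpl, int present)
{
    out[0] = (uint8_t)(offset & 0xFF);           /* offset_lo */
    out[1] = (uint8_t)((offset >> 8) & 0xFF);
    out[2] = (uint8_t)(selector & 0xFF);         /* selector  */
    out[3] = (uint8_t)(selector >> 8);
    out[4] = 0;                                  /* r1        */
    out[5] = (uint8_t)((present ? 0x80 : 0) |    /* t1: P     */
                       ((dpl & 3) << 5) |        /*     DPL   */
                       (type & 0x1F));           /*     type  */
    out[6] = (uint8_t)((offset >> 16) & 0xFF);   /* offset_hi */
    out[7] = (uint8_t)((offset >> 24) & 0xFF);
}
```

A present INT gate with DPL 0 gets t1 = 8Eh, and a present trap gate with DPL 3 gets t1 = EFh.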
Here are some terms you need to understand:
FAULT - this is an exception that is generated before or while an instruction is being executed. In either case the CS:EIP that is on the stack will point to the instruction that caused the fault.
TRAP - this is an exception that is generated immediately after the instruction that caused the exception. So the CS:EIP on the stack will point to the instruction after the instruction that caused the exception, which could be the target of a JMP, making it impossible to know exactly where the exception took place.
ABORT - these are never restartable (you can not go back to the trouble maker) nor is the precise location of the exception known, these are used to report severe problems.

An INT gate or TRAP gate when triggered will execute in the current task (program), so that means they will use the same system resources. The selector must point to a valid code descriptor in the GDT or LDT and the offset is loaded into EIP to begin execution of the INT handler. The only difference between an INT gate and a TRAP gate is the IF (interrupt flag). With INT gates the IF is reset (cleared) when the handler is started; a TRAP gate does not reset it. When either of these two is called the CS, EIP and Eflags are all saved on the stack as 32bit DWORDs, regardless of what the current code segment size is, so you must use an IRETD to return.
When you call a PROC that is FAR in a 16bit segment the CS and IP (both as WORDs) are saved on the stack. In 32bit mode the CS and EIP are both stored as DWORDs on the stack when you call a FAR PROC. So RETF has two different meanings depending on which mode you are currently operating in.
Remember that this mode (16 or 32bit mode) is determined by the D bit (default segment size) in the CS descriptor. The D bit is not used in the DS, ES, FS or GS descriptors. The D bit in the SS descriptor defines if ESP or SP is used.

Part #3 - Control Registers.

Control Registers (CRx)


The Control Registers are 4 very important registers new to the 80386. They are called CR0,CR1,CR2 and CR3. Each is 32bits wide and can only be read or written through a 32bit general register. (ie: mov cr0,eax)
The CR0 Register:
 bit : name : description
  0     PE     PMODE enabled
  1     MP     Math unit Present (FPU)
  2     EM     Emulate FPU
  3     TS     Task switch
  4     ET     FPU extension type (80386 only)
  5     NE     FPU error enabling (80486 only)
  6-15          reserved
  16    WP     write-protect (80486 only)
  17            reserved
  18    AM     alignment mask
  19-28         reserved
  29    NW     cache not-write thru (80486 only)
  30    CD     cache disable (80486 only)
  31    PG     paging enable
Anything that is 80486 only was reserved on the 80386. The most important bit is the PE. When this bit is clear the CPU is in RMODE which is the default mode it's in after the CPU is reset (to stay compatible). When this bit is set you enter into PMODE.
MP and EM will be discussed in FPU section.
TS is related to multi-tasking and the FPU.
ET is set if an 80387 is used on the 80386, else (clear) an 80287 is used on the 80386.
NE has to deal with the FPU.
WP deals with paging: when set, even PL0 code can not write to pages that are marked read-only.
AM enables the stupid alignment check exception.
PG enables the paging system.
NW and CD are used to control the caching unit in the 80486+. CD when set disables the cache, and NW when set will not allow writes to cached memory to go to the main memory. Here's a table:
  CD  NW  Condition
  1   1   Caching is disabled, but anything that is currently cached
          will still be cached so you should flush the cache before continuing.
          Use INVD or WBINVD to do so.  If you do not flush the cache
          then whatever is being cached will be very fast and that part
          of the main RAM will not be used anymore.
  1   0   Caching is disabled, but anything that is currently cached
          will still be cached except that writes to the cache will
          also go to RAM.  You can flush the cache here too, to
          clear the cache (or not).
  0   1   Reserved (if set like this it will cause exc 13 with error code=0)
  0   0   Caching fully enabled.
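The CR0 bit positions and the CD/NW table can be written down as constants plus a small decoder. This is a sketch in C; the CR0_* and cache_mode names are made up for the example, and reading or writing the real CR0 (mov eax,cr0 / mov cr0,eax) is left to the caller since it needs PL0.

```c
#include <stdint.h>

/* CR0 bit masks, following the table above. */
#define CR0_PE 0x00000001u  /* bit 0  */
#define CR0_MP 0x00000002u  /* bit 1  */
#define CR0_EM 0x00000004u  /* bit 2  */
#define CR0_TS 0x00000008u  /* bit 3  */
#define CR0_ET 0x00000010u  /* bit 4  */
#define CR0_NE 0x00000020u  /* bit 5  */
#define CR0_WP 0x00010000u  /* bit 16 */
#define CR0_AM 0x00040000u  /* bit 18 */
#define CR0_NW 0x20000000u  /* bit 29 */
#define CR0_CD 0x40000000u  /* bit 30 */
#define CR0_PG 0x80000000u  /* bit 31 */

typedef enum {
    CACHE_ENABLED,                 /* CD=0 NW=0 */
    CACHE_OFF_WRITES_REACH_RAM,    /* CD=1 NW=0 */
    CACHE_OFF_WRITES_STAY_CACHED,  /* CD=1 NW=1 */
    CACHE_INVALID                  /* CD=0 NW=1: would raise exc 13 */
} cache_mode;

/* Decode the CD/NW pair of a CR0 value the caller has already read. */
static cache_mode cr0_cache_mode(uint32_t cr0)
{
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;

    if (cd)
        return nw ? CACHE_OFF_WRITES_STAY_CACHED : CACHE_OFF_WRITES_REACH_RAM;
    return nw ? CACHE_INVALID : CACHE_ENABLED;
}
```

Checking for CACHE_INVALID before writing a new value to CR0 avoids the exc 13 case in the table.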

The CR1 Register is reserved!
The CR2 and CR3 Registers are used in the paging system.

GDT,LDT and IDT Registers


All three of these tables can reside anywhere in memory, so each has a register defining where they are in memory. The GDT and IDT both have such registers which contain 2 parts: a 32bit address that defines the BASE of the table and a 16bit word defining the LIMIT of the table. That's 48bits which can only be loaded/saved with a memory operand (48bits = FWORD). The LDT register (LDTR) simply holds a selector which must point to a descriptor in the GDT defined as being an LDT, which will define where the LDT is in memory. The instructions to load/save these are:
 lgdt mem48  - load GDTR (GDT register)
 sgdt mem48  - save GDTR

 lldt mem16/reg16 - load LDTR (LDT register)
 sldt mem16/reg16 - save LDTR

 lidt mem48  - load IDTR (IDT register)
 sidt mem48  - save IDTR
The BASE address is a linear address which means it is not a seg:off type address, if our GDT was in conventional memory (below 640k) then the base would = seg*10h+off which is the linear address.
Loading the GDT might look like this:
.386p
.data
  gdt label fword
  gdt_limit dw ?
  gdt_base dd ?
.code
  lgdt gdt
With the GDT and IDT the LIMIT defines what descriptors are allowed to be used. The actual size of the table is the LIMIT in the register plus one. Therefore a limit of 7 would make only the 1st descriptor valid since each descriptor is 8 bytes. A limit of 0ffffh would allow the use of all descriptors but in this case we would have to set them all up which would be a waste if we didn't need them all.
So the limit should = ((# descriptors) * 8) - 1
Don't forget that the 1st descriptor must be NULL (just don't use it). But note that it is valid to load the NULL selector into DS, ES, FS or GS. A segment register that holds NULL can not be used to access anything, though (so CS and SS can never hold NULL). It may be useful to load some segment regs with NULL to catch program bugs, etc.
The LDT's LIMIT and BASE information is contained within the LDT's descriptor that is in the GDT. Refer back to Part #1 and look at the different system descriptors there are; one is called the LDT.
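The limit formula and the seg*10h+off conversion above are easy to get wrong by one, so here is a small C sketch of both, plus the 6 byte image that lgdt/lidt expect in memory. The function names are made up for this illustration.

```c
#include <stdint.h>

/* LIMIT for a table holding `count` descriptors (count must be 1..8192). */
static uint16_t table_limit(unsigned count)
{
    return (uint16_t)(count * 8 - 1);
}

/* Linear address of a real mode seg:off pair (no A20 wrap applied). */
static uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Build the 48bit memory operand: 16bit LIMIT first, then 32bit BASE. */
static void pack_table_register(uint8_t out[6], uint16_t limit, uint32_t base)
{
    out[0] = (uint8_t)(limit & 0xFF);
    out[1] = (uint8_t)(limit >> 8);
    out[2] = (uint8_t)(base & 0xFF);
    out[3] = (uint8_t)((base >> 8) & 0xFF);
    out[4] = (uint8_t)((base >> 16) & 0xFF);
    out[5] = (uint8_t)((base >> 24) & 0xFF);
}
```

For example, a GDT of 5 descriptors stored at 1234h:0010h gets LIMIT = 27h and BASE = 12350h.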

Part #4 - 1st example, INT 15h, A20 Gate.

Rmode to Pmode


Ok, it's time to create our first little program that will simply get us into PMODE from RMODE and do a few little tricks. The program is heavily documented, but if you still don't understand something don't worry, it will all make sense as we move on. Here are a few things you may want to know. We want to get to 32bit PMODE but it's difficult to do so in one jump from RMODE, so I do it in 2 jumps, one to 16bit PMODE and then another to 32bit PMODE. Along the way lots of the 80386 processor's registers are set up (GDT,IDT,LDT,etc.)

Click here and get stage#1 to download the 1st example.

You may save the program to disk and compile/run it if you like. All these examples should compile with MASM v6.11 or TASM v4.1.


Here's an explanation of the program concepts:

INT 15h memory allocation


This technique for allocating raw extended memory is old. When you call INT 15h (ah=88h) ax is returned with how much memory (in KB) is free directly above 1MB. If you want to, for example, alloc 256k of RAM you would install your own INT 15h handler that would report 256k less RAM than there really is. This is called top down memory allocation because the RAM you alloc is directly above the block of RAM that is now free. To alloc all of the RAM simply make your INT 15h handler return 0 in ax.
This method of RAM allocation was replaced by himem.sys (XMS - extended memory specification) which will alloc all memory from INT 15h, so if this is installed then this program will not be able to alloc any memory (don't worry, we'll fix that soon enough). Note: this tutorial source does not do anything with the RAM, it just gets it and that's it.
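The bookkeeping behind top down allocation is just subtraction. This C sketch models it without touching any interrupt vectors; ext_mem_hook, hook_int15_88 and hook_alloc_top are hypothetical names, and the real hook in a DOS extender would live in an INT 15h handler rather than in C functions.

```c
#include <stdint.h>

/* Hypothetical state kept by a hooked INT 15h handler. */
typedef struct {
    uint16_t bios_kb;   /* what the previous INT 15h (ah=88h) reported */
    uint16_t taken_kb;  /* how much we have taken from the top */
} ext_mem_hook;

/* Value our handler would return in ax for ah=88h. */
static uint16_t hook_int15_88(const ext_mem_hook *h)
{
    return (uint16_t)(h->bios_kb - h->taken_kb);
}

/* Take `kb` KB from the top of free extended memory.
   Returns the linear address of the block, or 0 if it does not fit. */
static uint32_t hook_alloc_top(ext_mem_hook *h, uint16_t kb)
{
    uint16_t free_kb = hook_int15_88(h);

    if (kb == 0 || kb > free_kb)
        return 0;
    h->taken_kb = (uint16_t)(h->taken_kb + kb);
    /* extended memory starts at 1MB; our block sits just above what is
       still reported as free */
    return 0x100000u + (uint32_t)(free_kb - kb) * 1024u;
}
```

With 3072 KB reported by the BIOS, taking 256 KB yields a block at 3C0000h and later callers of INT 15h see only 2816 KB.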

A20 Gate Enabling


On the 8086 an address such as 0ffffh:0fh will access the very last byte of the 1MB RAM, which uses exactly 20 address lines. If you were to access 0ffffh:10h on the 8086 this would wrap around to 0h:0h. Some programs used this wrap around technique for bizarre reasons so later machines had to stay compatible and include this feature. So when you want to access RAM above 1MB you must enable the A20 gate which will disable this 1MB wrap around. Once this is done you can access RAM up to 4 GB (4 gigabytes = 2^32). Note that now if an address goes above 4 GB it will wrap around to the beginning of RAM, which is very useful. Let us say our program segment is at 0f0000000h and we wanted to access 0a0000h (video memory). We would use the following code to access VRAM:

  mov edi,0a0000h
  sub edi,0f0000000h
  mov [edi],al
This will write 1 byte thru our DS to access the VRAM. If you don't follow how just look at this:
DS => 0f0000000h
EDI = 0a0000h
(before sub)
DS:EDI => 0f0000000h + 0a0000h == 0f00a0000h (which is exactly 0f0000000h too much)
(after sub)
DS:EDI => 0f0000000h + 0a0000h - 0f0000000h == 0a0000h (this is correct)
But eventually our little DOS extender will setup a zero base segment making this not necessary.
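Both kinds of wrap around (the 1MB wrap with A20 disabled and the 4 GB wrap in PMODE) can be checked with a few lines of C arithmetic. The function names are invented for this sketch, and unsigned 32bit overflow stands in for the CPU's address wrap.

```c
#include <stdint.h>

/* Linear address of seg:off in RMODE. With the A20 gate disabled,
   address bit 20 is forced to 0, so 0ffffh:10h wraps to 0. */
static uint32_t rmode_linear(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t lin = ((uint32_t)seg << 4) + off;
    return a20_enabled ? lin : (lin & 0xFFFFFu);
}

/* Linear address of base+offset in PMODE; uint32_t arithmetic wraps
   modulo 2^32 just like the segment base addition does. */
static uint32_t pmode_linear(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}
```

With a segment base of 0f0000000h, the value left in edi by the sub above is 0a0000h - 0f0000000h = 100a0000h, and pmode_linear(0f0000000h, 100a0000h) comes out as 0a0000h.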

Part #5 - 2nd example, XMS, mode switching.

Stage #2

OK, in the next ASM source I've updated the program a lot. It now supports XMS, and all INTs and IRQs are redirected to RMODE.

XMS: When an XMS driver (such as himem.sys) is installed we must enable the A20 Gate and alloc our RAM through it. So you can see that after the XMS driver is detected we branch to another part of the program that will use the XMS driver. To get the XMS entry point you must call INT 2fh with ax=4310h. The returned ES:BX must be called FAR with the registers loaded with whatever is needed. See the XMS reference for functions (look in my files section). Everything that will ever need to be done is done in this next source.
Mode switching: In the source the PROC pm_2_rm_rx is used to switch from PMODE back to RMODE. It will set up an IDT that is compatible with RMODE (ie: 1k at 0:0). All segment registers must be loaded with 16bit selectors with 64k limits. This is because this info remains in the segment registers after the switch (each segment register holds this info in an invisible portion that you can only alter by loading it with a new descriptor). And the CS must also contain a 16bit 64k selector or else when you get back to RMODE things will not work as they should.
The PROC rm_2_pm_rx is used to get back to PMODE. Note that these PROCs only allow you to load the segment registers, EIP and ESP after the mode switch. All other registers, except EBP, will be destroyed.
Stacks are very hard to understand while switching modes. Because our SS:(E)SP is different in PMODE and RMODE it would be difficult to use one stack for RMODE and one for PMODE. So what happens is that when a mode switch is needed, a new small stack is used for the duration of the mode switch. The variables pm2rm_* define our PMODE to RMODE stacks. Switching from RMODE to PMODE right now is used only to return to PMODE after going to RMODE to redirect an INT (or IRQ). Later we will make things that will allow you to redir INTs (IRQs) from RMODE to PMODE. These things are called CALLBACKs.
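
To make the order of the steps a bit more concrete, here is a stripped-down sketch of a PMODE to RMODE switch. It is NOT the pm_2_rm_rx PROC from the source. It assumes paging is off, that it is already executing in a 16bit code segment with a 64k limit, and that DS can reach the variables when it starts. The selector value SEL_DATA16 and the names rm_idt, rm_code_seg, rm_stack_seg and rm_stack_ptr are hypothetical.

```asm
SEL_DATA16 equ 18h            ; hypothetical selector: 16bit data, 64k limit

rm_idt       dw 03ffh         ; IDT limit: 256 vectors * 4 bytes - 1
             dd 0             ; IDT base: linear address 0
rm_stack_seg dw ?             ; hypothetical: RMODE stack, filled at startup
rm_stack_ptr dw ?

pm_2_rm_sketch:
  cli                         ; no INTs while the tables are inconsistent
  lidt fword ptr [rm_idt]     ; RMODE compatible IDT (1k at 0:0)
  mov ax,SEL_DATA16
  mov ds,ax                   ; reload every data segment register so the
  mov es,ax                   ; invisible part ends up with a 64k limit
  mov fs,ax
  mov gs,ax
  mov ss,ax
  mov eax,cr0
  and al,0feh                 ; clear PE (bit 0 of CR0)
  mov cr0,eax
  db 0eah                     ; far JMP (16bit form) reloads CS for RMODE
  dw offset in_rmode
rm_code_seg dw 0              ; hypothetical: patched at startup with our RMODE CS
in_rmode:
  mov ax,cs
  mov ds,ax                   ; assumes code and data share one RMODE segment
  mov ss,[rm_stack_seg]       ; switch to the small RMODE stack
  mov sp,[rm_stack_ptr]
  ; ...continue in RMODE, re-enable INTs when ready
```

The far JMP is hand-encoded so that its segment word can be patched at run time, since the RMODE segment is not known at assembly time.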

Click here and get stage#2.

Part #6 - 3rd example, IRQ redir, DPMI funcs.

Stage #3

Stage 3 now redirects IRQs from RMODE to PMODE. And some DPMI funcs have been added.

DPMI: In the evolution of the PC four major servers exist: Raw, XMS, VCPI and DPMI. So far stage #3 only supports Raw and XMS, but eventually it will support all four. Now when your program executes under VCPI or DPMI you are not allowed to touch the GDT (well, it depends on how securely the system is set up). Basically you must share the system tables with other programs. Therefore standards have been created to allow programs to modify the system tables: you make requests to the VCPI or DPMI server. And because each server is different, it would be a very difficult task for the programmer to determine which server is present and then use the requests for that server. Instead we will create a DOS extender that will accept ALL requests as if it were a DPMI server, and if a real one does not exist the requests will be sent to VCPI or to XMS or whatever. The DPMI standard is used in this way because its calls are the most complete and it is the most accepted standard. So even though stage #3 does not even support a DPMI server, it will accept DPMI requests and execute them according to which server is currently loaded (ie: RAW or XMS).

IRQs: Now any IRQ that occurs in RMODE will be redirected up to PMODE, but only if a PMODE handler for that IRQ has been set up.

DPMI funcs: DPMI funcs 300h, 301h, 302h, 305h and 306h were the ones added so far. Funcs 300h-302h are used to call RMODE INTs or far PROCs and will pass all general regs and the segment regs too (unlike just calling the INT directly, which will only pass the general regs). This is done through a struct that holds all the values to give to the INT/PROC and that is filled in with the register values after the INT/PROC returns. In the DPMI spec, funcs 305h and 306h return the state save/restore and raw mode switch addresses. See any DPMI specs for what these functions look like exactly, although you should be able to tell from the source.
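
For reference, here is a sketch of the struct and of a call through func 300h. The field layout follows the DPMI 0.9 spec, but the names (rm_regs, regs, low_seg, rm_int_failed) are assumptions made for this example and are not necessarily the ones used in the source. The example calls RMODE INT 21h/func 09h (print string), which needs DS:DX and so cannot be done by passing only the general regs.

```asm
rm_regs struc                 ; RMODE call structure, 32h bytes
  r_edi   dd ?
  r_esi   dd ?
  r_ebp   dd ?
  r_rsvd  dd ?                ; reserved
  r_ebx   dd ?
  r_edx   dd ?
  r_ecx   dd ?
  r_eax   dd ?
  r_flags dw ?
  r_es    dw ?
  r_ds    dw ?
  r_fs    dw ?
  r_gs    dw ?
  r_ip    dw ?                ; ignored by func 300h
  r_cs    dw ?                ; ignored by func 300h
  r_sp    dw ?
  r_ss    dw ?                ; SS:SP = 0:0 asks the server for a stack
rm_regs ends

regs rm_regs <>

  ; assumes a '$' terminated string at offset 0 of a buffer under 1MB
  mov ax,[low_seg]            ; hypothetical: RMODE segment of that buffer
  mov word ptr [regs.r_ds],ax
  mov dword ptr [regs.r_edx],0
  mov dword ptr [regs.r_eax],0900h
  mov word ptr [regs.r_flags],0
  mov word ptr [regs.r_ss],0
  mov word ptr [regs.r_sp],0
  push ds
  pop es                      ; ES:EDI -> regs (assumes regs lives in DS)
  mov edi,offset regs
  mov ax,0300h                ; simulate RMODE INT
  mov bx,0021h                ; BL = INT number, BH = flags (0)
  xor cx,cx                   ; no stack words to copy
  int 31h
  jc rm_int_failed
  mov eax,dword ptr [regs.r_eax] ; regs now holds the values after the INT
```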

Click here and get stage#3.

In the next stage we will start extending DOS functions, and add more DPMI functions.

Part #7 - 4th example, extending DOS.

Stage #4

Stage 4 now allows 32bit programs to call some DOS functions with 32bit regs and with linear addresses. So far only open, read and write are extended.

DOS extension: to extend DOS is really quite easy. All you do is set up a PMODE INT 21h handler that translates each 32bit request into 16bit requests, as many as needed until it's done. Because DOS can not access RAM above 1MB, all file IO must go through a temp buffer under 1MB, which is then copied to/from the memory above 1MB. For this I use an 8k buffer. The DOS func close does not need to be extended because all it expects is BX=handle, so it can simply be redirected to RMODE. So any function that requires memory to be copied (such as a file name or data) must be extended, and everything else is simply redirected, except for func 42h (move file pointer), since we will use a 32bit register for that which must be split up before the redirect to RMODE. I'll finish all this later. In the next stage I'm going to introduce CALLBACKs, and then I'll add VCPI in the stage after that. By that point the ASM file will be getting so BIG and messy that I will be putting it into multiple files.
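
The "as many as needed" part can be sketched as a loop like the one below. This is only an illustration under assumptions, not the stage #4 source: it assumes zero-base flat DS and ES, a hypothetical helper dos_read_low that redirects INT 21h/func 3fh to RMODE (BX=handle, CX=count, reading into the low buffer, returning AX=bytes read or CF set, all other regs preserved), and a hypothetical variable low_buf_lin holding the linear address of the 8k buffer.

```asm
BUF_SIZE equ 2000h            ; 8k transfer buffer under 1MB

; In:  BX = handle, ECX = bytes wanted, EDX = destination (linear address)
; Out: CF clear, EAX = bytes actually read
;      CF set,   AX = DOS error code
ext_read proc near
  push ecx
  push edx
  push esi
  push edi
  push ebp
  mov edi,edx                 ; EDI = where the next chunk goes
  mov ebp,ecx                 ; EBP = bytes still wanted
  xor edx,edx                 ; EDX = running total
er_next:
  test ebp,ebp
  jz er_done
  mov ecx,BUF_SIZE
  cmp ebp,ecx
  jae er_chunk
  mov ecx,ebp                 ; last, partial chunk
er_chunk:
  call dos_read_low           ; hypothetical helper, see above
  jc er_exit                  ; DOS error: pass AX and CF back
  movzx eax,ax
  push ecx                    ; remember the size we asked for
  mov ecx,eax
  mov esi,[low_buf_lin]
  cld
  rep movsb                   ; copy the chunk above 1MB, advances EDI
  pop ecx
  add edx,eax
  sub ebp,eax
  cmp eax,ecx
  je er_next                  ; full chunk: there may be more to read
er_done:                      ; short read (EOF) or request satisfied
  mov eax,edx
  clc
er_exit:
  pop ebp
  pop edi
  pop esi
  pop edx
  pop ecx
  ret
ext_read endp
```

An extended write works the same way in the other direction: copy a chunk down into the low buffer first, then redirect func 40h.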

Click here and get stage#4.

Part #8 - 5th example, CALLBACKs.

Stage #5

Stage 5 has added callbacks to allow PMODE to intercept RMODE INTs and such. A mouse demo has been added to test it out.

Click here and get stage#5.

Part #9 - 6th example, VCPI support.

Stage #6

Stage 6 has added VCPI support which allows it to run under EMM386/QEMM and such.

Click here and get stage#6.

VCPI : VCPI is really nice once you understand it. The only problem is that paging must be used. Basically what happens is that when you want to go to PMODE you have complete control of where your descriptors go in the GDT and IDT, but you can not use LGDT or LIDT yourself (the server loads the tables for you during the mode switch). There are also 3 descriptors the VCPI server needs to use in the GDT, but those can go anywhere. The first step is VCPI detection:
  xor ax,ax
  mov es,ax
  cmp dword ptr es:[67h*4],0
  jz noVCPI
  mov ax,0de00h
  int 67h
  or ah,ah
  jnz noVCPI
  ;BX=version
After that you must find out where the VCPI server's PMODE entry point is (INT 67h/func 0de01h). This FAR PROC should be called while in PMODE to use funcs 3-5 and 0ch. When you request this entry point, DS:SI must point to a buffer that receives the 3 descriptors from the VCPI server (8*3 bytes), and ES:DI must point to a 4K buffer which should be the 1st page of the paging tables. After the INT call DI will have been moved to the next location in the page table that is free. The entries in the page table used by the VCPI server simply map the 1st meg (or more) of RAM from linear to physical. When you alloc RAM (4k at a time) from the VCPI server you are given a physical address, so these addresses must be placed into the page tables. INT 67h/func 0de0ch is used to switch from V86 mode to PMODE (or PMODE to V86 mode). When calling this func you must supply a struct that contains all the info needed for the mode switch. See the tutorial for how all this is done.
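
The two calls described above might look roughly like this when made from V86 mode. It is a sketch only: the names pt_seg, vcpi_descs, vcpi_entry, pt_next and vcpi_switch are assumptions for this example, and the struct fields must be filled in before the switch. The struct layout and register usage follow the VCPI spec.

```asm
; struct for func 0de0ch, must live in the 1st meg
vcpi_switch label byte
  v_cr3   dd ?                ; physical address of the page directory
  v_gdtr  dd ?                ; linear address of the 6 byte GDTR value
  v_idtr  dd ?                ; linear address of the 6 byte IDTR value
  v_ldtr  dw ?                ; LDT selector
  v_tr    dw ?                ; TSS selector
  v_eip   dd ?                ; PMODE entry point...
  v_cs    dw ?                ; ...and its code selector

  ; get the PMODE interface
  mov es,[pt_seg]             ; hypothetical: segment of a 4K aligned buffer
  xor di,di                   ; ES:DI -> 1st page table
  mov si,offset vcpi_descs    ; DS:SI -> room for 3 descriptors in our GDT
  mov ax,0de01h
  int 67h
  or ah,ah
  jnz noVCPI
  mov [vcpi_entry],ebx        ; EBX = offset of the server's PMODE entry
  mov [pt_next],di            ; 1st page table entry the server left free

  ; ...fill in vcpi_switch, GDT, IDT and the page tables here...

  ; switch from V86 mode to PMODE
  xor esi,esi
  mov si,ds
  shl esi,4                   ; linear address of our data segment
  xor eax,eax
  mov ax,offset vcpi_switch
  add esi,eax                 ; ESI = linear address of the struct
  cli                         ; INTs must be off for the switch
  mov ax,0de0ch
  int 67h                     ; execution continues at v_cs:v_eip in PMODE
```

After the switch only CS:EIP and the system table registers are valid, so the PMODE entry code has to load the other segment registers and a stack itself.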

Part #10 - 7th example, DPMI support.

Stage #7

Stage 7 has added DPMI support which allows it to run under Windoze 95, OS/2 etc.

Click here and get stage#7.

DPMI : DPMI is very different from RAW, XMS or VCPI. Now everything changes. Selectors must now be allocated from the DPMI server. Basically what happens under DPMI is that I alloc the 4 descriptors that are needed and set them to what is needed. Then I continue on as normal. The GDT and IDT are no longer set up. The CALLBACKs do not need to be created, the RMODE to PMODE IRQ redirectors are not set up, the PMODE to RMODE INT redirectors are not set up, and so on...
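
A rough sketch of that start-up sequence is shown below: detect the server, enter PMODE as a 32bit client, then alloc 4 descriptors and set up the first one. It is not the stage #7 source. The names dpmi_entry, sel_first, sel_step, noDPMI and dpmi_error are assumptions, and the zero base with a 4 GB limit is just an example of "what is needed".

```asm
  mov ax,1687h
  int 2fh                     ; DPMI installation check
  or ax,ax
  jnz noDPMI                  ; AX=0 means a DPMI server is present
  test bl,1                   ; bit 0 = 32bit programs supported
  jz noDPMI
  mov word ptr [dpmi_entry],di
  mov word ptr [dpmi_entry+2],es ; ES:DI = mode switch entry point
  mov bx,si                   ; SI = paragraphs the server wants for itself
  or bx,bx
  jz dpmi_enter
  mov ah,48h
  int 21h                     ; assumes our memory block was already shrunk
  jc noDPMI
  mov es,ax                   ; ES = segment of the server's private data
dpmi_enter:
  mov ax,1                    ; bit 0 set = we are a 32bit client
  call dword ptr [dpmi_entry]
  jc noDPMI
  ; we are in PMODE now, CS/DS/SS are selectors made by the server

  xor ax,ax                   ; func 0000h: alloc LDT descriptors
  mov cx,4
  int 31h
  jc dpmi_error
  mov [sel_first],ax          ; AX = 1st selector
  mov ax,0003h                ; func 0003h: get selector increment
  int 31h
  mov [sel_step],ax           ; next selector = previous + this value

  mov bx,[sel_first]
  mov ax,0007h                ; func 0007h: set segment base
  xor cx,cx
  xor dx,dx                   ; CX:DX = base 0
  int 31h
  jc dpmi_error
  mov ax,0008h                ; func 0008h: set segment limit
  mov cx,0ffffh
  mov dx,0ffffh               ; CX:DX = limit 0ffffffffh
  int 31h
  jc dpmi_error
```

The access rights of each descriptor (code or data, 16bit or 32bit) would then be set with func 0009h.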

Part #11 - 8th example, DPMI/DOS funcs.

Stage #8

Stage 8 has added all DPMI funcs and has extended all DOS funcs.
Click here and get stage#8.

Next stage : Stage #9 will add exception handlers and extend mouse funcs.
But alas, this project has been abandoned...
Copyright © 1995-2005 Nexus Systems