DOS Extender Tutorials
Learn to create your own DOS extender.
UNDER CONSTRUCTION !!!
This page will soon contain all you need to create your own DOS extender.
But... this is currently under construction so for now you will just have to
wait. As each part is added I'll give notice on my Homepage.
Currently there is...
Overview - general overview of what this will all be about.
Part #1 - Selectors and Descriptors.
Part #2 - Interrupts.
Part #3 - Control Registers.
Part #4 - 1st example, INT 15h, A20 Gate.
Part #5 - 2nd example, XMS, mode switching.
Part #6 - 3rd example, IRQ redir, DPMI funcs.
Part #7 - 4th example, extending DOS.
Part #8 - 5th example, CALLBACKs.
Part #9 - 6th example, VCPI support.
Part #10 - 7th example, DPMI support.
Part #11 - 8th example, DPMI/DOS funcs.
Download Page - download each stage from here.
These tutorials are here to help those who want to make their
own DOS extender or want to learn lots about the inner workings of PMODE.
I'll go over things such as Real Mode, PMODE, V86 Mode, VCPI, DPMI, XMS,
INTs, IRQs, Exceptions and all those other wonderful things Intel screwed
up. First things first, here is some shorthand notation I'm going to use a lot.
RMODE = Real Mode
PMODE = Protected Mode
V86 = Virtual 8086 Mode
INT = Interrupt
EXC = Exception
CPL = current privilege level
DPL = descriptor privilege level
RPL = requested privilege level
GDT = global descriptor table
LDT = local descriptor table
IDT = interrupt descriptor table
PIC = programmable interrupt controller
I also assume you know a little about how things work under real mode (if not,
start somewhere else; this section is just for advanced users who need to
find resources to program their DOS extender).
As I am writing these tutorials I will be creating my own little DOS extender
which when done will be comparable to PMODEv3.07 but maybe a little better.
I also wanna go over things such as multi-tasking, virtual memory and other
important things not commonly found in DOS extenders. And when I'm done maybe
I'll create the best FREE DOS extender, who knows?
It would really help you a lot if you read my other PMODE tutorials first
but this is not necessary.
One last thing, I never built a DOS extender that will run under VCPI or
DPMI nor have I done any multi-tasking or virtual memory before, but I've
got specs, ideas and help from anyone out there that will help. And if you
see something said in here that's wrong, please tell me ASAP.
WARNING : These pages will keep changing a lot until I get most of them
done so I suggest you don't read them too much now cause you'll have to
read them again later, ahh... maybe not read them anyways, you'll have to
read them 100 times to get it all anyways.
Part #1 - Selectors and Descriptors.
The Beginning
Since the 80286, Intel has introduced 2 new processor modes: PMODE (with the
80286) and V86 (with the 80386). Under RMODE you can only access 1MB (exactly
1MB on 8086 systems; the 286+ can also access almost another 64k just above
1MB, called the HMA, thanks to the famous seg:off style of addressing).
Although EMS cards did allow you to install another 32MBs, this RAM can only
be accessed about 64k at a time. Intel wanted to be able to
access more memory directly, and they also wanted something that could
multi-task and protect each task in the new environments. Hence PMODE was
born, although on the 286 it was only a 16bit PMODE which was simply
a stepping stone to 32bit PMODE.
The 80386 Advancements
I feel it is necessary to describe the new features of the 80386 here. If you
are already familiar with this I suggest you quickly read this anyways.
32bit Registers:
The 80386 introduced new 32bit registers. The AX, BX, CX, DX, BP, IP,
SI, DI, SP and the flags have all been extended to 32bits wide. The names of
these new registers are EAX, EBX, ECX, EDX, EBP, EIP, ESI, EDI, ESP and EFLAGS.
There are also 2 new segment registers called FS and GS. They work exactly
like ES, so you can use them however you like.
The opcodes in the 80x86 have not changed to support these new 32bit regs.
Instead, when operating in a 16bit code segment, a prefix byte is added to the
instruction to access the new 32bit registers. If the data being transferred is
32bits then 066h is prefixed. If the address being used is 32bit then 067h is
prefixed. If both the operand and the address are 32bit then both are prefixed
(067h usually comes before 066h, although the CPU accepts either order). When
operating in a 32bit code segment these same prefixes are used to access 16bit
values. Confused? Here's an example:
If "mov ax,bx" was assembled by a 16bit assembler it would look like this if it
was in a 16bit code segment:
89,D8 mov ax,bx
And if the EXACT same code was viewed in a 32bit code segment it would be:
89,D8 mov eax,ebx
And if in the 16bit code segment a 066h was prefixed it would look like:
66,89,D8 mov eax,ebx
And if in the 32bit segment that would look like:
66,89,D8 mov ax,bx
For 8bit data transfers nothing changes between 16 and 32bit code segments.
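The prefix rule above can be sketched in a few lines of Python, used here only as a calculator. The function name operand_size is made up for this sketch, and only the 066h operand-size prefix is modelled (not 067h or any other prefix).

```python
# Sketch: work out which operand size the CPU uses for one instruction.
# default_bits is the default size of the code segment (16 or 32).
# code holds the raw instruction bytes, e.g. 89,D8 or 66,89,D8.
def operand_size(code, default_bits):
    has_66 = len(code) > 0 and code[0] == 0x66
    if has_66:
        # The prefix selects the *other* size for this one instruction.
        return 16 if default_bits == 32 else 32
    return default_bits

for seg_bits in (16, 32):
    for code in (bytes([0x89, 0xD8]), bytes([0x66, 0x89, 0xD8])):
        print(seg_bits, "bit segment:", code.hex(), "->",
              operand_size(code, seg_bits), "bit operands")
```

The same three bytes give opposite results in the two kinds of code segment, which is exactly the point of the example above.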
32bit Flags:
New mnemonics were created for using the 32bit flags. PUSHFD
and POPFD will push/pop the 32bit flags.
Here is the EFLAGS register:
bit : description
0 CF - carry flag
2 PF - parity flag
4 AF - auxiliary carry flag
6 ZF - zero flag
7 SF - sign flag
8 TF - trap flag (discussed in debugging)
9 IF - interrupt flag (enables IRQs)
10 DF - direction flag
11 OF - overflow flag
12-13 IOPL - IO privilege level (discussed in protection)
14 NT - nested task (discussed in multi-tasking)
16 RF - resume flag (discussed in debugging)
17 VM - V86 Mode flag
18 AC - enable alignment check (80486+, only in PL3)
All other bits not listed are reserved.
Pushing Segment Regs:
In 32bit PMODE when you push or pop a segment register a DWORD is
moved to/from the stack. This is done to keep DWORD alignment; the other word
pushed/popped along with the segment register is usually a zero and is ignored.
In 16bit PMODE the usual WORD is pushed/popped for segment registers.
When in 32bit PMODE you must use IRETD (not IRET).
SIB:
A new addressing mode called SIB (scaled index base), which is VERY
versatile, has been added. Now inside your square brackets you can have the
following:
[ reg + reg * #1 + #2]
Where reg can be any of the 32bit general registers
(eax,ebx,ecx,edx,esi,edi,ebp,esp), except that ESP can not be the scaled
register, and #1 can be 1,2,4 or 8. And #2 can be an 8 or 32bit displacement.
Either register may be omitted and/or the #2. Addresses built from 16bit
registers are still limited to the old [bx/bp + si/di + #2] forms.
The new addressing mode (SIB) and the new registers (32bit)
can be used in RMODE, PMODE and in V86 mode.
Selectors
When in PMODE everything changes: the segment:offset
was finally scrapped and the new selector:offset now exists.
The segment registers themselves have not been changed (still 16bit) but their
implementation is totally new. The segment registers now hold
what is called a selector. This selector is used to index into system tables
called the GDT (global descriptor table) or the LDT (local descriptor table).
These tables hold descriptors which describe the segment's attributes,
location and such. This 16bit selector is split up into 3 parts.
bit # : description
15-3 = offset into GDT or LDT
2 = set if descriptor is in LDT (else it's in GDT)
1-0 = RPL (requested privilege level)
Bits 15-3 are the number of the descriptor within one of the new system
tables. Because each descriptor is 8 bytes, the selector with bits 2-0 cleared
is also the byte offset of that descriptor. Bits 2-0 describe special
attributes of the selector. If bit 2
is set then the descriptor is found in the LDT else it is in the GDT.
Bits 1-0 are the RPL which I will describe later in protection.
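Here is a small Python sketch of the selector layout described above, again only as a calculator. The names split_selector and make_selector are made up for this example.

```python
# Sketch: take a 16bit selector apart and put one together again.
def split_selector(sel):
    index = sel >> 3                      # bits 15-3: descriptor number
    table = "LDT" if sel & 4 else "GDT"   # bit 2: which table
    rpl = sel & 3                         # bits 1-0: requested privilege level
    byte_offset = sel & 0xFFF8            # index * 8 = offset into the table
    return index, table, rpl, byte_offset

def make_selector(index, in_ldt=False, rpl=0):
    return (index << 3) | (4 if in_ldt else 0) | (rpl & 3)

print(split_selector(0x0008))   # 1st usable GDT descriptor
print(split_selector(0x0017))   # an LDT selector with RPL 3
print(hex(make_selector(2, in_ldt=True, rpl=3)))
```

Selector 0008h is the first usable GDT entry (entry 0 being the NULL descriptor), which is why so many PMODE sources use 08h for their first code selector.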
Descriptors
The GDT and LDT
can hold up to 8192 descriptors each (due to selector bits 15-3). The 1st descriptor
in the GDT is reserved and can never be used for anything (it's NULL, why they did that
I don't know). Each descriptor struct in the tables is as follows:
descriptor struct
limit_lo dw ? ;limit bits 15-0
base_lo dw ? ;base bits 15-0
base_mid db ? ;base bits 23-16
type1 db ? ;type of selector
limit_hi db ? ;limit bits 19-16 and other info
base_hi db ? ;base bits 31-24
descriptor ends
When all the BASE bits are put together they form a 32bit value. This represents
the beginning of the segment in memory. With a 32bit address you can
access up to 4gigs of RAM (although on the 286, BASE bits 31-24 did not exist
yet because the 286 only had a 24bit address bus so these bits were not used,
but since we are not building a 286 DOS extender, who cares). The limit_hi
field is broken up as follows:
bit # : Name : Description
0-3 limit limit bits 19-16
4 AVL available for programmer use (not used by CPU)
5 - reserved (must be 0)
6 D default size of segment
7 G granularity
Now the limit in total is 20bits (which can access up to 1MB) but if the G bit
is set then the limit is counted in 4096 byte (4k) pages instead of bytes
(the CPU shifts it left 12 bits and fills the low 12 bits with ones), which
now sets the maximum to four gigabytes (4 GBs).
The D bit defines the default size of the code segment:
if set it's 32bit else it's 16bit.
The type1 field of the descriptor struct defines more attributes of the segment.
bit # : Name : Description
4 S Defines what type of descriptor this is
  If S=1 then it is a standard code/data segment
  and the rest of the byte is as follows:
  3 T defines if this is a code or data segment
    If T=0 it is a standard data segment
    and the rest of the byte is as follows:
    0 A accessed
    1 W writable
    2 E expand down
    else if T=1 it is a standard code segment
    and the rest of the byte is as follows:
    0 A accessed
    1 R readable
    2 C conforming
  Now if S=0 it is a special system descriptor
  and the rest of the byte is as follows:
  0-3 TYPE Defines what type it is (LDT, call, INT or trap gates, etc.)
The rest does not depend on T or S:
5-6 DPL descriptor privilege level
7 P present
Well that's very complex! Here is what most fields mean:
S (bit4) : Defines if it is a code/data segment or a special descriptor.
T (bit3) : Defines if it is a code or data segment.
A (bit0) : This bit is set by the CPU whenever this segment is accessed by a program.
R (bit1) : This bit defines if code segments may be read from, this will
permit the following (mov ax,cs:[ebx])
W (bit1) : This bit defines if data segments may be written to.
C (bit2) : This will be described later in protection.
E (bit2) : This defines if the data segment expands down.
DPL (bits5-6) : descriptor privilege level
P (bit7) : If set the descriptor is present (valid). If clear, only the descriptor's
type1 byte must hold info and the rest is ignored by the CPU. Loading a non-present
selector into a segment register causes an exception.
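To see how the fields of the descriptor struct fit together, here is a hedged Python sketch that packs the 8 bytes in the same order as the struct above. The names make_descriptor, effective_limit, CODE32 and DATA32 are made up for this example, and switching G on automatically for big limits is just one convenient policy, not something the CPU does for you.

```python
import struct

def effective_limit(limit20, g):
    # With G set the 20bit limit counts 4k pages; the low 12 bits become ones.
    return ((limit20 << 12) | 0xFFF) if g else limit20

def make_descriptor(base, limit, type1, d32=True):
    # limit is the highest valid offset in bytes. If it does not fit in
    # 20 bits, use 4k granularity (G=1) and store limit / 4096 instead.
    g = 0
    if limit > 0xFFFFF:
        g = 1
        limit >>= 12
    limit_hi = ((limit >> 16) & 0x0F) | (0x40 if d32 else 0) | (0x80 if g else 0)
    return struct.pack("<HHBBBB",
                       limit & 0xFFFF,        # limit_lo
                       base & 0xFFFF,         # base_lo
                       (base >> 16) & 0xFF,   # base_mid
                       type1,                 # type1
                       limit_hi,              # limit_hi (limit 19-16, D, G)
                       (base >> 24) & 0xFF)   # base_hi

CODE32 = 0x9A   # P=1, DPL=0, S=1, T=1 (code), R=1
DATA32 = 0x92   # P=1, DPL=0, S=1, T=0 (data), W=1

flat_code = make_descriptor(0, 0xFFFFFFFF, CODE32)
flat_data = make_descriptor(0, 0xFFFFFFFF, DATA32)
print(flat_code.hex(), flat_data.hex())
```

These two are the usual zero base, 4 GB limit code and data descriptors.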
The TYPE field of the descriptor (bits 0-3)
defined what type of descriptor it is, possible values are:
0 = reserved
1 = avail. 286 TSS
2 = LDT
3 = busy 286 TSS
4 = 286 call gate
5 = 286/386 task gate
6 = 286 INT gate
7 = 286 trap gate
8 = reserved
9 = avail. 3/486 TSS
A = reserved
B = busy 3/486 TSS
C = 3/486 call gate
D = reserved
E = 3/486 INT gate
F = 3/486 trap gate
Notes:
The LIMIT of a segment defines the maximum value an offset may be when
accessing ram (ie: mov al,[1000h] when LIMIT=0fffh is invalid)
The BASE of a segment defines where in memory the segment starts (ie:
mov al,[1000h] when the BASE=10000h will access RAM at location
10000h+1000h=11000h)
Data segments can always be read from.
SS must be loaded with a data selector that must be writable.
CS can only be loaded with a code selector.
DS,ES,FS,GS can be loaded with a data selector or a readable code selector.
If E (expand down) is set then the interpretation of LIMIT changes. Now LIMIT
defines the lower bound of the segment: only offsets above LIMIT can be
accessed. Because stack segments grow towards lower addresses the LIMIT can
be decreased to give more room to your stack.
The LDT is the local descriptor table. The LDT is exactly the same as the GDT
except that each task (or program) has its own LDT and the GDT is shared by
the whole system. Each LDT is described by an entry in the GDT that says where
that LDT is. The LDT can contain anything the GDT can. So if a selector you
need to use is in the LDT then bit 2 of the selector is set and the descriptor
comes from the LDT.
If any of the rules above are broken then special INTs occur (pretty much the same
way as an IRQ is triggered). These are called exceptions and there are many
different ones depending on what rule was broken.
The old RMODE table of 256 INTs at 0:0 is no longer used in PMODE (gee, they
scrapped pretty much EVERYTHING they did before). Now we have a new table
of 256 INTs, the IDT, which is set up much like the GDT. It contains 8 byte
descriptors just like the GDT. The only difference is that it can only hold
certain types of descriptors.
The PMODE INT Table can only hold INT gates, trap gates and task gates.
The first 32 INTs are reserved and are called exceptions. These exceptions
are called by the CPU to signal certain events, such as invalid code running,
access beyond the LIMIT or anything else that is not valid.
The exceptions are:
int : description
0 divide by zero
1 debug exception
2 NMI (non-maskable interrupt)
3 breakpoint
4 overflow
5 bounds check
6 invalid opcode
7 device not available
8 double fault
9 80486+ = reserved (80386 and below = co-pro segment overrun)
10 invalid TSS (task state segment)
11 segment not present
12 stack exception
13 The General Protection Fault
14 page fault
15 reserved
16 FPU error
17 alignment check
18-31 reserved exceptions
32-255 available to software
But the most wonderful thing about these exceptions is that the normal
INTs are the same ones. That is to say that IRQ#5 which is INT 13 is the
same INT as the General Protection Fault. Now on 80386+ the PIC (programmable
interrupt controller) can be remapped to set up the IRQs in different
locations within the INT table. Older DOS extenders and OS's used to do this
while in PMODE but it was considered too slow. Each time the DOS extender had
to return to RMODE for whatever reason the PICs had to be reprogrammed back
to the normal style (ie: IRQ#0 = INT#8 and IRQ#8 = INT #70h) and then
reprogrammed again once control returned to PMODE. This is no longer done by
recent DOS extenders and OS's (At least
I hope they don't do this anymore, I know that Desqview did).
So how does an INT handler determine if the event that triggered an INT was
from an IRQ, from the user calling the INT or from an exception?
Well it's quite simple. First you look at the PICs to determine if they
are waiting for an IRQ to be serviced. If so you branch on to the IRQ
handler.
Then you check into the code segment and see if the last operation executed
was INT x (where x is your current INT handler). If yes then service that INT
call for what it is (ie: INT 10h is the same as the FPU error exception and
if the program executed INT 10h to call the video services you should branch to
the INT 10h handler). If these 2 cases were false then the INT must have
been caused by an exception.
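The three checks above can be sketched as a small Python function. This only models the decision, it does not talk to real hardware: irq_in_service stands for whatever IRQ number you read back from the PICs, and bytes_before_return for the two bytes just below the return CS:EIP. Both names, and classify itself, are made up for this sketch, and the standard IRQ layout (IRQ#0 = INT 8, IRQ#8 = INT 70h) is assumed.

```python
# Sketch: decide what triggered INT 'vector'.
def classify(vector, irq_in_service, bytes_before_return):
    # 1) Is a PIC servicing an IRQ that maps onto this vector?
    if irq_in_service is not None:
        if irq_in_service < 8:
            irq_vector = 0x08 + irq_in_service
        else:
            irq_vector = 0x70 + (irq_in_service - 8)
        if irq_vector == vector:
            return "irq"
    # 2) Was the last instruction executed "INT vector" (opcode 0CDh, vector)?
    if bytes(bytes_before_return) == bytes([0xCD, vector]):
        return "software int"
    # 3) Otherwise it must have been an exception.
    return "exception"

print(classify(0x0D, 5, b"\x90\x90"))              # IRQ#5 shares INT 13
print(classify(0x10, None, bytes([0xCD, 0x10])))   # program called INT 10h
print(classify(0x0D, None, b"\x90\x90"))           # a real GPF
```

Keep in mind that step 2 is only a heuristic: for a fault the saved CS:EIP points at the faulting instruction itself, so the two bytes in front of it could happen to look like an INT opcode.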
Now the problem with this is that every IRQ now has additional time
taken away from it before it gets CPU control. This latency sucks and I
consider this a design flaw! Some DOS extenders allow you to set up an IRQ handler
that will get direct control of the INT, which means if an exception does occur
it will be erroneously directed to the IRQ handler and can cause the CPU
to trigger a double fault and could cause the CPU to crash.
Here is a small explanation of each exception:
Exc 0 : Divide by zero : Called whenever a div/idiv instruction
is attempting to divide by 0 (or the result does not fit in the destination).
Exc 1 : Debug exception : I'll describe this in debugging later.
Exc 2 : NMI : This is called by any hardware device that has
failed and is essential for the system's operation (ie: parity fail on RAM)
Exc 3 : breakpoint : Called whenever INT 3 is executed, which has a special
one byte opcode form (0cch). This single byte INT 3 opcode can be easily inserted into
code for debugging.
Exc 4 : overflow : Called when the INTO instruction is executed and
the overflow flag was set.
Exc 5 : bounds check : Called when the BOUND instruction is executed
and the operand was outside the given bounds.
Exc 6 : invalid opcode : Called whenever the next instruction at CS:(E)IP
is not a valid 80x86 instruction. (An instruction that is longer than 15
bytes, which can occur if too many prefixes are used, is also invalid, but
the 80386+ reports that one as exc 13.)
Exc 7 : device not avail. : this is generally used to optimize usage of
the FPU and will be discussed later in the FPU section.
Exc 8 : double fault : this is generated when the system is processing
an exception and another exception occurs. This naturally can happen but the
system must be very careful now because a third exception will cause the CPU
to reset.
Exc 9 : On the 80486 this is never generated. On the 80386
it signalled a co-processor segment overrun.
Exc 10 : invalid TSS : this is generated whenever an invalid Task
is loaded, I'll discuss this in multi-tasking.
Exc 11 : segment not present : this is generated whenever a descriptor
is used that has the P bit (present bit) not set. This means the segment is
not present. The faulting instruction can be restarted, so this is usually
used in virtual memory management.
Exc 12 : stack exception : this can occur in 2 ways. a) when SS is
loaded with a non-present descriptor that is otherwise valid. b) when the
stack grows outside of the LIMIT or an instruction tries to access an
operand outside the LIMIT of the stack segment. (ie: mov al,ss:[ebx])
Exc 13 : The General Protection Fault : This is what is known as the
catch everything else fault. If something invalid happens that does not
fit into the category of any of the other exceptions it comes here instead.
Exc 14 : Page fault : this is used in the paging system which will
be discussed later.
Exc 15 : reserved.
Exc 16 : FPU error : Occurs when the FPU detects an error in the current
FPU instruction.
Exc 17 : Alignment check (80486+ only) : the 486 added a new feature
that forces programmers to align data. Here's a little discussion but
you can ignore this since you'll never use alignment checking anyways
unless you are crazy. Anyways...if you know anything, you'll know that most
RAM's access time is around 70ns which is VERY slow. This slowness has been
a bottleneck for the 80x86 processors' speed ever since the 8086. And
whenever RAM is accessed that is not aligned on certain addresses it takes
2 reads or writes to perform the operation, which is SO slow. One read
operation on a Pentium 200Mhz with 70ns RAM takes about 14 cycles (I think).
And most operations on the Pentium run within 1 cycle. So you can see
if the operation was not aligned it would require another 14 cycles (but
it's not like the CPU is twiddling its thumbs waiting, it has other things it
can do like calculating paging, linear=>physical, DMA, IRQs, etc.) So
alignment check basically causes an exc whenever data is accessed that is
not aligned. But you know, I could be all wrong about why this is used.
Anyways...when any of these exceptions is triggered the CS:EIP that is
on the stack will point to the instruction that caused the problem (or not,
depending upon what exc. it is). Some exc handlers may return to the
problem maker after fixing some condition (ie: virtual memory manager
reads swapped out memory from hard drive to RAM and then continues).
Whenever an INT is jumped to through a 386 gate, the CS, EIP and Flags that
are saved on the stack are always 32bit regardless of whether the caller of
the INT was in a 16 or 32bit code segment.
Error Codes
When most of these exceptions are called an error code is pushed onto the
stack before the exception handler is started. This error code is pushed
if the exception relates to a specific segment. This code looks much
like a selector except the RPL field is filled with other info.
The error code looks like this:
bits : description
31-16 undefined
15-3 offset into GDT or LDT
2 1=LDT 0=GDT
1 set if offset is into IDT instead of the GDT/LDT
0 ext bit. set if the exception was caused by something external
to the program
Notes:
Exc 6 (invalid opcode) does not push an error code.
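A short Python sketch of pulling an error code apart according to the table above. The name decode_error_code is made up for this example.

```python
# Sketch: decode the error code an exception pushes on the stack.
def decode_error_code(code):
    code &= 0xFFFF                  # bits 31-16 are undefined
    external = bool(code & 1)       # bit 0: caused by something external
    if code & 2:                    # bit 1: the index refers to the IDT
        table = "IDT"
    elif code & 4:                  # bit 2: 1=LDT 0=GDT
        table = "LDT"
    else:
        table = "GDT"
    return {"index": code >> 3, "table": table, "external": external}

print(decode_error_code(0x0010))    # descriptor #2 in the GDT
print(decode_error_code(0x006A))    # entry #13 of the IDT
```

An error code of 0 tells you nothing about a particular segment, which is what you will see for many exc 13s.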
INT, TRAP and TASK Gates
These 3 special system descriptors are the only ones allowed in the IDT.
The INT Gate:
int_gate struct
offset_lo dw ? ;offset bits 0-15
selector dw ?
r1 db 0 ;bits 0-4 are ignored, bits 5-7 must be 0
t1 db 0eh ;bits 0-4 must be 01110b, bits 5-6 is the DPL, bit 7 is
; the P (present bit)
offset_hi dw ? ;offset bits 16-31
int_gate ends
The Task Gate:
task_gate struct
i1 dw ? ;ignored
selector dw ?
i2 db ? ;ignored
t1 db 5 ;bits 0-4 must be 00101b, bits 5-6 is the DPL, bit 7 is
; the P (present bit)
i3 dw ? ;ignored
task_gate ends
The Trap Gate:
trap_gate struct
offset_lo dw ? ;offset bits 0-15
selector dw ?
r1 db 0 ;bits 0-4 are ignored, bits 5-7 must be 0
t1 db 0fh ;bits 0-4 must be 01111b, bits 5-6 is the DPL, bit 7 is
; the P (present bit)
offset_hi dw ? ;offset bits 16-31
trap_gate ends
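To check an IDT entry by hand, here is a Python sketch that packs an INT or trap gate in the same field order as the structs above. The names make_gate, INT_GATE_386 and TRAP_GATE_386 are made up for this example, and the selector and offset used are arbitrary.

```python
import struct

INT_GATE_386 = 0x0E    # bits 0-4 = 01110b
TRAP_GATE_386 = 0x0F   # bits 0-4 = 01111b

def make_gate(selector, offset, gate_type, dpl=0, present=True):
    # t1 = type (bits 0-4), DPL (bits 5-6), P (bit 7)
    t1 = gate_type | ((dpl & 3) << 5) | (0x80 if present else 0)
    return struct.pack("<HHBBH",
                       offset & 0xFFFF,           # offset_lo
                       selector,                  # selector
                       0,                         # r1
                       t1,                        # t1
                       (offset >> 16) & 0xFFFF)   # offset_hi

gate = make_gate(0x0008, 0x00123456, INT_GATE_386)
print(gate.hex())
```

Note that the values in the structs above (0eh and 0fh) leave the P bit clear, so for a usable gate you need 8eh or 8fh in t1.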
Here are some terms you need to understand:
FAULT - this is an exception that is generated before or while an instruction
is being executed. In either case the CS:EIP that is on the stack will point
to the instruction that caused the fault.
TRAP - this is an exception that is generated immediately after the instruction
that caused the exception. So the CS:EIP on the stack will point to the
instruction after the instruction that caused the exception which could be the
target of a JMP making it impossible to know exactly where the exception took
place.
ABORT - these are never restartable (you can not go back to the trouble maker)
nor is the precise location of the exception known, these are used to report
severe problems.
An INT gate or TRAP gate when triggered will execute in the current task
(program) so that means they will use the same system resources. The selector
must point to a valid code descriptor in the GDT or LDT and the offset
is loaded into EIP to begin execution of the INT handler. The only difference
between an INT gate and a TRAP gate is the IF (interrupt flag). With INT
gates the IF is reset (cleared) when the handler is started, a TRAP gate
does not reset it. When either of these two are called the CS, EIP
and Eflags are all saved on the stack in 32bit DWORDs, regardless of what
the current code segment size is, so you must use an IRETD to return.
When you call a PROC that is FAR in a 16bit segment the CS and IP (both as
WORDs) are saved on the stack. In 32bit mode the CS and EIP are stored
both as DWORDs on the stack when you call a FAR PROC. So RETF has two
different meanings depending on which mode you are currently operating in.
Remember that this mode (16 or 32bit mode) is determined by the D bit (default
segment size) in the CS descriptor. The D bit is not used with DS, ES, FS
or GS. The D bit in the SS descriptor defines if ESP or SP is used.
Part #3 - Control Registers.
Control Registers (CRx)
The Control Registers are 4 very important registers new to the 80386. They
are called CR0,CR1,CR2 and CR3. Each is 32bits wide and can only be read
or written to with another 32bit general register. (ie: mov cr0,eax)
The CR0 Register:
bit : name : description
0 PE PMODE enabled
1 MP Math unit Present (FPU)
2 EM Emulate FPU
3 TS Task switch
4 ET extension type, 80287 or 80387 (80386 only)
5 NE FPU error enabling (80486 only)
6-15 reserved
16 WP write-protect (80486 only)
17 reserved
18 AM alignment mask (80486 only)
19-28 reserved
29 NW cache not-write thru (80486 only)
30 CD cache disable (80486 only)
31 PG paging enable
Anything that is 80486 only was reserved on the 80386. The most important
bit is the PE. When this bit is clear the CPU is in RMODE which is the
default mode it's in after the CPU is reset (to stay compatible). When this
bit is set you enter into PMODE.
MP and EM will be discussed in the FPU section.
TS is related to multi-tasking and the FPU.
ET is set if an 80387 is used on the 80386; if clear an 80287 is used.
NE has to do with the FPU.
WP deals with paging: when set, read-only pages are write-protected even
from supervisor level (PL0-2) code.
AM enables the stupid alignment check exception.
PG enables the paging system.
NW and CD are used to control the caching unit in the 80486+.
CD when set disables the cache, and NW when set will not allow writes to
cached memory to go to the main memory.
Here's a table:
CD NW Condition
1 1 Caching is disabled, but anything that is currently cached
will still be cached so you should flush the cache before continuing.
Use INVD or WBINVD to do so. If you do not flush the cache
then whatever is being cached will be very fast and that part
of the main RAM will not be used anymore.
1 0 Caching is disabled, but anything that is currently cached
will still be cached except that writes to the cache will
also go to RAM. You can flush the cache here too, to
clear the cache (or not).
0 1 Reserved (if set like this it will cause exc 13 with error code=0)
0 0 Caching fully enabled.
The CR1 Register is reserved!
The CR2 and CR3 Registers are used in the paging system.
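When debugging it helps to turn a raw CR0 value into names. Here is a small Python sketch built from the CR0 bit table above. The names CR0_BITS and cr0_flags are made up for this example, and the sample values are arbitrary.

```python
# Sketch: list which CR0 bits are set in a value read with "mov eax,cr0".
CR0_BITS = {0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
            16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG"}

def cr0_flags(value):
    return [name for bit, name in sorted(CR0_BITS.items())
            if value & (1 << bit)]

print(cr0_flags(0x00000010))   # RMODE, 80387 present
print(cr0_flags(0x00000011))   # PE set: we are in PMODE
print(cr0_flags(0x80000011))   # PMODE with paging enabled
```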
GDT,LDT and IDT Registers
All three of these tables can reside anywhere in memory, so each has
a register defining where it is in memory. The GDT and IDT both have
such registers which contain 2 parts:
a 32bit address that defines the BASE of the table and another 16bit
word defining the LIMIT of the table. That's 48bits which can only
be loaded/saved with a memory operand (48bits = FWORD). The LDT register
simply holds a selector which must point to a descriptor defined as being
an LDT, which will define where the LDT is in memory.
The instruction to load/save these are:
lgdt mem48 - load GDTR (GDT register)
sgdt mem48 - save GDTR
lldt mem16/reg16 - load LDTR (LDT register)
sldt mem16/reg16 - save LDTR
lidt mem48 - load IDTR (IDT register)
sidt mem48 - save IDTR
The BASE address is a linear address which means it is not a seg:off type
address. If our GDT was in conventional memory (below 640k) then the
base would = seg*10h+off which is the linear address.
Loading the GDT might look like this:
.386p
.data
gdt label fword ;48bit operand for lgdt
gdt_limit dw ? ;(# descriptors * 8) - 1
gdt_base dd ? ;linear address of the GDT
.code
lgdt gdt ;load GDTR from memory
With the GDT and IDT
the LIMIT defines what descriptors are allowed to be used. The actual
size of the table is the LIMIT in the register plus one. Therefore a limit of 7 would
make only the 1st descriptor valid since each descriptor is 8 bytes. A limit
of 0ffffh would allow the use of all descriptors but in this case we would
have to set them all up which would be a waste if we didn't need them all.
So the limit should = ((# descriptor) * 8 ) - 1
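Putting the two formulas together (base = seg*10h+off and limit = descriptors*8-1), this Python sketch builds the 6 bytes that lgdt expects in memory. The name gdtr_image and the example segment, offset and descriptor count are made up for this sketch.

```python
import struct

# Sketch: build the 48bit (FWORD) image that LGDT loads.
def gdtr_image(segment, offset, num_descriptors):
    base = segment * 0x10 + offset       # RMODE seg:off -> linear address
    limit = num_descriptors * 8 - 1      # each descriptor is 8 bytes
    return struct.pack("<HI", limit, base)   # 16bit LIMIT, then 32bit BASE

img = gdtr_image(0x1234, 0x0010, 3)      # GDT at 1234h:0010h, 3 descriptors
print(img.hex())
```

The LIMIT word comes first in memory and the BASE dword follows it, which is why gdt_limit is declared before gdt_base.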
Don't forget that the 1st descriptor must be NULL (just don't use it). But
note that it is valid to load the NULL selector into the data segment
registers (DS, ES, FS and GS). But if a segment register
holds NULL it can not be used for anything (so therefore CS and SS can never
hold NULL). It may be useful to load some segment regs with NULL to
catch program bugs, etc.
The LDT's LIMIT and BASE information is contained within the LDT's
descriptor that is in the GDT. Refer back to Tutorial #1 and look at the
different system descriptors there are, one is called the LDT.
Part #4 - 1st example, INT 15h, A20 Gate.
Rmode to Pmode
Ok, it's time to create our first little program that will simply get us
into PMODE from RMODE and do a few little tricks. The program is
heavily documented, but if you still don't understand some things don't
worry, it will all make sense as we move on. Here's a few things you
may want to know. We want to get to 32bit PMODE but it's difficult
to do so in one jump from RMODE, so I do it in 2 jumps, one to 16bit PMODE
and then another to 32bit PMODE. Along the way lots of the
80386 processor's registers are set up (GDT,IDT,LDT,etc.)
Click here and get stage#1 to download the 1st example.
You may save the program to disk and compile/run it if you like. All these
examples should compile with MASM v6.11 or TASM v4.1.
Here's an explanation of the program concepts:
INT 15h memory allocation
This technique for allocating raw extended memory is old. When you
call INT 15h (ah=88h) ax is returned with how much memory (in KB) is free
directly above 1MB. If you want to for example alloc 256k of RAM you
would install your own INT 15h handler that would report 256k less RAM
than there really is. This is called top down memory allocation because the
RAM you alloc is directly above the block of RAM that is now free. To
alloc all of the RAM simply make your INT 15h handler return 0 in ax.
This method of RAM allocation was replaced by himem.sys (XMS - extended
memory specs) which will alloc all memory from INT 15h so if this is installed
then this program will not be able to alloc any memory (don't worry, we'll
fix that soon enough). Note: this tutorial source does not do anything with
the RAM, it just gets it and that's it.
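The arithmetic behind top down allocation can be sketched in Python. The name int15_alloc and the 3072k of free extended memory are made up for this example; on a real machine the first number comes back in ax from INT 15h (ah=88h).

```python
# Sketch: top down allocation of extended memory through INT 15h.
# free_kb = what INT 15h (ah=88h) currently reports, want_kb = what we take.
def int15_alloc(free_kb, want_kb):
    if want_kb > free_kb:
        raise ValueError("not enough extended memory")
    new_free_kb = free_kb - want_kb            # what our hooked INT 15h reports
    base = 0x100000 + new_free_kb * 1024       # our block starts right above
    return base, new_free_kb                   # the RAM that is still free

base, reported = int15_alloc(3072, 256)
print(hex(base), reported)
```

Our block sits at the very top of extended memory, and anything that asks INT 15h afterwards only sees the RAM below it.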
A20 Gate Enabling
On the 8086 an address such as 0ffffh:0fh will access the very last byte of
the 1MB RAM which uses exactly 20 address lines. If you were to access
0ffffh:10h on the 8086 this would wrap around to 0h:0h. Some programs
used this wrap around technique for bizarre reasons so the 80386 had
to stay compatible and include this feature. So when you want to access
RAM above 1MB you must enable the A20 gate which will disable this 1MB wrap
around. Once this is done you can access RAM up to 4 GBs (4 gigabytes = 2^32).
Note that now if an address goes above 4 GBs it will wrap around to
the beginning of RAM which is very useful. Let us say our program segment
is at 0f0000000h and we wanted to access 0a0000h (video memory). We would
use the following code to access VRAM:
mov edi,0a0000h
sub edi,0f0000000h
mov [edi],al
This will write 1 byte thru our DS to access the VRAM. If you don't follow
how just look at this:
DS => 0f0000000h
EDI = 0a0000h
(before sub)
DS:EDI => 0f0000000h + 0a0000h == 0f00a0000h (which is exactly 0f0000000h too much)
(after sub)
DS:EDI => 0f0000000h + 0a0000h - 0f0000000h == 0a0000h (this is correct)
But eventually our little DOS extender will setup a zero base segment
making this not necessary.
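Both wrap arounds (1MB with the A20 gate off, 4 GBs in PMODE) are easy to check with a bit of Python arithmetic. The function names linear_8086 and linear_386 are made up for this sketch.

```python
# Sketch: the 1MB wrap (A20 gate) and the 4 GB wrap (32bit segment base).
def linear_8086(seg, off, a20_enabled):
    addr = seg * 0x10 + off
    # With the A20 gate off, address line 20 is forced to 0 (1MB wrap).
    return addr if a20_enabled else addr & 0xFFFFF

def linear_386(base, offset):
    # BASE + offset is a 32bit add, so it wraps at 4 GBs.
    return (base + offset) & 0xFFFFFFFF

print(hex(linear_8086(0xFFFF, 0x0F, False)))   # last byte of the 1MB
print(hex(linear_8086(0xFFFF, 0x10, False)))   # wraps to 0
print(hex(linear_8086(0xFFFF, 0x10, True)))    # first byte of the HMA

edi = (0xA0000 - 0xF0000000) & 0xFFFFFFFF      # mov edi,0a0000h / sub edi,...
print(hex(edi), hex(linear_386(0xF0000000, edi)))
```

The value left in EDI looks like garbage on its own, but once the segment BASE is added the sum wraps around and lands on 0a0000h.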
Part #5 - 2nd example, XMS, mode switching.
Stage #2
OK, in the next ASM source I've updated the program a lot. It now supports
XMS, and all INTs and IRQs are redirected to RMODE.
XMS: When an XMS driver (such as himem.sys) is installed we must
enable the A20 Gate and alloc our RAM through it. So you can see that after
the XMS driver is detected we branch to another part of the program that
will use the XMS driver. To call XMS you must first call INT 2fh with ax=4310h.
The returned ES:BX must then be called FAR with the registers loaded with
whatever is needed. See the XMS reference for functions (look in my files
section). Everything that will ever need to be done is done in this next
source.
Mode switching: In the source the PROC pm_2_rm_rx is used to switch
from PMODE back to RMODE. It will set up an IDT that is compatible with
RMODE (ie: 1k at 0:0). All segment registers must be loaded with 16bit
selectors with 64k limits. This is because this info stays in the segment
registers after the switch (each segment register holds
this info in an invisible portion that you can only alter by loading it with
a new descriptor). And the CS must also contain a 16bit 64k selector or
else when you get back to RMODE things will not work as they should.
The PROC rm_2_pm_rx is used to get back to PMODE. Note that these PROCs
only allow you to load the segment registers, EIP and ESP after the
mode switch. All other registers, except EBP, will be destroyed.
Stacks are very hard to understand while switching modes. Because our
SS:(E)SP is different in PMODE and RMODE it would be difficult to
use one stack for RMODE and one for PMODE. So what happens is that when
a mode switch is needed, a new small stack is used for the duration of
the mode switch. The variables pm2rm_* define our PMODE to RMODE stacks.
Switching from RMODE to PMODE right now is used only to return to PMODE
after going to RMODE to redirect an INT (or IRQ). Later we will make
things that will allow you to redir INTs (IRQs) from RMODE to PMODE. These
things are called CALLBACKs.
Click here and get stage#2.
Part #6 - 3rd example, IRQ redir, DPMI funcs.
Stage #3
Stage 3 now redirects IRQs from RMODE to PMODE. And some DPMI funcs have
been added.
DPMI: In the evolution of the PC 4 major servers exist. Those are
Raw, XMS, VCPI and DPMI. So far stage #3 only supports Raw and XMS. But
eventually it will support all four. Now when your program executes under
VCPI or DPMI you are not allowed to touch the GDT (well, it depends on how
securely the system is set up). But basically you must share the system
tables with other programs. So standards have been created to
allow programs to modify system tables. To do this you make requests to
the VCPI or DPMI server. And because each is different it would be a very
difficult task for the programmer to determine which server is present and
then use requests for that server. Instead we will create a DOS extender that
will accept ALL requests to a DPMI server, and if one does not exist the
requests will be sent to VCPI or to XMS or whatever. The DPMI standard
is used in this way because its calls are the most complete and accepted
standard. So even though stage #3 does not even support a DPMI server
it will accept DPMI requests and execute them according to which server is
currently loaded (ie: RAW or XMS).
IRQs: Now all IRQs that occur in RMODE will be redirected up to
PMODE, but only if a PMODE handler for that IRQ has been set up.
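The decision described above can be modelled in a few lines of C. This is only
a sketch under assumptions: it uses the standard PC mapping of IRQ 0-7 to
INT 08h-0Fh and IRQ 8-15 to INT 70h-77h, and the names pm_hooked, hook_irq and
redirect_to_pmode are invented here, not names from the stage source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit n set = a PMODE handler has been installed for IRQ n.
   (Hypothetical bookkeeping; the real extender may store this differently.) */
static uint16_t pm_hooked;

/* RMODE interrupt vector for an IRQ with the default PIC programming. */
static uint8_t irq_to_rm_vector(unsigned irq)
{
    assert(irq < 16);
    return (uint8_t)(irq < 8 ? 0x08 + irq : 0x70 + (irq - 8));
}

/* Called when the PMODE program installs a handler for an IRQ. */
static void hook_irq(unsigned irq)
{
    assert(irq < 16);
    pm_hooked = (uint16_t)(pm_hooked | (1u << irq));
}

/* True if an IRQ arriving in RMODE should be sent up to PMODE;
   false means the original RMODE handler keeps it. */
static bool redirect_to_pmode(unsigned irq)
{
    assert(irq < 16);
    return ((pm_hooked >> irq) & 1u) != 0;
}
```

For example, after hook_irq(1) a keyboard IRQ (INT 09h in RMODE) would be
redirected, while the timer IRQ would still go to the RMODE handler.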
DPMI funcs: DPMI funcs 300h, 301h, 302h, 305h and 306h are the ones added
so far. Basically 300h-302h are used to call RMODE INTs or far PROCs,
passing all general regs and the segment regs too (unlike just calling
the INT directly which will only pass the general regs), while 305h and
306h return the state save/restore and raw mode switch addresses. The
calls are done through a struct that holds all values to give to the
INT/PROC and that is filled in with the register values after the INT/PROC
returns. See any DPMI spec for exactly what these functions look like,
although you should be able to tell from the source.
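As a sketch of what that struct looks like, here is the real mode call
structure from the DPMI 0.9 spec written out in C. The type name rm_regs and
the helper rm_regs_for_open are made up for this example (the helper just
shows filling the struct for a DOS "open file" call), and I am assuming the
stage source follows the same layout as the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Real mode call structure used by DPMI funcs 300h-302h (32h bytes). */
#pragma pack(push, 1)
typedef struct {
    uint32_t edi, esi, ebp, reserved, ebx, edx, ecx, eax;
    uint16_t flags, es, ds, fs, gs, ip, cs, sp, ss;
} rm_regs;
#pragma pack(pop)

/* Hypothetical helper: prepare the struct for INT 21h, AH=3Dh (open file).
   name_linear is the linear address of the file name, which must already
   sit below 1MB so that RMODE can reach it as segment:offset. */
static void rm_regs_for_open(rm_regs *r, uint32_t name_linear)
{
    assert(name_linear < 0x100000u);
    memset(r, 0, sizeof *r);                /* SS:SP = 0: host supplies a stack */
    r->eax = 0x3D00;                        /* AH=3Dh open, AL=0 read only      */
    r->ds  = (uint16_t)(name_linear >> 4);  /* segment part of the address      */
    r->edx = name_linear & 0xF;             /* offset part (DX)                 */
}
```

The struct would then be passed in ES:EDI to func 300h with BL=21h, and on
return the same struct holds the RMODE register values, including the carry
flag in flags.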
Click here and get stage#3.
In the next stage we will start extending DOS functions, and add more DPMI
functions.
Part #7 - 4th example, extending DOS.
Stage #4
Stage 4 now allows 32bit programs to call some DOS functions with 32bit regs
and with linear addresses. So far only open, read and write are extended.
DOS extension: To extend DOS is really quite easy. All you do is
set up a PMODE INT 21h handler that translates each 32bit request into
16bit requests, as many as are needed until it's done. Because DOS cannot
access RAM above 1MB all file IO must go through a temp buffer under 1MB
which is then copied to/from the memory above 1MB. For this I use an 8k
buffer. The DOS func close does not need to be extended because all it
expects is BX=handle, so it can simply be redirected to RMODE. So any
function that requires memory to be copied (such as a file name or data)
must be extended and everything else is simply redirected, except for func
42h (move file pointer) since we will use a 32bit register for that which
must be split up before redirecting to RMODE. I'll finish all this later.
In the next stage I'm going to introduce CALLBACKs and then I'll add VCPI
in the stage after that. By that point the ASM file will be getting so BIG
and messy that I will be putting it into multiple files.
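The two tricks described above (looping through the 8k buffer, and splitting
a 32bit file offset for func 42h) can be sketched in C like this. This is a
model only, not the stage source: dos_read16_fn stands in for the real 16bit
INT 21h/AH=3Fh call made in RMODE, fake_file and fake_dos_read exist just so
the sketch can be tried without DOS, and all the names are invented here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XFER_SIZE 8192u  /* size of the transfer buffer kept under 1MB */

/* Stand-in for the 16bit DOS read: fills low_buf, returns bytes read. */
typedef uint16_t (*dos_read16_fn)(void *ctx, uint8_t *low_buf, uint16_t count);

/* Extended read: satisfy a 32bit count by issuing as many 16bit reads as
   needed, copying each chunk from the low buffer up to dest. */
static uint32_t read32(dos_read16_fn dos_read16, void *ctx,
                       uint8_t *low_buf, uint8_t *dest, uint32_t count)
{
    uint32_t done = 0;
    while (done < count) {
        uint32_t left = count - done;
        uint16_t want = (uint16_t)(left < XFER_SIZE ? left : XFER_SIZE);
        uint16_t got = dos_read16(ctx, low_buf, want);
        assert(got <= want);
        memcpy(dest + done, low_buf, got);  /* low buffer -> memory above 1MB */
        done += got;
        if (got < want)
            break;                          /* short read: end of file */
    }
    return done;
}

/* Func 42h takes the offset as CX:DX, so a 32bit offset is split in two. */
static void split_offset(uint32_t offset, uint16_t *cx, uint16_t *dx)
{
    *cx = (uint16_t)(offset >> 16);     /* high word */
    *dx = (uint16_t)(offset & 0xFFFF);  /* low word  */
}

/* Memory-backed fake "file" so the loop can be exercised without DOS. */
typedef struct { const uint8_t *data; uint32_t size, pos; } fake_file;

static uint16_t fake_dos_read(void *ctx, uint8_t *low_buf, uint16_t count)
{
    fake_file *f = (fake_file *)ctx;
    uint32_t avail = f->size - f->pos;
    uint16_t n = (uint16_t)(count < avail ? count : avail);
    memcpy(low_buf, f->data + f->pos, n);
    f->pos += n;
    return n;
}
```

Write works the same way in the other direction: copy a chunk down into the
low buffer first, then issue the 16bit write.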
Click here and get stage#4.
Part #8 - 5th example, CALLBACKs.
Stage #5
Stage 5 has added callbacks to allow PMODE to intercept RMODE INTs and
such. A mouse demo has been added to test it out.
Click here and get stage#5.
Part #9 - 6th example, VCPI support.
Stage #6
Stage 6 has added VCPI support which allows it to run under EMM386/QEMM and
such.
Click here and get stage#6.
VCPI: VCPI is really nice once you understand it. The only problem
is that paging must be used. Basically what happens is that when you
want to go to PMODE you have complete control of where your descriptors
go in the GDT and IDT. But you cannot use LGDT or LIDT. There are also
3 descriptors the VCPI server needs to use in the GDT but those can go
anywhere. The first step is VCPI detection:
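    xor ax,ax
    mov es,ax                    ;ES=0 (RMODE interrupt vector table)
    cmp dword ptr es:[67h*4],0   ;is any INT 67h handler installed?
    jz noVCPI
    mov ax,0de00h                ;VCPI installation check
    int 67h
    or ah,ah                     ;AH=0 if a VCPI server is present
    jnz noVCPI
    ;BX=version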
After that you must find out where the VCPI server's PMODE entry point is
(INT 67h/func 0de01h). This FAR PROC should be called while in PMODE to use
funcs 3-5 and 0ch. When you ask for this entry point DS:SI must point to a
buffer that receives the 3 descriptors from the VCPI server (8*3 bytes), and
ES:DI must point to a 4K buffer which should be the 1st page of the paging
tables. After the INT call DI will have been moved to the next free location
in the page table. The entries in the page table used by the VCPI server
simply map the 1st meg (or more) of RAM from linear to physical. When you
alloc RAM (4k at a time) from the VCPI server you are given a physical
address, so these addresses must be placed into the page tables yourself.
INT 67h/func 0de0ch is used to switch from V86 mode to PMODE (or PMODE to
V86 mode). When calling this func you must supply a struct that contains all
the info that is needed for the mode switch. See the tutorial for how all
this is done.
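Placing an allocated page into the page table can be sketched in C as below.
This is an illustration under assumptions, not the stage source: make_pte and
map_page are invented names, the table is assumed to be the first page table
(the one that maps linear 0 to 4MB), and next_free is the index of the first
free entry (the DI value returned by the server divided by 4).

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE   4096u
#define PTE_PRESENT 0x001u  /* page is present           */
#define PTE_WRITE   0x002u  /* page is writable          */
#define PTE_USER    0x004u  /* accessible from user code */

/* Turn the physical address handed back by the VCPI page alloc into a
   page table entry. The address must be 4k aligned. */
static uint32_t make_pte(uint32_t phys)
{
    assert((phys & (PAGE_SIZE - 1)) == 0);
    return phys | PTE_PRESENT | PTE_WRITE | PTE_USER;
}

/* Put a freshly allocated page into the next free slot of the first page
   table and return the linear address it now appears at. */
static uint32_t map_page(uint32_t table[1024], uint32_t *next_free,
                         uint32_t phys)
{
    uint32_t index = *next_free;
    assert(index < 1024);       /* one table only covers 4MB */
    table[index] = make_pte(phys);
    *next_free = index + 1;
    return index * PAGE_SIZE;   /* entry n maps linear n*4k  */
}
```

So the physical pages can come back in any order and from anywhere in RAM,
but by filling in consecutive entries the program still sees one contiguous
block of linear memory.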
Part #10 - 7th example, DPMI support.
Stage #7
Stage 7 has added DPMI support which allows it to run under Windoze 95, OS/2
etc.
Click here and get stage#7.
DPMI: DPMI is very different from RAW, XMS or VCPI. Now everything
changes. Selectors must now be allocated from the DPMI server. Basically
what happens under DPMI is that I alloc the 4 descriptors that are needed
and set them to what is needed. Then I continue on as normal. The
GDT and IDT are no longer set up. The Callbacks do not need to be created,
the RMODE to PMODE IRQ redirectors are not set up, the PMODE to RMODE
INT redirectors are not set up, and so on...
Part #11 - 8th example, DPMI/DOS funcs.
Stage #8
Stage 8 has added all DPMI funcs and has extended all DOS funcs.
Click here and get stage#8.
Next stage: Stage #9 will add exception handlers and extend mouse funcs.
But alas, this project has been abandoned...
Copyright © 1995-2005 Nexus Systems