1. Write a program to swap every pair of bits in the AX register.
2. Give the value of the AX register and the carry flag after each of the following instructions.
stc
mov ax,
adc ah,
xor ah, al mov cl, 4 shr al, cl rcr ah, cl
3. Write a program to swap the nibbles in each byte of the AX register.
4. Calculate the number of one bits in BX and complement an equal number of least significant bits in AX.
HINT: Use the XOR instruction
5. Write a program to multiply two 32bit numbers and store the answer in a 64bit location.
6. Declare a 32byte buffer containing random data. Consider for this problem that the bits in these 32 bytes are numbered from 0 to 255. Declare another byte that contains the starting bit number. Write a program to copy the byte starting at this starting bit number in the AX register. Be careful that the starting bit number may not be a multiple of 8 and therefore the bits of the desired byte will be split into two bytes.
7. AX contains a number between 0-15. Write code to complement the corresponding bit in BX. For example if AX contains 6; complement the 6th bit of BX.
8. AX contains a non-zero number. Count the number of ones in it and store the result back in AX. Repeat the process on the result (AX) until AX contains one. Calculate in BX the number of iterations it took to
make AX one. For example BX should contain 2 in the following case: AX = 1100 0101 1010 0011 (input – 8 ones)
AX = 0000 0000 0000 1000 (after first iteration – 1 one)
AX = 0000 0000 0000 0001 (after second iteration – 1 one) STOP
Selective Bit Clearing
Another use of AND is to make selective bits ze ro in its destination operand. The source operand is loaded with a mask containing one at positions which are retain their old value and zero at positions which are to be zeroed. The effect of applying this operation on the destination with mask in the source is to clear the desired bits. This operation is called masking. For example if the lower nibble is to be cleared then the operation can be applied with F0 in the source. The upper nibble will retain its old value and the lower nibble will be cleared.
Selective Bit Setting
The operation can be used as a masking operation to set selective bits. The bits in the mask are cleared at positions which are to retain their values, and are set at positions which are to be set. For example to set the lower nibble of the destination operand, the operation should be applied with a mask of 0F in the source. The upper nibble will retain its value and the lower nibble will be set as a result.
56
Selective Bit Inversion
XOR can also be used as a masking operation to invert selective bits. The bits in the mask are cleared at positions, which are to retain their values, and are set at positions, which are to be inverted. For example to invert the lower nibble of the destination operand, the operand should be applied with a mask of 0F in the source. The upper nibble will retain its value and the lower nibble will be set as a result. Compare this with NOT which inverts everything. XOR on the other hand allows inverting selective bits.
Selective Bit Testing
AND can be used to check whether particular bits of a number are set or not. Previously we used shifting and JC to test bits one by one. Now we introduce another way to test bits, which is more powerful in the sense that any bit can be tested anytime and not necessarily in order. AND can be applied on a destination with a 1-bit in the desired position and a source, which is to be checked. If the destination is zero as a result, which can be checked with a JZ instruction, the bit at the desired position in the source was clear.
However the AND operation destroys the destination mask, which might be needed later as well. Therefore Intel provided us with another instruction analogous to CMP, which is non-destructive subtraction. This is the TEST instruction and is a non-destructive AND operation. It doesn’t change the destination and only sets the flags according to the AND operation. By checking the flags, we can see if the desired bit was set or cleared.
We change our multiplication algorithm to use selective bit testing instead of checking bits one by one using the shifting operations.
Example 4.3
01 ; 16bit multiplication using test for bit testing
02 [org 0x0100]
03 jmp start
04
05 multiplicand: dd 1300 ; 16bit multiplicand 32bit space
06 multiplier: dw 500 ; 16bit multiplier
07 result: dd 0 ; 32bit result
08
09 start: mov cl, 16 ; initialize bit count to 16
10 mov bx, 1 ; initialize bit mask
11
12 checkbit: test bx, [multiplier] ; move right most bit in carry
13 jz skip ; skip addition if bit is zero
14
15 mov ax, [multiplicand]
16 add [result], ax ; add less significant word
17 mov ax, [multiplicand+2]
18 adc [result+2], ax ; add more significant word
19
20 skip: shl word [multiplicand], 1
21 rcl word [multiplicand+2], 1 ; shift multiplicand left
22 shl bx, 1 ; shift mask towards next bit
23 dec cl ; decrement bit count
24 jnz checkbit ; repeat if bits left
25
26 mov ax, 0x4c00 ; terminate program
27 int 0x21
12 The test instruction is used for bit testing. BX holds the mask and in
every next iteration it is shifting left, as our concerned bit is now the
next bit.
22-24 We can do without counting in this example. We can stop as soon as
our mask in BX becomes zero. These are the small tricks that
assembly allows us to do and optimize our code as a result.
57
Inside the debugger observe that both the memory location and the mask in BX do not change as a result of TEST instruction. Also observe how our mask is shifting towards the left so that the next TEST instruction tests the next bit. In the end we get the same result of 0009EB10 as in the previous example.
The 8088 processor provides us with a few logical operations that operate at the bit level. The logical operations are the same as discussed in computer logic design; however our perspective will be a little different. The four basic operations are AND, OR, XOR, and NOT.
The important thing about these operations is that they are bitwise. This means that if “and ax, bx” instruction is given, then the operation of AND is applied on corresponding bits of AX and BX. There are 16 AND operations as a result; one for every bit of AX. Bit 0 of AX will be set if both its original value and Bit 0 of BX are set, bit 1 will be set if both its original value and Bit 1 of BX are set, and so on for the remaining bits. These operations are conducted in parallel on the sixteen bits. Similarly the operations of other logical operations are bitwise as well.
X 0 0 1 1
Y 0 1 0 1
55
AND operation
AND performs the logical bitwise and of the two
X Y X and Y
operands (byte or word) and returns the result to the
0 0 0
destination operand. A bit in the result is set if both
0 1 0
corresponding bits of the original operands are set;
1 0 0
otherwise the bit is cleared as shown in the truth table.
1 1 1
Examples are “and ax, bx” and “and byte [mem], 5.” All
possibilities that are legal for addition are also le gal for the AND operation.
The different thing is the bitwise behavior of this operation.
OR operation
OR performs the logical bitwise “inclusive or” of the two operands (byte or word) and returns the result to the destination operand. A bit in the result is set if either or both corresponding bits in the original operands are set otherwise the result bit is cleared as shown in the truth table. Examples are “or ax, bx” and “or byte [mem], 5.”
XOR operation
X or Y 0 1 1 1
XOR (Exclusive Or) performs the logical bitwise
X Y X xor Y
“exclusive or” of the two operands and returns the result
0 0 0
to the destination operand. A bit in the result is set if the
0 1 1
corresponding bits of the original operands contain
1 0 1
opposite values (one is set, the other is cleared) otherwise
1 1 0
the result bit is cleared as shown in the truth table. XOR
is a very important operation due to the property that it is a reversible
operation. It is used in many cryptography algorithms, image processing, and
in drawing operations. Examples are “xor ax, bx” and “xor byte [mem], 5.”
NOT operation
NOT inverts the bits (forms the one’s complement) of the byte or word operand. Unlike the other logical operations, this is a single operand instruction, and is not purely a logical operation in the sense the others are, but it is still traditionally counted in the same set. Examples are “not ax” and “not byte [mem], 5.”
We performed a 4bit multiplication to explain the algorithm however the real advantage of the computer is when we ask it to multiply large numbers. Numbers whose multiplication takes real time. If we have an 8bit number we can do the multiplication in word registers, but are we limited to word operations? What if we want to multiply 32bit or even larger numbers? We are certainly not limited. Assembly language only provides us the basic building blocks. We build a plaza out of these blocks, or a building, or a classic piece of architecture is only dependant upon our imagination. With our logic we can extend these algorithms as much as we want.
Our next example will be multiplication of 16bit numbers to produce a 32bit answer. However for a 32bit answer we need a way to shift a 32bit number and a way to add 32bit numbers. We cannot depend on 16bit shifting as we have 16 significant bits in our multiplicand and shifting any bit towards the left may drop a valuable bit causing a totally wrong result. A valuable bit means any bit that is one. Dropping a zero bit doesn’t cause any difference. So we place the 16it number in 32bit space with the upper 16 bits zeroed so that the sixteen shift operations don’t cause any valuable bit to drop. Even though the numbe rs were 16bit we need 32bit operations to multiply correctly.
To clarify this necessity, we take example of a number 40000 or 9C40 in hexadecimal. In binary it is represented as 1001110001000000. To multiply
52
by two we shift it one place to the left. The answer we get is 0011100010000000 and the left most one is dropped in the carry flag. The answer should be the 17bit number 0x13880 but it is 0x3880, which is 14464 in decimal instead of the expected 80000. We should be careful of this situation whenever shifting is used.
Extended Shifting
Using our basic shifting and rotation instructions we can effectively shift a 32bit number in memory word by word. We cannot shift the whole number at once since our architecture is limited to word operations. The algorithm we use consists of just two instructions and we name it extended shifting.
num1: dd 40000
shl word [num1], 1 rcl word [num1+2], 1
The DD directive reserves a 32bit space in memory, however the value we placed there will fit in 16bits. So we can safely shift the number left 16 times. The least significant word is accessible at num1 and the most significant word is accessible at num1+2.
The two instructions are carefully crafted such that the first one shifts the lower word towards the left and the most significant bit of that word is dropped in carry. With the next instruction we push that dropped bit into the least significant bit of the next word effectively joining the two 16bit words. The final carry after the se cond instruction will be the most significant bit of the higher word, which for this number will always be zero.
The following illustration will clarify the concept. The pipe on the right contains the lower half and the pipe on the left contains the upper half. The first instruction forced a zero from the right into the lower half and the left most bit is saved in carry, and from there it is pushed into the upper half and the upper half is shifted as well.
Step 1 C 1 0 1 1 0 1 0 0 0
C 1 0 1 1 0 1 0 0 Step 2
For shifting right the exact opposite is done however care must be taken to shift right the upper half first and then rotate through carry right the lower half for obvious reasons. The instructions to do this are.
num1: dd 40000
shr word [num1+2], 1 rcr word [num1], 1
The same logic has worked. The shift placed the least significant bit of the upper half in the carry flag and it was pushed from right into the lower half. For a singed shift we would have used the shift arithmetic right instruction instead of the shift logical right instruction.
The extension we have done is not limited to 32bits. We can shift a number of any size say 1024 bits. The second instruction will be repeated a number of times and we can achieve the desired effect. Using two simple instructions we have increased the capability of the operation to effectively an unlimited number of bits. The actual limit is the available memory as even the segment limit can be catered with a little thought.
Extended Addition and Subtraction
We also needed 32bit addition for multiplication of 16bit numbers. The idea of extension is same here. However we need to introduce a new instruction at this place. The instruction is ADC or “add with carry.” Normal addition has two operands and the second operand is added to the first
53
operand. However ADC has three operands. The third implied operand is the carry flag. The ADC instruction is specifically placed for extending the capability of ADD. Numbers of any size can be added using a proper combination of ADD and ADC. All basic building blocks are provided for the assembly language programmer, and the programmer can extend its capabilities as much as needed by using these fine instructions in appropriate combinations.
Further clarifying the ope ration of ADC, consider an instruction “ADC AX, BX.” Normal addition would have just added BX to AX, however ADC first adds the carry flag to AX and then adds BX to AX. Therefore the last carry is also included in the result.
The algorithm should be apparent by now. The lower halves of the two numbers to be added are firsted added with a normal addition. For the upper halves a normal addition would lose track of a possible carry from the lower halves and the answer would be wrong. If a carry was generated it should go to the upper half. Therefore the upper halves are added with an addition with carry instruction.
Since one operand must be in register, ax is used to read the lower and upper halves of the source one by one. The destination is directly updated. The set of instructions goes here.
dest: dd 40000
src: dd 80000
mov ax, [src]
add word [dest], ax
mov ax, [src+2]
adc word [dest+2], ax
To further extend it more addition with carries will be used. However the carry from last addition will be wasted as there will always be a size limit where the results and the numbers are stored. This carry will remain in the carry flag to be tested for a possible overflow.
For subtraction the same logic will be used and just like addition with carry there is an instruction to subtract with borrow called SBB. Borrow in the name means the carry flag and is used just for clarity. Or we can say that the carry flag holds the carry for addition instructions and the borrow for subtraction instructions. Also the carry is generated at the 17th bit and the borrow is also taken from the 17th bit. Also there is no single instruction that needs borrow and carry in their independent meanings at the same time. Therefore it is logical to use the same flag for both tasks.
We extend subtraction with a very similar algorithm. The lower halves must be subtracted normally while the upper halves must be subtracted with a subtract with borrow instruction so that if the lower halves needed a borrow, a one is subtracted from the upper halves. The algorithm is as under.
dest: dd 40000
src: dd 80000
mov ax, [src]
sub word [dest], ax
mov ax, [src+2]
sbb word [dest+2], ax
Extended Multiplication
We use extended shifting and extended addition to formulate our algorithm to do extended multiplication. The multiplier is still stored in 16bits since we only need to check its bits one by one. The multiplicand however cannot be stored in 16bits otherwise on left shifting its significant bits might get lost. Therefore it has to be stored in 32bits and the shifting and addition used to accumulate the result must be 32bits as well.
Example 4.2
54
01 ; 16bit multiplication
02 [org 0x0100]
03 jmp start
04
05 multiplicand: dd 1300 ; 16bit multiplicand 32bit space
06 multiplier: dw 500 ; 16bit multiplier
07 result: dd 0 ; 32bit result
08
09 start: mov cl, 16 ; initialize bit count to 16
10 mov dx, [multiplier] ; load multiplier in dx
11
12 checkbit: shr dx, 1 ; move right most bit in carry
13 jnc skip ; skip addition if bit is zero
14
15 mov ax, [multiplicand]
16 add [result], ax ; add less significant word
17 mov ax, [multiplicand+2]
18 adc [result+2], ax ; add more significant word
19
20 skip: shl word [multiplicand], 1
21 rcl word [multiplicand+2], 1 ; shift multiplicand left
22 dec cl ; decrement bit count
23 jnz checkbit ; repeat if bits left
24
25 mov ax, 0x4c00 ; terminate program
26 int 0x21
05-07 The multiplicand and the multiplier are stored in 32bit space while
10 the multiplier is stored as a word.
The multiplier is loaded in DX where it will be shifted bit by bit. It
can be directly shifted in memory as well.
15-18 The multiplicand is added to the result using extended 32bit
20-21 addition.
The multiplicand is shifted left as a 32bit number using extended
shifting operation.
The multiplicand will occupy the space from 0103-0106, the multiplier will occupy space from 0107-0108 and the result will occupy the space from 0109-010C. Inside the debugger observe the changes in these memory locations during the course of the algorithm. The extended shifting and addition operations provide the same effect as would be provided if there were 32bit addition and shifting operations available in the instruction set.
At the end of the algorithm the result memory locations contain the value 0009EB10 which is 65000 in decimal; the desired answer. Also observe that the number 00000514 which is 1300 in decimal, our multiplicand, has become 05140000 after being left shifted 16 times. Our extended shifting has given the same result as if a 32bit number is left shifted 16 times as a unit.
There are many other important applications of the shifting and rotation operations in addition to this example of the multiplication algorithm. More examples will come in coming chapters.
In the multiplication algorithm discussed above we revised the way we multiplied number in lower classes, and gave an example of that method on binary numbers. We make a simple modification to the traditional algorithm before we proceed to formulate it in assembly language.
In the traditional algorithm we calculate all intermediate answers and then sum them to get the final answer. If we add every intermediate answer to accumulate the result, the result will be same in the end, except that we do not have to remember a lot of intermediate answers during the whole multiplication. The multiplication with the new algorithm is shown below.
1101 = 13 Accumulated Result
0101 = 5
----- 0 (Initial Value)
1101 = 13 0 + 13 = 13
0000x = 0 13 + 0 = 13
1101xx = 52 13 + 52 = 65
0000xxx = 0 65 + 0 = 65 (Answer)
We try to identify steps of our algorithm. First we set the result to zero. Then we check the right most bit of multiplier. If it is one add the multiplicand to the result, and if it is zero perform no addition. Left shift the multiplicand before the next bit of multiplier is tested. The left shifting of the multiplicand is performed regardless of the value of the multiplier’s right most bit. Just like the crosses in traditional multiplication are always placed to mark the ones, tens, thousands, etc. places. Then check the next bit and if it is one add the shifted value of the multiplicand to the result. Repeat for as many digits as there are in the multiplier, 4 in our example. Formulating the steps of the algorithm we get:
• Shift the multiplier to the right.
• If CF=1 add the multiplicand to the result.
• Shift the multiplicand to the right.
• Repeat the algorithm 4 times.
For an 8bit multiplication the algorithm will be repeated 8 times and for a sixteen bit multiplication it will be repeated 16 times, whatever the size of the multiplier is.
The algorithm uses the fact that shifting right forces the right most bit to drop in the carry flag. If we test the carry flag using JC we are effectively testing the right most bit of the multiplier. Another shifting will cause the next bit to drop in the next iterati on and so on. So our task of checking bits one by one is satisfied using the shift operation. There are many other methods to do this bit testing as well, however we exemplify one of the methods in this example.
In the first iteration there is no shifting just like there is no cross in traditional multiplication in the first pass. Therefore we placed the left shifting of the multiplicand after the addition step. However the right shifting of multiplier must be before the addition as the addition step’s exe cution depends upon its result.
We introduce an assembly language program to perform this 4bit multiplication. The algorithm is extensible to more bits but there are a few complications, which are left to be discussed later. For now we do a 4bit multiplication to keep the algorithm simple.
Example 4.1
01 ; 4bit multiplication algorithm
02 [org 0x100]
03 jmp start
04
05 multiplicand: db 13 ; 4bit multiplicand (8bit space)
06 multiplier: db 5 ; 4bit multiplier
51
07 result: db 0 ; 8bit result
08
09 start: mov cl, 4 ; initialize bit count to four
10 mov bl, [multiplicand] ; load multiplicand in bl
11 mov dl, [multiplier] ; load multiplier in dl
12
13 checkbit: shr dl, 1 ; move right most bit in carry
14 jnc skip ; skip addition if bit is zero
15
16 add [result], bl ; accumulate result
17
18 skip: shl bl, 1 ; shift multiplicand left
19 dec cl ; decrement bit count
20 jnz checkbit ; repeat if bits left
21
22 mov ax, 0x4c00 ; terminate program
23 int 0x21
04-06 The numbers to be multiplied are constants for now. The multiplication is four bit so the answer is stored in an 8bit register. If the operands were 8bit the answer would be 16bit and if the
7 operands were 16bit the answer would be 32bit. Since eight bits can fit in a byte we have used 4bit multiplication as our first example. Since addition by zero means nothing we skip the addition step if
14-16 the rightmost bit of the multiplier is zero. If the jump is not taken
the shifted value of the multiplicand is added to the result.
The multiplicand is left shifted in every iteration regardless of the
18 multiplier bit.
DEC is a new instruction but its operation should be immediately
19 understandable with the knowledge gained till now. It simply subtracts one from its single operand.
The JNZ instruction causes the algorithm to repeat till any bits of
20
the multiplier are left
Inside the debugger observe the working of the SHR and SHL instructions. The SHR instruction is effectively dividing its operand by two and the remainder is stored in the carry flag from where we test it. The SHL instruction is multiplying its operand by two so that it is added at one place more towards the left in the result.
The set of shifting and rotation instructions is one of the most useful set in any processor’s instruction set. They simplify really complex tasks to a very
48
neat and concise algorithm. The following shifting and rotation operations are available in our processor.
Shift Logical Right (SHR)
The shift logical right operation inserts a zero from the left and moves every bit one position to the right and copies the rightmost bit in the carry flag. Imagine that there is a pipe filled to capacity with eight balls. The pipe is open from both ends and there is a basket at the right end to hold anything dropping from there. The operation of shift logical right is to force a white ball from the left end. The operation is depicted in the following illustration.
0 1 0 1 1 0 1 0 0 C
White balls represent zero bits while black balls represent one bits. Sixteen bit shifting is done the same way with a pipe of double capacity.
Shift Logical Left (SHL) / Shift Arithmetic Left (SAL)
The shift logical left operation is the exact opposite of shift logical right. In this operation the zero bit is inserted from the right and every bit moves one position to its left with the most significant bit dropping into the carry flag. Shift arithmetic left is just another name for shift logical left. The operation is again exemplified with the following illustration of ball and pipes.
C 1 0 1 1 0 1 0 0 0
Shift Arithmetic Right (SAR)
A signed number holds the sign in its most significant bit. If this bit was one a logical right shifting will change the sign of this number because of insertion of a zero from the left. The sign of a signed number should not change because of shifting.
The operation of shift arithmetic right is therefore to shift every bit one place to the right with a copy of the most significant bit left at the most significant place. The bit dropped from the right is caught in the carry basket. The sign bit is retained in this operation. The operation is further illustrated below.
1 0 1 1 0 1 0 0 C
The left shifting operation is basically multiplication by 2 while the right shifting operation is division by two. However for signed numbers division by two can be accomplished by using shift arithmetic right and not shift logical right. The left shift operation is equivalent to multiplication except when an important bit is dropped from the left. The overflow flag will signal this condition if it occurs and can be checked with JO. For division by 2 of a signed number logical right shifting will give a wrong answer for a negative number as the zero inserted from the left will change its sign. To retain the sign flag and still effectively divide by two the shift arithmetic right instruction must be used on signed numbers.
49
Rotate Right (ROR)
In the rotate right operation every bit moves one position to the right and the bit dropped from the right is inserted at the left. This bit is also copied into the carry flag. The operation can be understood by imagining that the pipe used for shifting has been molded such that both ends coincide. Now when the first ball is forced to move forward, every ball moves one step forward with the last ball entering the pipe from its other end occupying the first ball’s old position. The carry basket takes a snapshot of this ball leaving one end of the pipe and entering from the other.
1 0 1 1 0 1 0 0 C
Rotate Left (ROL)
In the operation of rotate left instruction, the most significant bit is copied to the carry flag and is inserted from the right, causing every bit to move one position to the left. It is the reverse of the rotate right instruction. Rotation can be of eight or sixteen bits. The following illustration will make the concept clear using the same pipe and balls example.
C 1 0 1 1 0 1 0 0
Rotate Through Carry Right (RCR)
In the rotate through carry right instruction, the carry flag is inserted from the left, every bit moves one position to the right, and the right most bit is dropped in the carry flag. Effectively this is a nine bit or a seventeen bit rotation instead of the eight or sixteen bit rotation as in the case of simple rotations.
Imagine the circular molded pipe as used in the simple rotations but this time the carry position is part of the circle between the two ends of the pipe. Pushing the carry ball from the left causes every ball to move one step to its right and the right most bit occupying the carry place. The idea is further illustrated below.
1 0 1 1 0 1 0 0 C
Rotate Through Carry Left (RCL)
The exact opposite of rotate through carry right instruction is the rotate through carry left instruction. In its operation the carry flag is inserted from the right causing every bit to move one location to its left and the most significant bit occupying the carry flag. The concept is illustrated below in the same manner as in the last example.
C 1 0 1 1 0 1 0 0
With the important capability of decision making in our repertoire we move on to the discussion of an algorithm, which will help us uncover an important set of instructions in our processor used for bit manipulations.
Multiplication is a common process that we use, and we were trained to do in early schooling. Remember multiplying by a digit and then putting a cross and then multiplying with the next digit and putting two crosses and so on and summing the intermediate results in the end. Very familiar process but we never saw the process as an algorithm, and we need to see it as an algorithm to convey it to the processor.
To highlight the important thing in the algorithm we revise it on two 4bit binary numbers. The numbers are 1101 i.e. 13 and 0101 i.e. 5. The answer should be 65 or in binary 01000001. Observe that the answer is twice as long as the multiplier and the multiplicand. The multiplication is shown in the following figure.
1101 = 13
0101 = 5
-----
1101
0000x 1101xx 0000xxx
--------
01000001 = 65
We take the first digit of the multiplier and multiply it with the multiplicand. As the digit is one the answer is the multiplicand itself. So we place the multiplicand below the bar. Before multiplying with the next digit a cross is placed at the right most place on the next line and the result is placed shifted one digit left. However since the digit is zero, the result is zero. Next digit is one, multiplying with which, the answer is 1101. We put two crosses on the next line at the right most positions and place the result there shifted two places to the left. The fourth digit is zero, so the answer 0000 is placed with three crosses to its right.
Observe the beauty of binary base, as no real multiplication is needed at the digit level. If the digit is 0 the answer is 0 and if the digit is 1 the answer is the multiplicand itself. Also observe that for every next digit in the multiplier the answer is written shifted one more place to the left. No shifting for the first digit, once for the second, twice for the third and thrice for the fourth one. Adding all the intermediate answers the result is 01000001=65 as desired. Crosses are treated as zero in this addition.
Before formulating the algorithm for this problem, we need some more instructions that can shift a number so that we use this instruction for our multiplicand shifting and also some way to check the bits of the multiplier one by one.
1. Which registers are changed by the CMP instruction?
2. What are the different types of jumps available? Describe position relative addressing.
3. If AX=8FFF and BX=0FFF and “cmp ax, bx” is executed, which of the following jumps will be taken? Each part is independent of others. Also give the value of Z, S, and C flags.
a. jg greater
b. jl smaller
c. ja above
d. jb below
4. Write a program to find the maximum number and the minimum number from an array of ten numbers.
5. Write a program to search a particular element from an array using binary search. If the element is found set AX to one and otherwise to zero.
6. Write a program to calculate the factorial of a number where factorial is defined as:
factorial(x) = x*(x-1)*(x-2)*...*1 factorial(0) = 1
Moving ahead from our example of adding numbers we progress to a program that can sort a list of numbers using the tools that we have accumulated till now. Sorting can be ascending or descending like if the largest number comes at the top, followed by a smaller number and so on till the smallest number the sort will be called descending. The other order starting with the smallest number and ending at the largest is called ascending sort. This is a common problem and many algorithms have been developed to solve it. One simple algorithm is the bubble sort algorithm.
In this algorithm we compare consecutive numbers. If they are in required order e.g. if it is a descending sort and the first is larger then the second, then we leave them as it is and if they are not in order, we swap them. Then we do the same process for the next two numbers and so on till the last two are compared and possibly swapped.
A complete iteration is called a pass over the array. We need N passes at least in the simplest algorithm if N is the number of elements to be sorted. A finer algorithm is to check if any swap was done in this pass and stop as soon as a pass goes without a swap. The array is now sorted as every pair of elements is in order.
For example if our list of numbers is 60, 55, 45, and 58 and we want to sort them in ascending order, the first comparison will be of 60 and 55 and as the order will be reversed to 55 and 60. The next comparison will be of 60 and 45 and again the two will be swapped. The next comparison of 60 and 58 will also cause a swap. At the end of first pass the numbers will be in order of 55, 45, 58, and 60. Observe that the largest number has bubbled down to the bottom. Just like a bubble at bottom of water. In the next pass 55 and 45 will be swapped. 55 and 58 will not be swapped and 58 and 60 will also not be swapped. In the next pass there will be no swap as the elements are in order i.e. 45, 55, 58, and 60. The passes will be stopped as the last pass did not cause any swap. The application of bubble sort on these numbers is further explained with the following illustration.
43
State of Data Swap Done Swap Flag
Pass 1 Off
60 55 45 58 Yes On
Yes On
55 60 45 58
55 45 60 58 Yes On
Pass 2 Off
Yes On
55 45 58 60
45 55 58 60 No On
No On
45 55 58 60
Pass 3 Off
45 55 58 60 No Off
No Off
45 55 58 60
45 55 58 60 No Off
No more passes since swap flag is Off
Example 3.3
001 ; sorting a list of ten numbers using bubble sort
002 [org 0x0100]
003 jmp start
004
005 data: dw 60, 55, 45, 50, 40, 35, 25, 30, 10, 0
006 swap: db 0
007
008 start: mov bx, 0 ; initialize array index to zero
009 mov byte [swap], 0 ; rest swap flag to no swaps
010
011 loop1: mov ax, [data+bx] ; load number in ax
012 cmp ax, [data+bx+2] ; compare with next number
013 jbe noswap ; no swap if already in order
014
015 mov dx, [data+bx+2] ; load second element in dx
016 mov [data+bx+2], ax ; store first number in second
017 mov [data+bx], dx ; store second number in first
018 mov byte [swap], 1 ; flag that a swap has been done
019
020 noswap: add bx, 2 ; advance bx to next index
021 cmp bx, 18 ; are we at last index
022 jne loop1 ; if not compare next two
023
024 cmp byte [swap], 1 ; check if a swap has been done
025 je bsort ; if yes make another pass
026
027 mov ax, 0x4c00 ; terminate program
028 int 0x21
44
003 The jump instruction is placed to skip over data.
6 The swap flag can be stored in a register but as an example it is stored in memory and also to extend the concept at a later stage.
011-012 One element is read in AX and it is compared with the next element because memory to memory comparisons are not allowed.
13 If the JBE is changed to JB, not only the unnecessary swap on equal will be performed, there will be a major algorithmic flaw due to a logical error as in the case of equal elements the algorithm will never stop. JBE won’t swap in the case of equal elements.
015-017 The swap is done using DX and AX registers in such a way that the values are crossed. The code uses the information that one of the elements is already in the AX register.
21 This time BX is compared with 18 instead of 20 even though the number of elements is same. This is because we pick an element and compare it with the next element. When we pick the 9th element we compare it with the next element and this is the last comparison, since if we pick the 10th element we will compare it with the 11th element and there is no 11th element in our case.
024-025 If a swap is done we repeat the whole process for possible more swaps.
Inside the debugger we observe that the JBE is changed to JNA due to the same reason as discussed for JNE and JNZ. The passes change the data in the same manner as we presented in our illustration above. If JBE in the code is changed to JAE the sort will change from ascending to descending. For signed numbers we can use JLE and JGE respectively for ascending and descending sort.
To clarify the difference of signed and unsigned jumps we change the data array in the last program to include some negative numbers as well. When JBE will be used on this data, i.e. with unsigned interpretation of the data and an ascendi ng sort, the negative numbers will come at the end after the largest positive number. However JLE will bring the negative numbers at the very start of the list to bring them in proper ascending order according to a signed interpretation, even though they are large in magnitude. The data used is shown as below.
data: dw 60, 55, 45, 50, -40, -35, 25, 30, 10, 0
This data includes some signed numbers as well. The JBE instruction will treat this data as an unsigned number and will cater only for the magnitude ignoring the sign. If the program is loaded in the debugger, the numbers will appear in their hexadecimal equivalent. The two numbers -40 and -35 are especially important as they are represented as FFD8 and FFDD. This data is not telling whether it is signed or unsigned. Our interpretation will decide whether it is a very large unsigned number or a signed number in two’s complement form.
If the sorting algorithm is applied on the above data with JBE as the comparison instruction to sort in ascending order with unsigned interpretation, observe the comparisons of the two numbers FFD8 and FFDD. For example it will decide that FFDD > FFD8 since the first is larger in magnitude. At the end of sorting FFDD will be at the end of the list being declared the largest number and FFD8 will precede it to be the second largest.
If however the comparison instruction is changed to JLE and sorting is done on the same data it works similarly except on the two numbers FFDD and FFD8. This time JLE declares them to be smaller than every other number and also declares FFDD < FFD8. At the end of sorting, FFDD is
45
declared to be the smallest number followed by FFD8 and then 0000. This is in contrast to the last example where JBE was used. This happened because JLE interpreted our data as signed numbers, and as a signed number FFDD has its sign bit on signaling that it is a negative number in two’s complement form which is smaller than 0000 and every positive number. However JBE did not give any significance to the sign bit and included it in the magnitude. Therefore it declared the negative numbers to be the largest numbers.
If the required interpretation was of signed numbers the result produced by JLE is correct and if the required interpretation was of unsigned numbers the re sult produced by JBE is correct. This is the very difference between signed and unsigned integers in higher level languages, where the compiler takes the responsibility of making the appropriate jump depending on the type of integer used. But it is only at this level that we can understand the actual mechanism going on. In assembly language, use of proper jump is the responsibility of the programmer, to convey the intentions to use the data as signed or as unsigned.
The remaining possibilities of signed descending sort and unsigned descending sort can be done on the same lines and are left as an exercise. Other conditional jumps work in the same manner and can be studied from the reference at the end. Several will be discussed in more detail when they are used in subsequent chapters.
The three types of jump, near, short, and far, differ in the size of instruction and the range of memory they can jump to with the smallest short form of two bytes and a range of just 256 bytes to the far form of five bytes and a range coveri ng the whole memory.
Short Jump
EB Disp
Near Jump
EB Disp Low Disp High
Far Jump
EB IP Low IP High CS Low CS High
Near Jump
When the relative address stored with the instruction is in 16 bits as in the last example the jump is called a near jump. Using a near jump we can jump anywhere within a segment. If we add a large number it will wrap around to
42
the lower part. A negative number actually is a large number and works this way using the wraparound behavior.
Short Jump
If the offset is stored in a single byte as in 75F2 with the opcode 75 and operand F2, the jump is called a short jump. F2 is added to IP as a signed byte. If the byte is negative the complement is negated from IP otherwise the byte is added. Unconditional jumps can be short, near, and far. The far type is yet to be discussed. Conditional jumps can only be short. A short jump can go +127 bytes ahead in code and -128 bytes backwards and no more. This is the limitation of a byte in singed representation.
Far Jump
Far jump is not position relative but is absolute. Both segment and offset must be given to a far jump. The previous two jumps were used to jump within a segment. Sometimes we may need to go from one code segment to another, and near and short jumps cannot take us there. Far jump must be used and a two byte segment and a two byte offset are given to it. It loads CS wit the segment part and IP with the offset part. Execution therefore resumes from that location in physical memory. The three instructions that have a far form are JMP, CALL, and RET, are related to program control. Far capability makes intra segment control possible.
Inside the debugger the instruction is shown as JMP 0119 and the location 0119 contains the original first instruction of the logic of our program. This jump is unconditional, it will always be taken. Now looking at the opcode we see F21600 where F2 is the opcode and 1600 is the operand to it. 1600 is 0016 in proper word order. 0119 is not given as a parameter rather 0016 is given.
This is position relative addressing in contrast to absolute addressing. It is not telling the exact address rather it is telling how much forward or backward to go from the current position of IP in the current code segment. So the instruction means to add 0016 to the IP register. At the time of execution of the first instruction at 0100 IP was pointing to the next instruction at 0103, so after adding 16 it became 0119, the desired target location. The mechanism is important to know, however all calculations in this mechanism are done by the assembler and by the processor. We just use a label with the JMP instruction and are ensured that the instruction at the target label will be the one to be executed.
Till now we have been placing data at the end of code. There is no such restriction and we can define data anywhere in the code. Taking the previous example, if we place data at the start of code inste ad of at the end and we load our program in the debugger. We can see our data placed at the start but the debugger is intending to start execution at our data. The COM file definition said that the first executable instruction is at offset 0100 but we have placed data there instead of code. So the debugger will try to interpret that data as code and showed whatever it could make up out of those opcodes.
We introduce a new instruction called JMP. It is the unconditional jump that executes regardless of the state of all flags. So we write an unconditional jump as the very first instruction of our program and jump to the next instruction that follows our data declarations. This time 0100 contains a valid first instruction of our program.
Example 3.2
001 ; a program to add ten numbers without a separate counter
002 [org 0x0100]
003 jmp start ; unconditionally jump over data
004
005 num1: dw 10, 20, 30, 40, 50, 10, 20, 30, 40, 50
006 total: dw 0
41
007
008 start: mov bx, 0 ; initialize array index to zero
009 mov ax, 0 ; initialize sum to zero
010
011 l1: add ax, [num1+bx] ; add number to ax
012 add bx, 2 ; advance bx to next index
013 cmp bx, 20 ; are we beyond the last index
014 jne l1 ; if not add next number
015
016 mov [total], ax ; write back sum in memory
017
018 mov ax, 0x4c00 ; terminate program
019 int 0x21
003 JMP jumps over the data declarations to the start label and
execution resumes from there.
For every interesting or meaningful situation of flags, a conditional jump is there. For example JZ and JNZ check the zero flag. If in a comparison both operands are same, the result of subtraction will be zero and the zero flag will be set. Thus JZ and JNZ can be used to test equality. That is why there are renamed versions JE and JNE read as jump if equal or jump if not equal. They seem more logical in writing but mean exactly the same thing with the same opcode. Many jumps are renamed with two or three names for the same jump, so that the appropriate logi c can be conveyed in assembly language programs. This renaming is done by Intel and is a standard for iAPX88. JC and JNC test the carry flag. For example we may need to test whether there was an overflow in the last unsigned addition or subtraction. Carry flag will also be set if two unsigned numbers are subtracted and the first is smaller than the second. Therefore the renamed versions JB, JNAE, and JNB, JAE are there standing for jump if below, jump if not above or equal, jump if not below, and jump if above or equal respectively. The operation of all jumps can be seen from the following table.
JC Jump if carry CF = 1 This jump is taken if
JB Jump if below the last arithmetic
JNAE Jump if not above or equal operation generated a
carry or required a
borrow. After a CMP it
is taken if the
unsigned source is
smaller than the
unsigned destination.
JNC Jump if not carry CF = 0 This jump is taken if
JNB Jump if not below the last arithmetic
JAE Jump if above or equal operation did not
38
generated a carry or
required a borrow.
After a CMP it is taken
if the unsigned source
is larger or equal to
the unsigned
destination.
JE Jump if equal ZF = 1 This jump is taken if
JZ Jump if zero the last arithmetic
operation produced a
zero in its destination.
After a CMP it is taken
if both operands were
equal.
JNE Jump if not equal ZF = 0 This jump is taken if
JNZ Jump if not zero the last arithmetic
operation did not
produced a zero in its
destination. After a
CMP it is taken if both
operands were
different.
JA Jump if above ZF = 0 AND This jump is taken
JNBE Jump if not below or equal CF = 0 after a CMP if the
unsigned source is
larger than the
unsigned destination.
JNA Jump if not above ZF = 1 OR This jump is taken
JBE Jump if not below or equal CF = 1 after a CMP if the
unsigned source is
smaller than or equal
to the unsigned
destination.
JL Jump if less SF OF This jump is taken
JNGE Jump if not greater or equal after a CMP if the
signed source is
smaller than the
signed destination.
JNL Jump if not less SF = OF This jump is taken
JGE Jump if greater or equal after a CMP if the
signed source is larger
than or equal to the
signed destination.
JG Jump if greater ZF = 0 AND This jump is taken
JNLE Jump if not less or equal SF = OF after a CMP if the
signed source is larger
than the signed
destination.
JNG Jump if not greater ZF = 1 OR This jump is taken
JLE Jump if less or equal SF OF after a CMP if the
signed source is
smaller than or equal
to the signed
destination.
39
JO Jump if overflow. OF = 1 This jump is taken if
the last arithmetic
operation changed the
sign unexpectedly.
JNO Jump if not overflow OF = 0 This jump is taken if
the last arithmetic
operation did not
change the sign
unexpectedly.
JS Jump if sign SF = 1 This jump is taken if
the last arithmetic
operation produced a
negative number in its
destination.
JNS Jump if not sign SF = 0 This jump is taken if
the last arithmetic
operation produced a
positive number in its
destination.
JP Jump if parity PF = 1 This jump is taken if
JPE Jump if even parity the last arithmetic
operation produced a
number in its
destination that has
even parity.
JNP Jump if not parity PF = 0 This jump is taken if
JPO Jump if odd parity the last arithmetic
operation produced a
number in its
destination that has
odd parity.
JCXZ Jump if CX is zero CX = 0 This jump is taken if
the CX register is zero.
The CMP instruction sets the flags reflecting the relation of the destination to the source. This is important as when we say jump if above, then what is above what. The destination is above the source or the source is above the destination.
The JA and JB instructions are related to unsigned numbers. That is our interpretation for the destination and source operands is unsigned. The 16th bit holds data and not the sign. In the JL and JG instructions standing for jump if lower and jump if greater respectively, the interpretation is signed. The 16th bit holds the sign and not the data. The difference between them will be made clear as an elaborate example will be given to explain the difference.
One jump is special that it is not dependant on any flag. It is JCXZ, jump if the CS register is zero. This is because of the special treatment of the CX register as a counter. This jump is regardless of the zero flag. There is no counterpart or not form of this instruction.
The adding numbers example of the last chapter can be a little simplified using the compare instruction on the BX register and eliminating the need for a separate counter as below.
40
Example 3.1
001 ; a program to add ten numbers without a separate counter
002 [org 0x0100]
003 mov bx, 0 ; initialize array index to zero
004 mov ax, 0 ; initialize sum to zero
005
006 l1: add ax, [num1+bx] ; add number to ax
007 add bx, 2 ; advance bx to next index
008 cmp bx, 20 ; are we beyond the last index
009 jne l1 ; if not add next number
010
011 mov [total], ax ; write back sum in memory
012
013 mov ax, 0x4c00 ; terminate program
014 int 0x21
015
016 num1: dw 10, 20, 30, 40, 50, 10, 20, 30, 40, 50
017 total: dw 0
006 The format of memory access is still base + offset.
8 BX is used as the array index as well as the counter. The offset of 11th number will be 20, so as soon as BX becomes 20 just after the 10th number has been added, the addition is stopped.
9 The jump is displayed as JNZ in the debugger even though we have written JNE in our example. This is because it is a renamed jump with the same opcode as JNZ and the debugger has no way of knowing the mnemonic that we used after looking just at the opcode. Also every code and data reference that we used till now is seen in the opcode as well. However for the jump instruction we see an operand of F2 in the opcode and not 0116. This will be discussed in detail with unconditional jumps. It is actually a short relative jump and the operand is stored in the form of positive or negative offset from this instruction.
With conditional branching in hand, there are just a few small things left in assembly language that fills some gaps. Now there is just imagination and the skill to conceive programs that can make you write any program.
Conditional jump was introduced in the last chapter to loop for the addition of a fixed number of array elements. The jump was based on the zero flag. There are many other conditions possible in a program. For example an operand can be greater than another operand or it can be smaller. We use comparisons and boolean expressions extensively in higher level languages. They must be available is some form in assembly language, otherwise they could not possibly be made available in a higher level language. In fact they are available in a very fi ne and purified form.
The basic root instruction for all comparisons is CMP standing for compare. The operation of CMP is to subtract the source operand from the destination operand, updating the flags without changing either the source or the destination. CMP is one of the key instructions as it introduces the capability of conditional routing in the processor.
A closer thought reveals that with subtraction we can check many different conditions. For example if a larger number is subtracted from a smaller number a borrow is needed. The carry flag plays the role of borrow during the subtraction operation. And in this condition the carry flag will be set. If two equal numbers are subtracted the answer is zero and the zero flag will be set. Every significant relation between the destination and source is evident from the sign flag, carry flag, zero flag, and the overflow flag. CMP is meaningless without a conditional jump immediately following it.
Another important distinction at this point is the difference between signed and unsigned numbers. In unsigned numbers only the magnitude of the number is important, whereas in signed numbers both the magnitude and the sign are important. For example -2 is greater than -3 but 2 is smaller than 3. The sign has affected our comparisons.
Inside the computer signed numbers are represented in two’s complement notation. In essence a number in this representation is still a number, just that now our interpretation of this number will be signed. Whether we use jump above and below or we use jump greater or less will convey our intention to the processor. The jump above and greater operations at first sight seem to be doing the same operation, and similarly below and less operations seem to be similar. However for signed numbers JG and JL will work properly and for unsigned JA and JB will work properly and not the other way around.
It is important to note that at the time of comparison, the intent of the programmer to treat the numbers as signed or unsigned is not clear. The subtraction in CMP is a normal subtraction. It is only after the comparison, during the conditional jump operation, that the intent is conveyed. At that time with a specific combination of flags checked the intent is satisfied.
For example a number 2 is represented in a word as 0002 while the number -2 is represented as FFFE. In a byte they would be represented as 02 and FE. Now both have the same magnitude however the different sign has caused very different representation in two’s complement form. Now if the intent is to use FFFE or decimal 65534 then the same data would be placed in the word as in case of -2. In fact if -2 and 65534 are compared the processor will set the zero flag signaling that they are exactly equal. As regards an unsigned comparison the number 65534 is much greater than 2.
36
So if a JA is taken after comparing -2 in the destination with 2 in the source the jump will be taken. If however JG is used after the same comparison the jump will not be taken as it will consider the sign and with the sign -2 is smaller than 2. The key idea is that -2 and 65534 were both stored in memory in the same form. It was the interpretation that treated it as a signed or as an unsigned number.
The unsigned comparisons see the numbers as 0 being the smallest and 65535 being the largest with the order that 0 < 1 < 2 … < 65535. The signed comparisons see the number -32768 which has the same memory representation as 32768 as the smallest number and 32767 as the largest with the order -32768 < -32767 < … < -1 < 0 < 1 < 2 < … < 32767. All the negative numbers have the same representation as an unsigned number in the range 32768 … 65535 however the signed interpretation of the signed comparisons makes them be treated as negative numbers smaller than zero.
All meaningful situations both for signed and unsigned numbers than occur after a comparison are detailed in the following table.
DEST = SRC ZF = 1 When the source is subtracted
from the destination and both are
equal the result is zero and
therefore the zero flag is set. This
works for both signed and
unsigned numbers.
UDEST < USRC CF = 1 When an unsigned source is
subtracted from an unsigned
destination and the destination is
smaller, a borrow is needed
which sets the carry flag.
UDEST USRC ZF = 1 OR CF = 1 If the zero flag is set, it means
that the source and destination
are equal and if the carry flag is
set it means a borrow was needed
in the subtraction and therefore
the destination is smaller.
UDEST USRC CF = 0 When an unsigned source is
subtracted from an unsigned
destination no borrow will be
needed either when the operands
are equal or when the destination
is greater than the source.
UDEST > USRC ZF = 0 AND CF = 0 The unsigned source and
destination are not equal if the
zero flag is not set and the
destination is not smaller since
no borrow was taken. Therefore
the destination is greater than
the source.
SDEST < SSRC SF OF When a signed source is
subtracted from a signed
destination and the answer is
negative with no overflow than
the destination is smaller than
the source. If however there is an
overflow meaning that the sign
has changed unexpectedly, the
meanings are reversed and a
37
positive number signals that the
destination is smaller.
SDEST SSRC ZF = 1 OR SF OF If the zero flag is set, it means
that the source and destination
are equal and if the sign and
overflow flags differ it means that
the destination is smaller as
described above.
SDEST SSRC SF = OF When a signed source is
subtracted from a signed
destination and the answer is
positive with no overflow than the
destination is greater than the
source. When an overflow is there
signaling that sign has changed
unexpectedly, we interpret a
negative answer as the signal
that the destination is greater.
SDEST > SSRC ZF = 0 AND SF = OF If the zero flag is not set, it means
that the signed operands are not
equal and if the sign and overflow
match in addition to this it
means that the destination is
greater than the source.
1. What is a label and how does the assembler differentiates between code labels and data labels?
2. List the seven addressing modes available in the 8088 architecture.
3. Differentiate between effective address and physical address.
4. What is the effective address generated by the following instructions? Every instruction is independent of others. Initially BX=0x0100, num1=0x1001, [num1]=0x0000, and SI=0x0100
a. mov ax, [bx+12]
b. mov ax, [bx+num1]
c. mov ax, [num1+bx]
d. mov ax, [bx+si]
5. What is the effective address generated by the following combinations if they are valid. If not give reason. Initially BX=0x0100, SI=0x0010, DI=0x0001, BP=0x0200, and SP=0xFFFF
a. bx-si
b. bx-bp
c. bx+10
d. bx-10
e. bx+sp
f. bx+di
6. Identify the problems in the following instructions and correct them by replacing them with one or two instruction having the same effect.
33
a. mov [02], [ 22]
b. mov [wordvar], 20
c. mov bx, al
d. mov ax, [si+di+100]
7. What is the function of segment override prefix and what changes it brings to the opcode?
8. What are the two types of address wraparound? What physical address is accessed with [BX+SI] if FFFF is loaded in BX, SI, and DS.
9. Write instructions to do the following.
a. Copy contents of memory location with offset 0025 in the current data segment into AX.
b. Copy AX into memory location with offset 0FFF in the current data segment.
c. Move contents of memory location with offset 0010 to memory location with offset 002F in the current data segment.
10. Write a program to calculate the square of 20 by using a loop that adds 20 to the accumulator 20 times.
The iAPX88 processor supports seven modes of memory access. Remember that immediate is not an addressing mode but an operand type. Operands can be immediate, register, or memory. If the operand is memory one of the seven addressing modes will be used to access it. The memory access mechanisms can also be written in the general form “base + index + offset” and we can define the possible addressing modes by saying that any one, two, or none can be skipped from the general form to form a legal memory access.
There are a few common mistakes done in forming a valid memory access. Part of a register cannot be used to access memory. Like BX is allowed to hold an address but BL or BH are not. Address is 16bit and must be contained in a 16bit register. BX-SI is not possible. The only thing that we can do is addition of a base register with an index register. Any other operation is disallowed. BS+BP and SI+DI are both disallowed as we cannot have two base or two index registers in one memory access. One has to be a base register and the other has to be an index register and that is the reason of naming them differently.
Direct
A fixed offset is given in brackets and the memory at that offset is accessed. For example “mov [1234], ax” stores the contents of the AX registers in two bytes starting at address 1234 in the current data segment. The instruction “mov [1234], al” stores the contents of the AL register in the byte at offset 1234.
Based Register Indirect
A base register is used in brackets and the actual address accessed depends on the value contained in that register. For example “mov [bx], ax” moves the two byte contents of the AX register to the address contained in the BX register in the current data segment. The instruction “mov [bp], al” moves the one byte content of the AL register to the address contained in the BP register in the current stack segment.
Indexed Register Indirect
An index register is used in brackets and the actual address accessed depends on the value contained in that register. For example “mov [si], ax” moves the contents of the AX register to the word starting at address contained in SI in the current data segment. The instruction “mov [di], ax” moves the word contained in AX to the offset stored in DI in the current data segment.
Based Register Indirect + Offset
A base register is used with a constant offset in this addressing mode. The value contained in the base register is added with the constant offset to get the effective address. For example “mov [bx+300], ax” stores the word contained in AX at the offset attained by adding 300 to BX in the current data segment. The instruction “mov [bp+300], ax” stores the word in AX to the offset attained by adding 300 to BP in the current stack segment.
32
Indexed Register Indirect + Offset
An index register is used with a constant offset in this addressing mode. The value contained in the index register is added with the constant offset to get the effective address. For example “mov [si+300], ax” moves the word contained in AX to the offset attained by adding 300 to SI in the current data segment and the instruction “mov [di+300], al” moves the byte contained in AL to the offset attained by adding 300 to DI in the current data segment.
Base + Index
One base and one index register is used in this addressing mode. The value of the base register and the index register are added together to get the effective address. For example “mov [bx+si], ax” moves the word contained in the AX register to offset attained by adding BX and SI in the current data segment. The instruction “mov [bp+di], al” moves the byte contained in AL to the offset attained by adding BP and DI in the current stack segment. Observe that the default segment is based on the base register and not on the index register. This is why base registers and index registers are named separately. Other examples are “mov [bx+di], ax” and “mov [bp+si], ax.” This method can be used to access a two di mensional array such that one dimension is in a base register and the other is in an index register.
Base + Index + Offset
This is the most complex addressing method and is relatively infrequently used. A base register, an index register, and a constant offset are all used in this addressing mode. The values of the base register, the index register, and the constant offset are all added together to get the effective address. For example “mov [bx+si+300], ax” moves the word contents of the AX register to the word in memory starting at offset attained by adding BX, SI, and 300 in the current data segment. Default segment association is again based on the base register. It might be used with the array base of a two dimensional array as the constant offset, one dimension in the base register and the other in the index register. This way all calculation of location of the desired element has been delegated to the processor.
There are two types of wraparounds. One is within a single segment and the other is inside the whole physical memory. Segment wraparound occurs when during the effective address calculation a carry is generated. This carry is dropped giving the effect that when we try to access beyond the segment limit, we are actually wrapped around to the first cell in the segment. For example if BX=9100, DS=1500 and the access is [bx+0x7000] we form the effective address 9100 + 7000 = 10100. The carry generated is dropped forming the actual effective address of 0100. Just like a circle when we reached the end we started again from the beginning. An arc at 370 degrees is the same as an arc at 10 degrees. We tried to cross the segment boundary and it pushed us back to the start. This is called segment wraparound. The physical address in the above example will be 15100.
The same can also happen at the time of physical address calculation. For example BX=0100, DS=FFF0 and the access under consideration is [bx+0x0100]. The effective address will be 0200 and the physical address will
31
be 100100. This is a 21bit answer and cannot be sent on the address bus which is 20 bits wide. The carry is dropped and just like the segment wraparound our physical memory has wrapped around at its very top. When we tried to access beyond limits the actual access is made at the very start. This second wraparound is a bit different in newer processor with more address lines but that will be explained in later chapters.
All the addressing mechanisms in iAPX88 return a number called effective address. For example in base + offset addressing, neither the base nor the offset alone tells the desired cell in memory to be accessed. It is only after the addition is done that the processor knows which cell to be accessed. This number which came as the result of addition is called the effective address. But the effective address is just an offset and is meaningless without a segment. Only after the segment is known, we can form the physical address that is needed to access a memory cell.
We discussed the segmented memory model of iAPX88 in reasonable detail at the end of previous chapter. However during the discussion of addressing modes we have not seen the effect of segments. Segmentation is there and it’s all happening relative to a segment base. We saw DS, CS, SS, and ES
30
inside the debugger. Everything is relative to its segment base, even though we have not explicitly explained its functionality. An offset alone is not complete without a segment. As previously discussed there is a default segment associated to every register which accesses memory. For example CS is associated to IP by default; rather it is tied with it. It cannot access memory in any other segment.
In case of data, there is a bit relaxation and nothing is tied. Rather there is a default association which can be overridden. In the case of register indirect memory access, if the register used is one of SI, DI, or BX the default segment is DS. If however the register used in BP the default segment used is SS. The stack segment has a very critical and fine use and there is a reason why BP is attached to SS by default. However these will be discussed in detail in the chapter on stack. IP is tied to CS while SP is tied to SS. The association of these registers cannot be changed; they are locked with no option. Others are not locked and can be changed.
To override the association for one instruction of one of the registers BX, BP, SI or DI, we use the segment override prefix. For example “mov ax, [cs:bx]” associates BX with CS for this one instruction. For the next instruction the default association will come back to act. The processor places a special byte before the instruction called a prefix, just like prefixes and suffixes in English language. No prefix is needed or placed for default association. For example for CS the byte 2E is placed and for ES the byte 26 is placed. Opcode has not changed, but the prefix byte has modified the default association to association with the desired segment register for this one instruction.
In all our examples, we never declared a segment or used it explicitly, but everything seemed to work fine. The important thing to note is that CS, DS, SS, and ES all had the same value. The value itself is not important but the fact that all had the same value is important. All four segment windows exactly overlap. Whatever segment register we use the same physical memory will be accessed. That is why everything was working without the mention of a single segment register. This is the formation of COM files in IBM PC. A single segment contains code, data, and the stack. This format is operating system dependant, in our case defined by DOS. And our operating system defines the format of COM files such that all segments have the same value. Thus the only meaningful thing that remains is the offset.
For example if BX=0100, SI=0200, and CS=1000 and the memory access under consideration is [cs:bx+si+0x0700], the effective address formed is bx+si+0700 = 0100 + 0200 + 0700 = 0A00. Now multiplying the segment value by 16 makes it 10000 and adding the effective address 00A00 forms the physical address 10A00.
Direct addressing and indirect addressing using a single register are two basic forms of memory access. Another possibility is to use different combinations of direct and indirect references. In the above example we used BX to access different array elements which were placed consecutively in memory like an array. We can also place in BX only the array index and not the exact address and form the exact address when we are going to access the actual memory. This way the same register can be used for accessing different arrays and also the register can be used for index comparison like the following example does.
Example 2.8
001 ; a program to add ten numbers using register + offset addressing
002 [org 0x0100]
003 mov bx, 0 ; initialize array index to zero
004 mov cx, 10 ; load count of numbers in cx
005 mov ax, 0 ; initialize sum to zero
006
007 l1: add ax, [num1+bx] ; add number to ax
008 add bx, 2 ; advance bx to next index
009 sub cx, 1 ; numbers to be added reduced
010 jnz l1 ; if numbers remain add next
011
012 mov [total], ax ; write back sum in memory
013
014 mov ax, 0x4c00 ; terminate program
015 int 0x21
016
017 num1: dw 10, 20, 30, 40, 50, 10, 20, 30, 40, 50
018 total: dw 0
003 This time BX is initialized to zero instead of array base
7 The format of memory access has changed. The array base is added to BX containing array index at the time of memory access.
8 As the array is of words, BX jumps in steps of two, i.e. 0, 2, 4. Higher level languages do appropriate incrementing themselves and we always use sequential array indexes. However in assembly language we always calculate in bytes and therefore we need to take care of the size of one array element which in this case is two.
Inside the debugger we observe that the memory access instruction is shown as “mov ax, [011F+bx]” and the actual memory accessed is the one whose address is the sum of 011F and the value contained in the BX register. This form of access is of the register indirect family and is called base + offset or index + offset depending on whether BX or BP is used or SI or DI is used.
We have done very elementary data access till now. Assume that the numbers we had were 100 and not just three. This way of adding them will cost us 200 instructions. There must be some method to do a task repeatedly on data placed in consecutive memory cells. The key to this is the need for some register that can hold the address of data. So that we can change the address to access some other cell of memory using the same instruction. In direct addressing mode the memory cell accessed was fixed inside the instruction. There is another method in which the address can be placed in a register so that it can be changed. For the following example we will take 10 instead of 100 numbers but the algorithm is extensible to any size.
There are four registers in iAPX88 architecture that can hold address of data and they are BX, BP, SI, and DI. There are minute differences in their working which will be discussed later. For the current example, we will use the BX register and we will take just three numbers and extend the concept with more numbers in later examples.
Example 2.6
001 ; a program to add three numbers using indirect addressing
002 [org 0x100]
003 mov bx, num1 ; point bx to first number
004 mov ax, [bx] ; load first number in ax
005 add bx, 2 ; advance bx to second number
27
006 add ax, [bx] ; add second number to ax
007 add bx, 2 ; advance bx to third number
008 add ax, [bx] ; add third number to ax
009 add bx, 2 ; advance bx to result
010 mov [bx], ax ; store sum at num1+6
011
012 mov ax, 0x4c00 ; terminate program
013 int 0x21
014
015 num1: dw 5, 10, 15, 0
3 Observe that no square brackets around num1 are used this time. The address is loaded in bx and not the contents. Value of num1 is 0005 and the address is 0117. So BX will now contain 0117.
4 Brackets are now used around BX. In iapx88 architecture brackets can be used around BX, BP, SI, and DI only. In iapx386 more registers are allowed. The instruction will be read as “move into ax the contents of the memory location whose address is in bx.” Now since bx contains the address of num1 the contents of num1 are transferred to the ax register. Without square brackets the meaning of the instruction would have been totally different.
5 This instruction is changing the address. Since we have words not bytes, we add two to bx so that it points to the next word in memory. BX now contains 0119 the address of the second word in memory. This was the mechanism to change addresses that we needed.
Inside the debugger we observe that the first instruction is “mov bx, 011C.” A constant is moved into BX. This is because we did not use the square brackets around “num1.” The address of “num1” has moved to 011C because the code size has changed due to changed instructions. In the second instruction BX points to 011C and the value read in AX is 0005 which can be verified from the data window. After the addition BX points to 011E containing 000A, our next word, and so on. This way the BX register points to our words one after another and we can add them using the same instruction “mov ax, [bx]” without fixing the address of our data in the instructions. We can also subtract from BX to point to previous cells. The address to be accessed is now in total program control.
One thing that we needed in our problem to add hundred numbers was the capability to change address. The second thing we need is a way to repeat the same instruction and a way to know that the repetition is done a 100 times, a terminal condition for the repetition. For the task we are introducing two new instructions that you should read and understand as simple English language concepts. For simplicity only 10 numbers are added in this example. The algorithm is extensible to any size.
Example 2.7
001 ; a program to add ten numbers
002 [org 0x0100]
003 mov bx, num1 ; point bx to first number
004 mov cx, 10 ; load count of numbers in cx
005 mov ax, 0 ; initialize sum to zero
006
007 l1: add ax, [bx] ; add number to ax
008 add bx, 2 ; advance bx to next number
009 sub cx, 1 ; numbers to be added reduced
010 jnz l1 ; if numbers remain add next
011
012 mov [total], ax ; write back sum in memory
013
014 mov ax, 0x4c00 ; terminate program
015 int 0x21
016
017 num1: dw 10, 20, 30, 40, 50, 10, 20, 30, 40, 50
28
018 total: dw 0
006 Labels can be used on code as well. Just like data labels they remembe r the address at which they are used. The assembler does not differentiate between code labels and data labels. The programmer is responsible for using a data label as data and a code label as code. The label l1 in this case is the address of the following instruction.
9 SUB is the counterpart to ADD with the same rules as that of the ADD instruction.
010 JNZ stands for “jump if not zero.” NZ is the condition in this instruction. So the instruction is read as “jump to the location l1 if the zero flag is not set.” And revisiting the zero flag definition “the zero flag is set if the last mathematical or logical operation has produced a zero in its destination.” For example “mov ax, 0” will not set the zero flag as it is not a mathematical or logical instruction. However subtraction and addition will set it. Also it is set even when the destination is not a register. Now consider the subtraction immediately preceding it. If the CX register becomes zero as a result of this subtraction the zero flag will be set and the jump will be taken. And jump to l1, the processor needs to be told each and everything and the destination is an important part of every jump. Just like when we ask someone to go, we mention go to this market or that house. The processor is much more logical than us and needs the destination in every instruction that asks it to go somewhere. The processor will load l1 in the IP register and resume execution from there. The processor will blindly go to the label we mention even if it contains data and not code.
The CX register is used as a counter in this example, BX contains the changing address, while AX accumulates the result. We have formed a loop in assembly language that executes until its condition remains true. Inside the debugger we can observe that the subtract instruction clears the zero flag the first nine times and sets it on the tenth time. While the jump instruction moves execution to address l1 the first nine times and to the following line the tenth time. The jump instruction breaks program flow.
The JNZ instruction is from the program control group and is a conditional jump, meaning that if the condition NZ is true (ZF=0) it will jump to the address mentioned and otherwise it will progress to the next instruction. It is a selecti on between two paths. If the condition is true go right and otherwise go left. Or we can say if the weather is hot, go this way, and if it is cold, go this way. Conditional jump is the most important instruction, as it gives the processor decision making capability, so it must be given a careful thought. Some processors call it branch, probably a more logical name for it, however the functionality is same. Intel chose to name it “jump.”
An important thing in the above example is that a register is used to reference memory so this form of access is called register indirect memory access. We used the BX register for it and the B in BX and BP stands for base therefore we call register indirect memory access using BX or BP, “based addressing.” Similarly when SI or DI is used we name the method “indexed addressing.” They have the same functionality, with minor differences because of which the two are called base and index. The differences will be explained later, however for the above example SI or DI could be use d as well, but we would name it indexed addressing instead of based addressing.