Tuesday, June 9, 2009

System Software and Machine Architecture

One characteristic in which most system software differs from application software is machine dependency.

System software supports the operation and use of a computer, while application software provides a solution to a problem. An assembler translates mnemonic instructions into machine code, so the instruction formats, addressing modes, and so on are of direct concern in assembler design. Similarly, compilers must generate machine language code, taking into account such hardware characteristics as the number and type of registers and the machine instructions available. Operating systems are directly concerned with the management of nearly all of the resources of a computing system.

There are also aspects of system software that do not depend directly on the type of computing system: the general design and logic of an assembler, the general design and logic of a compiler, and code optimization techniques are largely independent of the target machine. Likewise, the process of linking together independently assembled subprograms does not usually depend on the computer being used.

e-Notes

This subject introduces the design and implementation of system software. Software is a set of instructions or programs written to carry out certain tasks on digital computers, and is classified into system software and application software. System software consists of a variety of programs that support the operation of a computer, while application software focuses on an application or problem to be solved. Examples of system software are operating systems, compilers, assemblers, macro processors, loaders or linkers, debuggers, text editors, database management systems (some of them), and software engineering tools. This software makes it possible for the user to focus on an application or other problem to be solved, without needing to know the details of how the machine works internally.

Monday, June 8, 2009

Assembly Process

It is useful to consider how a person would process a program before trying to think about how it is done by a program. For this purpose, consider the program in Figure 2.1. It is important to note that the assembly process does not require any understanding of the program being assembled. Thus, it is unnecessary to understand the integer division algorithm implemented by the code in Figure 2.1, and little understanding of the particular machine code being used is needed (for those who are curious, the code is written for an R6502 microprocessor, the processor used in the historically important Apple II family of personal computers from the late 1970's).
; UNSIGNED INTEGER DIVIDE ROUTINE
; Takes dividend in A, divisor in Y
; Returns remainder in A, quotient in Y
START:  STA IDENDL  ;Store the low half of the dividend
        STY ISOR    ;Store the divisor
        LDA #0      ;Zero the high half of the dividend (in register A)
        TAX         ;Zero the loop counter (in register X)
LOOP:   ASL IDENDL  ;Shift the dividend left (low half first)
        ROL         ; (high half second)
        CMP ISOR    ;Compare high dividend with divisor
        BCC NOSUB   ;If IDEND < ISOR don't subtract
        SBC ISOR    ;Subtract ISOR from IDEND
        INC IDENDL  ;Put a one bit in the quotient
NOSUB:  INX         ;Count times through the loop
        CPX #8
        BNE LOOP    ;Repeat loop 8 times
        LDY IDENDL  ;Return quotient in Y
        RTS         ;Return remainder in A

IDENDL: B 0         ;Reserve storage for the low dividend/quotient
ISOR:   B 0         ;Reserve storage for the divisor
Figure 2.1. An example assembly language program.
When a person who knows the Roman alphabet looks at text such as that illustrated in Figure 2.1, an important, almost unconscious processing step takes place: The text is seen not as a random pattern on the page, but as a sequence of lines, each composed of a sequence of punctuation marks, numbers, and word-like strings. This processing step is formally called lexical analysis, and the words and similar structures recognized at this level are called lexemes.
If the person knows the language in which the text is written, a second and still possibly unconscious processing step will occur: Lexical elements of the text will be classified into structures according to their function in the text. In the case of an assembly language, these might be labels, opcodes, operands, and comments; in English, they might be subjects, objects, verbs, and subsidiary phrases. This level of analysis is called syntactic analysis, and is performed with respect to the grammar or syntax of the language in question.
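These two processing steps are easy to mechanize. The sketch below (in Python, invented here for illustration and simplified to the line layout of Figure 2.1, where a colon appears only after a label and a semicolon only introduces a comment) performs the lexical split and then the syntactic classification of a single line:

def analyze(line):
    # Lexical step: peel off the comment, then the label, then split into lexemes.
    code, _, comment = line.partition(';')
    label = None
    if ':' in code:
        label, _, code = code.partition(':')
        label = label.strip()
    lexemes = code.split()
    # Syntactic step: classify the remaining lexemes by position.
    opcode = lexemes[0] if lexemes else None
    operand = ' '.join(lexemes[1:]) or None
    return label, opcode, operand, comment.strip() or None

print(analyze("START: STA IDENDL ;Store the low half of the dividend"))
# ('START', 'STA', 'IDENDL', 'Store the low half of the dividend')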
A person trying to hand translate the above example program must know that the R6502 microprocessor has a 16 bit memory address, that memory is addressed in 8 bit (one byte) units, and that instructions have a one byte opcode field followed optionally by additional bytes for the operands. The first step would typically involve looking at each instruction to find out how many bytes of memory it occupies. Table 2.1 lists the instructions used in the above example and gives the necessary information for this step.
Opcode   Bytes   Hex Code

ASL        3     0E aa aa
B          1     cc
BCC        2     90 oo
BNE        2     D0 oo
CMP        3     CD aa aa
CPX #      2     E0 cc
INC        3     EE aa aa
INX        1     E8
LDA #      2     A9 cc
LDY        3     AC aa aa
ROL        1     2A
RTS        1     60
SBC        3     ED aa aa
STA        3     8D aa aa
STY        3     8C aa aa
TAX        1     AA

Notes: aa aa - two byte address, least significant byte first.
oo - one byte relative address.
cc - one byte of constant data.
Table 2.1. Opcodes on the R6502.
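If this bookkeeping were being done by a program rather than by a person, Table 2.1 would naturally become a lookup structure. A minimal sketch in Python (the representation chosen here is only one possibility; the immediate-mode mnemonics keep their '#' suffix, mirroring the table, and the B pseudo-operation has no opcode byte of its own):

OPCODES = {   # mnemonic -> (size in bytes, opcode byte)
    'ASL':   (3, 0x0E),  'B':     (1, None),  'BCC':   (2, 0x90),
    'BNE':   (2, 0xD0),  'CMP':   (3, 0xCD),  'CPX #': (2, 0xE0),
    'INC':   (3, 0xEE),  'INX':   (1, 0xE8),  'LDA #': (2, 0xA9),
    'LDY':   (3, 0xAC),  'ROL':   (1, 0x2A),  'RTS':   (1, 0x60),
    'SBC':   (3, 0xED),  'STA':   (3, 0x8D),  'STY':   (3, 0x8C),
    'TAX':   (1, 0xAA),
}

size, opcode_byte = OPCODES['STA']
print(size, hex(opcode_byte))   # 3 0x8d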
To begin the translation of the example program to machine code, we take the data from table 2.1 and attach it to each line of code. Each significant line of an assembly language program includes the symbolic name of one machine instruction, for example, STA. This is called the opcode or operation code for that line. The programmer, of course, needs to know what the program is supposed to do and what these opcodes are supposed to do, but the translator has no need to know this! For the curious, the STA instruction stores the contents of the accumulator register in the indicated memory address, but you do not need to know this to assemble the program!
Table 2.1 shows the numerical equivalent of each opcode in hexadecimal, base 16. We could have used any number base; inside the computer, the bytes are stored in binary, and because hexadecimal to binary conversion is trivial, we use that base here. While we're at it, we will strip off all the irrelevant commentary and formatting that was included only for the human reader, and leave only the textual description of the program.
8D START: STA IDENDL
aa
aa
8C STY ISOR
aa
aa
A9 LDA #0
cc
AA TAX
0E LOOP: ASL IDENDL
aa
aa
2A ROL
CD CMP ISOR
aa
aa
90 BCC NOSUB
oo
ED SBC ISOR
aa
aa
EE INC IDENDL
aa
aa
E8 NOSUB: INX
E0 CPX #8
cc
D0 BNE LOOP
oo
AC LDY IDENDL
aa
aa
60 RTS
cc IDENDL:B 0
cc ISOR: B 0
Figure 2.2. Partial translation of the example to machine language
The result of this first step in the translation is shown in Figure 2.2. This certainly does not complete the job! Table 2.1 included constant data, relative offsets and addresses, as indicated by the lower case notations cc, oo and aa aa, and to finish the translation to machine code, we must substitute numeric values for these!
Constants are the easiest. We simply incorporate the appropriate constants from the source code into the machine code, translating each to hexadecimal. Relative offsets are a bit more difficult! These give the number of bytes ahead (if positive) or behind (if negative) the location immediately after the location that references the offset. Negative offsets are represented using 2's complement notation.
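As a quick check (a sketch only, using the byte positions that will be assigned when the code is placed in memory in Figure 2.4), the two relative offsets in this program can be reproduced in a few lines of Python:

def branch_offset(target, location_after):
    # One-byte offset from the location immediately after the offset byte
    # to the target, as an 8-bit two's complement value.
    return (target - location_after) & 0xFF

# BCC NOSUB: the offset byte sits at 0211, so counting starts at 0212; NOSUB is at 0218.
print(f'{branch_offset(0x0218, 0x0212):02X}')   # 06
# BNE LOOP: the offset byte sits at 021C, so counting starts at 021D; LOOP is at 0209.
print(f'{branch_offset(0x0209, 0x021D):02X}')   # EC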
8D START: STA IDENDL
aa
aa
8C STY ISOR
aa
aa
A9 LDA #0
00
AA TAX
0E LOOP: ASL IDENDL
aa
aa
2A ROL
CD CMP ISOR
aa
aa
90 BCC NOSUB
06
ED SBC ISOR
aa
aa
EE INC IDENDL
aa
aa
E8 NOSUB: INX
E0 CPX #8
08
D0 BNE LOOP
EC
AC LDY IDENDL
aa
aa
60 RTS
00 IDENDL:B 0
00 ISOR: B 0
Figure 2.3. Additional translation of the example to machine language
The result of this next translation step is shown in boldface in Figure 2.3. We cannot complete the translation without determining where the code will be placed in memory. Suppose, for example, that we place this code in memory starting at location 0200 (hexadecimal). This allows us to determine which byte goes in what memory location, and it allows us to assign values to the two labels IDENDL and ISOR, and thus fill out the values of all of the 2-byte address fields to complete the translation.
0200: 8D START: STA IDENDL
0201: 21
0202: 02
0203: 8C STY ISOR
0204: 22
0205: 02
0206: A9 LDA #0
0207: 00
0208: AA TAX
0209: 0E LOOP: ASL IDENDL
020A: 21
020B: 02
020C: 2A ROL
020D: CD CMP ISOR
020E: 22
020F: 02
0210: 90 BCC NOSUB
0211: 06
0212: ED SBC ISOR
0213: 22
0214: 02
0215: EE INC IDENDL
0216: 21
0217: 02
0218: E8 NOSUB: INX
0219: E0 CPX #8
021A: 08
021B: D0 BNE LOOP
021C: EC
021D: AC LDY IDENDL
021E: 21
021F: 02
0220: 60 RTS
0221: 00 IDENDL:B 0
0222: 00 ISOR: B 0
Figure 2.4. Complete translation of the example to machine language
Again, in completing the translation to machine code, the changes from Figure 2.3 to Figure 2.4 are shown in boldface. For hand assembly of a small program, we don't need anything additional, but if we were assembling a program that ran on for pages and pages, it would be helpful to read through it once to find the numerical addresses of each label in the program, and then read through it again, substituting those numerical values into the code where they are needed.
symbol address

START 0200
LOOP 0209
NOSUB 0218
IDENDL 0221
ISOR 0222
Table 2.2. The symbol table for Figure 2.4.
Table 2.2 shows the symbol table for this small example, sorted into numerical order. For a really large program, we might rewrite the table into alphabetical order before using it to finish the assembly.
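Reading through the program once to find the address of every label, and then again to substitute those values, is exactly the classic two-pass organization of an assembler. The sketch below (Python, with invented names, restricted to the opcodes and addressing forms that actually appear in Figure 2.1) follows that plan: pass one assigns addresses and builds the symbol table, and pass two substitutes numeric values and emits the bytes.

OPCODES = {   # mnemonic -> (size in bytes, opcode byte); B just reserves a byte.
              # Immediate mode is folded into the operand ('#0'), so LDA and CPX
              # are keyed here without the '#'.
    'ASL': (3, 0x0E), 'B': (1, None), 'BCC': (2, 0x90), 'BNE': (2, 0xD0),
    'CMP': (3, 0xCD), 'CPX': (2, 0xE0), 'INC': (3, 0xEE), 'INX': (1, 0xE8),
    'LDA': (2, 0xA9), 'LDY': (3, 0xAC), 'ROL': (1, 0x2A), 'RTS': (1, 0x60),
    'SBC': (3, 0xED), 'STA': (3, 0x8D), 'STY': (3, 0x8C), 'TAX': (1, 0xAA),
}

def pass1(lines, origin):
    # First reading: assign an address to every line and record the labels.
    symtab, addresses, pc = {}, [], origin
    for label, opcode, operand in lines:
        if label:
            symtab[label] = pc
        addresses.append(pc)
        pc += OPCODES[opcode][0]
    return symtab, addresses

def pass2(lines, addresses, symtab):
    # Second reading: substitute numeric values and emit the machine code.
    code = []
    for (label, opcode, operand), pc in zip(lines, addresses):
        size, opcode_byte = OPCODES[opcode]
        if opcode_byte is not None:
            code.append(opcode_byte)
        if size == 3:                           # two-byte address, low byte first
            address = symtab[operand]
            code += [address & 0xFF, address >> 8]
        elif opcode in ('BCC', 'BNE'):          # one-byte relative offset
            code.append((symtab[operand] - (pc + 2)) & 0xFF)
        elif operand is not None:               # one byte of constant data
            code.append(int(operand.lstrip('#')))
    return code

program = [                                     # Figure 2.1, already analyzed
    ('START', 'STA', 'IDENDL'), (None, 'STY', 'ISOR'),
    (None, 'LDA', '#0'),        (None, 'TAX', None),
    ('LOOP', 'ASL', 'IDENDL'),  (None, 'ROL', None),
    (None, 'CMP', 'ISOR'),      (None, 'BCC', 'NOSUB'),
    (None, 'SBC', 'ISOR'),      (None, 'INC', 'IDENDL'),
    ('NOSUB', 'INX', None),     (None, 'CPX', '#8'),
    (None, 'BNE', 'LOOP'),      (None, 'LDY', 'IDENDL'),
    (None, 'RTS', None),
    ('IDENDL', 'B', '0'),       ('ISOR', 'B', '0'),
]

symtab, addresses = pass1(program, 0x0200)
print({name: f'{addr:04X}' for name, addr in symtab.items()})
# {'START': '0200', 'LOOP': '0209', 'NOSUB': '0218', 'IDENDL': '0221', 'ISOR': '0222'}
print(' '.join(f'{b:02X}' for b in pass2(program, addresses, symtab)))
# 8D 21 02 8C 22 02 A9 00 AA 0E 21 02 2A CD 22 02 90 06 ED 22 02 EE 21 02 E8 E0 08 D0 EC AC 21 02 60 00 00

The symbol table it prints matches Table 2.2, and the byte stream matches Figure 2.4.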
It is worth noting the role which the meaning of the assembly code played in the assembly process. None! The programmer writing the line STA IDENDL must have understood its meaning, "store the value of the A register in the location labeled IDENDL", and the CPU, when it executes the corresponding binary instruction 8D 21 02 must know that this means "store the value of the A register in the location 0221", but there is no need for the person or computer program that translates assembly code to machine code to understand this!
This same assertion holds for compilers for high level languages. A C++ compiler does not understand that for(;;)x(); involves a loop, but only that, prior to the code for a call to the function x, the compiler should note the current memory address, and after the call, the compiler should output some particular instruction that references that address. The person who wrote the compiler knew that this instruction is a branch back to the start of the loop, but the compiler has no understanding of this!
To the translator performing the assembly process, whether that translator is a human clerk or an assembler program, the line STA IDENDL means "allocate 3 consecutive bytes of memory, put 8D in the first byte, and put the 16 bit value of the symbol IDENDL in the remaining 2 bytes." If the symbol IDENDL is mapped to the value 0221 by the symbol table, then the machine code produced by the assembler's interpretation of the source text has the same meaning as the one the programmer intended. These relationships are illustrated in Figure 2.5.
                      Source Text
                        /     \
      programmer's     /       \     compiler or assembler's
   view of meaning    /         \    view of meaning
                     /           \
       Abstract Meaning --------- Machine Code

                       hardware's
                    view of meaning
Figure 2.5. Views of the meaning of a program.

What is an Assembler?

The first idea a new computer programmer has of how a computer works is learned from a programming language. Invariably, the language is a textual or symbolic method of encoding programs to be executed by the computer. In fact, this language is far removed from what the computer hardware actually "understands". At the hardware level, after all, computers only understand bits and bit patterns. Somewhere between the programmer and the hardware the symbolic programming language must be translated to a pattern of bits. The language processing software which accomplishes this translation is usually centered around either an assembler, a compiler, or an interpreter. The difference between these lies in how much of the meaning of the language is "understood" by the language processor.
An interpreter is a language processor which actually executes programs written in its source language. As such, it can be considered to fully understand that language. At the lowest level of any computer system, there must always be some kind of interpreter, since something must ultimately execute programs. Thus, the hardware may be considered to be the interpreter for the machine language itself. Languages such as BASIC, LISP, and SNOBOL are typically implemented by interpreter programs which are themselves interpreted by this lower level hardware interpreter.
Interpreters running as machine language programs introduce inefficiency because each instruction of the higher level language requires many machine instructions to execute. This motivates the translation of high level language programs to machine language. This translation is accomplished by either assemblers or compilers. If the translation can be accomplished with no attention to the meaning of the source language, then the language is called an assembly or low level language, and the translator is called an assembler. If the meaning must be considered, the translator is called a compiler and the source language is called a high level language. The distinction between high and low level languages is somewhat artificial since there is a continuous spectrum of possible levels of complexity in language design. In fact, many assembly languages contain some high level features, and some high level languages contain low level features.
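The distinction can be made concrete with a deliberately tiny, invented example (not from the text): take a "language" whose programs are lists of (operation, argument) pairs. An interpreter must know what each operation means in order to execute it; a translator needs only a table mapping each symbolic operation to a numeric encoding, exactly as the hand assembly above needed only Table 2.1.

OPCODE_NUMBERS = {'ADD': 0x01, 'MUL': 0x02}   # an arbitrary, invented encoding

def interpret(program):
    # Execute the program directly: this requires knowing what ADD and MUL mean.
    acc = 0
    for op, arg in program:
        if op == 'ADD':
            acc += arg
        elif op == 'MUL':
            acc *= arg
    return acc

def translate(program):
    # Translate to "machine code": a pure table lookup, no understanding needed.
    return [(OPCODE_NUMBERS[op], arg) for op, arg in program]

program = [('ADD', 2), ('MUL', 5)]
print(interpret(program))   # 10
print(translate(program))   # [(1, 2), (2, 5)]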
Since assembly languages are the simplest of symbolic programming languages, and since high level languages are complex enough to be the subject of entire texts, only assembly languages will be discussed here. Although this simplifies the discussion of language processing, it does not limit its applicability; most of the problems faced by an implementor of an assembly language are also faced in high level language implementations. Furthermore, most of these problems are present in even the simplest of assembly languages. For this reason, little reference will be made to the comparatively complex assembly languages of real machines in the following sections.

Short and long game thinking, tests driving design and CRAP metrics

Kent Beck recently posted on the complex “theory versus practice” issue of always automating tests, where he states, “Then a cult of [agile] dogmatism sprang up around testing–if you can conceivably write a test you must”. By classifying projects into long game and short game, he argues that ROI becomes a major issue in whether a test stays manual. He says, “Not writing the test for the second defect gave me time to try a new feature”, but several people commented that this was a technical debt tradeoff, and Guilherme Chapiewski noted he had done the same thing with a Proof of Concept that went live, and he then had to rewrite major chunks later. It is interesting that this ROI discussion reflects the experiences of the pre-agile functional automation community. Back in November 2001 (Wow! Long time ago!!), I posted some observations to the Agile Testing list. While many of these were from the context of two separate development teams and the automators using expensive test tools, the risks of incomplete automation and insufficient ROI dominate. The benefits of having the same people develop both the code and the tests are great, and beyond my experience when I wrote that post.
I think the ROI issue for code-based tests will go away over time. Much of the creation of code-based tests is mechanical. Just as programming languages replaced assembler and took care of fiddly details (what registers to use, low level comparisons, etc.) and build utilities replaced simple text file include statements, I think it will soon be standard practice to have tool-created unit testing to handle mocking, dependency injection and assert-based testing. Mocking was originally very manual, then tools were developed. Dependency injection was very manual, then tools were developed. For assert-based testing, we’ve already seen and now amongst others. I think these tools will become standard, just as coverage tools are now standard in IDEs when they were originally luxuries costing tens of thousands of dollars. Another variation of this is a tool like Celerity, recently mentioned by Jeffrey Frederick. Celerity is a fast way to run GUI web tests, but could be handled as a mechanical translation rather than a manual one. Some meta language could generate Celerity and selected browser tests in a single step.
Mechanically generated tests are cheap to produce and overcome ROI issues. However, they only reflect the current code, and the benefits of test design infusing the coding approach are missing. If tests are not being automated for whatever reason, some analysis of the refactoring risk should be done, at least to know where and what the error-prone code is. One way of doing this is using the Agitar-created CRAP metric, which Bob Martin recently highlighted as a way to keep design clean. While I currently believe all code should be created test first wherever possible, techniques like the CRAP metric can highlight the complicated bits for refactoring where possible. While it may be a great intellectual challenge, there is no need to refactor a complex industry standard algorithm. [Aside: is there an inherent advantage to doing test first design all the time? Perhaps, just as renaissance masters only painted and sculpted hands and faces and left the rest to their workshop staff, we only need to focus on core functions for test first and do the rest test last?]
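For reference, and from memory rather than from the original posts, the CRAP score of a method combines its cyclomatic complexity with its test coverage, so that well-tested code scores roughly its complexity while complex, untested code blows up. A quick sketch:

def crap(complexity, coverage_percent):
    # CRAP(m) = comp(m)^2 * (1 - cov(m))^3 + comp(m), with coverage in percent.
    uncovered = 1.0 - coverage_percent / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity

print(crap(5, 100))   # 5.0   -- moderately complex but fully tested
print(crap(5, 0))     # 30.0  -- same complexity, no tests
print(crap(15, 0))    # 240.0 -- the error-prone code worth flagging for refactoring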
As Kent says, “By insisting that I always write tests I learned that I can test pretty much anything given enough time.” Time is often a rare commodity, so Kent argues compromises are often needed in short game projects. As Ron Jeffries said in a comment on Kent’s post, “My long experience suggests that there is a sort of knee in the curve of impact for short-game-focused decisions. Make too many and suddenly reliability and the ability to progress drop substantially.” I hope that advancements in mechanical generation of tests don’t push us into a short game perspective, impacting the use of hand-crafted tests to drive design. At the same time, metrics that can be run as part of the build to highlight areas for refactoring on all projects are proving valuable (and I’m looking forward to ). By any measure, these are interesting times we live in. Long live long game thinking!

IBM offers $2B in financing for federal HIT projects

Now here's a model we expect to see more of over the next several months: Vendor financing of key infrastructure needed to meet federal IT demands. Global technology giant IBM has announced that it is making up to $2 billion available to finance technology projects related to the demands of the new stimulus package. Clever move--not only does this bind clients to IBM technology, but the federal stimulus funds make it far less likely that IBM will get stiffed.

IBM's Global Financing arm is stepping in where banks fear to tread, offering to structure flexible payment arrangements, deferred payments, lines of credit and project financing packages for clients. The idea, IBM said, is to help healthcare organizations get going on projects before the government begins doling out the stimulus funding.

Question about Model-Based Testing

If you haven’t been to Stack Overflow yet, it’s an interesting forum for asking technical questions — and sorting through the answers — created by Joel Spolsky and Jeff Atwood.
I noticed a question on Model-Based Testing over there that I had something to say about. I wanted to link to articles by Harry Robinson, Ben Simo and James Bach…but as a new user, I’m allowed to add only one link. What to do? How about using my one link to go to my blog…
And here’s my answer, complete with links:
First, a quick note on terms. I tend to use James Bach’s definition of testing as “questioning a product in order to evaluate it”. All tests rely on /mental/ models of the application under test. The term Model-Based Testing, though, is typically used to describe programming a model which can be explored via automation. For example, one might specify a number of states that an application can be in, various paths between those states, and certain assertions about what should occur on the transition between those states. Then one can have scripts execute semi-random permutations of transitions within the state model, logging potentially interesting results.
There are real costs here: building a useful model, creating algorithms for exploring it, logging systems that allow one to weed through for interesting failures, etc. Whether or not the costs are reasonable has a lot to do with *the questions you want to answer*. In general, start with “What do I want to know? And how can I best learn about it?” rather than looking for a use for an interesting technique.
All that said, some excellent testers have gotten a lot of mileage out of automated model-based tests. Sometimes important questions about the application under test are best explored by automated, high-volume, semi-randomized tests. Here’s one very colorful example from Harry Robinson (one of the leading theorists and proponents of model-based testing) in which he discovered many interesting bugs in Google driving directions using a model-based test written with Ruby’s Watir library: http://model.based.testing.googlepages.com/exploratory-automation.pdf
Robinson has used MBT successfully at companies including Bell Labs, Microsoft, and Google, and has a number of essays here: http://www.harryrobinson.net/
Ben Simo (another great testing thinker and writer) has also written quite a bit worth reading on model-based testing: http://www.questioningsoftware.com/search/label/Model-Based%20Testing
Finally, a few cautions: to make good use of a strategy, one needs to explore both its strengths and its weaknesses. Toward that end, James Bach has an excellent essay on the limits and challenges of Model-Based Testing; http://www.satisfice.com/blog/archives/87 links to his hour-long talk (and associated slides) on the Unbearable Lightness of Model-Based Testing.
I’ll end with a note about what Boris Beizer calls the Pesticide Paradox: “Every method you use to prevent or find bugs leaves a residue of subtler bugs against which those methods are ineffective.” Scripted tests (whether executed by a computer or a person) are particularly vulnerable to the pesticide paradox, tending to find less and less useful information each time the same script is executed. Folks sometimes turn to model-based testing thinking that it gets around the pesticide problem. In some contexts model-based testing may well find a much larger set of bugs than a given set of scripted tests…but one should remember that it is still fundamentally limited by the Pesticide Paradox. Remembering its limits — and starting with questions MBT addresses well — it has the potential to be a very powerful testing strategy.
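For concreteness, here’s a minimal sketch of the kind of model the answer describes, in Python rather than Watir, with an entirely invented application: a couple of states, the allowed transitions between them, and an assertion checked after every step while a script walks the model semi-randomly.

import random

TRANSITIONS = {                       # state -> {action: next state}
    'logged_out': {'log_in': 'logged_in', 'reload': 'logged_out'},
    'logged_in':  {'log_out': 'logged_out', 'reload': 'logged_in'},
}

def expected_banner(state):
    # What the model says the application should display in each state.
    return 'Welcome' if state == 'logged_in' else 'Please log in'

def drive_application(action, app_state):
    # Stand-in for code that drives the real application (via Watir, Selenium, etc.)
    # and reports what it actually shows. This fake application behaves perfectly.
    new_state = TRANSITIONS[app_state][action]
    return new_state, expected_banner(new_state)

def random_walk(steps=100, seed=1):
    random.seed(seed)
    model_state = app_state = 'logged_out'
    for step in range(steps):
        action = random.choice(sorted(TRANSITIONS[model_state]))
        model_state = TRANSITIONS[model_state][action]
        app_state, banner = drive_application(action, app_state)
        # The assertion: what the model predicts must match what the application shows.
        assert banner == expected_banner(model_state), (step, action, banner)
    print(f'{steps} random transitions, no surprises')

random_walk()

A real model would have many more states and transitions, and the interesting part is the logging that lets you sift the failures afterwards, but the shape is the same.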