COMPUTER EVOLUTION AND PERFORMANCE
A BRIEF HISTORY OF COMPUTERS
The First Generation: Vacuum Tubes
What is ENIAC?
The ENIAC (Electronic Numerical Integrator And Computer), designed and constructed at the University of Pennsylvania, was the world's first general-purpose electronic digital computer. The project was a response to U.S. needs during World War II.
What is BRL?
Ballistics Research Laboratory (BRL), an agency responsible for developing range and trajectory tables for new weapons, was having difficulty supplying these tables accurately and within a reasonable time frame. Without these firing tables, the new weapons and artillery were useless to gunners. The BRL employed more than 200 people who, using desktop calculators, solved the necessary ballistics equations. Preparation of the tables for a single weapon would take one person many hours, even days.
Who proposed the BRL?
John Mauchly, a professor of electrical engineering at the University of Pennsylvania, and John Eckert, one of his graduate students, proposed to build a general-purpose computer using vacuum tubes for the BRL’s application. In 1943, the Army accepted this proposal, and work began on the ENIAC.
What ENIAC Machine is?
The ENIAC was a decimal rather than a binary machine. That is, numbers were represented in decimal form, and arithmetic was performed in the decimal system. Its memory consisted of 20 “accumulators,” each capable of holding a 10-digit decimal number.
What was the major drawback of the ENIAC?
The major drawback of the ENIAC was that it had to be programmed manually by setting switches and plugging and unplugging cables. The ENIAC was completed in 1946, too late to be used in the war effort.
What was the use of ENIAC?
The use of the ENIAC for a purpose other than that for which it was built demonstrated its general-purpose nature. The ENIAC continued to operate under BRL management until 1955, when it was disassembled.
Describe the VON NEUMANN MACHINE.
The task of entering and altering programs for the ENIAC was extremely tedious. The programming process could be facilitated if the program could be represented in a form suitable for storing in memory alongside the data.
What is the concept of John Von Neumann?
This idea, known as the stored-program concept, is usually attributed to the ENIAC designers, most notably the mathematician John von Neumann, who was a consultant on the ENIAC project. Alan Turing developed the idea at about the same time. The first publication of the idea was in a 1945 proposal by von Neumann for a new computer, the EDVAC (Electronic Discrete Variable Computer). In 1946, von Neumann and his colleagues began the design of a new stored program computer, referred to as the IAS computer, at the Princeton Institute for Advanced Studies. The IAS computer, although not completed until 1952, is the prototype of all subsequent general-purpose computers.
It consists of:
- A main memory, which stores both data and instructions.
- An arithmetic and logic unit (ALU) capable of operating on binary data.
- A control unit, which interprets the instructions in memory and causes them to be executed.
- Input and output (I/O) equipment operated by the control unit.
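The fetch-decode-execute cycle implied by this structure can be sketched in a few lines of Python. This is a toy model only: the opcodes and memory layout below are hypothetical illustrations, not the IAS instruction set.

```python
# Minimal sketch of the von Neumann cycle: instructions and data share one
# memory, and a control loop fetches, decodes, and executes in sequence.
# Opcodes here are invented for illustration.

def run(memory):
    """Fetch-decode-execute loop over a shared instruction/data memory."""
    pc = 0          # program counter
    acc = 0         # accumulator
    while True:
        op, operand = memory[pc]    # fetch the instruction at PC
        pc += 1
        if op == "LOAD":            # AC <- M[operand]
            acc = memory[operand]
        elif op == "ADD":           # AC <- AC + M[operand]
            acc += memory[operand]
        elif op == "STORE":         # M[operand] <- AC
            memory[operand] = acc
        elif op == "HALT":
            return acc

# The program occupies cells 0-3; the data lives in cells 4-6 of the same memory.
mem = [("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 2, 3, 0]
result = run(mem)
print(result, mem[6])   # -> 5 5
```

Because program and data occupy the same memory, a program could in principle modify its own instructions, which is exactly what the IAS "address modify" instructions exploit.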
What are the steps for the Von Neumann’s Proposal?
This structure was outlined in von Neumann’s earlier proposal, which is worth quoting at this point [VONN45]:
Because the device is primarily a computer, it will have to perform the elementary operations of arithmetic most frequently. These are addition, subtraction, multiplication and division. It is therefore reasonable that it should contain specialized organs for just these operations. It must be observed, however, that while this principle as such is probably sound, the specific way in which it is realized requires close scrutiny. At any rate a central arithmetical part of the device will probably have to exist and this constitutes the first specific part: CA.
The logical control of the device, that is, the proper sequencing of its operations, can be most efficiently carried out by a central control organ. If the device is to be elastic, that is, as nearly as possible all purpose, then a distinction must be made between the specific instructions given for and defining a particular problem, and the general control organs which see to it that these instructions—no matter what they are—are carried out. The former must be stored in some way; the latter are represented by definite operating parts of the device.
Any device which is to carry out long and complicated sequences of operations (specifically of calculations) must have a considerable memory. The instructions which govern a complicated problem may constitute considerable material, particularly so, if the code is circumstantial (which it is in most arrangements). This material must be remembered. At any rate, the total memory constitutes the third specific part of the device: M.
The three specific parts CA, CC (together C), and M correspond to the associative neurons in the human nervous system. It remains to discuss the equivalents of the sensory or afferent and the motor or efferent neurons. These are the input and output organs of the device.
The device must be endowed with the ability to maintain input and output (sensory and motor) contact with some specific medium of this type. The medium will be called the outside recording medium of the device: R.
The device must have organs to transfer the information from R into its specific parts C and M. These organs form its input, the fourth specific part: I. It will be seen that it is best to make all transfers from R (by I) into M and never directly from C.
The device must have organs to transfer from its specific parts C and M into R. These organs form its output, the fifth specific part: O. It will be seen that it is again best to make all transfers from M (by O) into R, and never directly from C.

With rare exceptions, all of today's computers have this same general structure and function and are thus referred to as von Neumann machines. Thus, it is worthwhile at this point to describe briefly the operation of the IAS computer [BURK46].
What are the Registers?
Both the control unit and the ALU contain storage locations, called registers, defined as follows:
- Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.
- Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.
- Instruction register (IR): Contains the 8-bit opcode instruction being executed.
- Instruction buffer register (IBR): Employed to hold temporarily the right-hand instruction from a word in memory.
- Program counter (PC): Contains the address of the next instruction-pair to be fetched from memory.
- Accumulator (AC) and multiplier quotient (MQ): Employed to hold temporarily operands and results of ALU operations. The IAS computer had a total of 21 instructions. These can be grouped as follows:
- Data transfer: Move data between memory and ALU registers or between two ALU registers.
- Unconditional branch: Normally, the control unit executes instructions in sequence from memory. This sequence can be changed by a branch instruction, which facilitates repetitive operations.
- Conditional branch: The branch can be made dependent on a condition, thus allowing decision points.
- Arithmetic: Operations performed by the ALU.
- Address modify: Permits addresses to be computed in the ALU and then inserted into instructions stored in memory. This allows a program considerable addressing flexibility.
Describe COMMERCIAL COMPUTERS
The 1950s saw the birth of the computer industry with two companies, Sperry and IBM, dominating the marketplace.
In 1947, Eckert and Mauchly formed the Eckert-Mauchly Computer Corporation to manufacture computers commercially. Their first successful machine was the UNIVAC I (Universal Automatic Computer), which was commissioned by the Bureau of the Census for the 1950 calculations. The Eckert-Mauchly Computer Corporation became part of the UNIVAC division of the Sperry-Rand Corporation, which went on to build a series of successor machines.
The UNIVAC I was the first successful commercial computer. It was intended for both scientific and commercial applications. The first paper describing the system listed matrix algebraic computations, statistical problems, premium billings for a life insurance company, and logistical problems as a sample of the tasks it could perform.
The UNIVAC II, which had greater memory capacity and higher performance than the UNIVAC I, was delivered in the late 1950s and illustrates several trends that have remained characteristic of the computer industry.
First, advances in technology allow companies to continue to build larger, more powerful computers. Second, each company tries to make its new machines backward compatible with the older machines.
IBM, then the major manufacturer of punched-card processing equipment, delivered its first electronic stored-program computer, the 701, in 1953. The 701 was intended primarily for scientific applications [BASH81]. In 1955, IBM introduced the companion 702 product, which had a number of hardware features that suited it to business applications. These were the first of a long series of 700/7000 computers that established IBM as the overwhelmingly dominant computer manufacturer.
The Second Generation: Transistors
The first major change in the electronic computer came with the replacement of the vacuum tube by the transistor. The transistor is smaller, cheaper, and dissipates less heat than a vacuum tube but can be used in the same way as a vacuum tube to construct computers. Unlike the vacuum tube, which requires wires, metal plates, a glass capsule, and a vacuum, the transistor is a solid-state device, made from silicon.
When and where Transistor invented?
The transistor was invented at Bell Labs in 1947 and by the 1950s had launched an electronic revolution. It was not until the late 1950s, however, that fully transistorized computers were commercially available. IBM again was not the first company to deliver the new technology. NCR and, more successfully, RCA were the front-runners with some small transistor machines. IBM followed shortly with the 7000 series.
What is the Use of Transistor?
The use of the transistor defines the second generation of computers. It has become widely accepted to classify computers into generations based on the fundamental hardware technology employed. Each new generation is characterized by greater processing performance, larger memory capacity, and smaller size than the previous one.
What is IBM 7094
From the introduction of the 700 series in 1952 to the introduction of the last member of the 7000 series in 1964, this IBM product line underwent an evolution that is typical of computer products. Successive members of the product line show increased performance, increased capacity, and/or lower cost.
The final column indicates the relative execution speed of the central processing unit (CPU). Speed improvements are achieved by improved electronics and more complex circuitry. For example, the IBM 7094 includes an Instruction Backup Register, used to buffer the next instruction. The control unit fetches two adjacent words from memory for an instruction fetch. Except for the occurrence of a branching instruction, which is typically infrequent, this means that the control unit has to access memory for an instruction on only half the instruction cycles. This prefetching significantly reduces the average instruction cycle time.
What are the use of Data Channels?
The most important of these is the use of data channels. A data channel is an independent I/O module with its own processor and its own instruction set. In a computer system with such devices, the CPU does not execute detailed I/O instructions. Such instructions are stored in main memory to be executed by a special-purpose processor in the data channel itself. The CPU initiates an I/O transfer by sending a control signal to the data channel, instructing it to execute a sequence of instructions in memory.
The data channel performs its task independently of the CPU and signals the CPU when the operation is complete. This arrangement relieves the CPU of a considerable processing burden. Another new feature is the multiplexor, which is the central termination point for data channels, the CPU, and memory. The multiplexor schedules access to the memory from the CPU and data channels, allowing these devices to act independently.
The Third Generation: Integrated Circuits
A single, self-contained transistor is called a discrete component. Throughout the 1950s and early 1960s, electronic equipment was composed largely of discrete components—transistors, resistors, capacitors, and so on. Discrete components were manufactured separately, packaged in their own containers, and soldered or wired together onto masonite-like circuit boards, which were then installed in computers, oscilloscopes, and other electronic equipment. Whenever an electronic device called for a transistor, a little tube of metal containing a pinhead-sized piece of silicon had to be soldered to a circuit board. The entire manufacturing process, from transistor to circuit board, was expensive and cumbersome.
These facts of life were beginning to create problems in the computer industry. Early second-generation computers contained about 10,000 transistors.
What are the Microelectronics?
Microelectronics means, literally, “small electronics.” Since the beginnings of digital electronics and the computer industry, there has been a persistent and consistent trend toward the reduction in size of digital electronic circuits. Before examining the implications and benefits of this trend, we need to say something about the nature of digital electronics.
The basic elements of a digital computer, as we know, must perform storage, movement, processing, and control functions.
What are the fundamentals types of components in Microelectronics?
Only two fundamental types of components are required: gates and memory cells. A gate is a device that implements a simple Boolean or logical function, such as IF A AND B ARE TRUE THEN C IS TRUE (AND gate). Such devices are called gates because they control data flow in much the same way that canal gates do. The memory cell is a device that can store one bit of data; that is, the device can be in one of two stable states at any time.
What are the basic functions of Microelectronics?
We can relate this to our four basic functions as follows:
- Data storage: Provided by memory cells.
- Data processing: Provided by gates.
- Data movement: The paths among components are used to move data from memory to memory and from memory through gates to memory.
- Control: The paths among components can carry control signals. For example, a gate will have one or two data inputs plus a control signal input that activates the gate. When the control signal is ON, the gate performs its function on the data inputs and produces a data output. Similarly, the memory cell will store the bit that is on its input lead when the WRITE control signal is ON and will place the bit that is in the cell on its output lead when the READ control signal is ON.
Thus, a computer consists of gates, memory cells, and interconnections among these elements. The gates and memory cells are, in turn, constructed of simple digital electronic components.
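The behavior of these two fundamental component types can be sketched in Python. This is a behavioral model only; real gates and cells are built from transistors, and the class and function names are our own:

```python
# Sketch of the two fundamental digital component types: a gate that
# computes a Boolean function of its inputs, and a memory cell that holds
# one bit, updated only when the WRITE control signal is on.

def and_gate(a, b):
    """IF A AND B ARE TRUE THEN C IS TRUE."""
    return int(a and b)

class MemoryCell:
    def __init__(self):
        self.bit = 0
    def access(self, data_in, write):
        if write:            # WRITE control signal ON: store the input bit
            self.bit = data_in
        return self.bit      # the stored bit appears on the output lead

cell = MemoryCell()
cell.access(and_gate(1, 1), write=True)   # store the gate's output
print(cell.access(0, write=False))        # -> 1 (the cell retains the bit)
```

Data movement and control correspond to the wiring between such elements: the value passed from `and_gate` into `cell.access`, and the `write` flag gating the update.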
What are Consequences of Moore’s Law?
The consequences of Moore’s law are profound:
- The cost of a chip has remained virtually unchanged during this period of rapid growth in density. This means that the cost of computer logic and memory circuitry has fallen at a dramatic rate.
- Because logic and memory elements are placed closer together on more densely packed chips, the electrical path length is shortened, increasing operating speed.
- The computer becomes smaller, making it more convenient to place in a variety of environments.
- There is a reduction in power and cooling requirements.
- The interconnections on the integrated circuit are much more reliable than solder connections.
Describe IBM SYSTEM/360.
By 1964, IBM had a firm grip on the computer market with its 7000 series of machines. In that year, IBM announced the System/360, a new family of computer products. Although the announcement itself was no surprise, it contained some unpleasant news for current IBM customers: the 360 product line was incompatible with older IBM machines. Thus, the transition to the 360 would be difficult for the current customer base. This was a bold step by IBM, but one IBM felt was necessary to break out of some of the constraints of the 7000 architecture and to produce a system capable of evolving with the new integrated circuit technology. The strategy paid off both financially and technically. The 360 was the success of the decade and cemented IBM as the overwhelmingly dominant computer vendor, with a market share above 70%.
What are the characteristics of IBM System/360?
- Similar or identical instruction set: In many cases, the exact same set of machine instructions is supported on all members of the family. Thus, a program that executes on one machine will also execute on any other. In some cases, the lower end of the family has an instruction set that is a subset of that of the top end of the family. This means that programs can move up but not down.
- Similar or identical operating system: The same basic operating system is available for all family members. In some cases, additional features are added to the higher-end members.
- Increasing speed: The rate of instruction execution increases in going from lower to higher family members.
- Increasing number of I/O ports: The number of I/O ports increases in going from lower to higher family members.
- Increasing memory size: The size of main memory increases in going from lower to higher family members.
- Increasing cost: At a given point in time, the cost of a system increases in going from lower to higher family members.
The System/360 not only dictated the future course of IBM but also had a profound impact on the entire industry. Many of its features have become standard on other large computers.
Describe DEC PDP-8.
In the same year that IBM shipped its first System/360, another momentous first shipment occurred: PDP-8 from Digital Equipment Corporation (DEC). At a time when the average computer required an air conditioned room, the PDP-8 was small enough that it could be placed on top of a lab bench or be built into other equipment.
The low cost and small size of the PDP-8 enabled another manufacturer to purchase a PDP-8 and integrate it into a total system for resale.
With the introduction of large scale integration (LSI), more than 1000 components can be placed on a single integrated circuit chip.
With the rapid pace of technology, the high rate of introduction of new products, and the importance of software and communications as well as hardware, the classification by generation becomes less clear and less meaningful.
Describe the SEMICONDUCTOR MEMORY
The first application of integrated circuit technology to computers was construction of the processor out of integrated circuit chips.
In the 1950s and 1960s, most computer memory was constructed from tiny rings of ferromagnetic material, each about a sixteenth of an inch in diameter. These rings were strung up on grids of fine wires suspended on small screens inside the computer. Magnetized one way, a ring (called a core) represented a one; magnetized the other way, it stood for a zero. Magnetic-core memory was rather fast; it took as little as a millionth of a second to read a bit stored in memory. But it was expensive, bulky, and used destructive readout.
Just as the density of elements on memory chips has continued to rise, so has the density of elements on processor chips.
A breakthrough was achieved in 1971, when Intel developed its 4004. The 4004 was the first chip to contain all of the components of a CPU on a single chip: The microprocessor was born. The 4004 can add two 4-bit numbers and can multiply only by repeated addition.
This evolution can be seen most easily in the number of bits that the processor deals with at a time. There is no clear-cut measure of this, but perhaps the best measure is the data bus width: the number of bits of data that can be brought into or sent out of the processor at a time. Another measure is the number of bits in the accumulator or in the set of general-purpose registers.
What are the Improvements in Chip Organization and Architecture?
There are three approaches to achieving increased processor speed:
- Increase the hardware speed of the processor.
- Increase the size and speed of caches that are interposed between the processor and main memory.
- Make changes to the processor organization and architecture that increase the effective speed of instruction execution.
However, as clock speed and logic density increase, a number of obstacles become more significant:
- Power: As the density of logic and the clock speed on a chip increase, so does the power density (watts/cm²).
- RC delay: The speed at which electrons can flow on a chip between transistors is limited by the resistance and capacitance of the metal wires connecting them; specifically, delay increases as the RC product increases.
- Memory latency: Memory speeds lag processor speeds.
THE EVOLUTION OF THE INTEL x86 ARCHITECTURE
The current x86 offerings represent the results of decades of design effort on complex instruction set computers (CISCs). The x86 incorporates the sophisticated design principles once found only on mainframes and supercomputers and serves as an excellent example of CISC design. An alternative approach to processor design is the reduced instruction set computer (RISC). The ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best-designed RISC-based systems on the market. In this section and the next, we provide a brief overview of these two systems. In terms of market share, Intel has ranked as the number one maker of microprocessors for non-embedded systems for decades, a position it seems unlikely to yield. The evolution of its flagship microprocessor product serves as a good indicator of the evolution of computer technology in general.
What are the highlights of the Evolution of the INTEL X86 Architecture?
- 8080: The world’s first general-purpose microprocessor. This was an 8-bit machine, with an 8-bit data path to memory. The 8080 was used in the first personal computer, the Altair.
- 8086: A far more powerful, 16-bit machine. In addition to a wider data path and larger registers, the 8086 sported an instruction cache, or queue, that pre-fetches a few instructions before they are executed. A variant of this processor, the 8088, was used in IBM’s first personal computer, securing the success of Intel. The 8086 is the first appearance of the x86 architecture.
- 80286: This extension of the 8086 enabled addressing a 16-MByte memory instead of just 1 MByte.
- 80386: Intel’s first 32-bit machine, and a major overhaul of the product. This was the first Intel processor to support multitasking, meaning it could run multiple programs at the same time.
- 80486: The 80486 introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining. The 80486 also offered a built-in math coprocessor, offloading complex math operations from the main CPU.
- Pentium: With the Pentium, Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel.
- Pentium Pro: The Pentium Pro continued the move into superscalar organization begun with the Pentium, with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution.
- Pentium II: The Pentium II incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently.
- Pentium III: The Pentium III incorporates additional floating-point instructions to support 3D graphics software.
- Pentium 4: The Pentium 4 includes additional floating-point and other enhancements for multimedia.
- Core: This is the first Intel x86 microprocessor with a dual core, referring to the implementation of two processors on a single chip.
- Core 2: The Core 2 extends the architecture to 64 bits. The Core 2 Quad provides four processors on a single chip.
EMBEDDED SYSTEMS AND THE ARM
The ARM architecture refers to a processor architecture that has evolved from RISC design principles and is used in embedded systems.
What are the Embedded Systems?
The term embedded system refers to the use of electronics and software within a product, as opposed to a general-purpose computer, such as a laptop or desktop system.
What are the requirements and constraints of Embedded Systems?
Embedded systems far outnumber general-purpose computer systems, encompassing a broad range of applications. These systems have widely varying requirements and constraints, such as the following [GRIM05]:
- Small to large systems, implying very different cost constraints, thus different needs for optimization and reuse.
- Relaxed to very strict requirements and combinations of different quality requirements, for example, with respect to safety, reliability, real-time behavior, and legislation.
- Short to long life times.
- Different environmental conditions in terms of, for example, radiation, vibrations, and humidity.
- Different application characteristics resulting in static versus dynamic loads, slow to fast speed, compute-intensive versus interface-intensive tasks, and/or combinations thereof.
- Different models of computation, ranging from discrete-event systems to those involving continuous time dynamics.
THE SYSTEM CLOCK
Operations performed by a processor, such as fetching an instruction, decoding the instruction, performing an arithmetic operation, and so on, are governed by a system clock. Typically, all operations begin with the pulse of the clock. Thus, at the most fundamental level, the speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or Hertz (Hz).
What are the clock signals?
Typically, clock signals are generated by a quartz crystal, which generates a constant signal wave while power is applied. This wave is converted into a digital voltage pulse stream that is provided in a constant flow to the processor circuitry. For example, a 1-GHz processor receives 1 billion pulses per second. The rate of pulses is known as the clock rate, or clock speed. One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick. The time between pulses is the cycle time. When a signal is placed on a line inside the processor, it takes some finite amount of time for the voltage levels to settle down so that an accurate value (1 or 0) is available.
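The relationship between clock rate and cycle time is a simple reciprocal, as a quick sketch shows:

```python
# Cycle time is the reciprocal of the clock rate: a 1-GHz clock delivers
# 10**9 pulses per second, so each cycle lasts 1 nanosecond.

def cycle_time(clock_rate_hz):
    """Return the clock cycle time in seconds."""
    return 1.0 / clock_rate_hz

rate = 1e9                  # 1 GHz
print(cycle_time(rate))     # -> 1e-09, i.e., 1 ns per cycle
```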
INSTRUCTION EXECUTION RATE
A processor is driven by a clock with a constant frequency f or, equivalently, a constant cycle time τ, where τ = 1/f. Define the instruction count, Ic, for a program as the number of machine instructions executed for that program until it runs to completion or for some defined time interval. An important parameter is the average number of clock cycles per instruction (CPI) for a program; if all instructions required the same number of clock cycles, then CPI would be a constant value for a processor. The processor time T needed to execute a given program can be expressed as
T = Ic * CPI * τ
We can refine this formulation by recognizing that during the execution of an instruction, part of the work is done by the processor, and part of the time a word is being transferred to or from memory. We can rewrite the preceding equation as
T = Ic * [p + (m * k)] * τ
where:
p is the number of processor cycles needed to decode and execute the instruction,
m is the number of memory references needed, and
k is the ratio between memory cycle time and processor cycle time.
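This refined equation can be evaluated directly. The parameter values below are illustrative assumptions, not measurements of any real machine:

```python
# Evaluating T = Ic * [p + (m * k)] * tau with assumed, illustrative numbers.

def processor_time(ic, p, m, k, tau):
    """Processor time per the refined performance equation."""
    return ic * (p + m * k) * tau

Ic = 2_000_000    # 2 million instructions executed (assumed)
p = 4             # processor cycles to decode/execute per instruction (assumed)
m = 1             # memory references per instruction (assumed)
k = 3             # memory cycle time / processor cycle time (assumed)
tau = 1e-9        # 1 ns cycle time, i.e., a 1-GHz clock

print(processor_time(Ic, p, m, k, tau))   # -> 0.014 seconds
```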
What are the factors of the equation?
The five performance factors in the preceding equation (Ic, p, m, k, τ) are influenced by four system attributes: the design of the instruction set (known as the instruction set architecture), compiler technology, processor implementation, and cache and memory hierarchy. A common measure of performance for a processor is the rate at which instructions are executed, expressed as millions of instructions per second (MIPS), referred to as the MIPS rate.
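The MIPS rate follows from the same quantities: MIPS = Ic / (T × 10⁶), which for a fixed average CPI reduces to f / (CPI × 10⁶). A sketch with assumed values:

```python
# MIPS rate = f / (CPI * 10^6). The clock rate and CPI below are
# illustrative assumptions.

def mips_rate(clock_rate_hz, cpi):
    """Instruction execution rate in millions of instructions per second."""
    return clock_rate_hz / (cpi * 1e6)

print(mips_rate(400e6, 1.6))   # 400-MHz clock, average CPI of 1.6 -> 250.0
```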
Measures such as MIPS and MFLOPS have proven inadequate for evaluating the performance of processors. Because of differences in instruction sets, the instruction execution rate is not a valid means of comparing the performance of different architectures. For example, consider this high-level language statement:
A = B + C
With a traditional instruction set architecture, referred to as a complex instruction set computer (CISC), this instruction can be compiled into one processor instruction:
add mem(B), mem(C), mem(A)
On a typical RISC machine, the compilation would look something like this:
load mem(B), reg(1);
load mem(C), reg(2);
add reg(1), reg(2), reg(3);
store reg(3), mem(A)
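Plugging assumed cycle counts into the MIPS formula shows why the rate misleads across architectures. All numbers below are illustrative assumptions, not measurements:

```python
# The same statement A = B + C compiles to 1 CISC instruction but 4 RISC
# instructions. With assumed cycle counts, the RISC machine reports a far
# higher MIPS rate even though the total work is comparable.

def mips(instruction_count, total_cycles, clock_rate_hz):
    """MIPS rate = Ic / (T * 10^6), with T = cycles / clock rate."""
    time = total_cycles / clock_rate_hz
    return instruction_count / (time * 1e6)

clock = 100e6                 # same 100-MHz clock for both machines (assumed)
cisc = mips(1, 10, clock)     # 1 instruction, assume 10 cycles in total
risc = mips(4, 8, clock)      # 4 instructions, assume 2 cycles each

print(cisc, risc)             # -> 10.0 50.0
# The RISC machine shows a 5x higher MIPS rate for nearly the same
# elapsed time -- the two rates are not comparable across architectures.
```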
Please List of characteristics of a benchmark program.
- It is written in a high-level language, making it portable across different machines.
- It is representative of a particular kind of programming style, such as systems programming, numerical programming, or commercial programming.
- It can be measured easily.
- It has wide distribution.
SPEC performance measurements are widely used for comparison and research purposes. The best known of the SPEC benchmark suites is SPEC CPU2006. This is the industry-standard suite for processor-intensive applications. SPEC CPU2006 is appropriate for measuring performance for applications that spend most of their time doing computation rather than I/O.
The CPU2006 suite is based on existing applications that have already been ported to a wide variety of platforms by SPEC industry members. It consists of 17 floating-point programs written in C, C++, and Fortran, and 12 integer programs written in C and C++. The suite contains over 3 million lines of code. This is the fifth generation of processor-intensive suites from SPEC, replacing SPEC CPU2000, SPEC CPU95, SPEC CPU92, and SPEC CPU89 [HENN07]. Other SPEC suites include the following:
- SPECjvm98: Intended to evaluate performance of the combined hardware and software aspects of the Java Virtual Machine (JVM) client platform
- SPECjbb2000 (Java Business Benchmark): A benchmark for evaluating server-side Java-based electronic commerce applications
- SPECweb99: Evaluates the performance of World Wide Web (WWW) servers
- SPECmail2001: Designed to measure a system’s performance acting as a mail server
How to Average the results?
To obtain a reliable comparison of the performance of various computers, it is preferable to run a number of different benchmark programs on each machine and then average the results. Care is needed in how that average is computed, however: the arithmetic mean of individual execution rates is not inversely proportional to the total execution time, so a simple average of rates can misrepresent overall performance.
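The averaging pitfall can be illustrated with two hypothetical benchmark times (the values are made up):

```python
# The arithmetic mean of execution *rates* is not inversely proportional
# to total execution time; the harmonic mean of the rates is.

times = [2.0, 8.0]                       # seconds per benchmark (assumed)
rates = [1.0 / t for t in times]         # tasks per second

arithmetic_mean = sum(rates) / len(rates)
harmonic_mean = len(rates) / sum(1.0 / r for r in rates)

print(arithmetic_mean)   # -> 0.3125 (overstates overall speed)
print(harmonic_mean)     # -> 0.2    (= 2 tasks / 10 s total time)
```

The harmonic mean agrees with the intuitive overall rate of two tasks completed in ten seconds; the arithmetic mean of the rates does not.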
What are the Fundamental metrics?
Two fundamental metrics are of interest: a speed metric and a rate metric.
- The speed metric measures the ability of a computer to complete a single task. SPEC defines a base runtime for each benchmark program using a reference machine. Results for a system under test are reported as the ratio of the reference run time to the system run time.
- The rate metric measures the throughput or rate of a machine carrying out a number of tasks. For the rate metrics, multiple copies of the benchmarks are run simultaneously. Typically, the number of copies is the same as the number of processors on the machine. A ratio is used to report results, although the calculation is more complex.
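The speed metric calculation can be sketched as follows. The run times below are invented for illustration; the geometric mean of the ratios is the aggregation SPEC uses for its overall metric:

```python
# SPEC-style speed metric: each benchmark's ratio is the reference-machine
# run time divided by the measured run time, and the overall result is the
# geometric mean of the ratios. All run times here are assumed values.

import math

ref_times = [100.0, 200.0, 400.0]   # reference-machine run times (s)
sys_times = [50.0, 100.0, 100.0]    # system-under-test run times (s)

ratios = [r / s for r, s in zip(ref_times, sys_times)]
geo_mean = math.prod(ratios) ** (1.0 / len(ratios))

print(ratios)      # -> [2.0, 2.0, 4.0]
print(geo_mean)    # geometric mean of 2, 2, 4 -> about 2.52
```

A ratio above 1.0 means the system under test ran the benchmark faster than the reference machine; the geometric mean keeps a single outlier benchmark from dominating the overall score.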