We mentioned previously, in History of the Computer, that mainframes are large-scale devices. We have also mentioned various forms of error correction, a subject we will look at in more detail in another episode.
(If you are looking for information on what to do when the boss fires you, sorry - you've wasted your time! Try the job ads in the local paper.)
This time we are going to look at various ways of avoiding the consequences of errors in the system. This has been necessary because of the inherent unreliability of mechanical equipment and early electronic devices. The large scale of these systems, as well as providing greater processing power and throughput, has also made it possible to minimise, or eliminate, the effect of failures.
This philosophy is known as redundancy. By this we mean that we design the system so that there are always at least two ways to accomplish any task. In this way, in the event of a failure in one component, there is always a way around it, using another component. A complete failure is only possible if both alternatives fail at the same time. This is extremely unlikely, but has happened!
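To put a rough number on 'extremely unlikely': if we assume the two alternatives fail independently, each with probability p in some period, the chance of both failing in that period is p multiplied by p. The short Python sketch below works this through. The function name and the 1-in-1000 figure are illustrative assumptions, not measurements from any real system.

```python
def both_fail_probability(p_single: float, copies: int = 2) -> float:
    """Chance that every one of `copies` independent components fails.

    Assumes failures are independent, which real systems only approximate
    (a shared power supply or cable run can take out both at once).
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return p_single ** copies


# Hypothetical figure: each component fails 1 time in 1000 in a given period.
single = 0.001
pair = both_fail_probability(single)
print(f"one component: {single:.6f}, both together: {pair:.6f}")
```

The independence assumption is the weak point, which is one reason the double failure 'has happened': anything the two alternatives share can bring both down together.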
As an example, consider a disk drive. What can go wrong with this drive? We can have a write error, where the data to be written to the disk cannot be relied on; this is detected in the write logic. We can have a read error, where the data on the disk has become corrupted since it was written. A read error can also be caused by a failure in the mechanical read head access mechanism, or in the logic.
To prevent this problem from affecting our operation, we can first attempt an internal recovery, for instance by retrying the read operation. If this doesn't work, our alternative could be a duplicate of all the data on another drive, so that data is written to and read from both devices in parallel. In the event of a failure of one device, we can still use the other. In other words, the drive is redundant.
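The two steps just described, retry first and then fall back to the duplicate, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `SimulatedDrive`, `MirroredPair` and `DriveError` are hypothetical names invented here, not a real device interface, and for simplicity the sketch reads the drives one after the other rather than in parallel.

```python
class DriveError(Exception):
    """Raised by a (simulated) drive when a read cannot be trusted."""


class SimulatedDrive:
    """Hypothetical stand-in for a disk drive; not a real device API."""

    def __init__(self, name, failures_before_success=0):
        self.name = name
        self.blocks = {}
        # Number of reads that will fail before the drive starts working.
        self.failures_left = failures_before_success

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise DriveError(f"{self.name}: read error on block {block_no}")
        return self.blocks[block_no]


class MirroredPair:
    """Writes go to both drives; reads retry, then fall back to the mirror."""

    def __init__(self, primary, mirror, retries=3):
        self.drives = [primary, mirror]
        self.retries = retries

    def write(self, block_no, data):
        for drive in self.drives:
            drive.write(block_no, data)

    def read(self, block_no):
        for drive in self.drives:
            for _attempt in range(self.retries):
                try:
                    return drive.read(block_no)
                except DriveError:
                    continue  # internal recovery: retry the same drive
        raise DriveError(f"block {block_no} unreadable on both drives")


# A primary that does not recover within the retry limit, and a healthy mirror.
pair = MirroredPair(SimulatedDrive("A", failures_before_success=10),
                    SimulatedDrive("B"))
pair.write(7, b"payroll")
print(pair.read(7))
```

Here drive A uses up its three retries, and the read is then satisfied from drive B without the caller ever seeing the fault.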
Now we move further up the chain to the next possible source of failure. The disk drives on a large system are normally arranged in a 'string' of 8 or 16 drives. These strings are attached to a controller, which handles the writing and reading of data to and from the drives. As well as the data, it needs to have addressing capability, to select a drive, and some way of allocating the correct data to and from the drive.
Thus there are various cables between the controller and each drive. Some of these cables can be 'daisy-chained', that is, they go to each drive in turn. Other cables go only to a specific drive. Power cables also go to the drives from the controller.
These cables, and the logic they are connected to, present the next possibility of a failure. For example, the cables are normally run either under the cabinets, or under the false floor. There is a possibility that a cable could be disturbed, or be under stress, or connected poorly. The logic at either end of the cable could have a failure, or a plug could work loose, perhaps from being poorly positioned.
The effect of all this, as far as the system is concerned, is the same as if a drive were at fault. We have difficulty reading from or writing to, or even accessing a drive. The defect may affect just one drive, or all the drives in the string.
How can we prevent this situation from causing a problem? Simple - we provide an alternate path for the data and controls. We do this by running a duplicate set of cables between the controller and the drives. To eliminate the possibility of logic or connector failures, we duplicate these too, both in the controller and in each drive. We now have a redundant path between the controller and the drives. If one cable or connector fails, we can still use the other one.
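The alternate-path idea can be sketched in the same spirit. In this Python illustration the controller simply tries each path in turn until one works. The names `CablePath`, `Controller` and `PathError` are hypothetical, chosen for this sketch only, and a real controller would make this choice in hardware or microcode rather than in a loop like this.

```python
class PathError(Exception):
    """Raised when a (simulated) cable path cannot carry a request."""


class CablePath:
    """Hypothetical model of one set of cables, connectors and logic."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def send(self, drive_no, command):
        if not self.healthy:
            raise PathError(f"path {self.name} is down")
        return f"drive {drive_no} did {command} via path {self.name}"


class Controller:
    """Tries each path in turn, so one failed cable does not stop the string."""

    def __init__(self, paths):
        self.paths = list(paths)

    def send(self, drive_no, command):
        errors = []
        for path in self.paths:
            try:
                return path.send(drive_no, command)
            except PathError as exc:
                errors.append(str(exc))  # note the failure, try the next path
        raise PathError("all paths failed: " + "; ".join(errors))


# Path A has a loose plug; path B is the duplicate set of cables.
controller = Controller([CablePath("A", healthy=False), CablePath("B")])
print(controller.send(3, "read"))
```

Only when every path is down does the request fail, which mirrors the point made earlier: a complete failure needs both alternatives to fail at the same time.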
In Part 2 we look further up the chain, and discuss the advantages of redundancy.
Tony is an experienced computer engineer. He is currently webmaster and contributor to http://www.what-why-wisdom.com . A set of diagrams accompanying these articles may be seen at http://www.what-why-wisdom.com/history-of-the-computer-0.html . RSS feed also available - use http://www.what-why-wisdom.com/Educational.xml