Two concepts are central to the design and implementation of disk arrays:
1. Data striping, for improved performance.
2. Redundancy, for improved availability.
Data striping transparently distributes data over multiple disks to
make them appear as a single fast, large disk. Striping improves aggregate
I/O performance by allowing multiple I/Os to be serviced in parallel. There
are two aspects to this parallelism:
1. The granularity of data interleaving, and
2. The way in which the redundant data is computed and stored across
the disk array.
Data interleaving can be either fine grained or coarse grained.
Fine grained disk arrays conceptually interleave data in relatively small units so that all I/O requests, regardless of their size, access all of the disks in the disk array. This results in a very high data transfer rate for all I/O requests, but it has two disadvantages: only one logical I/O request can be in service at any given time, and all disks must waste time positioning for every request.
Coarse grained disk arrays interleave data in relatively large units
so that small I/O requests need to access only a small number of disks while
large requests can access all the disks in the disk array. This allows
multiple small requests to be serviced simultaneously while still allowing
large requests to see the higher transfer rates afforded by using multiple
disks.
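The difference between the two granularities can be sketched with a small address-mapping function. This is only an illustration under stated assumptions: it assumes simple round-robin striping, and the function name `disks_touched`, the 4-disk array, and the stripe unit sizes are hypothetical choices, not taken from any particular array design. The fine grained case also assumes the stripe unit is smaller than the smallest request.

```python
def disks_touched(offset, length, stripe_unit, num_disks):
    """Return the sorted disk indices that a request touches.

    Assumes round-robin striping: stripe unit u lives on disk u % num_disks.
    offset, length and stripe_unit are in bytes; length must be positive.
    """
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    # A request spanning at least num_disks consecutive units hits every disk.
    if last_unit - first_unit + 1 >= num_disks:
        return list(range(num_disks))
    return sorted({u % num_disks for u in range(first_unit, last_unit + 1)})


# Fine grained (hypothetical 1-byte unit): a 4 KB request uses all 4 disks.
fine = disks_touched(0, 4096, 1, 4)

# Coarse grained (hypothetical 64 KB unit): two 4 KB requests land on
# different disks and could be serviced at the same time.
small_a = disks_touched(0, 4096, 65536, 4)
small_b = disks_touched(65536, 4096, 65536, 4)

# A 256 KB request still spreads over all 4 disks.
large = disks_touched(0, 4 * 65536, 65536, 4)
```

With the coarse grained unit, `small_a` and `small_b` each involve a single, different disk, which is what lets independent small requests proceed in parallel.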
Since a larger number of disks lowers the overall reliability of the array, it is important to incorporate redundancy so that the array can tolerate disk failures and the system can continue operating without any loss of data.
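A rough way to see why reliability drops with the number of disks is the following sketch. It assumes that disks fail independently and at a constant rate, in which case the mean time to failure (MTTF) of an array without redundancy is the single-disk MTTF divided by the number of disks. The function name and the figures used are hypothetical and serve only as an illustration.

```python
def array_mttf_hours(disk_mttf_hours, num_disks):
    """MTTF of a non-redundant array, assuming independent disks
    with constant (exponential) failure rates."""
    return disk_mttf_hours / num_disks


# Hypothetical figures: 100 disks, each with an MTTF of 200,000 hours.
mttf = array_mttf_hours(200_000, 100)
```

Under these assumptions the array as a whole would be expected to lose a disk after about 2,000 hours, roughly three months, even though each individual disk is expected to last more than twenty years.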
The incorporation of redundancy in disk arrays brings up two problems:
1. Selecting the method for computing the redundant information. Most redundant disk arrays today use parity, though some use Hamming or Reed-Solomon codes.
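2. Selecting a method for distributing the redundant information across the disk array. Distribution methods fall into two schemes: those that concentrate the redundant information on a small number of disks, and those that distribute it uniformly across all of the disks.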
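As a rough illustration of the parity method named in item 1, the sketch below computes a parity block as the byte-wise XOR of the data blocks and uses it to rebuild one lost block. It assumes equal-length blocks and a single failed disk; the function names `xor_parity` and `reconstruct` and the sample data are hypothetical, and real arrays operate on whole sectors or stripe units rather than two-byte blocks.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the one missing data block from the survivors and the parity."""
    return xor_parity(surviving_blocks + [parity])


# Three hypothetical data blocks, one per data disk.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(data)

# Suppose the disk holding data[1] fails; recover it from the rest.
recovered = reconstruct([data[0], data[2]], parity)
```

Because XOR is its own inverse, combining the parity with the surviving blocks yields exactly the missing block, so `recovered` equals `data[1]`. This tolerates only one failed disk per parity group.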
Finally, selecting among the many possible data striping and redundancy
schemes involves complex tradeoffs between availability, performance and
cost. These tradeoffs are discussed in the next few sections.