Data Striping and Redundancy

Two concepts are central to the design and implementation of disk arrays:

1. Data striping, for improved performance.
2. Redundancy, for improved availability.

Data Striping

Data striping transparently distributes data over multiple disks to make them appear as a single fast, large disk. Striping improves aggregate I/O performance by allowing multiple I/Os to be serviced in parallel. There are 2 aspects to this parallelism.

1. Multiple independent requests can be serviced in parallel by separate disks. This decreases the queueing time seen by I/O requests.
2. A single multi-block request can be serviced by multiple disks acting in coordination. This increases the effective transfer rate seen by that request.

The performance benefits increase with the number of disks in the array. Unfortunately, a large number of disks lowers the overall reliability of the disk array.
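As a minimal sketch of how striping spreads data, assume the simplest possible layout: logical blocks are dealt out to the disks round-robin, one block at a time. The function name `locate` and the block-sized striping unit are illustrative assumptions, not something the text prescribes.

```python
def locate(logical_block, n_disks):
    """Map a logical block number to (disk index, block index on that disk).

    Assumes round-robin placement with a striping unit of one block.
    """
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return logical_block % n_disks, logical_block // n_disks


# Consecutive logical blocks land on different disks, so a multi-block
# request can be transferred by several disks at once.
for block in range(6):
    print(block, locate(block, 4))
```

With four disks, blocks 0 to 3 fall on disks 0 to 3 and block 4 wraps around to disk 0. Independent requests for blocks on different disks can therefore proceed at the same time.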

Most of the redundant disk array organizations can be distinguished based on 2 features:

1. the granularity of data interleaving and
2. the way in which the redundant data is computed and stored across the disk array.

Data interleaving can be either fine-grained or coarse-grained.

Fine-grained disk arrays conceptually interleave data in relatively small units so that all I/O requests, regardless of their size, access all of the disks in the disk array. This results in a very high data transfer rate for all I/O requests, but it has two disadvantages: only one logical I/O request can be in service at any given time, and all disks must spend time positioning for every request.

Coarse-grained disk arrays interleave data in relatively large units so that small I/O requests need to access only a small number of disks, while large requests can access all the disks in the disk array. This allows multiple small requests to be serviced simultaneously while still allowing large requests to see the higher transfer rates afforded by using multiple disks.
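The effect of interleaving granularity can be illustrated with a small sketch that reports which disks a request touches. It assumes round-robin placement of striping units and byte-addressed requests. The function name `disks_touched` and the sizes used in the usage lines (a 1-byte unit for the fine-grained case, a 64 KiB unit for the coarse-grained case, 8 disks) are hypothetical values chosen only for illustration.

```python
def disks_touched(offset, length, stripe_unit, n_disks):
    """Return the set of disks accessed by a request.

    offset, length and stripe_unit are in bytes. Striping units are
    assumed to be placed round-robin across the disks.
    """
    if length <= 0 or stripe_unit <= 0 or n_disks < 1:
        raise ValueError("length, stripe_unit and n_disks must be positive")
    first_unit = offset // stripe_unit
    last_unit = (offset + length - 1) // stripe_unit
    if last_unit - first_unit + 1 >= n_disks:
        # The request spans at least one unit per disk.
        return set(range(n_disks))
    return {unit % n_disks for unit in range(first_unit, last_unit + 1)}


# Fine-grained: even a 4 KiB request occupies every disk.
print(disks_touched(0, 4096, stripe_unit=1, n_disks=8))
# Coarse-grained: the same request stays on one disk ...
print(disks_touched(0, 4096, stripe_unit=65536, n_disks=8))
# ... while a 1 MiB request still spreads over all of them.
print(disks_touched(0, 1 << 20, stripe_unit=65536, n_disks=8))
```

In the coarse-grained case, two small requests that fall on different disks do not interfere with each other, which is what allows them to be serviced simultaneously.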

Redundancy

Since a larger number of disks lowers the overall reliability of the array, it is important to incorporate redundancy so that the array can tolerate disk failures and the system can continue operating without any loss of data.
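A rough sketch of why reliability falls as disks are added is given here. It rests on an assumption the text does not state: disks fail independently and at a constant rate. Under that assumption, the mean time to the first failure in an array without redundancy is the single-disk figure divided by the number of disks. The function name and the sample figures are illustrative only.

```python
def array_mttf_hours(disk_mttf_hours, n_disks):
    """Mean time to first disk failure for an array with no redundancy.

    Assumes independent disk failures at a constant rate.
    """
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return disk_mttf_hours / n_disks


# Hypothetical figures: disks rated at 200,000 hours each.
print(array_mttf_hours(200_000, 1))
print(array_mttf_hours(200_000, 100))
```

With these made-up numbers, a 100-disk array would see its first failure after about 2,000 hours on average, which is under three months.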

The incorporation of redundancy in disk arrays brings up two problems:

1. Selecting the method for computing the redundant information. Most redundant disk arrays today use parity, though some use Hamming or Reed-Solomon codes.

2. Selecting a method for distributing the redundant information across the disk array. Distribution methods fall into two classes:

Schemes that concentrate redundant information on a small number of disks.
Schemes that distribute redundant information uniformly across all of the disks.

Schemes of the second kind are generally more desirable because they avoid the hot spots and other load-balancing problems suffered by schemes that concentrate redundant information on a few disks.
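Both choices can be sketched in a few lines. The example below assumes parity is the bytewise XOR of the data blocks in a stripe, and it compares two hypothetical placement rules: one that always puts parity on the last disk, and one that rotates it from stripe to stripe. All function names and the rotation rule are illustrative assumptions, not a layout taken from the text.

```python
from collections import Counter
from functools import reduce


def xor_parity(blocks):
    """Return the bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the one missing data block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


def dedicated_parity_disk(stripe, n_disks):
    """Concentrated scheme: parity always lives on the last disk."""
    return n_disks - 1


def rotated_parity_disk(stripe, n_disks):
    """Distributed scheme: the parity disk shifts by one for each stripe."""
    return (n_disks - 1 - stripe) % n_disks


def parity_load(placement, n_stripes, n_disks):
    """Count how many parity blocks each disk holds under a placement rule."""
    counts = Counter(placement(s, n_disks) for s in range(n_stripes))
    return [counts[d] for d in range(n_disks)]


data = [bytes([0x0F, 0xF0]), bytes([0x33, 0x33]), bytes([0x55, 0xAA])]
parity = xor_parity(data)
# Pretend the disk holding data[1] has failed and rebuild its block.
print(rebuild([data[0], data[2]], parity) == data[1])

print(parity_load(dedicated_parity_disk, 8, 4))
print(parity_load(rotated_parity_disk, 8, 4))
```

Every write to a stripe must also update that stripe's parity block. Under the dedicated rule all of those updates go to one disk, which is the hot spot mentioned above. Under the rotated rule they are spread evenly.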

Finally, selecting among the many possible data striping and redundancy schemes involves complex tradeoffs between availability, performance and cost, which are discussed in the next few sections.
