RAID is the organization of multiple disks into a single large, high-performance logical disk.
Disk arrays stripe data across multiple disks and access them
in parallel to achieve higher data transfer rates on large accesses
and higher I/O rates on small accesses.
Large disk arrays, however, are highly vulnerable to disk failures.
If disks fail independently, a disk array with a hundred disks is a hundred
times more likely to fail than a single disk. An MTTF (mean time to failure)
of 500,000 hours for a single disk implies an MTTF of 500,000/100,
i.e. 5,000 hours (roughly seven months), for a disk array with a hundred disks.
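The arithmetic above can be written as a short sketch. It assumes that disks fail independently and at a constant rate, so the array's failure rate is simply the sum of the individual rates; the function name `array_mttf` is illustrative and not part of any library.

```python
def array_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """MTTF of a nonredundant array in which any single disk failure
    counts as a failure of the whole array.

    Assumes independent disks with a constant failure rate, so the
    failure rates add and the MTTF divides by the number of disks.
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return disk_mttf_hours / num_disks


single_disk = 500_000            # hours, the figure used in the text
hundred_disks = array_mttf(single_disk, 100)
print(hundred_disks)             # 5000.0 hours
print(hundred_disks / 24)        # the same figure expressed in days
```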
The solution to the problem of lower reliability in disk arrays is to improve the availability of the system. This can be achieved by employing redundancy in the form of error-correcting codes to tolerate disk failures. A redundant disk array can then retain data for a much longer time than an unprotected single disk.
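As an illustration of how redundancy tolerates a disk failure, here is a minimal sketch that uses bytewise XOR parity, one of the simplest error-correcting codes. The choice of parity is an assumption made for this example, as are the helper name `xor_blocks` and the sample block contents; real arrays may use other codes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks, each holding one block (contents are made up).
data = [b"disk0", b"disk1", b"disk2"]

# The redundant disk stores the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate the loss of disk 1: XOR-ing the surviving data blocks with
# the parity block yields the missing block again.
survivors = [data[0], data[2], parity]
recovered = xor_blocks(survivors)
print(recovered == data[1])
```

With one parity block per stripe, the array survives the loss of any single disk; losing two disks from the same stripe is not recoverable with this simple code.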
Do not confuse reliability with availability.
Reliability is a measure of how long a system operates without any of its components failing. If a component fails, the system was not reliable.
Availability is a measure of how well a system keeps working when failures occur. If a system continues to work despite the failure of one or more of its components, the system is said to be available.
Redundancy improves the availability of a system, but cannot improve
its reliability. Reliability can only be increased by improving manufacturing
technologies or by using fewer individual components in a system.
Every write operation changes data, and this change must also be reflected on the disks that store the redundant information. As a result, writes in redundant disk arrays perform significantly worse than writes in nonredundant disk arrays.
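The write penalty can be made concrete with a small sketch. It assumes a parity-based scheme in which a small write is done as a read-modify-write: read the old data and old parity, then write the new data and the updated parity. The `CountingDisk` class and the function names are hypothetical and exist only to count I/O operations.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class CountingDisk:
    """Toy in-memory disk that counts every block read and write."""

    def __init__(self, num_blocks: int, block_size: int = 4):
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.io_count = 0

    def read(self, index: int) -> bytes:
        self.io_count += 1
        return self.blocks[index]

    def write(self, index: int, block: bytes) -> None:
        self.io_count += 1
        self.blocks[index] = block


def small_write_with_parity(data_disk, parity_disk, index, new_data):
    """Update one data block and keep the parity block consistent."""
    old_data = data_disk.read(index)        # I/O 1
    old_parity = parity_disk.read(index)    # I/O 2
    data_disk.write(index, new_data)        # I/O 3
    # new parity = old parity XOR old data XOR new data
    new_parity = xor(old_parity, xor(old_data, new_data))
    parity_disk.write(index, new_parity)    # I/O 4


data_disk = CountingDisk(num_blocks=8)
parity_disk = CountingDisk(num_blocks=8)
small_write_with_parity(data_disk, parity_disk, 3, b"\x0f\x0f\x0f\x0f")
total_ios = data_disk.io_count + parity_disk.io_count
print(total_ios)  # 4 disk operations, versus 1 for a nonredundant write
```

Under these assumptions, one logical write costs four disk operations instead of one, which is the overhead described above.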
Also, keeping the redundant information consistent in the presence
of concurrent I/O operations and the possibility of system crashes can be