CODES: About Data Linkage & Analysis
CODES uses probabilistic linkage to combine records from several data sources into a single linked data set, which makes it possible to track people injured in a motor vehicle crash from the crash scene through the medical system. The linkage relies on the information shared among the crash, hospital, and emergency medical services (EMS) or emergency department data. Records from different databases are linked by matching event characteristics, such as the day or location of the crash; person characteristics, such as age or sex; and vehicle characteristics. Exact matches on specific join fields select the candidate pairs for linkage. Multiple passes are necessary to ensure that as many cases as possible are included among the candidate pairs. The quality of the candidate pairs is then evaluated, and the true matches are identified, using match specification fields that are the same for all passes.

Most of the administrative statewide datasets available to CODES researchers lack common unique personal identifiers, which makes it impossible to relate records for the same person in different databases using a deterministic approach. Another limitation of the datasets is missing or inaccurate data, a consequence of paper-based data collection and an emergency environment. Missing data also include the records excluded by state reporting thresholds. As a result, the records that are available and have complete, accurate information are more likely to link, but they may represent only a relatively small and potentially biased sample of the actual population of true record pairs. Eliminating the records with missing data, or guessing what the missing values should be, weakens the data and can lead to biased analysis and incorrect inferences. To solve this problem, the processes used to obtain CODES data multiply impute complete, representative linked datasets.
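The linkage steps described above (exact matches on join fields to select candidate pairs, several passes with different join fields, then scoring every pair on the same match fields) can be illustrated with a minimal Python sketch. This is not the CODES software: the field names, the agreement probabilities, the two passes, and the threshold are all hypothetical values chosen for illustration, and the log-likelihood-ratio weighting is one common way to score candidate pairs in probabilistic linkage.

```python
import math

# Hypothetical agreement probabilities for each match field:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
MATCH_FIELDS = {
    "date":   (0.95, 0.01),
    "county": (0.90, 0.05),
    "age":    (0.85, 0.02),
    "sex":    (0.98, 0.50),
}

# Hypothetical passes: each pass requires exact agreement on its join fields.
PASSES = [("date", "county"), ("date", "sex")]


def field_weight(a, b, m, u):
    """Log2 likelihood-ratio weight for one field; missing values score 0."""
    if a is None or b is None:
        return 0.0
    if a == b:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_weight(rec_a, rec_b):
    """Score a pair on the same match fields, whatever pass produced it."""
    return sum(
        field_weight(rec_a.get(f), rec_b.get(f), m, u)
        for f, (m, u) in MATCH_FIELDS.items()
    )


def candidate_pairs(crash, hospital, join_fields):
    """Yield pairs that agree exactly, with no missing values, on the join fields."""
    index = {}
    for h in hospital:
        key = tuple(h.get(f) for f in join_fields)
        if None not in key:
            index.setdefault(key, []).append(h)
    for c in crash:
        key = tuple(c.get(f) for f in join_fields)
        if None in key:
            continue
        for h in index.get(key, []):
            yield c, h


def link(crash, hospital, threshold=8.0):
    """Run every pass and keep candidate pairs scoring at or above the threshold."""
    scores = {}
    for join_fields in PASSES:
        for c, h in candidate_pairs(crash, hospital, join_fields):
            scores[(c["id"], h["id"])] = pair_weight(c, h)
    return {pair: w for pair, w in scores.items() if w >= threshold}


# Made-up records: C2 has no county, so only the second pass can propose it.
crash = [
    {"id": "C1", "date": "2001-05-03", "county": "Adams", "age": 34, "sex": "F"},
    {"id": "C2", "date": "2001-05-03", "county": None, "age": 61, "sex": "M"},
]
hospital = [
    {"id": "H1", "date": "2001-05-03", "county": "Adams", "age": 34, "sex": "F"},
    {"id": "H2", "date": "2001-05-03", "county": "Brown", "age": 61, "sex": "M"},
]

for pair, weight in sorted(link(crash, hospital).items()):
    print(pair, round(weight, 2))
```

In this made-up data the second crash record is missing its county, so it never becomes a candidate in the first pass. The second pass, which joins on date and sex instead, still proposes it, which is the reason for running more than one pass.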
Multiple imputation is a statistical technique for analyzing incomplete datasets that takes the uncertainty of the missing data into account. (Taken from the CODES Data Model.) See CODES Data Analysis: A Concise Discussion (NHTSA).
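To show how analysis of multiply imputed data accounts for that uncertainty, the sketch below pools one estimate computed separately on each imputed dataset using Rubin's combining rules, the standard approach for multiple imputation. The source does not state which pooling procedure CODES analysts use, so this is an assumption, and the estimates and variances in the example are made-up numbers.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their sampling variances.

    Returns the pooled estimate and its total variance, which adds the
    between-imputation spread to the average within-imputation variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    q_bar = mean(estimates)           # pooled point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Made-up example: the same proportion estimated on five imputed datasets.
estimates = [0.20, 0.22, 0.18, 0.21, 0.19]
variances = [0.0004] * 5
pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var ** 0.5)  # pooled estimate and its standard error
```

The total variance is larger than the within-imputation variance alone. That extra term is how the uncertainty about the missing values carries through to the final inference.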
Check out published CODES papers nationwide at the National Center for Statistics and Analysis.