There are various reasons why persons holding a driver's license may no longer be able to drive a car safely, for example stroke, traumatic brain injury, or early dementia. To assess driving ability in persons with suspected cognitive impairment, good tests are needed that can place persons into one of three groups: (1) unable to drive a car, (2) sufficient ability to drive a car, or (3) should be referred for a more comprehensive assessment of cognitive ability.
In this report, we provide an overview of existing cognitive screening tests for assessing functions relevant to the ability to drive a car, and of how well the tests predict who will pass an on-road driving test or who will be involved in a car accident during the first years after the screening test.
Our key messages are:
- We have not found any cognitive screening test with good documentation of diagnostic test accuracy for predicting results on on-road driving tests. The tests that detected at least 65 percent of unsafe drivers in all studies were the Montreal Cognitive Assessment (MoCA, detected 70-85%), the Clock Drawing Test (detected 65-71%), and the Trail Making Test B (detected 70-77%). In most cases we have little or very little confidence in the results.
- There was large variation in how well the tests predicted results on an on-road test
- There is a need for standardization of outcome measures and test batteries in research on screening tests for driving ability
- We therefore cannot draw conclusions about which tests are best for detecting reduced driving ability among persons with suspected cognitive impairment
The Norwegian Knowledge Center for the Health Services was commissioned by Haukeland University Hospital to establish which cognitive screening tests are valid and reliable instruments for predicting whether cognitive function is sufficient for safe driving.
We searched systematically for studies that had reported diagnostic test accuracy for cognitive screening tests designed to predict results on standardized driving tests. We searched specifically for studies in which all participants had taken one or more screening tests (index tests) and afterwards an on-road test (reference test). We included studies from which we could find or estimate the four numbers: true positives, false positives, false negatives, and true negatives. This made it possible to calculate sensitivity (the proportion of drivers performing poorly on the on-road test whom the screening tests detected) and specificity (the proportion of drivers performing well on the on-road test whom the screening tests correctly classified).
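The calculation described above can be sketched as follows. This is a minimal illustration of how sensitivity and specificity are derived from the four counts; the function name and the counts are invented for the example, not taken from any included study. "Positive" means the screening test flags the driver as unsafe, and the on-road test serves as the reference standard.

```python
# Sketch: computing sensitivity and specificity from the four counts
# of a 2x2 diagnostic accuracy table. Counts are hypothetical.

def diagnostic_accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from true/false positives/negatives."""
    sensitivity = tp / (tp + fn)  # unsafe drivers correctly flagged by screening
    specificity = tn / (tn + fp)  # safe drivers correctly passed by screening
    return sensitivity, specificity

# Illustrative counts (not from any included study):
sens, spec = diagnostic_accuracy(tp=35, fp=10, fn=15, tn=40)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# → sensitivity = 0.70, specificity = 0.80
```

Note that both measures depend on all four cells: a test can reach high sensitivity simply by flagging almost everyone, but only at the cost of specificity.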
Two persons independently screened the references identified by the search. Risk of bias in the included studies was independently assessed by two researchers using the QUADAS-2 instrument. Results for sensitivity and specificity are presented for each study, and the quality of the documentation was assessed with GRADE.
We found 53 studies that fulfilled our inclusion criteria. Forty-seven studies had compared results on screening tests with results on an on-road test, three studies had a simulator as the reference test, and three studies had both a simulator and an on-road test. We did not find any studies with accidents as the outcome.

Nine studies had reported separate results for the Stroke Drivers' Screening Assessment (SDSA), with sensitivity between 0.30 and 0.88, and five studies reported separate results for UFOV (sensitivity between 0.48 and 0.89). Three studies used the Clock Drawing Test (sensitivity between 0.65 and 0.71), and three had used the MMSE (sensitivity between 0.10 and 0.80). The Trail Making Test A was reported in three studies (sensitivity between 0.40 and 0.82), as was the Trail Making Test B (sensitivity between 0.70 and 0.77). Two studies had reported results for the Montreal Cognitive Assessment (sensitivity between 0.70 and 0.85). We have constructed forest plots and HSROC curves for these tests. The remaining studies used different combinations of outcomes, so it was not reasonable to pool their results. There was large variability between the studies both for sensitivity (SDSA: 0.30-0.88, UFOV: 0.48-0.89, Clock Drawing Test: 0.65-0.71, MMSE: 0.10-0.80, MoCA: 0.70-0.85, TMT-A: 0.40-0.82, TMT-B: 0.70-0.77) and for specificity (SDSA: 0.46-0.97, UFOV: 0.42-0.93, Clock Drawing Test: 0.42-0.98, MMSE: 0.60-0.98, MoCA: 0.50-0.73, TMT-A: 0.60-0.91, TMT-B: 0.49-0.68). The quality of the documentation was mostly judged to be low or very low.
The lack of standardized outcome measures is a weakness in this field, making it difficult to summarize the results. Firstly, the studies have mostly used different combinations of tests and test batteries. Secondly, the studies that have used the same tests or batteries have applied different cut-off values for differentiating between pass and fail. These cut-off values are often chosen to be optimal for the particular study sample, which inflates both sensitivity and specificity. A weakness of this systematic review is that we did not find results for the ultimate outcome: traffic accidents.
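The inflation caused by sample-optimized cut-offs can be illustrated with a small sketch. All scores below are invented for the example; the point is only that a cut-off chosen to maximize Youden's J (sensitivity + specificity − 1) on the derivation sample typically performs worse when the same cut-off is applied to an independent sample.

```python
# Hypothetical illustration (no data from the report): a cut-off tuned
# to one sample looks better there than in a new sample. Lower screening
# score = worse cognition; a driver screens "positive" (flagged unsafe)
# if score <= cutoff.

def sens_spec(cutoff, unsafe_scores, safe_scores):
    sens = sum(s <= cutoff for s in unsafe_scores) / len(unsafe_scores)
    spec = sum(s > cutoff for s in safe_scores) / len(safe_scores)
    return sens, spec

def best_cutoff(unsafe_scores, safe_scores):
    """Pick the cutoff maximizing Youden's J = sens + spec - 1 in-sample."""
    candidates = sorted(set(unsafe_scores + safe_scores))
    def youden(c):
        sens, spec = sens_spec(c, unsafe_scores, safe_scores)
        return sens + spec - 1
    return max(candidates, key=youden)

# Invented derivation sample: the cutoff is tuned to these 12 drivers.
unsafe_a, safe_a = [10, 12, 14, 15, 16, 20], [15, 18, 19, 21, 22, 25]
c = best_cutoff(unsafe_a, safe_a)  # gives c == 16 for these data

# Invented independent sample: same cutoff, much lower sensitivity.
unsafe_b, safe_b = [11, 13, 17, 18, 19, 22], [14, 17, 20, 21, 23, 26]
print(sens_spec(c, unsafe_a, safe_a))  # in-sample: sens ≈ 0.83, spec ≈ 0.83
print(sens_spec(c, unsafe_b, safe_b))  # new sample: sens ≈ 0.33, spec ≈ 0.83
```

This is why accuracy estimates from studies that derive their cut-offs from their own data should be read with caution unless validated in an independent sample.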