Preclinical stroke research – advantages and disadvantages of the most common rodent models of focal ischaemia


MCA occlusion by electrocoagulation or application of an occluding device

All the models within this category require a craniectomy and section of the dura mater to expose the MCA. Models that require a craniectomy are associated with low or absent mortality (Table 2) since the craniectomy prevents ischaemia-induced increases in intracranial pressure. This can be a significant advantage, particularly where large strokes are studied over a number of days. Oedema and brain swelling correlate with infarct size and the increases in brain volume associated with large MCA strokes over the first 24–48 h post-stroke would cause significant mortality if the skull was intact.

Following surgical exposure, the MCA is permanently occluded by coagulating the blood within it using an electric current passed through the tips of fine diathermy forceps (Tamura et al., 1981). The occluded portion of the artery is then cut confirming complete occlusion. Electrocoagulation models have been successfully adapted for use in larger species such as the cat (Bullock et al., 1990), miniature pig (Imai et al., 2006) and baboon (Yonas et al., 1981). Maintaining sterile conditions and careful post-stroke care of the wound is particularly important to avoid any infection associated with the surgery, which will increase ischaemic damage and confound the ability to determine drug-induced effects on infarct size and functional recovery. The main advantages of the model are good reproducibility in infarct size and functional deficit, low mortality, visual confirmation of successful MCAO and the ability to adapt the model to produce infarcts of different size and location. The main disadvantages are that the model induces permanent focal ischaemia and is not therefore suitable for investigation of thrombolytic agents or drugs designed to target the reperfusion phase following ischaemia. Inducing MCAO is technically demanding. Exposing the artery and applying electrocoagulation without rupturing the blood vessel or damaging the underlying cortex requires significant surgical skill. The surgery required for proximal MCA occlusion can also cause jaw alignment problems in rats requiring replacement of standard chow with soft diet and regular monitoring and tooth filing to avoid overgrowth.

Alternative MCA occlusion models use devices such as microaneurysm clips, hooks (used to lift the artery from the cortical surface until flow ceases), ligatures and, in larger species, inflatable cuffs to occlude the artery remotely (Shigeno et al., 1985; van Bruggen et al., 1999). These models have the advantage of control over the duration of ischaemia and allow subsequent reperfusion of ischaemic tissue. They provide visual confirmation of successful MCAO and of reperfusion when the occluding device is removed. However, microaneurysm clips are too small to apply by hand in rats and have to be loaded into a special applicator for attachment to the MCA. Applying and removing the clips without damaging the artery is technically difficult, particularly when targeting the proximal MCA. Other disadvantages include greater variability in infarct size compared with electrocoagulation models, particularly when a single point on the MCA is occluded. Reproducibility is improved by occluding the artery at more than one site, combining MCAO with hypotension or ipsilateral common carotid artery occlusion, or using rat strains with poor collateral supply (e.g. the spontaneously hypertensive rat and the spontaneously hypertensive stroke-prone rat; see Coyle and Jokelainen, 1983) to increase the severity of the ischaemic insult. However, when designing neuroprotection studies, it is worth considering that steps taken to improve reproducibility in infarct size are also likely to leave less potentially salvageable penumbral tissue available for rescue.

Diffusion Tensor Imaging Biomarkers to Predict Motor Outcomes in Stroke: A Narrative Review


Stroke is the second leading cause of death and the third leading cause of loss of DALYs (Disability-Adjusted Life Years) worldwide. Despite substantial advances in prevention and treatment, the global burden of this condition remains massive (1). In ischemic stroke (IS; 80–85% of the cases), hypoperfusion leads to cell death and tissue loss, while in hemorrhagic stroke (HS), primary injury derives from hematoma formation and secondary injury from a cascade of events resulting in edema and cellular death (2). In IS, cytotoxic edema is a result of glucose and oxygen deprivation, leading to a failure of ion pumps in the cell membranes and consequently to collapse of osmotic regulation, when water shifts from the extracellular to the intracellular compartment (3). In HS, heme degradation products are the primary cytotoxic event and, secondarily, an inflammatory process based on degradation of the hematoma takes place (4).

Diffusion MRI (dMRI) is a powerful diagnostic tool in acute IS (5) and is widely used in clinical practice (6). dMRI sequences are sensitive to water displacement. Acute infarcts appear hyperintense on diffusion-weighted imaging (DWI), reflecting the decrease in the apparent diffusion coefficient of water molecules. DWI can be acquired and interpreted within a few minutes. It provides key information for eligibility to reperfusion therapies from 6 to 24 h after onset of symptoms (DAWN study) (7) and in wake-up strokes (8). A search on MEDLINE using the terms “stroke” and “diffusion MRI” yielded 1 article in 1991 and 279 in 2018. Diffusion tensor imaging (DTI) involves more complex post-processing and mathematical modeling of the DW signal (9) and provides measures associated with white matter (WM) microstructural properties (10).

Stroke can directly injure WM tracts and also lead to Wallerian degeneration, the anterograde distal degeneration of injured axons accompanied by demyelination (11). DTI metrics have been studied as biomarkers of recovery or responsiveness to rehabilitation interventions (12–14). The bulk of DTI studies addressed specifically the corticospinal tract (CST), crucial for motor performance or recovery (12, 15), and frequently affected by stroke lesions. Paresis occurs in the majority of the subjects in the acute phase and contributes substantially to disability (16). It is thus understandable that the CST is in the spotlight of research in the field.

Two meta-analyses included from six to eight studies and reported strong correlations between DTI metrics and upper-limb motor recovery in IS and HS (17, 18). In both meta-analyses, heterogeneity between the studies was moderate. In addition, the quality of the evidence of DTI as a predictor of motor recovery was considered only moderate by a systematic review of potential biomarkers (19). The main limitations of the reviewed studies were the lack of cross-validation and evaluation of minimal clinically important differences for motor outcomes as well as the small sample sizes. Heterogeneity in DTI data collection and analysis strategies may also contribute to inconsistencies and hinder comparisons between studies.

In this narrative review, we first review the key concepts of dMRI. Second, we present an overview of state-of-the-art methodological practices in DTI processing. Third, we critically review challenges of DTI in stroke and results of studies that investigated the correlation between DTI metrics in the CST and motor outcomes at different stages after stroke, according to recommendations of the Stroke Recovery and Rehabilitation Roundtable taskforce (20).

Concepts of Diffusion MRI

Different MRI paradigms address WM qualitatively and quantitatively (i.e., volume, contrast as signal hyperintensities), but only dMRI allows indirect inferences about WM microstructure by providing information about the underlying organization of the tissue. In regions of little restriction of water displacement (such as the ventricles), water molecules tend to move almost freely (randomly). On the other hand, within tracts, the environment tends to be organized within sets of axons aligned in parallel orientation. Water movement usually follows a specific orientation near axons compactly organized and constrained by the myelin packing (21).

Hence, the tensor calculation is typically based on a 3 × 3 symmetric matrix, in which the eigenvalues derived from each combination of directions provide different metrics. At least one b0 (non-diffusion-weighted) and 6 non-collinear directions of diffusion-weighted acquisitions are required to minimally describe water displacement with DTI (10). Generally, the more directions, the better.
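As a rough illustration of why one b0 and six non-collinear directions are the minimum, the sketch below fits the six unique elements of the symmetric tensor with a log-linear least-squares model, ln(S/S0) = −b·gᵀDg. This is only one of several possible estimation methods and is not taken from the reviewed studies. The function names, the six-direction scheme and the synthetic tensor are illustrative assumptions.

```python
import numpy as np

def design_matrix(bvecs, bval):
    """Rows map the six unique tensor elements to -ln(S/S0)."""
    g = np.asarray(bvecs, dtype=float)
    gx, gy, gz = g[:, 0], g[:, 1], g[:, 2]
    # Column order: Dxx, Dyy, Dzz, Dxy, Dxz, Dyz (off-diagonals count twice).
    return bval * np.column_stack(
        [gx**2, gy**2, gz**2, 2 * gx * gy, 2 * gx * gz, 2 * gy * gz])

def fit_tensor(signals, s0, bvecs, bval):
    """Log-linear least-squares estimate of the 3 x 3 symmetric tensor."""
    y = -np.log(np.asarray(signals, dtype=float) / s0)
    d, *_ = np.linalg.lstsq(design_matrix(bvecs, bval), y, rcond=None)
    dxx, dyy, dzz, dxy, dxz, dyz = d
    return np.array([[dxx, dxy, dxz],
                     [dxy, dyy, dyz],
                     [dxz, dyz, dzz]])

# Hypothetical minimal scheme: six non-collinear unit directions.
bvecs = np.array([[1, 1, 0], [1, -1, 0], [1, 0, 1],
                  [1, 0, -1], [0, 1, 1], [0, 1, -1]]) / np.sqrt(2)
bval = 1000.0                                # s/mm^2
true_D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])   # mm^2/s, synthetic fiber along x
s0 = 1000.0                                  # synthetic b0 signal

# Simulate noise-free diffusion-weighted signals, then recover the tensor.
signals = s0 * np.exp(-bval * np.einsum("ij,jk,ik->i", bvecs, true_D, bvecs))
D = fit_tensor(signals, s0, bvecs, bval)
eigenvalues = np.linalg.eigvalsh(D)[::-1]    # sorted, largest first
```

With exactly six directions the system is just determined, so any noise propagates directly into the tensor. Additional directions turn the fit into an overdetermined least-squares problem, which is one way to read "the more directions, the better".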

The most widely used DTI metrics are fractional anisotropy (FA), mean diffusivity (MD), radial diffusivity (RD), and axial diffusivity (AD). FA describes the degree of anisotropy (represented as an ellipsoid) as a value between 0 (isotropic) and 1 (the most anisotropic). Anisotropy tends to increase in the presence of highly oriented fibers (Figure 1). The highest values are expected in the center of the tracts. In particular, for CST analysis in stroke or other focal brain lesions, FA results can be reported as ratios between FA extracted from the ipsilesional and the contralesional hemispheres (rFA = FA ipsilesional/FA contralesional). Alternatively, asymmetry in FA can be described (aFA = (FA ipsilesional – FA contralesional)/(FA ipsilesional + FA contralesional)).

MD describes the overall magnitude of diffusion, and the highest values are expected in the ventricles. RD represents the average diffusivity perpendicular to the first eigenvector, i.e., (λ2 + λ3)/2, and AD is the first eigenvalue (λ1), representing the diffusivity along the dominant diffusion direction.
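The four metrics and the two inter-hemispheric indices can be written compactly in terms of the tensor eigenvalues. The sketch below uses the standard FA definition and the rFA and aFA expressions given above. The function names and the example eigenvalues are illustrative assumptions.

```python
import numpy as np

def dti_metrics(eigenvalues):
    """Return FA, MD, AD and RD from the three tensor eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                            # axial diffusivity (first eigenvalue)
    rd = (l2 + l3) / 2.0               # radial diffusivity
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    fa = float(np.sqrt(1.5 * num / den)) if den > 0 else 0.0
    return {"FA": fa, "MD": md, "AD": ad, "RD": rd}

def fa_ratio(fa_ipsi, fa_contra):
    """rFA = FA ipsilesional / FA contralesional."""
    return fa_ipsi / fa_contra

def fa_asymmetry(fa_ipsi, fa_contra):
    """aFA = (FA ipsi - FA contra) / (FA ipsi + FA contra)."""
    return (fa_ipsi - fa_contra) / (fa_ipsi + fa_contra)

# Synthetic examples: free water-like voxel versus a coherent fiber bundle.
isotropic = dti_metrics([1.0e-3, 1.0e-3, 1.0e-3])
fiber = dti_metrics([1.7e-3, 0.3e-3, 0.3e-3])
```

With this sign convention, a lower FA on the ipsilesional side gives rFA below 1 and a negative aFA. Some studies define the asymmetry with the opposite sign, so the convention should be checked before values are compared across papers.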

Many studies have focused exclusively on FA. The proper interpretation of FA often demands knowledge about results of the other three DTI metrics (22). Changes in anisotropy may reflect several biological underpinnings, such as axonal packing density, axonal diameter, myelinization, neurite density, and orientation distribution (21, 23). FA can be decreased in conditions that injure the WM but also when multiple crossing fibers are present in the voxel. In case of partial volume effects, both FA and MD may be altered (24, 25).

dMRI Acquisition and Processing

DWI is a noise-sensitive and artifact-prone sequence, emphasizing the need for robust acquisitions and processing handling to avoid bias (26). Several dMRI sequences and subsequent post-processing mathematical modeling of the diffusion signal are available. Choices directly impact accuracy, reliability, and validity of the results (27).

dMRI acquisitions and analytical strategies are based on the goal of the study, balancing the pros (e.g., greater reliability of signal reconstruction) and cons (e.g., time-consuming acquisition). In addition to constraints related to the number of subjects with stroke in the studies, criteria to perform a reliable protocol should be weighed prior to data collection [for a review, see Price et al. (28)].

Diffusion images are typically acquired with sequences based on echo planar imaging (EPI) acquisitions. Two high-amplitude magnetic gradients are applied. The b-value is a scalar that reflects the degree of diffusion weighting, influenced by the duration, amplitude, and interval between the gradients. b-values are comparable to an inverse zoom factor: the higher they are (“high” b-values are usually above 1,000 s/mm²), the smaller the sampled space (29).
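The dependence on gradient duration, amplitude and interval can be made concrete with the Stejskal-Tanner expression for idealized rectangular gradient pulses, b = γ²G²δ²(Δ − δ/3). The sketch below is not a description of any scanner's implementation. The gradient settings are hypothetical, and real sequences include ramp times and imaging gradients that this expression ignores.

```python
import math

GAMMA_PROTON = 2.6752218744e8  # proton gyromagnetic ratio, rad s^-1 T^-1

def b_value(gradient_t_per_m, delta_s, big_delta_s):
    """Stejskal-Tanner b-value (s/mm^2) for rectangular gradient pulses.

    gradient_t_per_m: gradient amplitude G in T/m
    delta_s:          duration of each gradient pulse (delta) in s
    big_delta_s:      interval between pulse onsets (Delta) in s
    """
    b_si = (GAMMA_PROTON * gradient_t_per_m * delta_s) ** 2 \
        * (big_delta_s - delta_s / 3.0)
    return b_si * 1e-6  # convert s/m^2 to s/mm^2

def dw_signal(s0, b, adc):
    """Mono-exponential signal decay S = S0 * exp(-b * ADC)."""
    return s0 * math.exp(-b * adc)

def adc_from_signals(s0, s_b, b):
    """Apparent diffusion coefficient from a b0 and one weighted image."""
    return math.log(s0 / s_b) / b

# Hypothetical settings: G = 40 mT/m, delta = 20 ms, Delta = 40 ms.
b = b_value(0.040, 0.020, 0.040)
signal = dw_signal(1000.0, b, 0.8e-3)      # ADC given in mm^2/s
adc = adc_from_signals(1000.0, signal, b)  # recovers the input ADC
```

Because b grows with the square of the gradient amplitude, doubling G at fixed timing quadruples the diffusion weighting. A lower ADC, as in acute infarcts, leaves more signal at a given b-value, which is why such lesions appear hyperintense on DWI.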

EPI acquisitions are prone to many unexpected distortions (30); therefore, care should be taken during data collection. For tensor modeling, some suggestions are: parameters chosen to minimize EPI artifacts; coverage of the entire brain; isotropic voxels; an appropriate number of directions and b0s; acquisition of at least one low b-value volume (a b0, for example) for every 5–6 high b-value volumes, interspersed among them; and optimal sampling schemes for the distribution of directions on the sphere and for gradient ordering (28, 31). Optimized distribution of gradients can be obtained, for example, with the MRtrix or ExploreDTI software.

Off-resonance artifacts such as eddy currents and magnetic field inhomogeneities are intrinsic to EPI acquisitions and interfere with the expected signal, causing susceptibility-induced distortions (32). Acquisition parameters tailored to prevent and mitigate these artifacts include: parallel imaging; field maps; phase encoding with opposed gradients to correct a geometrical mismatch in the antero-posterior axis; and multiple b0s (33). These alternatives demand extra data collection and prolonged scan time (34). In accordance with the chosen acquisition parameters, corrections are performed in the pre-processing step.

In stroke studies, the duration of scans should be planned by weighing the risk of fatigue and increased head motion in patients with neurologic impairments. These impairments are often not restricted to motor deficits and may involve executive dysfunction or anxiety, which contribute to increased head motion and hence to artifacts. Again, trade-offs between “optimal” acquisition parameters, feasibility and noise must be weighed during study design.

Software embedded in the MRI scanner can perform tensor calculations, but advanced a posteriori processing is strongly recommended. The most appropriate choice heavily depends on the objectives of the study and on acquisition limitations such as: the number of diffusion directions; image resolution; b-values; number of b-values; and the number of averages or repetitions used to improve signal in relation to noise and tensor estimation (the number of excitations, NEX) (31, 35).

Many open-source software packages and pipelines are available to process diffusion images, each with particular strengths; a helpful overview can be found in Soares et al. (35). A list of software packages is available on the Neuroimaging Informatics Tools and Resources Clearinghouse. There is no consensus, but some agreement, about diffusion imaging processing. One can decide to use a mix of software packages to process the data, as long as key steps are completed and a detailed methodological report is made. Documentation is invariably available online and discussion forums can provide additional support. To allow reproducibility and comparisons across studies, it is desirable to transparently report analytical procedures when in-house pipelines are employed (36).

Here, we will briefly cite some suggestions for processing practices, considering an ordinary single-shell acquisition (when only one single b-value, in addition to the b0, is acquired) with a value around 1,000 s/mm², with subsequent tensor modeling.


Images must be checked for artifacts, such as susceptibility effects (signal loss and geometric distortions), eddy current-induced distortions and subject motion (31, 37), so that corrections or exclusions of subjects, volumes or slices are made accordingly. Preferably, automated and quantitative inspection, rather than exclusively visual inspection, should be performed. Soares et al. (35) provide useful guidelines and a comprehensive list of software for quality control.

A gold-standard pre-processing pipeline does not exist. Pre-processing is intrinsically dependent on the chosen software. Users can employ different software packages to perform a variety of corrections, but it is mandatory to follow the basic steps recommended by each developer. Steps of a typical pre-processing pipeline might be:

1. Conversion from DICOM or PAR/REC to NIfTI format, a frequently required procedure (most diffusion processing software uses this format).

2. Inspection of DWI images for motion, artifacts (e.g., Gibbs ringing or signal drift) (38, 39) and structural abnormalities: different software packages provide visual and quantitative inspection procedures. It is also important to inspect anatomical images such as T1, T2, and FLAIR.

3. B-matrix rotation: this notion was first introduced by Leemans and Jones (40). The rotation involved in registration of the image must be also applied to the encoding vectors. Neglecting this step may lead to biases in the estimation of the principal vector, affecting all the metrics and tract reconstruction.

4. Brain extraction: an automated segmentation method to delete non-brain tissue from the whole-head. This optional but frequently performed procedure improves registration and normalization.

5. Eddy current and EPI distortion correction: off-resonance artifacts (as detailed previously) must be corrected. Tools are available, for example, in the ExploreDTI software and in the FSL platform (Topup and Eddy). Further details on how to acquire data and how to perform corrections can also be found online. It must be emphasized that an adequate acquisition is required in order to be able to perform such corrections (41).

6. Tensor estimation in each voxel and generation of maps of FA, MD, RD, and AD (Figure 2). This estimation can be based on different methods and a variety of software packages can perform this calculation, but visual inspection of tensor orientation is highly recommended (42–44). If distortions of the expected orientation occur, it is necessary to modify the gradient table, perform reorientation and re-process the data, starting over from the first steps (35).
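The B-matrix rotation in step 3 can be sketched as follows. The example extracts the rotation component of a registration affine by polar decomposition and applies it to the gradient directions. It is a minimal illustration, not the procedure of any particular package. The helper names and the example affine are hypothetical, and whether the rotation or its inverse should be applied depends on the transform convention of the registration tool.

```python
import numpy as np

def rotation_from_affine(affine):
    """Closest pure rotation to the linear part of an affine (polar decomposition)."""
    a = np.asarray(affine, dtype=float)[:3, :3]
    u, _, vt = np.linalg.svd(a)
    r = u @ vt
    if np.linalg.det(r) < 0:      # guard against reflections
        u[:, -1] *= -1
        r = u @ vt
    return r

def rotate_bvecs(bvecs, affine):
    """Apply the rotation component of a registration to gradient directions."""
    r = rotation_from_affine(affine)
    rotated = np.asarray(bvecs, dtype=float) @ r.T
    norms = np.linalg.norm(rotated, axis=1, keepdims=True)
    norms[norms == 0] = 1.0       # b0 rows are zero vectors: leave them as is
    return rotated / norms

# Hypothetical motion-correction result: 90 degree rotation about z plus 2% scaling.
theta = np.radians(90.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
affine = np.eye(4)
affine[:3, :3] = 1.02 * rz

bvecs = np.array([[0.0, 0.0, 0.0],   # b0
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
rotated_bvecs = rotate_bvecs(bvecs, affine)
```

In this toy case the direction along x ends up along y, while the b0 row and the direction along the rotation axis are unchanged. Scaling and shear are deliberately discarded, since only the rotation changes the orientation of the diffusion encoding relative to the head.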



DTI maps generated in the native space for each subject can be co-registered so that group-wise comparisons can be performed. Co-registration refers to intra or inter-subject spatial alignment of images within or between MRI sequences. Decisions about co-registration tools must consider the paradigm of study, assumptions and specific steps of image processing (45–47). Typical steps of post processing pipeline are highly dependent on the chosen software, but in general, images are co-registered and normalized. Normalization of images to a standard space is a fundamental step to perform comparisons, which is particularly challenging for diffusion images, since they are highly directional and topological (35, 48, 49). After that, group-wise statistics can be performed.

We will review the types of analyses more frequently applied in DTI studies in stroke: ROI-based analysis, tractography, and whole-brain analysis.

Region-Specific Analysis

ROI Analysis

ROIs can be drawn on T1, T2, FA, or ADC images. They can be placed on the abnormal/lesion regions or on predetermined anatomic regions. In the WM, the homogenous signal and EPI distortions might impair robust anatomical delimitation of ROI and reproducibility.

Basic steps of ROI processing are:

1. Registration to improve delineation and to align corresponding voxels in different datasets.

2. Normalization to allow standardized localization and comparisons between subjects within a study. For instance, data from each subject can be transferred to standard space, using a validated template or atlas (such as MNI or Talairach, among others) (34). The choice of the atlas involves checking whether characteristics of the subjects in a given study (i.e., elderly people) are comparable to those of the subjects scanned to build the template (50).

3. Definition of the ROI, manually or semi-automatically. Manual delineation can be achieved by free-hand drawing of the region or by placement of basic shapes such as circles or squares. In the former, ROI size differs between subjects, while in the latter it remains constant. Small ROIs may be more specific, but also more prone to errors, while large ROIs may be less specific for definition of particular structures and more prone to partial volume effects (inclusion of structures other than the target area) (34).

4. Manual segmentation has high precision but has disadvantages such as the risk of low reproducibility, due to dependence on the prior knowledge of the researcher, and the lack of feasibility in large datasets (51). Semi-automated delimitation can be a useful alternative by combining the automated identification of the ROI with a manual, interactive selection and modification by the user (52). Although fully automated delimitation is promising, as reported by Koyama et al. (53, 54), more studies with large datasets in different phases of stroke are advisable to create a state-of-the-art automatic method (28, 50–52, 55).

5. Quality control involves: assessment of accuracy of segmentation and registration; report of intra- and inter-rater reliabilities of ROI delineation; clarity of criteria for the location of the ROI (such as anatomical location) – for details, refer to Froeling et al. (34).

6. Extraction of DTI metrics from the ROI, as absolute values from the ipsilesional/contralesional site or ratios between both (56).

7. When more than one ROI is chosen, the correction for multiple comparisons is recommended to reduce false positives—for details, refer to Froeling et al. (34).
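Steps 6 and 7 can be illustrated with a short sketch that averages a DTI metric within binary ROI masks, forms the ipsilesional/contralesional ratio, and applies a Bonferroni adjustment when several ROIs are tested. Bonferroni is used here only as one simple, conservative option. The array shapes, FA values and function names are synthetic assumptions.

```python
import numpy as np

def roi_mean(metric_map, roi_mask):
    """Mean of a DTI metric over the voxels of a binary ROI mask."""
    values = metric_map[np.asarray(roi_mask, dtype=bool)]
    if values.size == 0:
        raise ValueError("ROI mask is empty")
    return float(values.mean())

def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance threshold for n_comparisons ROIs."""
    return alpha / n_comparisons

# Synthetic FA map in standard space: first axis runs across hemispheres.
fa_map = np.full((4, 4, 2), 0.20)
fa_map[0] = 0.30                      # hypothetical ipsilesional ROI voxels
fa_map[3] = 0.60                      # hypothetical contralesional ROI voxels

ipsi_mask = np.zeros(fa_map.shape, dtype=bool)
ipsi_mask[0] = True
contra_mask = np.zeros(fa_map.shape, dtype=bool)
contra_mask[3] = True

fa_ipsi = roi_mean(fa_map, ipsi_mask)
fa_contra = roi_mean(fa_map, contra_mask)
rfa = fa_ipsi / fa_contra
alpha_per_roi = bonferroni_alpha(0.05, 5)   # five ROIs tested
```

Reporting the absolute values alongside the ratio helps, because a ratio alone cannot show whether a change comes from the ipsilesional or the contralesional side.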


Tractography corresponds to the mathematical reconstruction of tracts (57, 58). By following the preferred direction of water diffusion voxel by voxel, it is possible to trace the tracts three-dimensionally and non-invasively (59, 60). This represents an advantage over ROIs, allowing qualitative and quantitative investigation along the entire tract of interest. DTI metrics can be extracted from the entire reconstructed tract or from a segment (ROI) of the tract. There are two main approaches for path reconstruction:

1. Deterministic, following the best-fit pathway defined by the principal eigenvector (the one associated with the largest eigenvalue, λ1), i.e., the principal axis of the tensor, which is assumed to align with the principal direction of the fibers. It estimates the most likely fiber orientation in each voxel. This method tends to show the best trade-off between valid and invalid connections, but presents low spatial bundle coverage in comparison to the probabilistic method (61).

2. Probabilistic, based on the estimation of uncertainty in fiber orientation (60, 62). It is frequently considered more robust and deals better with partial volume averaging effects, crossing fibers, as well as noise (63). Yet, it is faced with pitfalls, is more time-consuming and computationally expensive.

Noise and artifacts affect reconstructions. There is no “ground-truth” solution to validate tracking results (64). Several efforts are in progress to investigate the ground-truth of diffusion and tracts trajectory by using phantoms, post-mortem, and histological information. The trajectory from the initial (“seed”) voxel to the end point can be represented by a streamline. A streamline refers to the unitary path of reconstruction within a tract and does not indicate an actual nerve fiber or tract (64). Streamlines can vary in different subjects and across experimental paradigms.

Path reconstruction can be constrained by three main steps: seeding, propagation and termination (35). Usually, streamline tractography is based on the placement of multiple ROIs: either starting from seed points in a predefined ROI and guiding the path reconstruction by preserving only streamlines passing through or touching other predefined ROIs, or running full-brain tractography and keeping the streamlines according to conjunction, disjunction, or exclusion ROIs (65). The seeding strategy can also be performed on a voxel-wise level across the brain, running a whole-brain tractography.

Termination of streamlines is usually guided by a set of parameters, such as an FA threshold (between 0.1 and 0.3 for the adult brain) and a turning angle threshold (typically between 40 and 70°, depending on the anatomy of the tract considered), to avoid streamlines propagating into voxels of high uncertainty, such as the cerebrospinal fluid (CSF) and gray matter (35). Fully automated clustering methods can be alternatives to manual ROI-based approaches (65).
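The seeding, propagation and termination logic can be sketched with a toy deterministic tracker. It steps along the principal eigenvector with nearest-neighbour lookup and stops when FA falls below a threshold, when the turning angle exceeds a limit, or when the streamline leaves the volume. Real tractography software uses interpolation, bidirectional tracking and more elaborate integration schemes. The function name, step size and synthetic fields below are assumptions made for illustration.

```python
import numpy as np

def track(seed, peak_dirs, fa, step=0.5, fa_min=0.2,
          max_angle_deg=45.0, max_steps=1000):
    """Follow the principal direction from a seed (voxel coordinates)."""
    cos_min = np.cos(np.radians(max_angle_deg))
    pos = np.asarray(seed, dtype=float)
    prev = None
    points = [pos.copy()]
    for _ in range(max_steps):
        idx = tuple(np.round(pos).astype(int))      # nearest voxel
        if any(i < 0 or i >= n for i, n in zip(idx, fa.shape)):
            break                                   # left the volume
        if fa[idx] < fa_min:
            break                                   # FA threshold
        d = peak_dirs[idx]
        if prev is not None:
            c = float(np.dot(d, prev))
            if c < 0:                               # eigenvectors have no sign
                d, c = -d, -c
            if c < cos_min:
                break                               # turning angle threshold
        pos = pos + step * d
        prev = d
        points.append(pos.copy())
    return np.array(points)

# Synthetic volume: fibers along x, FA drops (e.g., CSF) from x = 6 onwards.
shape = (10, 5, 5)
dirs = np.zeros(shape + (3,))
dirs[..., 0] = 1.0
fa = np.full(shape, 0.6)
fa[6:] = 0.05
streamline = track((0, 2, 2), dirs, fa)

# Same volume with uniform FA but an abrupt 90 degree turn at x = 3.
bent = dirs.copy()
bent[3:] = (0.0, 1.0, 0.0)
stopped = track((0, 2, 2), bent, np.full(shape, 0.6))
```

The first streamline ends where FA drops and the second ends at the abrupt turn, mirroring the two termination criteria described above. A stricter angle or FA threshold shortens streamlines and removes implausible paths, at the cost of missing genuine tract segments.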

Several methods of CST reconstruction are available with no consensus. For instance, DTI metrics can be extracted from the entire tract or from ROIs within these tracts, as absolute values from the ipsilesional/contralesional site or as a ratio between both (56). Recently, a DTI challenge of CST reconstruction with tractography demonstrated a consistent presence of false-negative and false-positive pathways. Most of these reconstructions were limited to the medial portion of the motor strip and few were able to trace lateral projections (such as hand-related). Generally, improved results depend on strategies, such as: method of reconstruction, improved signal; sharp estimations of fiber distribution; priors on spatial smoothness; seeding strategies. Anatomically, there are a variety of possible reconstructions, for instance, defined as the pathways coursing through the cerebral peduncles to the pre- and post-central gyrus (61). Park et al. (56) provide detailed information about how to seed and how to confine fibers. Figure 3A shows an example of a probabilistic and Figure 3B, of a deterministic CST tractography.

One of the weaknesses of tensor-based tractography is the assumption that the diffusion related to fibers within a voxel follows a Gaussian distribution, represented by a single direction. This assumption is violated by the presence of crossing fibers and multiple axonal orientations (estimated as ~90% of WM voxels) (66) (Figure 4A). It was hypothesized that increasing the number of directions in the MRI acquisition (such as at least 28 directions at low b-values, ~1,000 s/mm²) would solve this problem (26, 67). However, it became clear that more advanced models were needed (26, 66).

Beyond DTI-Based Tractography: HARDI Models

High angular resolution diffusion imaging (HARDI) uses a larger number of diffusion gradient directions, often in combination with multiple b-values, to measure the diffusion signal (68). By doing so, a more reliable reconstruction of the underlying diffusion and fiber orientation distribution can be obtained, overcoming pitfalls such as crossing fibers (Figure 4B). For a deeper understanding of the evolution of HARDI models, we refer to Daducci et al. and to Descoteaux et al. (29, 69). HARDI models are superior to DTI for reconstructing the CST (70, 71). However, the higher angular resolution in combination with higher b-values is frequently more time-consuming and noisier.

Another approach to model the fiber orientation distribution is Constrained Spherical Deconvolution (CSD) (Figure 4B), typically relying on a single-shell HARDI acquisition and even “low” b-values in the range of 1,000 s/mm² (72, 73). CSD has moderate acquisition and computation requirements and provides higher accuracy in fiber orientation estimates than DTI (74). It has been demonstrated that CSD-based tractography consistently reconstructs the fan-shaped CST within the sensorimotor cortex, whereas DTI-based tractography does not (75). Excellent inter-rater and test-retest reliability were reported for FA extracted from CSD-based reconstructions of the CST (76).

Whole-Brain Analysis

Whole-brain analysis is an exploratory approach that can be applied to investigate global WM changes or whether such changes are heterogeneous across patients within a study. Analyses can be performed and measures can be extracted using different approaches, such as:

1. Histogram analysis of all voxels in the brain. Histograms that express the frequencies of voxels with a specific value for a DTI metric such as FA can be built. Median, mean, peak height, and peak location of DTI metrics can thus be estimated (59, 77).

2. Brain or WM voxels defined by a mask created from either segmentation of an anatomical MRI or by whole brain tractography. If the former strategy is chosen, DTI values in the voxels can be extracted after registration of anatomical MRI to the non-diffusion weighted image by means of an affine transformation. If whole-brain tractography is performed, then DTI measures can be extracted from voxels that are part of the streamlines.

3. Voxel-based analysis (VBA), the most popular approach, compares DTI metrics in every voxel of the brain (59). This strategy has high reproducibility, is time-efficient and provides excellent spatially localized information, based on atlas coordinates (78). It applies conservative corrections for multiple comparisons across all voxels in the brain, which increases the risk of type II error. Still, it is recommended that corrected results be presented. An alternative is running a cluster-based analysis and correcting at the cluster level instead of voxel-by-voxel. In addition, novel cluster-based approaches are available to avoid the arbitrary choice of a threshold. TFCE (Threshold-Free Cluster Enhancement) (79), embedded in tract-based spatial statistics (TBSS, FSL), offers a more robust approach to find significant clusters. TBSS overcomes issues about alignment and smoothing in voxel-based analysis by focusing registration and statistical testing exclusively on the center of the tracts (80). TBSS reduces type II error, at the expense of ignoring findings in the periphery of the tracts. However, TBSS is known to suffer from several methodological limitations that complicate outcome interpretation [for details, see Bach et al. (81)].
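The histogram analysis in item 1 can be illustrated with a short sketch that derives the mean, median, peak height and peak location from the FA values of all voxels inside a mask. The bin count, value range and synthetic FA distribution are illustrative assumptions, and peak height is expressed here as a normalized frequency.

```python
import numpy as np

def histogram_summary(values, bins=50, value_range=(0.0, 1.0)):
    """Summary measures of the distribution of a DTI metric across voxels."""
    values = np.asarray(values, dtype=float).ravel()
    counts, edges = np.histogram(values, bins=bins, range=value_range)
    freqs = counts / counts.sum()              # normalized frequencies
    centers = 0.5 * (edges[:-1] + edges[1:])   # bin centers
    peak = int(np.argmax(freqs))
    return {
        "mean": float(values.mean()),
        "median": float(np.median(values)),
        "peak_height": float(freqs[peak]),
        "peak_location": float(centers[peak]),
    }

# Synthetic FA values standing in for all WM voxels of one subject.
rng = np.random.default_rng(0)
fa_values = np.clip(rng.normal(0.45, 0.05, size=5000), 0.0, 1.0)
summary = histogram_summary(fa_values)
```

Because peak height depends on bin width and normalization, both should be kept constant across subjects and reported with the results.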

Challenges of DTI in Stroke

Major Challenge: Heterogeneity of Lesions

The main challenge of DTI in stroke is heterogeneity of lesions; for a review, see de Haan and Karnath (50). Lesion location and size vary across subjects, and large lesions often disrupt tracts (80) or promote shifts that impact registration and normalization. In the chronic phase after stroke, loss of brain tissue and secondary dilation of CSF-filled spaces represent an additional challenge for normalization (82). Special care is advised when inferences are based on large lesions (28). Lesions influence eligibility criteria (so that reliable statistical comparisons between subjects can be made) and impact image processing, demanding a variety of techniques to overcome distortions of the typical anatomy.

The mismatch between images from patients and templates in atlases based on brains from healthy subjects affects normalization (50). Two possible solutions to overcome this mismatch are cost function masking and enantiomorphic normalization. The first approach, which involves masking out voxels of the lesions, may be more useful for small and bilateral lesions. The second approach “replaces” the lesion with brain homolog tissue from the contralesional hemisphere, being useful for large and unilateral lesions placed in symmetric regions, for example as performed by Moulton et al. (83).
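The two strategies can be contrasted in a minimal sketch. Cost function masking produces a weight image that is zero inside the lesion, so that lesion voxels do not drive the registration. Enantiomorphic filling copies mirrored voxels from the contralesional hemisphere into the lesion. The sketch assumes the image is already aligned so that the midsagittal plane lies at the center of the left-right axis, which real pipelines must ensure first. The function names and toy volume are hypothetical.

```python
import numpy as np

def cost_function_weights(lesion_mask):
    """Weight image: 1 where voxels may drive registration, 0 inside the lesion."""
    return (~np.asarray(lesion_mask, dtype=bool)).astype(np.uint8)

def enantiomorphic_fill(image, lesion_mask, lr_axis=0):
    """Replace lesion voxels with their mirrored contralesional homologs.

    Assumes the midsagittal plane sits at the center of lr_axis.
    """
    lesion = np.asarray(lesion_mask, dtype=bool)
    mirrored = np.flip(image, axis=lr_axis)
    filled = image.copy()
    filled[lesion] = mirrored[lesion]
    return filled

# Toy volume: homogeneous tissue (100) with a unilateral dark lesion (10).
image = np.full((6, 4, 4), 100.0)
lesion_mask = np.zeros(image.shape, dtype=bool)
lesion_mask[1, 1:3, 1:3] = True
image[lesion_mask] = 10.0

weights = cost_function_weights(lesion_mask)
filled = enantiomorphic_fill(image, lesion_mask)
```

The sketch also shows why the second approach suits unilateral lesions: if the mirrored location is itself damaged, lesioned tissue would simply be copied across.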

Lesion masks can be created by changing the intensity of pixels inside or outside the segmented lesion, hence obtaining binary images (zero-one intensity). Lesion masks can be manually drawn [for details, see Liew et al. (51)], but several efforts are under way to improve machine-learning algorithms that segment lesions automatically and accurately. Recently, the large open-source T1-weighted dataset ATLAS (Anatomical Tracings of Lesions After Stroke) was released (51). Also, PALS (Pipeline for Analyzing Lesions after Stroke) was developed as a specific tool to improve similarity between manually delineated lesions. It consists of image reorientation, lesion correction for WM voxels and load calculation, as well as visual inspection of the automated output (84).

Masking out lesions may require large deformations, particularly in WM regions adjacent to gray matter and cerebrospinal fluid. An interesting approach to deal with this problem is DR-TAMAS (Diffeomorphic Registration for Tensor Accurate alignMent of Anatomical Structures) (85), which optimizes normalization by including information not only from FA maps, but also from anatomical T1 and T2 images. DR-TAMAS allows creation of atlases based on the diffusion tensor or anatomical images provided by the user. Recently, group-wise registrations without masking lesions were reliably performed with fiber orientation distribution (FOD)-based algorithms that exclusively rely on diffusion images (CSD-based acquisitions) (83). This approach increased sensitivity to capture FA changes in the CST.

Challenges for ROI-Based Analysis

The low resolution of DTI images can hinder delineation of the ROI. Registration of the DTI dataset to anatomical T1/T2 images can improve spatial resolution and facilitate ROI drawing. However, misregistration/misalignment can occur, mainly driven by the different distortions in the two types of images and the lower resolution of DTI images resulting in partial volume effects (34, 78). Slight shifts could lead to extraction of metrics from different anatomic regions other than the ROI.

Furthermore, the best choice for ROI placement within the CST remains an open question. According to Koyama et al. (53), outcome prediction is more accurate when fully automated ROIs are placed in the cerebral peduncle. According to Park et al. (56), the extraction of DTI measures from the posterior limb of the internal capsule (PLIC) is reliable. Tang et al. (86) reported that ROIs in the brain stem are more subject to partial volume problems (caused by the proximity to CSF) than ROIs at the PLIC.

Challenges for Tractography

In stroke, tractography may be used to reconstruct a tract of interest based on a prior hypothesis, to obtain qualitative anatomical information (visual evidence of disruption of the tract), extract quantitative measures (volumetric and diffusion metrics) or make inferences about connectivity (87).

To track the CST, a standard template based on healthy subjects can be reliably used to extract metrics from the whole tract or from a section of it, such as within the PLIC (56). In strokes that affect the CST, tractography may not be feasible because of the loss of the normal pathway of axons within the tract, leading to an unreliable morphology of the tracts (64). In turn, the placement of individual ROIs can be problematic because it is operator-dependent (and therefore prone to bias), time-consuming, and limited in feasibility and generalizability. For this reason, the use of a template from healthy volunteers to guide extraction is a possible alternative (56). Limitations regarding anatomical accuracy and quantitative evaluation of tractography in stroke should be considered [for details, please see Jbabdi and Johansen-Berg (88); Thomas et al. (89)].

Challenges for Whole-Brain Analysis

Typical steps of a whole-brain processing pipeline involve co-registration and normalization so that group-wise statistical comparisons can be made. Stroke lesions can be obstacles for automatic whole-brain voxel-wise analyses such as TBSS. Cost function masking and enantiomorphic normalization can be used as alternatives to overcome lesion deformations.

Challenges for Replicability

Results are dependent on the adoption of good practices regarding acquisition parameters and pre- and post-processing. Researchers may tend to use their own tools or manual methods (84), but guidelines to improve repeatability and reproducibility are available, such as those provided by the Quantitative Imaging Biomarkers Alliance (QIBA). Also, it is crucial to use the same package and software version within the same study and while processing longitudinal datasets. Whenever possible, the most updated version should be chosen (64).


dMRI as a Biomarker of Recovery in Stroke

In this section, we review studies that assessed correlations between DTI measures in the CST and motor recovery.

LMM and RL searched MEDLINE (Medical Literature Analysis and Retrieval System Online; through the PubMed interface) and Web of Science, using the following keywords: motor (stroke or infarct or infarction or hemorrhage) and corticospinal tract and diffusion (imaging or tensor imaging). A complementary search was made using the first two keywords combination and tractography or FA. Studies were selected according to the following criteria.

Inclusion criteria: evaluation of patients with IS or HS; publication from January, 2008 until December 5th, 2018; collection of MRI data for DTI metrics in the hyperacute (< 1 day after onset of symptoms) (Table 1), acute (2–7 days) (Table 2), or early subacute (7 days−3 months) (Table 3) phases after stroke, according to definitions of the Stroke Recovery and Rehabilitation Roundtable taskforce (20); original articles; evaluation of at least one DTI metric (FA, AD, RD, or MD) in the CST; prospective assessment of motor outcomes (at least 4 weeks after stroke) with measures of body structure and function (such as the Medical Research Council Scale, NIH Stroke Scale, Motricity Index of Arm and Leg, Fugl-Meyer Motor Assessment, among others) or with measures of activity (such as the Action Research Arm Test, or Wolf Motor Function Test), according to the International Classification of Functioning, Disability and Health (ICF)—WHO 2001— (90); evaluation of correlations between DTI metric(s), and motor outcomes (but not changes in motor outcomes relative to baseline); minimal sample size, 10 patients; post-processing of images performed with whole-brain, ROI (region-of-interest) or tractography strategies. Studies that performed tractography but did not report DTI metrics were excluded. Cross-sectional studies were not included in the review.

The following information was retrieved from the manuscripts (Tables 1–3): type of stroke; lesion site or affected arterial territory; number of subjects; age; gender; MRI field; number of directions/b0; b-value (s/mm2); methods of analysis (technique/software/metrics); whether lesion masks were mentioned; whether ipsilesional and contralesional CST were assessed; when motor evaluation was performed (time from stroke); motor outcome; whether DTI correlated with outcome and correlation coefficients.

MRI scans were performed on 3T scanners in 57.9% of the studies. The number of directions during diffusion acquisitions ranged from 6 to 64 and the number of b0, from 1 to 10. 83.3% of the studies used b values of 1,000 s/mm2.

Only 15.8% of the studies explicitly mentioned lesion masks during pre-processing, and 18 different software packages were used for data analysis. Of the studies, 52.6% measured DTI metrics according to ROI-based methods, 36.8% according to ROIs in tractographies, and 10.5% within the entire CST according to tractography; 10.5% extracted the entire CST as a ROI based on whole-brain processing in TBSS (97). The most commonly chosen ROIs were the cerebral peduncle (61%) and the pons (33%).

Despite great heterogeneity in methods of collecting and analyzing the data, the majority of studies reported statistically significant correlations between DTI biomarkers and motor outcomes: 66.7% in the hyperacute, 83.3% in the acute, and 92.3% in the early subacute phases after stroke. Motor impairments were evaluated from 4 weeks up to 6 months later in the hyperacute/acute studies, and up to 2 years later in the subacute studies. DTI results closer to normal, from the first day up to 3 months after stroke, were correlated with less severe impairments.

FA, rFA, or aFA were measured in 100% of the studies. At least one of these metrics was significantly correlated with motor outcomes in 66.7% of hyperacute or acute, and in 92.3% of early subacute studies. FA values vary across subjects and are influenced not only by the stroke, but also by subclinical white matter lesions that are frequent in patients with vascular disease in the ipsilesional as well as in the contralesional hemisphere (98, 99). However, the changes in FA values in the CST due to chronic white matter lesions are expected to be less severe than those caused by stroke. None of the identified studies reported discrepant results in regard to correlations between clinical outcomes and FA metrics (for instance, correlation of outcomes with rFA but not with FA). Two studies (93, 100) reported absences of correlations between clinical outcomes and FA or rFA. Other studies that described correlations between rFA or aFA and motor outcomes did not mention whether correlations were also present between ipsilesional FA and outcomes (Tables 1–3). Therefore, it is not possible to define whether measures of asymmetry are more strongly correlated with motor outcomes than absolute ipsilesional FA values.
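For readers unfamiliar with the derived metrics, the sketch below shows one common way of computing them. The definitions used here are assumptions for illustration (rFA as the ipsilesional-to-contralesional ratio, aFA as a normalized difference); individual studies may define them differently, and the function names and FA values are hypothetical.

```python
def fa_ratio(fa_ipsilesional, fa_contralesional):
    """rFA: ipsilesional FA divided by contralesional FA (assumed definition)."""
    return fa_ipsilesional / fa_contralesional


def fa_asymmetry(fa_ipsilesional, fa_contralesional):
    """aFA: (contra - ipsi) / (contra + ipsi), an assumed definition.

    0 means symmetric tracts; values approaching 1 mean markedly
    lower FA on the ipsilesional side.
    """
    return (fa_contralesional - fa_ipsilesional) / (
        fa_contralesional + fa_ipsilesional
    )


# Hypothetical mean FA values extracted from a CST ROI on each side.
rfa = fa_ratio(0.40, 0.50)
afa = fa_asymmetry(0.40, 0.50)
```

Both metrics normalize the affected side by the unaffected side, which is one reason they are sensitive to white matter changes in the contralesional hemisphere as well.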

Puig et al., Groisser et al., and Jang et al. did not find significant correlations between FA metrics and motor outcomes at some of the stages (93, 94, 100).

Puig et al. (93) assessed FA and did not find a significant correlation between this measure at < 12 h or at 3 days and impairments at 3 months, in 60 patients after stroke. In this study, there was no significant asymmetry in FA values for the CST (ROI: pons) measured hyperacutely or at 3 days post-stroke, but there was a significant asymmetry 1 month later. FA abnormalities at 1 month correlated with motor performance also assessed at 1 month. Only MCA infarcts were included, and it is possible that measurements extracted from the CST at the pons, away from the infarcts at a time when Wallerian degeneration might not yet have fully ensued, may have contributed to this negative finding (93).

On the other hand, Groisser et al. found that changes in FA measured at a later stage (1–2 months) correlated with hand grip, Motricity Index and nine-hole peg test measured at 6 months, in line with other studies that assessed DTI at the early subacute phase post-stroke (94).

Jang et al. were the only authors who did not report correlations between FA or rFA at the early subacute phase, and motor impairments. Only subjects with pontine infarcts were included, and measures were made at the pons, from 7 to 28 days post-stroke, according to tractography. The authors hypothesized that lack of a significant difference in the directionality of the residual CST at this level may have contributed to this finding (100).

Few of the selected studies measured AD, MD, and RD (83, 111, 112). FA is a highly sensitive, but quite non-specific measure (22, 113). Nevertheless, the results of this narrative review suggest a consistent relation between FA measured in the CST at early stages after stroke, and motor impairments, in line with results of meta-analyses (17–19). However, studies included in this review predominantly assessed motor impairments, rather than activity (disability) according to the ICF. It remains to be clarified if DTI measures within the first hours to 3 months after stroke can predict long-term disability.

A key question is whether DTI results enhance the predictive value of models of motor disability based on clinical information such as age and motor impairments, and neurophysiological testing. For instance, Stinear et al. reviewed data from 207 patients clinically assessed for upper limb impairments (SAFE score: shoulder abduction and wrist extension) and overall neurological impairments (NIH stroke scale) within 3 days post-stroke. The patients underwent transcranial magnetic stimulation to determine the presence of upper limb motor evoked potentials contralateral to the lesion, and MRI at 10–14 days to assess: FA asymmetry (ROI: posterior limb of the internal capsule), lesion load evaluated with tractography in the CST and in sensorimotor tracts. The primary upper limb motor outcome was the Action Research Arm Test, a measure of upper limb activity according to the ICF. Different prediction models were tested and the authors concluded that the PREP2 score, that includes age, SAFE and NIHSS scores as well as transcranial magnetic stimulation results, without any MRI biomarker, made correct predictions for 75% of the patients (114). DTI results were not included in the model because prediction accuracies of decision trees remained equivalent, whether or not these results were included. In order to build robust predictive models testing the magnitude of effect of different variables on upper limb motor outcomes, large samples of subjects are required.

The analysis of large sets of data, such as in the ongoing ENIGMA project, is expected to help close the gap in knowledge about the relevance of DTI biomarkers in research and clinical practice for defining motor prognosis. At the moment, DTI is not routinely performed in clinical practice for motor prognostication in stroke.

This study has some limitations. First, for the purpose of the review, we excluded studies not reporting metrics, such as: myelin quantification, apparent diffusion coefficients, WM volume or qualitative tractography-based information. We also excluded studies not based on the tensor, such as kurtosis or HARDI modeling, as well as microstructural-directed sequences, such as CHARMED/NODDI. All of them may convey complementary, critical information about the underlying WM alterations in the CST in stroke. Second, the choice of keywords may have led to non-inclusion of studies that addressed the aims of this review.

Conclusions and Future Directions

FA in the CST, measured within the first hours to 3 months after stroke, has emerged as a potential DTI biomarker of motor recovery. Further research about its relevance, involving analysis of large sets of data from multiple centers, will benefit from definition of minimal standards and optimal pipelines for data acquisition, analysis, and reporting.

To perform whole-brain voxel-wise and ROI analysis, according to the published studies in the field, it is suggested to: (1) acquire at least 30 non-collinear directions, as more accurate sampling reduces orientational dependence and enhances accuracy and precision of DTI metrics (10); (2) use at least 6 interspersed low b-value images (such as zero), reducing the risk of systematic errors due to subject motion (10); (3) use an optimal b-value (around 1,000 s/mm2), depending on the other physical parameters (28, 31, 33); (4) report parameters of acquisition employed for correction of EPI distortions (31, 115, 116); (5) whenever possible, opt for a HARDI protocol if the goal is to perform tractography. The suggested steps of pre- and post-processing discussed in this review should take into consideration the limitations of the acquisition. Clear information about acquisition parameters and methodological choices of processing strategies should be provided; if journal word limits prevent this, the information can be supplied as online supplemental material.
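The acquisition suggestions above lend themselves to a simple automated check when pooling data from several centers. The sketch below is illustrative only: the function name, its parameters and the tolerance around 1,000 s/mm2 are assumptions, not part of any published guideline.

```python
def check_dti_protocol(n_directions, n_b0, b_value, hardi=False,
                       tractography=False, b_tolerance=200):
    """Return a list of warnings for a DTI protocol.

    Thresholds mirror the suggestions in the text; b_tolerance (in s/mm2)
    is an arbitrary choice made for this sketch.
    """
    warnings = []
    if n_directions < 30:
        warnings.append("fewer than 30 non-collinear directions")
    if n_b0 < 6:
        warnings.append("fewer than 6 low b-value images")
    if abs(b_value - 1000) > b_tolerance:
        warnings.append("b-value far from 1,000 s/mm2")
    if tractography and not hardi:
        warnings.append("tractography planned without a HARDI protocol")
    return warnings


# Hypothetical protocols from two centers.
good_protocol = check_dti_protocol(n_directions=64, n_b0=6, b_value=1000)
weak_protocol = check_dti_protocol(n_directions=6, n_b0=1, b_value=1000,
                                   tractography=True)
```

Returning warnings rather than a pass/fail flag keeps the check descriptive, which suits reporting requirements better than hard exclusion.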

The decrease in methodological heterogeneity and enhancement of reproducibility will advance the field by setting the stage for large studies with good-quality data in order to define the clinical relevance of DTI in prediction of motor disability from stroke.

Finally, in the reviewed studies, the goal was not to test comprehensive predictive models that included DTI results. In order to determine whether DTI will have a role in the prediction of motor recovery after stroke, it is necessary to test different models in large sets of data. DTI may reach a place in clinical practice if the accuracy of a model is enhanced by this imaging tool, compared to models that only include variables that can be quickly and easily obtained, such as bedside clinical evaluation.

Author Contributions

AC and LM contributed to the conception and design of the study. LM wrote the first draft of the manuscript. All authors contributed to manuscript revision, read, and approved the submitted version.


AC, RL, and LM received scholarship from Grant R01NS076348-01 (National Institutes of Health). Funds for publication fees were paid by this grant. JPP received a scholarship from CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.



Machine Learning in Action: Stroke Diagnosis and Outcome Prediction


The term machine learning (ML) was coined by Arthur Samuel in 1959 (1). He investigated two machine learning procedures using the game of checkers and concluded that computers can be programmed quickly to play a better game of checkers than the person who wrote the program. Simply put, machine learning can be defined as a subfield of artificial intelligence (AI) that uses computerized algorithms to automatically improve performance through iterative learning process or experience (i.e., data acquisition) (2). Of late, the field of ML has vastly evolved with the development of various computerized algorithms for pattern recognition and data assimilation to improve predictions, decisions, perceptions, and actions across various fields and serves as an extension to the traditional statistical approaches. In our day-to-day life, a relatable example of ML is the application of spam filters to the 319 billion emails sent and received daily worldwide, of which, nearly 50% can be classified as spam (3). Use of ML technology has made this process efficient and manageable. The ML technology utilizes various methods for automated data analysis including linear and logistic regression models as well as other methods such as the support vector machines (SVM), random forests (RF), classification trees and discriminant analysis that allow combination of features (data points) in a non-linear manner with flexible decision boundaries. The advent of neural networks and deep learning (DL) technology has transformed the field of ML with automatic and efficient feature identification and processing within a covert analytic network, without the need for a priori feature selection. Notably, performance of DL is known to improve with access to larger datasets, whereas classic ML methods tend to plateau at relatively lower performance levels. 
Hence, in this era of big data where clinicians are constantly inundated with a plethora of clinical information, use of DL technology has significantly enhanced our ability to assimilate the vast amount of clinical data to make expeditious clinical decisions.

Stroke is a leading cause of death, disability, and cognitive impairment in the United States (4). According to the 2013 policy statement from the American Heart Association, an estimated 4% of US adults will suffer from a stroke by 2030, accounting for total annual stroke-related medical cost of $240.67 billion by 2030 (5). For ischemic stroke, acute management is highly dependent on prompt diagnosis. According to the current ischemic stroke guidelines, patients are eligible for intravenous thrombolysis up to 4.5 h from symptom onset and endovascular thrombectomy without advanced imaging within 6 h of symptom onset (6–8). For patients presenting between 6 and 24 h of symptom onset (or last known well time), advanced imaging is recommended to assess salvageable penumbra for decisions regarding endovascular therapy (9–11). Similarly for hemorrhagic stroke, timely diagnosis utilizing imaging technology to evaluate the type and etiology of hemorrhage is important in guiding acute treatment decisions. Prompt diagnosis with emergent treatment decision and accurate prognostication is hence the cornerstone of acute stroke management. Over the recent years, a multitude of ML methodologies have been applied to stroke for various purposes, including diagnosis of stroke (12, 13), prediction of stroke symptom onset (14, 15), assessment of stroke severity (16, 17), characterization of clot composition (18), analysis of cerebral edema (19), prediction of hematoma expansion (20), and outcome prediction (21–23). In particular, there has been a rapid increase in the trend of ML application for stroke diagnosis and outcome prediction. The Ischemic Stroke Lesion Segmentation Challenge (ISLES) provides a global competitive platform encouraging teams across the world to develop advanced tools for stroke lesion analysis using ML. In this platform, competitors train their algorithms on a standardized dataset and eventually generate benchmarks for algorithm performance.


Deciding which type of ML to use on a specific dataset depends on factors such as the size of dataset, need for supervision, ability to learn, and the generalizability of the model (24). DL technology such as the deep neural networks has significantly improved the ability for image segmentation, automated featurization (e.g., conversion of raw signal into clinically useful parameter), and multimodal prognostication in stroke; and it is increasingly utilized in stroke-based applications (25–27). For example, DL algorithms can be applied to extract meaningful imaging features for image processing in an increasing order of hierarchical complexity to make predictions, such as the final infarct volume (27). Some commonly used ML types with their respective algorithms and practical examples are outlined in Figures 1–3. In the healthcare setting, supervised and unsupervised algorithms are both commonly used. In this review, we will specifically focus on ML strategies for stroke diagnosis and outcome prediction. Table 1 provides an overview of pertinent studies with use of ML in stroke diagnosis (Section A) and outcome prediction (Section B). A glossary of machine learning terms with brief description is separately provided in Supplementary Table 1.


We searched PubMed, Google Scholar, Web of Science, and IEEE Xplore® for relevant articles using various combinations of the following key words: “machine learning,” “artificial intelligence,” “stroke,” “ischemic stroke,” “hemorrhagic stroke,” “diagnosis,” “prognosis,” “outcome,” “big data,” and “outcome prediction.” Resulting abstracts were screened by all authors and articles were hand-picked for full review based on relevance and scientific integrity. The final article list was reviewed and approved by all authors.

Machine Learning in Stroke Diagnosis

The time-sensitive nature of stroke care underpins the need for accurate and rapid tools to assist in stroke diagnosis. Over the recent years, the science of brain imaging has vastly advanced with the availability of a myriad of AI based diagnostic imaging algorithms (77). Machine learning is particularly useful in diagnosis of acute stroke with large vessel occlusion (LVO). Various automated methods for detection of stroke core and penumbra size as well as mismatch quantification and detection of vascular thrombi have recently been developed (77). Over the past decade, 13 different companies have developed automated and semi-automated commercially available software for acute stroke diagnostics (Aidoc®, Apollo Medical Imaging Technology®, Brainomix®, inferVISION®, RAPID®, JLK Inspection®, Max-Q AI®, Nico.lab®, Olea Medical®,®,®, and Zebra Medical Vision®) (78). The RapidAI® and® technology have been approved under the medical device category of computer-assisted triage by the United States Food and Drug Administration (FDA). The RAPID MRI® (Rapid processing of Perfusion and Diffusion) software allows for an unsupervised, fully-automated processing of perfusion and diffusion data to identify those who may benefit from thrombectomy based on the mismatch ratio (79). Such commercial platforms available for automatic detection of ischemic stroke and LVO have facilitated rapid treatment decisions. When compared to manual segmentation of lesion volume and mismatch identification from patients enrolled in DEFUSE 2, the RAPID results were found to be well-correlated (r2 = 0.99 and 0.96 for diffusion and perfusion weighted imaging, respectively) with 100% sensitivity and 91% specificity for mismatch identification (80). Since 2008, the RapidAI® platform has expanded to include other products (Rapid® ICH, ASPECTS, CTA, LVO, CTP, MRI, Angio, and Aneurysm) that assist across the entire spectrum of stroke.
Viz LVO® was the first FDA-cleared software to detect LVO and alert clinicians via the “Viz Platform” (81). In a recent single-center study with 1,167 CTAs analyzed, Viz LVO® was found to have a sensitivity of 0.81 and a negative predictive value of 0.99 with an accuracy of 0.94 (82).
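The performance figures quoted for these tools (sensitivity, specificity, negative predictive value, accuracy) all derive from the same four confusion-matrix counts. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from any of the cited studies.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic accuracy measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # detected cases among all true cases
        "specificity": tn / (tn + fp),   # correct negatives among all non-cases
        "ppv": tp / (tp + fp),           # true cases among positive calls
        "npv": tn / (tn + fn),           # non-cases among negative calls
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for an LVO detection algorithm on 1,000 scans.
metrics = diagnostic_metrics(tp=81, fp=50, tn=850, fn=19)
```

With a low prevalence of the target condition, the negative predictive value can be very high even when sensitivity is moderate, which is worth keeping in mind when comparing the figures reported for different tools.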

Other areas of stroke diagnostics that have seen an increase in attention over the past decade are the identification of intracerebral hemorrhage (ICH) and of patients at risk for delayed cerebral ischemia in the setting of aneurysmal subarachnoid hemorrhage (aSAH). While most studies tend to have good accuracy in detecting an ICH, there is more variability in subclassification and measurements of hematoma volume. A summary of recent publications on ML in stroke diagnosis is presented in Table 1 (Section A).

Machine Learning in Stroke Outcome Prediction

Despite recent advances in stroke care, stroke remains the second leading cause of death and disability world-wide (4, 83). Although acute stroke diagnosis and determination of the time of stroke onset are the initial steps of comprehensive stroke management, clinicians are also often charged with the task of determining stroke outcomes. These outcomes range from discrete radiological outcomes (e.g., final infarct volume, the likelihood of hemorrhagic transformation, etc.) to the likelihood of morbidity (e.g., stroke-associated pneumonia) and mortality, and to various measures of functional independence (e.g., mRS score, Barthel Index score, cognitive, and language function, etc.).

Prognostication after an acute brain injury is notoriously challenging, particularly within the first 24–48 h (84). However, a clinician may be called upon to provide estimates of a patient's short-term and long-term mortality and degree of functional dependence to assist with decision-making regarding the intensity of care (e.g., use of thrombolytics or endovascular treatment, intubation, code status, etc.) (60, 64, 66, 67, 69, 70, 72–76). Like all medical emergencies, it is incumbent upon the stroke clinician to ensure that all care provided is concordant with an individual patient's goals (85). For example, a surrogate decision-maker may decline to reverse a patient's longstanding “do not intubate” order to facilitate mechanical thrombectomy if the clinician predicts the patient has a high likelihood of functional dependence or short-term mortality. Hence, accuracy in outcome prediction is critical in guiding management of our patients.

Determining a patient's likelihood of developing symptomatic intracranial hemorrhage (sICH) is of obvious, immediate value in acute stroke management in determining candidacy for thrombolytic therapy or endovascular treatment. Historically, clinician-based prognostication tools such as the SEDAN (Sugar, Early Infarct signs, Dense cerebral artery sign, Age, and NIHSS) and HAT (Hemorrhage After Thrombolysis) scores have been used to predict the risk of sICH after IV thrombolysis (23). Advances in ML and DL have allowed for the development of more accurate models which outperform the traditional SEDAN and HAT scores (23, 54, 55). Similarly, the ability to predict final infarct volume and the likelihood of the development of malignant cerebral edema have important treatment implications and remain a significant focus of ML in stroke (26, 51–53).

In patients with intracerebral hemorrhage (ICH), the ICH-score is one of the most widely used clinical prediction scores (85–88). Although ML technology for outcome prediction has rapidly advanced for ischemic stroke, recent ML studies predicting functional outcomes after ICH have also demonstrated high-discriminating power (63, 89). A recent study by Sennfält et al. tracked long-term functional dependence and mortality after an acute ischemic stroke of more than 20,000 Swedish patients (90). The 30-day mortality rate was 11.1%. At 5 years, 70.6% of ischemic stroke patients were functionally dependent (defined as mRS score of ≥3) or had died (5-year mortality rate of 50.6%). These sobering outcomes partially account for the development of many stroke prognostic models over the years, which frequently serve as benchmarks in stroke research. Recently, Matsumoto et al. compared the performance of six existing stroke prognostic models for predicting poor functional outcomes and in-hospital mortality with linear regression or decision tree ensemble models (59). The novel prediction models performed slightly better than the conventional models in predicting poor functional outcomes (AUC 0.88–0.94 vs. AUC 0.70–0.92) but were equivalent or marginally worse in predicting in-hospital death (AUC 0.84–0.88 vs. AUC 0.87–0.88). Many such stroke prediction models have emerged over the recent years. An overview of ML based automated algorithms for stroke outcome prediction is provided in Table 1 (Section B).
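The AUC values used to compare these models can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The sketch below computes it directly from that definition; the function name, labels and scores are hypothetical, and for large datasets a library implementation would be preferable to this quadratic loop.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = poor functional outcome, score = predicted risk.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
model_auc = auc(labels, scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so differences such as 0.88 vs. 0.87 between models are small in practical terms.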


In recent years, some DL algorithms have approached human levels of performance in object recognition (91). One of the greatest strengths of ML is its ability to endlessly process data and tirelessly perform an iterative task. Further, creation of an ML model can be performed much faster (i.e., in a matter of 5–6 days compared with 5–6 months or even years) than traditional computer-aided detection and diagnosis (CAD) (92), which makes ML an attractive field for computer experts and scientists. Several ML tools are currently in use, including the FDA-approved ML algorithms previously discussed for rapid stroke diagnosis, which have significantly enhanced the workflow for acute ischemic stroke patients.

Despite the prolific advent of new and improved ML algorithms with increasing clinical applications, it is important to recognize that computer-based algorithms are only as good as the data used to train the models. For a reliable algorithm, it is important to develop well-defined training, validation, and testing sets. Testing should be done on a diverse set of data points reflective of a real-world scenario. Overfitting can be an issue in ML algorithms when the model is trained on a group of highly-selected, specific features and then fails to perform adequately when tested on a larger dataset with varied features. Similarly, underfitting can occur when a model is oversimplified with generalized feature selection in the training set and then becomes unable to capture the relevant features within a complex pattern of a larger or more diverse testing set. The aphorism “garbage in, garbage out” remains true as the use of inadequate or unvalidated data points (e.g., unverified clinical reports from the electronic health record) in the training set can lead to poor performance of the ML algorithm in the testing set. Hence, it is important to note that the algorithmic decision-making tools do not guarantee accurate and unbiased interpretation compared to established logistic regression models (56, 59, 93). Comparisons to well-established models should be standard when developing new ML algorithms given the high cost associated with ML (e.g., the time required to collect data, train the model, perform internal and external validations, cost of reliable and secure data storage, etc.) (94). Specifically, as it relates to diagnostics, there are a myriad of considerations that must be taken into account. Not only should the algorithm provide accurate information quickly, but it should have the ability to integrate into the electronic health record (EHR) to improve end user experience and efficiency in workflow.
Programs such as RAPID®,®, and Brainomix® have started to successfully integrate into the EHR, which has helped expedite the acute stroke diagnosis and triage process. One of the major technical challenges of ML is the ability to develop an algorithm with a “reasonable” detection rate of pathology without an excessive rate of false-positives. For example, there are notable discrepancies among various ML studies for ICH diagnosis, with varying accuracy depending on the type of ICH (e.g., spontaneous ICH, SDH, aSAH, or IVH). Overfitting and underfitting of the model could lead to poor applicability and therefore, image preprocessing with meticulous feature selection is necessary. Furthermore, the “black-box” nature of ML precludes clinicians from identifying and addressing biases within the algorithms (95, 96). Hence, proper external validation is necessary to ensure generalizability of the algorithm in diverse clinical scenarios.
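The requirement for well-defined training, validation, and testing sets can be illustrated with a small, reproducible split. This is a sketch under stated assumptions: the 70/15/15 proportions, the fixed seed and the function name are arbitrary choices for illustration, and in clinical data the split should be made per patient so that no individual appears in more than one set.

```python
import random


def split_dataset(records, train_fraction=0.70, validation_fraction=0.15, seed=0):
    """Shuffle records reproducibly and split into train/validation/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_fraction)
    n_validation = round(len(shuffled) * validation_fraction)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # remainder is held out for testing
    return train, validation, test


# Hypothetical patient identifiers.
patient_ids = list(range(100))
train_set, validation_set, test_set = split_dataset(patient_ids)
```

A large gap between performance on the training set and on the held-out test set is the usual practical sign of overfitting, while poor performance on both suggests underfitting.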

For stroke prediction, most existing ML algorithms use dichotomized outcomes. By convention, functional outcome is frequently defined as “good” when the mRS score is 0–2 and “poor” when it is 3–6, and IS studies often measure the mRS score at 90 days after stroke (64–69, 97). However, the medical community is increasingly embracing patient-centered outcomes, and there is growing recognition of the need for longitudinal patient follow-up given the potential for functional improvement beyond the conventional 90-day window (98). Once patient-centered outcomes are clinically validated (e.g., an mRS cutoff of 0–2 vs. 3–6, 0–3 vs. 4–6, or 0–4 vs. 5–6), new ML algorithms incorporating such outcomes would be increasingly helpful to clinicians. High-yield ML programs that use patient-centered outcomes could ease the commonplace but challenging discussions of anticipated quality of life and the risk of long-term dependency or death before a patient's goals of care are decided. Caution is nonetheless warranted when using ML algorithms for outcome prediction: patient demographics and clinical practice continue to evolve, and the algorithms would need to be updated to remain applicable to changing patient populations and clinical standards. Additionally, developers often retrieve data from existing datasets (e.g., clinical trial data) that carry inherent biases, including selection bias, observer bias, and other confounders (e.g., withdrawal of life-sustaining therapy may be more common in older patients with large hemispheric stroke than in younger patients, which could confound outcome prediction in older patients).
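
To make the effect of the dichotomization cutoff concrete, the sketch below labels mRS scores as “good” or “poor” under the three cutoffs mentioned above. The function names `dichotomize_mrs` and `good_outcome_rate` and the eight-patient cohort are hypothetical and chosen only for illustration; they do not come from any cited study.

```python
def dichotomize_mrs(score, good_max=2):
    """Label an mRS score (integer 0-6) as 'good' or 'poor'.

    good_max is the highest score still counted as a good outcome:
    2 gives the conventional 0-2 vs. 3-6 split, 3 gives 0-3 vs. 4-6,
    and 4 gives 0-4 vs. 5-6.
    """
    if not isinstance(score, int) or not 0 <= score <= 6:
        raise ValueError("mRS score must be an integer from 0 to 6")
    return "good" if score <= good_max else "poor"


def good_outcome_rate(scores, good_max=2):
    """Fraction of a cohort labelled 'good' under the chosen cutoff."""
    labels = [dichotomize_mrs(s, good_max) for s in scores]
    return labels.count("good") / len(labels)


# Hypothetical 90-day mRS scores for eight patients (made-up data).
cohort = [0, 1, 2, 3, 3, 4, 5, 6]
for cutoff in (2, 3, 4):
    print(f"0-{cutoff} vs. {cutoff + 1}-6:", good_outcome_rate(cohort, cutoff))
```

The same cohort yields a different proportion of “good” outcomes under each cutoff, so a model trained against one definition cannot be assumed to predict another; the target definition has to be fixed, and validated, before training.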

Overall, compared to other diseases such as Alzheimer's disease, there is a relative paucity of large, high-quality datasets in stroke. Barriers that have stymied the development of large, open-access stroke registries include the need for data-sharing agreements, patient privacy concerns, the high costs of data storage and security, and the arbitration of quality control for input data (95). Cohesive and collaborative efforts in data acquisition and harmonization across hospital systems, regions, and nations are needed to improve future ML-based programs in stroke. With the adoption of EHR systems, healthcare data are accumulating rapidly, with an estimated 35 zettabytes or more of existing healthcare data (99). AI and ML algorithms allow us to process this wealth of information efficiently. Nonetheless, as we continue to adapt to the evolving landscape of medical practice surrounding big data, clinicians need to remain aware of the limitations of this modern-day “black box” magic.


Emerging ML technology has rapidly integrated into multiple fields of medicine, including stroke. Deep learning has significantly enhanced the practical applications of ML, and some newer algorithms are reported to have accuracy comparable to that of humans. However, the diagnosis and prognosis of any disease, including stroke, are highly intricate and depend on various clinical and personal factors. The development of optimal ML programs requires comprehensive data collection and assimilation to improve diagnostic and prognostic accuracy. Given the “black box” or cryptic nature of these algorithms, it is extremely important for end users (i.e., clinicians) to understand the intended use and limitations of any ML algorithm to avoid inaccurate data interpretation. Although ML algorithms have improved stroke systems of care, blind dependence on such computerized technology may lead to misdiagnosis or inaccurate prediction of prognostic trajectories. At present, ML tools are best used as “aids” for clinical decision making that still require clinician oversight to address relevant clinical aspects overlooked by the algorithm.

Author Contributions

SM: substantial contributions including conception and design of the work, literature review, interpretation and summarization of the data, drafting of the complete manuscript, critical revision for important intellectual content, and final approval of the manuscript to be published. MD and KS: contributions including conception and design of the work, literature review, interpretation and summarization of the data, drafting of critical portions of the manuscript, critical revision for important intellectual content, and final approval of the manuscript. All authors contributed to the article and approved the submitted version.


This article was supported by the Virginia Commonwealth University, Department of Neurology.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher's Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary Material

The Supplementary Material for this article can be found online at: