An automated machine learning approach to language changes in Alzheimer’s disease and frontotemporal dementia across Latino and English-speaking populations

Alzheimer’s disease (AD) and frontotemporal dementia (FTD) are major causes of disability and death in the US and in its main immigration source, Latin America. Domestically and globally, dementia prevalence is disproportionately high among Latinos, the US’s largest and fastest-growing minority. Worryingly, due to financial inequities, Latinos face barriers to accessing standard diagnostic and monitoring procedures. The scenario is further complicated by their socio-biological and phenotypic diversity, which affects dementia presentation, and by the limited availability of culturally valid tests. Thus, inexpensive, objective, scalable, culturally sensitive tools are needed to assist differential diagnosis and to capture neurocognitive alterations in this vulnerable population. Automated speech analyses are a low-cost innovation in mental health research. All participants need to do is speak (e.g., describing pictures), thereby producing multiple acoustic (sound wave) and linguistic (e.g., semantic) data that can be digitally extracted. In emerging machine-learning studies, speech markers identify AD and FTD participants with up to 97% accuracy and correlate with cognitive and neural disruptions. While most reports target English speakers, those on Spanish-speaking Latinos have prioritized other diseases, and none has compared its results with standard linguistic, cognitive or imaging measures. Also, findings come from small cohorts with poor control of socio-biological factors (e.g., sex, race, education) and no concern for validity across languages, dialects or linguistic profiles. These are key aspects when studying diverse, underrepresented groups, requiring vast international recruitment and strict standardization/normalization methods. Our long-term goal is to forge an automated speech analysis framework to identify Spanish-speaking Latinos with AD and FTD and predict their neurocognitive disruptions in heterogeneous groups.
To meet requisites for robust machine learning while ensuring linguistic and socio-biological diversity, we will harness a unique cohort of 3300 participants, comprising 825 with typical AD, 825 with FTD [including behavioral variant FTD (bvFTD) and non-fluent, semantic or logopenic variants of primary progressive aphasia (nfvPPA, svPPA, lvPPA)], and 1650 controls. The Consortium to Expand Dementia Research in Latin America (ReDLat), a multi-partner-funded network for multimodal dementia research, is already characterizing 3000 Spanish speakers from five countries and 500 English speakers from UCSF. UCSF also contributes 100 US-based Spanish speakers for testing an immigrant speech community. The Global Brain Health Institute (GBHI), a dementia training hub at UCSF, hosts expert clinicians from all sites. This is an unmatched opportunity for massive, culturally valid speech testing with huge cost savings. Our design enables robust generalizations across countries via cutting-edge machine learning tools, overcoming cultural and financial challenges to large, heterogeneous recruitment of US-based Latinos.

PI: Agustín Ibáñez

Support: National Institutes of Health (NIH), USA.