Skip to content

Latest commit

 

History

History
604 lines (589 loc) · 128 KB

README.md

File metadata and controls

604 lines (589 loc) · 128 KB

Updated on 2025-02-21

Brain

Publish Date Title Authors URL Abstract
2025-02-19 Freezing of Gait as a Complication of Pallidal Deep Brain Stimulation in DYT- KMT2B Patients with Evidence of Striatonigral Degeneration Laura Cif, Diane Demailly, Xavier Vasques, Delphine de Verbizier, Philippe Coubes, Kathleen Gorman, Manju A Kurian Link Background: Mutations in KMT2B are a recognized cause of early-onset complex dystonia, with deep brain stimulation (DBS) of the internal globus pallidus (GPi-DBS) being an effective treatment. However, gait impairment, particularly freezing of gait (FOG), remains a significant challenge in DYT-KMT2B patients post-DBS. Objectives: To characterize the emergence of FOG in DYT-KMT2B patients treated with GPi-DBS and explore potential underlying mechanisms, including striatonigral degeneration. Methods: Five patients (four females) with KMT2B-related dystonia and protein-truncating variants (PTVs) were retrospectively analyzed. Clinical progression, response to GPi-DBS, and the presence of FOG were documented. Dopaminergic function was assessed using DaTscan (SPECT for ^123I-ioflupane) in four patients. Results: FOG developed in all patients, with onset ranging from 1 to 15.5 years post-DBS. DaTscan abnormalities, indicative of bilateral striatal dopaminergic denervation, were observed in four cases. Prior to DBS, all patients exhibited dystonia unresponsive to L-dopa, and post-DBS, FOG remained refractory to dopaminergic treatment in most cases. Despite initial improvements in gait post-DBS, only one patient maintained independent ambulation at the last follow-up. Conclusions: FOG is an emerging complication in DYT-KMT2B patients with PTVs undergoing GPi-DBS, potentially linked to underlying striatonigral degeneration. The findings suggest a need for long-term motor surveillance and consideration of alternative therapeutic strategies, including dopaminergic trials, in this patient population. Further studies are required to elucidate the precise mechanisms driving DBS-related hypokinetic gait disturbances in DYT-KMT2B dystonia.
2025-02-19 Emergence of the Primacy Effect in Structured State-Space Models Takashi Morita Link Human and animal memory for sequentially presented items is well-documented to be more accurate for those at the beginning and end of a sequence, phenomena known as the primacy and recency effects, respectively. By contrast, artificial neural network (ANN) models are typically designed with a memory that decays monotonically over time. Accordingly, ANNs are expected to show the recency effect but not the primacy effect. Contrary to this theoretical expectation, however, the present study reveals a counterintuitive finding: a recently developed ANN architecture, called structured state-space models, exhibits the primacy effect when trained and evaluated on a synthetic task that mirrors psychological memory experiments. Given that this model was originally designed for recovering neuronal activity patterns observed in biological brains, this result provides a novel perspective on the psychological primacy effect while also posing a non-trivial puzzle for the current theories in machine learning.
2025-02-19 MoM: Linear Sequence Modeling with Mixture-of-Memories Jusen Du, Weigao Sun, Disen Lan, Jiaxi Hu, Yu Cheng Link Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, offer significant efficiency improvements by reducing the complexity of training and inference. However, these methods typically compress the entire input sequence into a single fixed-size memory state, which leads to suboptimal performance on recall-intensive downstream tasks. Drawing inspiration from neuroscience, particularly the brain's ability to maintain robust long-term memory while mitigating "memory interference", we introduce a novel architecture called Mixture-of-Memories (MoM). MoM utilizes multiple independent memory states, with a router network directing input tokens to specific memory states. This approach greatly enhances the overall memory capacity while minimizing memory interference. As a result, MoM performs exceptionally well on recall-intensive tasks, surpassing existing linear sequence modeling techniques. Despite incorporating multiple memory states, the computation of each memory state remains linear in complexity, allowing MoM to retain the linear-complexity advantage during training, while constant-complexity during inference. Our experimental results show that MoM significantly outperforms current linear sequence models on downstream language tasks, particularly recall-intensive tasks, and even achieves performance comparable to Transformer models. The code is released at https://github.com/OpenSparseLLMs/MoM and is also released as a part of https://github.com/OpenSparseLLMs/Linear-MoE.
2025-02-19 Long-term follow-up of DYT1 dystonia patients treated by deep brain stimulation: an open-label study Laura Cif, Xavier Vasques, Victoria Gonzalez, Patrice Ravel, Brigitte Biolsi, Gwenaelle Collod-Beroud, Sylvie Tuffery-Giraud, Hassan Elfertit, Mireille Claustres, Philippe Coubes Link Long-term efficacy of internal globus pallidus (GPi) deep-brain stimulation (DBS) in DYT1 dystonia and disease progression under DBS was studied. Twenty-six patients of this open-label study were divided into two groups: (A) with single bilateral GPi lead, (B) with a second bilateral GPi lead implanted owning to subsequent worsening of symptomatology. Dystonia was assessed with the Burke Scale. Appearance of new symptoms and distribution according to body region were recorded. In the whole cohort, significant decreases in motor and disability subscores (P < 0.0001) were observed at 1 year and maintained up to 10 years. Group B showed worsening of the symptoms. At 1 year, there were no significant differences between Groups A (without subsequent worsening) and B; at 5 years, a significant difference was found for motor and disability scores. Within Group B, four patients exhibited additional improvement after the second DBS surgery. In the 26 patients, significant difference (P = 0.001) was found between the number of body regions affected by dystonia preoperatively and over the whole follow-up. DBS efficacy in DYT1 dystonia can be maintained up to 10 years (two patients). New symptoms appear with long-term follow-up and may improve with additional leads in a subgroup of patients.
2025-02-19 LaVCa: LLM-assisted Visual Cortex Captioning Takuya Matsuyama, Shinji Nishimoto, Yu Takagi Link Understanding the property of neural populations (or voxels) in the human brain can advance our comprehension of human perceptual and cognitive processing capabilities and contribute to developing brain-inspired computer models. Recent encoding models using deep neural networks (DNNs) have successfully predicted voxel-wise activity. However, interpreting the properties that explain voxel responses remains challenging because of the black-box nature of DNNs. As a solution, we propose LLM-assisted Visual Cortex Captioning (LaVCa), a data-driven approach that uses large language models (LLMs) to generate natural-language captions for images to which voxels are selective. By applying LaVCa for image-evoked brain activity, we demonstrate that LaVCa generates captions that describe voxel selectivity more accurately than the previously proposed method. Furthermore, the captions generated by LaVCa quantitatively capture more detailed properties than the existing method at both the inter-voxel and intra-voxel levels. Furthermore, a more detailed analysis of the voxel-specific properties generated by LaVCa reveals fine-grained functional differentiation within regions of interest (ROIs) in the visual cortex and voxels that simultaneously represent multiple distinct concepts. These findings offer profound insights into human visual representations by assigning detailed captions throughout the visual cortex while highlighting the potential of LLM-based methods in understanding brain representations. Please check out our webpage at https://sites.google.com/view/lavca-llm/
2025-02-19 Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency Jiangrong Shen, Qi Xu, Gang Pan, Badong Chen Link The human brain utilizes spikes for information transmission and dynamically reorganizes its network structure to boost energy efficiency and cognitive capabilities throughout its lifespan. Drawing inspiration from this spike-based computation, Spiking Neural Networks (SNNs) have been developed to construct event-driven models that emulate this efficiency. Despite these advances, deep SNNs continue to suffer from over-parameterization during training and inference, a stark contrast to the brain's ability to self-organize. Furthermore, existing sparse SNNs are challenged by maintaining optimal pruning levels due to a static pruning ratio, resulting in either under- or over-pruning. In this paper, we propose a novel two-stage dynamic structure learning approach for deep SNNs, aimed at maintaining effective sparse training from scratch while optimizing compression efficiency. The first stage evaluates the compressibility of existing sparse subnetworks within SNNs using the PQ index, which facilitates an adaptive determination of the rewiring ratio for synaptic connections based on data compression insights. In the second stage, this rewiring ratio critically informs the dynamic synaptic connection rewiring process, including both pruning and regrowth. This approach significantly improves the exploration of sparse structure training in deep SNNs, adapting sparsity dynamically from the point view of compression efficiency. Our experiments demonstrate that this sparse training approach not only aligns with the performance of current deep SNNs models but also significantly improves the efficiency of compressing sparse SNNs. Crucially, it preserves the advantages of initiating training with sparse models and offers a promising solution for implementing edge AI on neuromorphic hardware.
2025-02-19 Dynamic directed functional connectivity as a neural biomarker for objective motor skill assessment Anil Kamat, Rahul Rahul, Anirban Dutta, Lora Cavuoto, Uwe Kruger, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De Link Objective motor skill assessment plays a critical role in fields such as surgery, where proficiency is vital for certification and patient safety. Existing assessment methods, however, rely heavily on subjective human judgment, which introduces bias and limits reproducibility. While recent efforts have leveraged kinematic data and neural imaging to provide more objective evaluations, these approaches often overlook the dynamic neural mechanisms that differentiate expert and novice performance. This study proposes a novel method for motor skill assessment based on dynamic directed functional connectivity (dFC) as a neural biomarker. By using electroencephalography (EEG) to capture brain dynamics and employing an attention-based Long Short-Term Memory (LSTM) model for non-linear Granger causality analysis, we compute dFC among key brain regions involved in psychomotor tasks. Coupled with hierarchical task analysis (HTA), our approach enables subtask-level evaluation of motor skills, offering detailed insights into neural coordination that underpins expert proficiency. A convolutional neural network (CNN) is then used to classify skill levels, achieving greater accuracy and specificity than established performance metrics in laparoscopic surgery. This methodology provides a reliable, objective framework for assessing motor skills, contributing to the development of tailored training protocols and enhancing the certification process.
2025-02-18 Identifying rapid changes in the hemodynamic response in event-related functional magnetic resonance imaging Friederike Preusse, Thorsten Dickhaus, André Brechmann Link The hemodynamic response (HR) in event-related functional magnetic resonance imaging is typically assumed to be stationary. While there are some approaches in the literature to model nonstationary HRs, few focus on rapid changes. In this work, we propose two procedures to investigate rapid changes in the HR. Both procedures make inference on the existence of rapid changes for multi-subject data. We allow the change point locations to vary between subjects, conditions and brain regions. The first procedure utilizes available information about the change point locations to compare multiple shape parameters of the HR over time. In the second procedure, the change point locations are determined for each subject separately. To account for the estimation of the change point locations, we propose the notion of post selection variance. The power of the proposed procedures is assessed in simulation studies. We apply the procedure for pre-specified change point locations to data from a category learning experiment.
2025-02-18 Neuro-oscillatory models of cortical speech processing Olesia Dogonasheva, Denis Zakharov, Anne-Lise Giraud, Boris Gutkin Link In this review, we examine computational models that explore the role of neural oscillations in speech perception, spanning from early auditory processing to higher cognitive stages. We focus on models that use rhythmic brain activities, such as gamma, theta, and delta oscillations, to encode phonemes, segment speech into syllables and words, and integrate linguistic elements to infer meaning. We analyze the mechanisms underlying these models, their biological plausibility, and their potential applications in processing and understanding speech in real time, a computational feature that is achieved by the human brain but not yet implemented in speech recognition models. Real-time processing enables dynamic adaptation to incoming speech, allowing systems to handle the rapid and continuous flow of auditory information required for effective communication, interactive applications, and accurate speech recognition in a variety of real-world settings. While significant progress has been made in modeling the neural basis of speech perception, challenges remain, particularly in accounting for the complexity of semantic processing and the integration of contextual influences. Moreover, the high computational demands of biologically realistic models pose practical difficulties for their implementation and analysis. Despite these limitations, these models provide valuable insights into the neural mechanisms of speech perception. We conclude by identifying current limitations, proposing future research directions, and suggesting how these models can be further developed to achieve a more comprehensive understanding of speech processing in the human brain.
2025-02-18 Mind the Gap: Aligning the Brain with Language Models Requires a Nonlinear and Multimodal Approach Danny Dongyeop Han, Yunju Cho, Jiook Cha, Jay-Yoon Lee Link Self-supervised language and audio models effectively predict brain responses to speech. However, traditional prediction models rely on linear mappings from unimodal features, despite the complex integration of auditory signals with linguistic and semantic information across widespread brain networks during speech comprehension. Here, we introduce a nonlinear, multimodal prediction model that combines audio and linguistic features from pre-trained models (e.g., LLAMA, Whisper). Our approach achieves a 17.2% and 17.9% improvement in prediction performance (unnormalized and normalized correlation) over traditional unimodal linear models, as well as a 7.7% and 14.4% improvement, respectively, over prior state-of-the-art models. These improvements represent a major step towards future robust in-silico testing and improved decoding performance. They also reveal how auditory and semantic information are fused in motor, somatosensory, and higher-level semantic regions, aligning with existing neurolinguistic theories. Overall, our work highlights the often neglected potential of nonlinear and multimodal approaches to brain modeling, paving the way for future studies to embrace these strategies in naturalistic neurolinguistics research.

EEG

Publish Date Title Authors URL Abstract
2025-02-19 Dynamic directed functional connectivity as a neural biomarker for objective motor skill assessment Anil Kamat, Rahul Rahul, Anirban Dutta, Lora Cavuoto, Uwe Kruger, Harry Burke, Matthew Hackett, Jack Norfleet, Steven Schwaitzberg, Suvranu De Link Objective motor skill assessment plays a critical role in fields such as surgery, where proficiency is vital for certification and patient safety. Existing assessment methods, however, rely heavily on subjective human judgment, which introduces bias and limits reproducibility. While recent efforts have leveraged kinematic data and neural imaging to provide more objective evaluations, these approaches often overlook the dynamic neural mechanisms that differentiate expert and novice performance. This study proposes a novel method for motor skill assessment based on dynamic directed functional connectivity (dFC) as a neural biomarker. By using electroencephalography (EEG) to capture brain dynamics and employing an attention-based Long Short-Term Memory (LSTM) model for non-linear Granger causality analysis, we compute dFC among key brain regions involved in psychomotor tasks. Coupled with hierarchical task analysis (HTA), our approach enables subtask-level evaluation of motor skills, offering detailed insights into neural coordination that underpins expert proficiency. A convolutional neural network (CNN) is then used to classify skill levels, achieving greater accuracy and specificity than established performance metrics in laparoscopic surgery. This methodology provides a reliable, objective framework for assessing motor skills, contributing to the development of tailored training protocols and enhancing the certification process.
2025-02-18 Dimension reduction methods, persistent homology and machine learning for EEG signal analysis of Interictal Epileptic Discharges Annika Stiehl, Stefan Geißelsöder, Nicole Ille, Fabienne Anselstetter, Harald Bornfleth, Christian Uhl Link Recognizing specific events in medical data requires trained personnel. To aid the classification, machine learning algorithms can be applied. In this context, medical records are usually high-dimensional, although a lower dimension can also reflect the dynamics of the signal. In this study, electroencephalogram data with Interictal Epileptic Discharges (IEDs) are investigated. First, the dimensions are reduced using Dynamical Component Analysis (DyCA) and Principal Component Analysis (PCA), respectively. The reduced data are examined using topological data analysis (TDA), specifically using a persistent homology algorithm. The persistent homology results are used for targeted feature generation. The features are used to train and evaluate a Support Vector Machine (SVM) to distinguish IEDs from background activities.
2025-02-18 A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond Shreya Shukla, Jose Torres, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury Link Integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state-of-the-art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction.
2025-02-17 Early Detection of Human Handover Intentions in Human-Robot Collaboration: Comparing EEG, Gaze, and Hand Motion Parag Khanna, Nona Rajabi, Sumeyra U. Demir Kanik, Danica Kragic, Mårten Björkman, Christian Smith Link Human-robot collaboration (HRC) relies on accurate and timely recognition of human intentions to ensure seamless interactions. Among common HRC tasks, human-to-robot object handovers have been studied extensively for planning the robot's actions during object reception, assuming the human intention for object handover. However, distinguishing handover intentions from other actions has received limited attention. Most research on handovers has focused on visually detecting motion trajectories, which often results in delays or false detections when trajectories overlap. This paper investigates whether human intentions for object handovers are reflected in non-movement-based physiological signals. We conduct a multimodal analysis comparing three data modalities: electroencephalogram (EEG), gaze, and hand-motion signals. Our study aims to distinguish between handover-intended human motions and non-handover motions in an HRC setting, evaluating each modality's performance in predicting and classifying these actions before and after human movement initiation. We develop and evaluate human intention detectors based on these modalities, comparing their accuracy and timing in identifying handover intentions. To the best of our knowledge, this is the first study to systematically develop and test intention detectors across multiple modalities within the same experimental context of human-robot handovers. Our analysis reveals that handover intention can be detected from all three modalities. Nevertheless, gaze signals are the earliest as well as the most accurate to classify the motion as intended for handover or non-handover.
2025-02-15 Hybrid Brain-Machine Interface: Integrating EEG and EMG for Reduced Physical Demand Daniel Wang, Katie Hong, Zachary Sayyah, Malcolm Krolick, Emma Steinberg, Rohan Venkatdas, Sidharth Pavuluri, Yipeng Wang, Zihan Huang Link We present a hybrid brain-machine interface (BMI) that integrates steady-state visually evoked potential (SSVEP)-based EEG and facial EMG to improve multimodal control and mitigate fatigue in assistive applications. Traditional BMIs relying solely on EEG or EMG suffer from inherent limitations; EEG-based control requires sustained visual focus, leading to cognitive fatigue, while EMG-based control induces muscular fatigue over time. Our system dynamically alternates between EEG and EMG inputs, using EEG to detect SSVEP signals at 9.75 Hz and 14.25 Hz and EMG from cheek and neck muscles to optimize control based on task demands. In a virtual turtle navigation task, the hybrid system achieved task completion times comparable to an EMG-only approach, while 90% of users reported reduced or equal physical demand. These findings demonstrate that multimodal BMI systems can enhance usability, reduce strain, and improve long-term adherence in assistive technologies.
2025-02-13 Revisiting Euclidean Alignment for Transfer Learning in EEG-Based Brain-Computer Interfaces Dongrui Wu Link Due to the non-stationarity and large individual differences of EEG signals, EEG-based brain-computer interfaces (BCIs) usually need subject-specific calibration to tailor the decoding algorithm for each new subject, which is time-consuming and user-unfriendly, hindering their real-world applications. Transfer learning (TL) has been extensively used to expedite the calibration, by making use of EEG data from other subjects/sessions. An important consideration in TL for EEG-based BCIs is to reduce the data distribution discrepancies among different subjects/session, to avoid negative transfer. Euclidean alignment (EA) was proposed in 2020 to address this challenge. Numerous experiments from 10 different BCI paradigms demonstrated its effectiveness and efficiency. This paper revisits the EA, explaining its procedure and correct usage, introducing its applications and extensions, and pointing out potential new research directions. It should be very helpful to BCI researchers, especially those who are working on EEG signal decoding.
2025-02-12 Deep EEG Super-Resolution: Upsampling EEG Spatial Resolution with Generative Adversarial Networks Isaac Corley, Yufei Huang Link Electroencephalography (EEG) activity contains a wealth of information about what is happening within the human brain. Recording more of this data has the potential to unlock endless future applications. However, the cost of EEG hardware is increasingly expensive based upon the number of EEG channels being recorded simultaneously. We combat this problem in this paper by proposing a novel deep EEG super-resolution (SR) approach based on Generative Adversarial Networks (GANs). This approach can produce high spatial resolution EEG data from low resolution samples, by generating channel-wise upsampled data to effectively interpolate numerous missing channels, thus reducing the need for expensive EEG equipment. We tested the performance using an EEG dataset from a mental imagery task. Our proposed GAN model provided 10^4 fold and 10^2 fold reduction in mean-squared error (MSE) and mean-absolute error (MAE), respectively, over the baseline bicubic interpolation method. We further validate our method by training a classifier on the original classification task, which displayed minimal loss in accuracy while using the super-resolved data. The proposed SR EEG by GAN is a promising approach to improve the spatial resolution of low density EEG headsets.
2025-02-12 EEG Artifact Detection and Correction with Deep Autoencoders David Aquilué-Llorens, Aureli Soria-Frisch Link EEG signals convey important information about brain activity both in healthy and pathological conditions. However, they are inherently noisy, which poses significant challenges for accurate analysis and interpretation. Traditional EEG artifact removal methods, while effective, often require extensive expert intervention. This study presents LSTEEG, a novel LSTM-based autoencoder designed for the detection and correction of artifacts in EEG signals. Leveraging deep learning, particularly LSTM layers, LSTEEG captures non-linear dependencies in sequential EEG data. LSTEEG demonstrates superior performance in both artifact detection and correction tasks compared to other state-of-the-art convolutional autoencoders. Our methodology enhances the interpretability and utility of the autoencoder's latent space, enabling data-driven automated artefact removal in EEG its application in downstream tasks. This research advances the field of efficient and accurate multi-channel EEG preprocessing, and promotes the implementation and usage of automated EEG analysis pipelines for brain health applications.
2025-02-15 From Brainwaves to Brain Scans: A Robust Neural Network for EEG-to-fMRI Synthesis Kristofer Grover Roos, Atsushi Fukuda, Quan Huu Cap Link While functional magnetic resonance imaging (fMRI) offers rich spatial resolution, it is limited by high operational costs and significant infrastructural demands. In contrast, electroencephalography (EEG) provides millisecond-level precision in capturing electrical activity but lacks the spatial resolution necessary for precise neural localization. To bridge these gaps, we introduce E2fNet, a simple yet effective deep learning model for synthesizing fMRI images from low-cost EEG data. E2fNet is specifically designed to capture and translate meaningful features from EEG across electrode channels into accurate fMRI representations. Extensive evaluations across three datasets demonstrate that E2fNet consistently outperforms existing methods, achieving state-of-the-art results in terms of the structural similarity index measure (SSIM). Our findings suggest that E2fNet is a promising, cost-effective solution for enhancing neuroimaging capabilities. The code is available at https://github.com/kgr20/E2fNet.
2025-02-18 From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production Mingfang Zhang, Jarod Lévy, Stéphane d'Ascoli, Jérémy Rapin, F. -Xavier Alario, Pierre Bourdillon, Svetlana Pinet, Jean-Rémi King Link Humans effortlessly communicate their thoughts through intricate sequences of motor actions. Yet, the neural processes that coordinate language production remain largely unknown, in part because speech artifacts limit the use of neuroimaging. To elucidate the unfolding of language production in the brain, we investigate with magnetoencephalography (MEG) and electroencephalography (EEG) the neurophysiological activity of 35 skilled typists, while they typed sentences on a keyboard. This approach confirms the hierarchical predictions of linguistic theories: the neural activity preceding the production of each word is marked by the sequential rise and fall of context-, word-, syllable-, and letter-level representations. Remarkably, each of these neural representations is maintained over long time periods within each level of the language hierarchy. This phenomenon results in a superposition of successive representations that is supported by a hierarchy of dynamic neural codes. Overall, these findings provide a precise computational breakdown of the neural dynamics that coordinate the production of language in the human brain.

BCI

Publish Date Title Authors URL Abstract
2025-02-18 A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond Shreya Shukla, Jose Torres, Abhijit Mishra, Jacek Gwizdka, Shounak Roychowdhury Link Integration of Brain-Computer Interfaces (BCIs) and Generative Artificial Intelligence (GenAI) has opened new frontiers in brain signal decoding, enabling assistive communication, neural representation learning, and multimodal integration. BCIs, particularly those leveraging Electroencephalography (EEG), provide a non-invasive means of translating neural activity into meaningful outputs. Recent advances in deep learning, including Generative Adversarial Networks (GANs) and Transformer-based Large Language Models (LLMs), have significantly improved EEG-based generation of images, text, and speech. This paper provides a literature review of the state-of-the-art in EEG-based multimodal generation, focusing on (i) EEG-to-image generation through GANs, Variational Autoencoders (VAEs), and Diffusion Models, and (ii) EEG-to-text generation leveraging Transformer based language models and contrastive learning methods. Additionally, we discuss the emerging domain of EEG-to-speech synthesis, an evolving multimodal frontier. We highlight key datasets, use cases, challenges, and EEG feature encoding methods that underpin generative approaches. By providing a structured overview of EEG-based generative AI, this survey aims to equip researchers and practitioners with insights to advance neural decoding, enhance assistive technologies, and expand the frontiers of brain-computer interaction.
2025-02-18 An Innovative Brain-Computer Interface Interaction System Based on the Large Language Model Jing Jin, Yutao Zhang, Ruitian Xu, Yixin Chen Link Recent advancements in large language models (LLMs) provide a more effective pathway for upgrading brain-computer interface (BCI) technology in terms of user interaction. The widespread adoption of BCIs in daily application scenarios is still limited by factors such as their single functionality, restricted paradigm design, weak multilingual support, and low levels of intelligence. In this paper, we propose an innovative BCI system that deeply integrates a steady-state visual evoked potential (SSVEP) speller with an LLM application programming interface (API). It allows natural language input through the SSVEP speller and dynamically calls large models to generate SSVEP paradigms. The command prompt, blinking frequency, and layout position are adjustable to meet the user's control requirements in various scenarios. More than ten languages are compatible with the multilingual support of LLM. A variety of task scenarios, such as home appliance control, robotic arm operation, and unmanned aerial vehicle (UAV) management are provided. The task interfaces of the system can be personalized according to the user's habits, usage scenarios, and equipment characteristics. By combining the SSVEP speller with an LLM, the system solves numerous challenges faced by current BCI systems and makes breakthroughs in functionality, intelligence, and multilingual support. The introduction of LLM not only enhances user experience but also expands the potential applications of BCI technology in real-world environments.
2025-02-16 SSVEP-BiMA: Bifocal Masking Attention Leveraging Native and Symmetric-Antisymmetric Components for Robust SSVEP Decoding Yuxin Liu, Zhenxi Song, Guoyang Xu, Zirui Wang, Feng Wan, Yong Hu, Min Zhang, Zhiguo Zhang Link Brain-computer interface (BCI) based on steady-state visual evoked potentials (SSVEP) is a popular paradigm for its simplicity and high information transfer rate (ITR). Accurate and fast SSVEP decoding is crucial for reliable BCI performance. However, conventional decoding methods demand longer time windows, and deep learning models typically require subject-specific fine-tuning, leaving challenges in achieving optimal performance in cross-subject settings. This paper proposed a biofocal masking attention-based method (SSVEP-BiMA) that synergistically leverages the native and symmetric-antisymmetric components for decoding SSVEP. By utilizing multiple signal representations, the network is able to integrate features from a wider range of sample perspectives, leading to more generalized and comprehensive feature learning, which enhances both prediction accuracy and robustness. We performed experiments on two public datasets, and the results demonstrate that our proposed method surpasses baseline approaches in both accuracy and ITR. We believe that this work will contribute to the development of more efficient SSVEP-based BCI systems.
2025-02-13 Revisiting Euclidean Alignment for Transfer Learning in EEG-Based Brain-Computer Interfaces Dongrui Wu Link Due to the non-stationarity and large individual differences of EEG signals, EEG-based brain-computer interfaces (BCIs) usually need subject-specific calibration to tailor the decoding algorithm for each new subject, which is time-consuming and user-unfriendly, hindering their real-world applications. Transfer learning (TL) has been extensively used to expedite the calibration, by making use of EEG data from other subjects/sessions. An important consideration in TL for EEG-based BCIs is to reduce the data distribution discrepancies among different subjects/session, to avoid negative transfer. Euclidean alignment (EA) was proposed in 2020 to address this challenge. Numerous experiments from 10 different BCI paradigms demonstrated its effectiveness and efficiency. This paper revisits the EA, explaining its procedure and correct usage, introducing its applications and extensions, and pointing out potential new research directions. It should be very helpful to BCI researchers, especially those who are working on EEG signal decoding.
2025-02-12 Uncertainty Aware Human-machine Collaboration in Camouflaged Object Detection Ziyue Yang, Kehan Wang, Yuhang Ming, Yong Peng, Han Yang, Qiong Chen, Wanzeng Kong Link Camouflaged Object Detection (COD), the task of identifying objects concealed within their environments, has seen rapid growth due to its wide range of practical applications. A key step toward developing trustworthy COD systems is the estimation and effective utilization of uncertainty. In this work, we propose a human-machine collaboration framework for classifying the presence of camouflaged objects, leveraging the complementary strengths of computer vision (CV) models and noninvasive brain-computer interfaces (BCIs). Our approach introduces a multiview backbone to estimate uncertainty in CV model predictions, utilizes this uncertainty during training to improve efficiency, and defers low-confidence cases to human evaluation via RSVP-based BCIs during testing for more reliable decision-making. We evaluated the framework in the CAMO dataset, achieving state-of-the-art results with an average improvement of 4.56\% in balanced accuracy (BA) and 3.66\% in the F1 score compared to existing methods. For the best-performing participants, the improvements reached 7.6\% in BA and 6.66\% in the F1 score. Analysis of the training process revealed a strong correlation between our confidence measures and precision, while an ablation study confirmed the effectiveness of the proposed training policy and the human-machine collaboration strategy. In general, this work reduces human cognitive load, improves system reliability, and provides a strong foundation for advancements in real-world COD applications and human-computer interaction. Our code and data are available at: https://github.com/ziyuey/Uncertainty-aware-human-machine-collaboration-in-camouflaged-object-identification.
2025-02-07 Geometric Machine Learning on EEG Signals Benjamin J. Choi Link Brain-computer interfaces (BCIs) offer transformative potential, but decoding neural signals presents significant challenges. The core premise of this paper is built around demonstrating methods to elucidate the underlying low-dimensional geometric structure present in high-dimensional brainwave data in order to assist in downstream BCI-related neural classification tasks. We demonstrate two pipelines related to electroencephalography (EEG) signal processing: (1) a preliminary pipeline removing noise from individual EEG channels, and (2) a downstream manifold learning pipeline uncovering geometric structure across networks of EEG channels. We conduct preliminary validation using two EEG datasets and situate our demonstration in the context of the BCI-relevant imagined digit decoding problem. Our preliminary pipeline uses an attention-based EEG filtration network to extract clean signal from individual EEG channels. Our primary pipeline uses a fast Fourier transform, a Laplacian eigenmap, a discrete analog of Ricci flow via Ollivier's notion of Ricci curvature, and a graph convolutional network to perform dimensionality reduction on high-dimensional multi-channel EEG data in order to enable regularizable downstream classification. Our system achieves competitive performance with existing signal processing and classification benchmarks; we demonstrate a mean test correlation coefficient of >0.95 at 2 dB on semi-synthetic neural denoising and a downstream EEG-based classification accuracy of 0.97 on distinguishing digit- versus non-digit thoughts. Results are preliminary and our geometric machine learning pipeline should be validated by more extensive follow-up studies; generalizing these results to larger inter-subject sample sizes, different hardware systems, and broader use cases will be crucial.
2025-02-06 Transfer Learning for Covert Speech Classification Using EEG Hilbert Envelope and Temporal Fine Structure Saravanakumar Duraisamy, Mateusz Dubiel, Maurice Rekrut, Luis A. Leiva Link Brain-Computer Interfaces (BCIs) can decode imagined speech from neural activity. However, these systems typically require extensive training sessions where participants imaginedly repeat words, leading to mental fatigue and difficulties identifying the onset of words, especially when imagining sequences of words. This paper addresses these challenges by transferring a classifier trained in overt speech data to covert speech classification. We used electroencephalogram (EEG) features derived from the Hilbert envelope and temporal fine structure, and used them to train a bidirectional long-short-term memory (BiLSTM) model for classification. Our method reduces the burden of extensive training and achieves state-of-the-art classification accuracy: 86.44% for overt speech and 79.82% for covert speech using the overt speech classifier.
2025-02-07 Decoding Human Attentive States from Spatial-temporal EEG Patches Using Transformers Yi Ding, Joon Hei Lee, Shuailei Zhang, Tianze Luo, Cuntai Guan Link Learning the spatial topology of electroencephalogram (EEG) channels and their temporal dynamics is crucial for decoding attention states. This paper introduces EEG-PatchFormer, a transformer-based deep learning framework designed specifically for EEG attention classification in Brain-Computer Interface (BCI) applications. By integrating a Temporal CNN for frequency-based EEG feature extraction, a pointwise CNN for feature enhancement, and Spatial and Temporal Patching modules for organizing features into spatial-temporal patches, EEG-PatchFormer jointly learns spatial-temporal information from EEG data. Leveraging the global learning capabilities of the self-attention mechanism, it captures essential features across brain regions over time, thereby enhancing EEG data decoding performance. Demonstrating superior performance, EEG-PatchFormer surpasses existing benchmarks in accuracy, area under the ROC curve (AUC), and macro-F1 score on a public cognitive attention dataset. The code can be found via: https://github.com/yi-ding-cs/EEG-PatchFormer .
2025-02-05 Fine-Tuning Strategies for Continual Online EEG Motor Imagery Decoding: Insights from a Large-Scale Longitudinal Study Martin Wimpff, Bruno Aristimunha, Sylvain Chevallier, Bin Yang Link This study investigates continual fine-tuning strategies for deep learning in online longitudinal electroencephalography (EEG) motor imagery (MI) decoding within a causal setting involving a large user group and multiple sessions per participant. We are the first to explore such strategies across a large user group, as longitudinal adaptation is typically studied in the single-subject setting with a single adaptation strategy, which limits the ability to generalize findings. First, we examine the impact of different fine-tuning approaches on decoder performance and stability. Building on this, we integrate online test-time adaptation (OTTA) to adapt the model during deployment, complementing the effects of prior fine-tuning. Our findings demonstrate that fine-tuning that successively builds on prior subject-specific information improves both performance and stability, while OTTA effectively adapts the model to evolving data distributions across consecutive sessions, enabling calibration-free operation. These results offer valuable insights and recommendations for future research in longitudinal online MI decoding and highlight the importance of combining domain adaptation strategies for improving BCI performance in real-world applications. Clinical Relevance: Our investigation enables more stable and efficient long-term motor imagery decoding, which is critical for neurorehabilitation and assistive technologies.
2025-02-05 Multimodal Brain-Computer Interfaces: AI-powered Decoding Methodologies Siyang Li, Hongbin Wang, Xiaoqing Chen, Dongrui Wu Link Brain-computer interfaces (BCIs) enable direct communication between the brain and external devices. This review highlights the core decoding algorithms that enable multimodal BCIs, including a dissection of the elements, a unified view of diversified approaches, and a comprehensive analysis of the present state of the field. We emphasize algorithmic advancements in cross-modality mapping, sequential modeling, besides classic multi-modality fusion, illustrating how these novel AI approaches enhance decoding of brain data. The current literature of BCI applications on visual, speech, and affective decoding are comprehensively explored. Looking forward, we draw attention on the impact of emerging architectures like multimodal Transformers, and discuss challenges such as brain data heterogeneity and common errors. This review also serves as a bridge in this interdisciplinary field for experts with neuroscience background and experts that study AI, aiming to provide a comprehensive understanding for AI-powered multimodal BCIs.

fMRI

Publish Date Title Authors URL Abstract
2025-02-16 Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time Robert Dahlke, Henrik Klagges, Dan Zecha, Benjamin Merkel, Sven Rohr, Fabian Klemm Link We present the Mixture-of-Tunable-Experts (MoTE), a method that extends the Mixture-of-Experts architecture of Large Language Models (LLMs). Without additional training, MoTE enables meaningful and focused behavior changes in LLMs on-the-fly during inference time. By analyzing the digital LLM brain of DeepSeek-R1 using a technique we dub 'functional Token Resonance Imaging' (fTRI) - inspired by fMRI and using prompts designed to elicit specific behavior (e.g., 'What happened {time}{place}?') - we empirically identify distinctive experts associated with behaviors like refusal responses. Using MoTE we are able to intervene and control such specific behavior. We switched off the top 10 most refusal-relevant experts (0.07% of R1's 14,848 routed experts), achieving a 52% refusal reduction on sensitive reference prompts without performance degradation on MT-Bench. Random expert deactivation resulted in smaller behavioral shifts with increased noise, whereas forced expert activation led to significantly higher refusal rates. Our approach shares similarities with sparse autoencoders (SAEs) in terms of explainability and steerability. Unlike SAEs, MoTE does not require large training efforts, as within MoEs with a vast number of experts, specialization already emerged naturally during pretraining. Our findings suggest that significant functional mechanisms in Mixture-of-Experts architectures can at least partially be localized in a small number of specific experts, rather than being distributed throughout the model's weights. Expert subgroups can be tuned to trigger significant behavior variations, providing insights into the inner workings of LLMs.
2025-02-15 Towards Zero-Shot Task-Generalizable Learning on fMRI Jiyao Wang, Nicha C. Dvornek, Peiyu Duan, Lawrence H. Staib, James S. Duncan Link Functional MRI measuring BOLD signal is an increasingly important imaging modality in studying brain functions and neurological disorders. It can be acquired in either a resting-state or a task-based paradigm. Compared to resting-state fMRI, task-based fMRI is acquired while the subject is performing a specific task designed to enhance study-related brain activities. Consequently, it generally has more informative task-dependent signals. However, due to the variety of task designs, it is much more difficult than in resting state to aggregate task-based fMRI acquired in different tasks to train a generalizable model. To resolve this complication, we propose a supervised task-aware network TA-GAT that jointly learns a general-purpose encoder and task-specific contextual information. The encoder-generated embedding and the learned contextual information are then combined as input to multiple modules for performing downstream tasks. We believe that the proposed task-aware architecture can plug-and-play in any neural network architecture to incorporate the prior knowledge of fMRI tasks into capturing functional brain patterns.
2025-02-12 Neuronal Correlates of Semantic Event Classes during Presentation of Complex Naturalistic Stimuli: Anatomical Patterns, Context-Sensitivity, and Potential Impact on shared Human-Robot Ontologies Florian Ahrens, Mihai Pomarlan, Daniel Beßler, Michael Beetz, Manfred Herrmann Link The present study forms part of a research project that aims to develop cognition-enabled robotic agents with environmental interaction capabilities close to human proficiency. This approach is based on human-derived neuronal data in combination with a shared ontology to enable robots to learn from human experiences. To gain further insight into the relation between human neuronal activity patterns and ontological classes, we introduced General Linear Model (GLM) analyses on fMRI data of participants who were presented with complex naturalistic video stimuli comparable to the robot tasks. We modeled four event classes (pick, place, fetch and deliver) attached to different environmental and object-related context and employed a Representational Similarity Analysis (RSA) on associated brain activity patterns as a starting point for an automatic hierarchical clustering. Based on the default values for the Hemodynamic Response Function (HRF), the activity patterns were reliably grouped according to their parent classes of object interaction and navigation. Although fetch and deliver events were also distinguished by neuronal patterns, pick and place events demonstrated higher ambiguity with respect to neuronal activation patterns. Introducing a shorter HRF time-to-peak leads to a more reliable grouping of all four semantic classes, despite contextual factors. These data might give novel insights into the neuronal representation of complex stimuli and may enable further research in ontology validation in cognition-enabled robotics.
2025-02-15 From Brainwaves to Brain Scans: A Robust Neural Network for EEG-to-fMRI Synthesis Kristofer Grover Roos, Atsushi Fukuda, Quan Huu Cap Link While functional magnetic resonance imaging (fMRI) offers rich spatial resolution, it is limited by high operational costs and significant infrastructural demands. In contrast, electroencephalography (EEG) provides millisecond-level precision in capturing electrical activity but lacks the spatial resolution necessary for precise neural localization. To bridge these gaps, we introduce E2fNet, a simple yet effective deep learning model for synthesizing fMRI images from low-cost EEG data. E2fNet is specifically designed to capture and translate meaningful features from EEG across electrode channels into accurate fMRI representations. Extensive evaluations across three datasets demonstrate that E2fNet consistently outperforms existing methods, achieving state-of-the-art results in terms of the structural similarity index measure (SSIM). Our findings suggest that E2fNet is a promising, cost-effective solution for enhancing neuroimaging capabilities. The code is available at https://github.com/kgr20/E2fNet.
2025-02-10 Direct Estimation of Pediatric Heart Rate Variability from BOLD-fMRI: A Machine Learning Approach Using Dynamic Connectivity Abdoljalil Addeh, Karen Ardila, Rebecca J Williams, G. Bruce Pike, M. Ethan MacDonald Link In many pediatric fMRI studies, cardiac signals are often missing or of poor quality. A tool to extract Heart Rate Variation (HRV) waveforms directly from fMRI data, without the need for peripheral recording devices, would be highly beneficial. We developed a machine learning framework to accurately reconstruct HRV for pediatric applications. A hybrid model combining one-dimensional Convolutional Neural Networks (1D-CNN) and Gated Recurrent Units (GRU) analyzed BOLD signals from 628 ROIs, integrating past and future data. The model achieved an 8% improvement in HRV accuracy, as evidenced by enhanced performance metrics. This approach eliminates the need for peripheral photoplethysmography devices, reduces costs, and simplifies procedures in pediatric fMRI. Additionally, it improves the robustness of pediatric fMRI studies, which are more sensitive to physiological and developmental variations than those in adults.
2025-02-09 Topological Time Frequency Analysis of Functional Brain Signals Moo K. Chung, Aaron F. Struck Link We present a novel topological framework for analyzing functional brain signals using time-frequency analysis. By integrating persistent homology with time-frequency representations, we capture multi-scale topological features that characterize the dynamic behavior of brain activity. This approach identifies 0D (connected components) and 1D (loops) topological structures in the signal's time-frequency domain, enabling robust extraction of features invariant to noise and temporal misalignments. The proposed method is demonstrated on resting-state functional magnetic resonance imaging (fMRI) data, showcasing its ability to discern critical topological patterns and provide insights into functional connectivity. This topological approach opens new avenues for analyzing complex brain signals, offering potential applications in neuroscience and clinical diagnostics.
2025-02-08 Multi-Site rs-fMRI Domain Alignment for Autism Spectrum Disorder Auxiliary Diagnosis Based on Hyperbolic Space Yiqian Luo, Qiurong Chen, Yangsong Zhang Link In the medical field, most resting-state fMRI (rs-fMRI) data are collected from multiple hospital sites. Multi-site rs-fMRI data can increase the volume of training data, enabling auxiliary diagnostic algorithms for brain diseases to learn more accurate and stable models. However, due to the significant heterogeneity and domain shift in rs-fMRI data across different sites, the accuracy of auxiliary diagnosis remains unsatisfactory. Moreover, there has been limited exploration of multi-source domain adaptation algorithms, and the interpretability of models is often poor. To address these challenges, we proposed a domain-adaptive algorithm based on hyperbolic space embedding. Hyperbolic space is naturally suited for representing the topology of complex networks such as brain functional networks. Therefore, we embedded the brain functional network into hyperbolic space and constructed the corresponding hyperbolic space community network to effectively extract brain network representations. To address the heterogeneity of data across different sites and the issue of domain shift, we introduce a constraint loss function, HMMD (Hyperbolic Maximum Mean Discrepancy), to align the marginal distributions in the hyperbolic space. Additionally, we employ class prototype alignment to align the conditional distributions. This significantly improves the quality of brain representations and enhances diagnostic classification accuracy for Autism Spectrum Disorder (ASD). Experimental results demonstrated that the proposed algorithm is robust to multi-site heterogeneity and shows promising potential for brain network mechanism analysis.
2025-02-07 MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data Yuqin Dai, Zhouheng Yao, Chunfeng Song, Qihao Zheng, Weijian Mai, Kunyu Peng, Shuai Lu, Wanli Ouyang, Jian Yang, Jiamin Wu Link Brain decoding aims to reconstruct visual perception of human subject from fMRI signals, which is crucial for understanding brain's perception mechanisms. Existing methods are confined to the single-subject paradigm due to substantial brain variability, which leads to weak generalization across individuals and incurs high training costs, exacerbated by limited availability of fMRI data. To address these challenges, we propose MindAligner, an explicit functional alignment framework for cross-subject brain decoding from limited fMRI data. The proposed MindAligner enjoys several merits. First, we learn a Brain Transfer Matrix (BTM) that projects the brain signals of an arbitrary new subject to one of the known subjects, enabling seamless use of pre-trained decoding models. Second, to facilitate reliable BTM learning, a Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli with a multi-level brain alignment loss, uncovering fine-grained functional correspondences with high interpretability. Experiments indicate that MindAligner not only outperforms existing methods in visual decoding under data-limited conditions, but also provides valuable neuroscience insights in cross-subject functional analysis. The code will be made publicly available.
2025-02-07 A Foundational Brain Dynamics Model via Stochastic Optimal Control Joonhyeong Park, Byoungwoo Park, Chang-Bae Bang, Jungwon Choi, Hyungjin Chung, Byung-Hoon Kim, Juho Lee Link We introduce a foundational model for brain dynamics that utilizes stochastic optimal control (SOC) and amortized inference. Our method features a continuous-discrete state space model (SSM) that can robustly handle the intricate and noisy nature of fMRI signals. To address computational limitations, we implement an approximation strategy grounded in the SOC framework. Additionally, we present a simulation-free latent dynamics approach that employs locally linear approximations, facilitating efficient and scalable inference. For effective representation learning, we derive an Evidence Lower Bound (ELBO) from the SOC formulation, which integrates smoothly with recent advancements in self-supervised learning (SSL), thereby promoting robust and transferable representations. Pre-trained on extensive datasets such as the UKB, our model attains state-of-the-art results across a variety of downstream tasks, including demographic prediction, trait analysis, disease diagnosis, and prognosis. Moreover, evaluating on external datasets such as HCP-A, ABIDE, and ADHD200 further validates its superior abilities and resilience across different demographic and clinical distributions. Our foundational model provides a scalable and efficient approach for deciphering brain dynamics, opening up numerous applications in neuroscience.
2025-02-06 Dark Brain Energy: Toward an Integrative Model of Spontaneous Slow Oscillations ZhuQing Gong, XiNian Zuo Link Neural oscillations facilitate the functioning of the human brain in spatial and temporal dimensions at various frequencies. These oscillations feature a universal frequency architecture that is governed by brain anatomy, ensuring frequency specificity remains invariant across different measurement techniques. Initial magnetic resonance imaging (MRI) methodology constrained functional MRI (fMRI) investigations to a singular frequency range, thereby neglecting the frequency characteristics inherent in blood oxygen level-dependent oscillations. With advancements in MRI technology, it has become feasible to decode intricate brain activities via multi-band frequency analysis (MBFA). During the past decade, the utilization of MBFA in fMRI studies has surged, unveiling frequency-dependent characteristics of spontaneous slow oscillations (SSOs) believed to base dark energy in the brain. There remains a dearth of conclusive insights and hypotheses pertaining to the properties and functionalities of SSOs in distinct bands. We surveyed the SSO MBFA studies during the past 15 years to delineate the attributes of SSOs and enlighten their correlated functions. We further proposed a model to elucidate the hierarchical organization of multi-band SSOs by integrating their function, aimed at bridging theoretical gaps and guiding future MBFA research endeavors.

MEG

Publish Date Title Authors URL Abstract
2025-02-18 From Thought to Action: How a Hierarchy of Neural Dynamics Supports Language Production Mingfang Zhang, Jarod Lévy, Stéphane d'Ascoli, Jérémy Rapin, F. -Xavier Alario, Pierre Bourdillon, Svetlana Pinet, Jean-Rémi King Link Humans effortlessly communicate their thoughts through intricate sequences of motor actions. Yet, the neural processes that coordinate language production remain largely unknown, in part because speech artifacts limit the use of neuroimaging. To elucidate the unfolding of language production in the brain, we investigate with magnetoencephalography (MEG) and electroencephalography (EEG) the neurophysiological activity of 35 skilled typists, while they typed sentences on a keyboard. This approach confirms the hierarchical predictions of linguistic theories: the neural activity preceding the production of each word is marked by the sequential rise and fall of context-, word-, syllable-, and letter-level representations. Remarkably, each of these neural representations is maintained over long time periods within each level of the language hierarchy. This phenomenon results in a superposition of successive representations that is supported by a hierarchy of dynamic neural codes. Overall, these findings provide a precise computational breakdown of the neural dynamics that coordinate the production of language in the human brain.
2025-02-07 Estimated Roadway Segment Traffic Data by Vehicle Class for the United States: A Machine Learning Approach Brittany Antonczak, Meg Fay, Aviral Chawla, Gregory Rowangould Link The Highway Performance Monitoring System, managed by the Federal Highway Administration, provides essential data on average annual daily traffic across U.S. roadways, but it has limited representation of medium- and heavy-duty vehicles on non-interstate roads. This gap limits research and policy analysis on the impacts of truck traffic, especially concerning air quality and public health. To address this, we use random forest regression to estimate medium- and heavy-duty vehicle traffic volumes in areas with sparse data. This results in a more comprehensive dataset, which enables the estimation of traffic density at the census block level as a proxy for traffic-related air pollution exposure. Our high-resolution spatial data products, rigorously validated, provide a more accurate representation of truck traffic and its environmental and health impacts. These datasets are valuable for transportation planning, public health research, and policy decisions aimed at mitigating the effects of truck traffic on vulnerable communities exposed to air pollution.
2025-02-07 Shifting Attention to You: Personalized Brain-Inspired AI Models Stephen Chong Zhao, Yang Hu, Jason Lee, Andrew Bender, Trisha Mazumdar, Mark Wallace, David A. Tovar Link The integration of human and artificial intelligence represents a scientific opportunity to advance our understanding of information processing, as each system offers unique computational insights that can enhance and inform the other. The synthesis of human cognitive principles with artificial intelligence has the potential to produce more interpretable and functionally aligned computational models, while simultaneously providing a formal framework for investigating the neural mechanisms underlying perception, learning, and decision-making through systematic model comparisons and representational analyses. In this study, we introduce personalized brain-inspired modeling that integrates human behavioral embeddings and neural data to align with cognitive processes. We took a stepwise approach, fine-tuning the Contrastive Language-Image Pre-training (CLIP) model with large-scale behavioral decisions, group-level neural data, and finally, participant-level neural data within a broader framework that we have named CLIP-Human-Based Analysis (CLIP-HBA). We found that fine-tuning on behavioral data enhances its ability to predict human similarity judgments while indirectly aligning it with dynamic representations captured via MEG. To further gain mechanistic insights into the temporal evolution of cognitive processes, we introduced a model specifically fine-tuned on millisecond-level MEG neural dynamics (CLIP-HBA-MEG). This model resulted in enhanced temporal alignment with human neural processing while still showing improvement on behavioral alignment. Finally, we trained individualized models on participant-specific neural data, effectively capturing individualized neural dynamics and highlighting the potential for personalized AI systems. These personalized systems have far-reaching implications for the fields of medicine, cognitive research, human-computer interfaces, and AI development.
2025-02-06 Detecting Mild Traumatic Brain Injury with MEG Scan Data: One-vs-K-Sample Tests Jian Zhang, Gary Green Link Magnetoencephalography (MEG) scanner has been shown to be more accurate than other medical devices in detecting mild traumatic brain injury (mTBI). However, MEG scan data in certain spectrum ranges can be skewed, multimodal and heterogeneous which can mislead the conventional case-control analysis that requires the data to be homogeneous and normally distributed within the control group. To meet this challenge, we propose a flexible one-vs-K-sample testing procedure for detecting brain injury for a single-case versus heterogeneous controls. The new procedure begins with source magnitude imaging using MEG scan data in frequency domain, followed by region-wise contrast tests for abnormality between the case and controls. The critical values for these tests are automatically determined by cross-validation. We adjust the testing results for heterogeneity effects by similarity analysis. An asymptotic theory is established for the proposed test statistic. By simulated and real data analyses in the context of neurotrauma, we show that the proposed test outperforms commonly used nonparametric methods in terms of overall accuracy and ability in accommodating data non-normality and subject-heterogeneity.
2025-01-31 Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin, Peter Lofgren, Francesco Mosconi, Clare O'Hara, Catherine Olsson, Linda Petrini, Samir Rajani, Nikhil Saxena, Alex Silverstein, Tanya Singh, Theodore Sumers, Leonard Tang, Kevin K. Troy, Constantin Weisser, Ruiqi Zhong, Giulio Zhou, Jan Leike, Jared Kaplan, Ethan Perez Link Large language models (LLMs) are vulnerable to universal jailbreaks-prompting strategies that systematically bypass model safeguards and enable users to carry out harmful processes that require many model interactions, like manufacturing illegal substances at scale. To defend against these attacks, we introduce Constitutional Classifiers: safeguards trained on synthetic data, generated by prompting LLMs with natural language rules (i.e., a constitution) specifying permitted and restricted content. In over 3,000 estimated hours of red teaming, no red teamer found a universal jailbreak that could extract information from an early classifier-guarded LLM at a similar level of detail to an unguarded model across most target queries. On automated evaluations, enhanced classifiers demonstrated robust defense against held-out domain-specific jailbreaks. These classifiers also maintain deployment viability, with an absolute 0.38% increase in production-traffic refusals and a 23.7% inference overhead. Our work demonstrates that defending against universal jailbreaks while maintaining practical deployment viability is tractable.
2025-01-28 "Ownership, Not Just Happy Talk": Co-Designing a Participatory Large Language Model for Journalism Emily Tseng, Meg Young, Marianne Aubin Le Quéré, Aimee Rinehart, Harini Suresh Link Journalism has emerged as an essential domain for understanding the uses, limitations, and impacts of large language models (LLMs) in the workplace. News organizations face divergent financial incentives: LLMs already permeate newswork processes within financially constrained organizations, even as ongoing legal challenges assert that AI companies violate their copyright. At stake are key questions about what LLMs are created to do, and by whom: How might a journalist-led LLM work, and what can participatory design illuminate about the present-day challenges about adapting ``one-size-fits-all'' foundation models to a given context of use? In this paper, we undertake a co-design exploration to understand how a participatory approach to LLMs might address opportunities and challenges around AI in journalism. Our 20 interviews with reporters, data journalists, editors, labor organizers, product leads, and executives highlight macro, meso, and micro tensions that designing for this opportunity space must address. From these desiderata, we describe the result of our co-design work: organizational structures and functionality for a journalist-controlled LLM. In closing, we discuss the limitations of commercial foundation models for workplace use, and the methodological implications of applying participatory methods to LLM co-design.
2025-01-26 The Advanced Muon Facility: a proposed multi-purpose muon facility at Fermilab Sophie Middleton Link Charged lepton flavor violation (CLFV) is expected in a diverse set of new physics scenarios. The current generation of experiments probe CLFV in the muon sector in three complementary channels: $\mu^-N \rightarrow e^- N$ (Mu2e, COMET), $\mu^+ \rightarrow e^+ \gamma$ (MEG-II), and $\mu^+ \rightarrow e^+e^+e^-$s (Mu3e). These experiments aim to enhance existing limits by several orders-of-magnitude in the coming decade and offer discovery potential to many new physics models. The proposed Advanced Muon Facility (AMF) would be a multi-purpose muon facility based at Fermilab and introduces an innovative approach based on a muon storage ring to enable a full suite of muon CLFV experiments. AMF would host CLFV experiments with sensitivities orders-of-magnitude beyond the present era. In the event of a signal in these currently planned experiments, AMF would enable additional measurements to elucidate the nature of the new physics observed. The design and R$\&$D for AMF is in its infancy. This article outlines the motivations for AMF, detailing on-going R$\&$D efforts, and highlighting potential synergies with the proposed muon collider.
2025-01-28 Scaling laws for decoding images from brain activity Hubert Banville, Yohann Benchetrit, Stéphane d'Ascoli, Jérémy Rapin, Jean-Rémi King Link Generative AI has recently propelled the decoding of images from brain activity. How do these approaches scale with the amount and type of neural recordings? Here, we systematically compare image decoding from four types of non-invasive devices: electroencephalography (EEG), magnetoencephalography (MEG), high-field functional Magnetic Resonance Imaging (3T fMRI) and ultra-high field (7T) fMRI. For this, we evaluate decoding models on the largest benchmark to date, encompassing 8 public datasets, 84 volunteers, 498 hours of brain recording and 2.3 million brain responses to natural images. Unlike previous work, we focus on single-trial decoding performance to simulate real-time settings. This systematic comparison reveals three main findings. First, the most precise neuroimaging devices tend to yield the best decoding performances, when the size of the training sets are similar. However, the gain enabled by deep learning - in comparison to linear models - is obtained with the noisiest devices. Second, we do not observe any plateau of decoding performance as the amount of training data increases. Rather, decoding performance scales log-linearly with the amount of brain recording. Third, this scaling law primarily depends on the amount of data per subject. However, little decoding gain is observed by increasing the number of subjects. Overall, these findings delineate the path most suitable to scale the decoding of images from non-invasive brain recordings.
2025-01-21 Probing Type II Seesaw Leptogenesis Through Lepton Flavor Violation Chengcheng Han, Yijun Han, Sihui Huang, Zhanhong Lei Link Lepton flavor violation (LFV) offers a powerful probe of physics beyond the Standard Model, particularly in models addressing neutrino masses and the baryon asymmetry of the universe. In this study, we investigate LFV processes within the framework of type II seesaw leptogenesis, where the Standard Model is extended by an $SU(2)_L$ triplet Higgs field. We focus on key LFV processes including $\mu^+\to e^+\gamma$, $\mu^+ \to e^+e^-e^+$, and $\mu \rightarrow e$ conversion in nuclei, deriving stringent constraints on the parameter space from current experimental data. We scan the 3$\sigma$ range of neutrino oscillation parameters and identify the most conservative bounds consistent with existing measurements. Our results reveal that the MEG experiment currently provides the strongest constraints in the normal ordering (NO) scenario, while the SINDRUM experiment offers comparable sensitivity in the inverted ordering (IO) case. Future experiments, such as MEG II, Mu3e, Mu2e, and COMET, are predicted to significantly improve the sensitivity, testing larger regions of the parameter space. This work underscores the crucial role of LFV experiments in probing type II seesaw leptogenesis, providing an avenue to explore the connections between neutrino mass generation, baryogenesis, and inflation at experimentally accessible energy scales.
2025-01-20 Artificial Neural Networks for Magnetoencephalography: A review of an emerging field Arthur Dehgan, Hamza Abdelhedi, Vanessa Hadid, Irina Rish, Karim Jerbi Link Magnetoencephalography (MEG) is a cutting-edge neuroimaging technique that measures the intricate brain dynamics underlying cognitive processes with an unparalleled combination of high temporal and spatial precision. MEG data analytics has always relied on advanced signal processing and mathematical and statistical tools for various tasks ranging from data cleaning to probing the signals' rich dynamics and estimating the neural sources underlying the surface-level recordings. Like in most domains, the surge in Artificial Intelligence (AI) has led to the increased use of Machine Learning (ML) methods for MEG data classification. More recently, an emerging trend in this field is using Artificial Neural Networks (ANNs) to address many MEG-related tasks. This review provides a comprehensive overview of how ANNs are being used with MEG data from three vantage points: First, we review work that employs ANNs for MEG signal classification, i.e., for brain decoding. Second, we report on work that has used ANNs as putative models of information processing in the human brain. Finally, we examine studies that use ANNs as techniques to tackle methodological questions in MEG, including artifact correction and source estimation. Furthermore, we assess the current strengths and limitations of using ANNs with MEG and discuss future challenges and opportunities in this field. Finally, by establishing a detailed portrait of the field and providing practical recommendations for the future, this review seeks to provide a helpful reference for both seasoned MEG researchers and newcomers to the field who are interested in using ANNs to enhance the exploration of the complex dynamics of the human brain with MEG.

neuroAI

Publish Date Title Authors URL Abstract
2025-01-04 Asynchronous Hebbian/anti-Hebbian networks Henrique Reis Aguiar, Matthias H. Hennig Link Lateral inhibition models coupled with Hebbian plasticity have been shown to learn factorised causal representations of input stimuli, for instance, oriented edges are learned from natural images. Currently, these models require the recurrent dynamics to settle into a stable state before weight changes can be applied, which is not only biologically implausible, but also impractical for real-time learning systems. Here, we propose a new Hebbian learning rule which is implemented using plausible biological mechanisms that have been observed experimentally. We find that this rule allows for efficient, time-continuous learning of factorised representations, very similar to the classic noncontinuous Hebbian/anti-Hebbian learning. Furthermore, we show that this rule naturally prevents catastrophic forgetting when stimuli from different distributions are shown sequentially.
2024-11-27 NeuroAI for AI Safety Patrick Mineault, Niccolò Zanichelli, Joanne Zichen Peng, Anton Arkhipov, Eli Bingham, Julian Jara-Ettinger, Emily Mackevicius, Adam Marblestone, Marcelo Mattar, Andrew Payne, Sophia Sanborn, Karen Schroeder, Zenna Tavares, Andreas Tolias Link As AI systems become increasingly powerful, the need for safe AI has become more pressing. Humans are an attractive model for AI safety: as the only known agents capable of general intelligence, they perform robustly even under conditions that deviate significantly from prior experiences, explore the world safely, understand pragmatics, and can cooperate to meet their intrinsic goals. Intelligence, when coupled with cooperation and safety mechanisms, can drive sustained progress and well-being. These properties are a function of the architecture of the brain and the learning algorithms it implements. Neuroscience may thus hold important keys to technical AI safety that are currently underexplored and underutilized. In this roadmap, we highlight and critically evaluate several paths toward AI safety inspired by neuroscience: emulating the brain's representations, information processing, and architecture; building robust sensory and motor systems from imitating brain data and bodies; fine-tuning AI systems on brain data; advancing interpretability using neuroscience methods; and scaling up cognitively-inspired architectures. We make several concrete recommendations for how neuroscience can positively impact AI safety.
2024-11-21 Evaluating Representational Similarity Measures from the Lens of Functional Correspondence Yiqing Bo, Ansh Soni, Sudhanshu Srivastava, Meenakshi Khosla Link Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data, where the comparative analysis of such data is crucial for revealing shared mechanisms and differences between these complex systems. Despite the widespread use of representational comparisons and the abundance classes of comparison methods, a critical question remains: which metrics are most suitable for these comparisons? While some studies evaluate metrics based on their ability to differentiate models of different origins or constructions (e.g., various architectures), another approach is to assess how well they distinguish models that exhibit distinct behaviors. To investigate this, we examine the degree of alignment between various representational similarity measures and behavioral outcomes, employing group statistics and a comprehensive suite of behavioral metrics for comparison. In our evaluation of eight commonly used representational similarity metrics in the visual domain -- spanning alignment-based, Canonical Correlation Analysis (CCA)-based, inner product kernel-based, and nearest-neighbor methods -- we found that metrics like linear Centered Kernel Alignment (CKA) and Procrustes distance, which emphasize the overall geometric structure or shape of representations, excelled in differentiating trained from untrained models and aligning with behavioral measures, whereas metrics such as linear predictivity, commonly used in neuroscience, demonstrated only moderate alignment with behavior. These insights are crucial for selecting metrics that emphasize behaviorally meaningful comparisons in NeuroAI research.
2024-10-25 A prescriptive theory for brain-like inference Hadi Vafaii, Dekel Galor, Jacob L. Yates Link The Evidence Lower Bound (ELBO) is a widely used objective for training deep generative models, such as Variational Autoencoders (VAEs). In the neuroscience literature, an identical objective is known as the variational free energy, hinting at a potential unified framework for brain function and machine learning. Despite its utility in interpreting generative models, including diffusion models, ELBO maximization is often seen as too broad to offer prescriptive guidance for specific architectures in neuroscience or machine learning. In this work, we show that maximizing ELBO under Poisson assumptions for general sequence data leads to a spiking neural network that performs Bayesian posterior inference through its membrane potential dynamics. The resulting model, the iterative Poisson VAE (iP-VAE), has a closer connection to biological neurons than previous brain-inspired predictive coding models based on Gaussian assumptions. Compared to amortized and iterative VAEs, iP-VAElearns sparser representations and exhibits superior generalization to out-of-distribution samples. These findings suggest that optimizing ELBO, combined with Poisson assumptions, provides a solid foundation for developing prescriptive theories in NeuroAI.
2024-09-09 Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models Emily Cheng, Richard J. Antonello Link Research has repeatedly demonstrated that intermediate hidden states extracted from large language models are able to predict measured brain response to natural language stimuli. Yet, very little is known about the representation properties that enable this high prediction performance. Why is it the intermediate layers, and not the output layers, that are most capable for this unique and highly general transfer task? In this work, we show that evidence from language encoding models in fMRI supports the existence of a two-phase abstraction process within LLMs. We use manifold learning methods to show that this abstraction process naturally arises over the course of training a language model and that the first "composition" phase of this abstraction process is compressed into fewer layers as training continues. Finally, we demonstrate a strong correspondence between layerwise encoding performance and the intrinsic dimensionality of representations from LLMs. We give initial evidence that this correspondence primarily derives from the inherent compositionality of LLMs and not their next-word prediction properties.
2024-07-22 Predictive Coding Networks and Inference Learning: Tutorial and Survey Björn van Zwol, Ro Jefferson, Egon L. van den Broek Link Recent years have witnessed a growing call for renewed emphasis on neuroscience-inspired approaches in artificial intelligence research, under the banner of NeuroAI. A prime example of this is predictive coding networks (PCNs), based on the neuroscientific framework of predictive coding. This framework views the brain as a hierarchical Bayesian inference model that minimizes prediction errors through feedback connections. Unlike traditional neural networks trained with backpropagation (BP), PCNs utilize inference learning (IL), a more biologically plausible algorithm that explains patterns of neural activity that BP cannot. Historically, IL has been more computationally intensive, but recent advancements have demonstrated that it can achieve higher efficiency than BP with sufficient parallelization. Furthermore, PCNs can be mathematically considered a superset of traditional feedforward neural networks (FNNs), significantly extending the range of trainable architectures. As inherently probabilistic (graphical) latent variable models, PCNs provide a versatile framework for both supervised learning and unsupervised (generative) modeling that goes beyond traditional artificial neural networks. This work provides a comprehensive review and detailed formal specification of PCNs, particularly situating them within the context of modern ML methods. Additionally, we introduce a Python library (PRECO) for practical implementation. This positions PC as a promising framework for future ML innovations.
2023-10-29 Beyond Geometry: Comparing the Temporal Structure of Computation in Neural Circuits with Dynamical Similarity Analysis Mitchell Ostrow, Adam Eisen, Leo Kozachkov, Ila Fiete Link How can we tell whether two neural networks utilize the same internal processes for a particular computation? This question is pertinent for multiple subfields of neuroscience and machine learning, including neuroAI, mechanistic interpretability, and brain-machine interfaces. Standard approaches for comparing neural networks focus on the spatial geometry of latent states. Yet in recurrent networks, computations are implemented at the level of dynamics, and two networks performing the same computation with equivalent dynamics need not exhibit the same geometry. To bridge this gap, we introduce a novel similarity metric that compares two systems at the level of their dynamics, called Dynamical Similarity Analysis (DSA). Our method incorporates two components: Using recent advances in data-driven dynamical systems theory, we learn a high-dimensional linear system that accurately captures core features of the original nonlinear dynamics. Next, we compare different systems passed through this embedding using a novel extension of Procrustes Analysis that accounts for how vector fields change under orthogonal transformation. In four case studies, we demonstrate that our method disentangles conjugate and non-conjugate recurrent neural networks (RNNs), while geometric methods fall short. We additionally show that our method can distinguish learning rules in an unsupervised manner. Our method opens the door to comparative analyses of the essential temporal structure of computation in neural circuits.
2023-05-25 Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture Galen Pogoncheff, Jacob Granley, Michael Beyeler Link Convolutional neural networks (CNNs) have recently emerged as promising models of the ventral visual stream, despite their lack of biological specificity. While current state-of-the-art models of the primary visual cortex (V1) have surfaced from training with adversarial examples and extensively augmented data, these models are still unable to explain key neural properties observed in V1 that arise from biological circuitry. To address this gap, we systematically incorporated neuroscience-derived architectural components into CNNs to identify a set of mechanisms and architectures that comprehensively explain neural activity in V1. We show drastic improvements in model-V1 alignment driven by the integration of architectural components that simulate center-surround antagonism, local receptive fields, tuned normalization, and cortical magnification. Upon enhancing task-driven CNNs with a collection of these specialized components, we uncover models with latent representations that yield state-of-the-art explanation of V1 neural activity and tuning properties. Our results highlight an important advancement in the field of NeuroAI, as we systematically establish a set of architectural components that contribute to unprecedented explanation of V1. The neuroscience insights that could be gleaned from increasingly accurate in-silico models of the brain have the potential to greatly advance the fields of both neuroscience and artificial intelligence.
2024-11-09 A Deep Probabilistic Spatiotemporal Framework for Dynamic Graph Representation Learning with Application to Brain Disorder Identification Sin-Yee Yap, Junn Yong Loo, Chee-Ming Ting, Fuad Noman, Raphael C. -W. Phan, Adeel Razi, David L. Dowe Link Recent applications of pattern recognition techniques on brain connectome classification using functional connectivity (FC) are shifting towards acknowledging the non-Euclidean topology and dynamic aspects of brain connectivity across time. In this paper, a deep spatiotemporal variational Bayes (DSVB) framework is proposed to learn time-varying topological structures in dynamic FC networks for identifying autism spectrum disorder (ASD) in human participants. The framework incorporates a spatial-aware recurrent neural network with an attention-based message passing scheme to capture rich spatiotemporal patterns across dynamic FC networks. To overcome model overfitting on limited training datasets, an adversarial training strategy is introduced to learn graph embedding models that generalize well to unseen brain networks. Evaluation on the ABIDE resting-state functional magnetic resonance imaging dataset shows that our proposed framework substantially outperforms state-of-the-art methods in identifying patients with ASD. Dynamic FC analyses with DSVB-learned embeddings reveal apparent group differences between ASD and healthy controls in brain network connectivity patterns and switching dynamics of brain states. The code is available at https://github.com/Monash-NeuroAI/Deep-Spatiotemporal-Variational-Bayes.
2023-03-11 Towards NeuroAI: Introducing Neuronal Diversity into Artificial Neural Networks Feng-Lei Fan, Yingxin Li, Hanchuan Peng, Tieyong Zeng, Fei Wang Link Throughout history, the development of artificial intelligence, particularly artificial neural networks, has been open to and constantly inspired by the increasingly deepened understanding of the brain, such as the inspiration of neocognitron, which is the pioneering work of convolutional neural networks. Per the motives of the emerging field: NeuroAI, a great amount of neuroscience knowledge can help catalyze the next generation of AI by endowing a network with more powerful capabilities. As we know, the human brain has numerous morphologically and functionally different neurons, while artificial neural networks are almost exclusively built on a single neuron type. In the human brain, neuronal diversity is an enabling factor for all kinds of biological intelligent behaviors. Since an artificial network is a miniature of the human brain, introducing neuronal diversity should be valuable in terms of addressing those essential problems of artificial networks such as efficiency, interpretability, and memory. In this Primer, we first discuss the preliminaries of biological neuronal diversity and the characteristics of information transmission and processing in a biological neuron. Then, we review studies of designing new neurons for artificial networks. Next, we discuss what gains can neuronal diversity bring into artificial networks and exemplary applications in several important fields. Lastly, we discuss the challenges and future directions of neuronal diversity to explore the potential of NeuroAI.

medical

Publish Date Title Authors URL Abstract
2025-02-19 Application of autoresonance in rapid beam extraction of synchrotrons X. Ding, S. Ruan, H. Ren, G. Wang, R. H. Zhu, J. C. Yang, H. Zhao Link In recent years, the ultra-high dose rate (FLASH) radiotherapy has become a novel cancer treatment technique because of its similar tumor-killing efficacy as conventional particle therapy while significantly protecting normal tissues. However, due to the limitation of number of particles, achieving FLASH effect in a compact heavy-ion synchrotron requires a short extraction time of tens of milliseconds, which is challenging for conventional RF-KO method. The conventional RF-KO method excites particle amplitudes through local resonance, but the extremely high dose rates required for FLASH therapy necessitate large excitation amplitudes that current driver technologies cannot adequately provide within the required millisecond scale. To address this challenge, this paper introduces autoresonance into resonant extraction for the first time. Leveraging the principles of autoresonance, this technique achieves whole beam amplitude resonance through frequency locking, facilitating rapid and uniform beam extraction using a single-period frequency sweeping excitation. Compared to conventional slow extraction methods, this innovative approach requires only the addition of an octupole magnet. We derived the autoresonance threshold under rapid extraction conditions and conducted some simulations. Simulation results confirm the theoretical analysis, demonstrating that this method can achieve millisecond scale extraction, validating the effectiveness of the proposed Autoresonance Rapid Extraction method.
2025-02-19 MSVCOD:A Large-Scale Multi-Scene Dataset for Video Camouflage Object Detection Shuyong Gao, Yu'ang Feng, Qishan Wang, Lingyi Hong, Xinyu Zhou, Liu Fei, Yan Wang, Wenqiang Zhang Link Video Camouflaged Object Detection (VCOD) is a challenging task which aims to identify objects that seamlessly concealed within the background in videos. The dynamic properties of video enable detection of camouflaged objects through motion cues or varied perspectives. Previous VCOD datasets primarily contain animal objects, limiting the scope of research to wildlife scenarios. However, the applications of VCOD extend beyond wildlife and have significant implications in security, art, and medical fields. Addressing this problem, we construct a new large-scale multi-domain VCOD dataset MSVCOD. To achieve high-quality annotations, we design a semi-automatic iterative annotation pipeline that reduces costs while maintaining annotation accuracy. Our MSVCOD is the largest VCOD dataset to date, introducing multiple object categories including human, animal, medical, and vehicle objects for the first time, while also expanding background diversity across various environments. This expanded scope increases the practical applicability of the VCOD task in camouflaged object detection. Alongside this dataset, we introduce a one-steam video camouflage object detection model that performs both feature extraction and information fusion without additional motion feature fusion modules. Our framework achieves state-of-the-art results on the existing VCOD animal dataset and the proposed MSVCOD. The dataset and code will be made publicly available.
2025-02-19 MGFI-Net: A Multi-Grained Feature Integration Network for Enhanced Medical Image Segmentation Yucheng Zeng Link Medical image segmentation plays a crucial role in various clinical applications. A major challenge in medical image segmentation is achieving accurate delineation of regions of interest in the presence of noise, low contrast, or complex anatomical structures. Existing segmentation models often neglect the integration of multi-grained information and fail to preserve edge details, which are critical for precise segmentation. To address these challenges, we propose a novel image semantic segmentation model called the Multi-Grained Feature Integration Network (MGFI-Net). Our MGFI-Net is designed with two dedicated modules to tackle these issues. First, to enhance segmentation accuracy, we introduce a Multi-Grained Feature Extraction Module, which leverages hierarchical relationships between different feature scales to selectively focus on the most relevant information. Second, to preserve edge details, we incorporate an Edge Enhancement Module that effectively retains and integrates boundary information to refine segmentation results. Extensive experiments demonstrate that MGFI-Net not only outperforms state-of-the-art methods in terms of segmentation accuracy but also achieves superior time efficiency, establishing it as a leading solution for real-time medical image segmentation.
2025-02-19 Muscle Activation Estimation by Optimzing the Musculoskeletal Model for Personalized Strength and Conditioning Training Xi Wu, Chenzui Li, Kehan Zou, Ning Xi, Fei Chen Link Musculoskeletal models are pivotal in the domains of rehabilitation and resistance training to analyze muscle conditions. However, individual variability in musculoskeletal parameters and the immeasurability of some internal biomechanical variables pose significant obstacles to accurate personalized modelling. Furthermore, muscle activation estimation can be challenging due to the inherent redundancy of the musculoskeletal system, where multiple muscles drive a single joint. This study develops a whole-body musculoskeletal model for strength and conditioning training and calibrates relevant muscle parameters with an electromyography-based optimization method. By utilizing the personalized musculoskeletal model, muscle activation can be subsequently estimated to analyze the performance of exercises. Bench press and deadlift are chosen for experimental verification to affirm the efficacy of this approach.
2025-02-19 Medical Image Classification with KAN-Integrated Transformers and Dilated Neighborhood Attention Omid Nejati Manzari, Hojat Asgariandehkordi, Taha Koleilat, Yiming Xiao, Hassan Rivaz Link Convolutional networks, transformers, hybrid models, and Mamba-based architectures have demonstrated strong performance across various medical image classification tasks. However, these methods were primarily designed to classify clean images using labeled data. In contrast, real-world clinical data often involve image corruptions that are unique to multi-center studies and stem from variations in imaging equipment across manufacturers. In this paper, we introduce the Medical Vision Transformer (MedViTV2), a novel architecture incorporating Kolmogorov-Arnold Network (KAN) layers into the transformer architecture for the first time, aiming for generalized medical image classification. We have developed an efficient KAN block to reduce computational load while enhancing the accuracy of the original MedViT. Additionally, to counteract the fragility of our MedViT when scaled up, we propose an enhanced Dilated Neighborhood Attention (DiNA), an adaptation of the efficient fused dot-product attention kernel capable of capturing global context and expanding receptive fields to scale the model effectively and addressing feature collapse issues. Moreover, a hierarchical hybrid strategy is introduced to stack our Local Feature Perception and Global Feature Perception blocks in an efficient manner, which balances local and global feature perceptions to boost performance. Extensive experiments on 17 medical image classification datasets and 12 corrupted medical image datasets demonstrate that MedViTV2 achieved state-of-the-art results in 27 out of 29 experiments with reduced computational complexity. MedViTV2 is 44\% more computationally efficient than the previous version and significantly enhances accuracy, achieving improvements of 4.6\% on MedMNIST, 5.8\% on NonMNIST, and 13.4\% on the MedMNIST-C benchmark.
2025-02-19 Pericoronary adipose tissue attenuation as a predictor of functional severity of coronary stenosis Marta Pillitteri, Guido Nannini, Simone Saitta, Luca Mariani, Riccardo Maragna, Andrea Baggiano, Gianluca Pontone, Alberto Redaelli Link Objective: This study aims to evaluate the functional significance of coronary stenosis by analyzing low-level radiomic features of the pericoronary adipose tissue (PCAT) surrounding the lesions, which are indicative of its inflammation status. Methods: A dataset of 72 patients who underwent coronary computed tomography angiography (CCTA) was analyzed, with 3D segmentation and computational fluid dynamics (CFD) simulations from a prior study. Centerlines of the main epicardial branches were automatically extracted, and lesions identified using Gaussian kernel regression to estimate healthy branch caliber. PCAT features were computed per vessel following guideline recommendations and per lesion within a region extending radially for two vessel radii. Features like fat volume and mean attenuation (FAI) were analyzed for their relationship with CFD-derived hemodynamic biomarkers, such as fractional flow reserve (FFR) and wall shear stress (WSS). These features also informed a machine learning (ML) model for classifying potentially ischemic lesions. Results: PCAT exhibited, on average, higher attenuation in the presence of hemodynamically significant lesions (i.e., FFR < 0.80), although this difference was of limited statistical significance. The ML classifier, trained on PCAT features, successfully distinguished potentially ischemic lesions, yielding average accuracy of 0.84. Conclusion: PCAT attenuation is correlated with the functional status of coronary stenosis and can be used to inform ML models for predicting potential ischemia. Significance: PCAT features, readily available from CCTA, can be used to predict the hemodynamic characteristics of a lesion without the need for an invasive FFR examination.
2025-02-19 MobileViM: A Light-weight and Dimension-independent Vision Mamba for 3D Medical Image Analysis Wei Dai, Steven Wang, Jun Liu Link Efficient evaluation of three-dimensional (3D) medical images is crucial for diagnostic and therapeutic practices in healthcare. Recent years have seen a substantial uptake in applying deep learning and computer vision to analyse and interpret medical images. Traditional approaches, such as convolutional neural networks (CNNs) and vision transformers (ViTs), face significant computational challenges, prompting the need for architectural advancements. Recent efforts have led to the introduction of novel architectures like the ``Mamba'' model as alternative solutions to traditional CNNs or ViTs. The Mamba model excels in the linear processing of one-dimensional data with low computational demands. However, Mamba's potential for 3D medical image analysis remains underexplored and could face significant computational challenges as the dimension increases. This manuscript presents MobileViM, a streamlined architecture for efficient segmentation of 3D medical images. In the MobileViM network, we invent a new dimension-independent mechanism and a dual-direction traversing approach to incorporate with a vision-Mamba-based framework. MobileViM also features a cross-scale bridging technique to improve efficiency and accuracy across various medical imaging modalities. With these enhancements, MobileViM achieves segmentation speeds exceeding 90 frames per second (FPS) on a single graphics processing unit (i.e., NVIDIA RTX 4090). This performance is over 24 FPS faster than the state-of-the-art deep learning models for processing 3D images with the same computational resources. In addition, experimental evaluations demonstrate that MobileViM delivers superior performance, with Dice similarity scores reaching 92.72%, 86.69%, 80.46%, and 77.43% for PENGWIN, BraTS2024, ATLAS, and Toothfairy2 datasets, respectively, which significantly surpasses existing models.
2025-02-19 Unlocking Multimodal Integration in EHRs: A Prompt Learning Framework for Language and Time Series Fusion Shuai Niu, Jing Ma, Hongzhan Lin, Liang Bai, Zhihua Wang, Wei Bi, Yida Xu, Guo Li, Xian Yang Link Large language models (LLMs) have shown remarkable performance in vision-language tasks, but their application in the medical field remains underexplored, particularly for integrating structured time series data with unstructured clinical notes. In clinical practice, dynamic time series data such as lab test results capture critical temporal patterns, while clinical notes provide rich semantic context. Merging these modalities is challenging due to the inherent differences between continuous signals and discrete text. To bridge this gap, we introduce ProMedTS, a novel self-supervised multimodal framework that employs prompt-guided learning to unify these heterogeneous data types. Our approach leverages lightweight anomaly detection to generate anomaly captions that serve as prompts, guiding the encoding of raw time series data into informative embeddings. These embeddings are aligned with textual representations in a shared latent space, preserving fine-grained temporal nuances alongside semantic insights. Furthermore, our framework incorporates tailored self-supervised objectives to enhance both intra- and inter-modal alignment. We evaluate ProMedTS on disease diagnosis tasks using real-world datasets, and the results demonstrate that our method consistently outperforms state-of-the-art approaches.
2025-02-19 Enhancing Chest X-ray Classification through Knowledge Injection in Cross-Modality Learning Yang Yan, Bingqing Yue, Qiaxuan Li, Man Huang, Jingyu Chen, Zhenzhong Lan Link The integration of artificial intelligence in medical imaging has shown tremendous potential, yet the relationship between pre-trained knowledge and performance in cross-modality learning remains unclear. This study investigates how explicitly injecting medical knowledge into the learning process affects the performance of cross-modality classification, focusing on Chest X-ray (CXR) images. We introduce a novel Set Theory-based knowledge injection framework that generates captions for CXR images with controllable knowledge granularity. Using this framework, we fine-tune CLIP model on captions with varying levels of medical information. We evaluate the model's performance through zero-shot classification on the CheXpert dataset, a benchmark for CXR classification. Our results demonstrate that injecting fine-grained medical knowledge substantially improves classification accuracy, achieving 72.5\% compared to 49.9\% when using human-generated captions. This highlights the crucial role of domain-specific knowledge in medical cross-modality learning. Furthermore, we explore the influence of knowledge density and the use of domain-specific Large Language Models (LLMs) for caption generation, finding that denser knowledge and specialized LLMs contribute to enhanced performance. This research advances medical image analysis by demonstrating the effectiveness of knowledge injection for improving automated CXR classification, paving the way for more accurate and reliable diagnostic tools.
2025-02-19 RGAR: Recurrence Generation-augmented Retrieval for Factual-aware Medical Question Answering Sichu Liang, Linhai Zhang, Hongyu Zhu, Wenwen Wang, Yulan He, Deyu Zhou Link Medical question answering requires extensive access to specialized conceptual knowledge. The current paradigm, Retrieval-Augmented Generation (RAG), acquires expertise medical knowledge through large-scale corpus retrieval and uses this knowledge to guide a general-purpose large language model (LLM) for generating answers. However, existing retrieval approaches often overlook the importance of factual knowledge, which limits the relevance of retrieved conceptual knowledge and restricts its applicability in real-world scenarios, such as clinical decision-making based on Electronic Health Records (EHRs). This paper introduces RGAR, a recurrence generation-augmented retrieval framework that retrieves both relevant factual and conceptual knowledge from dual sources (i.e., EHRs and the corpus), allowing them to interact and refine each another. Through extensive evaluation across three factual-aware medical question answering benchmarks, RGAR establishes a new state-of-the-art performance among medical RAG systems. Notably, the Llama-3.1-8B-Instruct model with RGAR surpasses the considerably larger, RAG-enhanced GPT-3.5. Our findings demonstrate the benefit of extracting factual knowledge for retrieval, which consistently yields improved generation quality.