zezhishao/DailyArXiv

Daily Papers

The project automatically fetches the latest papers from arXiv based on keywords.

The subheadings in the README file represent the search keywords.

Only the most recent articles for each keyword are retained, up to a maximum of 100 papers.

You can click the 'Watch' button to receive daily email notifications.

Last update: 2024-12-23

Time Series

Title Date Abstract Comment
DroughtSet: Understanding Drought Through Spatial-Temporal Learning 2024-12-19

Drought is one of the most destructive and expensive natural disasters, severely impacting natural resources and risks by depleting water resources and diminishing agricultural yields. Under climate change, accurately predicting drought is critical for mitigating drought-induced risks. However, the intricate interplay among the physical and biological drivers that regulate droughts limits the predictability and understanding of drought, particularly at a subseasonal to seasonal (S2S) time scale. While deep learning has demonstrated potential in addressing climate forecasting challenges, its application to drought prediction has received relatively less attention. In this work, we propose a new dataset, DroughtSet, which integrates relevant predictive features and three drought indices from multiple remote sensing and reanalysis datasets across the contiguous United States (CONUS). DroughtSet specifically provides the machine learning community with a new real-world dataset to benchmark drought prediction models and, more generally, time-series forecasting methods. Furthermore, we propose a spatial-temporal model SPDrought to predict and interpret S2S droughts. Our model learns from the spatial and temporal information of physical and biological features to predict three types of droughts simultaneously. Multiple strategies are employed to quantify the importance of physical and biological features for drought prediction. Our results provide insights for researchers to better understand the predictability and sensitivity of drought to biological and physical conditions. We aim to contribute to the climate field by proposing a new tool to predict and understand the occurrence of droughts and provide the AI community with a new benchmark to study deep learning applications in climate science.

Accepted by AAAI25

Clustering of timed sequences -- Application to the analysis of care pathways 2024-12-19

Improving the future of healthcare starts by better understanding the current actual practices in hospital settings. This motivates the objective of discovering typical care pathways from patient data. Revealing typical care pathways can be achieved through clustering. The difficulty in clustering care pathways, represented by sequences of timestamped events, lies in defining a semantically appropriate metric and clustering algorithms. In this article, we adapt two methods developed for time series to the clustering of timed sequences: the drop-DTW metric and the DBA approach for the construction of averaged time sequences. These methods are then applied in clustering algorithms to propose original and sound clustering algorithms for timed sequences. This approach is experimented with and evaluated on synthetic and real-world data.
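
For illustration, here is a small, hedged sketch of DTW-based k-means clustering of timed sequences, assuming the third-party tslearn package and synthetic data; the drop-DTW metric and the adapted DBA averaging described in the paper are not part of tslearn and are not reproduced here.

```python
# Hedged sketch: DTW-based k-means over synthetic "care pathway" sequences (tslearn assumed).
import numpy as np
from tslearn.clustering import TimeSeriesKMeans

rng = np.random.default_rng(6)
# 40 synthetic pathways encoded as short sequences of cumulative event times.
series = np.stack([np.cumsum(rng.integers(1, 4, size=12)) for _ in range(40)]).astype(float)
series = series[:, :, None]                      # tslearn expects (n_ts, length, dim)

km = TimeSeriesKMeans(n_clusters=3, metric="dtw", random_state=0)
labels = km.fit_predict(series)
print(np.bincount(labels))                       # cluster sizes
```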

Unveiling social vibrancy in urban spaces with app usage 2024-12-19

Urban vibrancy is an important measure of the energetic nature of a city that is related to why and how people use urban spaces, and it is inherently connected with our social behaviour. Increasingly, people use a wide range of mobile phone apps in their daily lives to connect socially, search for information, make decisions, and arrange travel, amongst many other reasons. However, the relationship between online app usage and urban vibrancy remains unclear, particularly regarding how sociospatial behaviours interact with urban features. Here, we use app-usage data as a digital signature to investigate this question. To do this, we use a high-resolution data source of mobile service-level traffic volumes across eighteen cities in France. We investigate the social component of cities using socially relevant urban features constructed from OpenStreetMap 'Points of Interest'. We developed a methodology for identifying and classifying multidimensional app usage time series based on similarity. We used these in predictive models to interpret the results for each city and across France. Across cities, there were spatial behavioural archetypes, characterised by multidimensional properties. We found patterns between the week and the weekend, and across cities, and the country. These archetypes correspond to changes in socially relevant urban features that impact urban vibrancy. Our results add further evidence for the importance of using computational approaches to understand urban environments, the use of sociological concepts in computational science, and urban vibrancy in cities.

40 pages, 10 figures. Submitted to Computers, Environment and Urban Systems

An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering Algorithm 2024-12-19

The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level; however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. We applied a hybridization of clustering techniques to analyze data categorization and a topological data analysis method to enhance the performance of our clustering analysis. Specifically, our approach utilized persistent homology and $k$-means/$k$-means++ clustering. The fourteen datasets from fourteen tide gauge stations were categorized into classes based on a prior categorization determined by topological information and on the probability that data points belong to certain groups, as yielded by $k$-means/$k$-means++ clustering. Our results demonstrated that our method significantly improves the performance of traditional clustering techniques.

the paper contains errors

Hybridization of Persistent Homology with Neural Networks for Time-Series Prediction: A Case Study in Wave Height 2024-12-19

Time-series prediction is an active area of research across various fields, often challenged by the fluctuating influence of short-term and long-term factors. In this study, we introduce a feature engineering method that enhances the predictive performance of neural network models. Specifically, we leverage computational topology techniques to derive valuable topological features from input data, boosting the predictive accuracy of our models. Our focus is on predicting wave heights, utilizing models based on topological features within feedforward neural networks (FNNs), recurrent neural networks (RNNs), long short-term memory networks (LSTM), and RNNs with gated recurrent units (GRU). For time-ahead predictions, the enhancements in $R^2$ score were significant for FNNs, RNNs, LSTM, and GRU models. Additionally, these models also showed significant reductions in maximum errors and mean squared errors.

the paper contains errors

A Comprehensive Forecasting Framework based on Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment 2024-12-19

Ads demand forecasting for Walmart's ad products plays a critical role in enabling effective resource planning, allocation, and management of ads performance. In this paper, we introduce a comprehensive demand forecasting system that tackles hierarchical time series forecasting in business settings. Though traditional hierarchical reconciliation methods ensure forecasting coherence, they often trade off accuracy for coherence especially at lower levels and fail to capture the seasonality unique to each time-series in the hierarchy. Thus, we propose a novel framework "Multi-Stage Hierarchical Forecasting Reconciliation and Adjustment (Multi-Stage HiFoReAd)" to address the challenges of preserving seasonality, ensuring coherence, and improving accuracy. Our system first utilizes diverse models, ensembled through Bayesian Optimization (BO), achieving base forecasts. The generated base forecasts are then passed into the Multi-Stage HiFoReAd framework. The initial stage refines the hierarchy using Top-Down forecasts and "harmonic alignment." The second stage aligns the higher levels' forecasts using MinTrace algorithm, following which the last two levels undergo "harmonic alignment" and "stratified scaling", to eventually achieve accurate and coherent forecasts across the whole hierarchy. Our experiments on Walmart's internal Ads-demand dataset and 3 other public datasets, each with 4 hierarchical levels, demonstrate that the average Absolute Percentage Error from the cross-validation sets improve from 3% to 40% across levels against BO-ensemble of models (LGBM, MSTL+ETS, Prophet) as well as from 1.2% to 92.9% against State-Of-The-Art models. In addition, the forecasts at all hierarchical levels are proved to be coherent. The proposed framework has been deployed and leveraged by Walmart's ads, sales and operations teams to track future demands, make informed decisions and plan resources.

Published in 2024 IEEE International Conference on Big Data (BigData)

Learning Deep Dissipative Dynamics 2024-12-19

This study challenges strictly guaranteeing "dissipativity" of a dynamical system represented by neural networks learned from given time-series data. Dissipativity is a crucial indicator for dynamical systems that generalizes stability and input-output stability, known to be valid across various systems including robotics, biological systems, and molecular dynamics. By analytically proving the general solution to the nonlinear Kalman-Yakubovich-Popov (KYP) lemma, which is the necessary and sufficient condition for dissipativity, we propose a differentiable projection that transforms any dynamics represented by neural networks into dissipative ones and a learning method for the transformed dynamics. Utilizing the generality of dissipativity, our method strictly guarantees stability, input-output stability, and energy conservation of trained dynamical systems. Finally, we demonstrate the robustness of our method against out-of-domain input through applications to robotic arms and fluid dynamics. Code is available at https://github.com/kojima-r/DeepDissipativeModel

AAAI 2025

DualDynamics: Synergizing Implicit and Explicit Methods for Robust Irregular Time Series Analysis 2024-12-19

Real-world time series analysis faces significant challenges when dealing with irregular and incomplete data. While Neural Differential Equation (NDE) based methods have shown promise, they struggle with limited expressiveness, scalability issues, and stability concerns. Conversely, Neural Flows offer stability but falter with irregular data. We introduce 'DualDynamics', a novel framework that synergistically combines NDE-based and Neural Flow-based methods. This approach enhances expressive power while balancing computational demands, addressing critical limitations of existing techniques. We demonstrate DualDynamics' effectiveness across diverse tasks: classification with robustness to dataset shift, irregularly-sampled series analysis, interpolation of missing data, and forecasting with partial observations. Our results show consistent outperformance over state-of-the-art methods, indicating DualDynamics' potential to advance irregular time series analysis significantly.

Published at the 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025)

Leveraging Time Series Categorization and Temporal Fusion Transformers to Improve Cryptocurrency Price Forecasting 2024-12-19

Organizing and managing cryptocurrency portfolios and decision-making on transactions is crucial in this market. Optimal selection of assets is one of the main challenges that requires accurate prediction of the price of cryptocurrencies. In this work, we categorize the financial time series into several similar subseries to increase prediction accuracy by learning each subseries category with similar behavior. For each category of the subseries, we create a deep learning model based on the attention mechanism to predict the next step of each subseries. Due to the limited amount of cryptocurrency data for training models, if the number of categories increases, the amount of training data for each model will decrease, and some complex models will not be trained well due to the large number of parameters. To overcome this challenge, we propose to combine the time series data of other cryptocurrencies to increase the amount of data for each category, hence increasing the accuracy of the models corresponding to each category.

Cherry-Picking in Time Series Forecasting: How to Select Datasets to Make Your Model Shine 2024-12-19

The importance of time series forecasting drives continuous research and the development of new approaches to tackle this problem. Typically, these methods are introduced through empirical studies that frequently claim superior accuracy for the proposed approaches. Nevertheless, concerns are rising about the reliability and generalizability of these results due to limitations in experimental setups. This paper addresses a critical limitation: the number and representativeness of the datasets used. We investigate the impact of dataset selection bias, particularly the practice of cherry-picking datasets, on the performance evaluation of forecasting methods. Through empirical analysis with a diverse set of benchmark datasets, our findings reveal that cherry-picking datasets can significantly distort the perceived performance of methods, often exaggerating their effectiveness. Furthermore, our results demonstrate that by selectively choosing just four datasets - what most studies report - 46% of methods could be deemed best in class, and 77% could rank within the top three. Additionally, recent deep learning-based approaches show high sensitivity to dataset selection, whereas classical methods exhibit greater robustness. Finally, our results indicate that, when empirically validating forecasting algorithms on a subset of the benchmarks, increasing the number of datasets tested from 3 to 6 reduces the risk of incorrectly identifying an algorithm as the best one by approximately 40%. Our study highlights the critical need for comprehensive evaluation frameworks that more accurately reflect real-world scenarios. Adopting such frameworks will ensure the development of robust and reliable forecasting methods.

Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI-25), February 25-March 4, 2025, Philadelphia, Pennsylvania, USA
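
The cherry-picking effect described above is easy to reproduce in a toy simulation. The sketch below uses purely synthetic error scores (no real benchmark data or method names) and counts how many methods could be presented as "best" by choosing a favorable four-dataset subset.

```python
# Minimal, self-contained simulation of dataset cherry-picking; all numbers are synthetic.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n_methods, n_datasets = 10, 12

# Hypothetical error scores: rows = forecasting methods, columns = datasets.
errors = rng.normal(loc=1.0, scale=0.15, size=(n_methods, n_datasets))

# How many methods can be made to look "best in class" by picking only 4 datasets?
subset_size = 4
can_be_best = set()
for subset in combinations(range(n_datasets), subset_size):
    mean_err = errors[:, subset].mean(axis=1)
    can_be_best.add(int(np.argmin(mean_err)))

print(f"{len(can_be_best)} of {n_methods} methods can appear best "
      f"on some {subset_size}-dataset selection")
```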

DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend 2024-12-18

Measuring distance or similarity between time-series data is a fundamental aspect of many applications including classification, clustering, and ensembling/alignment. Existing measures may fail to capture similarities among local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series have a sequence of meaningful local trends that are ordered, such as in epidemics (a surge to an increase to a peak to a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then it applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in several tasks. For the clustering of epidemic curves, we show that DTW+S is the only measure able to produce good clustering compared to the baselines. For ensemble building, we propose a combination of DTW+S and barycenter averaging that results in the best preservation of characteristics of the underlying trajectories. We also demonstrate that our approach results in better classification compared to Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.

Longer version of the paper "Aligning Time-series by Local Trends: Applications in Public Health" accepted at The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025)
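
For orientation, the sketch below shows the standard Dynamic Time Warping distance that DTW+S builds on; the paper's trend-matrix ("closeness-preserving") representation is not reproduced here, and the example series are synthetic.

```python
# Classic DTW distance, the building block applied by DTW+S to its trend-matrix columns.
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Standard DTW with a squared-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(np.sqrt(cost[n, m]))

x = np.sin(np.linspace(0, 4 * np.pi, 60))
y = np.sin(np.linspace(0, 4 * np.pi, 60) + 0.5)   # phase-shifted copy
print(dtw_distance(x, y))
```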

Spatio-Temporal Forecasting of PM2.5 via Spatial-Diffusion guided Encoder-Decoder Architecture 2024-12-18

In many problem settings that require spatio-temporal forecasting, the values in the time-series not only exhibit spatio-temporal correlations but are also influenced by spatial diffusion across locations. One such example is forecasting the concentration of fine particulate matter (PM2.5) in the atmosphere which is influenced by many complex factors, the most important ones being diffusion due to meteorological factors as well as transport across vast distances over a period of time. We present a novel Spatio-Temporal Graph Neural Network architecture, that specifically captures these dependencies to forecast the PM2.5 concentration. Our model is based on an encoder-decoder architecture where the encoder and decoder parts leverage gated recurrent units (GRU) augmented with a graph neural network (TransformerConv) to account for spatial diffusion. Our model can also be seen as a generalization of various existing models for time-series or spatio-temporal forecasting. We demonstrate the model's effectiveness on two real-world PM2.5 datasets: (1) data collected by us using a recently deployed network of low-cost PM$_{2.5}$ sensors from 511 locations spanning the entirety of the Indian state of Bihar over a period of one year, and (2) another publicly available dataset that covers severely polluted regions from China for a period of 4 years. Our experimental results show our model's impressive ability to account for both spatial as well as temporal dependencies precisely.

9 pages, 4 figures, International Conference on Data Science and Management of Data (CODS-COMAD), IIT Jodhpur, 2024

TimeCMA: Towards LLM-Empowered Multivariate Time Series Forecasting via Cross-Modality Alignment 2024-12-18

Multivariate time series forecasting (MTSF) aims to learn temporal dynamics among variables to forecast future time series. Existing statistical and deep learning-based methods suffer from limited learnable parameters and small-scale training data. Recently, large language models (LLMs) combining time series with textual prompts have achieved promising performance in MTSF. However, we discovered that current LLM-based solutions fall short in learning disentangled embeddings. We introduce TimeCMA, an intuitive yet effective framework for MTSF via cross-modality alignment. Specifically, we present a dual-modality encoding with two branches: the time series encoding branch extracts disentangled yet weak time series embeddings, and the LLM-empowered encoding branch wraps the same time series with text as prompts to obtain entangled yet robust prompt embeddings. As a result, such a cross-modality alignment retrieves both disentangled and robust time series embeddings, "the best of two worlds", from the prompt embeddings based on time series and prompt modality similarities. As another key design, to reduce the computational costs from time series with their lengthy textual prompts, we design an effective prompt to encourage the most essential temporal information to be encapsulated in the last token: only the last token is passed to downstream prediction. We further store the last token embeddings to accelerate inference speed. Extensive experiments on eight real datasets demonstrate that TimeCMA outperforms state-of-the-art methods.

Accepted by AAAI 2025 (Main Technical Track)

QuLTSF: Long-Term Time Series Forecasting with Quantum Machine Learning 2024-12-18

Long-term time series forecasting (LTSF) involves predicting a large number of future values of a time series based on the past values and is an essential task in a wide range of domains including weather forecasting, stock market analysis, and disease outbreak prediction. Over the decades, LTSF algorithms have transitioned from statistical models to deep learning models like transformer models. Despite the complex architecture of transformer-based LTSF models, 'Are Transformers Effective for Time Series Forecasting?' (Zeng et al., 2023) showed that simple linear models can outperform the state-of-the-art transformer-based LTSF models. Recently, quantum machine learning (QML) has been evolving as a domain to enhance the capabilities of classical machine learning models. In this paper, we initiate the application of QML to LTSF problems by proposing QuLTSF, a simple hybrid QML model for multivariate LTSF. Through extensive experiments on a widely used weather dataset, we show the advantages of QuLTSF over the state-of-the-art classical linear models, in terms of reduced mean squared error and mean absolute error.

submitted for conference publication

Context Matters: Leveraging Contextual Features for Time Series Forecasting 2024-12-18

Time series forecasts are often influenced by exogenous contextual features in addition to their corresponding history. For example, in financial settings, it is hard to accurately predict a stock price without considering public sentiments and policy decisions in the form of news articles, tweets, etc. Though this is common knowledge, the current state-of-the-art (SOTA) forecasting models fail to incorporate such contextual information, owing to its heterogeneity and multimodal nature. To address this, we introduce ContextFormer, a novel plug-and-play method to surgically integrate multimodal contextual information into existing pre-trained forecasting models. ContextFormer effectively distills forecast-specific information from rich multimodal contexts, including categorical, continuous, time-varying, and even textual information, to significantly enhance the performance of existing base forecasters. ContextFormer outperforms SOTA forecasting models by up to 30% on a range of real-world datasets spanning energy, traffic, environmental, and financial domains.

AnchorInv: Few-Shot Class-Incremental Learning of Physiological Signals via Representation Space Guided Inversion 2024-12-18

Deep learning models have demonstrated exceptional performance in a variety of real-world applications. These successes are often attributed to strong base models that can generalize to novel tasks with limited supporting data while keeping prior knowledge intact. However, these impressive results are based on the availability of a large amount of high-quality data, which is often lacking in specialized biomedical applications. In such fields, models are usually developed with limited data that arrive incrementally with novel categories. This requires the model to adapt to new information while preserving existing knowledge. Few-Shot Class-Incremental Learning (FSCIL) methods offer a promising approach to addressing these challenges, but they also depend on strong base models that face the same aforementioned limitations. To overcome these constraints, we propose AnchorInv following the straightforward and efficient buffer-replay strategy. Instead of selecting and storing raw data, AnchorInv generates synthetic samples guided by anchor points in the feature space. This approach protects privacy and regularizes the model for adaptation. When evaluated on three public physiological time series datasets, AnchorInv exhibits efficient knowledge forgetting prevention and improved adaptation to novel classes, surpassing state-of-the-art baselines.

AAAI-25 Extended Version

PreMixer: MLP-Based Pre-training Enhanced MLP-Mixers for Large-scale Traffic Forecasting 2024-12-18

In urban computing, precise and swift forecasting of multivariate time series data from traffic networks is crucial. This data incorporates additional spatial contexts such as sensor placements and road network layouts, and exhibits complex temporal patterns that amplify challenges for predictive learning in traffic management, smart mobility demand, and urban planning. Consequently, there is an increasing need to forecast traffic flow across broader geographic regions and for higher temporal coverage. However, current research encounters limitations because of the inherent inefficiency of existing models and their unsuitability for large-scale traffic network applications due to model complexity. This paper proposes a novel framework, named PreMixer, designed to bridge this gap. It features a predictive model and a pre-training mechanism, both based on the principles of Multi-Layer Perceptrons (MLP). The PreMixer comprehensively considers temporal dependencies of traffic patterns in different time windows and processes the spatial dynamics as well. Additionally, we integrate spatio-temporal positional encoding to manage spatiotemporal heterogeneity without relying on predefined graphs. Furthermore, our innovative pre-training model uses a simple patch-wise MLP to conduct masked time series modeling, learning from long-term historical data segmented into patches to generate enriched contextual representations. This approach enhances the downstream forecasting model without incurring significant time consumption or computational resource demands owing to improved learning efficiency and data handling flexibility. Our framework achieves comparable state-of-the-art performance while maintaining high computational efficiency, as verified by extensive experiments on large-scale traffic datasets.

Timer-XL: Long-Context Transformers for Unified Time Series Forecasting 2024-12-18

We present Timer-XL, a generative Transformer for unified time series forecasting. To uniformly predict 1D and 2D time series, we generalize next token prediction, predominantly adopted for causal generation of 1D sequences, to multivariate next token prediction. The proposed paradigm uniformly formulates various forecasting scenarios as a long-context generation problem. We opt for the generative Transformer, which can capture global-range and causal dependencies while providing contextual flexibility, to implement unified forecasting on univariate series characterized by non-stationarity, multivariate time series with complicated dynamics and correlations, and covariate-informed contexts that include both endogenous and exogenous time series. Technically, we propose a universal TimeAttention to facilitate generative Transformers on multiple time series, which can effectively capture fine-grained intra- and inter-series dependencies of flattened time series tokens (patches), and is further enhanced by deftly designed position embeddings for the temporal and variable dimensions. Timer-XL achieves state-of-the-art performance across challenging forecasting benchmarks through a unified approach. Based on large-scale pre-training, Timer-XL also demonstrates notable zero-shot performance, making it a promising architecture for large time series models.

Geolocation Representation from Large Language Models are Generic Enhancers for Spatio-Temporal Learning 2024-12-18

In the geospatial domain, universal representation models are significantly less prevalent than their extensive use in natural language processing and computer vision. This discrepancy arises primarily from the high costs associated with the input of existing representation models, which often require street views and mobility data. To address this, we develop a novel, training-free method that leverages large language models (LLMs) and auxiliary map data from OpenStreetMap to derive geolocation representations (LLMGeovec). LLMGeovec can represent the geographic semantics of city, country, and global scales, which acts as a generic enhancer for spatio-temporal learning. Specifically, by direct feature concatenation, we introduce a simple yet effective paradigm for enhancing multiple spatio-temporal tasks including geographic prediction (GP), long-term time series forecasting (LTSF), and graph-based spatio-temporal forecasting (GSTF). LLMGeovec can seamlessly integrate into a wide spectrum of spatio-temporal learning models, providing immediate enhancements. Experimental results demonstrate that LLMGeovec achieves global coverage and significantly boosts the performance of leading GP, LTSF, and GSTF models. Our codes are available at https://github.com/Umaruchain/LLMGeovec.

Accepted at AAAI25 main track

RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data 2024-12-17

We present RelCon, a novel self-supervised Relative Contrastive learning approach that uses a learnable distance measure in combination with a softened contrastive loss for training a motion foundation model from wearable sensors. The learnable distance measure captures motif similarity and domain-specific semantic information such as rotation invariance. The learned distance provides a measurement of semantic similarity between a pair of accelerometer time-series segments, which is used to measure the distance between an anchor and various other sampled candidate segments. The self-supervised model is trained on 1 billion segments from 87,376 participants from a large wearables dataset. The model achieves strong performance across multiple downstream tasks, encompassing both classification and regression. To our knowledge, we are the first to show the generalizability of a self-supervised learning model with motion data from wearables across distinct evaluation tasks.

TKAN: Temporal Kolmogorov-Arnold Networks 2024-12-17

Recurrent Neural Networks (RNNs) have revolutionized many areas of machine learning, particularly in natural language and data sequence processing. Long Short-Term Memory (LSTM) has demonstrated its ability to capture long-term dependencies in sequential data. Inspired by Kolmogorov-Arnold Networks (KANs), a promising alternative to Multi-Layer Perceptrons (MLPs), we propose a new neural network architecture inspired by KAN and the LSTM: the Temporal Kolmogorov-Arnold Networks (TKANs). TKANs combine the strengths of both networks: they are composed of Recurring Kolmogorov-Arnold Network (RKAN) layers embedding memory management. This innovation enables us to perform multi-step time series forecasting with enhanced accuracy and efficiency. By addressing the limitations of traditional models in handling complex sequential patterns, the TKAN architecture offers significant potential for advancements in fields requiring more than one step ahead forecasting.

Dual Interpretation of Machine Learning Forecasts 2024-12-17

Machine learning predictions are typically interpreted as the sum of contributions of predictors. Yet, each out-of-sample prediction can also be expressed as a linear combination of in-sample values of the predicted variable, with weights corresponding to pairwise proximity scores between current and past economic events. While this dual route leads nowhere in some contexts (e.g., large cross-sectional datasets), it provides sparser interpretations in settings with many regressors and little training data-like macroeconomic forecasting. In this case, the sequence of contributions can be visualized as a time series, allowing analysts to explain predictions as quantifiable combinations of historical analogies. Moreover, the weights can be viewed as those of a data portfolio, inspiring new diagnostic measures such as forecast concentration, short position, and turnover. We show how weights can be retrieved seamlessly for (kernel) ridge regression, random forest, boosted trees, and neural networks. Then, we apply these tools to analyze post-pandemic forecasts of inflation, GDP growth, and recession probabilities. In all cases, the approach opens the black box from a new angle and demonstrates how machine learning models leverage history partly repeating itself.
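
The dual reading described above is easy to verify for ridge regression: the forecast for a new point can be written either with the fitted coefficients or as a weighted sum of in-sample target values, and the weights can then be inspected as a time series of historical analogies. A minimal numpy sketch on synthetic data:

```python
# Primal vs. dual view of a ridge forecast; illustrative synthetic data only.
import numpy as np

rng = np.random.default_rng(1)
T, p, lam = 200, 30, 10.0            # short sample, many regressors, ridge penalty
X = rng.normal(size=(T, p))
y = X @ rng.normal(size=p) + rng.normal(scale=0.5, size=T)
x_new = rng.normal(size=p)

beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)   # primal: coefficients
w = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), x_new)    # dual: one weight per past obs

print(np.allclose(x_new @ beta, w @ y))                      # True: same forecast, two readings
print("forecast concentration (sum of squared weights):", float(w @ w))
```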

The Temporal Vadalog System: Temporal Datalog-based Reasoning 2024-12-17

In the wake of the recent resurgence of the Datalog language of databases, together with its extensions for ontological reasoning settings, this work aims to bridge the gap between the theoretical studies of DatalogMTL (Datalog extended with metric temporal logic) and the development of production-ready reasoning systems. In particular, we lay out the functional and architectural desiderata of a modern reasoner and propose our system, Temporal Vadalog. Leveraging the vast amount of experience from the database community, we go beyond the typical chase-based implementations of reasoners, and propose a set of novel techniques and a system that adopts a modern data pipeline architecture. We discuss crucial architectural choices, such as how to guarantee termination when infinitely many time intervals are possibly generated, how to merge intervals, and how to sustain a limited memory footprint. We discuss advanced features of the system, such as the support for time series, and present an extensive experimental evaluation. This paper is a substantially extended version of "The Temporal Vadalog System" as presented at RuleML+RR '22. Under consideration in Theory and Practice of Logic Programming (TPLP).

LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Series Forecasting 2024-12-17

Forecasting models are pivotal in a data-driven world with vast volumes of time series data that appear as a compound of vast Linear and Nonlinear patterns. Recent deep time series forecasting models struggle to utilize seasonal and trend decomposition to separate the entangled components. Such a strategy only explicitly extracts simple linear patterns like trends, leaving the other linear modes and vast unexplored nonlinear patterns to the residual. Their flawed linear and nonlinear feature extraction models and shallow-level decomposition limit their adaptation to the diverse patterns present in real-world scenarios. Given this, we innovate Recursive Residual Decomposition by introducing explicit extraction of both linear and nonlinear patterns. This deeper-level decomposition framework, which is named LiNo, captures linear patterns using a Li block which can be a moving average kernel, and models nonlinear patterns using a No block which can be a Transformer encoder. The extraction of these two patterns is performed alternatively and recursively. To achieve the full potential of LiNo, we develop the current simple linear pattern extractor to a general learnable autoregressive model, and design a novel No block that can handle all essential nonlinear patterns. Remarkably, the proposed LiNo achieves state-of-the-art on thirteen real-world benchmarks under univariate and multivariate forecasting scenarios. Experiments show that current forecasting models can deliver more robust and precise results through this advanced Recursive Residual Decomposition. We hope this work could offer insight into designing more effective forecasting models. Code is available at this Repository: https://github.com/Levi-Ackman/LiNo.

Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification 2024-12-17

Due to the superior ability of global dependency, transformer and its variants have become the primary choice in Masked Time-series Modeling (MTM) towards time-series classification task. In this paper, we experimentally analyze that existing transformer-based MTM methods encounter with two under-explored issues when dealing with time series data: (1) they encode features by performing long-dependency ensemble averaging, which easily results in rank collapse and feature homogenization as the layer goes deeper; (2) they exhibit distinct priorities in fitting different frequency components contained in the time-series, inevitably leading to spectrum energy imbalance of encoded feature. To tackle these issues, we propose an auxiliary content-aware balanced decoder (CBD) to optimize the encoding quality in the spectrum space within masked modeling scheme. Specifically, the CBD iterates on a series of fundamental blocks, and thanks to two tailored units, each block could progressively refine the masked representation via adjusting the interaction pattern based on local content variations of time-series and learning to recalibrate the energy distribution across different frequency components. Moreover, a dual-constraint loss is devised to enhance the mutual optimization of vanilla decoder and our CBD. Extensive experimental results on ten time-series classification datasets show that our method nearly surpasses a bunch of baselines. Meanwhile, a series of explanatory results are showcased to sufficiently demystify the behaviors of our method.

13 pages, Accepted by AAAI 25

TimeCHEAT: A Channel Harmony Strategy for Irregularly Sampled Multivariate Time Series Analysis 2024-12-17

Irregularly sampled multivariate time series (ISMTS) are prevalent in reality. Due to their non-uniform intervals between successive observations and varying sampling rates among series, the channel-independent (CI) strategy, which has been demonstrated more desirable for complete multivariate time series forecasting in recent studies, has failed. This failure can be further attributed to the sampling sparsity, which provides insufficient information for effective CI learning, thereby reducing its capacity. When we resort to the channel-dependent (CD) strategy, even higher capacity cannot mitigate the potential loss of diversity in learning similar embedding patterns across different channels. We find that existing work considers CI and CD strategies to be mutually exclusive, primarily because they apply these strategies to the global channel. However, we hold the view that channel strategies do not necessarily have to be used globally. Instead, by appropriately applying them locally and globally, we can create an opportunity to take full advantage of both strategies. This leads us to introduce the Channel Harmony ISMTS Transformer (TimeCHEAT), which utilizes the CD locally and the CI globally. Specifically, we segment the ISMTS into sub-series level patches. Locally, the CD strategy aggregates information within each patch for time embedding learning, maximizing the use of relevant observations while reducing long-range irrelevant interference. Here, we enhance generality by transforming embedding learning into an edge weight prediction task using bipartite graphs, eliminating the need for special prior knowledge. Globally, the CI strategy is applied across patches, allowing the Transformer to learn individualized attention patterns for each channel. Experimental results indicate our proposed TimeCHEAT demonstrates competitive SOTA performance across three mainstream tasks.

Accepted by AAAI 2025

A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting 2024-12-17

The current landscape in time-series forecasting is dominated by Transformer-based models. Their high parameter count and corresponding demand in computational resources pose a challenge to real-world deployment, especially for commercial and scientific applications with low-power embedded devices. Pruning is an established approach to reduce neural network parameter count and save compute. However, the implications and benefits of pruning Transformer-based models for time series forecasting are largely unknown. To close this gap, we provide a comparative benchmark study by evaluating unstructured and structured pruning on various state-of-the-art multivariate time series models. We study the effects of these pruning strategies on model predictive performance and computational aspects like model size, operations, and inference time. Our results show that certain models can be pruned even up to high sparsity levels, outperforming their dense counterpart. However, fine-tuning pruned models is necessary. Furthermore, we demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.

16 pages, 5 figures, submitted to ACM Transactions on Intelligent Systems and Technology

Comparative Analysis of Zero-Shot Capability of Time-Series Foundation Models in Short-Term Load Prediction 2024-12-17

Short-term load prediction (STLP) is critical for modern power distribution system operations, particularly as demand and generation uncertainties grow with the integration of low-carbon technologies, such as electric vehicles and photovoltaics. In this study, we evaluate the zero-shot prediction capabilities of five Time-Series Foundation Models (TSFMs), a new approach for STLP in which models make predictions without task-specific training, against two classical models, Gaussian Process (GP) and Support Vector Regression (SVR), which are trained on task-specific datasets. Our findings indicate that even without training, TSFMs like Chronos, TimesFM, and TimeGPT can surpass the performance of GP and SVR. This finding highlights the potential of TSFMs in STLP.
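
As a point of reference for the classical baselines mentioned above, here is a hedged sketch of a Support Vector Regression load predictor trained on lagged values; the load series is synthetic and the zero-shot TSFMs themselves (Chronos, TimesFM, TimeGPT) are not shown.

```python
# Classical SVR baseline on a synthetic hourly "load" series with 24 lagged features.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
t = np.arange(24 * 60)                                   # 60 days of hourly observations
load = 10 + 3 * np.sin(2 * np.pi * t / 24) + rng.normal(scale=0.5, size=t.size)

lags = 24
X = np.stack([load[i:i + lags] for i in range(len(load) - lags)])
y = load[lags:]
split = len(y) - 24                                      # hold out the last day

model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print("MAE on held-out day:", float(np.mean(np.abs(pred - y[split:]))))
```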

Conformal Prediction on Quantifying Uncertainty of Dynamic Systems 2024-12-17

Numerous studies have focused on learning and understanding the dynamics of physical systems from video data, such as spatial intelligence. Artificial intelligence requires quantitative assessments of the uncertainty of the model to ensure reliability. However, there is still a relative lack of systematic assessment of the uncertainties, particularly the uncertainties of the physical data. Our motivation is to introduce conformal prediction into the uncertainty assessment of dynamical systems, providing a method supported by theoretical guarantees. This paper uses the conformal prediction method to assess uncertainties with benchmark operator learning methods. We have also compared the Monte Carlo Dropout and Ensemble methods in the partial differential equations dataset, effectively evaluating uncertainty through straight roll-outs, making it ideal for time-series tasks.
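
A minimal split-conformal sketch, assuming a generic point forecaster and synthetic calibration data (not the paper's operator-learning models), illustrates how calibration residuals turn point predictions into intervals with marginal coverage guarantees:

```python
# Split conformal prediction around an arbitrary point forecaster; synthetic data only.
import numpy as np

def split_conformal_interval(residuals_cal: np.ndarray, y_pred: np.ndarray, alpha: float = 0.1):
    """Return (lower, upper) intervals with ~(1 - alpha) marginal coverage."""
    n = len(residuals_cal)
    # Finite-sample-corrected quantile of the absolute calibration residuals.
    q = np.quantile(residuals_cal, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return y_pred - q, y_pred + q

rng = np.random.default_rng(3)
y_cal_true = rng.normal(size=500)
y_cal_pred = y_cal_true + rng.normal(scale=0.3, size=500)    # stand-in model output
res = np.abs(y_cal_true - y_cal_pred)

y_test_pred = rng.normal(size=10)
lo, hi = split_conformal_interval(res, y_test_pred, alpha=0.1)
print(np.round(hi - lo, 3))                                  # constant interval width 2*q
```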

First-order integer-valued autoregressive processes with Generalized Katz innovations 2024-12-17

A new integer-valued autoregressive process (INAR) with Generalised Lagrangian Katz (GLK) innovations is defined. This process family provides a flexible modelling framework for count data, allowing for under- and over-dispersion, asymmetry, and excess of kurtosis, and includes standard INAR models such as Generalized Poisson and Negative Binomial as special cases. We show that the GLK-INAR process is discrete semi-self-decomposable, infinitely divisible, and stable by aggregation, and we provide stationarity conditions. Some extensions are discussed, such as the Markov-Switching and the zero-inflated GLK-INARs. A Bayesian inference framework and an efficient posterior approximation procedure are introduced. The proposed models are applied to 130 time series from Google Trends, which proxy the worldwide public concern about climate change. New evidence is found of heterogeneity across time, countries, and keywords in the persistence, uncertainty, and long-run public awareness level.
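
For orientation, the sketch below simulates a basic INAR(1) process via binomial thinning; the Poisson innovations used here are a simplification standing in for the paper's Generalised Lagrangian Katz innovations.

```python
# Illustrative INAR(1) simulation: X_t = alpha ∘ X_{t-1} + eps_t with Poisson innovations.
import numpy as np

def simulate_inar1(n: int, alpha: float = 0.6, lam: float = 2.0, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    x[0] = rng.poisson(lam / (1 - alpha))          # start near the stationary mean
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)  # binomial thinning of the previous count
        x[t] = survivors + rng.poisson(lam)        # add new (innovation) counts
    return x

series = simulate_inar1(300)
print(series[:20], series.mean())                  # sample mean ≈ lam / (1 - alpha) = 5
```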

Moving Aggregate Modified Autoregressive Copula-Based Time Series Models (MAGMAR-Copulas) 2024-12-17

Copula-based time series models implicitly assume a finite Markov order. In reality a time series may not follow the Markov property. We modify the copula-based time series models by introducing a moving aggregate (MAG) part into the model updating equation. The functional form of the MAG-part is given as the inverse of a conditional copula. The resulting MAG-modified Autoregressive Copula-Based Time Series model (MAGMAR-Copula) is discussed in detail and distributional properties are derived in a D-vine framework. The model nests the classical ARMA model and can be interpreted as a non-linear generalization of the ARMA-model. The modeling performance is evaluated by modeling US inflation. Our model is competitive with benchmark models in terms of information criteria.

Modeling Temporal Dependencies within the Target for Long-Term Time Series Forecasting 2024-12-17

Long-term time series forecasting (LTSF) is a critical task across diverse domains. Despite significant advancements in LTSF research, we identify a performance bottleneck in existing LTSF methods caused by the inadequate modeling of Temporal Dependencies within the Target (TDT). To address this issue, we propose a novel and generic temporal modeling framework, Temporal Dependency Alignment (TDAlign), that equips existing LTSF methods with TDT learning capabilities. TDAlign introduces two key innovations: 1) a loss function that aligns the change values between adjacent time steps in the predictions with those in the target, ensuring consistency with variation patterns, and 2) an adaptive loss balancing strategy that seamlessly integrates the new loss function with existing LTSF methods without introducing additional learnable parameters. As a plug-and-play framework, TDAlign enhances existing methods with minimal computational overhead, featuring only linear time complexity and constant space complexity relative to the prediction length. Extensive experiments on six strong LTSF baselines across seven real-world datasets demonstrate the effectiveness and flexibility of TDAlign. On average, TDAlign reduces baseline prediction errors by 1.47% to 9.19% and change value errors by 4.57% to 15.78%, highlighting its substantial performance improvements.
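
The core of the TDT-alignment idea, penalising mismatches between the step-to-step changes of predictions and targets on top of a point-wise loss, can be sketched in a few lines of PyTorch. The fixed weight below is a placeholder for the paper's adaptive loss-balancing strategy, and the loss forms are illustrative.

```python
# Simplified sketch of a change-value alignment loss on synthetic forecasts.
import torch
import torch.nn.functional as F

def tdt_aligned_loss(pred: torch.Tensor, target: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    """pred, target: (batch, horizon) forecasts and ground truth."""
    point_loss = F.mse_loss(pred, target)
    d_pred = pred[:, 1:] - pred[:, :-1]       # change values of the prediction
    d_true = target[:, 1:] - target[:, :-1]   # change values of the target
    change_loss = F.l1_loss(d_pred, d_true)
    return point_loss + weight * change_loss

pred = torch.randn(8, 96, requires_grad=True)
target = torch.randn(8, 96)
loss = tdt_aligned_loss(pred, target)
loss.backward()
print(float(loss))
```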

CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting 2024-12-17

In the domain of multivariate time series analysis, the concept of channel independence has been increasingly adopted, demonstrating excellent performance due to its ability to eliminate noise and the influence of irrelevant variables. However, such a concept often simplifies the complex interactions among channels, potentially leading to information loss. To address this challenge, we propose a strategy of channel independence followed by mixing. Based on this strategy, we introduce CSformer, a novel framework featuring a two-stage multiheaded self-attention mechanism. This mechanism is designed to extract and integrate both channel-specific and sequence-specific information. Distinctively, CSformer employs parameter sharing to enhance the cooperative effects between these two types of information. Moreover, our framework effectively incorporates sequence and channel adapters, significantly improving the model's ability to identify important information across various dimensions. Extensive experiments on several real-world datasets demonstrate that CSformer achieves state-of-the-art results in terms of overall performance.

Accepted by AAAI 2025

Enhanced Momentum with Momentum Transformers 2024-12-17

The primary objective of this research is to build a Momentum Transformer that is expected to outperform benchmark time-series momentum and mean-reversion trading strategies. We extend the ideas introduced in the paper Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture to equities, as the original paper primarily only builds upon futures and equity indices. Unlike conventional Long Short-Term Memory (LSTM) models, which operate sequentially and are optimized for processing local patterns, an attention mechanism equips our architecture with direct access to all prior time steps in the training window. This hybrid design, combining attention with an LSTM, enables the model to capture long-term dependencies, enhance performance in scenarios accounting for transaction costs, and seamlessly adapt to evolving market conditions, such as those witnessed during the Covid Pandemic. We average 4.14% returns, which is similar to the original paper's results. Our Sharpe ratio is lower, at an average of 1.12, due to much higher volatility, which may be because stocks are inherently more volatile than futures and indices.

7 pages, 5 figures

GG-SSMs: Graph-Generating State Space Models 2024-12-17

State Space Models (SSMs) are powerful tools for modeling sequential data in computer vision and time series analysis domains. However, traditional SSMs are limited by fixed, one-dimensional sequential processing, which restricts their ability to model non-local interactions in high-dimensional data. While methods like Mamba and VMamba introduce selective and flexible scanning strategies, they rely on predetermined paths, which fails to efficiently capture complex dependencies. We introduce Graph-Generating State Space Models (GG-SSMs), a novel framework that overcomes these limitations by dynamically constructing graphs based on feature relationships. Using Chazelle's Minimum Spanning Tree algorithm, GG-SSMs adapt to the inherent data structure, enabling robust feature propagation across dynamically generated graphs and efficiently modeling complex dependencies. We validate GG-SSMs on 11 diverse datasets, including event-based eye-tracking, ImageNet classification, optical flow estimation, and six time series datasets. GG-SSMs achieve state-of-the-art performance across all tasks, surpassing existing methods by significant margins. Specifically, GG-SSM attains a top-1 accuracy of 84.9% on ImageNet, outperforming prior SSMs by 1%, reducing the KITTI-15 error rate to 2.77%, and improving eye-tracking detection rates by up to 0.33% with fewer parameters. These results demonstrate that dynamic scanning based on feature relationships significantly improves SSMs' representational power and efficiency, offering a versatile tool for various applications in computer vision and beyond.

11 pages, 7 tables, 2 figures

Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluation 2024-12-16

The identification of individual movement characteristics sets the foundation for the assessment of personal rehabilitation progress and can provide diagnostic information on levels and stages of movement disorders. This work presents a preliminary study for differentiating individual motion patterns using a dataset of 3D upper-limb transport trajectories measured in task-space. Identifying individuals by deep time series learning can be a key step to abstracting individual motion properties. In this study, a classification accuracy of about 95% is reached for a subset of nine, and about 78% for the full set of 31 individuals. This provides insights into the separability of patient attributes by exerting a simple standardized task to be transferred to portable systems.

Quantum open system identification via global optimization: Optimally accurate Markovian models of open systems from time-series data 2024-12-16

Accurate models of the dynamics of quantum circuits are essential for optimizing and advancing quantum devices. Since first-principles models of environmental noise and dissipation in real quantum systems are often unavailable, deriving accurate models from measured time-series data is critical. However, identifying open quantum systems poses significant challenges: powerful methods from systems engineering can perform poorly beyond weak damping (as we show) because they fail to incorporate essential constraints required for quantum evolution (e.g., positivity). Common methods that can include these constraints are typically multi-step, fitting linear models to physically grounded master equations, often resulting in non-convex functions in which local optimization algorithms get stuck in local extrema (as we show). In this work, we solve these problems by formulating quantum system identification directly from data as a polynomial optimization problem, enabling the use of recently developed global optimization methods. These methods are essentially guaranteed to reach global optima, allowing us for the first time to efficiently obtain the most accurate Markovian model for a given system. In addition to its practical importance, this allows us to take the error of these Markovian models as an alternative (operational) measure of the non-Markovianity of a system. We test our method with the spin-boson model -- a two-level system coupled to a bath of harmonic oscillators -- for which we obtain the exact evolution using matrix-product-state techniques. We show that polynomial optimization using moment/sum-of-squares approaches significantly outperforms traditional optimization algorithms, and we show that even for strong damping Lindblad-form master equations can provide accurate models of the spin-boson system.

significantly updated manuscript

Deep Learning for Hydroelectric Optimization: Generating Long-Term River Discharge Scenarios with Ensemble Forecasts from Global Circulation Models 2024-12-16

Hydroelectric power generation is a critical component of the global energy matrix, particularly in countries like Brazil, where it represents the majority of the energy supply. However, its strong dependence on river discharges, which are inherently uncertain due to climate variability, poses significant challenges. River discharges are linked to precipitation patterns, making the development of accurate probabilistic forecasting models crucial for improving operational planning in systems heavily reliant on this resource. Traditionally, statistical models have been used to represent river discharges in energy optimization. Yet, these models are increasingly unable to produce realistic scenarios due to structural shifts in climate behavior. Changes in precipitation patterns have altered discharge dynamics, which traditional approaches struggle to capture. Machine learning methods, while effective as universal predictors for time series, often focus solely on historical data, ignoring key external factors such as meteorological and climatic conditions. Furthermore, these methods typically lack a probabilistic framework, which is vital for representing the inherent variability of hydrological processes. The limited availability of historical discharge data further complicates the application of large-scale deep learning models to this domain. To address these challenges, we propose a framework based on a modified recurrent neural network architecture. This model generates parameterized probability distributions conditioned on projections from global circulation models, effectively accounting for the stochastic nature of river discharges. Additionally, the architecture incorporates enhancements to improve its generalization capabilities. We validate this framework within the Brazilian Interconnected System, using projections from the SEAS5-ECMWF system as conditional variables.

11 pages, 15 figures

Risk and cross validation in ridge regression with correlated samples 2024-12-16

Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent. By leveraging techniques from random matrix theory and free probability, we provide sharp asymptotics for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. However, in the case where the noise residuals have the same correlations as the data points, one can modify the GCV to yield an efficiently-computable unbiased estimator that concentrates in the high-dimensional limit, which we dub CorrGCV. We further extend our asymptotic analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting. Assuming knowledge of the correlation structure of the time series, this again yields an extension of the GCV estimator, and sharply characterizes the degree to which such test points yield an overly optimistic prediction of long-time risk. We validate the predictions of our theory across a variety of high dimensional data.

44 pages, 18 figures. v3: minor typos fixed
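For readers unfamiliar with the generalized cross validation estimator discussed above, the following is a minimal sketch of ridge regression with the classical (i.i.d.) GCV risk estimate on synthetic data; this is the baseline estimator, not the paper's CorrGCV correction, and the dimensions and ridge penalty are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 50, 1.0
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta + 0.5 * rng.standard_normal(n)

# Ridge "hat" matrix S = X (X'X + lam I)^{-1} X'
A = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
S = X @ A
y_hat = S @ y

# Classical GCV estimate of out-of-sample risk (valid for i.i.d. samples)
residual = y - y_hat
gcv = np.mean(residual ** 2) / (1.0 - np.trace(S) / n) ** 2
print("GCV risk estimate:", gcv)
```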

Spatiotemporal Persistence Landscapes 2024-12-16
Show

A method to apply and visualize persistent homology of time series is proposed. The method captures persistent features in space and time, in contrast to the existing procedures, where one usually chooses one while keeping the other fixed. An extended zigzag module that is built from a time series is defined. This module combines ideas from zigzag persistent homology and multiparameter persistent homology. Persistence landscapes are defined for the case of extended zigzag modules using a recent generalization of the rank invariant (Kim, Mémoli, 2021). This new invariant is called spatiotemporal persistence landscapes. Under certain finiteness assumptions, spatiotemporal persistence landscapes are a family of functions that take values in Lebesgue spaces, endowing the space of persistence landscapes with a distance. Stability of this invariant is shown with respect to an adapted interleaving distance for extended zigzag modules. Being an invariant that takes values in a Banach space, spatiotemporal persistence landscapes can be used for statistical analysis as well as for input to machine learning algorithms.

33 pages
FDR Control for Online Anomaly Detection 2024-12-16
Show

A new online multiple testing procedure is described in the context of anomaly detection, which controls the False Discovery Rate (FDR). An accurate anomaly detector must control the false positive rate at a prescribed level while keeping the false negative rate as low as possible. However, in the online context, meeting such a constraint remains highly challenging due to the usual lack of FDR control: the online framework makes it impossible to use classical multiple testing approaches such as the Benjamini-Hochberg (BH) procedure, which would require knowing the entire time series. The developed strategy relies on exploiting the local control of the "modified FDR" (mFDR) criterion. It turns out that the local control of mFDR enables global control of the FDR over the full series up to additional modifications of the multiple testing procedures. An important ingredient in this control is the cardinality of the calibration dataset used to compute the empirical p-values. A dedicated strategy for tuning this parameter is designed for achieving the prescribed FDR control over the entire time series. The good statistical performance of the full strategy is analyzed by theoretical guarantees. Its practical behavior is assessed by several simulation experiments which support our conclusions.
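The offline Benjamini-Hochberg procedure that the abstract contrasts with the online setting can be sketched in a few lines; this is the classical batch procedure (it needs all p-values at once), not the proposed online strategy.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Offline BH procedure: boolean mask of rejected hypotheses at level alpha."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # largest k with p_(k) <= (k/m) * alpha
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        rejected[order[:k + 1]] = True
    return rejected

pvals = [0.001, 0.02, 0.04, 0.3, 0.8]
print(benjamini_hochberg(pvals, alpha=0.1))
```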

Proactive Model Adaptation Against Concept Drift for Online Time Series Forecasting 2024-12-16
Show

Time series forecasting always faces the challenge of concept drift, where data distributions evolve over time, leading to a decline in forecast model performance. Existing solutions are based on online learning, which continually organize recent time series observations as new training samples and update model parameters according to the forecasting feedback on recent data. However, they overlook a critical issue: obtaining ground-truth future values of each sample should be delayed until after the forecast horizon. This delay creates a temporal gap between the training samples and the test sample. Our empirical analysis reveals that the gap can introduce concept drift, causing forecast models to adapt to outdated concepts. In this paper, we present Proceed, a novel proactive model adaptation framework for online time series forecasting. Proceed first estimates the concept drift between the recently used training samples and the current test sample. It then employs an adaptation generator to efficiently translate the estimated drift into parameter adjustments, proactively adapting the model to the test sample. To enhance the generalization capability of the framework, Proceed is trained on synthetic diverse concept drifts. Extensive experiments on five real-world datasets across various forecast models demonstrate that Proceed brings more performance improvements than the state-of-the-art online learning methods, significantly facilitating forecast models' resilience against concept drifts. Code is available at https://github.com/SJTU-DMTai/OnlineTSF.

Accepted by KDD 2025. Preprint version

Enhancing Transformer-based models for Long Sequence Time Series Forecasting via Structured Matrix 2024-12-16
Show

Recently, Transformer-based models for long sequence time series forecasting have demonstrated promising results. The self-attention mechanism, as the core component of these Transformer-based models, exhibits great potential in capturing various dependencies among data points. Despite these advancements, improving the efficiency of the self-attention mechanism remains a concern, and current optimization methods face challenges in applicability and scalability for the future design of long sequence time series forecasting models. Hence, in this article, we propose a novel architectural framework that enhances Transformer-based models through the integration of Surrogate Attention Blocks (SAB) and Surrogate Feed-Forward Neural Network Blocks (SFB). The framework reduces both time and space complexity by replacing the self-attention and feed-forward layers with SAB and SFB while maintaining their expressive power and architectural advantages. The equivalence of this substitution is fully demonstrated. Extensive experiments on 10 Transformer-based models across five distinct time series tasks demonstrate an average performance improvement of 12.4%, alongside a 61.3% reduction in parameter counts.

Slice it up: Unmasking User Identities in Smartwatch Health Data 2024-12-16
Show

Wearables are widely used for health data collection due to their availability and advanced sensors, enabling smart health applications like stress detection. However, the sensitivity of personal health data raises significant privacy concerns. While user de-identification by removing direct identifiers such as names and addresses is commonly employed to protect privacy, the data itself can still be exploited to re-identify individuals. We introduce a novel framework for similarity-based Dynamic Time Warping (DTW) re-identification attacks on time series health data. Using the WESAD dataset and two larger synthetic datasets, we demonstrate that even short segments of sensor data can achieve perfect re-identification with our Slicing-DTW-Attack. Our attack is independent of training data and computes similarity rankings in about 2 minutes for 10,000 subjects on a single CPU core. These findings highlight that de-identification alone is insufficient to protect privacy. As a defense, we show that adding random noise to the signals significantly reduces re-identification risk while only moderately affecting usability in stress detection tasks, offering a promising approach to balancing privacy and utility.

Accepted at 20th ACM ASIA Conference on Computer and Communications Security (AsiaCCS 2025)
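The similarity underlying the Slicing-DTW-Attack is classical dynamic time warping. A minimal quadratic-time DTW distance between two 1-D series is sketched below; this is the textbook recurrence, not the paper's optimized attack pipeline.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between 1-D series a and b."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

s1 = np.sin(np.linspace(0, 2 * np.pi, 60))
s2 = np.sin(np.linspace(0, 2 * np.pi, 80) + 0.3)
print(dtw_distance(s1, s2))
```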

Multimodal LLM for Intelligent Transportation Systems 2024-12-16
Show

In the evolving landscape of transportation systems, integrating Large Language Models (LLMs) offers a promising frontier for advancing intelligent decision-making across various applications. This paper introduces a novel 3-dimensional framework that encapsulates the intersection of applications, machine learning methodologies, and hardware devices, particularly emphasizing the role of LLMs. Instead of using multiple machine learning algorithms, our framework uses a single, data-centric LLM architecture that can analyze time series, images, and videos. We explore how LLMs can enhance data interpretation and decision-making in transportation. We apply this LLM framework to different sensor datasets, including time-series data and visual data from sources like Oxford Radar RobotCar, D-Behavior (D-Set), nuScenes by Motional, and Comma2k19. The goal is to streamline data processing workflows, reduce the complexity of deploying multiple models, and make intelligent transportation systems more efficient and accurate. The study was conducted using state-of-the-art hardware, leveraging the computational power of AMD RTX 3060 GPUs and Intel i9-12900 processors. The experimental results demonstrate that our framework achieves an average accuracy of 91.33% across these datasets, with the highest accuracy observed in time-series data (92.7%), showcasing the model's proficiency in handling sequential information essential for tasks such as motion planning and predictive maintenance. Through our exploration, we demonstrate the versatility and efficacy of LLMs in handling multimodal data within the transportation sector, ultimately providing insights into their application in real-world scenarios. Our findings align with the broader conference themes, highlighting the transformative potential of LLMs in advancing transportation technologies.

Accepted at IEEE Symposium Series on Computational Intelligence (SSCI) 2025

EDformer: Embedded Decomposition Transformer for Interpretable Multivariate Time Series Predictions 2024-12-16
Show

Time series forecasting is a crucial challenge with significant applications in areas such as weather prediction, stock market analysis, and scientific simulations. This paper introduces an embedded decomposed transformer, 'EDformer', for multivariate time series forecasting tasks. Without altering the fundamental elements, we reuse the Transformer architecture and consider the capabilities of its constituent parts in this work. EDformer first decomposes the input multivariate signal into seasonal and trend components. Next, the prominent multivariate seasonal component is reconstructed across the reverse dimensions, followed by applying the attention mechanism and feed-forward network in the encoder stage. In particular, the feed-forward network is used for each variable frame to learn nonlinear representations, while the attention mechanism uses the time points of individual seasonal series embedded within variate frames to capture multivariate correlations. Finally, the trend signal is added back through a projection to produce the final forecast. The EDformer model obtains state-of-the-art prediction results in terms of accuracy and efficiency on complex real-world time series datasets. This paper also addresses model explainability techniques to provide insights into how the model makes its predictions and why specific features or time steps are important, enhancing the interpretability and trustworthiness of the forecasting results.
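A common way to obtain the seasonal/trend split that EDformer starts from is a moving-average decomposition, as used by several decomposition-based forecasters; the sketch below assumes that simple variant and a hypothetical kernel size, and is not necessarily the paper's exact decomposition.

```python
import numpy as np

def moving_average_decompose(x, kernel=25):
    """Split a 1-D series into trend (moving average) and seasonal (residual) parts."""
    pad = kernel // 2
    padded = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    trend = np.convolve(padded, np.ones(kernel) / kernel, mode="valid")
    seasonal = x - trend
    return seasonal, trend

t = np.arange(500)
series = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(500)
seasonal, trend = moving_average_decompose(series)
```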

Apollo-Forecast: Overcoming Aliasing and Inference Speed Challenges in Language Models for Time Series Forecasting 2024-12-16
Show

Encoding time series into tokens and using language models for processing has been shown to substantially augment the models' ability to generalize to unseen tasks. However, existing language models for time series forecasting encounter several obstacles, including aliasing distortion and prolonged inference times, primarily due to the limitations of quantization processes and the computational demands of large models. This paper introduces Apollo-Forecast, a novel framework that tackles these challenges with two key innovations: the Anti-Aliasing Quantization Module (AAQM) and the Race Decoding (RD) technique. AAQM adeptly encodes sequences into tokens while mitigating high-frequency noise in the original signals, thus enhancing both signal fidelity and overall quantization efficiency. RD employs a draft model to enable parallel processing and results integration, which markedly accelerates the inference speed for long-term predictions, particularly in large-scale models. Extensive experiments on various real-world datasets show that Apollo-Forecast outperforms state-of-the-art methods by 35.41% and 18.99% in WQL and MASE metrics, respectively, in zero-shot scenarios. Furthermore, our method achieves a 1.9X-2.7X acceleration in inference speed over baseline methods.
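One generic way to realize the idea behind anti-aliasing quantization, low-pass filtering a series before binning it into a token vocabulary, is sketched below; the filter design, cutoff, and vocabulary size are illustrative assumptions, not the paper's AAQM.

```python
import numpy as np
from scipy.signal import butter, filtfilt

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1000)
series = np.sin(2 * np.pi * 0.5 * t) + 0.3 * rng.standard_normal(t.size)

# Low-pass filter (anti-aliasing) before quantizing into a small vocabulary
b, a = butter(4, 0.1)                 # 4th-order Butterworth, normalized cutoff 0.1
smoothed = filtfilt(b, a, series)

n_tokens = 256
edges = np.linspace(smoothed.min(), smoothed.max(), n_tokens + 1)
tokens = np.clip(np.digitize(smoothed, edges) - 1, 0, n_tokens - 1)
```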

TS-SatFire: A Multi-Task Satellite Image Time-Series Dataset for Wildfire Detection and Prediction 2024-12-16
Show

Wildfire monitoring and prediction are essential for understanding wildfire behaviour. With extensive Earth observation data, these tasks can be integrated and enhanced through multi-task deep learning models. We present a comprehensive multi-temporal remote sensing dataset for active fire detection, daily wildfire monitoring, and next-day wildfire prediction. Covering wildfire events in the contiguous U.S. from January 2017 to October 2021, the dataset includes 3552 surface reflectance images and auxiliary data such as weather, topography, land cover, and fuel information, totalling 71 GB. The lifecycle of each wildfire is documented, with labels for active fires (AF) and burned areas (BA), supported by manual quality assurance of AF and BA test labels. The dataset supports three tasks: a) active fire detection, b) daily burned area mapping, and c) wildfire progression prediction. Detection tasks use pixel-wise classification of multi-spectral, multi-temporal images, while prediction tasks integrate satellite and auxiliary data to model fire dynamics. This dataset and its benchmarks provide a foundation for advancing wildfire research using deep learning.

Modeling Latent Non-Linear Dynamical System over Time Series 2024-12-16
Show

We study the problem of modeling a non-linear dynamical system when given a time series by deriving equations directly from the data. Despite the fact that time series data are given as input, models for dynamics and estimation algorithms that incorporate long-term temporal dependencies are largely absent from existing studies. In this paper, we introduce a latent state to allow time-dependent modeling and formulate this problem as a dynamics estimation problem in latent states. We face multiple technical challenges, including (1) modeling latent non-linear dynamics and (2) solving circular dependencies caused by the presence of latent states. To tackle these challenging problems, we propose a new method, Latent Non-Linear equation modeling (LaNoLem), that can model a latent non-linear dynamical system and a novel alternating minimization algorithm for effectively estimating latent states and model parameters. In addition, we introduce criteria to control model complexity without human intervention. Compared with the state-of-the-art model, LaNoLem achieves competitive performance for estimating dynamics while outperforming other methods in prediction.

accepted at AAAI'25
Are Large Language Models Useful for Time Series Data Analysis? 2024-12-16
Show

Time series data plays a critical role across diverse domains such as healthcare, energy, and finance, where tasks like classification, anomaly detection, and forecasting are essential for informed decision-making. Recently, large language models (LLMs) have gained prominence for their ability to handle complex data and extract meaningful insights. This study investigates whether LLMs are effective for time series data analysis by comparing their performance with non-LLM-based approaches across three tasks: classification, anomaly detection, and forecasting. Through a series of experiments using GPT4TS and autoregressive models, we evaluate their performance on benchmark datasets and assess their accuracy, precision, and ability to generalize. Our findings indicate that while LLM-based methods excel in specific tasks like anomaly detection, their benefits are less pronounced in others, such as forecasting, where simpler models sometimes perform comparably or better. This research highlights the role of LLMs in time series analysis and lays the groundwork for future studies to systematically explore their applications and limitations in handling temporal data.

ChatTime: A Unified Multimodal Time Series Foundation Model Bridging Numerical and Textual Data 2024-12-16
Show

Human experts typically integrate numerical and textual multimodal information to analyze time series. However, most traditional deep learning predictors rely solely on unimodal numerical data, using a fixed-length window for training and prediction on a single dataset, and cannot adapt to different scenarios. Pre-trained large language models have introduced new opportunities for time series analysis. Yet, existing methods are either inefficient in training, incapable of handling textual information, or lack zero-shot forecasting capability. In this paper, we innovatively model time series as a foreign language and construct ChatTime, a unified framework for time series and text processing. As an out-of-the-box multimodal time series foundation model, ChatTime provides zero-shot forecasting capability and supports bimodal input/output for both time series and text. We design a series of experiments to verify the superior performance of ChatTime across multiple tasks and scenarios, and create four multimodal datasets to address data gaps. The experimental results demonstrate the potential and utility of ChatTime.

Accepted by AAAI 2025

Individual Bus Trip Chain Prediction and Pattern Identification Considering Similarities 2024-12-16
Show

Predicting future bus trip chains for an existing user is of great significance for operators of public transit systems. Existing methods always treat this task as a time-series prediction problem, but the 1-dimensional time series structure cannot express the complex relationship between trips. To better capture the inherent patterns in bus travel behavior, this paper proposes a novel approach that synthesizes future bus trip chains based on those from similar days. Key similarity patterns are defined and tested using real-world data, and a similarity function is then developed to capture these patterns. Afterwards, a graph is constructed where each day is represented as a node and the edge weight reflects the similarity between days. Besides, the trips on a given day can be regarded as labels for each node, turning the bus trip chain prediction problem into a semi-supervised classification problem on a graph. To address this, we propose several methods and validate them on a real-world dataset of 10000 bus users, achieving state-of-the-art prediction results. Analyzing the parameters of the similarity function reveals some interesting bus usage patterns, allowing us to cluster bus users into three types: repeat-dominated, evolve-dominated, and repeat-evolve balanced. In summary, our work demonstrates the effectiveness of similarity-based prediction for bus trip chains and provides a new perspective for analyzing individual bus travel patterns. The code for our prediction model is publicly available.

Grassmannian Geometry Meets Dynamic Mode Decomposition in DMD-GEN: A New Metric for Mode Collapse in Time Series Generative Models 2024-12-15
Show

Generative models like Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) often fail to capture the full diversity of their training data, leading to mode collapse. While this issue is well-explored in image generation, it remains underinvestigated for time series data. We introduce a new definition of mode collapse specific to time series and propose a novel metric, DMD-GEN, to quantify its severity. Our metric utilizes Dynamic Mode Decomposition (DMD), a data-driven technique for identifying coherent spatiotemporal patterns, and employs Optimal Transport between DMD eigenvectors to assess discrepancies between the underlying dynamics of the original and generated data. This approach not only quantifies the preservation of essential dynamic characteristics but also provides interpretability by pinpointing which modes have collapsed. We validate DMD-GEN on both synthetic and real-world datasets using various generative models, including TimeGAN, TimeVAE, and DiffusionTS. The results demonstrate that DMD-GEN correlates well with traditional evaluation metrics for static data while offering the advantage of applicability to dynamic data. This work offers for the first time a definition of mode collapse for time series, improving understanding, and forming the basis of our tool for assessing and improving generative models in the time series domain.
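The building block of DMD-GEN is Dynamic Mode Decomposition. Below is a minimal sketch of standard exact DMD via the SVD, which recovers the eigenvalues and modes of the best-fit linear operator between successive snapshots; the optimal-transport comparison of modes proposed in the paper is not included.

```python
import numpy as np

def dmd_modes(X, r=10):
    """Exact DMD: eigenvalues and modes of the best-fit linear map with X2 ≈ A X1."""
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = U.conj().T @ X2 @ Vh.conj().T @ np.diag(1.0 / s)
    eigvals, W = np.linalg.eig(A_tilde)
    modes = X2 @ Vh.conj().T @ np.diag(1.0 / s) @ W
    return eigvals, modes

# Toy data: snapshots of two travelling waves (columns are time steps)
x = np.linspace(0, 10, 100)[:, None]
t = np.linspace(0, 4 * np.pi, 80)[None, :]
X = np.sin(x + t) + 0.5 * np.cos(2 * x - 3 * t)
eigvals, modes = dmd_modes(X, r=4)
```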

Transformer-Based Bearing Fault Detection using Temporal Decomposition Attention Mechanism 2024-12-15
Show

Bearing fault detection is a critical task in predictive maintenance, where accurate and timely fault identification can prevent costly downtime and equipment damage. Traditional attention mechanisms in Transformer neural networks often struggle to capture the complex temporal patterns in bearing vibration data, leading to suboptimal performance. To address this limitation, we propose a novel attention mechanism, Temporal Decomposition Attention (TDA), which combines temporal bias encoding with seasonal-trend decomposition to capture both long-term dependencies and periodic fluctuations in time series data. Additionally, we incorporate the Hull Exponential Moving Average (HEMA) for feature extraction, enabling the model to effectively capture meaningful characteristics from the data while reducing noise. Our approach integrates TDA into the Transformer architecture, allowing the model to focus separately on the trend and seasonal components of the data. Experimental results on the Case Western Reserve University (CWRU) bearing fault detection dataset demonstrate that our approach outperforms traditional attention mechanisms and achieves state-of-the-art performance in terms of accuracy and interpretability. The HEMA-Transformer-TDA model achieves an accuracy of 98.1%, with exceptional precision, recall, and F1-scores, demonstrating its effectiveness in bearing fault detection and its potential for application in other time series tasks with seasonal patterns or trends.

Deep Learning-based Approaches for State Space Models: A Selective Review 2024-12-15
Show

State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs, and presents a unified perspective for discrete time deep state space models and continuous time ones such as latent neural Ordinary Differential and Stochastic Differential Equations. It starts with an overview of the classical maximum likelihood based approach for learning SSMs, reviews variational autoencoder as a general learning pipeline for neural network-based approaches in the presence of latent variables, and discusses in detail representative deep learning models that fall under the SSM framework. Very recent developments, where SSMs are used as standalone architectural modules for improving efficiency in sequence modeling, are also examined. Finally, examples involving mixed frequency and irregularly-spaced time series data are presented to demonstrate the advantage of SSMs in these settings.
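The classical starting point the review describes, maximum-likelihood inference in a linear-Gaussian state-space model, reduces to Kalman filtering. A minimal filter for a 1-D latent random walk is sketched below with hypothetical noise parameters.

```python
import numpy as np

def kalman_filter(y, A, C, Q, R, mu0, P0):
    """Forward Kalman filter for x_t = A x_{t-1} + w_t, y_t = C x_t + v_t."""
    mu, P = mu0, P0
    means = []
    for obs in y:
        # Predict
        mu = A @ mu
        P = A @ P @ A.T + Q
        # Update
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        mu = mu + K @ (obs - C @ mu)
        P = P - K @ C @ P
        means.append(mu.copy())
    return np.array(means)

# Toy 1-D latent random walk observed with noise (hypothetical parameters)
A = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[0.25]])
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(0, 0.1, 200))
y = (x + rng.normal(0, 0.5, 200)).reshape(-1, 1)
filtered = kalman_filter(y, A, C, Q, R, mu0=np.zeros(1), P0=np.eye(1))
```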

Semi-Supervised Risk Control via Prediction-Powered Inference 2024-12-15
Show

The risk-controlling prediction sets (RCPS) framework is a general tool for transforming the output of any machine learning model to design a predictive rule with rigorous error rate control. The key idea behind this framework is to use labeled hold-out calibration data to tune a hyper-parameter that affects the error rate of the resulting prediction rule. However, the limitation of such a calibration scheme is that with limited hold-out data, the tuned hyper-parameter becomes noisy and leads to a prediction rule with an error rate that is often unnecessarily conservative. To overcome this sample-size barrier, we introduce a semi-supervised calibration procedure that leverages unlabeled data to rigorously tune the hyper-parameter without compromising statistical validity. Our procedure builds upon the prediction-powered inference framework, carefully tailoring it to risk-controlling tasks. We demonstrate the benefits and validity of our proposal through two real-data experiments: few-shot image classification and early time series classification.

Learning Latent Spaces for Domain Generalization in Time Series Forecasting 2024-12-15
Show

Time series forecasting is vital in many real-world applications, yet developing models that generalize well on unseen relevant domains -- such as forecasting web traffic data on new platforms/websites or estimating e-commerce demand in new regions -- remains underexplored. Existing forecasting models often struggle with domain shifts in time series data, as the temporal patterns involve complex components like trends, seasonality, etc. While some prior work addresses this by matching feature distributions across domains or disentangling domain-shared features using label information, they fail to reveal insights into the latent temporal dependencies, which are critical for identifying common patterns across domains and achieving generalization. We propose a framework for domain generalization in time series forecasting by mining the latent factors that govern temporal dependencies across domains. Our approach uses a decomposition-based architecture with a new Conditional β-Variational Autoencoder (VAE), wherein time series data is first decomposed into trend-cyclical and seasonal components, each modeled independently through separate β-VAE modules. The β-VAE aims to capture disentangled latent factors that control temporal dependencies across domains. We enhance the learning of domain-specific information with a decoder-conditional design and introduce domain regularization to improve the separation of domain-shared and domain-specific latent factors. Our proposed method is flexible and can be applied to various time series forecasting models, enabling effective domain generalization with simplicity and efficiency. We validate its effectiveness on five real-world time series datasets, covering web traffic, e-commerce, finance and power consumption, demonstrating improved generalization performance over state-of-the-art methods.

18 pages with 8 figures

Missing data imputation for noisy time-series data and applications in healthcare 2024-12-15
Show

Healthcare time series data is vital for monitoring patient activity but often contains noise and missing values due to various reasons such as sensor errors or data interruptions. Imputation, i.e., filling in the missing values, is a common way to deal with this issue. In this study, we compare imputation methods, including Multiple Imputation with Random Forest (MICE-RF) and advanced deep learning approaches (SAITS, BRITS, Transformer) for noisy, missing time series data in terms of MAE, F1-score, AUC, and MCC, across missing data rates (10%–80%). Our results show that MICE-RF can effectively impute missing data compared to deep learning methods, and the improvement in classification on the imputed data indicates that imputation can have a denoising effect. Therefore, using an imputation algorithm on time series with missing data can, at the same time, offer denoising effects.
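MICE with random-forest models (MICE-RF) can be approximated with scikit-learn's IterativeImputer wrapped around a RandomForestRegressor; the sketch below uses synthetic data with 30% of values removed at random, and is only a generic stand-in for the methods compared in the study.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 5))
X[:, 1] = 0.8 * X[:, 0] + 0.2 * rng.standard_normal(300)   # correlated columns aid imputation

# Introduce 30% missingness at random
mask = rng.random(X.shape) < 0.3
X_missing = X.copy()
X_missing[mask] = np.nan

imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    max_iter=10,
    random_state=0,
)
X_imputed = imputer.fit_transform(X_missing)
print("Imputation MAE:", np.mean(np.abs(X_imputed[mask] - X[mask])))
```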

Hierarchical Bidirectional Transition Dispersion Entropy-based Lempel-Ziv Complexity and Its Application in Fault-Bearing Diagnosis 2024-12-15
Show

Lempel-Ziv complexity (LZC) is a key measure for detecting the irregularity and complexity of nonlinear time series and has seen various improvements in recent decades. However, existing LZC-based metrics, such as Permutation Lempel-Ziv complexity (PLZC) and Dispersion-Entropy based Lempel-Ziv complexity (DELZC), focus mainly on patterns of independent embedding vectors, often overlooking the transition patterns within the time series. To address this gap, this paper introduces a novel LZC-based method called Bidirectional Transition Dispersion Entropy-based Lempel-Ziv complexity (BT-DELZC). Leveraging Markov chain theory, this method integrates a bidirectional transition network framework with DELZC to better capture dynamic signal information. Additionally, an improved hierarchical decomposition algorithm is used to extract features from various frequency components of the time series. The proposed BT-DELZC method is first evaluated through four simulated experiments, demonstrating its robustness and effectiveness in characterizing nonlinear time series. Additionally, two fault-bearing diagnosis experiments are conducted by combining the hierarchical BT-DELZC method with various classifiers from the machine learning domain. The results indicate that BT-DELZC achieves the highest accuracy across both datasets, significantly outperforming existing methods such as LZC, PLZC, and DELZC in extracting features related to fault bearings.
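All of the LZC variants above start from a Lempel-Ziv phrase count over a symbolized series. The sketch below uses a simple incremental (LZ78-style) parsing of a binary sequence as a proxy for that count; it is not the paper's BT-DELZC, which adds dispersion-entropy patterns, transition networks, and hierarchical decomposition.

```python
import numpy as np

def lz_phrase_count(symbols):
    """Count distinct phrases under incremental (LZ78-style) parsing."""
    phrases, phrase, count = set(), "", 0
    for s in symbols:
        phrase += str(s)
        if phrase not in phrases:
            phrases.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)   # count the incomplete final phrase, if any

rng = np.random.default_rng(0)
noise = rng.integers(0, 2, 500)          # irregular series
periodic = np.tile([0, 1], 250)          # regular series
print(lz_phrase_count(noise), lz_phrase_count(periodic))  # noise yields the higher count
```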

Transparent Networks for Multivariate Time Series 2024-12-15
Show

Transparent models, which are machine learning models that produce inherently interpretable predictions, are receiving significant attention in high-stakes domains. However, despite much real-world data being collected as time series, there is a lack of studies on transparent time series models. To address this gap, we propose a novel transparent neural network model for time series called Generalized Additive Time Series Model (GATSM). GATSM consists of two parts: 1) independent feature networks to learn feature representations, and 2) a transparent temporal module to learn temporal patterns across different time steps using the feature representations. This structure allows GATSM to effectively capture temporal patterns and handle dynamic-length time series while preserving transparency. Empirical experiments show that GATSM significantly outperforms existing generalized additive models and achieves comparable performance to black-box time series models, such as recurrent neural networks and Transformer. In addition, we demonstrate that GATSM finds interesting patterns in time series. The source code is available at https://github.com/gim4855744/GATSM.

Additional experiments are added in appendix

Unsupervised Learning Approach to Anomaly Detection in Gravitational Wave Data 2024-12-14
Show

Gravitational waves (GW), predicted by Einstein's General Theory of Relativity, provide a powerful probe of astrophysical phenomena and fundamental physics. In this work, we propose an unsupervised anomaly detection method using variational autoencoders (VAEs) to analyze GW time-series data. By training on noise-only data, the VAE accurately reconstructs noise inputs while failing to reconstruct anomalies, such as GW signals, which results in measurable spikes in the reconstruction error. The method was applied to data from the LIGO H1 and L1 detectors. Evaluation on testing datasets containing both noise and GW events demonstrated reliable detection, achieving an area under the ROC curve (AUC) of 0.89. This study introduces VAEs as a robust, unsupervised approach for identifying anomalies in GW data, which offers a scalable framework for detecting known and potentially new phenomena in physics.

The work is still in progress

Multiscale Autoregression on Adaptively Detected Timescales 2024-12-14
Show

We propose a multiscale approach to time series autoregression, in which linear regressors for the process in question include features of its own path that live on multiple timescales. We take these multiscale features to be the recent averages of the process over multiple timescales, whose number or spans are not known to the analyst and are estimated from the data via a change-point detection technique. The resulting construction, termed Adaptive Multiscale AutoRegression (AMAR), enables adaptive regularisation of linear autoregression of large orders. The AMAR model is designed to offer simplicity and interpretability on the one hand, and modelling flexibility on the other. Our theory permits the longest timescale to increase with the sample size. A simulation study is presented to show the usefulness of our approach. Some possible extensions are also discussed, including the Adaptive Multiscale Vector AutoRegressive model (AMVAR) for multivariate time series, which demonstrates promising performance in the data example on UK and US unemployment rates. The R package amar provides an efficient implementation of the AMAR framework.

64 pages, 8 figures; to be published in Statistica Sinica
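The AMAR regressors are recent averages of the series over several timescales. The sketch below builds those features for fixed, hypothetical spans and fits them by ordinary least squares; the paper additionally estimates the number and spans of the timescales via change-point detection, which is omitted here.

```python
import numpy as np

def multiscale_average_features(x, spans=(1, 5, 20)):
    """For each time t, the mean of the last s observations for every span s."""
    T, max_span = len(x), max(spans)
    rows = [[x[t - s:t].mean() for s in spans] for t in range(max_span, T)]
    return np.array(rows), x[max_span:]

rng = np.random.default_rng(0)
x = np.zeros(1000)
for t in range(1, 1000):                      # toy AR(1)-like process
    x[t] = 0.6 * x[t - 1] + rng.normal(0, 1)

X, y = multiscale_average_features(x, spans=(1, 5, 20))
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Fitted multiscale coefficients:", coef)
```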

DUET: Dual Clustering Enhanced Multivariate Time Series Forecasting 2024-12-14
Show

Multivariate time series forecasting is crucial for various applications, such as financial investment, energy management, weather forecasting, and traffic optimization. However, accurate forecasting is challenging due to two main factors. First, real-world time series often show heterogeneous temporal patterns caused by distribution shifts over time. Second, correlations among channels are complex and intertwined, making it hard to model the interactions among channels precisely and flexibly. In this study, we address these challenges by proposing a general framework called DUET, which introduces DUal clustering on the temporal and channel dimensions to Enhance multivariate Time series forecasting. First, we design a Temporal Clustering Module (TCM) that clusters time series into fine-grained distributions to handle heterogeneous temporal patterns. For different distribution clusters, we design various pattern extractors to capture their intrinsic temporal patterns, thus modeling the heterogeneity. Second, we introduce a novel Channel-Soft-Clustering strategy and design a Channel Clustering Module (CCM), which captures the relationships among channels in the frequency domain through metric learning and applies sparsification to mitigate the adverse effects of noisy channels. Finally, DUET combines TCM and CCM to incorporate both the temporal and channel dimensions. Extensive experiments on 25 real-world datasets from 10 application domains demonstrate the state-of-the-art performance of DUET.

Accepted by KDD 2025
Uncovering Temporal Patterns in Visualizations of High-Dimensional Data 2024-12-14
Show

With the increasing availability of high-dimensional data, analysts often rely on exploratory data analysis to understand complex data sets. A key approach to exploring such data is dimensionality reduction, which embeds high-dimensional data in two dimensions to enable visual exploration. However, popular embedding techniques, such as t-SNE and UMAP, typically assume that data points are independent. When this assumption is violated, as in time-series data, the resulting visualizations may fail to reveal important temporal patterns and trends. To address this, we propose a formal extension to existing dimensionality reduction methods that incorporates two temporal loss terms that explicitly highlight temporal progression in the embedded visualizations. Through a series of experiments on both synthetic and real-world datasets, we demonstrate that our approach effectively uncovers temporal patterns and improves the interpretability of the visualizations. Furthermore, the method improves temporal coherence while preserving the fidelity of the embeddings, providing a robust tool for dynamic data analysis.

Diffusion-based Method for Satellite Pattern-of-Life Identification 2024-12-14
Show

Satellite pattern-of-life (PoL) identification is crucial for space safety and satellite monitoring, involving the analysis of typical satellite behaviors such as station-keeping, drift, etc. However, existing PoL identification methods remain underdeveloped due to the complexity of aerospace systems, variability in satellite behaviors, and fluctuating observation sampling rates. In a first attempt, we developed a domain expertise-informed machine learning method (Expert-ML) to combine satellite orbital movement knowledge and machine learning models. The Expert-ML method achieved high accuracy on simulation data and on real-world data with a normal sampling rate. However, this approach lacks generality, as it requires domain expertise, and its performance degraded significantly when the data sampling rate varied. To achieve generality, we propose a novel diffusion-based PoL identification method. Distinct from prior approaches, the proposed method leverages a diffusion model to achieve end-to-end identification without manual refinement or domain-specific knowledge. Specifically, we employ a multivariate time-series encoder to capture hidden representations of satellite positional data. The encoded features are subsequently incorporated as conditional information in the denoising process to generate PoL labels. Through experimentation across real-world satellite settings, our proposed diffusion-based method demonstrates its high identification quality and provides a robust solution even with reduced data sampling rates, indicating its great potential in practical satellite behavior pattern identification, tracking and related mission deployment.

ConvTimeNet: A Deep Hierarchical Fully Convolutional Model for Multivariate Time Series Analysis 2024-12-14
Show

Designing effective models for learning time series representations is foundational for time series analysis. Many previous works have explored time series representation modeling approaches and have made progress in this area. Despite their effectiveness, they lack adaptive perception of local patterns in temporally dependent basic units and fail to capture the multi-scale dependency among these units. Instead of relying on prevalent methods centered around self-attention mechanisms, we propose ConvTimeNet, a hierarchical pure convolutional model designed for time series analysis. ConvTimeNet introduces a deformable patch layer that adaptively perceives local patterns of temporally dependent basic units in a data-driven manner. Based on the extracted local patterns, hierarchical pure convolutional blocks are designed to capture dependency relationships among the representations of basic units at different scales. Moreover, a large kernel mechanism is employed to ensure that convolutional blocks can be deeply stacked, thereby achieving a larger receptive field. In this way, local patterns and their multi-scale dependencies can be effectively modeled within a single model. Extensive experiments comparing a wide range of different types of models demonstrate that pure convolutional models still exhibit strong viability, effectively addressing the aforementioned two challenges and showing superior performance across multiple tasks. The code is available for reproducibility.

Multistep Brent Oil Price Forecasting with a Multi-Aspect Meta-heuristic Optimization and Ensemble Deep Learning Model 2024-12-14
Show

Accurate crude oil price forecasting is crucial for various economic activities, including energy trading, risk management, and investment planning. Although deep learning models have emerged as powerful tools for crude oil price forecasting, achieving accurate forecasts remains challenging. Deep learning models' performance is heavily influenced by hyperparameters tuning, and they are expected to perform differently under various circumstances. Furthermore, price volatility is also sensitive to external factors such as world events. To address these limitations, we propose a hybrid approach that integrates metaheuristic optimization with an ensemble of five widely used neural network architectures for time series forecasting. Unlike existing methods that apply metaheuristics to optimise hyperparameters within the neural network architecture, we exploit the GWO metaheuristic optimiser at four levels: feature selection, data preparation, model training, and forecast blending. The proposed approach has been evaluated for forecasting three days ahead using real-world Brent crude oil price data, and the obtained results demonstrate that it improves forecasting performance across various benchmarks, achieving an MSE of 0.000127.

Investigating Central England Temperature Variability: Statistical Analysis of Associations with North Atlantic Oscillation (NAO) and Pacific Decadal Oscillation (PDO) 2024-12-14
Show

This study investigates the variability of the Central England Temperature (CET) series in relation to the North Atlantic Oscillation (NAO) and the Pacific Decadal Oscillation (PDO) using advanced time series modeling techniques. Leveraging the world's longest continuous instrumental temperature dataset (1723-2023), this research applies ARIMA and ARIMAX models to quantify the impact of climatic oscillations on regional temperature variability, while also accounting for long-term warming trends. Spectral and coherence analyses further explore the periodic interactions between CET and the oscillations. Results reveal that NAO exerts a stronger influence on CET variability compared to PDO, with significant coherence observed at cycles of 5 to 7.5 years and 2 to 2.5 years for NAO, while PDO shows no statistically significant coherence. The ARIMAX model effectively captures both the upward warming trend and the influence of climatic oscillations, with robust diagnostics confirming its reliability. This study contributes to understanding the interplay between regional temperature variability and large-scale climatic drivers, providing a framework for future research on climatic oscillations and their role in shaping regional climate dynamics. Limitations and potential future directions, including the integration of additional climatic indices and comparative regional analyses, are also discussed.

15 pages, 6 figures
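An ARIMAX fit of the kind described above can be reproduced in outline with statsmodels' SARIMAX by passing the climate indices as exogenous regressors. The sketch below uses synthetic stand-ins for CET, NAO, and PDO, and an arbitrarily chosen ARMA order.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
n = 300
nao = rng.standard_normal(n)                       # stand-in for the NAO index
pdo = rng.standard_normal(n)                       # stand-in for the PDO index
trend = 0.005 * np.arange(n)                       # slow warming trend
cet = 9.0 + trend + 0.4 * nao + 0.1 * pdo + rng.normal(0, 0.5, n)

df = pd.DataFrame({"cet": cet, "nao": nao, "pdo": pdo})
model = SARIMAX(df["cet"], exog=df[["nao", "pdo"]],
                order=(1, 0, 1), trend="ct")       # ARMA errors plus constant and linear trend
result = model.fit(disp=False)
print(result.params)
```
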
A Multiprocess State Space Model with Feedback and Switching for Patterns of Clinical Measurements Associated with COVID-19 2024-12-14
Show

Clinical measurements, such as body temperature, are often collected over time to monitor an individual's underlying health condition. These measurements exhibit complex temporal dynamics, necessitating sophisticated statistical models to capture patterns and detect deviations. We propose a novel multiprocess state space model with feedback and switching mechanisms to analyze the dynamics of clinical measurements. This model captures the evolution of time series through distinct latent processes and incorporates feedback effects in the transition probabilities between latent processes. We develop estimation methods using the EM algorithm, integrated with multiprocess Kalman filtering and multiprocess fixed-interval smoothing. A simulation study shows that the algorithm is efficient and performs well. We apply the proposed model to body temperature measurements from COVID-19-infected hemodialysis patients to examine temporal dynamics and estimate infection and recovery probabilities.

Upstream flow geometries can be uniquely learnt from single-point turbulence signatures 2024-12-14
Show

We test the hypothesis that the microscopic temporal structure of near-field turbulence downstream of a sudden contraction contains geometry-identifiable information pertaining to the shape of the upstream obstruction. We measure a set of spatially sparse velocity time-series data downstream of differently-shaped orifices. We then train random forest multiclass classifier models on a vector of invariants derived from this time-series. We test the above hypothesis with 25 somewhat similar orifice shapes to push the model to its extreme limits. Remarkably, the algorithm was able to identify the orifice shape with 100% accuracy and 100% precision. This outcome is enabled by the uniqueness in the downstream temporal evolution of turbulence structures in the flow past orifices, combined with the random forests' ability to learn subtle yet discerning features in the turbulence microstructure. We are also able to explain the underlying flow physics that enables such classification by listing the invariant measures in the order of increasing information entropy. We show that the temporal autocorrelation coefficients of the time-series are most sensitive to orifice shape and are therefore informative. The ability to identify changes in system geometry without the need for physical disassembly offers tremendous potential for flow control and system identification. Furthermore, the proposed approach could potentially have significant applications in other unrelated fields as well, by deploying the core methodology of training random forest classifiers on vectors of invariant measures obtained from time-series data.

Manuscript: 10 pages, 4 figures; SI Appendix: 24 pages, 3 figures; Submitted to PNAS
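The pipeline described above, invariant features of a single-point time series fed to a random-forest classifier, can be illustrated with the autocorrelation coefficients the authors find most informative. The sketch below classifies two synthetic signal "shapes"; the signals, lags, and train/test split are illustrative assumptions, not the paper's orifice data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def autocorr_features(x, max_lag=10):
    """Autocorrelation coefficients at lags 1..max_lag."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
X, y = [], []
for label, freq in [(0, 0.05), (1, 0.12)]:          # two hypothetical "shapes"
    for _ in range(100):
        t = np.arange(500)
        sig = np.sin(2 * np.pi * freq * t) + rng.normal(0, 0.5, 500)
        X.append(autocorr_features(sig))
        y.append(label)
X, y = np.array(X), np.array(y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[::2], y[::2])
print("Held-out accuracy:", clf.score(X[1::2], y[1::2]))
```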

WaveGNN: Modeling Irregular Multivariate Time Series for Accurate Predictions 2024-12-14
Show

Accurately modeling and analyzing time series data is crucial for downstream applications across various fields, including healthcare, finance, astronomy, and epidemiology. However, real-world time series often exhibit irregularities such as misaligned timestamps, missing entries, and variable sampling rates, complicating their analysis. Existing approaches often rely on imputation, which can introduce biases. A few approaches that directly model irregularity tend to focus exclusively on either capturing intra-series patterns or inter-series relationships, missing the benefits of integrating both. To this end, we present WaveGNN, a novel framework designed to directly (i.e., no imputation) embed irregularly sampled multivariate time series data for accurate predictions. WaveGNN utilizes a Transformer-based encoder to capture intra-series patterns by directly encoding the temporal dynamics of each time series. To capture inter-series relationships, WaveGNN uses a dynamic graph neural network model, where each node represents a sensor, and the edges capture the long- and short-term relationships between them. Our experimental results on real-world healthcare datasets demonstrate that WaveGNN consistently outperforms existing state-of-the-art methods, with an average relative improvement of 14.7% in F1-score when compared to the second-best baseline in cases with extreme sparsity. Our ablation studies reveal that both intra-series and inter-series modeling significantly contribute to this notable improvement.

Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data 2024-12-13
Show

In this paper, we tackle the challenge of predicting stock movements in financial markets by introducing Higher Order Transformers, a novel architecture designed for processing multivariate time-series data. We extend the self-attention mechanism and the transformer architecture to a higher order, effectively capturing complex market dynamics across time and variables. To manage computational complexity, we propose a low-rank approximation of the potentially large attention tensor using tensor decomposition and employ kernel attention, reducing complexity to linear with respect to the data size. Additionally, we present an encoder-decoder model that integrates technical and fundamental analysis, utilizing multimodal signals from historical prices and related tweets. Our experiments on the Stocknet dataset demonstrate the effectiveness of our method, highlighting its potential for enhancing stock movement prediction in financial markets.

KDD 2024 Workshop on Machine Learning in Finance

A Call to Arms: AI Should be Critical for Social Media Analysis of Conflict Zones 2024-12-13
Show

The massive proliferation of social media data represents a transformative opportunity for conflict studies and for tracking the proliferation and use of weaponry, as conflicts are increasingly documented in these online spaces. At the same time, the scale and types of data available are problematic for traditional open-source intelligence. This paper focuses on identifying specific weapon systems and the insignias of the armed groups using them as documented in the Ukraine war, as these tasks are critical to operational intelligence and tracking weapon proliferation, especially given the scale of international military aid given to Ukraine. The large scale of social media makes manual assessment difficult, however, so this paper presents early work that uses computer vision models to support this task. We demonstrate that these models can both identify weapons embedded in images shared in social media and how the resulting collection of military-relevant images and their post times interact with the offline, real-world conflict. Not only can we then track changes in the prevalence of images of tanks, land mines, military trucks, etc., we find correlations among time series data associated with these images and the daily fatalities in this conflict. This work shows substantial opportunity for examining similar online documentation of conflict contexts, and we also point to future avenues where computer vision can be further improved for these open-source intelligence tasks.

Integrative Analysis of Financial Market Sentiment Using CNN and GRU for Risk Prediction and Alert Systems 2024-12-13
Show

This document presents an in-depth examination of stock market sentiment through the integration of Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU), enabling precise risk alerts. The robust feature extraction capability of CNN is utilized to preprocess and analyze extensive network text data, identifying local features and patterns. The extracted feature sequences are then input into the GRU model to understand the progression of emotional states over time and their potential impact on future market sentiment and risk. This approach addresses the order dependence and long-term dependencies inherent in time series data, resulting in a detailed analysis of stock market sentiment and effective early warnings of future risks.

Uncertainties in Signal Recovery from Heterogeneous and Convoluted Time Series with Principal Component Analysis 2024-12-13
Show

Principal Component Analysis (PCA) is one of the most used tools for extracting low-dimensional representations of data, in particular for time series. Performance is known to strongly depend on the quality (amount of noise) and the quantity of data. We here investigate the impact of heterogeneities, often present in real data, on the reconstruction of low-dimensional trajectories and of their associated modes. We focus in particular on the effects of sample-to-sample fluctuations and of component-dependent temporal convolution and noise in the measurements. We derive analytical predictions for the error on the reconstructed trajectory and the confusion between the modes using the replica method in a high-dimensional setting, in which the number and the dimension of the data are comparable. We find in particular that sample-to-sample variability is deleterious for the reconstruction of the signal trajectory, but beneficial for the inference of the modes, and that the fluctuations in the temporal convolution kernels prevent perfect recovery of the latent modes even for very weak measurement noise. Our predictions are corroborated by simulations with synthetic data for a variety of control parameters.

18 pages, 9 figures
IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records 2024-12-13
Show

Electronic Health Records present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse and highly missing, where missingness could also be informative and reflect the underlying patient's health status. Therefore, the success of data-driven models for personalized medicine highly depends on how the EHR data is represented from physiological data, treatments, and the missing values in the data. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate personalized realistic values conditioning on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend the use of IGNITE from imputing missingness to a personalized data synthesizer, where it generates missing EHRs that were never observed prior or even generates new patients for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.

Trajectory

Title Date Abstract Comment
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis 2024-12-19
Show

The intuitive nature of drag-based interaction has led to its growing adoption for controlling object trajectories in image-to-video synthesis. Still, existing methods that perform dragging in the 2D space usually face ambiguity when handling out-of-plane movements. In this work, we augment the interaction with a new dimension, i.e., the depth dimension, such that users are allowed to assign a relative depth for each point on the trajectory. That way, our new interaction paradigm not only inherits the convenience from 2D dragging, but facilitates trajectory control in the 3D space, broadening the scope of creativity. We propose a pioneering method for 3D trajectory control in image-to-video synthesis by abstracting object masks into a few cluster points. These points, accompanied by the depth information and the instance information, are finally fed into a video diffusion model as the control signal. Extensive experiments validate the effectiveness of our approach, dubbed LeviTor, in precisely manipulating the object movements when producing photo-realistic videos from static images. Project page: https://ppetrichor.github.io/levitor.github.io/

Project page available at https://ppetrichor.github.io/levitor.github.io/

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning 2024-12-19
Show

Robot learning is witnessing a significant increase in the size, diversity, and complexity of pre-collected datasets, mirroring trends in domains such as natural language processing and computer vision. Many robot learning methods treat such datasets as multi-task expert data and learn a multi-task, generalist policy by training broadly across them. Notably, while these generalist policies can improve the average performance across many tasks, the performance of generalist policies on any one task is often suboptimal due to negative transfer between partitions of the data, compared to task-specific specialist policies. In this work, we argue for the paradigm of training policies during deployment given the scenarios they encounter: rather than deploying pre-trained policies to unseen problems in a zero-shot manner, we non-parametrically retrieve and train models directly on relevant data at test time. Furthermore, we show that many robotics tasks share considerable amounts of low-level behaviors and that retrieval at the "sub"-trajectory granularity enables significantly improved data utilization, generalization, and robustness in adapting policies to novel problems. In contrast, existing full-trajectory retrieval methods tend to underutilize the data and miss out on shared cross-task content. This work proposes STRAP, a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion. STRAP outperforms both prior retrieval algorithms and multi-task learning methods in simulated and real experiments, showing the ability to scale to much larger offline datasets in the real world as well as the ability to learn robust control policies with just a handful of real-world demonstrations.

Project website at https://weirdlabuw.github.io/strap/

Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution 2024-12-19
Show

Diffusion models have revolutionized image synthesis, garnering significant research interest in recent years. Diffusion is an iterative algorithm in which samples are generated step-by-step, starting from pure noise. This process introduces the notion of diffusion trajectories, i.e., paths from the standard Gaussian distribution to the target image distribution. In this context, we study discriminative algorithms operating on these trajectories. Specifically, given a pre-trained diffusion model, we consider the problem of classifying images as part of the training dataset, generated by the model or originating from an external source. Our approach demonstrates the presence of patterns across steps that can be leveraged for classification. We also conduct ablation studies, which reveal that using higher-order gradient features to characterize the trajectories leads to significant performance gains and more robust algorithms.

Multi-Agent Trajectory Prediction with Difficulty-Guided Feature Enhancement Network 2024-12-19
Show

Trajectory prediction is crucial for autonomous driving as it aims to forecast the future movements of traffic participants. Traditional methods usually perform holistic inference on the trajectories of agents, neglecting the differences in prediction difficulty among agents. This paper proposes a novel Difficulty-Guided Feature Enhancement Network (DGFNet), which leverages the prediction difficulty differences among agents for multi-agent trajectory prediction. Firstly, we employ spatio-temporal feature encoding and interaction to capture rich spatio-temporal features. Secondly, a difficulty-guided decoder controls the flow of future trajectories into subsequent modules, obtaining reliable future trajectories. Then, feature interaction and fusion are performed through the future feature interaction module. Finally, the fused agent features are fed into the final predictor to generate the predicted trajectory distributions for multiple participants. Experimental results demonstrate that our DGFNet achieves state-of-the-art performance on the Argoverse 1&2 motion forecasting benchmarks. Ablation studies further validate the effectiveness of each module. Moreover, compared with SOTA methods, our method balances trajectory prediction accuracy and real-time inference speed.

High-Accuracy Model Predictive Control with Inverse Hysteresis for High-Speed Trajectory Tracking of Piezoelectric Fast Steering Mirror 2024-12-19
Show

Piezoelectric fast steering mirrors (PFSM) are widely utilized in beam precision-pointing systems but encounter considerable challenges in achieving high-precision tracking of fast trajectories due to nonlinear hysteresis and mechanical dual-axis cross-coupling. This paper proposes a model predictive control (MPC) approach integrated with a hysteresis inverse based on the Hammerstein modeling structure of the PFSM. The MPC is designed to decouple the rate-dependent dual-axis linear components, with an augmented error integral variable introduced in the state space to eliminate steady-state errors. Moreover, proofs of zero steady-state error and disturbance rejection are provided. The hysteresis inverse model is then cascaded to compensate for the rate-independent nonlinear components. Finally, PFSM tracking experiments are conducted on step, sinusoidal, triangular, and composite trajectories. Compared to traditional model-free and existing model-based controllers, the proposed method significantly enhances tracking accuracy, demonstrating superior tracking performance and robustness to frequency variations. These results offer valuable insights for engineering applications.

EPN: An Ego Vehicle Planning-Informed Network for Target Trajectory Prediction 2024-12-19
Show

Trajectory prediction plays a crucial role in improving the safety and reliability of autonomous vehicles, serving as an intermediate link between perception and planning. However, due to the highly dynamic and multimodal nature of the task, accurately predicting the future trajectory of a target vehicle remains a significant challenge. To address these challenges, we propose an Ego vehicle Planning-informed Network (EPN) for multimodal trajectory prediction. Current trajectory prediction methods typically use the historical trajectory and vehicle attributes as inputs, focusing primarily on how historical information influences the future trajectory of the target vehicle. In real-world driving scenarios, however, the future trajectory of a vehicle is influenced not only by its own historical data but also by the behavior of other vehicles on the road. To address this, we incorporate the future planned trajectory of the ego vehicle as an additional input to simulate the mutual influence between the ego vehicle's planned trajectory and the predicted trajectory of the target vehicle. Furthermore, to tackle the challenges of intention ambiguity and large prediction errors often encountered in methods based on driving intentions, we propose a target's endpoint prediction module. This module first predicts the possible endpoints of the target vehicle, then refines these predictions through a correction mechanism, and finally generates a complete multimodal predicted trajectory based on the corrected endpoints. Experimental results demonstrate that, compared to other trajectory prediction methods, EPN achieves an average reduction of 34.9%, 30.7%, and 30.4% in RMSE, ADE, and FDE evaluation metrics on the NGSIM dataset, and an average reduction of 64.6%, 64.5%, and 64.3% in RMSE, ADE, and FDE on the HighD dataset. These results highlight the strong performance of EPN in trajectory prediction.

REVECA: Adaptive Planning and Trajectory-based Validation in Cooperative Language Agents using Information Relevance and Relative Proximity 2024-12-18
Show

We address the challenge of multi-agent cooperation, where agents achieve a common goal by cooperating with decentralized agents under complex partial observations. Existing cooperative agent systems often struggle with efficiently processing continuously accumulating information, managing globally suboptimal planning due to lack of consideration of collaborators, and addressing false planning caused by environmental changes introduced by other collaborators. To overcome these challenges, we propose the RElevance, Proximity, and Validation-Enhanced Cooperative Language Agent (REVECA), a novel cognitive architecture powered by GPT-4o-mini. REVECA enables efficient memory management, optimal planning, and cost-effective prevention of false planning by leveraging Relevance Estimation, Adaptive Planning, and Trajectory-based Validation. Extensive experimental results demonstrate REVECA's superiority over existing methods across various benchmarks, while a user study reveals its potential for achieving trustworthy human-AI cooperation.

v2 is the AAAI'25 camera-ready version, including the appendix, which has been enhanced based on the reviewers' comments

Disease Progression Modelling and Stratification for detecting sub-trajectories in the natural history of pathologies: application to Parkinson's Disease trajectory modelling 2024-12-18
Show

Modelling the progression of Degenerative Diseases (DD) is essential for detection, prevention, and treatment, yet it remains challenging due to the heterogeneity in disease trajectories among individuals. Factors such as demographics, genetic conditions, and lifestyle contribute to diverse phenotypical manifestations, necessitating patient stratification based on these variations. Recent methods like Subtype and Stage Inference (SuStaIn) have advanced unsupervised stratification of disease trajectories, but they face potential limitations in robustness, interpretability, and temporal granularity. To address these challenges, we introduce Disease Progression Modelling and Stratification (DP-MoSt), a novel probabilistic method that optimises clusters of continuous trajectories over a long-term disease time-axis while estimating the confidence of trajectory sub-types for each biomarker. We validate DP-MoSt using both synthetic and real-world data from the Parkinson's Progression Markers Initiative (PPMI). Our results demonstrate that DP-MoSt effectively identifies both sub-trajectories and subpopulations, and is a promising alternative to current state-of-the-art models.

Longitudinal Disease Tracking and Modelling with Medical Images and Data, Oct 2024, Marrachech, Morocco

TAME: Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification 2024-12-18
Show

The increasing prevalence of compact UAVs has introduced significant risks to public safety, while traditional drone detection systems are often bulky and costly. To address these challenges, we present TAME, the Temporal Audio-based Mamba for Enhanced Drone Trajectory Estimation and Classification. This innovative anti-UAV detection model leverages a parallel selective state-space model to simultaneously capture and learn both the temporal and spectral features of audio, effectively analyzing propagation of sound. To further enhance temporal features, we introduce a Temporal Feature Enhancement Module, which integrates spectral features into temporal data using residual cross-attention. This enhanced temporal information is then employed for precise 3D trajectory estimation and classification. Our model sets a new standard of performance on the MMUAD benchmarks, demonstrating superior accuracy and effectiveness. The code and trained models are publicly available on GitHub \url{https://github.com/AmazingDay1/TAME}.

Audio Array-Based 3D UAV Trajectory Estimation with LiDAR Pseudo-Labeling 2024-12-18
Show

As small unmanned aerial vehicles (UAVs) become increasingly prevalent, there is growing concern regarding their impact on public safety and privacy, highlighting the need for advanced tracking and trajectory estimation solutions. In response, this paper introduces a novel framework that utilizes audio array for 3D UAV trajectory estimation. Our approach incorporates a self-supervised learning model, starting with the conversion of audio data into mel-spectrograms, which are analyzed through an encoder to extract crucial temporal and spectral information. Simultaneously, UAV trajectories are estimated using LiDAR point clouds via unsupervised methods. These LiDAR-based estimations act as pseudo labels, enabling the training of an Audio Perception Network without requiring labeled data. In this architecture, the LiDAR-based system operates as the Teacher Network, guiding the Audio Perception Network, which serves as the Student Network. Once trained, the model can independently predict 3D trajectories using only audio signals, with no need for LiDAR data or external ground truth during deployment. To further enhance precision, we apply Gaussian Process modeling for improved spatiotemporal tracking. Our method delivers top-tier performance on the MMAUD dataset, establishing a new benchmark in trajectory estimation using self-supervised learning techniques without reliance on ground truth annotations.
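
The Gaussian Process modeling step mentioned at the end of this abstract can be pictured as fitting an independent GP over time to each coordinate of a noisy 3D track; the sketch below does exactly that with scikit-learn. The synthetic trajectory, kernel, and noise levels are assumptions for illustration only.

```python
# Sketch of GP-based spatiotemporal smoothing of a noisy 3D track: fit an
# independent GP over time for each coordinate. Data and kernel are invented.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

t = np.linspace(0, 10, 50).reshape(-1, 1)                 # timestamps
true_xyz = np.stack([np.sin(t[:, 0]), np.cos(t[:, 0]), 0.5 * t[:, 0]], axis=1)
noisy_xyz = true_xyz + np.random.default_rng(0).normal(scale=0.1, size=true_xyz.shape)

kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
smoothed = np.column_stack([
    GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    .fit(t, noisy_xyz[:, d])
    .predict(t)
    for d in range(3)
])
print(np.abs(smoothed - true_xyz).mean())  # should sit below the raw noise level
```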

Exploring Transformer-Augmented LSTM for Temporal and Spatial Feature Learning in Trajectory Prediction 2024-12-18
Show

Accurate vehicle trajectory prediction is crucial for safe and efficient autonomous driving. This work explores integrating a Transformer-based model with a Long Short-Term Memory (LSTM)-based technique to enhance spatial and temporal feature learning in vehicle trajectory prediction. A hybrid model is proposed that combines LSTMs for temporal encoding with a Transformer encoder for capturing complex interactions between vehicles. Spatial trajectory features of the neighboring vehicles are processed through a masked scatter mechanism in a grid-based environment and then combined with the temporal trajectories of the vehicles. This combined trajectory data is learned by sequential LSTM encoding and Transformer-based attention layers. The proposed model is benchmarked against predecessor LSTM-based methods, including STA-LSTM, SA-LSTM, CS-LSTM, and NaiveLSTM. While the results do not outperform these predecessors, they demonstrate the potential of integrating Transformers with LSTM-based techniques to build interpretable trajectory prediction models. Future work will explore alternative Transformer-based architectures to further enhance performance. This study provides a promising direction for improving trajectory prediction models by leveraging Transformer-based architectures, paving the way for more robust and interpretable vehicle trajectory prediction systems.
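
A minimal sketch of the LSTM-for-temporal-encoding plus Transformer-encoder-for-interaction pattern described above is given below. Layer sizes, the number of agents, and the prediction head are assumptions, and the grid-based masked scatter step is omitted.

```python
# Minimal sketch (dimensions and layer counts are assumptions): encode each
# agent's history with an LSTM, then let a Transformer encoder attend across
# agents, following the hybrid pattern described in the abstract above.
import torch
import torch.nn as nn

class LSTMTransformerPredictor(nn.Module):
    def __init__(self, in_dim=2, hidden=64, horizon=12):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.interaction = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(hidden, horizon * 2)   # future (x, y) offsets
        self.horizon = horizon

    def forward(self, histories):
        # histories: (batch, n_agents, T_obs, 2)
        b, n, t, d = histories.shape
        _, (h, _) = self.lstm(histories.reshape(b * n, t, d))
        agent_tokens = h[-1].reshape(b, n, -1)        # one token per agent
        fused = self.interaction(agent_tokens)        # cross-agent attention
        return self.head(fused).reshape(b, n, self.horizon, 2)

pred = LSTMTransformerPredictor()(torch.randn(4, 6, 20, 2))
print(pred.shape)   # torch.Size([4, 6, 12, 2])
```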

Choice Between Partial Trajectories: Disentangling Goals from Beliefs 2024-12-18
Show

As AI agents generate increasingly sophisticated behaviors, manually encoding human preferences to guide these agents becomes more challenging. To address this, it has been suggested that agents instead learn preferences from human choice data. This approach requires a model of choice behavior that the agent can use to interpret the data. For choices between partial trajectories of states and actions, previous models assume choice probabilities are determined by the partial return or the cumulative advantage. We consider an alternative model based instead on the bootstrapped return, which adds to the partial return an estimate of the future return. Benefits of the bootstrapped return model stem from its treatment of human beliefs. Unlike partial return, choices based on bootstrapped return reflect human beliefs about the environment. Further, while recovering the reward function from choices based on cumulative advantage requires that those beliefs are correct, doing so from choices based on bootstrapped return does not. To motivate the bootstrapped return model, we formulate axioms and prove an Alignment Theorem. This result formalizes how, for a general class of preferences, such models are able to disentangle goals from beliefs. This ensures recovery of an aligned reward function when learning from choices based on bootstrapped return. The bootstrapped return model also affords greater robustness to choice behavior. Even when choices are based on partial return, learning via a bootstrapped return model recovers an aligned reward function. The same holds with choices based on the cumulative advantage if the human and the agent both adhere to correct and consistent beliefs about the environment. On the other hand, if choices are based on bootstrapped return, learning via partial return or cumulative advantage models does not generally produce an aligned reward function.
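
A toy numerical sketch of the bootstrapped return model follows: a segment's score is its partial return plus a discounted value estimate at its final state, and preferences are turned into probabilities Bradley-Terry style. The rewards, value estimates, and discount factor are made up for illustration.

```python
# Toy numerical sketch of the bootstrapped return idea described above. The
# rewards, value function, and discount below are made-up illustrative values.
import numpy as np

def partial_return(rewards, gamma=0.99):
    return sum(g * r for g, r in zip(gamma ** np.arange(len(rewards)), rewards))

def bootstrapped_return(rewards, value_of_last_state, gamma=0.99):
    # partial return + discounted estimate of what comes after the segment
    return partial_return(rewards, gamma) + gamma ** len(rewards) * value_of_last_state

def choice_prob(score_a, score_b, beta=1.0):
    # Probability the human prefers segment A over B (logistic / Bradley-Terry)
    return 1.0 / (1.0 + np.exp(-beta * (score_a - score_b)))

seg_a = dict(rewards=[0.0, 0.0, 1.0], value_of_last_state=5.0)  # low return, promising state
seg_b = dict(rewards=[1.0, 1.0, 1.0], value_of_last_state=0.0)  # high return, dead end
print(choice_prob(bootstrapped_return(**seg_a), bootstrapped_return(**seg_b)))
```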

C2F-TP: A Coarse-to-Fine Denoising Framework for Uncertainty-Aware Trajectory Prediction 2024-12-17
Show

Accurately predicting the trajectory of vehicles is critically important for ensuring safety and reliability in autonomous driving. Although considerable research efforts have been made recently, the inherent trajectory uncertainty caused by various factors, including dynamic driving intents and diverse driving scenarios, still poses significant challenges to accurate trajectory prediction. To address this issue, we propose C2F-TP, a coarse-to-fine denoising framework for uncertainty-aware vehicle trajectory prediction. C2F-TP features an innovative two-stage coarse-to-fine prediction process. Specifically, in the spatial-temporal interaction stage, we propose a spatial-temporal interaction module to capture the inter-vehicle interactions and learn a multimodal trajectory distribution, from which a certain number of noisy trajectories are sampled. Next, in the trajectory refinement stage, we design a conditional denoising model to reduce the uncertainty of the sampled trajectories through a step-wise denoising operation. Extensive experiments are conducted on two real-world datasets, NGSIM and highD, that are widely adopted in trajectory prediction. The results demonstrate the effectiveness of our proposal.

Safe Trajectory Sets for Online Operation of Power Systems under Uncertainty 2024-12-17
Show

Flexibility provision from active distribution grids requires efficient and robust methods of optimization and control suitable to online operation. In this paper we introduce conditions for the safe operation of feedback optimization based controllers. We use the feasible operating region of a controlled system as bounds for safe system states and evaluate the trajectories of the controller based on the projection of the full system state onto the two-dimensional PQ-plane. We demonstrate the defined conditions for an exemplary sub-transmission system. We show that the proposed method is suitable to evaluate controller performance and robustness for systems subject to disturbances.

Multi-UAV Collaborative Trajectory Planning for Seamless Data Collection and Transmission 2024-12-17
Show

Unmanned aerial vehicles (UAVs) have attracted plenty of attention due to their high flexibility and enhanced communication ability. However, the limited coverage and energy of UAVs make it difficult to provide timely wireless service for large-scale sensor networks, a limitation that persists even when multiple UAVs are deployed. To this end, advanced collaboration mechanisms for UAVs urgently need to be designed. In this paper, we propose a multi-UAV collaborative scheme for seamless data collection and transmission, where UAVs are dispatched to collection points (CPs) to collect and transmit time-critical data to the ground base station (BS) simultaneously through a cooperative backhaul link. Specifically, the mission completion time is minimized by optimizing the trajectories, task allocation, collection time scheduling, and transmission topology of UAVs while ensuring a backhaul link to the BS. However, the formulated problem is non-convex and challenging to solve directly. To tackle this problem, the CP locations and transmission topology of UAVs are obtained by sensor node (SN) clustering and region division. Next, the transmission connectivity condition between UAVs is derived to facilitate trajectory discretization and thus reduce the dimensionality of the variables. This simplifies the problem to optimizing the UAV hovering locations, hovering time, and CP serving sequence. Then, we propose a point-matching-based trajectory planning algorithm to solve the problem efficiently. The simulation results show that the proposed scheme achieves significant performance gains over the two benchmarks.

6 pages, 3 figures, submitted to WCNC Workshop 2025

Rapid and Robust Trajectory Optimization for Humanoids 2024-12-16
Show

Performing trajectory design for humanoid robots with high degrees of freedom is computationally challenging. The trajectory design process also often involves carefully selecting various hyperparameters and requires a good initial guess which can further complicate the development process. This work introduces a generalized gait optimization framework that directly generates smooth and physically feasible trajectories. The proposed method demonstrates faster and more robust convergence than existing techniques and explicitly incorporates closed-loop kinematic constraints that appear in many modern humanoids. The method is implemented as an open-source C++ codebase which can be found at https://roahmlab.github.io/RAPTOR/.

Deep-learning-based identification of individual motion characteristics from upper-limb trajectories towards disorder stage evaluation 2024-12-16
Show

The identification of individual movement characteristics sets the foundation for assessing personal rehabilitation progress and can provide diagnostic information on the levels and stages of movement disorders. This work presents a preliminary study on differentiating individual motion patterns using a dataset of 3D upper-limb transport trajectories measured in task-space. Identifying individuals via deep time series learning can be a key step towards abstracting individual motion properties. In this study, a classification accuracy of about 95% is reached for a subset of nine individuals, and about 78% for the full set of 31 individuals. This provides insights into the separability of patient attributes using a simple standardized task that could be transferred to portable systems.

Efficient LiDAR Bundle Adjustment for Multi-Scan Alignment Utilizing Continuous-Time Trajectories 2024-12-16
Show

Constructing precise global maps is a key task in robotics and is required for localization, surveying, monitoring, or constructing digital twins. To build accurate maps, data from mobile 3D LiDAR sensors is often used. Mapping requires correctly aligning the individual point clouds to each other to obtain a globally consistent map. In this paper, we investigate the problem of multi-scan alignment to obtain globally consistent point cloud maps. We propose a 3D LiDAR bundle adjustment approach to solve the global alignment problem and jointly optimize the available data. Utilizing a continuous-time trajectory allows us to consider the ego-motion of the LiDAR scanner while recording a single scan directly in the least squares adjustment. Furthermore, pruning the search space of correspondences and utilizing an out-of-core circular buffer enable our approach to align thousands of point clouds efficiently. We successfully align point clouds recorded with a handheld LiDAR, as well as ones mounted on a vehicle, and are able to perform multi-session alignment.

Submitted to ICRA 2025

NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving 2024-12-16
Show

Accurate trajectory prediction is essential for the safety and efficiency of autonomous driving. Traditional models often struggle with real-time processing, capturing non-linearity and uncertainty in traffic environments, efficiency in dense traffic, and modeling temporal dynamics of interactions. We introduce NEST (Neuromodulated Small-world Hypergraph Trajectory Prediction), a novel framework that integrates Small-world Networks and hypergraphs for superior interaction modeling and prediction accuracy. This integration enables the capture of both local and extended vehicle interactions, while the Neuromodulator component adapts dynamically to changing traffic conditions. We validate the NEST model on several real-world datasets, including nuScenes, MoCAD, and HighD. The results consistently demonstrate that NEST outperforms existing methods in various traffic scenarios, showcasing its exceptional generalization capability, efficiency, and temporal foresight. Our comprehensive evaluation illustrates that NEST significantly improves the reliability and operational efficiency of autonomous driving systems, making it a robust solution for trajectory prediction in complex traffic environments.

Accepted by AAAI-25
Poisson Multi-Bernoulli Mixtures for Sets of Trajectories 2024-12-15
Show

The Poisson Multi-Bernoulli Mixture (PMBM) density is a conjugate multi-target density for the standard point target model with Poisson point process birth. This means that both the filtering and predicted densities for the set of targets are PMBM. In this paper, we first show that the PMBM density is also conjugate for sets of trajectories with the standard point target measurement model. Second, based on this theoretical foundation, we develop two trajectory PMBM filters that provide recursions to calculate the posterior density for the set of all trajectories that have ever been present in the surveillance area, and the posterior density of the set of trajectories present at the current time step in the surveillance area. These two filters therefore provide complete probabilistic information on the considered trajectories enabling optimal trajectory estimation. Third, we establish that the density of the set of trajectories in any time window, given the measurements in a possibly different time window, is also a PMBM. Finally, the trajectory PMBM filters are evaluated via simulations, and are shown to yield state-of-the-art performance compared to other multi-target tracking algorithms based on random finite sets and multiple hypothesis tracking.

accepted in IEEE Transactions on Aerospace and Electronic Systems. Matlab code of trajectory PMBM filters can be found at https://github.com/Agarciafernandez and https://github.com/yuhsuansia

Economic MPC with an Online Reference Trajectory for Battery Scheduling Considering Demand Charge Management 2024-12-14
Show

Monthly demand charges form a significant portion of the electric bill for microgrids with variable renewable energy generation. A battery energy storage system (BESS) is commonly used to manage these demand charges. Economic model predictive control (EMPC) with a reference trajectory can be used to dispatch the BESS to optimize the microgrid operating cost. Since demand charges are incurred monthly, EMPC requires a full-month reference trajectory for asymptotic stability guarantees that result in optimal operating costs. However, a full-month reference trajectory is unrealistic from a renewable generation forecast perspective. Therefore, to construct a practical EMPC with a reference trajectory, an EMPC formulation considering both non-coincident demand and on-peak demand charges is designed in this work for 24 to 48 h prediction horizons. The corresponding reference trajectory is computed at each EMPC step by solving an optimal control problem over 24 to 48 h reference (trajectory) horizon. Furthermore, BESS state of charge regulation constraints are incorporated to guarantee the BESS energy level in the long term. Multiple reference and prediction horizon lengths are compared for both shrinking and rolling horizons with real-world data. The proposed EMPC with 48 h rolling reference and prediction horizons outperforms the traditional EMPC benchmark with a 2% reduction in the annual cost, proving its economic benefits.

13 pages, 6 figures, 2 tables, Submitted to IEEE Transactions on Smart Grid
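
As a simplified picture of scheduling a battery against both energy cost and a demand charge, the sketch below solves a single-horizon convex program in which the demand charge is applied to the peak grid draw. It is not the paper's EMPC with an online reference trajectory; the prices, load profile, battery parameters, and the use of `cvxpy` are all assumptions.

```python
# Simplified single-horizon sketch (not the paper's full EMPC): schedule a
# battery over 24 h to trade off energy cost against a demand charge modelled
# through the peak grid draw. All parameter values are invented; requires cvxpy.
import numpy as np
import cvxpy as cp

T = 24
load = 50 + 20 * np.sin(np.linspace(0, 2 * np.pi, T))   # kW site load
price = np.where(np.arange(T) >= 16, 0.30, 0.15)         # $/kWh, pricier evenings
demand_rate = 12.0                                        # $/kW demand charge
cap, p_max, soc0 = 200.0, 50.0, 100.0                     # kWh, kW, kWh

batt = cp.Variable(T)            # + discharge, - charge (kW, 1 h steps)
grid = load - batt               # power drawn from the grid
soc = soc0 - cp.cumsum(batt)     # simple lossless state of charge

constraints = [cp.abs(batt) <= p_max, soc >= 0, soc <= cap, grid >= 0]
cost = price @ grid + demand_rate * cp.max(grid)
cp.Problem(cp.Minimize(cost), constraints).solve()
print(float(cp.max(grid).value), float(cost.value))
```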

SHIFT Planner: Speedy Hybrid Iterative Field and Segmented Trajectory Optimization with IKD-tree for Uniform Lightweight Coverage 2024-12-14
Show

This paper introduces a comprehensive planning and navigation framework that addresses the limitations of existing coverage approaches by integrating semantic mapping, adaptive coverage planning, dynamic obstacle avoidance, and precise trajectory tracking. Our framework begins by generating panoptic occupancy local semantic maps and accurate localization information from data aligned between a monocular camera, IMU, and GPS. This information is combined with input terrain point clouds or preloaded terrain information to initialize the planning process. We propose the Radiant Field-Informed Coverage Planning algorithm, which utilizes a diffusion field model to dynamically adjust the robot's coverage trajectory and speed based on environmental attributes such as dirtiness and dryness. By modeling the spatial influence of the robot's actions using a Gaussian field, the algorithm ensures a speed-optimized, uniform coverage trajectory while adapting to varying environmental conditions.

Toy-GS: Assembling Local Gaussians for Precisely Rendering Large-Scale Free Camera Trajectories 2024-12-13
Show

Currently, 3D rendering for large-scale free camera trajectories, namely arbitrary input camera trajectories, poses significant challenges: 1) the distribution and observation angles of the cameras are irregular, and various types of scenes are included in the free trajectories; 2) processing the entire point cloud and all images at once for large-scale scenes requires a substantial amount of GPU memory. This paper presents Toy-GS, a method for accurately rendering large-scale free camera trajectories. Specifically, we propose an adaptive spatial division approach for free trajectories to divide cameras and the sparse point cloud of the entire scene into various regions according to camera poses. Training each local Gaussian in parallel for each area enables us to concentrate on texture details and minimize GPU memory usage. Next, we use the multi-view constraint and position-aware point adaptive control (PPAC) to improve the rendering quality of texture details. In addition, our regional fusion approach combines local and global Gaussians to enhance rendering quality with an increasing number of divided areas. Extensive experiments have been carried out to confirm the effectiveness and efficiency of Toy-GS, leading to state-of-the-art results on two public large-scale datasets as well as our SCUTic dataset. Our approach demonstrates an enhancement of 1.19 dB in PSNR and conserves 7 GB of GPU memory when compared to various benchmarks.

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials 2024-12-12
Show

Graphical User Interface (GUI) agents hold great potential for automating complex tasks across diverse digital environments, from web applications to desktop software. However, the development of such agents is hindered by the lack of high-quality, multi-step trajectory data required for effective training. Existing approaches rely on expensive and labor-intensive human annotation, making them unsustainable at scale. To address this challenge, we propose AgentTrek, a scalable data synthesis pipeline that generates high-quality GUI agent trajectories by leveraging web tutorials. Our method automatically gathers tutorial-like texts from the internet, transforms them into task goals with step-by-step instructions, and employs a visual-language model agent to simulate their execution in a real digital environment. A VLM-based evaluator ensures the correctness of the generated trajectories. We demonstrate that training GUI agents with these synthesized trajectories significantly improves their grounding and planning performance over the current models. Moreover, our approach is more cost-efficient compared to traditional human annotation methods. This work underscores the potential of guided replay with web tutorials as a viable strategy for large-scale GUI agent training, paving the way for more capable and autonomous digital agents.

https://agenttrek.github.io

Slope Considered Online Nonlinear Trajectory Planning with Differential Energy Model for Autonomous Driving 2024-12-12
Show

Achieving energy-efficient trajectory planning for autonomous driving remains a challenge due to the limitations of model-agnostic approaches. This study addresses this gap by introducing an online nonlinear programming trajectory optimization framework that integrates a differentiable energy model into autonomous systems. By leveraging traffic and slope profile predictions within a safety-critical framework, the proposed method enhances fuel efficiency for both sedans and diesel trucks by 3.71% and 7.15%, respectively, when compared to traditional model-agnostic quadratic programming techniques. These improvements translate to a potential $6.14 billion economic benefit for the U.S. trucking industry. This work bridges the gap between model-agnostic autonomous driving and model-aware ECO-driving, highlighting a practical pathway for integrating energy efficiency into real-time trajectory planning.

Robot Agnostic Visual Servoing considering kinematic constraints enabled by a decoupled network trajectory planner structure 2024-12-12
Show

We propose a visual servoing method consisting of a detection network and a velocity trajectory planner. First, the detection network estimates the object's position and orientation in the image space. These estimates are then normalized and filtered. The direction and orientation are the input to the trajectory planner, which considers the kinematic constraints of the robotic system used. This allows safe and stable control, since the kinematic boundary values are taken into account in planning. Also, by keeping the direction estimation and the velocity planner separate, the learning part of the method does not directly influence the control value. This also enables the transfer of the method to different robotic systems without retraining, making the approach robot agnostic. We evaluate our method on different visual servoing tasks with and without clutter on two different robotic systems. Our method achieved mean absolute position errors of <0.5 mm and orientation errors of <1{\deg}. Additionally, we transferred the method to a new system which differs in robot and camera, emphasizing the robot-agnostic capability of our method.

\copyright 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

In-Dataset Trajectory Return Regularization for Offline Preference-based Reinforcement Learning 2024-12-12
Show

Offline preference-based reinforcement learning (PbRL) typically operates in two phases: first, use human preferences to learn a reward model and annotate rewards for a reward-free offline dataset; second, learn a policy by optimizing the learned reward via offline RL. However, accurately modeling step-wise rewards from trajectory-level preference feedback presents inherent challenges. The reward bias introduced, particularly the overestimation of predicted rewards, leads to optimistic trajectory stitching, which undermines the pessimism mechanism critical to the offline RL phase. To address this challenge, we propose In-Dataset Trajectory Return Regularization (DTR) for offline PbRL, which leverages conditional sequence modeling to mitigate the risk of learning inaccurate trajectory stitching under reward bias. Specifically, DTR employs Decision Transformer and TD-Learning to strike a balance between maintaining fidelity to the behavior policy with high in-dataset trajectory returns and selecting optimal actions based on high reward labels. Additionally, we introduce an ensemble normalization technique that effectively integrates multiple reward models, balancing the tradeoff between reward differentiation and accuracy. Empirical evaluations on various benchmarks demonstrate the superiority of DTR over other state-of-the-art baselines.

7 pages, Proceedings of the 39th AAAI Conference on Artificial Intelligence (AAAI-25)

Temporal-Assisted Beamforming and Trajectory Prediction in Sensing-Enabled UAV Communications 2024-12-12
Show

In the evolving landscape of high-speed communication, the shift from traditional pilot-based methods to a Sensing-Oriented Approach (SOA) is anticipated to gain momentum. This paper delves into the development of an innovative Integrated Sensing and Communication (ISAC) framework, specifically tailored for beamforming and trajectory prediction processes. Central to this research is the exploration of an Unmanned Aerial Vehicle (UAV)-enabled communication system, which seamlessly integrates ISAC technology. This integration underscores the synergistic interplay between sensing and communication capabilities. The proposed system initially deploys omnidirectional beams for the sensing-focused phase, subsequently transitioning to directional beams for precise object tracking. This process incorporates an Extended Kalman Filtering (EKF) methodology for the accurate estimation and prediction of object states. A novel frame structure is introduced, employing historical sensing data to optimize beamforming in real-time for subsequent time slots, a strategy we refer to as 'temporal-assisted' beamforming. To refine the temporal-assisted beamforming technique, we employ Successive Convex Approximation (SCA) in tandem with Iterative Rank Minimization (IRM), yielding high-quality suboptimal solutions. Comparative analysis with conventional pilot-based systems reveals that our approach yields a substantial improvement of 156% in multi-object scenarios and 136% in single-object scenarios.
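
The predict/update cycle behind the EKF-based tracking described above can be sketched, in its simplest linear form, as a constant-velocity Kalman filter on noisy position measurements, as below. The motion model, noise covariances, and measurement stream are illustrative assumptions rather than the paper's setup.

```python
# Illustrative sketch of the predict/update cycle behind EKF-style tracking for
# temporal-assisted beamforming, reduced to a linear constant-velocity Kalman
# filter in 2D. Noise levels and the motion model are assumptions.
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]])  # state transition
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]])                                # position measurements
Q = 0.01 * np.eye(4)                                                       # process noise
R = 0.5 * np.eye(2)                                                        # measurement noise

x = np.zeros(4)          # [px, py, vx, vy]
P = np.eye(4)

def kf_step(x, P, z):
    # Predict the next state, then correct it with the noisy measurement z.
    x_pred, P_pred = F @ x, F @ P @ F.T + Q
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new

rng = np.random.default_rng(0)
for k in range(50):
    true_pos = np.array([0.5 * k * dt, 0.2 * k * dt])
    x, P = kf_step(x, P, true_pos + rng.normal(scale=0.5, size=2))
print(x)   # estimated position and velocity, used to steer the next beam
```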

Mojito: Motion Trajectory and Intensity Control for Video Generation 2024-12-12
Show

Recent advancements in diffusion models have shown great promise in producing high-quality video content. However, efficiently training diffusion models capable of integrating directional guidance and controllable motion intensity remains a challenging and under-explored area. This paper introduces Mojito, a diffusion model that incorporates both \textbf{Mo}tion tra\textbf{j}ectory and \textbf{i}ntensi\textbf{t}y contr\textbf{o}l for text to video generation. Specifically, Mojito features a Directional Motion Control module that leverages cross-attention to efficiently direct the generated object's motion without additional training, alongside a Motion Intensity Modulator that uses optical flow maps generated from videos to guide varying levels of motion intensity. Extensive experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high computational efficiency, generating motion patterns that closely match specified directions and intensities, providing realistic dynamics that align well with natural motion in real-world scenarios.

From Text to Trajectory: Exploring Complex Constraint Representation and Decomposition in Safe Reinforcement Learning 2024-12-12
Show

Safe reinforcement learning (RL) requires the agent to finish a given task while obeying specific constraints. Giving constraints in natural language form has great potential for practical scenarios due to its flexible transfer capability and accessibility. Previous safe RL methods with natural language constraints typically need to design cost functions manually for each constraint, which requires domain expertise and lacks flexibility. In this paper, we harness the dual role of text in this task, using it not only to provide constraint but also as a training signal. We introduce the Trajectory-level Textual Constraints Translator (TTCT) to replace the manually designed cost function. Our empirical results demonstrate that TTCT effectively comprehends textual constraint and trajectory, and the policies trained by TTCT can achieve a lower violation rate than the standard cost function. Extra studies are conducted to demonstrate that the TTCT has zero-shot transfer capability to adapt to constraint-shift environments.

Accepted by NeurIPS 2024

Towards modeling evolving longitudinal health trajectories with a transformer-based deep learning model 2024-12-12
Show

Health registers contain rich information about individuals' health histories. Here our interest lies in understanding how individuals' health trajectories evolve in a nationwide longitudinal dataset with coded features, such as clinical codes, procedures, and drug purchases. We introduce a straightforward approach for training a Transformer-based deep learning model in a way that lets us analyze how individuals' trajectories change over time. This is achieved by modifying the training objective and by applying a causal attention mask. We focus here on the general task of predicting the onset of a range of common diseases in a given future forecast interval. However, instead of providing a single prediction about diagnoses that could occur in this forecast interval, our approach enables the model to provide continuous predictions at every time point up until, and conditioned on, the time of the forecast period. We find that this model performs comparably to other models, including a bi-directional transformer model, in terms of basic prediction performance, while at the same time offering promising trajectory modeling properties. We explore a couple of ways to use this model for analyzing health trajectories and aiding in the early detection of events that forecast possible later disease onsets. We hypothesize that this method may be helpful in the continuous monitoring of people's health trajectories and in enabling interventions in ongoing health trajectories, as well as being useful in retrospective analyses.
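
The causal attention mask mentioned above can be illustrated with a few lines of PyTorch: an upper-triangular boolean mask prevents each visit from attending to later visits, so the encoder can emit a prediction at every time point conditioned only on the past. The model size, sequence length, and prediction head are toy assumptions.

```python
# Minimal sketch of the causal-attention idea: an upper-triangular mask keeps
# each position from attending to later visits, so the encoder can produce a
# prediction at every time step conditioned only on the past. Sizes are toy
# assumptions, not the paper's architecture.
import torch
import torch.nn as nn

seq_len, d_model = 16, 32
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(d_model, 1)                  # per-visit risk of a future diagnosis

visits = torch.randn(8, seq_len, d_model)     # 8 patients, 16 coded visits each
hidden = encoder(visits, mask=causal_mask)    # True entries are blocked
risk_over_time = torch.sigmoid(head(hidden)).squeeze(-1)
print(risk_over_time.shape)                   # torch.Size([8, 16])
```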

Labits: Layered Bidirectional Time Surfaces Representation for Event Camera-based Continuous Dense Trajectory Estimation 2024-12-12
Show

Event cameras provide a compelling alternative to traditional frame-based sensors, capturing dynamic scenes with high temporal resolution and low latency. Moving objects trigger events with precise timestamps along their trajectory, enabling smooth continuous-time estimation. However, few works have attempted to optimize the information loss during event representation construction, imposing a ceiling on this task. Fully exploiting event cameras requires representations that simultaneously preserve fine-grained temporal information, stable and characteristic 2D visual features, and temporally consistent information density, an unmet challenge in existing representations. We introduce Labits: Layered Bidirectional Time Surfaces, a simple yet elegant representation designed to retain all these features. Additionally, we propose a dedicated module for extracting active pixel local optical flow (APLOF), significantly boosting the performance. Our approach achieves an impressive 49% reduction in trajectory end-point error (TEPE) compared to the previous state-of-the-art on the MultiFlow dataset. The code will be released upon acceptance.

24 pages, 12 figures, 9 tables

EMATO: Energy-Model-Aware Trajectory Optimization for Autonomous Driving 2024-12-12
Show

Autonomous driving with energy-model-agnostic trajectory planning lacks strong evidence of energy efficiency. To achieve energy-consumption-model-aware trajectory planning for autonomous driving, this study proposes an online nonlinear programming method that optimizes the polynomial trajectories generated by the Frenet polynomial method while considering both traffic trajectories and road slope prediction. The study further investigates how the energy model can be leveraged under different driving conditions to achieve higher energy efficiency. Case studies, quantitative studies, and ablation studies are conducted on sedan and truck models to demonstrate the effectiveness of the method.
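
A small sketch of the Frenet-polynomial building block referenced above: solve for a quintic longitudinal profile s(t) that matches position, speed, and acceleration at both ends, the kind of candidate trajectory an energy-aware optimizer would then evaluate. The boundary values and horizon are made up.

```python
# Sketch of the quintic (Frenet) polynomial building block: solve for s(t)
# matching position/velocity/acceleration at both ends of the horizon.
# Boundary values and horizon length are illustrative assumptions.
import numpy as np

def quintic_coeffs(s0, v0, a0, sT, vT, aT, T):
    """Coefficients c of s(t) = sum(c[i] * t**i) meeting both boundary states."""
    A = np.array([
        [1, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 0, 0],
        [0, 0, 2, 0, 0, 0],
        [1, T, T**2, T**3, T**4, T**5],
        [0, 1, 2*T, 3*T**2, 4*T**3, 5*T**4],
        [0, 0, 2, 6*T, 12*T**2, 20*T**3],
    ])
    b = np.array([s0, v0, a0, sT, vT, aT])
    return np.linalg.solve(A, b)

c = quintic_coeffs(s0=0, v0=10, a0=0, sT=80, vT=12, aT=0, T=6.0)
t = np.linspace(0, 6.0, 61)
s = np.polyval(c[::-1], t)                  # longitudinal position along the lane
v = np.polyval(np.polyder(c[::-1]), t)      # and the implied speed profile
print(s[0], s[-1], v[0], v[-1])             # ~0, 80, 10, 12
```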

Advancing Operational Efficiency: Airspace Users' Perspective on Trajectory-Based Operations 2024-12-11
Show

This work explores the evolution of the Flight Operations Center (FOC) and flight trajectory exchange tools within Trajectory-Based Operations (TBO), emphasizing the benefits of the ICAO's Flight and Flow Information for a Collaborative Environment (FF-ICE) messaging framework and Electronic Flight Bags (EFBs). It highlights the collaborative management of four-dimensional flight trajectories, serving as a common reference for decision-making among stakeholders, including Air Navigation Service Providers (ANSPs), airspace users, and airport operators. Key enabling technologies such as Performance Based Navigation (PBN), data communications, and System-wide Information Management (SWIM) are discussed, showcasing their roles in rapid information exchange and trajectory optimization. A live flight case study demonstrates TBO concepts through international collaboration, indicating significant improvements in safety, efficiency, and sustainability. The paper presents results from TBO prototype implementations, including enhanced trajectory accuracy, improved flight path efficiency, and real-time adjustments based on evolving conditions. The integration of advanced trajectory optimization engines and automation within the FOC has led to more effective flight planning, allowing airlines to negotiate trajectory changes dynamically and optimize operations throughout the flight lifecycle. Findings suggest that TBO can enhance operational predictability, flexibility, and strategic planning while reducing uncertainty and improving alignment between strategic and tactical actions. Key conclusions include: TBO is feasible with most currently flying commercial aircraft; full TBO implementation can lead to a greener, more efficient aviation industry with widespread benefits; and continued collaboration among stakeholders is essential for the further development and realization of TBO.

Submitted to 25th Integrated Communications, Navigation and Surveillance Conference (ICNS), April 8-10, 2025, Brussels

Real-Time Trajectory Generation for Soft Robot Manipulators Using Differential Flatness 2024-12-11
Show

Soft robots have the potential to interact with sensitive environments and perform complex tasks effectively. However, motion plans and trajectories for soft manipulators are challenging to calculate due to their deformable nature and nonlinear dynamics. This article introduces a fast real-time trajectory generation approach for soft robot manipulators, which creates dynamically-feasible motions for arbitrary kinematically-feasible paths of the robot's end effector. Our insight is that piecewise constant curvature (PCC) dynamics models of soft robots can be differentially flat, therefore control inputs can be calculated algebraically rather than through a nonlinear differential equation. We prove this flatness under certain conditions, with the curvatures of the robot as the flat outputs. Our two-step trajectory generation approach uses an inverse kinematics procedure to calculate a motion plan of robot curvatures per end-effector position, then, our flatness diffeomorphism generates corresponding control inputs that respect velocity. We validate our approach through simulations of our representative soft robot manipulator along three different trajectories, demonstrating a margin of 23x faster than real-time at a frequency of 100 Hz. This approach could allow fast verifiable replanning of soft robots' motions in safety-critical physical environments, crucial for deployment in the real world.

A Bi-Level Optimization Approach to Joint Trajectory Optimization for Redundant Manipulators 2024-12-10
Show

In this work, we present an approach to minimizing the time necessary for the end-effector of a redundant robot manipulator to traverse a Cartesian path by optimizing the trajectory of its joints. Each joint has limits in the ranges of position, velocity and acceleration, the latter making jerks in joint space undesirable. The proposed approach takes this nonlinear optimization problem whose variables are path speed and joint trajectory and reformulates it into a bi-level problem. The lower-level formulation is a convex subproblem that considers a fixed joint trajectory and maximizes path speed while considering all joint velocity and acceleration constraints. Under particular conditions, this subproblem has a closed-form solution. Then, we solve a higher-level subproblem by leveraging the directional derivative of the lower-level value with respect to the joint trajectory parameters. In particular, we use this direction to implement a Primal-Dual method that considers the path accuracy and joint position constraints. We show the efficacy of our proposed approach with simulations and experimental results.

16 pages, 14 pictures

3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation 2024-12-10
Show

This paper aims to manipulate multi-entity 3D motions in video generation. Previous methods on controllable video generation primarily leverage 2D control signals to manipulate object motions and have achieved remarkable synthesis results. However, 2D control signals are inherently limited in expressing the 3D nature of object motions. To overcome this problem, we introduce 3DTrajMaster, a robust controller that regulates multi-entity dynamics in 3D space, given user-desired 6DoF pose (location and rotation) sequences of entities. At the core of our approach is a plug-and-play 3D-motion grounded object injector that fuses multiple input entities with their respective 3D trajectories through a gated self-attention mechanism. In addition, we exploit an injector architecture to preserve the video diffusion prior, which is crucial for generalization ability. To mitigate video quality degradation, we introduce a domain adaptor during training and employ an annealed sampling strategy during inference. To address the lack of suitable training data, we construct a 360-Motion Dataset, which first correlates collected 3D human and animal assets with GPT-generated trajectory and then captures their motion with 12 evenly-surround cameras on diverse 3D UE platforms. Extensive experiments show that 3DTrajMaster sets a new state-of-the-art in both accuracy and generalization for controlling multi-entity 3D motions. Project page: http://fuxiao0719.github.io/projects/3dtrajmaster

Project Page & Code & Data: http://fuxiao0719.github.io/projects/3dtrajmaster

TraSCE: Trajectory Steering for Concept Erasure 2024-12-10
Show

Recent advancements in text-to-image diffusion models have brought them to the public spotlight, becoming widely accessible and embraced by everyday users. However, these models have been shown to generate harmful content such as not-safe-for-work (NSFW) images. While approaches have been proposed to erase such abstract concepts from the models, jail-breaking techniques have succeeded in bypassing such safety measures. In this paper, we propose TraSCE, an approach to guide the diffusion trajectory away from generating harmful content. Our approach is based on negative prompting, but as we show in this paper, conventional negative prompting is not a complete solution and can easily be bypassed in some corner cases. To address this issue, we first propose a modification of conventional negative prompting. Furthermore, we introduce a localized loss-based guidance that enhances the modified negative prompting technique by steering the diffusion trajectory. We demonstrate that our proposed method achieves state-of-the-art results on various benchmarks in removing harmful content including ones proposed by red teams; and erasing artistic styles and objects. Our proposed approach does not require any training, weight modifications, or training data (both image or prompt), making it easier for model owners to erase new concepts.

Multi-finger Manipulation via Trajectory Optimization with Differentiable Rolling and Geometric Constraints 2024-12-10
Show

Parameterizing finger rolling and finger-object contacts in a differentiable manner is important for formulating dexterous manipulation as a trajectory optimization problem. In contrast to previous methods which often assume simplified geometries of the robot and object or do not explicitly model finger rolling, we propose a method to further extend the capabilities of dexterous manipulation by accounting for non-trivial geometries of both the robot and the object. By integrating the object's Signed Distance Field (SDF) with a sampling method, our method estimates contact and rolling-related variables in a differentiable manner and includes those in a trajectory optimization framework. This formulation naturally allows for the emergence of finger-rolling behaviors, enabling the robot to locally adjust the contact points. To evaluate our method, we introduce a benchmark featuring challenging multi-finger dexterous manipulation tasks, such as screwdriver turning and in-hand reorientation. Our method outperforms baselines in terms of achieving desired object configurations and avoiding dropping the object. We also successfully apply our method to a real-world screwdriver turning task and a cuboid alignment task, demonstrating its robustness to the sim2real gap.
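
As a toy picture of the SDF-based contact quantities such an optimizer consumes, the sketch below queries a sphere SDF for a fingertip point and approximates the contact normal with finite differences. The sphere stands in for a learned or mesh-derived object SDF and is purely illustrative.

```python
# Toy sketch of an SDF-based contact query: the signed distance gives
# penetration/clearance and its gradient gives the contact normal. The sphere
# stands in for a real object SDF and is purely illustrative.
import numpy as np

def sphere_sdf(p, center=np.zeros(3), radius=0.05):
    return np.linalg.norm(p - center) - radius

def sdf_normal(p, sdf, eps=1e-5):
    # Central finite differences approximate the SDF gradient (contact normal).
    grad = np.array([sdf(p + eps * e) - sdf(p - eps * e) for e in np.eye(3)]) / (2 * eps)
    return grad / np.linalg.norm(grad)

fingertip = np.array([0.03, 0.02, 0.01])
dist = sphere_sdf(fingertip)          # negative would mean penetration
normal = sdf_normal(fingertip, sphere_sdf)
print(dist, normal)
```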

POMDP-Based Trajectory Planning for On-Ramp Highway Merging 2024-12-10
Show

This paper addresses the trajectory planning problem for automated vehicle on-ramp highway merging. To tackle this challenge, we extend our previous work on trajectory planning at unsignalized intersections using Partially Observable Markov Decision Processes (POMDPs). The method utilizes the Adaptive Belief Tree (ABT) algorithm, an approximate sampling-based approach to solve POMDPs efficiently. We outline the POMDP formulation process, beginning with discretizing the highway topology to reduce problem complexity. Additionally, we describe the dynamics and measurement models used to predict future states and establish the relationship between available noisy measurements and predictions. Building on our previous work, the dynamics model is expanded to account for lateral movements necessary for lane changes during the merging process. We also define the reward function, which serves as the primary mechanism for specifying the desired behavior of the automated vehicle, combining multiple goals such as avoiding collisions or maintaining appropriate velocity. Our simulation results, conducted on three scenarios based on real-life traffic data from German highways, demonstrate the method's ability to generate safe, collision-free, and efficient merging trajectories. This work shows the versatility of this POMDP-based approach in tackling various automated driving problems.

When UAV Meets Federated Learning: Latency Minimization via Joint Trajectory Design and Resource Allocation 2024-12-10
Show

Federated learning (FL) has emerged as a pivotal solution for training machine learning models over wireless networks, particularly for Internet of Things (IoT) devices with limited computation resources. Despite its benefits, the efficiency of FL is often restricted by the communication quality between IoT devices and the central server. To address this issue, we introduce an innovative approach by deploying an unmanned aerial vehicle (UAV) as a mobile FL server to enhance the training process of FL. By leveraging the UAV's maneuverability, we establish robust line-of-sight connections with IoT devices, significantly improving communication capacity. To improve the overall training efficiency, we formulate a latency minimization problem by jointly optimizing the bandwidth allocation, computing frequencies, transmit power for both the UAV and IoT devices, and the UAV's trajectory. An alternating optimization algorithm is then developed to solve this problem efficiently. Furthermore, we analyze the convergence and computational complexity of the proposed algorithm. Finally, numerical results demonstrate that our proposed scheme not only outperforms existing benchmark schemes in terms of latency but also achieves training efficiency that closely approximates the ideal scenario.

This manuscript has been submitted to IEEE

Control-Aware Trajectory Predictions for Communication-Efficient Drone Swarm Coordination in Cluttered Environments 2024-12-10
Show

Swarms of Unmanned Aerial Vehicles (UAV) have demonstrated enormous potential in many industrial and commercial applications. However, before deploying UAVs in the real world, it is essential to ensure they can operate safely in complex environments, especially with limited communication capabilities. To address this challenge, we propose a control-aware learning-based trajectory prediction algorithm that can enable communication-efficient UAV swarm control in a cluttered environment. Specifically, our proposed algorithm can enable each UAV to predict the planned trajectories of its neighbors in scenarios with various levels of communication capabilities. The predicted planned trajectories will serve as input to a distributed model predictive control (DMPC) approach. The proposed algorithm combines (1) a trajectory prediction model based on EvolveGCN, a Graph Convolutional Network (GCN) that can handle dynamic graphs, which is further enhanced by compressed messages from adjacent UAVs, and (2) a KKT-informed training approach that applies the Karush-Kuhn-Tucker (KKT) conditions in the training process to encode DMPC information into the trained neural network. We evaluate our proposed algorithm in a funnel-like environment. Results show that the proposed algorithm outperforms state-of-the-art benchmarks, providing close-to-optimal control performance and robustness to limited communication capabilities and measurement noises.

ITPNet: Towards Instantaneous Trajectory Prediction for Autonomous Driving 2024-12-10
Show

Trajectory prediction of agents is crucial for the safety of autonomous vehicles, whereas previous approaches usually rely on sufficiently long observed trajectories to predict the future trajectories of agents. However, in real-world scenarios, it is not always realistic to collect adequate observed locations for moving agents, which leads to the collapse of most prediction models. For instance, when a moving car suddenly appears very close to an autonomous vehicle because of an obstruction, the autonomous vehicle must quickly and accurately predict the car's future trajectory from only a few observed locations. In light of this, we focus on investigating the task of instantaneous trajectory prediction, i.e., only two observed locations are available during inference. To this end, we propose a general and plug-and-play instantaneous trajectory prediction approach, called ITPNet. Specifically, we propose a backward forecasting mechanism to reversely predict the latent feature representations of unobserved historical trajectories of the agent based on its two observed locations, and then leverage them as complementary information for future trajectory prediction. Meanwhile, due to the inevitable presence of noise and redundancy in the predicted latent feature representations, we further devise a Noise Redundancy Reduction Former, aiming to filter out noise and redundancy from unobserved trajectories and to integrate the filtered features and observed features into a compact query for future trajectory prediction. In essence, ITPNet is naturally compatible with existing trajectory prediction models, enabling them to gracefully handle the case of instantaneous trajectory prediction. Extensive experiments on the Argoverse and nuScenes datasets demonstrate that ITPNet outperforms the baselines and is effective with different trajectory prediction models.

Model predictive control-based trajectory generation for agile landing of unmanned aerial vehicle on a moving boat 2024-12-10
Show

This paper proposes a novel trajectory generation method based on Model Predictive Control (MPC) for agile landing of an Unmanned Aerial Vehicle (UAV) onto an Unmanned Surface Vehicle (USV)'s deck in harsh conditions. The trajectory generation exploits the state predictions of the USV to create periodically updated trajectories for a multirotor UAV to precisely land on the deck of a moving USV even in cases where the deck's inclination is continuously changing. We use an MPC-based scheme to create trajectories that consider both the UAV dynamics and the predicted states of the USV up to the first derivative of position and orientation. Compared to existing approaches, our method dynamically modifies the penalization matrices to precisely follow the corresponding states with respect to the flight phase. Especially during the landing maneuver, the UAV synchronizes attitude with the USV's, allowing for fast landing on a tilted deck. Simulations show the method's reliability in various sea conditions up to Rough sea (wave height 4 m), outperforming state-of-the-art methods in landing speed and accuracy, with twice the precision on average. Finally, real-world experiments validate the simulation results, demonstrating robust landings on a moving USV, while all computations are performed in real-time onboard the UAV.

18 pages, 17 figures, Ocean Engineering

PPT: Pre-Training with Pseudo-Labeled Trajectories for Motion Forecasting 2024-12-09
Show

Motion forecasting (MF) for autonomous driving aims at anticipating trajectories of surrounding agents in complex urban scenarios. In this work, we investigate a mixed strategy in MF training that first pre-trains motion forecasters on pseudo-labeled data and then fine-tunes them on annotated data. To obtain pseudo-labeled trajectories, we propose a simple pipeline that leverages off-the-shelf single-frame 3D object detectors and non-learning trackers. The whole pre-training strategy including pseudo-labeling is coined as PPT. Our extensive experiments demonstrate that: (1) combining PPT with supervised fine-tuning on annotated data achieves superior performance on diverse testbeds, especially under annotation-efficient regimes, (2) scaling up to multiple datasets improves the previous state of the art, and (3) PPT helps enhance cross-dataset generalization. Our findings showcase PPT as a promising pre-training solution for robust motion forecasting in diverse autonomous driving contexts.

Parameter Adjustments in POMDP-Based Trajectory Planning for Unsignalized Intersections 2024-12-09
Show

This paper investigates the problem of trajectory planning for autonomous vehicles at unsignalized intersections, specifically focusing on scenarios where the vehicle lacks the right of way and yet must cross safely. To address this issue, we have employed a method based on the Partially Observable Markov Decision Processes (POMDPs) framework designed for planning under uncertainty. The method utilizes the Adaptive Belief Tree (ABT) algorithm as an approximate solver for the POMDPs. We outline the POMDP formulation, beginning with discretizing the intersection's topology. Additionally, we present a dynamics model for the prediction of the evolving states of vehicles, such as their position and velocity. Using an observation model, we also describe the connection of those states with the imperfect (noisy) available measurements. Our results confirmed that the method is able to plan collision-free trajectories in a series of simulations utilizing real-world traffic data from aerial footage of two distinct intersections. Furthermore, we studied the impact of parameter adjustments of the ABT algorithm on the method's performance. This provides guidance in determining reasonable parameter settings, which is valuable for future method applications.

Deterministic Trajectory Optimization through Probabilistic Optimal Control 2024-12-09
Show

In this article, we discuss two algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems, or so-called deterministic trajectory optimization problems. Both algorithms can be derived from an emerging theoretical paradigm that we refer to as probabilistic optimal control. The paradigm reformulates stochastic optimal control as an equivalent probabilistic inference problem and can be viewed as a generalisation of the former. The merit of this perspective is that it allows the problem to be addressed using the Expectation-Maximization algorithm. It is shown that the application of this algorithm results in a fixed point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, using state-of-the-art uncertainty quantification methods, resulting in two distinct algorithms. The algorithms are structurally most closely related to the differential dynamic programming algorithm and related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of the algorithms is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.

Efficient Data Representation for Motion Forecasting: A Scene-Specific Trajectory Set Approach 2024-12-09
Show

Representing diverse and plausible future trajectories is critical for motion forecasting in autonomous driving. However, efficiently capturing these trajectories in a compact set remains challenging. This study introduces a novel approach for generating scene-specific trajectory sets tailored to different contexts, such as intersections and straight roads, by leveraging map information and actor dynamics. A deterministic goal sampling algorithm identifies relevant map regions, while our Recursive In-Distribution Subsampling (RIDS) method enhances trajectory plausibility by condensing redundant representations. Experiments on the Argoverse 2 dataset demonstrate that our method achieves up to a 10% improvement in Driving Area Compliance (DAC) compared to baseline methods while maintaining competitive displacement errors. Our work highlights the benefits of mining such scene-aware trajectory sets and how they could capture the complex and heterogeneous nature of actor behavior in real-world driving scenarios.

Learning Soft Driving Constraints from Vectorized Scene Embeddings while Imitating Expert Trajectories 2024-12-07
Show

The primary goal of motion planning is to generate safe and efficient trajectories for vehicles. Traditionally, motion planning models are trained using imitation learning to mimic the behavior of human experts. However, these models often lack interpretability and fail to provide clear justifications for their decisions. We propose a method that integrates constraint learning into imitation learning by extracting driving constraints from expert trajectories. Our approach utilizes vectorized scene embeddings that capture critical spatial and temporal features, enabling the model to identify and generalize constraints across various driving scenarios. We formulate the constraint learning problem using a maximum entropy model, which scores the motion planner's trajectories based on their similarity to the expert trajectory. By separating the scoring process into distinct reward and constraint streams, we improve both the interpretability of the planner's behavior and its attention to relevant scene components. Unlike existing constraint learning methods that rely on simulators and are typically embedded in reinforcement learning (RL) or inverse reinforcement learning (IRL) frameworks, our method operates without simulators, making it applicable to a wider range of datasets and real-world scenarios. Experimental results on the InD and TrafficJams datasets demonstrate that incorporating driving constraints enhances model interpretability and improves closed-loop performance.

M$^3$PC: Test-time Model Predictive Control for Pretrained Masked Trajectory Model 2024-12-07
Show

Recent work in Offline Reinforcement Learning (RL) has shown that a unified Transformer trained under a masked auto-encoding objective can effectively capture the relationships between different modalities (e.g., states, actions, rewards) within given trajectory datasets. However, this information has not been fully exploited during the inference phase, where the agent needs to generate an optimal policy instead of just reconstructing masked components from unmasked ones. Given that a pretrained trajectory model can act as both a Policy Model and a World Model with appropriate mask patterns, we propose using Model Predictive Control (MPC) at test time to leverage the model's own predictive capability to guide its action selection. Empirical results on D4RL and RoboMimic show that our inference-phase MPC significantly improves the decision-making performance of a pretrained trajectory model without any additional parameter training. Furthermore, our framework can be adapted to Offline to Online (O2O) RL and Goal Reaching RL, resulting in more substantial performance gains when an additional online interaction budget is provided, and better generalization capabilities when different task targets are specified. Code is available: https://github.com/wkh923/m3pc.

Perturb-and-Revise: Flexible 3D Editing with Generative Trajectories 2024-12-06
Show

The fields of 3D reconstruction and text-based 3D editing have advanced significantly with the evolution of text-based diffusion models. While existing 3D editing methods excel at modifying color, texture, and style, they struggle with extensive geometric or appearance changes, thus limiting their applications. We propose Perturb-and-Revise, which makes a variety of NeRF edits possible. First, we perturb the NeRF parameters with random initializations to create a versatile initialization. We automatically determine the perturbation magnitude through analysis of the local loss landscape. Then, we revise the edited NeRF via generative trajectories. Combined with the generative process, we impose identity-preserving gradients to refine the edited NeRF. Extensive experiments demonstrate that Perturb-and-Revise facilitates flexible, effective, and consistent editing of color, appearance, and geometry in 3D. For 360° results, please visit our project page: https://susunghong.github.io/Perturb-and-Revise.

Project page: https://susunghong.github.io/Perturb-and-Revise

TFT-multi: simultaneous forecasting of vital sign trajectories in the ICU 2024-12-06
Show

Trajectory forecasting in healthcare data has been an important area of research in precision care and clinical integration for computational methods. In recent years, generative AI models have demonstrated promising results in capturing short and long range dependencies in time series data. While these models have also been applied in healthcare, most of them only predict one value at a time, which is unrealistic in a clinical setting where multiple measures are taken at once. In this work, we extend the temporal fusion transformer (TFT) framework, a multi-horizon time series prediction tool, and propose TFT-multi, an end-to-end framework that can predict multiple vital trajectories simultaneously. We apply TFT-multi to forecast 5 vital signs recorded in the intensive care unit: blood pressure, pulse, SpO2, temperature and respiratory rate. We hypothesize that by jointly predicting these measures, which are often correlated with one another, we can make more accurate predictions, especially in variables with large missingness. We validate our model on the public MIMIC dataset and an independent institutional dataset, and demonstrate that this approach outperforms state-of-the-art univariate prediction tools including the original TFT and Prophet, as well as vector regression modeling for multivariate prediction. Furthermore, we perform a case study analysis by applying our pipeline to forecast blood pressure changes in response to actual and hypothetical pressor administration.
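
A minimal sketch of the joint multivariate forecasting idea, not the TFT-multi architecture itself: a single model takes a window of all five vital-sign channels and emits a multi-horizon forecast for all of them at once. All layer sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class JointVitalsForecaster(nn.Module):
    """Toy joint forecaster: predicts all vital-sign channels simultaneously."""

    def __init__(self, n_signals=5, hidden=64, horizon=12):
        super().__init__()
        self.encoder = nn.LSTM(n_signals, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_signals * horizon)
        self.n_signals, self.horizon = n_signals, horizon

    def forward(self, history):              # history: (batch, time, n_signals)
        _, (h, _) = self.encoder(history)    # h: (1, batch, hidden)
        out = self.head(h[-1])               # (batch, n_signals * horizon)
        return out.view(-1, self.horizon, self.n_signals)

# Forecast 12 future steps of 5 vitals from 48 observed steps.
model = JointVitalsForecaster()
forecast = model(torch.randn(8, 48, 5))      # -> (8, 12, 5)
```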

Opt2Skill: Imitating Dynamically-feasible Whole-Body Trajectories for Versatile Humanoid Loco-Manipulation 2024-12-06
Show

Humanoid robots are designed to perform diverse loco-manipulation tasks. However, they face challenges due to their high-dimensional and unstable dynamics, as well as the complex contact-rich nature of the tasks. Model-based optimal control methods offer precise and systematic control but are limited by high computational complexity and accurate contact sensing. On the other hand, reinforcement learning (RL) provides robustness and handles high-dimensional spaces but suffers from inefficient learning, unnatural motion, and sim-to-real gaps. To address these challenges, we introduce Opt2Skill, an end-to-end pipeline that combines model-based trajectory optimization with RL to achieve robust whole-body loco-manipulation. We generate reference motions for the Digit humanoid robot using differential dynamic programming (DDP) and train RL policies to track these trajectories. Our results demonstrate that Opt2Skill outperforms pure RL methods in both training efficiency and task performance, with optimal trajectories that account for torque limits enhancing trajectory tracking. We successfully transfer our approach to real-world applications.

Socially-Informed Reconstruction for Pedestrian Trajectory Forecasting 2024-12-05
Show

Pedestrian trajectory prediction remains a challenge for autonomous systems, particularly due to the intricate dynamics of social interactions. Accurate forecasting requires a comprehensive understanding not only of each pedestrian's previous trajectory but also of their interaction with the surrounding environment, an important part of which are other pedestrians moving dynamically in the scene. To learn effective socially-informed representations, we propose a model that uses a reconstructor alongside a conditional variational autoencoder-based trajectory forecasting module. This module generates pseudo-trajectories, which we use as augmentations throughout the training process. To further guide the model towards social awareness, we propose a novel social loss that aids in forecasting more stable trajectories. We validate our approach through extensive experiments, demonstrating strong performance in comparison to state-of-the-art methods on the ETH/UCY and SDD benchmarks.

Accepted at Winter Conference on Applications of Computer Vision (WACV), 2025

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models 2024-12-05
Show

Despite the effectiveness of data selection for large language models (LLMs) during pretraining and instruction fine-tuning phases, improving data efficiency in supervised fine-tuning (SFT) for specialized domains poses significant challenges due to the complexity of fine-tuning data. To bridge this gap, we introduce an effective and scalable data selection method for SFT, SmallToLarge (S2L), which leverages training trajectories from small models to guide the data selection for larger models. We demonstrate through extensive experiments that S2L significantly improves data efficiency in SFT for mathematical problem-solving, reducing the training data to just 11% of the original MathInstruct dataset (Yue et al., 2023) to match full dataset performance while outperforming state-of-the-art data selection algorithms by an average of 4.7% across 6 in- and out-domain evaluation datasets. Remarkably, selecting only 50K data for SFT, S2L achieves a 32.7% accuracy on the most challenging MATH (Hendrycks et al., 2021) benchmark, improving Phi-2 (Li et al., 2023b) by 16.6%. In clinical text summarization on the MIMIC-III dataset (Johnson et al., 2016), S2L again outperforms training on the full dataset using only 50% of the data. Notably, S2L can perform data selection using a reference model 40x smaller than the target model, proportionally reducing the cost of data selection.
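
The selection idea can be sketched as follows: record each training example's loss on a small proxy model at several checkpoints, cluster these loss trajectories, and spend the annotation budget evenly across clusters. This is a simplified stand-in for the paper's S2L procedure; the clustering choice, budget rule, and all array names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical: loss of the small model on each example at 8 checkpoints.
loss_trajectories = rng.random((10_000, 8))          # (n_examples, n_checkpoints)

k, budget = 50, 1_000
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(loss_trajectories)

selected = []
per_cluster = budget // k
for c in range(k):
    members = np.flatnonzero(labels == c)
    take = min(per_cluster, len(members))
    selected.extend(rng.choice(members, size=take, replace=False))
selected = np.asarray(selected)                       # indices kept for SFT of the large model
```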

Modeling Eye Gaze Velocity Trajectories using GANs with Spectral Loss for Enhanced Fidelity 2024-12-05
Show

Accurate modeling of eye gaze dynamics is essential for advancement in human-computer interaction, neurological diagnostics, and cognitive research. Traditional generative models like Markov models often fail to capture the complex temporal dependencies and distributional nuances inherent in eye gaze trajectory data. This study introduces a GAN framework employing LSTM and CNN generators and discriminators to generate high-fidelity synthetic eye gaze velocity trajectories. We conducted a comprehensive evaluation of four GAN architectures: CNN-CNN, LSTM-CNN, CNN-LSTM, and LSTM-LSTM, trained under two conditions: using only adversarial loss and using a weighted combination of adversarial and spectral losses. Our findings reveal that the LSTM-CNN architecture trained with this new loss function exhibits the closest alignment to the real data distribution, effectively capturing both the distribution tails and the intricate temporal dependencies. The inclusion of spectral regularization significantly enhances the GAN's ability to replicate the spectral characteristics of eye gaze movements, leading to a more stable learning process and improved data fidelity. Comparative analysis with an HMM optimized to four hidden states further highlights the advantages of the LSTM-CNN GAN. Statistical metrics show that the HMM-generated data significantly diverges from the real data in terms of mean, standard deviation, skewness, and kurtosis. In contrast, the LSTM-CNN model closely matches the real data across these statistics, affirming its capacity to model the complexity of eye gaze dynamics effectively. These results position the spectrally regularized LSTM-CNN GAN as a robust tool for generating synthetic eye gaze velocity data with high fidelity.
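
One common way to add a spectral term to a GAN generator loss is to match the magnitude spectrum of generated and real sequences via an FFT; the sketch below shows that idea in PyTorch. The exact spectral loss and its weighting in the paper may differ, and all variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def spectral_loss(fake, real):
    """MSE between average magnitude spectra of generated and real sequences.

    fake, real: (batch, seq_len) gaze-velocity sequences.
    """
    fake_mag = torch.fft.rfft(fake, dim=-1).abs().mean(dim=0)
    real_mag = torch.fft.rfft(real, dim=-1).abs().mean(dim=0)
    return F.mse_loss(fake_mag, real_mag)

# Hypothetical combined generator objective:
# g_loss = adversarial_loss + lambda_spec * spectral_loss(fake_batch, real_batch)
```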

Towards Fast and Safety-Guaranteed Trajectory Planning and Tracking for Time-Varying Systems 2024-12-05
Show

When deploying autonomous systems in unknown and changing environments, it is critical that their motion planning and control algorithms are computationally efficient and can be reapplied online in real time, whilst providing theoretical safety guarantees in the presence of disturbances. The satisfaction of these objectives becomes more challenging when considering time-varying dynamics and disturbances, which arise in real-world contexts. We develop methods with the potential to address these issues by applying an offline-computed safety guaranteeing controller on a physical system, to track a virtual system that evolves through a trajectory that is replanned online, accounting for constraints updated online. The first method we propose is designed for general time-varying systems over a finite horizon. Our second method overcomes the finite horizon restriction for periodic systems. We simulate our algorithms on a case study of an autonomous underwater vehicle subject to wave disturbances.

18 pages, 7 figures, submitted to Transactions on Automatic Control

Abstraction-based Control of Unknown Continuous-Space Models with Just Two Trajectories 2024-12-05
Show

Finite abstractions (a.k.a. symbolic models) offer an effective scheme for approximating the complex continuous-space systems with simpler models in the discrete-space domain. A crucial aspect, however, is to establish a formal relation between the original system and its symbolic model, ensuring that a discrete controller designed for the symbolic model can be effectively implemented as a hybrid controller (using an interface map) for the original system. This task becomes even more challenging when the exact mathematical model of the continuous-space system is unknown. To address this, the existing literature mainly employs scenario-based data-driven methods, which require collecting a large amount of data from the original system. In this work, we propose a data-driven framework that utilizes only two input-state trajectories collected from unknown nonlinear polynomial systems to synthesize a hybrid controller, enabling the desired behavior on the unknown system through the controller derived from its symbolic model. To accomplish this, we employ the concept of alternating simulation functions (ASFs) to quantify the closeness between the state trajectories of the unknown system and its data-driven symbolic model. By satisfying a specific rank condition on the collected data, which intuitively ensures that the unknown system is persistently excited, we directly design an ASF and its corresponding hybrid controller using finite-length data without explicitly identifying the unknown system, while providing correctness guarantees. This is achieved through proposing a data-based sum-of-squares (SOS) optimization program, enabling a systematic approach to the design process. We illustrate the effectiveness of our data-driven approach through a case study.

SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation 2024-12-05
Show

Methods for image-to-video generation have achieved impressive, photo-realistic quality. However, adjusting specific elements in generated videos, such as object motion or camera movement, is often a tedious process of trial and error, e.g., involving re-generating videos with different random seeds. Recent techniques address this issue by fine-tuning a pre-trained model to follow conditioning signals, such as bounding boxes or point trajectories. Yet, this fine-tuning procedure can be computationally expensive, and it requires datasets with annotated object motion, which can be difficult to procure. In this work, we introduce SG-I2V, a framework for controllable image-to-video generation that is self-guided, offering zero-shot control by relying solely on the knowledge present in a pre-trained image-to-video diffusion model without the need for fine-tuning or external knowledge. Our zero-shot method outperforms unsupervised baselines while significantly narrowing the performance gap with supervised models in terms of visual quality and motion fidelity.

Project page: https://kmcode1.github.io/Projects/SG-I2V/

PathletRL++: Optimizing Trajectory Pathlet Extraction and Dictionary Formation via Reinforcement Learning 2024-12-04
Show

Advances in tracking technologies have spurred the rapid growth of large-scale trajectory data. Building a compact collection of pathlets, referred to as a trajectory pathlet dictionary, is essential for supporting mobility-related applications. Existing methods typically adopt a top-down approach, generating numerous candidate pathlets and selecting a subset, leading to high memory usage and redundant storage from overlapping pathlets. To overcome these limitations, we propose a bottom-up strategy that incrementally merges basic pathlets to build the dictionary, reducing memory requirements by up to 24,000 times compared to baseline methods. The approach begins with unit-length pathlets and iteratively merges them while optimizing utility, which is defined using newly introduced metrics of trajectory loss and representability. We develop a deep reinforcement learning framework, PathletRL, which utilizes Deep Q-Networks (DQN) to approximate the utility function, resulting in a compact and efficient pathlet dictionary. Experiments on both synthetic and real-world datasets demonstrate that our method outperforms state-of-the-art techniques, reducing the size of the constructed dictionary by up to 65.8%. Additionally, our results show that only half of the dictionary pathlets are needed to reconstruct 85% of the original trajectory data. Building on PathletRL, we introduce PathletRL++, which extends the original model by incorporating a richer state representation and an improved reward function to optimize decision-making during pathlet merging. These enhancements enable the agent to gain a more nuanced understanding of the environment, leading to higher-quality pathlet dictionaries. PathletRL++ achieves even greater dictionary size reduction, surpassing the performance of PathletRL, while maintaining high trajectory representability.

Koopman Based Trajectory Optimization with Mixed Boundaries 2024-12-04
Show

Trajectory optimization is a widely used tool in the design and control of dynamical systems. Typically, not only nonlinear dynamics, but also couplings of the initial and final condition through implicit boundary constraints render the optimization problem non-convex. This paper investigates how the Koopman operator framework can be utilized to solve trajectory optimization problems in a (partially) convex fashion. While the Koopman operator has already been successfully employed in model predictive control, the challenge of addressing mixed boundary constraints within the Koopman framework has remained an open question. We first address this issue by explaining why a complete convexification of the problem is not possible. Secondly, we propose a method where we transform the trajectory optimization problem into a bilevel problem in which we are then able to convexify the high-dimensional lower-level problem. This separation yields a low-dimensional upper-level problem, which could be exploited in global optimization algorithms. Lastly, we demonstrate the effectiveness of the method on two example systems: the mathematical pendulum and the compass-gait walker.

submitted to 7th Annual Learning for Dynamics & Control Conference Research (L4DC 2025)

TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments 2024-12-04
Show

We present a multi-modal trajectory generation and selection algorithm for real-world mapless outdoor navigation in human-centered environments. Such environments contain rich features like crosswalks, grass, and curbs, which are easily interpretable by humans, but not by mobile robots. We aim to compute suitable trajectories that (1) satisfy the environment-specific traversability constraints and (2) generate human-like paths while navigating on crosswalks, sidewalks, etc. Our formulation uses a Conditional Variational Autoencoder (CVAE) generative model enhanced with traversability constraints to generate multiple candidate trajectories for global navigation. We develop a visual prompting approach and leverage the Visual Language Model's (VLM) zero-shot ability of semantic understanding and logical reasoning to choose the best trajectory given the contextual information about the task. We evaluate our method in various outdoor scenes with wheeled robots and compare the performance with other global navigation algorithms. In practice, we observe an average improvement of 22.07% in satisfying traversability constraints and 30.53% in terms of human-like navigation in four different outdoor navigation scenarios.

Topological Trajectory Classification and Landmark Inference on Simplicial Complexes 2024-12-04
Show

We consider the problem of classifying trajectories on a discrete or discretised 2-dimensional manifold modelled by a simplicial complex. Previous works have proposed to project the trajectories into the harmonic eigenspace of the Hodge Laplacian, and then cluster the resulting embeddings. However, if the considered space has vanishing homology (i.e., no "holes"), then the harmonic space of the 1-Hodge Laplacian is trivial and thus the approach fails. Here we propose to view this issue akin to a sensor placement problem and present an algorithm that aims to learn "optimal holes" to distinguish a set of given trajectory classes. Specifically, given a set of labelled trajectories, which we interpret as edge-flows on the underlying simplicial complex, we search for 2-simplices whose deletion results in an optimal separation of the trajectory labels according to the corresponding spectral embedding of the trajectories into the harmonic space. Finally, we generalise this approach to the unsupervised setting.

5 pages, 4 figures, Accepted at the 58th Annual Asilomar Conference on Signals, Systems, and Computers 2024
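
The harmonic-embedding step the paper builds on can be written in a few lines of NumPy: form the 1-Hodge Laplacian from the boundary matrices, take its (near-)null space, and project each trajectory's edge flow onto that basis. Matrix shapes and tolerances below are illustrative.

```python
import numpy as np

def harmonic_embedding(B1, B2, edge_flows, tol=1e-10):
    """Project trajectories (edge flows) onto the harmonic space of L1.

    B1: (n_nodes, n_edges) node-edge boundary matrix
    B2: (n_edges, n_triangles) edge-triangle boundary matrix
    edge_flows: (n_trajectories, n_edges)
    """
    L1 = B1.T @ B1 + B2 @ B2.T                  # 1-Hodge Laplacian
    eigvals, eigvecs = np.linalg.eigh(L1)
    harmonic_basis = eigvecs[:, eigvals < tol]  # basis of ker(L1)
    return edge_flows @ harmonic_basis          # one embedding row per trajectory

# Deleting a 2-simplex removes a column of B2, which can enlarge ker(L1);
# that is the "optimal holes" knob the paper's algorithm searches over.
```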

Social-Transmotion: Promptable Human Trajectory Prediction 2024-12-04
Show

Accurate human trajectory prediction is crucial for applications such as autonomous vehicles, robotics, and surveillance systems. Yet, existing models often fail to fully leverage the non-verbal social cues humans subconsciously communicate when navigating the space. To address this, we introduce Social-Transmotion, a generic Transformer-based model that exploits diverse and numerous visual cues to predict human behavior. We translate the idea of a prompt from Natural Language Processing (NLP) to the task of human trajectory prediction, where a prompt can be a sequence of x-y coordinates on the ground, bounding boxes in the image plane, or body pose keypoints in either 2D or 3D. This, in turn, augments trajectory data, leading to enhanced human trajectory prediction. Using a masking technique, our model exhibits flexibility and adaptability by capturing spatiotemporal interactions between agents based on the available visual cues. We delve into the merits of using 2D versus 3D poses, and a limited set of poses. Additionally, we investigate the spatial and temporal attention map to identify which keypoints and time-steps in the sequence are vital for optimizing human trajectory prediction. Our approach is validated on multiple datasets, including JTA, JRDB, Pedestrians and Cyclists in Road Traffic, and ETH-UCY. The code is publicly available: https://github.com/vita-epfl/social-transmotion.

ICLR 2024
Pairwise Spatiotemporal Partial Trajectory Matching for Co-movement Analysis 2024-12-03
Show

Spatiotemporal pairwise movement analysis involves identifying shared geographic-based behaviors between individuals within specific time frames. Traditionally, this task relies on sequence modeling and behavior analysis techniques applied to tabular or video-based data, but these methods often lack interpretability and struggle to capture partial matching. In this paper, we propose a novel method for pairwise spatiotemporal partial trajectory matching that transforms tabular spatiotemporal data into interpretable trajectory images based on specified time windows, allowing for partial trajectory analysis. This approach includes localization of trajectories, checking for spatial overlap, and pairwise matching using a Siamese Neural Network. We evaluate our method on a co-walking classification task, demonstrating its effectiveness in a novel co-behavior identification application. Our model surpasses established methods, achieving an F1-score up to 0.73. Additionally, we explore the method's utility for pair routine pattern analysis in real-world scenarios, providing insights into the frequency, timing, and duration of shared behaviors. This approach offers a powerful, interpretable framework for spatiotemporal behavior analysis, with potential applications in social behavior research, urban planning, and healthcare.

In submission. 17 pages, 5 figures
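
A minimal Siamese sketch for the pairwise matching step, assuming each trajectory window has already been rasterised into a one-channel image as in the paper's pipeline; the CNN layout and sizes are illustrative, not the authors' exact network.

```python
import torch
import torch.nn as nn

class TrajectorySiamese(nn.Module):
    """Scores whether two trajectory images depict co-movement."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.classifier = nn.Linear(32, 1)

    def forward(self, img_a, img_b):
        emb_a, emb_b = self.backbone(img_a), self.backbone(img_b)
        return torch.sigmoid(self.classifier(torch.abs(emb_a - emb_b)))

# Probability that two 64x64 trajectory images co-move.
model = TrajectorySiamese()
p = model(torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64))   # -> (4, 1)
```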

Motion Prompting: Controlling Video Generation with Motion Trajectories 2024-12-03
Show

Motion control is crucial for generating expressive and compelling video content; however, most existing video generation models rely mainly on text prompts for control, which struggle to capture the nuances of dynamic actions and temporal compositions. To this end, we train a video generation model conditioned on spatio-temporally sparse or dense motion trajectories. In contrast to prior motion conditioning work, this flexible representation can encode any number of trajectories, object-specific or global scene motion, and temporally sparse motion; due to its flexibility we refer to this conditioning as motion prompts. While users may directly specify sparse trajectories, we also show how to translate high-level user requests into detailed, semi-dense motion prompts, a process we term motion prompt expansion. We demonstrate the versatility of our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing. Our results showcase emergent behaviors, such as realistic physics, suggesting the potential of motion prompts for probing video models and interacting with future generative world models. Finally, we evaluate quantitatively, conduct a human study, and demonstrate strong performance. Video results are available on our webpage: https://motion-prompting.github.io/

Project page: https://motion-prompting.github.io/

Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations 2024-12-03
Show

Learning to forecast the trajectories of intelligent agents like pedestrians has attracted increasing attention from researchers. Despite researchers' efforts, it remains a challenge to accurately account for social interactions among agents when forecasting, and in particular, to simulate such social modifications to future trajectories in an explainable and decoupled way. Inspired by the resonance phenomenon of vibration systems, we propose the Resonance (short for Re) model to forecast pedestrian trajectories as co-vibrations, and regard social interactions as associated with spectral properties of agents' trajectories. It forecasts future trajectories as three distinct vibration terms to represent agents' future plans from different perspectives in a decoupled way. Also, agents' social interactions and how they modify scheduled trajectories will be considered in a resonance-like manner by learning the similarities of their trajectory spectrums. Experiments on multiple datasets, whether pedestrian or vehicle, have verified the usefulness of our method both quantitatively and qualitatively.

Who Walks With You Matters: Perceiving Social Interactions with Groups for Pedestrian Trajectory Prediction 2024-12-03
Show

Understanding and anticipating human movement has become more critical and challenging in diverse applications such as autonomous driving and surveillance. The complex interactions brought by different relations between agents are a crucial reason that poses challenges to this task. Researchers have put much effort into designing rule-based or data-based models to extract and validate the patterns between pedestrian trajectories and these interactions, but the problem has not yet been adequately addressed. Inspired by how humans perceive social interactions at different levels of relation to themselves, this work proposes the GrouP ConCeption (short for GPCC) model, composed of the Group method, which categorizes nearby agents into either group members or non-group members based on a long-term distance kernel function, and the Conception module, which perceives both visual and acoustic information surrounding the target agent. Evaluated across multiple datasets, the GPCC model demonstrates significant improvements in trajectory prediction accuracy, validating its effectiveness in modeling both social and individual dynamics. The qualitative analysis also indicates that the GPCC framework leverages grouping and perception cues in a human-like, intuitive manner, supporting the model's explainability in pedestrian trajectory forecasting.

15 pages, 10 figures, submitted to CVPR 2025

Trajectory-based Road Autolabeling with Lidar-Camera Fusion in Winter Conditions 2024-12-03
Show

Robust road segmentation in all road conditions is required for safe autonomous driving and advanced driver assistance systems. Supervised deep learning methods provide accurate road segmentation in the domain of their training data but cannot be trusted in out-of-distribution scenarios. Including the whole distribution in the trainset is challenging as each sample must be labeled by hand. Trajectory-based self-supervised methods offer a potential solution as they can learn from the traversed route without manual labels. However, existing trajectory-based methods use learning schemes that rely only on the camera or only on the lidar. In this paper, trajectory-based learning is implemented jointly with lidar and camera for increased performance. Our method outperforms recent standalone camera- and lidar-based methods when evaluated with a challenging winter driving dataset including countryside and suburb driving scenes. The source code is available at https://github.com/eerik98/lidar-camera-road-autolabeling.git

FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL 2024-12-03
Show

Multi-agent reinforcement learning has demonstrated significant potential in addressing complex cooperative tasks across various real-world applications. However, existing MARL approaches often rely on the restrictive assumption that the number of entities (e.g., agents, obstacles) remains constant between training and inference. This overlooks scenarios where entities are dynamically removed or added during the inference trajectory -- a common occurrence in real-world environments like search and rescue missions and dynamic combat situations. In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization, where such dynamic changes cannot be anticipated beforehand. Our empirical studies reveal that existing MARL methods suffer significant performance degradation and increased uncertainty in these scenarios. In response, we propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods. FlickerFusion stochastically drops out parts of the observation space, emulating in-domain conditions when inference is performed OOD. The results show that FlickerFusion not only achieves superior inference rewards but also uniquely reduces uncertainty vis-à-vis the backbone, compared to existing methods. Benchmarks, implementations, and model weights are organized and open-sourced at flickerfusion305.github.io, accompanied by ample demo video renderings.

NeurIPS '24 Open-World Agents Workshop

Optimizing Latent Goal by Learning from Trajectory Preference 2024-12-03
Show

A growing body of work has emerged focusing on instruction-following policies for open-world agents, aiming to better align the agent's behavior with human intentions. However, the performance of these policies is highly susceptible to the initial prompt, which leads to extra efforts in selecting the best instructions. We propose a framework named Preference Goal Tuning (PGT). PGT allows an instruction-following policy to interact with the environment to collect several trajectories, which will be categorized into positive and negative samples based on preference. Then we use preference learning to fine-tune the initial goal latent representation with the categorized trajectories while keeping the policy backbone frozen. The experimental results show that with minimal data and training, PGT achieves an average relative improvement of 72.0% and 81.6% over 17 tasks in 2 different foundation policies respectively, and outperforms the best human-selected instructions. Moreover, PGT surpasses full fine-tuning in the out-of-distribution (OOD) task-execution environments by 13.4%, indicating that our approach retains strong generalization capabilities. Since our approach stores a single latent representation for each task independently, it can be viewed as an efficient method for continual learning, without the risk of catastrophic forgetting or task interference. In short, PGT enhances the performance of agents across nearly all tasks in the Minecraft Skillforge benchmark and demonstrates robustness to the execution environment.
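
The core mechanic, tuning only a goal latent from trajectory preferences while the policy stays frozen, can be sketched with a Bradley-Terry-style preference loss. The scorer below is a placeholder for whatever the frozen policy provides (e.g., trajectory likelihood under a goal); the loss form and all names are assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

goal = torch.nn.Parameter(torch.randn(128))           # learnable goal latent
opt = torch.optim.Adam([goal], lr=1e-2)

def score(goal, traj_feat):
    """Placeholder scorer; in practice this would come from the frozen policy."""
    return goal @ traj_feat

# Hypothetical preference pairs: (features of preferred traj, features of rejected traj).
pairs = [(torch.randn(128), torch.randn(128)) for _ in range(10)]
for pos_feat, neg_feat in pairs:
    loss = -F.logsigmoid(score(goal, pos_feat) - score(goal, neg_feat))
    opt.zero_grad()
    loss.backward()
    opt.step()
```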

Another Vertical View: A Hierarchical Network for Heterogeneous Trajectory Prediction via Spectrums 2024-12-03
Show

With the fast development of AI-related techniques, the applications of trajectory prediction are no longer limited to easier scenes and trajectories. More and more trajectories with different forms, such as coordinates, bounding boxes, and even high-dimensional human skeletons, need to be analyzed and forecasted. Among these heterogeneous trajectories, interactions between different elements within a frame of trajectory, which we call "Dimension-wise Interactions", would be more complex and challenging. However, most previous approaches focus mainly on a specific form of trajectories, and potential dimension-wise interactions have received less attention. In this work, we expand the trajectory prediction task by introducing the trajectory dimensionality $M$, thus extending its application scenarios to heterogeneous trajectories. We first introduce the Haar transform as an alternative to the Fourier transform to better capture the time-frequency properties of each trajectory dimension. Then, we adopt the bilinear structure to model and fuse two factors simultaneously, including the time-frequency response and the dimension-wise interaction, to forecast heterogeneous trajectories via trajectory spectrums hierarchically in a generic way. Experiments show that the proposed model outperforms most state-of-the-art methods on ETH-UCY, SDD, nuScenes, and Human3.6M with heterogeneous trajectories, including 2D coordinates, 2D/3D bounding boxes, and 3D human skeletons.
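
The Haar transform used for the per-dimension spectrums is simple to compute; below is a one-level NumPy sketch applied to each dimension of a toy (T, M) trajectory. The stacking into a "spectrum" here only illustrates the idea, not the paper's full hierarchical scheme.

```python
import numpy as np

def haar_1d(x):
    """One-level Haar transform of a 1-D signal with even length."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

traj = np.random.rand(16, 4)     # T=16 steps, M=4 trajectory dimensions
spectrum = np.stack([np.concatenate(haar_1d(traj[:, m])) for m in range(traj.shape[1])])
# spectrum: (M, T) time-frequency representation, one row per trajectory dimension
```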

Autonomy in the Real-World: Autonomous Trajectory Planning for Asteroid Reconnaissance via Stochastic Optimization 2024-12-03
Show

This paper presents the development and evaluation of an optimization-based autonomous trajectory planning algorithm for the asteroid reconnaissance phase of a deep-space exploration mission. The reconnaissance phase is a low-altitude flyby to collect detailed information around a potential landing site. Although such autonomous deep-space exploration missions have garnered considerable interest recently, state-of-the-practice in trajectory design involves a time-intensive ground-based open-loop process that forward propagates multiple trajectories with a range of initial conditions and parameters to account for uncertainties in spacecraft knowledge and actuation. In this work, we introduce a stochastic trajectory optimization-based approach to generate trajectories that satisfy both the mission and spacecraft safety constraints during the reconnaissance phase of the Deep-space Autonomous Robotic Explorer (DARE) mission concept, which seeks to travel to and explore a near-Earth object autonomously, with minimal ground intervention. We first use the Multi-Spacecraft Concept and Autonomy Tool (MuSCAT) simulation framework to rigorously validate the underlying modeling assumptions for our trajectory planner and then propose a method to transform this stochastic optimal control problem into a deterministic one tailored for use with an off-the-shelf nonlinear solver. Finally, we demonstrate the efficacy of our proposed algorithmic approach through extensive numerical experiments and show that it outperforms the state-of-the-practice benchmark used for representative missions.

accepted for 2025 AIAA SciTech Forum (also selected a finalist for the 2025 GNC Graduate Student Paper Competition)

Driving Scene Synthesis on Free-form Trajectories with Generative Prior 2024-12-02
Show

Driving scene synthesis along free-form trajectories is essential for driving simulations to enable closed-loop evaluation of end-to-end driving policies. While existing methods excel at novel view synthesis on recorded trajectories, they face challenges with novel trajectories due to limited views of driving videos and the vastness of driving environments. To tackle this challenge, we propose a novel free-form driving view synthesis approach, dubbed DriveX, by leveraging video generative prior to optimize a 3D model across a variety of trajectories. Concretely, we crafted an inverse problem that enables a video diffusion model to be utilized as a prior for many-trajectory optimization of a parametric 3D model (e.g., Gaussian splatting). To seamlessly use the generative prior, we iteratively conduct this process during optimization. Our resulting model can produce high-fidelity virtual driving environments outside the recorded trajectory, enabling free-form trajectory driving simulation. Beyond real driving scenes, DriveX can also be utilized to simulate virtual driving worlds from AI-generated videos.

Outstanding framework for simulating and generating anchor trajectory in wireless sensor networks 2024-12-02
Show

This paper proposes a framework that can animate and generate different mobility scenarios for a movable anchor following various paths in wireless sensor networks (WSNs). When researchers use NS-2 to simulate a single anchor-assisted localization model, they face the problem of creating the movement file of the movable anchor. The proposed framework solves this problem by allowing them to create movement scenarios for different trajectories. The framework lets the researcher set the parameters needed to simulate various static path models, which can be displayed through the graphical user interface. The researcher can also view the mobility of the movable anchor and control its speed and communication range. The framework has been validated by comparing its results to NS-2 outputs and against existing tools. Finally, it has been published on the Code Project website and downloaded by many users.

Differential Flatness-based Fast Trajectory Planning for Fixed-wing Unmanned Aerial Vehicles 2024-12-02
Show

Due to strong nonlinearity and nonholonomic dynamics, few of the various general trajectory optimization methods that have been presented can guarantee efficient computation and physical feasibility for the relatively complicated dynamics of fixed-wing UAVs. To address this issue, this paper investigates a differential flatness-based trajectory optimization method for fixed-wing UAVs (DFTO-FW), which transcribes the trajectory optimization into a lightweight, unconstrained, gradient-analytical optimization with linear time complexity in each iteration to achieve fast trajectory generation. Through differential flatness analysis and polynomial parameterization, a customized trajectory representation is presented, which implies the equality constraints and thus avoids the heavy computational burden of solving complex dynamics. Through the design of integral performance costs and the derivation of analytical gradients, the original trajectory optimization is transcribed into an unconstrained, gradient-analytical optimization with linear time complexity to further improve efficiency. Simulation experiments illustrate the superior efficiency of DFTO-FW, which takes sub-second CPU time, faster than other competitors by orders of magnitude, to generate fixed-wing UAV trajectories in randomly generated obstacle environments.

Submit to IEEE Transactions on Systems, Man, and Cybernetics: Systems; Recived Reject with major revision and encouragement to resubmit (31-Oct-2024)
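
The core of the polynomial flat-output parameterization is just evaluating a polynomial and its derivatives; states and inputs of a differentially flat system are then algebraic functions of those derivatives. The snippet below shows only that evaluation step with hypothetical coefficients; the actual flat-output-to-state mapping is aircraft-specific and part of the paper's analysis.

```python
import numpy as np

coeffs = np.array([0.1, -0.4, 1.2, 0.0, 5.0])    # hypothetical polynomial coefficients

t = np.linspace(0.0, 2.0, 50)
x = np.polyval(coeffs, t)                        # flat output, e.g. a position component
dx = np.polyval(np.polyder(coeffs, 1), t)        # 1st derivative (velocity)
ddx = np.polyval(np.polyder(coeffs, 2), t)       # 2nd derivative (acceleration)
# (x, dx, ddx, ...) would then be mapped algebraically to fixed-wing states and inputs.
```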

TAS-TsC: A Data-Driven Framework for Estimating Time of Arrival Using Temporal-Attribute-Spatial Tri-space Coordination of Truck Trajectories 2024-12-02
Show

Accurately estimating time of arrival (ETA) for trucks is crucial for optimizing transportation efficiency in logistics. GPS trajectory data offers valuable information for ETA, but challenges arise due to temporal sparsity, variable sequence lengths, and the interdependencies among multiple trucks. To address these issues, we propose the Temporal-Attribute-Spatial Tri-space Coordination (TAS-TsC) framework, which leverages three feature spaces (temporal, attribute, and spatial) to enhance ETA. Our framework consists of a Temporal Learning Module (TLM) using state space models to capture temporal dependencies, an Attribute Extraction Module (AEM) that transforms sequential features into structured attribute embeddings, and a Spatial Fusion Module (SFM) that models the interactions among multiple trajectories using graph representation learning. These modules collaboratively learn trajectory embeddings, which are then used by a Downstream Prediction Module (DPM) to estimate arrival times. We validate TAS-TsC on real truck trajectory datasets collected from Shenzhen, China, demonstrating its superior performance compared to existing methods.

BIGCity: A Universal Spatiotemporal Model for Unified Trajectory and Traffic State Data Analysis 2024-12-01
Show

Typical dynamic ST data includes trajectory data (representing individual-level mobility) and traffic state data (representing population-level mobility). Traditional studies often treat trajectory and traffic state data as distinct, independent modalities, each tailored to specific tasks within a single modality. However, real-world applications, such as navigation apps, require joint analysis of trajectory and traffic state data. Treating these data types as two separate domains can lead to suboptimal model performance. Although recent advances in ST data pre-training and ST foundation models aim to develop universal models for ST data analysis, most existing models are "multi-task, solo-data modality" (MTSM), meaning they can handle multiple tasks within either trajectory data or traffic state data, but not both simultaneously. To address this gap, this paper introduces BIGCity, the first multi-task, multi-data modality (MTMD) model for ST data analysis. The model targets two key challenges in designing an MTMD ST model: (1) unifying the representations of different ST data modalities, and (2) unifying heterogeneous ST analysis tasks. To overcome the first challenge, BIGCity introduces a novel ST-unit that represents both trajectories and traffic states in a unified format. Additionally, for the second challenge, BIGCity adopts a tunable large model with ST task-oriented prompts, enabling it to perform a range of heterogeneous tasks without the need for fine-tuning. Extensive experiments on real-world datasets demonstrate that BIGCity achieves state-of-the-art performance across 8 tasks, outperforming 18 baselines. To the best of our knowledge, BIGCity is the first model capable of handling both trajectories and traffic states for diverse heterogeneous tasks. Our code is available at https://github.com/bigscity/BIGCity

Modification of muscle antagonistic relations and hand trajectory on the dynamic motion of Musculoskeletal Humanoid 2024-12-01
Show

In recent years, research on musculoskeletal humanoids has been progressing. However, challenges remain, such as the unmeasurable transformation of body structure and muscle paths, and the difficulty of measuring the robot's own motion due to the lack of joint angle sensors. In this study, we propose two motion acquisition methods: one acquires the antagonistic relations of muscles by tension sensing, and the other acquires the correct hand trajectory by vision sensing. Finally, we realize a badminton shuttlecock-hitting motion of Kengoro with these two acquisition methods.

Accepted at Humanoids2019

TraCS: Trajectory Collection in Continuous Space under Local Differential Privacy 2024-12-01
Show

Trajectory collection is fundamental for location-based services but often involves sensitive information, such as a user's daily routine, raising privacy concerns. Local differential privacy (LDP) provides provable privacy guarantees for users, even when the data collector is untrusted. Existing trajectory collection methods ensure LDP only for discrete location spaces, where the number of locations affects their privacy guarantees and trajectory utility. Moreover, the location space is often naturally continuous, such as in flying and sailing trajectories, making these methods unsuitable. This paper proposes two trajectory collection methods that ensure LDP for continuous spaces: TraCS-D, which perturbs the direction and distance of locations, and TraCS-C, which perturbs the Cartesian coordinates of locations. Both methods are theoretically and experimentally analyzed for trajectory utility. TraCS can also be applied to discrete spaces by rounding perturbed locations to the nearest discrete points. It is independent of the number of locations and has only $\Theta(1)$ time complexity in each perturbation generation. Evaluation results on discrete location spaces validate this advantage and show that TraCS outperforms state-of-the-art methods with improved trajectory utility, especially for large privacy parameters.

Submitted to VLDB 2025
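
The idea of perturbing a location through its direction and distance from the previous point, rather than its raw coordinates, can be sketched generically with Laplace noise. This is emphatically not the calibrated TraCS-D mechanism (which has its own noise distributions and privacy proof); it only illustrates the representation being perturbed, and all parameters are illustrative.

```python
import numpy as np

def perturb_step(prev, curr, eps, rng=np.random.default_rng(0)):
    """Report a noisy location by perturbing (direction, distance) from prev."""
    prev, curr = np.asarray(prev, float), np.asarray(curr, float)
    delta = curr - prev
    dist = np.linalg.norm(delta)
    theta = np.arctan2(delta[1], delta[0])
    noisy_dist = max(0.0, dist + rng.laplace(scale=2.0 / eps))
    noisy_theta = theta + rng.laplace(scale=2.0 / eps)
    return prev + noisy_dist * np.array([np.cos(noisy_theta), np.sin(noisy_theta)])

reported = perturb_step(prev=(0.0, 0.0), curr=(3.0, 4.0), eps=1.0)
```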

Learning Dynamic Weight Adjustment for Spatial-Temporal Trajectory Planning in Crowd Navigation 2024-11-30
Show

Robot navigation in dense human crowds poses a significant challenge due to the complexity of human behavior in dynamic and obstacle-rich environments. In this work, we propose a dynamic weight adjustment scheme using a neural network to predict the optimal weights of objectives in an optimization-based motion planner. We adopt a spatial-temporal trajectory planner and incorporate diverse objectives to achieve a balance among safety, efficiency, and goal achievement in complex and dynamic environments. We design the network structure, observation encoding, and reward function to effectively train the policy network using reinforcement learning, allowing the robot to adapt its behavior in real time based on environmental and pedestrian information. Simulation results show improved safety compared to the fixed-weight planner and the state-of-the-art learning-based methods, and verify the ability of the learned policy to adaptively adjust the weights based on the observed situations. The approach's feasibility is demonstrated in a navigation task using an autonomous delivery robot across a crowded corridor over a 300 m distance.

submitted to ICRA 2025

Strategic Application of AIGC for UAV Trajectory Design: A Channel Knowledge Map Approach 2024-11-30
Show

Unmanned Aerial Vehicles (UAVs) are increasingly utilized in wireless communication, yet accurate channel loss prediction remains a significant challenge, limiting resource optimization performance. To address this issue, this paper leverages Artificial Intelligence Generated Content (AIGC) for the efficient construction of Channel Knowledge Maps (CKM) and UAV trajectory design. Given the time-consuming nature of channel data collection, AI techniques are employed in a Wasserstein Generative Adversarial Network (WGAN) to extract environmental features and augment the data. Experiment results demonstrate the effectiveness of the proposed framework in improving CKM construction accuracy. Moreover, integrating CKM into UAV trajectory planning reduces channel gain uncertainty, demonstrating its potential to enhance wireless communication efficiency.

InterHub: A Naturalistic Trajectory Dataset with Dense Interaction for Autonomous Driving 2024-11-30
Show

The driving interaction, a critical yet complex aspect of daily driving, lies at the core of autonomous driving research. However, real-world driving scenarios sparsely capture rich interaction events, limiting the availability of comprehensive trajectory datasets for this purpose. To address this challenge, we present InterHub, a dense interaction dataset derived by mining interaction events from extensive naturalistic driving records. We employ formal methods to describe and extract multi-agent interaction events, exposing the limitations of existing autonomous driving solutions. Additionally, we introduce a user-friendly toolkit enabling the expansion of InterHub with both public and private data. By unifying, categorizing, and analyzing diverse interaction events, InterHub facilitates cross-comparative studies and large-scale research, thereby advancing the evaluation and development of autonomous driving technologies.

A Multi-Loss Strategy for Vehicle Trajectory Prediction: Combining Off-Road, Diversity, and Directional Consistency Losses 2024-11-29
Show

Trajectory prediction is essential for the safety and efficiency of planning in autonomous vehicles. However, current models often fail to fully capture complex traffic rules and the complete range of potential vehicle movements. Addressing these limitations, this study introduces three novel loss functions: Offroad Loss, Direction Consistency Error, and Diversity Loss. These functions are designed to keep predicted paths within driving area boundaries, aligned with traffic directions, and cover a wider variety of plausible driving scenarios. As all prediction modes should adhere to road rules and conditions, this work overcomes the shortcomings of traditional "winner takes all" training methods by applying the loss functions to all prediction modes. These loss functions not only improve model training but can also serve as metrics for evaluating the realism and diversity of trajectory predictions. Extensive validation on the nuScenes and Argoverse 2 datasets with leading baseline models demonstrates that our approach not only maintains accuracy but significantly improves safety and robustness, reducing offroad errors on average by 47% on original and by 37% on attacked scenes. This work sets a new benchmark for trajectory prediction in autonomous driving, offering substantial improvements in navigating complex environments. Our code is available at https://github.com/vita-epfl/stay-on-track .

Preprint, 7 pages, 4 figures and 2 tables

Graph Neural Networks

Title Date Abstract Comment
Graph-neural-network predictions of solid-state NMR parameters from spherical tensor decomposition 2024-12-19
Show

Nuclear magnetic resonance (NMR) is a powerful spectroscopic technique that is sensitive to the local atomic structure of matter. Computational predictions of NMR parameters can help to interpret experimental data and validate structural models, and machine learning (ML) has emerged as an efficient route to making such predictions. Here, we systematically study graph-neural-network approaches to representing and learning tensor quantities for solid-state NMR -- specifically, the anisotropic magnetic shielding and the electric field gradient. We assess how the numerical accuracy of different ML models translates into prediction quality for experimentally relevant NMR properties: chemical shifts, quadrupolar coupling constants, tensor orientations, and even static 1D spectra. We apply these ML models to a structurally diverse dataset of amorphous SiO$_2$ configurations, spanning a wide range of density and local order, to larger configurations beyond the reach of traditional first-principles methods, and to the dynamics of the $\alpha$-$\beta$ inversion in cristobalite. Our work marks a step toward streamlining ML-driven NMR predictions for both static and dynamic behavior of complex materials, and toward bridging the gap between first-principles modeling and real-world experimental data.

13 pages, 7 figures
LLMs as Zero-shot Graph Learners: Alignment of GNN Representations with LLM Token Embeddings 2024-12-19
Show

Zero-shot graph machine learning, especially with graph neural networks (GNNs), has garnered significant interest due to the challenge of scarce labeled data. While methods like self-supervised learning and graph prompt learning have been extensively explored, they often rely on fine-tuning with task-specific labels, limiting their effectiveness in zero-shot scenarios. Inspired by the zero-shot capabilities of instruction-fine-tuned large language models (LLMs), we introduce a novel framework named Token Embedding-Aligned Graph Language Model (TEA-GLM) that leverages LLMs as cross-dataset and cross-task zero-shot learners for graph machine learning. Concretely, we pretrain a GNN, aligning its representations with token embeddings of an LLM. We then train a linear projector that transforms the GNN's representations into a fixed number of graph token embeddings without tuning the LLM. A unified instruction is designed for various graph tasks at different levels, such as node classification (node-level) and link prediction (edge-level). These design choices collectively enhance our method's effectiveness in zero-shot learning, setting it apart from existing methods. Experiments show that our graph token embeddings help the LLM predictor achieve state-of-the-art performance on unseen datasets and tasks compared to other methods using LLMs as predictors.
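
A minimal sketch of the projector idea described above: a single trained linear layer maps a (mean-pooled) GNN representation to a fixed number of graph tokens in the LLM embedding space, while the LLM itself stays frozen. The dimensions, pooling choice, and class name are assumptions for illustration, not the paper's code.

```python
# Sketch of a linear projector producing a fixed number of graph tokens for a frozen LLM.
import torch
import torch.nn as nn

class GraphTokenProjector(nn.Module):
    def __init__(self, gnn_dim=256, llm_dim=4096, num_graph_tokens=8):
        super().__init__()
        self.num_graph_tokens = num_graph_tokens
        self.proj = nn.Linear(gnn_dim, num_graph_tokens * llm_dim)   # only this layer is trained

    def forward(self, node_embeddings):
        pooled = node_embeddings.mean(dim=0)                         # [gnn_dim], simple mean pooling
        tokens = self.proj(pooled)                                   # [num_graph_tokens * llm_dim]
        return tokens.view(self.num_graph_tokens, -1)                # [num_graph_tokens, llm_dim] graph tokens
```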

Learning from Linear Algebra: A Graph Neural Network Approach to Preconditioner Design for Conjugate Gradient Solvers 2024-12-19
Show

Large linear systems are ubiquitous in modern computational science and engineering. The main recipe for solving them is the use of Krylov subspace iterative methods with well-designed preconditioners. Deep learning models can be used as nonlinear preconditioners during the iteration of linear solvers such as the conjugate gradient (CG) method. Neural network models require an enormous number of parameters to approximate well in this setup. Another approach is to take advantage of small graph neural networks (GNNs) to construct preconditioners with predefined sparsity patterns. Recently, GNNs have been shown to be a promising tool for designing preconditioners to reduce the overall computational cost of iterative methods by constructing them more efficiently than with classical linear algebra techniques. However, preconditioners designed with these approaches cannot outperform those designed with classical methods in terms of the number of iterations in CG. In our work, we recall well-established preconditioners from linear algebra and use them as a starting point for training the GNN to obtain preconditioners that reduce the condition number of the system more significantly. Numerical experiments show that our approach outperforms both classical and neural network-based methods for an important class of parametric partial differential equations. We also provide a heuristic justification for the loss function used and show that preconditioners obtained by learning with this loss function reduce the condition number in a more desirable way for CG.

Erase then Rectify: A Training-Free Parameter Editing Approach for Cost-Effective Graph Unlearning 2024-12-19
Show

Graph unlearning, which aims to eliminate the influence of specific nodes, edges, or attributes from a trained Graph Neural Network (GNN), is essential in applications where privacy, bias, or data obsolescence is a concern. However, existing graph unlearning techniques often necessitate additional training on the remaining data, leading to significant computational costs, particularly with large-scale graphs. To address these challenges, we propose a two-stage training-free approach, Erase then Rectify (ETR), designed for efficient and scalable graph unlearning while preserving the model utility. Specifically, we first build a theoretical foundation showing that masking parameters critical for unlearned samples enables effective unlearning. Building on this insight, the Erase stage strategically edits model parameters to eliminate the impact of unlearned samples and their propagated influence on intercorrelated nodes. To further ensure the GNN's utility, the Rectify stage devises a gradient approximation method to estimate the model's gradient on the remaining dataset, which is then used to enhance model performance. Overall, ETR achieves graph unlearning without additional training or full training data access, significantly reducing computational overhead and preserving data privacy. Extensive experiments on seven public datasets demonstrate the consistent superiority of ETR in model utility, unlearning efficiency, and unlearning effectiveness, establishing it as a promising solution for real-world graph unlearning challenges.

Accepted by AAAI2025
Answer Set Networks: Casting Answer Set Programming into Deep Learning 2024-12-19
Show

Although Answer Set Programming (ASP) allows constraining neural-symbolic (NeSy) systems, its employment is hindered by the prohibitive costs of computing stable models and the CPU-bound nature of state-of-the-art solvers. To this end, we propose Answer Set Networks (ASN), a NeSy solver. Based on Graph Neural Networks (GNN), ASNs are a scalable approach to ASP-based Deep Probabilistic Logic Programming (DPPL). Specifically, we show how to translate ASPs into ASNs and demonstrate how ASNs can efficiently solve the encoded problem by leveraging GPU's batching and parallelization capabilities. Our experimental evaluations demonstrate that ASNs outperform state-of-the-art CPU-bound NeSy systems on multiple tasks. Simultaneously, we make the following two contributions based on the strengths of ASNs. Namely, we are the first to show the finetuning of Large Language Models (LLM) with DPPLs, employing ASNs to guide the training with logic. Further, we show the "constitutional navigation" of drones, i.e., encoding public aviation laws in an ASN for routing Unmanned Aerial Vehicles in uncertain environments.

16 pages, 9 figures
Shape error prediction in 5-axis machining using graph neural networks 2024-12-19
Show

This paper presents an innovative method for predicting shape errors in 5-axis machining using graph neural networks. The graph structure is defined with nodes representing workpiece surface points and edges denoting the neighboring relationships. The dataset encompasses data from a material removal simulation, process data, and post-machining quality information. Experimental results show that the presented approach can generalize the shape error prediction for the investigated workpiece geometry. Moreover, by modelling spatial and temporal connections within the workpiece, the approach handles a low number of labels compared to non-graphical methods such as Support Vector Machines.

Smoothness Really Matters: A Simple Yet Effective Approach for Unsupervised Graph Domain Adaptation 2024-12-19
Show

Unsupervised Graph Domain Adaptation (UGDA) seeks to bridge distribution shifts between domains by transferring knowledge from labeled source graphs to given unlabeled target graphs. Existing UGDA methods primarily focus on aligning features in the latent space learned by graph neural networks (GNNs) across domains, often overlooking structural shifts, resulting in limited effectiveness when addressing structurally complex transfer scenarios. Given the sensitivity of GNNs to local structural features, even slight discrepancies between source and target graphs could lead to significant shifts in node embeddings, thereby reducing the effectiveness of knowledge transfer. To address this issue, we introduce a novel approach for UGDA called Target-Domain Structural Smoothing (TDSS). TDSS is a simple and effective method designed to perform structural smoothing directly on the target graph, thereby mitigating structural distribution shifts and ensuring the consistency of node representations. Specifically, by integrating smoothing techniques with neighborhood sampling, TDSS maintains the structural coherence of the target graph while mitigating the risk of over-smoothing. Our theoretical analysis shows that TDSS effectively reduces target risk by improving model smoothness. Empirical results on three real-world datasets demonstrate that TDSS outperforms recent state-of-the-art baselines, achieving significant improvements across six transfer scenarios. The code is available in https://github.com/cwei01/TDSS.
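
A minimal sketch of the structural-smoothing-with-neighborhood-sampling idea on the target graph, assuming a simple convex blend between a node's own features and the mean of a few sampled neighbors; the blend weight `alpha` and sample size are illustrative choices rather than the paper's settings.

```python
# Sketch: smooth target-graph node features by blending with sampled neighbors.
import random
import torch

def smooth_target_features(x, adj_list, alpha=0.5, num_samples=5):
    # x: [N, d] node features of the target graph; adj_list: dict node -> list of neighbor indices
    smoothed = x.clone()
    for v, neighbors in adj_list.items():
        if not neighbors:
            continue
        sampled = random.sample(neighbors, min(num_samples, len(neighbors)))
        neigh_mean = x[sampled].mean(dim=0)
        smoothed[v] = (1 - alpha) * x[v] + alpha * neigh_mean   # blend self and sampled-neighbor mean
    return smoothed
```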

11 pages, Accepted by AAAI2025

Boosting GNN Performance via Training Sample Selection Based on Adversarial Robustness Evaluation 2024-12-19
Show

Graph Neural Networks (GNNs) have established themselves as one of the most powerful neural network architectures, excelling in leveraging graph topology and node features for various tasks. However, GNNs are inherently vulnerable to noise in their inputs. Such noise can significantly degrade their performance. To address this challenge, we propose a novel approach that employs adversarial robustness evaluation techniques to identify nodes in the graph that are most susceptible to noise. By selecting and constructing a training set composed of these particularly noise-prone nodes, we then use them to train a Graph Convolutional Network (GCN). Our experimental results demonstrate that this strategy leads to substantial improvements in the GCN's performance.

Grimm: A Plug-and-Play Perturbation Rectifier for Graph Neural Networks Defending against Poisoning Attacks 2024-12-19
Show

Recent studies have revealed the vulnerability of graph neural networks (GNNs) to adversarial poisoning attacks on node classification tasks. Current defensive methods require substituting the original GNNs with defense models, regardless of the original's type. This approach, while targeting adversarial robustness, compromises the enhancements developed in prior research to boost GNNs' practical performance. Here we introduce Grimm, the first plug-and-play defense model. With just a minimal interface requirement for extracting features from any layer of the protected GNNs, Grimm is thus enabled to seamlessly rectify perturbations. Specifically, we utilize the feature trajectories (FTs) generated by GNNs, as they evolve through epochs, to reflect the training status of the networks. We then theoretically prove that the FTs of victim nodes will inevitably exhibit discriminable anomalies. Consequently, inspired by the natural parallelism between the biological nervous and immune systems, we construct Grimm, a comprehensive artificial immune system for GNNs. Grimm not only detects abnormal FTs and rectifies adversarial edges during training but also operates efficiently in parallel, thereby mirroring the concurrent functionalities of its biological counterparts. We experimentally confirm that Grimm offers four empirically validated advantages: 1) Harmlessness, as it does not actively interfere with GNN training; 2) Parallelism, ensuring monitoring, detection, and rectification functions operate independently of the GNN training process; 3) Generalizability, demonstrating compatibility with mainstream GNNs such as GCN, GAT, and GraphSAGE; and 4) Transferability, as the detectors for abnormal FTs can be efficiently transferred across different systems for one-step rectification.

19 pages, 13 figures
DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State Space Models 2024-12-19
Show

Dynamic graphs exhibit intertwined spatio-temporal evolutionary patterns, widely existing in the real world. Nevertheless, the structure incompleteness, noise, and redundancy result in poor robustness for Dynamic Graph Neural Networks (DGNNs). Dynamic Graph Structure Learning (DGSL) offers a promising way to optimize graph structures. However, aside from encountering unacceptable quadratic complexity, it overly relies on heuristic priors, making it hard to discover underlying predictive patterns. How to efficiently refine the dynamic structures, capture intrinsic dependencies, and learn robust representations, remains under-explored. In this work, we propose the novel DG-Mamba, a robust and efficient Dynamic Graph structure learning framework with the Selective State Space Models (Mamba). To accelerate the spatio-temporal structure learning, we propose a kernelized dynamic message-passing operator that reduces the quadratic time complexity to linear. To capture global intrinsic dynamics, we establish the dynamic graph as a self-contained system with State Space Model. By discretizing the system states with the cross-snapshot graph adjacency, we enable the long-distance dependencies capturing with the selective snapshot scan. To endow learned dynamic structures more expressive with informativeness, we propose the self-supervised Principle of Relevant Information for DGSL to regularize the most relevant yet least redundant information, enhancing global robustness. Extensive experiments demonstrate the superiority of the robustness and efficiency of our DG-Mamba compared with the state-of-the-art baselines against adversarial attacks.

Accepted by the Main Technical Track of the 39th Annual AAAI Conference on Artificial Intelligence (AAAI-2025)

IOHunter: Graph Foundation Model to Uncover Online Information Operations 2024-12-19
Show

Social media platforms have become vital spaces for public discourse, serving as modern agoras where a wide range of voices influence societal narratives. However, their open nature also makes them vulnerable to exploitation by malicious actors, including state-sponsored entities, who can conduct information operations (IOs) to manipulate public opinion. The spread of misinformation, false news, and misleading claims threatens democratic processes and societal cohesion, making it crucial to develop methods for the timely detection of inauthentic activity to protect the integrity of online discourse. In this work, we introduce a methodology designed to identify users orchestrating information operations, a.k.a. IO drivers, across various influence campaigns. Our framework, named IOHunter, leverages the combined strengths of Language Models and Graph Neural Networks to improve generalization in supervised, scarcely-supervised, and cross-IO contexts. Our approach achieves state-of-the-art performance across multiple sets of IOs originating from six countries, significantly surpassing existing approaches. This research marks a step toward developing Graph Foundation Models specifically tailored for the task of IO detection on social media platforms.

9 pages
Trainable Adaptive Activation Function Structure (TAAFS) Enhances Neural Network Force Field Performance with Only Dozens of Additional Parameters 2024-12-19
Show

At the heart of neural network force fields (NNFFs) is the architecture of neural networks, where the capacity to model complex interactions is typically enhanced through widening or deepening multilayer perceptrons (MLPs) or by increasing layers of graph neural networks (GNNs). These enhancements, while improving the model's performance, often come at the cost of a substantial increase in the number of parameters. By applying the Trainable Adaptive Activation Function Structure (TAAFS), we introduce a method that selects distinct mathematical formulations for non-linear activations, thereby increasing the precision of NNFFs with an insignificant addition to the parameter count. In this study, we integrate TAAFS into a variety of neural network models, resulting in observed accuracy improvements, and further validate these enhancements through molecular dynamics (MD) simulations using DeepMD.

Towards Scalable and Deep Graph Neural Networks via Noise Masking 2024-12-19
Show

In recent years, Graph Neural Networks (GNNs) have achieved remarkable success in many graph mining tasks. However, scaling them to large graphs is challenging due to the high computational and storage costs of repeated feature propagation and non-linear transformation during training. One commonly employed approach to address this challenge is model-simplification, which executes the Propagation (P) step only once in pre-processing, Combines (C) the resulting receptive fields in different ways, and then feeds them into a simple model for better performance. Despite their high predictive performance and scalability, these methods still face two limitations. First, existing approaches mainly focus on exploring different C methods from the model perspective, neglecting the crucial problem of performance degradation with increasing P depth from the data-centric perspective, known as the over-smoothing problem. Second, pre-processing overhead takes up most of the end-to-end processing time, especially for large-scale graphs. To address these limitations, we present random walk with noise masking (RMask), a plug-and-play module compatible with the existing model-simplification works. This module enables the exploration of deeper GNNs while preserving their scalability. Unlike previous model-simplification works, we focus on continuous P, find that the noise introduced at each P step is the cause of the over-smoothing issue, and use an efficient masking mechanism to eliminate it. Experimental results on six real-world datasets demonstrate that model-simplification works equipped with RMask yield superior performance compared to their original version and can make a good trade-off between accuracy and efficiency.
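
The sketch below shows the model-simplification pattern the abstract builds on, with propagation done once up front and a simple concatenation as the Combine step; RMask's noise-masking mechanism itself is not reproduced here, and the dense adjacency is a simplification.

```python
# Sketch of the "Propagate once, then Combine" pattern used by model-simplification GNNs.
import torch

def precompute_propagation(adj, x, depth=4):
    # adj: [N, N] row-normalized adjacency (dense for simplicity), x: [N, d] node features
    hops, h = [x], x
    for _ in range(depth):
        h = adj @ h                  # one propagation step, executed once before training
        hops.append(h)
    return torch.cat(hops, dim=1)    # a simple Combine: concatenate all receptive fields

# The concatenated features can then be fed to a plain MLP classifier during training.
```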

Learning and Reconstructing Conflicts in O-RAN: A Graph Neural Network Approach 2024-12-18
Show

The Open Radio Access Network (O-RAN) architecture enables the deployment of third-party applications on the RAN Intelligent Controllers (RICs) to provide Mobile Network Operators (MNOs) with different functionality. However, the operation of third-party applications in the Near Real-Time RIC (Near-RT RIC), known as xApps, can result in conflicting interactions. Each xApp can independently modify the same control parameters to achieve distinct outcomes, which has the potential to cause performance degradation and network instability. The current conflict detection and mitigation solutions in the literature assume that all conflicts are known a priori, which does not always hold due to complex and often hidden relationships between control parameters and Key Performance Indicators (KPIs). In this paper, we introduce a novel data-driven Graph Neural Network (GNN)-based method for reconstructing conflict graphs. Specifically, we leverage GraphSAGE, an inductive learning framework, to dynamically learn the hidden relationships between xApps, control parameters, and KPIs. Our experimental results validate our proposed method for reconstructing conflict graphs and identifying all types of conflicts in O-RAN.

Adversarial Robustness of Link Sign Prediction in Signed Graphs 2024-12-18
Show

Signed graphs serve as fundamental data structures for representing positive and negative relationships in social networks, with signed graph neural networks (SGNNs) emerging as the primary tool for their analysis. Our investigation reveals that balance theory, while essential for modeling signed relationships in SGNNs, inadvertently introduces exploitable vulnerabilities to black-box attacks. To demonstrate this vulnerability, we propose balance-attack, a novel adversarial strategy specifically designed to compromise graph balance degree, and develop an efficient heuristic algorithm to solve the associated NP-hard optimization problem. While existing approaches attempt to restore attacked graphs through balance learning techniques, they face a critical challenge we term "Irreversibility of Balance-related Information," where restored edges fail to align with original attack targets. To address this limitation, we introduce Balance Augmented-Signed Graph Contrastive Learning (BA-SGCL), an innovative framework that combines contrastive learning with balance augmentation techniques to achieve robust graph representations. By maintaining high balance degree in the latent space, BA-SGCL effectively circumvents the irreversibility challenge and enhances model resilience. Extensive experiments across multiple SGNN architectures and real-world datasets demonstrate both the effectiveness of our proposed balance-attack and the superior robustness of BA-SGCL, advancing the security and reliability of signed graph analysis in social networks. Datasets and codes of the proposed framework are at the github repository https://anonymous.4open.science/r/BA-SGCL-submit-DF41/.

Modality-Independent Graph Neural Networks with Global Transformers for Multimodal Recommendation 2024-12-18
Show

Multimodal recommendation systems can learn users' preferences from existing user-item interactions as well as the semantics of multimodal data associated with items. Many existing methods model this through a multimodal user-item graph, approaching multimodal recommendation as a graph learning task. Graph Neural Networks (GNNs) have shown promising performance in this domain. Prior research has capitalized on GNNs' capability to capture neighborhood information within certain receptive fields (typically denoted by the number of hops, $K$) to enrich user and item semantics. We observe that the optimal receptive fields for GNNs can vary across different modalities. In this paper, we propose GNNs with Modality-Independent Receptive Fields, which employ separate GNNs with independent receptive fields for different modalities to enhance performance. Our results indicate that the optimal $K$ for certain modalities on specific datasets can be as low as 1 or 2, which may restrict the GNNs' capacity to capture global information. To address this, we introduce a Sampling-based Global Transformer, which utilizes uniform global sampling to effectively integrate global information for GNNs. We conduct comprehensive experiments that demonstrate the superiority of our approach over existing methods. Our code is publicly available at https://github.com/CrawlScript/MIG-GT.

Accepted by AAAI 2025

GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians 2024-12-18
Show

Rendering photorealistic head avatars from arbitrary viewpoints is crucial for various applications like virtual reality. Although previous methods based on Neural Radiance Fields (NeRF) can achieve impressive results, they lack fidelity and efficiency. Recent methods using 3D Gaussian Splatting (3DGS) have improved rendering quality and real-time performance but still require significant storage overhead. In this paper, we introduce a method called GraphAvatar that utilizes Graph Neural Networks (GNN) to generate 3D Gaussians for the head avatar. Specifically, GraphAvatar trains a geometric GNN and an appearance GNN to generate the attributes of the 3D Gaussians from the tracked mesh. Therefore, our method can store the GNN models instead of the 3D Gaussians, significantly reducing the storage overhead to just 10MB. To reduce the impact of face-tracking errors, we also present a novel graph-guided optimization module to refine face-tracking parameters during training. Finally, we introduce a 3D-aware enhancer for post-processing to enhance the rendering quality. We conduct comprehensive experiments to demonstrate the advantages of GraphAvatar, surpassing existing methods in visual fidelity and storage consumption. The ablation study sheds light on the trade-offs between rendering quality and model size. The code will be released at: https://github.com/ucwxb/GraphAvatar

accepted by AAAI2025
Towards Precise Prediction Uncertainty in GNNs: Refining GNNs with Topology-grouping Strategy 2024-12-18
Show

Recent advancements in graph neural networks (GNNs) have highlighted the critical need of calibrating model predictions, with neighborhood prediction similarity recognized as a pivotal component. Existing studies suggest that nodes with analogous neighborhood prediction similarity often exhibit similar calibration characteristics. Building on this insight, recent approaches incorporate neighborhood similarity into node-wise temperature scaling techniques. However, our analysis reveals that this assumption does not hold universally. Calibration errors can differ significantly even among nodes with comparable neighborhood similarity, depending on their confidence levels. This necessitates a re-evaluation of existing GNN calibration methods, as a single, unified approach may lead to sub-optimal calibration. In response, we introduce Simi-Mailbox, a novel approach that categorizes nodes by both neighborhood similarity and their own confidence, irrespective of proximity or connectivity. Our method allows fine-grained calibration by employing group-specific temperature scaling, with each temperature tailored to address the specific miscalibration level of affiliated nodes, rather than adhering to a uniform trend based on neighborhood similarity. Extensive experiments demonstrate the effectiveness of our Simi-Mailbox across diverse datasets on different GNN architectures, achieving up to 13.79% error reduction compared to uncalibrated GNN predictions.
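
A hedged sketch of the group-wise calibration idea: nodes are bucketed by neighborhood prediction similarity and their own confidence, and one learnable temperature is fitted per bucket (for example, by minimizing the negative log-likelihood on a validation split). The equal-width binning and bucket counts are assumptions, not the paper's grouping scheme.

```python
# Sketch: group-specific temperature scaling keyed on (neighborhood similarity, confidence).
import torch
import torch.nn as nn

class GroupTemperature(nn.Module):
    def __init__(self, n_sim_bins=5, n_conf_bins=5):
        super().__init__()
        self.n_sim_bins, self.n_conf_bins = n_sim_bins, n_conf_bins
        self.log_temp = nn.Parameter(torch.zeros(n_sim_bins * n_conf_bins))  # one temperature per group

    def forward(self, logits, neigh_sim, confidence):
        # logits: [N, C]; neigh_sim, confidence: [N] values assumed to lie in [0, 1]
        sim_bin = (neigh_sim * self.n_sim_bins).long().clamp(max=self.n_sim_bins - 1)
        conf_bin = (confidence * self.n_conf_bins).long().clamp(max=self.n_conf_bins - 1)
        group = sim_bin * self.n_conf_bins + conf_bin                          # bucket index per node
        temp = self.log_temp[group].exp().unsqueeze(1)                         # group-specific temperature
        return logits / temp                                                   # calibrated logits
```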

Accepted at AAAI 2025

Spatio-Temporal Forecasting of PM2.5 via Spatial-Diffusion guided Encoder-Decoder Architecture 2024-12-18
Show

In many problem settings that require spatio-temporal forecasting, the values in the time-series not only exhibit spatio-temporal correlations but are also influenced by spatial diffusion across locations. One such example is forecasting the concentration of fine particulate matter (PM2.5) in the atmosphere which is influenced by many complex factors, the most important ones being diffusion due to meteorological factors as well as transport across vast distances over a period of time. We present a novel Spatio-Temporal Graph Neural Network architecture, that specifically captures these dependencies to forecast the PM2.5 concentration. Our model is based on an encoder-decoder architecture where the encoder and decoder parts leverage gated recurrent units (GRU) augmented with a graph neural network (TransformerConv) to account for spatial diffusion. Our model can also be seen as a generalization of various existing models for time-series or spatio-temporal forecasting. We demonstrate the model's effectiveness on two real-world PM2.5 datasets: (1) data collected by us using a recently deployed network of low-cost PM$_{2.5}$ sensors from 511 locations spanning the entirety of the Indian state of Bihar over a period of one year, and (2) another publicly available dataset that covers severely polluted regions from China for a period of 4 years. Our experimental results show our model's impressive ability to account for both spatial as well as temporal dependencies precisely.
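
To make the encoder idea concrete, the sketch below interleaves a simple neighbor-averaging step (standing in for the TransformerConv layer the authors use) with a GRU update per station; the dimensions, the dense adjacency, and the mixing layer are simplifications for illustration only.

```python
# Sketch of an encoder step combining spatial diffusion with a per-station GRU update.
import torch
import torch.nn as nn

class SpatialGRUEncoder(nn.Module):
    def __init__(self, in_dim=1, hidden_dim=64):
        super().__init__()
        self.mix = nn.Linear(2 * in_dim, in_dim)        # combines a station's signal with its neighbors'
        self.gru = nn.GRUCell(in_dim, hidden_dim)

    def forward(self, x_seq, adj, h=None):
        # x_seq: [T, N, in_dim] sensor readings, adj: [N, N] row-normalized adjacency
        N = x_seq.size(1)
        h = torch.zeros(N, self.gru.hidden_size) if h is None else h
        for x_t in x_seq:                                # iterate over time steps
            neigh = adj @ x_t                            # spatial diffusion term
            x_mix = self.mix(torch.cat([x_t, neigh], dim=-1))
            h = self.gru(x_mix, h)                       # temporal update per station
        return h                                         # final hidden state per station
```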

9 pages, 4 figures, International Conference on Data Science and Management of Data (CODS-COMAD), IIT Jodhpur, 2024

Graph Coarsening via Supervised Granular-Ball for Scalable Graph Neural Network Training 2024-12-18
Show

Graph Neural Networks (GNNs) have demonstrated significant achievements in processing graph data, yet scalability remains a substantial challenge. To address this, numerous graph coarsening methods have been developed. However, most existing coarsening methods are training-dependent, leading to lower efficiency, and they all require a predefined coarsening rate, lacking an adaptive approach. In this paper, we employ granular-ball computing to effectively compress graph data. We construct a coarsened graph network by iteratively splitting the graph into granular-balls based on a purity threshold and using these granular-balls as super vertices. This granulation process significantly reduces the size of the original graph, thereby greatly enhancing the training efficiency and scalability of GNNs. Additionally, our algorithm can adaptively perform splitting without requiring a predefined coarsening rate. Experimental results demonstrate that our method achieves accuracy comparable to training on the original graph. Noise injection experiments further indicate that our method exhibits robust performance. Moreover, our approach can reduce the graph size by up to 20 times without compromising test accuracy, substantially enhancing the scalability of GNNs.
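
A small sketch of the purity-threshold splitting idea: a node set is split recursively with 2-means until each ball's label purity passes the threshold, and each resulting ball can then act as a super vertex of the coarsened graph. The clustering routine, threshold, and stopping rules are assumptions rather than the paper's exact granular-ball procedure.

```python
# Sketch: recursive purity-driven splitting of nodes into granular balls.
import numpy as np
from sklearn.cluster import KMeans

def granular_balls(features, labels, idx=None, purity_thr=0.9, min_size=4):
    idx = np.arange(len(labels)) if idx is None else idx
    counts = np.bincount(labels[idx])
    purity = counts.max() / len(idx)                      # fraction of the majority label in the ball
    if purity >= purity_thr or len(idx) <= min_size:
        return [idx]                                      # pure (or tiny) ball: keep as a super vertex
    split = KMeans(n_clusters=2, n_init=10).fit_predict(features[idx])
    left, right = idx[split == 0], idx[split == 1]
    if len(left) == 0 or len(right) == 0:                 # degenerate split: stop here
        return [idx]
    return (granular_balls(features, labels, left, purity_thr, min_size)
            + granular_balls(features, labels, right, purity_thr, min_size))
```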

AutoSGNN: Automatic Propagation Mechanism Discovery for Spectral Graph Neural Networks 2024-12-18
Show

In real-world applications, spectral Graph Neural Networks (GNNs) are powerful tools for processing diverse types of graphs. However, a single GNN often struggles to handle different graph types, such as homogeneous and heterogeneous graphs, simultaneously. This challenge has led to the manual design of GNNs tailored to specific graph types, but these approaches are limited by the high cost of labor and the constraints of expert knowledge, which cannot keep up with the rapid growth of graph data. To overcome these challenges, we propose AutoSGNN, an automated framework for discovering propagation mechanisms in spectral GNNs. AutoSGNN unifies the search space for spectral GNNs by integrating large language models with evolutionary strategies to automatically generate architectures that adapt to various graph types. Extensive experiments on nine widely-used datasets, encompassing both homophilic and heterophilic graphs, demonstrate that AutoSGNN outperforms state-of-the-art spectral GNNs and graph neural architecture search methods in both performance and efficiency.

KA-GNN: Kolmogorov-Arnold Graph Neural Networks for Molecular Property Prediction 2024-12-18
Show

As key models in geometric deep learning, graph neural networks have demonstrated enormous power in molecular data analysis. Recently, a specially-designed learning scheme, known as the Kolmogorov-Arnold Network (KAN), has shown unique potential for the improvement of model accuracy, efficiency, and explainability. Here we propose the first non-trivial Kolmogorov-Arnold Network-based Graph Neural Networks (KA-GNNs), including KAN-based graph convolutional networks (KA-GCN) and KAN-based graph attention networks (KA-GAT). The essential idea is to utilize KAN's unique power to optimize GNN architectures at three major levels, including node embedding, message passing, and readout. Further, with the strong approximation capability of Fourier series, we develop a Fourier series-based KAN model and provide a rigorous mathematical proof of the robust approximation capability of this Fourier KAN architecture. To validate our KA-GNNs, we consider seven of the most widely used benchmark datasets for molecular property prediction and extensively compare with existing state-of-the-art models. It has been found that our KA-GNNs can outperform traditional GNN models. More importantly, our Fourier KAN module can not only increase the model accuracy but also reduce the computational time. This work not only highlights the great power of KA-GNNs in molecular property prediction but also provides a novel geometric deep learning framework for general non-Euclidean data analysis.

PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis 2024-12-18
Show

Spatial multi-modal omics technology, highlighted by Nature Methods as an advanced biological technique in 2023, plays a critical role in resolving biological regulatory processes with spatial context. Recently, graph neural networks based on K-nearest neighbor (KNN) graphs have gained prominence in spatial multi-modal omics methods due to their ability to model semantic relations between sequencing spots. However, the fixed KNN graph fails to capture the latent semantic relations hidden by the inevitable data perturbations during the biological sequencing process, resulting in the loss of semantic information. In addition, the common lack of spot annotation and class number priors in practice further hinders the optimization of spatial multi-modal omics models. Here, we propose a novel spatial multi-modal omics resolved framework, termed PRototype-Aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis (PRAGA). PRAGA constructs a dynamic graph to capture latent semantic relations and comprehensively integrate spatial information and feature semantics. The learnable graph structure can also denoise perturbations by learning cross-modal knowledge. Moreover, a dynamic prototype contrastive learning is proposed based on the dynamic adaptability of Bayesian Gaussian Mixture Models to optimize the multi-modal omics representations for unknown biological priors. Quantitative and qualitative experiments on simulated and real datasets with 7 competing methods demonstrate the superior performance of PRAGA. Code is available at https://github.com/Xubin-s-Lab/PRAGA.

Accepted by AAAI2025; full version including appendix

Enhancing Persona Classification in Dialogue Systems: A Graph Neural Network Approach 2024-12-17
Show

In recent years, Large Language Models (LLMs) have gained considerable attention for their potential to enhance personalized experiences in virtual assistants and chatbots. A key area of interest is the integration of personas into LLMs to improve dialogue naturalness and user engagement. This study addresses the challenge of persona classification, a crucial component in dialogue understanding, by proposing a framework that combines text embeddings with Graph Neural Networks (GNNs) for effective persona classification. Given the absence of dedicated persona classification datasets, we create a manually annotated dataset to facilitate model training and evaluation. Our method involves extracting semantic features from persona statements using text embeddings and constructing a graph where nodes represent personas and edges capture their similarities. The GNN component uses this graph structure to propagate relevant information, thereby improving classification performance. Experimental results show that our approach, in particular the integration of GNNs, significantly improves classification performance, especially with limited data. Our contributions include the development of a persona classification framework and the creation of a dataset.
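
A minimal sketch of the graph-construction step described above: persona statements become nodes, and edges connect pairs whose embedding cosine similarity exceeds a threshold. The threshold value and thresholding rule are assumptions; the paper may construct its similarity graph differently.

```python
# Sketch: build a persona similarity graph from precomputed text embeddings.
import torch
import torch.nn.functional as F

def build_persona_edges(embeddings, threshold=0.8):
    # embeddings: [N, d] text embeddings of persona statements
    normed = F.normalize(embeddings, dim=1)
    sim = normed @ normed.T                                # [N, N] cosine similarities
    sim.fill_diagonal_(0)                                  # drop self-loops
    src, dst = torch.nonzero(sim > threshold, as_tuple=True)
    return torch.stack([src, dst])                         # [2, num_edges] edge index for a GNN library
```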

Enhancing Internet of Things Security through Self-Supervised Graph Neural Networks 2024-12-17
Show

With the rapid rise of the Internet of Things (IoT), ensuring the security of IoT devices has become essential. One of the primary challenges in this field is that new types of attacks often have significantly fewer samples than more common attacks, leading to unbalanced datasets. Existing research on detecting intrusions in these unbalanced labeled datasets primarily employs Convolutional Neural Networks (CNNs) or conventional Machine Learning (ML) models, which result in incomplete detection, especially for new attacks. To handle these challenges, we suggest a new approach to IoT intrusion detection using Self-Supervised Learning (SSL) with a Markov Graph Convolutional Network (MarkovGCN). Graph learning excels at modeling complex relationships within data, while SSL mitigates the issue of limited labeled data for emerging attacks. Our approach leverages the inherent structure of IoT networks to pre-train a GCN, which is then fine-tuned for the intrusion detection task. The integration of Markov chains in GCN uncovers network structures and enriches node and edge features with contextual information. Experimental results demonstrate that our approach significantly improves detection accuracy and robustness compared to conventional supervised learning methods. Using the EdgeIIoT-set dataset, we attained an accuracy of 98.68%, a precision of 98.18%, a recall of 98.35%, and an F1-Score of 98.40%.

Cluster-guided Contrastive Class-imbalanced Graph Classification 2024-12-17
Show

This paper studies the problem of class-imbalanced graph classification, which aims at effectively classifying the categories of graphs in scenarios with imbalanced class distribution. Despite the tremendous success of graph neural networks (GNNs), their modeling ability for imbalanced graph-structured data is inadequate, which typically leads to predictions biased towards the majority classes. Besides, existing class-imbalanced learning methods in vision may overlook the rich graph semantic substructures of the majority classes and excessively emphasize learning from the minority classes. To tackle this issue, this paper proposes a simple yet powerful approach called C$^3$GNN that incorporates the idea of clustering into contrastive learning to enhance class-imbalanced graph classification. Technically, C$^3$GNN clusters graphs from each majority class into multiple subclasses, ensuring they have similar sizes to the minority class, thus alleviating class imbalance. Additionally, it utilizes the Mixup technique to synthesize new samples and enrich the semantic information of each subclass, and leverages supervised contrastive learning to hierarchically learn effective graph representations. In this way, we can not only sufficiently explore the semantic substructures within the majority class but also effectively alleviate excessive focus on the minority class. Extensive experiments on real-world graph benchmark datasets verify the superior performance of our proposed method.

Accepted by Proceedings of the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25)

Search Strategy Generation for Branch and Bound Using Genetic Programming 2024-12-17
Show

Branch-and-Bound (B&B) is an exact method in integer programming that recursively divides the search space into a tree. During the resolution process, determining the next subproblem to explore within the tree, known as the search strategy, is crucial. Hand-crafted heuristics are commonly used, but none are effective over all problem classes. Recent approaches utilizing neural networks claim to make more intelligent decisions but are computationally expensive. In this paper, we introduce GP2S (Genetic Programming for Search Strategy), a novel machine learning approach that automatically generates a B&B search strategy heuristic, aiming to make intelligent decisions while being computationally lightweight. We define a policy as a function that evaluates the quality of a B&B node by combining features from the node and the problem; the search strategy policy is then defined by a best-first search based on this node ranking. The policy space is explored using a genetic programming algorithm, and the policy that achieves the best performance on a training set is selected. We compare our approach with the standard method of the SCIP solver, a recent graph neural network-based method, and handcrafted heuristics. Our first evaluation includes three types of primal hard problems, tested on instances similar to the training set and on larger instances. Our method is at most 2% slower than the best baseline and consistently outperforms SCIP, achieving an average speedup of 11.3%. Additionally, GP2S is tested on the MIPLIB 2017 dataset, generating multiple heuristics from different subsets of instances. It exceeds SCIP's average performance in 7 out of 10 cases across 15 times more instances and under a time limit 15 times longer, with some GP2S methods leading on most experiments in terms of the number of feasible solutions or optimality gap.

Accepted at AAAI 2025

Towards Effective Graph Rationalization via Boosting Environment Diversity 2024-12-17
Show

Graph Neural Networks (GNNs) perform effectively when training and testing graphs are drawn from the same distribution, but struggle to generalize well in the face of distribution shifts. To address this issue, existing mainstreaming graph rationalization methods first identify rationale and environment subgraphs from input graphs, and then diversify training distributions by augmenting the environment subgraphs. However, these methods merely combine the learned rationale subgraphs with environment subgraphs in the representation space to produce augmentation samples, failing to produce sufficiently diverse distributions. Thus, in this paper, we propose to achieve an effective Graph Rationalization by Boosting Environmental diversity, a GRBE approach that generates the augmented samples in the original graph space to improve the diversity of the environment subgraph. Firstly, to ensure the effectiveness of augmentation samples, we propose a precise rationale subgraph extraction strategy in GRBE to refine the rationale subgraph learning process in the original graph space. Secondly, to ensure the diversity of augmented samples, we propose an environment diversity augmentation strategy in GRBE that mixes the environment subgraphs of different graphs in the original graph space and then combines the new environment subgraphs with rationale subgraphs to generate augmented graphs. The average improvements of 7.65% and 6.11% in rationalization and classification performance on benchmark datasets demonstrate the superiority of GRBE over state-of-the-art approaches.

Can Large Language Models Act as Ensembler for Multi-GNNs? 2024-12-17
Show

Graph Neural Networks (GNNs) have emerged as powerful models for learning from graph-structured data. However, GNNs lack the inherent semantic understanding capability of rich textual node attributes, limiting their effectiveness in applications. On the other hand, we empirically observe that among existing GNN models, no single model consistently outperforms the others across diverse datasets. In this paper, we study whether LLMs can act as an ensembler for multi-GNNs and propose the LensGNN model. The model first aligns multiple GNNs, mapping the representations of different GNNs into the same space. Then, through LoRA fine-tuning, it aligns the space between the GNN and the LLM, injecting graph tokens and textual information into LLMs. This allows LensGNN to ensemble multiple GNNs and take advantage of the strengths of the LLM, leading to a deeper understanding of both textual semantic information and graph structural information. The experimental results show that LensGNN outperforms existing models. This research advances text-attributed graph ensemble learning by providing a robust and superior solution for integrating semantic and structural information. We provide our code and data here: https://anonymous.4open.science/r/EnsemGNN-E267/.

Interpreting GNN-based IDS Detections Using Provenance Graph Structural Features 2024-12-17
Show

Advanced cyber threats (e.g., Fileless Malware and Advanced Persistent Threat (APT)) have driven the adoption of provenance-based security solutions. These solutions employ Machine Learning (ML) models for behavioral modeling and critical security tasks such as malware and anomaly detection. However, the opacity of ML-based security models limits their broader adoption, as the lack of transparency in their decision-making processes restricts explainability and verifiability. We tailored our solution towards Graph Neural Network (GNN)-based security solutions since recent studies employ GNNs to comprehensively digest system provenance graphs for security critical tasks. To enhance the explainability of GNN-based security models, we introduce PROVEXPLAINER, a framework offering instance-level security-aware explanations using an interpretable surrogate model. PROVEXPLAINER's interpretable feature space consists of discriminant subgraph patterns and graph structural features, which can be directly mapped to the system provenance problem space, making the explanations human understandable. By considering prominent GNN architectures (e.g., GAT and GraphSAGE) for anomaly detection tasks, we show how PROVEXPLAINER synergizes with current state-of-the-art (SOTA) GNN explainers to deliver domain and instance-specific explanations. We measure the explanation quality using the fidelity+/fidelity- metric as used by traditional GNN explanation literature, and we incorporate the precision/recall metric where we consider the accuracy of the explanation against the ground truth. On malware and APT datasets, PROVEXPLAINER achieves up to 29%/27%/25% higher fidelity+, precision and recall, and 12% lower fidelity- respectively, compared to SOTA GNN explainers.

Multi-Object Graph Affordance Network: Goal-Oriented Planning through Learned Compound Object Affordances 2024-12-17
Show

Learning object affordances is an effective tool in the field of robot learning. While the data-driven models investigate affordances of single or paired objects, there is a gap in the exploration of affordances of compound objects composed of an arbitrary number of objects. We propose the Multi-Object Graph Affordance Network which models complex compound object affordances by learning the outcomes of robot actions that facilitate interactions between an object and a compound. Given the depth images of the objects, the object features are extracted via convolution operations and encoded in the nodes of graph neural networks. Graph convolution operations are used to encode the state of the compounds, which are used as input to decoders to predict the outcome of the object-compound interactions. After learning the compound object affordances, given different tasks, the learned outcome predictors are used to plan sequences of stack actions that involve stacking objects on top of each other, inserting smaller objects into larger containers, and passing ring-like objects through poles. We showed that our system successfully modeled the affordances of compound objects that include concave and convex objects, in both simulated and real-world environments. We benchmarked our system with a baseline model to highlight its advantages.

This work has been accepted by the IEEE for possible publication

Can Large Language Models Improve the Adversarial Robustness of Graph Neural Networks? 2024-12-17
Show

Graph neural networks (GNNs) are vulnerable to adversarial attacks, especially for topology perturbations, and many methods that improve the robustness of GNNs have received considerable attention. Recently, we have witnessed the significant success of large language models (LLMs), leading many to explore the great potential of LLMs on GNNs. However, they mainly focus on improving the performance of GNNs by utilizing LLMs to enhance the node features. Therefore, we ask: Will the robustness of GNNs also be enhanced with the powerful understanding and inference capabilities of LLMs? Our empirical results show that although LLMs can improve the robustness of GNNs, there is still an average decrease of 23.1% in accuracy, implying that the GNNs remain extremely vulnerable to topology attacks. Therefore, another question is how to extend the capabilities of LLMs on graph adversarial robustness. In this paper, we propose an LLM-based robust graph structure inference framework, LLM4RGNN, which distills the inference capabilities of GPT-4 into a local LLM for identifying malicious edges and an LM-based edge predictor for finding missing important edges, so as to recover a robust graph structure. Extensive experiments demonstrate that LLM4RGNN consistently improves the robustness across various GNNs. Even in some cases where the perturbation ratio increases to 40%, the accuracy of GNNs is still better than that on the clean graph. The source code can be found at https://github.com/zhongjian-zhang/LLM4RGNN.

accepted by KDD2025
ST-FiT: Inductive Spatial-Temporal Forecasting with Limited Training Data 2024-12-17
Show

Spatial-temporal graphs are widely used in a variety of real-world applications. Spatial-Temporal Graph Neural Networks (STGNNs) have emerged as a powerful tool to extract meaningful insights from this data. However, in real-world applications, most nodes may not possess any available temporal data during training. For example, the pandemic dynamics of most cities on a geographical graph may not be available due to the asynchronous nature of outbreaks. Such a phenomenon disagrees with the training requirements of most existing spatial-temporal forecasting methods, which jeopardizes their effectiveness and thus blocks broader deployment. In this paper, we propose to formulate a novel problem of inductive forecasting with limited training data. In particular, given a spatial-temporal graph, we aim to learn a spatial-temporal forecasting model that can be easily generalized onto those nodes without any available temporal training data. To handle this problem, we propose a principled framework named ST-FiT. ST-FiT consists of two key learning components: temporal data augmentation and spatial graph topology learning. With such a design, ST-FiT can be used on top of any existing STGNNs to achieve superior performance on the nodes without training data. Extensive experiments verify the effectiveness of ST-FiT in multiple key perspectives.

Graph Learning in the Era of LLMs: A Survey from the Perspective of Data, Models, and Tasks 2024-12-17
Show

With the increasing prevalence of cross-domain Text-Attributed Graph (TAG) Data (e.g., citation networks, recommendation systems, social networks, and ai4science), the integration of Graph Neural Networks (GNNs) and Large Language Models (LLMs) into a unified Model architecture (e.g., LLM as enhancer, LLM as collaborators, LLM as predictor) has emerged as a promising technological paradigm. The core of this new graph learning paradigm lies in the synergistic combination of GNNs' ability to capture complex structural relationships and LLMs' proficiency in understanding informative contexts from the rich textual descriptions of graphs. Therefore, we can leverage graph description texts with rich semantic context to fundamentally enhance Data quality, thereby improving the representational capacity of model-centric approaches in line with data-centric machine learning principles. By leveraging the strengths of these distinct neural network architectures, this integrated approach addresses a wide range of TAG-based Task (e.g., graph learning, graph reasoning, and graph question answering), particularly in complex industrial scenarios (e.g., supervised, few-shot, and zero-shot settings). In other words, we can treat text as a medium to enable cross-domain generalization of graph learning Model, allowing a single graph model to effectively handle the diversity of downstream graph-based Task across different data domains. This work serves as a foundational reference for researchers and practitioners looking to advance graph learning methodologies in the rapidly evolving landscape of LLM. We consistently maintain the related open-source materials at \url{https://github.com/xkLi-Allen/Awesome-GNN-in-LLMs-Papers}.

In progress
Discovering Top-k Structural Hole Spanners in Dynamic Networks 2024-12-17
Show

Structural Hole (SH) theory states that the node which acts as a connecting link among otherwise disconnected communities gets positional advantages in the network. These nodes are called Structural Hole Spanners (SHS). Numerous solutions have been proposed to discover SHSs; however, most of them are only applicable to static networks. Since real-world networks are dynamic, in this study we aim to discover SHSs in dynamic networks. Because discovering SHSs is an NP-hard problem, instead of discovering the exact k SHSs, we adopt a greedy approach to discover the Top-k SHSs. We first propose an efficient Tracking-SHS algorithm for updating SHSs in dynamic networks. Our algorithm reuses the information obtained during the initial runs of the static algorithm and avoids the recomputations for the nodes unaffected by the updates. Besides, motivated by the success of Graph Neural Networks (GNNs) on various graph mining problems, we also design a Graph Neural Network-based model, GNN-SHS, to discover SHSs in dynamic networks, aiming to reduce the computational cost while achieving high accuracy. We provide a theoretical analysis of the Tracking-SHS algorithm, and our theoretical results prove that for a particular type of graph, such as Preferential Attachment graphs [1], the Tracking-SHS algorithm achieves a 1.6 times speedup compared with the static algorithm. We perform extensive experiments, and our results demonstrate that the Tracking-SHS algorithm attains a minimum of 3.24 times speedup over the static algorithm. Also, the proposed second model GNN-SHS is on average 671.6 times faster than the Tracking-SHS algorithm.

arXiv admin note: substantial text overlap with arXiv:2212.08239. I have updated significant portions of the draft and corrected a few mistakes in the paper

DeepSN: A Sheaf Neural Framework for Influence Maximization 2024-12-16
Show

Influence maximization is a key topic in data mining, with broad applications in social network analysis and viral marketing. In recent years, researchers have increasingly turned to machine learning techniques to address this problem. They have developed methods to learn the underlying diffusion processes in a data-driven manner, which enhances the generalizability of the solution, and have designed optimization objectives to identify the optimal seed set. Nonetheless, two fundamental gaps remain unsolved: (1) Graph Neural Networks (GNNs) are increasingly used to learn diffusion models, but in their traditional form, they often fail to capture the complex dynamics of influence diffusion; (2) designing optimization objectives is challenging due to combinatorial explosion when solving this problem. To address these challenges, we propose a novel framework, DeepSN. Our framework employs sheaf neural diffusion to learn diverse influence patterns in a data-driven, end-to-end manner, providing enhanced separability in capturing diffusion characteristics. We also propose an optimization technique that accounts for overlapping influence between vertices, which helps to reduce the search space and identify the optimal seed set effectively and efficiently. Finally, we conduct extensive experiments on both synthetic and real-world datasets to demonstrate the effectiveness of our framework.

Accepted to AAAI 2025

Graph-Guided Textual Explanation Generation Framework 2024-12-16
Show

Natural language explanations (NLEs) are commonly used to provide plausible free-text explanations of a model's reasoning about its predictions. However, recent work has questioned the faithfulness of NLEs, as they may not accurately reflect the model's internal reasoning process regarding its predicted answer. In contrast, highlight explanations -- input fragments identified as critical for the model's predictions -- exhibit measurable faithfulness, which has been incrementally improved through existing research. Building on this foundation, we propose G-Tex, a Graph-Guided Textual Explanation Generation framework designed to enhance the faithfulness of NLEs by leveraging highlight explanations. Specifically, highlight explanations are extracted as highly faithful cues representing the model's reasoning and are subsequently encoded through a graph neural network layer, which explicitly guides the NLE generation process. This alignment ensures that the generated explanations closely reflect the model's underlying reasoning. Experiments on T5 and BART using three reasoning datasets show that G-Tex improves NLE faithfulness by up to 17.59% compared to baseline methods. Additionally, G-Tex generates NLEs with greater semantic and lexical similarity to human-written ones. Human evaluations show that G-Tex can decrease redundant content and enhance the overall quality of NLEs. As our work introduces a novel method for explicitly guiding NLE generation to improve faithfulness, we hope it will serve as a stepping stone for addressing additional criteria for NLE and generated text overall.

RL-MILP Solver: A Reinforcement Learning Approach for Solving Mixed-Integer Linear Programs with Graph Neural Networks 2024-12-16
Show

Mixed-Integer Linear Programming (MILP) is an optimization technique widely used in various fields. Existing end-to-end learning methods for MILP generate values for a subset of decision variables and delegate the remaining problem to traditional MILP solvers. However, this approach does not guarantee solution feasibility (i.e., satisfying all constraints) due to inaccurate predictions and primarily focuses on prediction for binary decision variables. When addressing MILP involving non-binary integer variables using machine learning (ML), feasibility issues can become even more pronounced. Since finding an optimal solution requires satisfying all constraints, addressing feasibility is critical. To overcome these limitations, we propose a novel reinforcement learning (RL)-based solver that interacts with MILP to incrementally discover better feasible solutions without relying on traditional solvers. We design reward functions tailored for MILP, which enable the RL agent to learn relationships between decision variables and constraints. Furthermore, we leverage a Transformer encoder-based graph neural network (GNN) to effectively model complex relationships among decision variables. Our experimental results demonstrate that the proposed method can solve MILP problems and find near-optimal solutions without delegating the remainder to traditional solvers. The proposed method provides a meaningful step forward as an initial study in solving MILP problems entirely with ML in an end-to-end manner.
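
To make the reward-design idea concrete, here is a hedged toy sketch of a feasibility-aware reward for an RL agent that proposes integer assignments: constraint violations are penalised, and improvements over the incumbent objective are rewarded once the assignment is feasible. The actual reward functions in the paper are tailored to MILP structure and will differ; `milp_reward` and the toy instance are illustrative assumptions.

```python
# A hedged sketch of a feasibility-aware reward for an RL-based MILP solver:
# penalise constraint violation, reward objective improvement once feasible.
# The exact reward design in the paper differs; this only illustrates the idea.
import numpy as np

def milp_reward(c, A, b, x, best_obj):
    """Reward for a candidate integer assignment x of min c^T x s.t. A x <= b."""
    violation = np.maximum(A @ x - b, 0.0).sum()   # total constraint violation
    if violation > 0:
        return -violation                           # push the agent toward feasibility
    obj = float(c @ x)
    return max(best_obj - obj, 0.0)                 # reward improving the incumbent

# toy instance: minimise -x1 - x2 subject to x1 + x2 <= 3, x integer
c = np.array([-1.0, -1.0])
A = np.array([[1.0, 1.0]])
b = np.array([3.0])
print(milp_reward(c, A, b, np.array([2, 2]), best_obj=0.0))  # infeasible -> negative
print(milp_reward(c, A, b, np.array([2, 1]), best_obj=0.0))  # feasible, improves -> positive
```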

Accep...

Accepted at the 2025 AAAI Workshop on AI to Accelerate Science and Engineering (AI2ASE)

Thermodynamics-informed graph neural networks for real-time simulation of digital human twins 2024-12-16
Show

The growing importance of real-time simulation in the medical field has exposed the limitations and bottlenecks inherent in the digital representation of complex biological systems. This paper presents a novel methodology aimed at advancing current lines of research in soft tissue simulation. The proposed approach introduces a hybrid model that integrates the geometric bias of graph neural networks with the physical bias derived from the imposition of a metriplectic structure as soft and hard constraints in the architecture, being able to simulate hepatic tissue with dissipative properties. This approach provides an efficient solution capable of generating predictions at a high feedback rate while maintaining a remarkable generalization ability for previously unseen anatomies. This makes these features particularly relevant in the context of precision medicine and haptic rendering. Based on the adopted methodologies, we propose a model that predicts human liver responses to traction and compression loads in as little as 7.3 milliseconds for optimized configurations and as fast as 1.65 milliseconds in the most efficient cases, all in the forward pass. The model achieves relative position errors below 0.15%, with stress tensor and velocity estimations maintaining relative errors under 7%. This demonstrates the robustness of the approach developed, which is capable of handling diverse load states and anatomies effectively. This work highlights the feasibility of integrating real-time simulation with patient-specific geometries through deep learning, paving the way for more robust digital human twins in medical applications.

Cost-Effective Label-free Node Classification with LLMs 2024-12-16
Show

Graph neural networks (GNNs) have emerged as go-to models for node classification in graph data due to their powerful abilities in fusing graph structures and attributes. However, such models strongly rely on adequate high-quality labeled data for training, which are expensive to acquire in practice. With the advent of large language models (LLMs), a promising way is to leverage their superb zero-shot capabilities and massive knowledge for node labeling. Despite promising results reported, this methodology either demands considerable queries to LLMs, or suffers from compromised performance caused by noisy labels produced by LLMs. To remedy these issues, this work presents Cella, an active self-training framework that integrates LLMs into GNNs in a cost-effective manner. The design recipe of Cella is to iteratively identify small sets of "critical" samples using GNNs and extract informative pseudo-labels for them with both LLMs and GNNs as additional supervision signals to enhance model training. Particularly, Cella includes three major components: (i) an effective active node selection strategy for initial annotations; (ii) a judicious sample selection scheme to sift out the "critical" nodes based on label disharmonicity and entropy; and (iii) a label refinement module combining LLMs and GNNs with rewired topology. Our extensive experiments over five benchmark text-attributed graph datasets demonstrate that Cella significantly outperforms the state of the art under the same query budget to LLMs in terms of label-free node classification. In particular, on the DBLP dataset with 14.3k nodes, Cella is able to achieve a conspicuous 8.08% improvement in accuracy over the state of the art at a cost of less than one cent.
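
One ingredient of the sample selection described above is predictive entropy. The hedged sketch below selects the highest-entropy nodes for LLM annotation; the full Cella scheme also uses label disharmonicity and a refinement module, which are omitted here, and `select_critical_nodes` is an illustrative name.

```python
# Minimal sketch of entropy-based selection of "critical" nodes for LLM annotation,
# one ingredient of Cella's sample selection (the full scheme also uses label
# disharmonicity and a refinement module, which are omitted here).
import torch

def select_critical_nodes(logits: torch.Tensor, budget: int) -> torch.Tensor:
    """Return indices of the `budget` nodes with the highest predictive entropy."""
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return torch.topk(entropy, k=budget).indices

logits = torch.randn(1000, 7)          # GNN outputs for 1000 nodes, 7 classes
to_query = select_critical_nodes(logits, budget=50)
print(to_query.shape)                  # torch.Size([50])
```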

15 pages, 5 figures
BetaExplainer: A Probabilistic Method to Explain Graph Neural Networks 2024-12-16
Show

Graph neural networks (GNNs) are powerful tools for conducting inference on graph data but are often seen as "black boxes" due to difficulty in extracting meaningful subnetworks driving predictive performance. Many interpretable GNN methods exist, but they cannot quantify uncertainty in edge weights and suffer in predictive accuracy when applied to challenging graph structures. In this work, we propose BetaExplainer, which addresses these issues by using a sparsity-inducing prior to mask unimportant edges during model training. To evaluate our approach, we examine various simulated data sets with diverse real-world characteristics. Not only does this implementation provide a notion of edge importance uncertainty, it also improves upon evaluation metrics for challenging datasets compared to state-of-the-art explainer methods.
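
As a simplified picture of a sparsity-inducing edge mask, the sketch below learns per-edge mask logits with an L1 penalty standing in for BetaExplainer's Beta prior; the placeholder loss and all names are assumptions, not the paper's implementation.

```python
# Hedged sketch: learn a per-edge importance mask with a sparsity penalty.
# BetaExplainer places a Beta prior on the mask; an L1 penalty is used here
# purely as a simplified stand-in for that sparsity-inducing prior.
import torch

num_edges = 500
mask_logits = torch.zeros(num_edges, requires_grad=True)
optimizer = torch.optim.Adam([mask_logits], lr=0.05)

def model_loss_with_mask(edge_mask: torch.Tensor) -> torch.Tensor:
    # Placeholder for the GNN's prediction loss when edges are down-weighted by
    # `edge_mask`; a real explainer would re-run the trained GNN here.
    return ((edge_mask - 0.1) ** 2).mean()

for step in range(200):
    optimizer.zero_grad()
    edge_mask = torch.sigmoid(mask_logits)          # mask in (0, 1) per edge
    loss = model_loss_with_mask(edge_mask) + 1e-3 * edge_mask.abs().sum()
    loss.backward()
    optimizer.step()

print(torch.sigmoid(mask_logits).topk(5).values)    # most important edges
```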

Asymmetric Learning for Spectral Graph Neural Networks 2024-12-16
Show

Optimizing spectral graph neural networks (GNNs) remains a critical challenge in the field, yet the underlying processes are not well understood. In this paper, we investigate the inherent differences between graph convolution parameters and feature transformation parameters in spectral GNNs and their impact on the optimization landscape. Our analysis reveals that these differences contribute to a poorly conditioned problem, resulting in suboptimal performance. To address this issue, we introduce the concept of the block condition number of the Hessian matrix, which characterizes the difficulty of poorly conditioned problems in spectral GNN optimization. We then propose an asymmetric learning approach, dynamically preconditioning gradients during training to alleviate poorly conditioned problems. Theoretically, we demonstrate that asymmetric learning can reduce block condition numbers, facilitating easier optimization. Extensive experiments on eighteen benchmark datasets show that asymmetric learning consistently improves the performance of spectral GNNs for both heterophilic and homophilic graphs. This improvement is especially notable for heterophilic graphs, where the optimization process is generally more complex than for homophilic graphs. Code is available at https://github.com/Mia-321/asym-opt.git.

EvoLlama: Enhancing LLMs' Understanding of Proteins via Multimodal Structure and Sequence Representations 2024-12-16
Show

Current Large Language Models (LLMs) for understanding proteins primarily treat amino acid sequences as a text modality. Meanwhile, Protein Language Models (PLMs), such as ESM-2, have learned massive sequential evolutionary knowledge from the universe of natural protein sequences. Furthermore, structure-based encoders like ProteinMPNN learn the structural information of proteins through Graph Neural Networks. However, whether the incorporation of protein encoders can enhance the protein understanding of LLMs has not been explored. To bridge this gap, we propose EvoLlama, a multimodal framework that connects a structure-based encoder, a sequence-based protein encoder and an LLM for protein understanding. EvoLlama consists of a ProteinMPNN structure encoder, an ESM-2 protein sequence encoder, a multimodal projector to align protein and text representations and a Llama-3 text decoder. To train EvoLlama, we fine-tune it on protein-oriented instructions and protein property prediction datasets verbalized via natural language instruction templates. Our experiments show that EvoLlama's protein understanding capabilities have been significantly enhanced, outperforming other fine-tuned protein-oriented LLMs in zero-shot settings by an average of 1%-8% and surpassing the state-of-the-art baseline with supervised fine-tuning by an average of 6%. On protein property prediction datasets, our approach achieves promising results that are competitive with state-of-the-art task-specific baselines. We will release our code in a future version.

PointNet with KAN versus PointNet with MLP for 3D Classification and Segmentation of Point Sets 2024-12-16
Show

Kolmogorov-Arnold Networks (KANs) have recently gained attention as an alternative to traditional Multilayer Perceptrons (MLPs) in deep learning frameworks. KANs have been integrated into various deep learning architectures such as convolutional neural networks, graph neural networks, and transformers, with their performance evaluated. However, their effectiveness within point-cloud-based neural networks remains unexplored. To address this gap, we incorporate KANs into PointNet for the first time to evaluate their performance on 3D point cloud classification and segmentation tasks. Specifically, we introduce PointNet-KAN, built upon two key components. First, it employs KANs instead of traditional MLPs. Second, it retains the core principle of PointNet by using shared KAN layers and applying symmetric functions for global feature extraction, ensuring permutation invariance with respect to the input features. In traditional MLPs, the goal is to train the weights and biases with fixed activation functions; however, in KANs, the goal is to train the activation functions themselves. We use Jacobi polynomials to construct the KAN layers. We extensively and systematically evaluate PointNet-KAN across various polynomial degrees and special types such as the Lagrange, Chebyshev, and Gegenbauer polynomials. Our results show that PointNet-KAN achieves competitive performance compared to PointNet with MLPs on benchmark datasets for 3D object classification and segmentation, despite employing a shallower and simpler network architecture. We hope this work serves as a foundation and provides guidance for integrating KANs, as an alternative to MLPs, into more advanced point cloud processing architectures.
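
For intuition about what a polynomial KAN layer looks like, here is a hedged sketch using Chebyshev polynomials (a special case of the Jacobi family the paper employs) with learnable expansion coefficients per input-output pair; `ChebKANLayer` is illustrative and is not the paper's PointNet-KAN implementation.

```python
# Illustrative KAN-style layer: per input feature, a learned expansion in
# Chebyshev polynomials (a special case of the Jacobi family the paper uses).
import torch
import torch.nn as nn

class ChebKANLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, degree: int = 4):
        super().__init__()
        self.degree = degree
        # One coefficient per (input feature, output feature, polynomial order).
        self.coeffs = nn.Parameter(torch.randn(in_dim, out_dim, degree + 1) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (batch, in_dim)
        x = torch.tanh(x)                                  # map inputs into [-1, 1]
        T = [torch.ones_like(x), x]
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])                # T_{n+1} = 2x T_n - T_{n-1}
        basis = torch.stack(T, dim=-1)                     # (batch, in_dim, degree+1)
        return torch.einsum("bid,iod->bo", basis, self.coeffs)

layer = ChebKANLayer(in_dim=3, out_dim=16)
print(layer(torch.randn(8, 3)).shape)                      # torch.Size([8, 16])
```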

Accelerating Sparse Graph Neural Networks with Tensor Core Optimization 2024-12-16
Show

Graph neural networks (GNNs) have seen extensive application in domains such as social networks, bioinformatics, and recommendation systems. However, the irregularity and sparsity of graph data challenge traditional computing methods, which are insufficient to meet the performance demands of GNNs. Recent research has explored parallel acceleration using CUDA Cores and Tensor Cores, but significant challenges persist: (1) kernel fusion leads to false high utilization, failing to treat CUDA and Tensor Cores as independent resources, and (2) heterogeneous cores have distinct computation preferences, causing inefficiencies. To address these issues, this paper proposes FTC-GNN, a novel acceleration framework that efficiently utilizes CUDA and Tensor Cores for GNN computation. FTC-GNN introduces (1) a collaborative design that enables the parallel utilization of CUDA and Tensor Cores and (2) a sparse-to-dense transformation strategy that assigns dense matrix operations to Tensor Cores while leveraging CUDA Cores for data management and sparse edge processing. This design optimizes GPU resource utilization and improves computational efficiency. Experimental results demonstrate the effectiveness of FTC-GNN using GCN and AGNN models across various datasets. For GCN, FTC-GNN achieves speedups of 4.90x, 7.10x, and 1.17x compared to DGL, PyG, and TC-GNN, respectively. For AGNN, it achieves speedups of 5.32x, 2.92x, and 1.02x, establishing its superiority in accelerating GNN computations.

PyTorch Frame: A Modular Framework for Multi-Modal Tabular Learning 2024-12-15
Show

We present PyTorch Frame, a PyTorch-based framework for deep learning over multi-modal tabular data. PyTorch Frame makes tabular deep learning easy by providing a PyTorch-based data structure to handle complex tabular data, introducing a model abstraction to enable modular implementation of tabular models, and allowing external foundation models to be incorporated to handle complex columns (e.g., LLMs for text columns). We demonstrate the usefulness of PyTorch Frame by implementing diverse tabular models in a modular way, successfully applying these models to complex multi-modal tabular data, and integrating our framework with PyTorch Geometric, a PyTorch library for Graph Neural Networks (GNNs), to perform end-to-end learning over relational databases.

https...

https://github.com/pyg-team/pytorch-frame

Can Graph Neural Networks Learn Language with Extremely Weak Text Supervision? 2024-12-15
Show

While great success has been achieved in building vision models with Contrastive Language-Image Pre-training (CLIP) over Internet-scale image-text pairs, building transferable Graph Neural Networks (GNNs) with the CLIP pipeline is challenging because of three fundamental issues: the scarcity of labeled data and text supervision, different levels of downstream tasks, and the conceptual gaps between domains. In this work, to address these issues, we leverage multi-modal prompt learning to effectively adapt a pre-trained GNN to downstream tasks and data, given only a few semantically labeled samples, each with extremely weak text supervision. Our new paradigm embeds the graphs directly in the same space as the Large Language Models (LLMs) by learning both graph prompts and text prompts simultaneously. To accomplish this, we improve the state-of-the-art graph prompt method, and then propose the first graph-language multi-modal prompt learning approach for exploiting the knowledge in pre-trained models. Notably, due to the insufficient supervision for fine-tuning, in our paradigm, the pre-trained GNN and the LLM are kept frozen, so the learnable parameters are far fewer than those required for fine-tuning any pre-trained model. Through extensive experiments on real-world datasets, we demonstrate the superior performance of our paradigm in few-shot, multi-task-level, and cross-domain settings. Moreover, we build the first CLIP-style zero-shot classification prototype that can generalize GNNs to unseen classes with extremely weak text supervision.

Preprint, 25 pages
A Comparative Study on Dynamic Graph Embedding based on Mamba and Transformers 2024-12-15
Show

Dynamic graph embedding has emerged as an important technique for modeling complex time-evolving networks across diverse domains. While transformer-based models have shown promise in capturing long-range dependencies in temporal graph data, they face scalability challenges due to quadratic computational complexity. This study presents a comparative analysis of dynamic graph embedding approaches using transformers and the recently proposed Mamba architecture, a state-space model with linear complexity. We introduce three novel models: TransformerG2G augmented with graph convolutional networks, DG-Mamba, and GDG-Mamba with graph isomorphism network edge convolutions. Our experiments on multiple benchmark datasets demonstrate that Mamba-based models achieve comparable or superior performance to transformer-based approaches in link prediction tasks while offering significant computational efficiency gains on longer sequences. Notably, DG-Mamba variants consistently outperform transformer-based models on datasets with high temporal variability, such as UCI, Bitcoin, and Reality Mining, while maintaining competitive performance on more stable graphs like SBM. We provide insights into the learned temporal dependencies through analysis of attention weights and state matrices, revealing the models' ability to capture complex temporal patterns. By effectively combining state-space models with graph neural networks, our work addresses key limitations of previous approaches and contributes to the growing body of research on efficient temporal graph representation learning. These findings offer promising directions for scaling dynamic graph embedding to larger, more complex real-world networks, potentially enabling new applications in areas such as social network analysis, financial modeling, and biological system dynamics.

18 pages, 6 figures
GENIE: Watermarking Graph Neural Networks for Link Prediction 2024-12-15
Show

Graph Neural Networks (GNNs) have become invaluable intellectual property in graph-based machine learning. However, their vulnerability to model stealing attacks when deployed within Machine Learning as a Service (MLaaS) necessitates robust Ownership Demonstration (OD) techniques. Watermarking is a promising OD framework for Deep Neural Networks, but existing methods fail to generalize to GNNs due to the non-Euclidean nature of graph data. Previous works on GNN watermarking have primarily focused on node and graph classification, overlooking Link Prediction (LP). In this paper, we propose GENIE (watermarking Graph nEural Networks for lInk prEdiction), the first-ever scheme to watermark GNNs for LP. GENIE creates a novel backdoor for both node-representation and subgraph-based LP methods, utilizing a unique trigger set and a secret watermark vector. Our OD scheme is equipped with Dynamic Watermark Thresholding (DWT), ensuring high verification probability (>99.99%) while addressing practical issues in existing watermarking schemes. We extensively evaluate GENIE across 4 model architectures (i.e., SEAL, GCN, GraphSAGE and NeoGNN) and 7 real-world datasets. Furthermore, we validate the robustness of GENIE against 11 state-of-the-art watermark removal techniques and 3 model extraction attacks. We also show GENIE's resilience against ownership piracy attacks. Finally, we discuss a defense strategy to counter adaptive attacks against GENIE.

HAGNN: Hybrid Aggregation for Heterogeneous Graph Neural Networks 2024-12-15
Show

Heterogeneous graph neural networks (GNNs) have been successful in handling heterogeneous graphs. In existing heterogeneous GNNs, the meta-path plays an essential role. However, recent work pointed out that a simple homogeneous graph model without meta-paths can also achieve comparable results, which calls into question the necessity of meta-paths. In this paper, we first present the intrinsic difference between meta-path-based and meta-path-free models, i.e., how to select neighbors for node aggregation. Then, we propose a novel framework to comprehensively utilize the rich type semantic information in heterogeneous graphs, namely HAGNN (Hybrid Aggregation for Heterogeneous GNNs). The core of HAGNN is to leverage the meta-path neighbors and the directly connected neighbors simultaneously for node aggregation. HAGNN divides the overall aggregation process into two phases: meta-path-based intra-type aggregation and meta-path-free inter-type aggregation. During the intra-type aggregation phase, we propose a new data structure called the fused meta-path graph and perform structural-semantic-aware aggregation on it. Finally, we combine the embeddings generated by each phase. Compared with existing heterogeneous GNN models, HAGNN can take full advantage of the heterogeneity in heterogeneous graphs. Extensive experimental results on node classification, node clustering, and link prediction tasks show that HAGNN outperforms existing models, demonstrating the effectiveness of HAGNN.

Accep...

Accepted by IEEE TNNLS

Concept Learning in the Wild: Towards Algorithmic Understanding of Neural Networks 2024-12-15
Show

Explainable AI (XAI) methods typically focus on identifying essential input features or more abstract concepts for tasks like image or text classification. However, for algorithmic tasks like combinatorial optimization, these concepts may depend not only on the input but also on the current state of the network, like in the graph neural networks (GNN) case. This work studies concept learning for an existing GNN model trained to solve Boolean satisfiability (SAT). Our analysis reveals that the model learns key concepts matching those guiding human-designed SAT heuristics, particularly the notion of 'support.' We demonstrate that these concepts are encoded in the top principal components (PCs) of the embedding's covariance matrix, allowing for unsupervised discovery. Using sparse PCA, we establish the minimality of these concepts and show their teachability through a simplified GNN. Two direct applications of our framework are (a) We improve the convergence time of the classical WalkSAT algorithm and (b) We use the discovered concepts to "reverse-engineer" the black-box GNN and rewrite it as a white-box textbook algorithm. Our results highlight the potential of concept learning in understanding and enhancing algorithmic neural networks for combinatorial optimization tasks.
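
The unsupervised concept discovery described above amounts to eigendecomposing the covariance of the GNN's node embeddings. A minimal sketch, assuming random placeholder embeddings and plain (not sparse) PCA:

```python
# Minimal sketch: unsupervised concept discovery by taking the top principal
# components of the embedding covariance matrix (the paper refines this with
# sparse PCA; the embeddings here are random placeholders).
import numpy as np

embeddings = np.random.randn(2000, 64)                 # node embeddings from a GNN
centered = embeddings - embeddings.mean(axis=0)
cov = centered.T @ centered / (len(centered) - 1)      # (64, 64) covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending eigenvalues
top_pcs = eigvecs[:, ::-1][:, :5]                      # top-5 principal directions
explained = eigvals[::-1][:5] / eigvals.sum()
print(top_pcs.shape, explained.round(3))
```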

GNNs-to-MLPs by Teacher Injection and Dirichlet Energy Distillation 2024-12-15
Show

Graph Neural Networks (GNNs) are fundamental to graph-based learning and excel in node classification tasks. However, GNNs suffer from scalability issues due to the need for multi-hop data during inference, limiting their use in latency-sensitive applications. Recent studies attempt to distill GNNs into multi-layer perceptrons (MLPs) for faster inference. They typically treat GNN and MLP models as single units for distillation, insufficiently utilizing the fine-grained knowledge within GNN layers. In this paper, we propose TINED, a novel method that distills GNNs to MLPs layer-wise through Teacher Injection with fine-tuning and Dirichlet Energy Distillation techniques. We analyze key operations in GNN layers, feature transformation (FT) and graph propagation (GP), and identify that an FT performs the same computation as a fully-connected (FC) layer in MLPs. Thus, we propose directly injecting valuable teacher parameters of an FT in a GNN into an FC layer of the student MLP, assisted by fine-tuning. In TINED, FC layers in an MLP mirror the order of the corresponding FTs and GPs in GNN. We provide a theoretical bound on the approximation of GPs. Moreover, we observe that in a GNN layer, FT and GP operations often have opposing smoothing effects: GP is aggressive, while FT is conservative, in smoothing. Using Dirichlet energy, we design a DE ratio to quantify these smoothing effects and propose Dirichlet Energy Distillation to distill these characteristics from GNN layers to MLP layers. Extensive experiments demonstrate that TINED achieves superior performance over GNNs and state-of-the-art distillation methods under various settings across seven datasets. The code is in supplementary material.
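
Dirichlet energy, which TINED uses to quantify smoothing, is the standard quantity $E(X) = \mathrm{trace}(X^\top L X)$ for node features $X$ and graph Laplacian $L$. Below is a minimal sketch of a DE ratio comparing features before and after one mean-aggregation step; the aggregation rule here is an illustrative stand-in for a GNN's GP operation, not the paper's exact definition.

```python
# Hedged sketch: Dirichlet energy E(X) = trace(X^T L X) measures feature
# smoothness over a graph; a "DE ratio" compares energy after/before an
# operation (e.g. a graph-propagation step) to gauge how much it smooths.
import numpy as np

def dirichlet_energy(A: np.ndarray, X: np.ndarray) -> float:
    D = np.diag(A.sum(axis=1))
    L = D - A                                  # unnormalised graph Laplacian
    return float(np.trace(X.T @ L @ X))

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.randn(4, 8)
X_smoothed = (A + np.eye(4)) @ X / (A.sum(axis=1, keepdims=True) + 1)  # mean aggregation

ratio = dirichlet_energy(A, X_smoothed) / dirichlet_energy(A, X)
print(f"DE ratio after one propagation step: {ratio:.3f}")   # typically < 1
```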

14 pages
GDSG: Graph Diffusion-based Solution Generator for Optimization Problems in MEC Networks 2024-12-15
Show

Optimization is crucial for MEC networks to function efficiently and reliably, yet most of these optimization problems are NP-hard and lack efficient approximation algorithms. This leads to a paucity of optimal solutions, constraining the effectiveness of conventional deep learning approaches. Most existing learning-based methods necessitate extensive optimal data and fail to exploit the potential benefits of suboptimal data that can be obtained with greater efficiency and effectiveness. Taking the multi-server multi-user computation offloading (MSCO) problem, which is widely observed in systems like Internet-of-Vehicles (IoV) and Unmanned Aerial Vehicle (UAV) networks, as a concrete scenario, we present a Graph Diffusion-based Solution Generation (GDSG) method. This approach is designed to work with suboptimal datasets while converging to the optimal solution with high probability. We transform the optimization issue into distribution learning and offer a clear explanation of learning from suboptimal training datasets. We build GDSG as a multi-task diffusion model utilizing a Graph Neural Network (GNN) to acquire the distribution of high-quality solutions. We use a simple and efficient heuristic approach to obtain a sufficient amount of training data composed entirely of suboptimal solutions. In our implementation, we enhance the backbone GNN and achieve improved generalization. GDSG also reaches nearly 100% task orthogonality, ensuring no interference between the discrete and continuous generation tasks. We further reveal that this orthogonality arises from the diffusion-related training loss, rather than the neural network architecture itself. The experiments demonstrate that GDSG surpasses other benchmark methods on both the optimal and suboptimal training datasets. The MSCO datasets have been open-sourced at http://ieee-dataport.org/13824, as well as the GDSG algorithm code at https://github.com/qiyu3816/GDSG.

Dynamic Graph Attention Networks for Travel Time Distribution Prediction in Urban Arterial Roads 2024-12-15
Show

Effective congestion management along signalized corridors is essential for improving productivity and reducing costs, with arterial travel time serving as a key performance metric. Traditional approaches, such as Coordinated Signal Timing and Adaptive Traffic Control Systems, often lack scalability and generalizability across diverse urban layouts. We propose Fusion-based Dynamic Graph Neural Networks (FDGNN), a structured framework for simultaneous modeling of travel time distributions in both directions along arterial corridors. FDGNN utilizes attentional graph convolution on dynamic, bidirectional graphs and integrates fusion techniques to capture evolving spatiotemporal traffic dynamics. The framework is trained on extensive hours of simulation data and utilizes GPU computation to ensure scalability. The results demonstrate that our framework can efficiently and accurately model travel time as a normal distribution on arterial roads leveraging a unique dynamic graph representation of corridor traffic states. This representation integrates sequential traffic signal timing plans, local driving behaviors, temporal turning movement counts, and ingress traffic volumes, even when aggregated over intervals as short as a single cycle length. The results demonstrate resilience to effective traffic variations, including cycle lengths, green time percentages, traffic density, and counterfactual routes. Results further confirm its stability under varying conditions at different intersections. This framework supports dynamic signal timing, enhances congestion management, and improves travel time reliability in real-world applications.

11 pa...

11 pages, 4 figures, 3 tables

Temporal-Aware Evaluation and Learning for Temporal Graph Neural Networks 2024-12-15
Show

Temporal Graph Neural Networks (TGNNs) are a family of graph neural networks designed to model and learn dynamic information from temporal graphs. Given their substantial empirical success, there is an escalating interest in TGNNs within the research community. However, the majority of these efforts have been channelled towards algorithm and system design, with the evaluation metrics receiving comparatively less attention. Effective evaluation metrics are crucial for providing detailed performance insights, particularly in the temporal domain. This paper investigates the commonly used evaluation metrics for TGNNs and illustrates the failure mechanisms of these metrics in capturing essential temporal structures in the predictive behaviour of TGNNs. We provide a mathematical formulation of existing performance metrics and utilize an instance-based study to underscore their inadequacies in identifying volatility clustering (the occurrence of emerging errors within a brief interval). This phenomenon has profound implications for both algorithm and system design in the temporal domain. To address this deficiency, we introduce a new volatility-aware evaluation metric (termed volatility cluster statistics), designed for a more refined analysis of model temporal performance. Additionally, we demonstrate how this metric can serve as a temporal-volatility-aware training objective to alleviate the clustering of temporal errors. Through comprehensive experiments on various TGNN models, we validate our analysis and the proposed approach. The empirical results offer revealing insights: 1) existing TGNNs are prone to making errors with volatility clustering, and 2) TGNNs with different mechanisms to capture temporal information exhibit distinct volatility clustering patterns. Our empirical findings demonstrate that our proposed training objective effectively reduces volatility clusters in error.

Multi-Class and Multi-Task Strategies for Neural Directed Link Prediction 2024-12-14
Show

Link Prediction is a foundational task in Graph Representation Learning, supporting applications like link recommendation, knowledge graph completion and graph generation. Graph Neural Networks have shown the most promising results in this domain and are currently the de facto standard approach to learning from graph data. However, a key distinction exists between Undirected and Directed Link Prediction: the former just predicts the existence of an edge, while the latter must also account for edge directionality and bidirectionality. This translates to Directed Link Prediction (DLP) having three sub-tasks, each defined by how training, validation and test sets are structured. Most research on DLP overlooks this trichotomy, focusing solely on the "existence" sub-task, where training and test sets are random, uncorrelated samples of positive and negative directed edges. Even in the works that recognize the aforementioned trichotomy, models fail to perform well across all three sub-tasks. In this study, we experimentally demonstrate that training Neural DLP (NDLP) models only on the existence sub-task, using methods adapted from Neural Undirected Link Prediction, results in parameter configurations that fail to capture directionality and bidirectionality, even after rebalancing edge classes. To address this, we propose three strategies that handle the three tasks simultaneously. Our first strategy, the Multi-Class Framework for Neural Directed Link Prediction (MC-NDLP) maps NDLP to a Multi-Class training objective. The second and third approaches adopt a Multi-Task perspective, either with a Multi-Objective (MO-DLP) or a Scalarized (S-DLP) strategy. Our results show that these methods outperform traditional approaches across multiple datasets and models, achieving equivalent or superior performance in addressing the three DLP sub-tasks.
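
A hedged sketch of the multi-class reformulation idea: each ordered node pair receives one of four labels covering existence, direction, and bidirectionality, so a single classifier must handle all three DLP sub-tasks at once. The exact class design and training objective of MC-NDLP may differ; `PairClass` and `pair_label` are illustrative names.

```python
# Hedged sketch of a multi-class labelling for directed link prediction:
# each ordered pair (u, v) is assigned one of four classes, so a single
# classifier must capture existence, direction and bidirectionality at once.
from enum import IntEnum

class PairClass(IntEnum):
    NO_EDGE = 0
    FORWARD = 1        # u -> v only
    BACKWARD = 2       # v -> u only
    BIDIRECTIONAL = 3  # u -> v and v -> u

def pair_label(edges: set, u, v) -> PairClass:
    fwd, bwd = (u, v) in edges, (v, u) in edges
    if fwd and bwd:
        return PairClass.BIDIRECTIONAL
    if fwd:
        return PairClass.FORWARD
    if bwd:
        return PairClass.BACKWARD
    return PairClass.NO_EDGE

edges = {(0, 1), (1, 0), (1, 2)}
print(pair_label(edges, 0, 1), pair_label(edges, 1, 2), pair_label(edges, 2, 0))
```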

15 pages, 2 figures
Community-Centric Graph Unlearning 2024-12-14
Show

Graph unlearning technology has become increasingly important since the advent of the 'right to be forgotten' and the growing concerns about the privacy and security of artificial intelligence. Graph unlearning aims to quickly eliminate the effects of specific data on graph neural networks (GNNs). However, most existing deterministic graph unlearning frameworks follow a balanced partition-submodel training-aggregation paradigm, resulting in a lack of structural information between subgraph neighborhoods and redundant unlearning parameter calculations. To address this issue, we propose a novel Graph Structure Mapping Unlearning paradigm (GSMU) and a novel method based on it named Community-centric Graph Eraser (CGE). CGE maps community subgraphs to nodes, thereby enabling the reconstruction of a node-level unlearning operation within a reduced mapped graph. CGE achieves an exponential reduction in both the amount of training data and the number of unlearning parameters. Extensive experiments conducted on five real-world datasets and three widely used GNN backbones have verified the high performance and efficiency of our CGE method, highlighting its potential in the field of graph unlearning.

Improving Graph Neural Networks via Adversarial Robustness Evaluation 2024-12-14
Show

Graph Neural Networks (GNNs) are currently one of the most powerful types of neural network architectures. Their advantage lies in the ability to leverage both the graph topology, which represents the relationships between samples, and the features of the samples themselves. However, the given graph topology often contains noisy edges, and GNNs are vulnerable to noise in the graph structure. This issue remains unresolved. In this paper, we propose using adversarial robustness evaluation to select a small subset of robust nodes that are less affected by noise. We then only feed the features of these robust nodes, along with the KNN graph constructed from these nodes, into the GNN for classification. Additionally, we compute the centroids for each class. For the remaining non-robust nodes, we assign them to the class whose centroid is closest to them. Experimental results show that this method significantly improves the accuracy of GNNs.
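
A minimal sketch of the pipeline described above, with two loudly flagged simplifications: stability under random feature noise stands in for the paper's adversarial robustness evaluation, and a placeholder classifier stands in for the trained GNN. Robust nodes define class centroids, and the remaining nodes are assigned to the nearest centroid:

```python
# Hedged sketch of the pipeline above: keep nodes whose predictions are stable
# under small feature perturbations (a crude stand-in for the paper's
# adversarial robustness evaluation), then assign the remaining nodes to the
# nearest class centroid computed from the robust set.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))                    # node features

def predict(features):                            # placeholder for the trained GNN
    return np.argmax(features[:, :3], axis=1)     # 3 classes

base_pred = predict(X)
stable = np.ones(len(X), dtype=bool)
for _ in range(5):                                # stability under random perturbations
    stable &= predict(X + 0.1 * rng.normal(size=X.shape)) == base_pred

robust_idx = np.where(stable)[0]
centroids = np.stack([X[robust_idx][base_pred[robust_idx] == c].mean(axis=0)
                      for c in range(3)])

rest = np.where(~stable)[0]
nearest = np.argmin(np.linalg.norm(X[rest, None, :] - centroids[None], axis=-1), axis=1)
print(len(robust_idx), "robust nodes kept;", len(rest), "nodes assigned to nearest centroid")
```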

Scaling Up Graph Propagation Computation on Large Graphs: A Local Chebyshev Approximation Approach 2024-12-14
Show

Graph propagation (GP) computation plays a crucial role in graph data analysis, supporting various applications such as graph node similarity queries, graph node ranking, graph clustering, and graph neural networks. Existing methods, mainly relying on power iteration or push computation frameworks, often face challenges with slow convergence rates when applied to large-scale graphs. To address this issue, we propose a novel and powerful approach that accelerates power iteration and push methods using Chebyshev polynomials. Specifically, we first present a novel Chebyshev expansion formula for general GP functions, offering a new perspective on GP computation and achieving accelerated convergence. Building on these theoretical insights, we develop a novel Chebyshev power iteration method and a novel Chebyshev push method. Our Chebyshev power iteration method demonstrates an approximate acceleration of $O(\sqrt{N})$ compared to existing power iteration techniques for both personalized PageRank and heat kernel PageRank computations, which are well-studied GP problems. For the Chebyshev push method, we propose an innovative subset Chebyshev recurrence technique, enabling the design of a push-style local algorithm with provable error guarantees and reduced time complexity compared to existing push methods. We conduct extensive experiments using 5 large real-world datasets to evaluate our proposed algorithms, demonstrating their superior efficiency compared to state-of-the-art approaches.
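
For context, the baseline such methods accelerate is plain power iteration. The hedged sketch below computes personalized PageRank this way; the paper's Chebyshev recurrences replace this loop to achieve the approximate $O(\sqrt{N})$ acceleration quoted above. The toy graph and parameters are illustrative.

```python
# Baseline the paper accelerates: plain power iteration for personalised
# PageRank, pi = alpha * e_s + (1 - alpha) * P^T pi.  The paper's Chebyshev
# methods replace this loop with a polynomial recurrence for faster convergence.
import numpy as np

def ppr_power_iteration(A: np.ndarray, source: int, alpha: float = 0.15, iters: int = 100):
    deg = A.sum(axis=1)
    P = A / deg[:, None]                      # row-stochastic transition matrix
    e = np.zeros(A.shape[0]); e[source] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = alpha * e + (1 - alpha) * (P.T @ pi)
    return pi

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
print(ppr_power_iteration(A, source=0).round(3))
```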

15 pages
Automated Molecular Concept Generation and Labeling with Large Language Models 2024-12-14
Show

Artificial intelligence (AI) is transforming scientific research, with explainable AI methods like concept-based models (CMs) showing promise for new discoveries. However, in molecular science, CMs are less common than black-box models like Graph Neural Networks (GNNs), due to their need for predefined concepts and manual labeling. This paper introduces the Automated Molecular Concept (AutoMolCo) framework, which leverages Large Language Models (LLMs) to automatically generate and label predictive molecular concepts. Through iterative concept refinement, AutoMolCo enables simple linear models to outperform GNNs and LLM in-context learning on several benchmarks. The framework operates without human knowledge input, overcoming limitations of existing CMs while maintaining explainability and allowing easy intervention. Experiments on MoleculeNet and High-Throughput Experimentation (HTE) datasets demonstrate that AutoMolCo-induced explainable CMs are beneficial for molecular science research.

Pretrained Event Classification Model for High Energy Physics Analysis 2024-12-14
Show

We introduce a foundation model for event classification in high-energy physics, built on a Graph Neural Network architecture and trained on 120 million simulated proton-proton collision events spanning 12 distinct physics processes. The model is pretrained to learn a general and robust representation of collision data using challenging multiclass and multilabel classification tasks. Its performance is evaluated across five event classification tasks, which include both physics processes used during pretraining and new processes not encountered during pretraining. Fine-tuning the pretrained model significantly improves classification performance, particularly in scenarios with limited training data, demonstrating gains in both accuracy and computational efficiency. To investigate the underlying mechanisms behind these performance improvements, we employ a representational similarity evaluation framework based on Centered Kernel Alignment. This analysis reveals notable differences in the learned representations of fine-tuned pretrained models compared to baseline models trained from scratch.
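
Centered Kernel Alignment, the representational-similarity measure mentioned above, has a simple linear form: $\mathrm{CKA}(X, Y) = \lVert Y^\top X \rVert_F^2 / (\lVert X^\top X \rVert_F \lVert Y^\top Y \rVert_F)$ for column-centered activation matrices sharing the same examples on the rows. A minimal sketch with random placeholder activations:

```python
# Hedged sketch of linear Centered Kernel Alignment (CKA), the representational
# similarity measure the abstract refers to, applied to two activation matrices
# with the same examples on the rows (activations here are random placeholders).
import numpy as np

def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    X = X - X.mean(axis=0)                    # centre features per dimension
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(hsic / (norm_x * norm_y))

acts_pretrained = np.random.randn(512, 128)   # activations of the fine-tuned model
acts_scratch = np.random.randn(512, 128)      # activations of the from-scratch model
print(linear_cka(acts_pretrained, acts_pretrained))  # identical reps -> 1.0
print(round(linear_cka(acts_pretrained, acts_scratch), 3))
```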

9 pages, 1 figure
WaveGNN: Modeling Irregular Multivariate Time Series for Accurate Predictions 2024-12-14
Show

Accurately modeling and analyzing time series data is crucial for downstream applications across various fields, including healthcare, finance, astronomy, and epidemiology. However, real-world time series often exhibit irregularities such as misaligned timestamps, missing entries, and variable sampling rates, complicating their analysis. Existing approaches often rely on imputation, which can introduce biases. A few approaches that directly model irregularity tend to focus exclusively on either capturing intra-series patterns or inter-series relationships, missing the benefits of integrating both. To this end, we present WaveGNN, a novel framework designed to directly (i.e., no imputation) embed irregularly sampled multivariate time series data for accurate predictions. WaveGNN utilizes a Transformer-based encoder to capture intra-series patterns by directly encoding the temporal dynamics of each time series. To capture inter-series relationships, WaveGNN uses a dynamic graph neural network model, where each node represents a sensor, and the edges capture the long- and short-term relationships between them. Our experimental results on real-world healthcare datasets demonstrate that WaveGNN consistently outperforms existing state-of-the-art methods, with an average relative improvement of 14.7% in F1-score when compared to the second-best baseline in cases with extreme sparsity. Our ablation studies reveal that both intra-series and inter-series modeling significantly contribute to this notable improvement.

A Novel Framework Using Deep Reinforcement Learning for Join Order Selection 2024-12-13
Show

Join order selection is a sub-field of query optimization that aims to find the optimal join order for an SQL query with the minimum cost. The challenge lies in the exponentially growing search space as the number of tables increases, making exhaustive enumeration impractical. Traditional optimizers use static heuristics to prune the search space, but they often fail to adapt to changes or improve based on feedback from the DBMS. Recent research addresses these limitations with Deep Reinforcement Learning (DRL), allowing models to use feedback to dynamically search for better join orders and enhance performance over time. Existing research primarily focuses on capturing join order sequences and their representations at various levels, with limited comparative analysis of reinforcement learning methods. In this paper, we propose GTDD, a novel framework that integrates Graph Neural Networks (GNN), Tree-structured Long Short-Term Memory (Tree-LSTM), and Dueling DQN. We conduct a series of experiments that demonstrate a clear advantage of GTDD over state-of-the-art techniques.

Keep It Simple: Towards Accurate Vulnerability Detection for Large Code Graphs 2024-12-13
Show

Software vulnerability detection is crucial for high-quality software development. Recently, some studies utilizing Graph Neural Networks (GNNs) to learn the graph representation of code in vulnerability detection tasks have achieved remarkable success. However, existing graph-based approaches mainly face two limitations that prevent them from generalizing well to large code graphs: (1) the interference of noise information in the code graph; (2) the difficulty in capturing long-distance dependencies within the graph. To mitigate these problems, we propose a novel vulnerability detection method, ANGLE, whose novelty lies mainly in its hierarchical graph refinement and context-aware graph representation learning. The former hierarchically filters redundant information in the code graph, thereby reducing the size of the graph, while the latter collaboratively employs the Graph Transformer and GNN to learn code graph representations from both the global and local perspectives, thus capturing long-distance dependencies. Extensive experiments demonstrate promising results on three widely used benchmark datasets: our method significantly outperforms several other baselines in terms of the accuracy and F1 score. Particularly, in large code graphs, ANGLE achieves an improvement in accuracy of 34.27%-161.93% compared to the state-of-the-art method, AMPLE. Such results demonstrate the effectiveness of ANGLE in vulnerability detection tasks.

17 pages
Can LLMs Convert Graphs to Text-Attributed Graphs? 2024-12-13
Show

Graphs are ubiquitous data structures found in numerous real-world applications, such as drug discovery, recommender systems, and social network analysis. Graph neural networks (GNNs) have become a popular tool to learn node embeddings through message passing on these structures. However, a significant challenge arises when applying GNNs to multiple graphs with different feature spaces, as existing GNN architectures are not designed for cross-graph feature alignment. To address this, recent approaches introduce text-attributed graphs, where each node is associated with a textual description, enabling the use of a shared textual encoder to project nodes from different graphs into a unified feature space. While promising, this method relies heavily on the availability of text-attributed data, which can be difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), which leverages large language models (LLMs) to automatically convert existing graphs into text-attributed graphs. The key idea is to integrate topological information with each node's properties, enhancing the LLMs' ability to explain how graph topology influences node semantics. We evaluate our TANS on text-rich, text-limited, and text-free graphs, demonstrating that it enables a single GNN to operate across diverse graphs. Notably, on text-free graphs, our method significantly outperforms existing approaches that manually design node features, showcasing the potential of LLMs for preprocessing graph-structured data, even in the absence of textual information. The code and data are available at https://github.com/Zehong-Wang/TANS.

A Hybrid Real-Time Framework for Efficient Fussell-Vesely Importance Evaluation Using Virtual Fault Trees and Graph Neural Networks 2024-12-13
Show

The Fussell-Vesely Importance (FV) reflects the potential impact of a basic event on system failure, and is crucial for ensuring system reliability. However, traditional methods for calculating FV importance are complex and time-consuming, requiring the construction of fault trees and the calculation of minimal cut sets. To address these limitations, this study proposes a hybrid real-time framework to evaluate the FV importance of basic events. Our framework combines expert knowledge with a data-driven model. First, we use Interpretive Structural Modeling (ISM) to build a virtual fault tree that captures the relationships between basic events. Unlike traditional fault trees, which include intermediate events, our virtual fault tree consists solely of basic events, reducing its complexity and space requirements. Additionally, our virtual fault tree considers the dependencies between basic events rather than assuming their independence, as is typically done in traditional fault trees. We then feed both the event relationships and relevant data into a graph neural network (GNN). This approach enables a rapid, data-driven calculation of FV importance, significantly reducing processing time and quickly identifying critical events, thus providing robust decision support for risk control. Results demonstrate that our model performs well in terms of MSE, RMSE, MAE, and R2, reducing computational energy consumption and offering real-time, risk-informed decision support for complex systems.
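
For reference, the classical quantity the framework approximates can be written, under the rare-event approximation, as $\mathrm{FV}_i \approx \sum_{\text{cut sets containing } i} P(\text{cut set}) \, / \, P(\text{system failure})$. A minimal sketch with a toy fault tree; the event probabilities and cut sets are made up purely for illustration.

```python
# Hedged sketch of the classical Fussell-Vesely importance under the rare-event
# approximation: FV_i ~ (sum of probabilities of minimal cut sets containing
# event i) / (system failure probability).  The paper's GNN approximates this
# quantity directly from the event graph instead of enumerating cut sets.
from math import prod

basic_event_prob = {"A": 0.01, "B": 0.02, "C": 0.05}
minimal_cut_sets = [{"A", "B"}, {"C"}, {"A", "C"}]

def cut_set_prob(cs):
    return prod(basic_event_prob[e] for e in cs)

system_failure = sum(cut_set_prob(cs) for cs in minimal_cut_sets)

fv = {
    event: sum(cut_set_prob(cs) for cs in minimal_cut_sets if event in cs) / system_failure
    for event in basic_event_prob
}
print(fv)   # 'C' dominates because it appears in the most likely cut sets
```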

KAGNNs: Kolmogorov-Arnold Networks meet Graph Learning 2024-12-13
Show

In recent years, Graph Neural Networks (GNNs) have become the de facto tool for learning node and graph representations. Most GNNs typically consist of a sequence of neighborhood aggregation (a.k.a., message-passing) layers, within which the representation of each node is updated based on those of its neighbors. The most expressive message-passing GNNs can be obtained through the use of the sum aggregator and of MLPs for feature transformation, thanks to their universal approximation capabilities. However, the limitations of MLPs recently motivated the introduction of another family of universal approximators, called Kolmogorov-Arnold Networks (KANs) which rely on a different representation theorem. In this work, we compare the performance of KANs against that of MLPs on graph learning tasks. We evaluate two different implementations of KANs using two distinct base families of functions, namely B-splines and radial basis functions. We perform extensive experiments on node classification, graph classification and graph regression datasets. Our results indicate that KANs are on-par with or better than MLPs on all studied tasks, making them viable alternatives, at the cost of some computational complexity. Code is available at https://github.com/RomanBresson/KAGNN.

Efficient Large-Scale Traffic Forecasting with Transformers: A Spatial Data Management Perspective 2024-12-13
Show

Road traffic forecasting is crucial in real-world intelligent transportation scenarios like traffic dispatching and path planning in city management and personal traveling. Spatio-temporal graph neural networks (STGNNs) stand out as the mainstream solution in this task. Nevertheless, the quadratic complexity of remarkable dynamic spatial modeling-based STGNNs has become the bottleneck over large-scale traffic data. From the spatial data management perspective, we present a novel Transformer framework called PatchSTG to efficiently and dynamically model spatial dependencies for large-scale traffic forecasting with interpretability and fidelity. Specifically, we design a novel irregular spatial patching to reduce the number of points involved in the dynamic calculation of Transformer. The irregular spatial patching first utilizes the leaf K-dimensional tree (KDTree) to recursively partition irregularly distributed traffic points into leaf nodes with a small capacity, and then merges leaf nodes belonging to the same subtree into occupancy-equaled and non-overlapped patches through padding and backtracking. Based on the patched data, depth and breadth attention are used interchangeably in the encoder to dynamically learn local and global spatial knowledge from points in a patch and points with the same index of patches. Experimental results on four real world large-scale traffic datasets show that our PatchSTG achieves train speed and memory utilization improvements up to $10\times$ and $4\times$ with the state-of-the-art performance.

Accep...

Accepted by SIGKDD 2025

GraSP: Simple yet Effective Graph Similarity Predictions 2024-12-13
Show

Graph similarity computation (GSC) is to calculate the similarity between one pair of graphs, which is a fundamental problem with fruitful applications in the graph community. In GSC, graph edit distance (GED) and maximum common subgraph (MCS) are two important similarity metrics, both of which are NP-hard to compute. Instead of calculating the exact values, recent solutions resort to leveraging graph neural networks (GNNs) to learn data-driven models for the estimation of GED and MCS. Most of them are built on components involving node-level interactions crossing graphs, which engender vast computation overhead but are of little avail in effectiveness. In the paper, we present GraSP, a simple yet effective GSC approach for GED and MCS prediction. GraSP achieves high result efficacy through several key instruments: enhanced node features via positional encoding and a GNN model augmented by a gating mechanism, residual connections, as well as multi-scale pooling. Theoretically, GraSP can surpass the 1-WL test, indicating its high expressiveness. Empirically, extensive experiments comparing GraSP against 10 competitors on multiple widely adopted benchmark datasets showcase the superiority of GraSP over prior arts in terms of both effectiveness and efficiency. The code is available at https://github.com/HaoranZ99/GraSP.

Accep...

Accepted by AAAI2025. 13 pages, 14 figures. The code is available at https://github.com/HaoranZ99/GraSP

Towards Fair Graph Neural Networks via Graph Counterfactual without Sensitive Attributes 2024-12-13
Show

Graph-structured data is ubiquitous in today's connected world, driving extensive research in graph analysis. Graph Neural Networks (GNNs) have shown great success in this field, leading to growing interest in developing fair GNNs for critical applications. However, most existing fair GNNs focus on statistical fairness notions, which may be insufficient when dealing with statistical anomalies. Hence, motivated by causal theory, there has been growing attention to mitigating root causes of unfairness utilizing graph counterfactuals. Unfortunately, existing methods for generating graph counterfactuals invariably require the sensitive attribute. Nevertheless, in many real-world applications, it is usually infeasible to obtain sensitive attributes due to privacy or legal issues, which challenge existing methods. In this paper, we propose a framework named Fairwos (improving Fairness without sensitive attributes). In particular, we first propose a mechanism to generate pseudo-sensitive attributes to remedy the problem of missing sensitive attributes, and then design a strategy for finding graph counterfactuals from the real dataset. To train fair GNNs, we propose a method to ensure that the embeddings from the original data are consistent with those from the graph counterfactuals, and dynamically adjust the weight of each pseudo-sensitive attribute to balance its contribution to fairness and utility. Furthermore, we theoretically demonstrate that minimizing the relation between these pseudo-sensitive attributes and the prediction can enable the fairness of GNNs. Experimental results on six real-world datasets show that our approach outperforms state-of-the-art methods in balancing utility and fairness.

ICDE 2025
Bootstrapping Heterogeneous Graph Representation Learning via Large Language Models: A Generalized Approach 2024-12-13
Show

Graph representation learning methods are highly effective in handling complex non-Euclidean data by capturing intricate relationships and features within graph structures. However, traditional methods face challenges when dealing with heterogeneous graphs that contain various types of nodes and edges due to the diverse sources and complex nature of the data. Existing Heterogeneous Graph Neural Networks (HGNNs) have shown promising results but require prior knowledge of node and edge types and unified node feature formats, which limits their applicability. Recent advancements in graph representation learning using Large Language Models (LLMs) offer new solutions by integrating LLMs' data processing capabilities, enabling the alignment of various graph representations. Nevertheless, these methods often overlook heterogeneous graph data and require extensive preprocessing. To address these limitations, we propose a novel method that leverages the strengths of both LLM and GNN, allowing for the processing of graph data with any format and type of nodes and edges without the need for type information or special preprocessing. Our method employs LLM to automatically summarize and classify different data formats and types, aligns node features, and uses a specialized GNN for targeted learning, thus obtaining effective graph representations for downstream tasks. Theoretical analysis and experimental validation have demonstrated the effectiveness of our method.

Accep...

Accepted by AAAI 2025

Brain-inspired Chaotic Graph Backpropagation for Large-scale Combinatorial Optimization 2024-12-13
Show

Graph neural networks (GNNs) with unsupervised learning can solve large-scale combinatorial optimization problems (COPs) with efficient time complexity, making them versatile for various applications. However, since this method maps the combinatorial optimization problem to the training process of a graph neural network, and the current mainstream backpropagation-based training algorithms are prone to fall into local minima, the optimization performance is still inferior to the current state-of-the-art (SOTA) COP methods. To address this issue, inspired by possibly chaotic dynamics of real brain learning, we introduce a chaotic training algorithm, i.e. chaotic graph backpropagation (CGBP), which introduces a local loss function in GNN that makes the training process not only chaotic but also highly efficient. Different from existing methods, we show that the global ergodicity and pseudo-randomness of such chaotic dynamics enable CGBP to learn each optimal GNN effectively and globally, thus solving the COP efficiently. We have applied CGBP to solve various COPs, such as the maximum independent set, maximum cut, and graph coloring. Results on several large-scale benchmark datasets showcase that CGBP can outperform not only existing GNN algorithms but also SOTA methods. In addition to solving large-scale COPs, CGBP as a universal learning algorithm for GNNs, i.e. as a plug-in unit, can be easily integrated into any existing method for improving the performance.

Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks 2024-12-13
Show

While graph neural networks (GNNs) are widely used for node and graph representation learning tasks, the reliability of GNN uncertainty estimates under distribution shifts remains relatively under-explored. Indeed, while post-hoc calibration strategies can be used to improve in-distribution calibration, they need not also improve calibration under distribution shift. However, techniques which produce GNNs with better intrinsic uncertainty estimates are particularly valuable, as they can always be combined with post-hoc strategies later. Therefore, in this work, we propose G-$\Delta$UQ, a novel training framework designed to improve intrinsic GNN uncertainty estimates. Our framework adapts the principle of stochastic data centering to graph data through novel graph anchoring strategies, and is able to support partially stochastic GNNs. While the prevalent wisdom is that fully stochastic networks are necessary to obtain reliable estimates, we find that the functional diversity induced by our anchoring strategies when sampling hypotheses renders this unnecessary and allows us to support G-$\Delta$UQ on pretrained models. Indeed, through extensive evaluation under covariate, concept and graph size shifts, we show that G-$\Delta$UQ leads to better calibrated GNNs for node and graph classification. Further, it also improves performance on the uncertainty-based tasks of out-of-distribution detection and generalization gap estimation. Overall, our work provides insights into uncertainty estimation for GNNs, and demonstrates the utility of G-$\Delta$UQ in obtaining reliable estimates.

Published at ICLR 2024; Project page: https://pujacomputes.github.io/gduq/
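
A minimal sketch of the stochastic-centering idea behind the anchoring described in the abstract above, assuming one simple anchoring strategy (a shuffled copy of the node features serves as the anchor); the paper's graph anchoring strategies may differ. Predictions are averaged over anchors and their variance is used as an epistemic-uncertainty proxy.

```python
# Stochastic anchoring sketch: the model sees [x - anchor, anchor] and is
# evaluated under several anchors; the spread of predictions gives uncertainty.
import torch
import torch.nn as nn

class AnchoredGNN(nn.Module):
    def __init__(self, in_dim, hidden, n_classes):
        super().__init__()
        # Input is [x - anchor, anchor], hence 2 * in_dim features per node.
        self.lin1 = nn.Linear(2 * in_dim, hidden)
        self.lin2 = nn.Linear(hidden, n_classes)

    def forward(self, x, anchor, adj_norm):
        h = torch.cat([x - anchor, anchor], dim=-1)   # stochastic centering
        h = torch.relu(adj_norm @ self.lin1(h))       # one propagation step
        return self.lin2(adj_norm @ h)

def predict_with_uncertainty(model, x, adj_norm, n_anchors=10):
    probs = []
    for _ in range(n_anchors):
        anchor = x[torch.randperm(x.size(0))]         # assumed anchor distribution
        probs.append(torch.softmax(model(x, anchor, adj_norm), dim=-1))
    probs = torch.stack(probs)
    return probs.mean(0), probs.var(0).sum(-1)        # mean prediction, uncertainty proxy

# Tiny random example.
n, d = 8, 4
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.3).float()
adj = ((adj + adj.t()) > 0).float()
adj_norm = (adj + torch.eye(n)) / (adj.sum(1, keepdim=True) + 1.0)
model = AnchoredGNN(d, 16, 3)
mean_probs, unc = predict_with_uncertainty(model, x, adj_norm)
print(mean_probs.argmax(-1).tolist(), unc.tolist())
```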

Universal Inceptive GNNs by Eliminating the Smoothness-generalization Dilemma 2024-12-13
Show

Graph Neural Networks (GNNs) have demonstrated remarkable success in various domains, such as transaction and social networks. However, their application is often hindered by the varying homophily levels across different orders of neighboring nodes, necessitating separate model designs for homophilic and heterophilic graphs. In this paper, we aim to develop a unified framework capable of handling neighborhoods of various orders and homophily levels. Through theoretical exploration, we identify a previously overlooked architectural aspect in multi-hop learning: the cascade dependency, which leads to a smoothness-generalization dilemma. This dilemma significantly affects the learning process, especially in the context of high-order neighborhoods and heterophilic graphs. To resolve this issue, we propose an Inceptive Graph Neural Network (IGNN), a universal message-passing framework that replaces the cascade dependency with an inceptive architecture. IGNN provides independent representations for each hop, allowing personalized generalization capabilities, and captures neighborhood-wise relationships to select appropriate receptive fields. Extensive experiments show that our IGNN outperforms 23 baseline methods, demonstrating superior performance on both homophilic and heterophilic graphs, while also scaling efficiently to large graphs.

12 pages
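
A sketch of one way to realize the "independent representation per hop" idea in the abstract above: propagated features for each hop are transformed by their own layer and combined with learned hop weights, so no nonlinear transformation cascades across hops. This reflects a reading of the abstract, not the authors' exact IGNN design.

```python
# Inceptive (non-cascaded) multi-hop layer sketch: hop-k features are A^k x,
# each hop is transformed independently, and hops are mixed by learned weights
# that act as a soft receptive-field selector.
import torch
import torch.nn as nn

class InceptiveLayer(nn.Module):
    def __init__(self, in_dim, out_dim, n_hops=3):
        super().__init__()
        self.hop_mlps = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(n_hops + 1)])
        self.hop_logits = nn.Parameter(torch.zeros(n_hops + 1))  # receptive-field weights

    def forward(self, x, adj_norm):
        # Propagated inputs per hop: x, Ax, A^2 x, ... (no cascaded nonlinearity).
        hop_feats, h = [x], x
        for _ in range(len(self.hop_mlps) - 1):
            h = adj_norm @ h
            hop_feats.append(h)
        weights = torch.softmax(self.hop_logits, dim=0)
        return sum(w * torch.relu(mlp(f))
                   for w, mlp, f in zip(weights, self.hop_mlps, hop_feats))

n, d = 10, 5
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.3).float()
adj = ((adj + adj.t()) > 0).float()
adj_norm = (adj + torch.eye(n)) / (adj.sum(1, keepdim=True) + 1.0)
print(InceptiveLayer(d, 8)(x, adj_norm).shape)  # torch.Size([10, 8])
```
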
TransferLight: Zero-Shot Traffic Signal Control on any Road-Network 2024-12-12
Show

Traffic signal control plays a crucial role in urban mobility. However, existing methods often struggle to generalize beyond their training environments to unseen scenarios with varying traffic dynamics. We present TransferLight, a novel framework designed for robust generalization across road-networks, diverse traffic conditions and intersection geometries. At its core, we propose a log-distance reward function, offering spatially-aware signal prioritization while remaining adaptable to varied lane configurations - overcoming the limitations of traditional pressure-based rewards. Our hierarchical, heterogeneous, and directed graph neural network architecture effectively captures granular traffic dynamics, enabling transferability to arbitrary intersection layouts. Using a decentralized multi-agent approach, global rewards, and novel state transition priors, we develop a single, weight-tied policy that scales zero-shot to any road network without re-training. Through domain randomization during training, we additionally enhance generalization capabilities. Experimental results validate TransferLight's superior performance in unseen scenarios, advancing practical, generalizable intelligent transportation systems to meet evolving urban traffic demands.

AAAI Workshop Paper (MALTA)
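
One plausible reading of the log-distance reward mentioned in the abstract above, shown as a sketch: each halted vehicle contributes a penalty that decays with the log of its distance to the stop line, so queues near the intersection dominate. The exact functional form used by TransferLight is an assumption here.

```python
# Assumed log-distance reward shape: halted vehicles close to the stop line are
# weighted more heavily than far-away ones, giving spatially aware pressure.
import math

def log_distance_reward(halted_vehicle_distances_m, eps=1.0):
    """Negative sum of log-distance-decayed weights over halted vehicles."""
    return -sum(1.0 / (eps + math.log1p(d)) for d in halted_vehicle_distances_m)

# Two hypothetical outcomes with the same queue length: one queued at the stop
# line, one queued far upstream.  The near queue is penalized more strongly.
queue_at_stop_line = [2.0, 5.0, 9.0]      # metres to the intersection
queue_far_upstream = [40.0, 55.0, 80.0]
print(log_distance_reward(queue_at_stop_line))   # more negative (worse)
print(log_distance_reward(queue_far_upstream))   # less negative (better)
```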

Leveraging Medical Foundation Model Features in Graph Neural Network-Based Retrieval of Breast Histopathology Images 2024-12-12
Show

Breast cancer is the most common cancer type in women worldwide. Early detection and appropriate treatment can significantly reduce its impact. While histopathology examinations play a vital role in rapid and accurate diagnosis, they often require experienced medical experts for proper recognition and cancer grading. Automated image retrieval systems have the potential to assist pathologists in identifying cancerous tissues, thereby accelerating the diagnostic process. Nevertheless, proposing an accurate image retrieval model is challenging due to considerable variability among the tissue and cell patterns in histological images. In this work, we leverage the features from foundation models in a novel attention-based adversarially regularized variational graph autoencoder model for breast histological image retrieval. Our results confirm the superior performance of models trained with foundation model features compared to those using pre-trained convolutional neural networks (up to 7.7% and 15.5% for mAP and mMV, respectively), with the pre-trained general-purpose self-supervised model for computational pathology (UNI) delivering the best overall performance. By evaluating two publicly available histology image datasets of breast cancer, our top-performing model, trained with UNI features, achieved average mAP/mMV scores of 96.7%/91.5% and 97.6%/94.2% for the BreakHis and BACH datasets, respectively. Our proposed retrieval model has the potential to be used in clinical settings to enhance diagnostic performance and ultimately benefit patients.

29 pages
MGM: Global Understanding of Audience Overlap Graphs for Predicting the Factuality and the Bias of News Media 2024-12-12
Show

In the current era of rapidly growing digital data, evaluating the political bias and factuality of news outlets has become more important for seeking reliable information online. In this work, we study the classification problem of profiling news media through the lens of political bias and factuality. Traditional profiling methods, such as Pre-trained Language Models (PLMs) and Graph Neural Networks (GNNs), have shown promising results, but they face notable challenges. PLMs focus solely on textual features, causing them to overlook the complex relationships between entities, while GNNs often struggle with media graphs containing disconnected components and insufficient labels. To address these limitations, we propose MediaGraphMind (MGM), an effective solution within a variational Expectation-Maximization (EM) framework. Instead of relying on limited neighboring nodes, MGM leverages features, structural patterns, and label information from globally similar nodes. Such a framework not only enables GNNs to capture long-range dependencies for learning expressive node representations but also enhances PLMs by integrating structural information, thereby improving the performance of both models. Extensive experiments demonstrate the effectiveness of the proposed framework and achieve new state-of-the-art results. Further, we share our repository, which contains the dataset, code, and documentation.

Opinion de-polarization of social networks with GNNs 2024-12-12
Show

Nowadays, social media is the ground for political debate and the exchange of opinions. A significant amount of research suggests that social media are highly polarized. A commonly observed phenomenon is the echo chamber structure, where users are organized in polarized communities and form connections only with like-minded individuals, limiting themselves to consuming specific content. In this paper we explore a way to decrease the polarization of networks with two echo chambers. In particular, we observe that if some users adopt a moderate opinion about a topic, the polarization of the network decreases. Based on this observation, we propose an efficient algorithm to identify a good set of $K$ users such that, if they adopt a moderate stance around a topic, the polarization is minimized. Our algorithm employs a Graph Neural Network and thus can handle large graphs more effectively than other approaches.
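
A brute-force greedy baseline, shown only to make the selection problem concrete; it is not the paper's GNN-based algorithm. It assumes Friedkin-Johnsen-style averaging dynamics, measures polarization as the variance of equilibrium opinions, and clamps moderated users' innate opinions to zero.

```python
# Greedy selection of K users to moderate, under assumed opinion dynamics and an
# assumed variance-based polarization score (NOT the paper's method).
import numpy as np

def equilibrium_opinions(adj, innate, fixed, alpha=0.5, iters=200):
    # Friedkin-Johnsen-style update: mix one's innate opinion with the neighbor
    # average; moderated users have their innate opinion clamped to 0.
    s = np.where(fixed, 0.0, innate)
    op = s.copy()
    deg = adj.sum(1) + 1e-9
    for _ in range(iters):
        op = alpha * s + (1 - alpha) * (adj @ op) / deg
    return op

def greedy_moderators(adj, innate, k):
    fixed = np.zeros(len(innate), dtype=bool)
    chosen = []
    for _ in range(k):
        scores = []
        for u in np.flatnonzero(~fixed):
            trial = fixed.copy()
            trial[u] = True
            scores.append((np.var(equilibrium_opinions(adj, innate, trial)), u))
        best = min(scores)[1]          # user whose moderation reduces polarization most
        fixed[best] = True
        chosen.append(best)
    return chosen

# Two weakly connected cliques with opposite opinions (a toy echo-chamber pair).
n = 10
adj = np.zeros((n, n))
adj[:5, :5] = 1
adj[5:, 5:] = 1
np.fill_diagonal(adj, 0)
adj[4, 5] = adj[5, 4] = 1
innate = np.array([1.0] * 5 + [-1.0] * 5)
print(greedy_moderators(adj, innate, k=2))
```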

Hybrid variable spiking graph neural networks for energy-efficient scientific machine learning 2024-12-12
Show

Graph-based representations of samples from computational mechanics datasets can prove instrumental when dealing with problems such as irregular domains or the molecular structures of materials. To effectively analyze and process such datasets, deep learning offers Graph Neural Networks (GNNs) that utilize techniques like message passing within their architecture. The issue, however, is that as the individual graph scales and/or the GNN architecture becomes increasingly complex, the increased energy budget of the overall deep learning model makes it unsustainable and restricts its use in settings like edge computing. To overcome this, we propose in this paper Hybrid Variable Spiking Graph Neural Networks (HVS-GNNs) that utilize Variable Spiking Neurons (VSNs) within their architecture to promote sparse communication and hence reduce the overall energy budget. VSNs, while promoting sparse event-driven computations, also perform well for regression tasks, which are often encountered in computational mechanics applications and are the main target of this paper. Three examples dealing with the prediction of mechanical properties of materials based on microscale/mesoscale structures are shown to test the performance of the proposed HVS-GNNs in regression tasks. We have also compared the performance of HVS-GNN architectures with that of vanilla GNNs and GNNs utilizing leaky integrate-and-fire neurons. The results show that HVS-GNNs perform well for regression tasks, all while promoting sparse communication and, hence, energy efficiency.
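
A sketch of a leaky integrate-and-fire style spiking activation, the kind of sparse, event-driven mechanism spiking GNNs rely on to save energy; the Variable Spiking Neurons used in the paper differ in detail, so this only illustrates the sparsity idea.

```python
# LIF-style sparse activation: a unit accumulates incoming messages and emits an
# event only when its membrane potential crosses a threshold.
import numpy as np

def lif_spikes(inputs_over_time, leak=0.8, threshold=1.0):
    """inputs_over_time: array of shape (T, n_units); returns binary spike trains."""
    membrane = np.zeros(inputs_over_time.shape[1])
    spikes = []
    for x_t in inputs_over_time:
        membrane = leak * membrane + x_t           # leaky integration of messages
        fired = membrane >= threshold              # spike only on threshold crossing
        spikes.append(fired.astype(float))
        membrane = np.where(fired, 0.0, membrane)  # reset fired units
    return np.stack(spikes)

rng = np.random.default_rng(0)
msgs = rng.uniform(0, 0.6, size=(20, 5))           # e.g. aggregated GNN messages per step
out = lif_spikes(msgs)
print("fraction of events:", out.mean())           # sparse compared with dense activations
```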

Rumor Detection on Social Media with Temporal Propagation Structure Optimization 2024-12-12
Show

Traditional methods for detecting rumors on social media primarily focus on analyzing textual content, often struggling to capture the complexity of online interactions. Recent research has shifted towards leveraging graph neural networks to model the hierarchical conversation structure that emerges during rumor propagation. However, these methods tend to overlook the temporal aspect of rumor propagation and may disregard potential noise within the propagation structure. In this paper, we propose a novel approach that incorporates temporal information by constructing a weighted propagation tree, where the weight of each edge represents the time interval between connected posts. Drawing upon the theory of structural entropy, we transform this tree into a coding tree. This transformation aims to preserve the essential structure of rumor propagation while reducing noise. Finally, we introduce a recursive neural network to learn from the coding tree for rumor veracity prediction. Experimental results on two common datasets demonstrate the superiority of our approach.

COLING'25
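
A sketch of the time-weighted propagation tree described in the abstract above: nodes are posts, edges point from a post to its reply, and each edge weight is the time interval between the two; the post records below are hypothetical.

```python
# Building the weighted propagation tree: edge weight = time interval (seconds)
# between a post and the reply it triggered.
import networkx as nx

posts = [  # (post_id, parent_id, unix_timestamp) -- hypothetical records
    ("p0", None, 1_700_000_000),   # source claim
    ("p1", "p0", 1_700_000_120),
    ("p2", "p0", 1_700_000_600),
    ("p3", "p1", 1_700_000_660),
]

tree = nx.DiGraph()
timestamps = {pid: ts for pid, _, ts in posts}
for pid, parent, ts in posts:
    tree.add_node(pid, time=ts)
    if parent is not None:
        tree.add_edge(parent, pid, weight=ts - timestamps[parent])  # time interval

for u, v, data in tree.edges(data=True):
    print(f"{u} -> {v}: {data['weight']} s")
```
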
Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains 2024-12-12
Show

Heterogeneous Text-Attributed Graphs (HTAGs), where different types of entities are not only associated with texts but also connected by diverse relationships, have gained widespread popularity and application across various domains. However, current research on text-attributed graph learning predominantly focuses on homogeneous graphs, which feature a single node and edge type, thus leaving a gap in understanding how methods perform on HTAGs. One crucial reason is the lack of comprehensive HTAG datasets that offer original textual content and span multiple domains of varying sizes. To this end, we introduce a collection of challenging and diverse benchmark datasets for realistic and reproducible evaluation of machine learning models on HTAGs. Our HTAG datasets are multi-scale, span years in duration, and cover a wide range of domains, including movie, community question answering, academic, literature, and patent networks. We further conduct benchmark experiments on these datasets with various graph neural networks. All source data, dataset construction codes, processed HTAGs, data loaders, benchmark codes, and evaluation setup are publicly available at GitHub and Hugging Face.

Self-Explainable Graph Transformer for Link Sign Prediction 2024-12-12
Show

Signed Graph Neural Networks (SGNNs) have been shown to be effective in analyzing complex patterns in real-world situations where positive and negative links coexist. However, SGNN models suffer from poor explainability, which limits their adoption in critical scenarios that require understanding the rationale behind predictions. To the best of our knowledge, there is currently no research on the explainability of SGNN models. Our goal is to address the explainability of decision-making for the downstream task of link sign prediction specific to signed graph neural networks. Since post-hoc explanations are not derived directly from the models, they may be biased and misrepresent the true explanations. Therefore, in this paper we introduce a Self-Explainable Signed Graph transformer (SE-SGformer) framework, which outputs explainable information while ensuring high prediction accuracy. Specifically, we propose a new Transformer architecture for signed graphs and theoretically demonstrate that using positional encoding based on signed random walks has greater expressive power than current SGNN methods and other positional-encoding graph Transformer-based approaches. We construct a novel explainable decision process by discovering the $K$-nearest (farthest) positive (negative) neighbors of a node to replace the neural network-based decoder for predicting edge signs. These $K$ positive (negative) neighbors represent crucial information about the formation of positive (negative) edges between nodes and thus can serve as important explanatory information in the decision-making process. We conducted experiments on several real-world datasets to validate the effectiveness of SE-SGformer, which outperforms the state-of-the-art methods by improving prediction accuracy by 2.2% and explainability accuracy by 73.1% in the best-case scenario.

Accepted as a conference paper at AAAI 2025
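
One plausible instantiation of the neighbor-based decoder from the abstract above, stated as an assumption rather than the paper's exact rule: to predict the sign of edge (u, v), compare v's embedding distance to u's K nearest positive neighbors against its distance to u's K nearest negative neighbors, and return those neighbors as the explanation.

```python
# Assumed neighbor-based sign decoder: positive/negative neighbors of u act as
# reference sets, and whichever set v's embedding is closer to decides the sign.
import numpy as np

def predict_sign(u, v, emb, pos_nbrs, neg_nbrs, k=3):
    def k_nearest(nbrs):
        if not nbrs:
            return []
        ranked = sorted(nbrs, key=lambda w: np.linalg.norm(emb[w] - emb[u]))
        return ranked[:k]
    pos_ref = k_nearest(pos_nbrs.get(u, []))
    neg_ref = k_nearest(neg_nbrs.get(u, []))
    d_pos = np.mean([np.linalg.norm(emb[v] - emb[w]) for w in pos_ref]) if pos_ref else np.inf
    d_neg = np.mean([np.linalg.norm(emb[v] - emb[w]) for w in neg_ref]) if neg_ref else np.inf
    sign = +1 if d_pos <= d_neg else -1
    return sign, {"positive_evidence": pos_ref, "negative_evidence": neg_ref}

rng = np.random.default_rng(1)
emb = rng.normal(size=(6, 8))               # toy node embeddings
pos_nbrs = {0: [1, 2], 1: [0]}              # existing positive neighbors per node
neg_nbrs = {0: [3, 4], 3: [0]}              # existing negative neighbors per node
print(predict_sign(0, 5, emb, pos_nbrs, neg_nbrs))
```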

Grothendieck Graph Neural Networks Framework: An Algebraic Platform for Crafting Topology-Aware GNNs 2024-12-12
Show

Due to the structural limitations of Graph Neural Networks (GNNs), in particular with respect to conventional neighborhoods, alternative aggregation strategies have recently been investigated. This paper investigates graph structure in message passing, aiming to incorporate topological characteristics. While the simplicity of neighborhoods remains alluring, we propose a novel perspective by introducing the concept of 'cover' as a generalization of neighborhoods. We design the Grothendieck Graph Neural Networks (GGNN) framework, offering an algebraic platform for creating and refining diverse covers for graphs. This framework translates covers into matrix forms, such as the adjacency matrix, expanding the scope of designing GNN models based on desired message-passing strategies. Leveraging algebraic tools, GGNN facilitates the creation of models that outperform traditional approaches. Based on the GGNN framework, we propose Sieve Neural Networks (SNN), a new GNN model that leverages the notion of sieves from category theory. SNN demonstrates outstanding performance in experiments, particularly on benchmarks designed to test the expressivity of GNNs, and exemplifies the versatility of GGNN in generating novel architectures.

Social Recommendation through Heterogeneous Graph Modeling of the Long-term and Short-term Preference Defined by Dynamic Time Spans 2024-12-11
Show

Social recommendations have been widely adopted in numerous domains. Recently, graph neural networks (GNN) have been employed in recommender systems due to their success in graph representation learning. However, dealing with the dynamic property of social network data is a challenge. This research presents a novel method that provides social recommendations by incorporating the dynamic property of social network data in a heterogeneous graph. The model aims to capture user preference over time without going through the complexities of a dynamic graph by adding period nodes to define users' long-term and short-term preferences and aggregating assigned edge weights. The model is applied to real-world data to demonstrate its superior performance. Promising results demonstrate the effectiveness of this model.
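
A sketch of the heterogeneous-graph construction the abstract suggests: besides user-item edges, each interaction is attached to a period node (here, a month), so short-term preference can be read from recent period nodes and long-term preference from the full history. The period granularity and relation names are assumptions.

```python
# Heterogeneous graph with user, item, and period nodes; each interaction links
# the user and item to the period in which it occurred.
import networkx as nx

interactions = [  # (user, item, "YYYY-MM") -- hypothetical records
    ("u1", "i1", "2024-01"), ("u1", "i2", "2024-01"),
    ("u1", "i3", "2024-06"), ("u2", "i3", "2024-06"),
]

g = nx.Graph()
for user, item, month in interactions:
    period = f"period:{month}"
    g.add_node(user, type="user")
    g.add_node(item, type="item")
    g.add_node(period, type="period")
    g.add_edge(user, item, relation="interacted")
    g.add_edge(user, period, relation="active_in")
    g.add_edge(item, period, relation="consumed_in")

print(g.number_of_nodes(), g.number_of_edges())
print([n for n, d in g.nodes(data=True) if d["type"] == "period"])
```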

Robustness of Graph Classification: failure modes, causes, and noise-resistant loss in Graph Neural Networks 2024-12-11
Show

Graph Neural Networks (GNNs) are powerful at solving graph classification tasks, yet applied problems often contain noisy labels. In this work, we study GNN robustness to label noise and demonstrate GNN failure modes when models struggle to generalise on low-order graphs, under low label coverage, or when a model is over-parameterized. We establish both empirical and theoretical links between GNN robustness and the reduction of the total Dirichlet Energy of learned node representations, which encapsulates the hypothesized GNN smoothness inductive bias. Finally, we introduce two training strategies to enhance GNN robustness: (1) incorporating a novel inductive bias in the weight matrices through the removal of negative eigenvalues, connected to Dirichlet Energy minimization; (2) extending to GNNs a loss penalty that promotes learned smoothness. Importantly, neither approach negatively impacts performance in noise-free settings, supporting our hypothesis that the source of GNNs' robustness is their smoothness inductive bias.
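
A small sketch of the two ingredients the abstract mentions, under assumed conventions: the degree-normalized Dirichlet energy of node representations over a graph, and the removal of negative eigenvalues from a (symmetrized) weight matrix by clipping its spectrum.

```python
# (1) Dirichlet energy of node features over a graph; (2) spectral clipping that
# removes negative eigenvalues of a symmetric weight matrix.
import numpy as np

def dirichlet_energy(adj, x):
    deg = adj.sum(1)
    energy = 0.0
    for i, j in zip(*np.nonzero(np.triu(adj))):        # each undirected edge once
        diff = x[i] / np.sqrt(deg[i]) - x[j] / np.sqrt(deg[j])
        energy += adj[i, j] * np.dot(diff, diff)
    return energy

def remove_negative_eigenvalues(w):
    sym = 0.5 * (w + w.T)                               # work with the symmetric part
    vals, vecs = np.linalg.eigh(sym)
    return vecs @ np.diag(np.clip(vals, 0.0, None)) @ vecs.T

rng = np.random.default_rng(0)
adj = (rng.random((6, 6)) < 0.4).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0)
x = rng.normal(size=(6, 4))
print("Dirichlet energy:", dirichlet_energy(adj, x))
w_psd = remove_negative_eigenvalues(rng.normal(size=(4, 4)))
print("min eigenvalue after clipping:", np.linalg.eigvalsh(w_psd).min())
```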

Graph Agent Network: Empowering Nodes with Inference Capabilities for Adversarial Resilience 2024-12-11
Show

End-to-end training with global optimization has popularized graph neural networks (GNNs) for node classification, yet it inadvertently introduced vulnerabilities to adversarial edge-perturbing attacks. Adversaries can exploit the inherently open input and output interfaces of GNNs, perturbing critical edges and thus manipulating the classification results. Current defenses, due to their persistent reliance on global-optimization-based end-to-end training schemes, inherently encapsulate the vulnerabilities of GNNs. This is specifically evidenced in their inability to defend against targeted secondary attacks. In this paper, we propose the Graph Agent Network (GAgN) to address the aforementioned vulnerabilities of GNNs. GAgN is a graph-structured agent network in which each node is designed as a 1-hop-view agent. Through decentralized interactions, agents learn to infer global perceptions and perform tasks including inferring embeddings, degrees, and neighbor relationships for given nodes. This empowers nodes to filter adversarial edges while carrying out classification tasks. Furthermore, the agents' limited view prevents malicious messages from propagating globally in GAgN, thereby resisting global-optimization-based secondary attacks. We prove that single-hidden-layer multilayer perceptrons (MLPs) are theoretically sufficient to achieve these functionalities. Experimental results show that GAgN effectively implements all its intended capabilities and, compared to state-of-the-art defenses, achieves optimal classification accuracy on the perturbed datasets.

Learning incomplete factorization preconditioners for GMRES 2024-12-11
Show

Incomplete LU factorizations of sparse matrices are widely used as preconditioners in Krylov subspace methods to speed up solving linear systems. Unfortunately, computing the preconditioner itself can be time-consuming and sensitive to hyper-parameters. Instead, we replace the hand-engineered algorithm with a graph neural network that is trained to approximate the matrix factorization directly. To apply the output of the neural network as a preconditioner, we propose an output activation function that guarantees that the predicted factorization is invertible. Further, applying a graph neural network architecture allows us to ensure that the output itself is sparse which is desirable from a computational standpoint. We theoretically analyze and empirically evaluate different loss functions to train the learned preconditioners and show their effectiveness in decreasing the number of GMRES iterations and improving the spectral properties on synthetic data. The code is available at https://github.com/paulhausner/neural-incomplete-factorization.

The first two authors contributed equally, Northern Lights Deep Learning Conference, 15 pages
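
The invertibility detail from the abstract above in isolation, as a sketch: a triangular factor is invertible exactly when its diagonal is nonzero, so one simple output activation maps the predicted diagonal through a strictly positive function and leaves off-diagonal entries untouched. The authors' actual activation is in their released code linked above; this is only an illustration of the idea.

```python
# Output activation guaranteeing an invertible lower-triangular factor: keep the
# predicted lower-triangular pattern, force the diagonal to be strictly positive.
import numpy as np

def make_invertible_lower_triangular(raw_entries):
    """raw_entries: dense (n, n) array of raw network outputs for a lower factor."""
    L = np.tril(raw_entries)                                 # lower-triangular pattern
    diag = np.exp(np.clip(np.diag(raw_entries), -30, 30))    # strictly positive diagonal
    np.fill_diagonal(L, diag)
    return L

rng = np.random.default_rng(0)
L = make_invertible_lower_triangular(rng.normal(size=(5, 5)))
print("det(L) =", np.prod(np.diag(L)))   # nonzero, so L is invertible
# L (and a matching upper factor) can then be applied as a preconditioner in GMRES.
```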

Uncovering Capabilities of Model Pruning in Graph Contrastive Learning 2024-12-11
Show

Graph contrastive learning has achieved great success in pre-training graph neural networks without ground-truth labels. Leading graph contrastive learning methods follow the classical scheme of contrastive learning, forcing the model to identify the essential information from augmented views. However, general augmented views are produced via random corruption or learning, which inevitably leads to semantics alteration. Although domain-knowledge-guided augmentations alleviate this issue, the generated views are domain specific and undermine generalization. In this work, motivated by the strong representation ability of sparse models obtained from pruning, we reformulate the problem of graph contrastive learning by contrasting different model versions rather than augmented views. We first theoretically reveal the superiority of model pruning in contrast to data augmentations. In practice, we take the original graph as input and dynamically generate a perturbed graph encoder to contrast with the original encoder by pruning its transformation weights. Furthermore, considering the integrity of node embeddings in our method, we are able to develop a local contrastive loss to tackle the hard negative samples that disturb model training. We extensively validate our method on various graph classification benchmarks via unsupervised and transfer learning. Compared to the state-of-the-art (SOTA) works, better performance can always be obtained by the proposed method.

MM' 24
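
The core move of the abstract above in isolation, with assumed details (encoder architecture, pruning ratio, loss temperature): a perturbed encoder is obtained by magnitude-pruning a copy of the original encoder's weights, and the two encoders' embeddings of the same graph are contrasted with an InfoNCE-style loss, with no data augmentation involved.

```python
# Model-pruning views for graph contrastive learning: contrast the original
# encoder against a magnitude-pruned copy on the SAME input graph.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_dim, hidden):
        super().__init__()
        self.lin = nn.Linear(in_dim, hidden)

    def forward(self, x, adj_norm):
        return torch.relu(adj_norm @ self.lin(x))

def pruned_copy(encoder, ratio=0.3):
    pruned = copy.deepcopy(encoder)
    with torch.no_grad():
        for p in pruned.parameters():
            if p.dim() > 1:                    # prune weight matrices, keep biases
                k = int(ratio * p.numel())
                thresh = p.abs().flatten().kthvalue(k).values
                p.mul_((p.abs() > thresh).float())
    return pruned

def info_nce(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                 # node i in view 1 vs all nodes in view 2
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

n, d = 12, 6
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.3).float()
adj = ((adj + adj.t()) > 0).float()
adj_norm = (adj + torch.eye(n)) / (adj.sum(1, keepdim=True) + 1.0)
enc = Encoder(d, 16)
loss = info_nce(enc(x, adj_norm), pruned_copy(enc)(x, adj_norm))
print(float(loss))
```
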
DistrictNet: Decision-aware learning for geographical districting 2024-12-11
Show

Districting is a complex combinatorial problem that consists in partitioning a geographical area into small districts. In logistics, it is a major strategic decision determining operating costs for several years. Solving districting problems with traditional methods is intractable even for small geographical areas, and existing heuristics often provide sub-optimal results. We present a structured learning approach that finds high-quality solutions to real-world districting problems in a few minutes. It is based on integrating a combinatorial optimization layer, the capacitated minimum spanning tree problem, into a graph neural network architecture. To train this pipeline in a decision-aware fashion, we show how to construct target solutions embedded in a suitable space and how to learn from them. Experiments show that our approach outperforms existing methods, as it can significantly reduce costs on real-world cities.

Accepted at NeurIPS 2024

Adapting Unsigned Graph Neural Networks for Signed Graphs: A Few-Shot Prompt Tuning Approach 2024-12-11
Show

Signed Graph Neural Networks (SGNNs) are powerful tools for signed graph representation learning but struggle with limited generalization and heavy dependence on labeled data. While recent advancements in "graph pre-training and prompt tuning" have reduced label dependence in Graph Neural Networks (GNNs) and improved their generalization abilities by leveraging pre-training knowledge, these efforts have focused exclusively on unsigned graphs. The scarcity of publicly available signed graph datasets makes it essential to transfer knowledge from unsigned graphs to signed graph tasks. However, this transfer introduces significant challenges due to the graph-level and task-level divergences between the pre-training and downstream phases. To address these challenges, we propose Signed Graph Prompt Tuning (SGPT) in this paper. Specifically, SGPT employs a graph template and a semantic prompt to segregate mixed link semantics in the signed graph and then adaptively integrate the distinctive semantic information according to the needs of downstream tasks, thereby unifying the pre-training and downstream graphs. Additionally, SGPT utilizes a task template and a feature prompt to reformulate the downstream signed graph tasks, aligning them with pre-training tasks to ensure a unified optimization objective and consistent feature space across tasks. Finally, extensive experiments are conducted on popular signed graph datasets, demonstrating the superiority of SGPT over state-of-the-art methods.
